True ML Talks #16 - Machine Learning Pipeline @ Digits

By TrueFoundry

Published: August 6, 2024

ML Use Cases at Digits

Digits is a financial management software company that uses AI to automate accounting tasks for operators. By automating tasks such as transaction classification, outlier detection, and fraud detection, Digits helps operators to double their customer base and improve their response time to customers.

Customer onboarding: Digits uses AI to help new customers set up their account and connect their bank accounts quickly and easily.
Transaction classification: Digits uses AI to automatically classify transactions, saving accountants time and ensuring accurate categorization.
Outlier detection: Digits uses AI to detect outliers in transactions, helping accountants to quickly identify and investigate unusual transactions.
Reporting: Digits provides accountants with a variety of reports generated using AI to save time and get the insights they need quickly.

ML Models Used by Digits:

Classification models: Classify transactions into different categories, such as meals, travel, and inventory.
Prediction models: Predict future outcomes, such as customer churn and fraud.
Generative models: Generate text, such as questions to ask customers and messages to send to customers.
Similarity-based models: Find similar patterns in transactions and mimic those patterns.

ML Journey at Digits

Digits needed to move to deep learning and NLP models to address the subjectivity inherent in accounting. It also had a strong foundation in data engineering and Kubernetes, both of which are essential for building and scaling an ML platform.

The team began by introducing TFX for ML pipeline orchestration and TF Serving for model serving. This allowed Digits to build and deploy ML models in a scalable and reliable way.

Next, the team focused on developing similarity-based pipelines. These pipelines can classify transactions and identify outliers even when the data is ambiguous or incomplete, because they find similar patterns in past transactions and mimic them. This approach is more effective than a global machine learning model, which can give inconsistent results depending on how each accountant interprets the data.
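As a rough illustration of the "find similar patterns and mimic them" idea, the sketch below labels a new transaction by majority vote among its most similar past transactions. The embedding vectors, the labels, and the function names are hypothetical; the talk summary does not describe how Digits computes embeddings or similarity, so cosine similarity is an assumption here.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_by_similarity(query, labeled_examples, k=3):
    """Return the majority label among the k most similar labeled examples."""
    ranked = sorted(
        labeled_examples,
        key=lambda example: cosine_similarity(query, example[0]),
        reverse=True,
    )
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical embeddings of one customer's previously categorized transactions.
history = [
    ([0.9, 0.1, 0.0], "meals"),
    ([0.8, 0.2, 0.1], "meals"),
    ([0.1, 0.9, 0.2], "travel"),
    ([0.0, 0.8, 0.3], "travel"),
    ([0.1, 0.1, 0.9], "inventory"),
]

predicted = classify_by_similarity([0.85, 0.15, 0.05], history, k=3)
print(predicted)
```

Because the lookup runs against one customer's own history, the prediction follows that customer's past categorization choices rather than a single global rule.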

Digits' ML pipelines are now used to power a variety of features, including transaction classification, outlier detection, and fraud detection. As a result, Digits is able to provide its customers with valuable insights and help them to automate tasks, improve accuracy, and save money.

Orchestrating ML Training on Kubernetes

Digits orchestrates ML training on Kubernetes, which lets it scale training operations up or down as needed. TensorFlow Transform handles preprocessing, and training runs on the training platform in Google Cloud projects. A validation set and a model registry act as quality gates before models ship to production.

Digits orchestrates ML training on Kubernetes using the following steps:

ETL process: Digits uses an ETL process to gather artifacts from around the system and bootstrap datasets on a continuous basis.
Data validation and schema creation: Digits validates the statistics of the datasets and creates schemas.
Preprocessing: Digits uses TensorFlow Transform to preprocess the data.
Training: Digits trains the models in Google Cloud projects using the training platform.
Evaluation: Digits evaluates the trained models using a validation set.
Model registry: Digits ships the trained models to a model registry.
Deployment: Digits uses a CI/CD system to deploy the trained models to production.
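The ordering of these steps can be sketched in plain Python. This is a toy stand-in, not Digits' pipeline: every function, the data, the threshold "model", and the `min_accuracy` gate are assumptions made for illustration. It only shows how each step consumes the artifacts of the previous ones and how evaluation on a validation set can gate what reaches the registry.

```python
def etl(artifacts):
    # Stand-in for gathering records; returns fixed train/validation splits.
    return {
        "train": [{"amount": 12.5, "label": "meals"}, {"amount": 300.0, "label": "travel"}],
        "validation": [{"amount": 20.0, "label": "meals"}, {"amount": 250.0, "label": "travel"}],
    }


def validate(artifacts):
    # Infer a minimal schema (field -> type name) and check every row against it.
    rows = artifacts["dataset"]["train"] + artifacts["dataset"]["validation"]
    schema = {field: type(value).__name__ for field, value in rows[0].items()}
    for row in rows:
        observed = {field: type(value).__name__ for field, value in row.items()}
        if observed != schema:
            raise ValueError(f"row {row!r} does not match schema {schema!r}")
    return schema


def preprocess(artifacts):
    # Scale amounts by the training maximum so both splits share one transform.
    splits = artifacts["dataset"]
    max_amount = max(row["amount"] for row in splits["train"])
    return {
        name: [{"amount": row["amount"] / max_amount, "label": row["label"]} for row in rows]
        for name, rows in splits.items()
    }


def train(artifacts):
    # Toy "model": a fixed threshold on the scaled amount.
    return lambda row: "travel" if row["amount"] > 0.5 else "meals"


def evaluate(artifacts):
    # Accuracy on the held-out validation split.
    model = artifacts["model"]
    rows = artifacts["transformed"]["validation"]
    return sum(model(row) == row["label"] for row in rows) / len(rows)


STEPS = [
    ("dataset", etl),
    ("schema", validate),
    ("transformed", preprocess),
    ("model", train),
    ("accuracy", evaluate),
]


def run_pipeline(steps, min_accuracy):
    """Run steps in order; mark the model for the registry only if it passes evaluation."""
    artifacts = {}
    for name, step in steps:
        artifacts[name] = step(artifacts)
    artifacts["registered"] = artifacts["accuracy"] >= min_accuracy
    return artifacts


result = run_pipeline(STEPS, min_accuracy=0.9)
print(result["schema"], result["accuracy"], result["registered"])
```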

Managing GPU Resource Allocation in ML Training at Digits

Digits manages GPU allocation for ML training through a combination of manual and automated processes:

Manual Processes: Digits sets clear GPU usage limits for teams and projects to keep allocation fair and prevent overuse. It also encourages open communication among ML engineers so that everyone is aware of resource constraints and conflicts are avoided.

Automated Processes: Digits continuously monitors GPU usage and raises alerts when usage exceeds predefined thresholds, so issues are caught early. A queuing system allocates GPUs on a first-come, first-served basis.

Best Practices: Digits encourages ML engineers to plan GPU usage ahead of time to keep resources available and minimize conflicts. Cloud resources provide flexibility and adequate GPU access during periods of high demand. Transparency about GPU usage builds trust and cooperation among team members, which improves resource management.
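A first-come, first-served queue with a usage alert can be sketched in a few lines. The class below is hypothetical (the name `GpuQueue`, the 80% alert threshold, and the job names are all assumptions) and is meant only to show the two automated behaviors described above, not how Digits implements them.

```python
from collections import deque


class GpuQueue:
    """First-come, first-served GPU allocator with a usage alert threshold."""

    def __init__(self, total_gpus, alert_fraction=0.8):
        self.total_gpus = total_gpus
        self.alert_fraction = alert_fraction
        self.in_use = 0
        self.waiting = deque()  # (job_name, gpus) in arrival order
        self.running = {}       # job_name -> gpus currently held
        self.alerts = []

    def submit(self, job_name, gpus):
        if gpus > self.total_gpus:
            raise ValueError(f"{job_name} requests more GPUs than exist")
        self.waiting.append((job_name, gpus))
        self._schedule()

    def release(self, job_name):
        self.in_use -= self.running.pop(job_name)
        self._schedule()

    def _schedule(self):
        # Strict FIFO: only the job at the head of the queue may start,
        # so a small job never jumps ahead of an earlier, larger one.
        while self.waiting and self.waiting[0][1] <= self.total_gpus - self.in_use:
            job_name, gpus = self.waiting.popleft()
            self.running[job_name] = gpus
            self.in_use += gpus
            if self.in_use / self.total_gpus > self.alert_fraction:
                self.alerts.append(
                    f"GPU usage at {self.in_use}/{self.total_gpus} after starting {job_name}"
                )


queue = GpuQueue(total_gpus=8)
queue.submit("job-a", 4)  # starts immediately
queue.submit("job-b", 6)  # waits: only 4 GPUs are free
queue.submit("job-c", 2)  # would fit, but waits behind job-b
queue.release("job-a")    # job-b starts, then job-c; usage reaches 8/8
print(queue.running, queue.alerts)
```

Strict FIFO is the simplest fair policy, at the cost of leaving GPUs idle while a large job waits at the head of the queue.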

Utilizing TensorFlow Profiler for Training Run Analysis at Digits

At Digits, TensorFlow Profiler is central to analyzing training runs and optimizing ML models:

Digits logs every training run through TensorFlow Profiler, which makes it possible to track performance trends over time.

Key metrics, including training duration, memory consumption, and accuracy, are tracked so that models and configurations can be compared meaningfully.

With these records, Digits can systematically compare training runs and select the most suitable model and configuration for a given problem.
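Once metrics are recorded per run, comparing runs is a matter of applying a selection rule. The sketch below assumes the three metrics named above have already been exported into simple records; the `TrainingRun` type, the numbers, and the rule (best accuracy within a memory budget, ties broken by shorter duration) are all illustrative assumptions rather than Digits' actual criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    name: str
    duration_s: float
    peak_memory_gb: float
    accuracy: float


def best_run(runs, max_memory_gb):
    """Highest accuracy among runs within the memory budget; shorter duration breaks ties."""
    eligible = [run for run in runs if run.peak_memory_gb <= max_memory_gb]
    if not eligible:
        return None
    return max(eligible, key=lambda run: (run.accuracy, -run.duration_s))


# Made-up numbers for three hypothetical configurations.
runs = [
    TrainingRun("baseline", duration_s=1200.0, peak_memory_gb=10.0, accuracy=0.91),
    TrainingRun("larger-batch", duration_s=900.0, peak_memory_gb=14.0, accuracy=0.91),
    TrainingRun("wide-model", duration_s=2400.0, peak_memory_gb=22.0, accuracy=0.93),
]

choice = best_run(runs, max_memory_gb=16.0)
print(choice.name)
```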

Benefits:

Improved Performance: TensorFlow Profiler helps identify performance bottlenecks; resolving them leads to significant improvements in training speed and accuracy.
Cost Reduction: Enhanced training performance reduces the overall cost of ML model training for Digits.
Increased Transparency: Detailed performance insights provided by TensorFlow Profiler enhance Digits' understanding of ML model training and help identify potential issues early on.

Optimizing Validation Sets for Similarity-Based ML Pipelines

When crafting validation sets for similarity-based ML pipelines, consider these key factors:

Objective: Define the model's objective—what constitutes similarity between data points? Once this objective is clear, the validation set can be populated with known similar and dissimilar examples.
Context: The validation set should mirror the model's real-world application. For instance, if the model recommends products to customers, it should include items customers often buy together.
Size: Strike a balance. The validation set should be large enough to give statistically meaningful results but small enough to stay manageable. A general guideline is to make it at least 10% of the training set's size.
Variability: To bolster the model's robustness, ensure your validation set encompasses diverse data points.
Operator Impact: If operators are concentrated in a few industries, the validation set can become biased toward those industries. To mitigate this, include examples from a variety of verticals and industries.
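One way to combine the size and industry-balance guidelines is a stratified split that holds out roughly the same fraction from every industry. The sketch below is a minimal version under stated assumptions: the `industry` field, the 10% fraction, and the rule of holding out at least one example per group are choices made for illustration.

```python
import random
from collections import defaultdict


def stratified_validation_split(examples, key, fraction=0.1, seed=0):
    """Hold out about `fraction` of each group, and at least one example per group."""
    groups = defaultdict(list)
    for example in examples:
        groups[key(example)].append(example)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, validation = [], []
    for group in groups.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        holdout = max(1, round(len(shuffled) * fraction))
        validation.extend(shuffled[:holdout])
        train.extend(shuffled[holdout:])
    return train, validation


# Hypothetical dataset skewed toward one industry.
examples = (
    [{"industry": "restaurant", "id": i} for i in range(50)]
    + [{"industry": "retail", "id": i} for i in range(30)]
    + [{"industry": "construction", "id": i} for i in range(20)]
)

train, validation = stratified_validation_split(examples, key=lambda e: e["industry"])
print(len(train), len(validation))
```

Without stratification, a purely random 10% sample of a skewed dataset could leave small industries with few or no validation examples.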

Challenges and Optimizations of Similarity-Based ML Pipelines

Compared to traditional ML pipelines, similarity-based ML pipelines pose some distinct challenges and call for specific optimizations.

Challenges:

Choosing a loss function: There are a variety of different loss functions that can be used for similarity-based ML models. Choosing the right loss function is important for ensuring the accuracy and reliability of the model.
Structuring the training data: The way the training data is structured depends on the chosen loss function. It is important to structure the training data in a way that is efficient and effective.
Optimizing for performance: Similarity-based ML models can be computationally expensive to train. It is important to optimize the training process for performance.
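To make the link between loss function and data structure concrete, the sketch below uses triplet loss, one common choice for similarity models. The summary does not say which loss Digits uses, so this is an assumption. With triplet loss, each training example is an (anchor, positive, negative) triple, and the loss is zero once the anchor is closer to the positive than to the negative by at least a margin. The vectors and labels are made up, and plain Euclidean distance is used for simplicity.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def build_triplets(labeled):
    """Pair each same-label (anchor, positive) with every different-label negative."""
    triplets = []
    for i, (anchor, anchor_label) in enumerate(labeled):
        for j, (positive, positive_label) in enumerate(labeled):
            if i == j or positive_label != anchor_label:
                continue
            for negative, negative_label in labeled:
                if negative_label != anchor_label:
                    triplets.append((anchor, positive, negative))
    return triplets


# Hypothetical 2-D embeddings with category labels.
labeled = [
    ([0.0, 0.0], "meals"),
    ([0.0, 1.0], "meals"),
    ([3.0, 4.0], "travel"),
]

triplets = build_triplets(labeled)
losses = [triplet_loss(a, p, n) for a, p, n in triplets]
print(len(triplets), losses)
```

A different loss changes the data layout: a contrastive loss, for example, would need labeled pairs instead of triples.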

Optimizations:

Use a GPU: GPUs can significantly accelerate the training of similarity-based ML models.
Profile the model: Profiling the model during training can help to identify bottlenecks and areas where the training process can be improved.
Preprocess the data: Preprocessing the data can improve the performance of the model and reduce the training time.
Reduce the input tokens: If using a language model, reducing the number of input tokens can improve the performance of the model and reduce the training time.

Digits Uses TensorFlow Extended and Vertex AI Pipelines for Similarity-Based ML Pipelines

Digits uses TensorFlow Extended (TFX) and Vertex AI Pipelines for similarity-based ML pipelines. TFX is a Google-developed, open-source end-to-end platform for building, deploying, and managing ML pipelines. Vertex AI Pipelines is a fully managed cloud service for managing ML pipelines.

TFX provides a number of components that are useful for building similarity-based ML pipelines, including:

TensorFlow Data Validation (TFDV): Validates the quality and consistency of the training data.
TensorFlow Transform (TFT): Preprocesses the training data, including handling missing values, converting data types, and scaling features.
TensorFlow Model Analysis (TFMA): Evaluates the performance of trained models on a held-out validation set.
TensorFlow Serving: Deploys trained models to production.

Vertex AI Pipelines makes it easy to run and manage TFX pipelines at scale. Vertex AI Pipelines provides a number of features that are useful for similarity-based ML pipelines, including:

Automatic scaling: Vertex AI Pipelines can automatically scale the resources used to run pipelines, based on the demand.
Monitoring and alerting: Vertex AI Pipelines provides monitoring and alerting features that can help to identify and resolve problems with pipelines.
Version control: Vertex AI Pipelines provides version control features that make it easy to track and manage changes to pipelines.

Digits Uses Vertex Endpoints for Model Registry and TF Serving for Productionization

Digits uses Vertex Endpoints for model registry and TF Serving for productionization.

Vertex Endpoints is a fully managed cloud service for deploying and managing machine learning models. It provides a number of features that make it a good choice for model registry, including:

Centralized management: Vertex Endpoints provides a central place to store and manage models.
Version control: Vertex Endpoints provides version control features that make it easy to track and manage changes to models.
Access control: Vertex Endpoints provides access control features that make it easy to control who can access and deploy models.

TF Serving is a high-performance, production-ready TensorFlow serving system. It provides a number of features that make it a good choice for productionization, including:

High performance: TF Serving can serve models at high throughput and low latency.
Scalability: TF Serving can be scaled to handle large numbers of requests.
Reliability: TF Serving is designed to be reliable and production-ready.
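For context, TF Serving exposes a REST predict API of the form `POST /v1/models/<name>:predict` with a JSON body containing `instances`. The sketch below only builds such a request with the standard library; the host, the model name `transaction_classifier`, and the input field are hypothetical, and 8501 is the REST port used in the TF Serving documentation rather than anything the summary states about Digits' setup.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Build (but do not send) a TF Serving REST predict request."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request(
    host="localhost",
    model_name="transaction_classifier",  # hypothetical model name
    instances=[{"description": "UBER TRIP"}],  # hypothetical input signature
)
print(request.full_url)

# Sending the request requires a running TF Serving instance:
# with urllib.request.urlopen(request) as response:
#     predictions = json.loads(response.read())["predictions"]
```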

Digits uses CI/CD to automate the deployment of models to Vertex Endpoints. When a model is registered in the model registry, the CI/CD system is triggered. The CI/CD system then builds a TF Serving model and deploys it to a Vertex Endpoint.

Benefits:

There are a number of benefits to using Vertex Endpoints and CI/CD for productionization:

Scalability: Vertex Endpoints can automatically scale the resources used to serve models, which makes it easy to handle large numbers of requests.
Reliability: Vertex Endpoints is designed to be reliable and production-ready.
Automation: CI/CD automates the deployment of models, which reduces the risk of human error and makes it easy to deploy models frequently.

How Digits Automatically Detects When Models Need to Be Retrained

Digits uses a combination of techniques to automatically detect when models need to be retrained:

Monitoring model predictions: Digits monitors the predictions of models in production. If the predictions start to become inaccurate, this may be a sign that the model needs to be retrained.
Tracking model performance metrics: Digits tracks a number of model performance metrics, such as accuracy, precision, and recall. If these metrics start to degrade, this may be a sign that the model needs to be retrained.
Validating data snippets: Digits periodically validates data snippets from production. This helps to identify any data drift that may be occurring. If data drift is detected, this may be a sign that the model needs to be retrained.
Reviewing model outputs: Digits has an internal review platform where employees can review the outputs of models. This helps to identify any cases where the model is not making accurate predictions. If such cases are identified, this may be a sign that the model needs to be retrained.
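Data drift on a categorical feature can be checked by comparing category frequencies between the training data and a production snippet. The sketch below uses the L-infinity distance (the largest single-category frequency change) with an arbitrary threshold of 0.1. The feature values, the threshold, and the function names are assumptions; the summary does not describe the specific drift test Digits runs.

```python
from collections import Counter


def category_distribution(values):
    """Map each category to its relative frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(reference, current):
    """Largest absolute difference in frequency for any single category."""
    categories = set(reference) | set(current)
    return max(abs(reference.get(c, 0.0) - current.get(c, 0.0)) for c in categories)


def has_drifted(training_values, production_values, threshold=0.1):
    distance = l_infinity_distance(
        category_distribution(training_values),
        category_distribution(production_values),
    )
    return distance > threshold


# Hypothetical category mix at training time vs. in a recent production snippet.
training = ["meals"] * 50 + ["travel"] * 30 + ["inventory"] * 20
production = ["meals"] * 20 + ["travel"] * 60 + ["inventory"] * 20

print(has_drifted(training, production))
```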

Once Digits detects that a model needs to be retrained, it uses CI/CD to automate the retraining and deployment process. The CI/CD system builds a new TF Serving model using the latest training data and deploys it to a Vertex Endpoint.

Example:

The following is an example of how Digits' automatic model retraining process works:

A model in production makes a prediction that is inaccurate.
Digits' monitoring system detects the inaccurate prediction and sends a notification to the CI/CD system.
The CI/CD system triggers a new training job.
The training job trains a new model using the latest training data.
The CI/CD system deploys the new model to a Vertex Endpoint.
The new model is now used to make predictions in production.
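The loop above can be sketched as a monitor that calls a trigger when accuracy degrades. Note one deliberate difference from the example steps: rather than reacting to a single inaccurate prediction, this sketch waits until accuracy over a rolling window of reviewed predictions drops below a threshold. That choice, along with the window size, the threshold, and the `trigger` callback standing in for the CI/CD system, is an assumption made for illustration.

```python
from collections import deque


class RetrainingMonitor:
    """Call `trigger` when accuracy over a full rolling window falls below a threshold."""

    def __init__(self, window_size, min_accuracy, trigger):
        self.outcomes = deque(maxlen=window_size)  # True where prediction was correct
        self.min_accuracy = min_accuracy
        self.trigger = trigger  # stand-in for notifying the CI/CD system

    def record(self, prediction, reviewed_label):
        self.outcomes.append(prediction == reviewed_label)
        window_full = len(self.outcomes) == self.outcomes.maxlen
        if window_full:
            accuracy = sum(self.outcomes) / len(self.outcomes)
            if accuracy < self.min_accuracy:
                self.trigger(accuracy)
                self.outcomes.clear()  # start a fresh window after triggering


triggered = []  # collects the accuracy values that caused a retraining request
monitor = RetrainingMonitor(window_size=4, min_accuracy=0.75, trigger=triggered.append)

monitor.record("meals", "meals")
monitor.record("travel", "travel")
monitor.record("meals", "travel")
monitor.record("meals", "inventory")  # window is full with accuracy 0.5
print(triggered)
```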

The Importance of Collaboration Between ML Engineers and Designers

Machine learning (ML) engineers and designers often work in silos, which can lead to problems when trying to bring ML models to production. ML engineers may develop models that are accurate but not user-friendly, while designers may create interfaces that are visually appealing but do not collect feedback on model predictions.

To address these challenges, it is important for ML engineers and designers to collaborate closely. This can be done by:

Working together on product requirements: ML engineers and designers should work together to define the product requirements for ML models. This will help to ensure that the models are developed to meet the needs of the users and that the design of the interfaces is compatible with the models.
Sharing feedback: ML engineers and designers should regularly share feedback with each other. This will help to identify any potential problems with the models or the interfaces early on.
Creating feedback loops: ML engineers and designers should create feedback loops to collect feedback from users on the performance of the models and the usability of the interfaces. This feedback can be used to improve the models and the interfaces over time.

Advice for Building ML Platforms

Efficiency: Focus on building efficient MLOps pipelines for specific applications, such as those that require proprietary data or high levels of privacy and security.
API-first: Consider using pre-trained models from providers such as OpenAI, Anthropic, and Google (Bard) for generic tasks.
Consultation: Focus on consulting with other team members on how to use these APIs to solve domain-specific problems.

Generative AI @ Digits

Generative AI has the potential to revolutionize many industries. Here are some of the use cases for generative AI at Digits:

Boosting communication between accountants and operators: Generative AI can be used to draft suggested questions and answers, which can save both parties time and effort.
Internal hosting of large language models: Digits has its own infrastructure for hosting large language models, which allows it to do so in a secure and privacy-oriented manner.
Using API-based access to generative AI models: There is potential for combining API-based access to generative AI models with similarity-based machine learning to provide a tremendous product experience.

There are privacy and security concerns associated with generative AI, and it is important to address these concerns in a responsible way. We as a community can find ways to develop and use generative AI in a way that is safe and beneficial to everyone.

Casting and model registry hosting landscape will change significantly in the coming years to accommodate the needs of large language models. - Hannes

Keep watching the TrueML YouTube series and reading the TrueML blog series.

TrueFoundry is an ML Deployment PaaS over Kubernetes that speeds up developer workflows, giving developers full flexibility in testing and deploying models while ensuring full security and control for the Infra team. Through our platform, we enable machine learning teams to deploy and monitor models in 15 minutes with 100% reliability, scalability, and the ability to roll back in seconds. This allows them to save cost and release models to production faster, enabling real business value realisation.
