True ML Talks #8 - ML Platform @ Intuit

We are back with another episode of True ML Talks. In this episode, we dive deep into Numaflow, Intuit's ML platform, with Vigith Maurice.

Vigith is the principal engineer for the AIOps platform at Intuit. Intuit is the global technology platform behind TurboTax, Credit Karma, Mint, QuickBooks, and Mailchimp, and it helps its customers achieve financial confidence.

📌

Our conversation with Vigith covers the following topics:
- ML Use Cases at Intuit
- NUMA Approach for Real-time Anomaly Detection
- Insights on Argo Workflows
- Deploying ML Models with Kubernetes and Numaflow
- Retraining systems, Numaflow vs Flink
- Security and Compliance Measures in AIOps at Intuit
- MLOps vs AIOps
- Teaser for Vigith's KubeCon Presentation

Watch the full episode below:

Use Cases of ML @ Intuit

The Operational-Oriented ML Use Case

Intuit's AIOps team has a different use case for ML, which is focused on the operational side of the business. This use case aims to detect and resolve platform problems quickly, reducing the mean time to detection and resolution. Some of the aspects of this use case include:

Building an Operational Data Lake: Vigith's team built an operational data lake that collects real-time data from every layer of the runtime system, with a focus on pure non-PII anonymized metrics.
Real-Time Analysis: The team analyzes this data with a latency of less than a minute to detect and create alerts or incidents based on the severity of the problem.
High Throughput and Low Latency: The ML approach used by Vigith's team differs from traditional customer-oriented ML because of its scale: 250 Kubernetes clusters and a billion events ingested daily for analysis and processing.
Predicting Anomalous Events: The team's system follows the data mesh principle and schematizes the entire system to provide a unified approach for analyzing data at scale, helping them predict anomalous events from resource CPO, security, and other areas.

The Customer-Oriented ML Use Case

At Intuit, several ML use cases focus on improving the customer experience. Some of these use cases include:

Fraud Detection: Using ML algorithms to detect fraudulent activities, such as identity theft, fake invoices, and phishing scams.
Document Scanning: Using ML models to scan documents and extract important information automatically, such as receipts, invoices, and tax forms.
Forecasting: Using ML techniques to predict future trends, such as sales, demand, and revenue.
Document Search: Using ML algorithms to improve search accuracy and relevance, making it easier for customers to find what they are looking for.

Building a Scalable Platform for Real-time Anomaly Detection: A NUMA Approach

Real-time anomaly detection systems need to process massive amounts of unbounded data streams. Traditional machine learning (ML) systems work on a request-response model where the payload is processed to produce a prediction. However, a real-time anomaly detection system requires an asynchronous, directed acyclic graph (DAG)-based pipeline that can handle different data formats and language-agnostic operations.

Intuit built a scalable platform for real-time anomaly detection that uses a NUMA (New, Unique, and Mature Architecture) approach. The NUMA approach has two parts: Numalogic, a set of models that have been vetted and are used every day, and the Numaflow platform, which runs the Numalogic models.

The DAG-based pipeline in the Numaflow platform consists of a source (an unbounded data stream), vertices (language-agnostic operations), and a sink (anomaly score output). A typical pipeline has a pre-processing step for feature engineering, an inference step, and a post-processing step that normalizes scores into a human-readable format.
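The source, vertex, and sink structure described above can be sketched in plain Python. This is only an illustration under stated assumptions: the `Vertex` type, the three stage functions, the baseline value, and the 0 to 10 score scale are all hypothetical and are not taken from Numaflow or Numalogic.

```python
import math
from typing import Callable, Iterable, Iterator, List

# Hypothetical vertex type: takes one payload, returns zero or more payloads.
Vertex = Callable[[dict], List[dict]]


def preprocess(event: dict) -> List[dict]:
    """Feature engineering: drop events without a value, derive one feature."""
    if "value" not in event:
        return []  # nothing is forwarded downstream
    return [{**event, "feature": math.log1p(max(event["value"], 0.0))}]


def infer(event: dict) -> List[dict]:
    """Stand-in for a model: distance of the feature from an assumed baseline."""
    baseline = 1.0  # assumption; a real vertex would call a trained model
    return [{**event, "raw_score": abs(event["feature"] - baseline)}]


def postprocess(event: dict) -> List[dict]:
    """Squash the raw score onto a 0-10 scale that is easier to read."""
    score = 10.0 * math.tanh(event["raw_score"])
    return [{**event, "anomaly_score": round(score, 2)}]


def run_pipeline(source: Iterable[dict], vertices: List[Vertex]) -> Iterator[dict]:
    """Push each event from the source through the vertices in order."""
    for event in source:
        batch = [event]
        for vertex in vertices:
            batch = [out for item in batch for out in vertex(item)]
        yield from batch  # the caller plays the role of the sink


if __name__ == "__main__":
    stream = [
        {"metric": "latency_ms", "value": 1.7},
        {"metric": "broken"},  # dropped by preprocess
        {"metric": "latency_ms", "value": 900.0},
    ]
    for result in run_pipeline(stream, [preprocess, infer, postprocess]):
        print(result["metric"], result["anomaly_score"])
```

In a real deployment each stage would run as its own vertex rather than as a function call in one process; the sketch only shows how data flows through the stages.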

The platform is highly scalable and cost-efficient, using load calculations to determine the number of processing units required. The system can scale up, or down to zero processing units, based on the volume of incoming data. The platform is built to handle node and pod migrations, auto-scaling, and system failures.

Overall, the NUMA approach and the Numaflow platform provide an efficient and effective solution for real-time anomaly detection.

📌

Architecture for scaling down to zero in AIOps systems:
AIOps systems need to scale resources up and down based on the amount of data being processed in real time. To achieve this, the scheduling logic and the data processing logic are separated by deploying a custom Kubernetes controller with a built-in auto-scaling algorithm. This algorithm tracks the processing rate of a vertex and the time taken to process a message, and uses this information to automatically adjust the resources allocated to the system.

The custom controller differs from the native Kubernetes Horizontal Pod Autoscaler (HPA), which by default cannot scale down to zero. With a custom controller, the AIOps system can scale down to zero when it is not processing any data, which avoids wasting resources.
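To make the idea concrete, here is a minimal sketch of a scaling decision that uses the pending message count and the observed per-replica processing rate. The formula, the function name `desired_replicas`, and its parameters are assumptions for illustration; this is not Numaflow's actual autoscaling algorithm.

```python
import math


def desired_replicas(
    pending: int,
    rate_per_replica: float,
    target_seconds: float = 60.0,
    max_replicas: int = 50,
) -> int:
    """Replicas needed to drain `pending` messages within `target_seconds`.

    Hypothetical formula for illustration only.
    `rate_per_replica` is messages per second for a single replica,
    i.e. the inverse of the time taken to process one message.
    """
    if pending <= 0:
        return 0  # nothing buffered: scale down to zero
    if rate_per_replica <= 0:
        return 1  # no rate observed yet: start one replica to measure it
    needed = math.ceil(pending / (rate_per_replica * target_seconds))
    return max(1, min(needed, max_replicas))
```

For example, with 12,000 pending messages and a replica that processes 100 messages per second, draining the backlog in 60 seconds takes 2 replicas; with nothing pending the function returns 0.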

To enable independent scaling of each vertex, the system uses a buffer between two vertices. This buffer helps to ensure that the data is processed efficiently and allows each vertex to be scaled independently based on its specific requirements. This is important because different processes in an AIOps system may have different resource requirements, and scaling them independently helps to optimize resource usage.
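The role of the buffer can be illustrated with a bounded in-process queue between a fast writer and a slower reader. This is a sketch under stated assumptions: `queue.Queue` stands in for whatever buffer implementation the platform uses, and the function names and the `SENTINEL` marker are hypothetical.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream in this sketch


def upstream(buf: queue.Queue, n: int) -> None:
    """Fast vertex: put() blocks when the buffer is full (backpressure)."""
    for i in range(n):
        buf.put(i)
    buf.put(SENTINEL)


def downstream(buf: queue.Queue, out: list) -> None:
    """Slower vertex: reads at its own pace, independent of the writer."""
    while True:
        item = buf.get()
        if item is SENTINEL:
            break
        out.append(item * 2)


def run(n: int = 100, capacity: int = 8) -> list:
    """Connect the two vertices through a bounded buffer and run them."""
    buf: queue.Queue = queue.Queue(maxsize=capacity)
    out: list = []
    threads = [
        threading.Thread(target=upstream, args=(buf, n)),
        threading.Thread(target=downstream, args=(buf, out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Because the two sides only meet at the buffer, the number of readers could be changed without touching the writer, and the buffer depth is a natural signal for a scaling decision.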

One of the coolest features is the ability to scale down, and it's a must for us. - Vigith

Open Source Ecosystem and AIOps: Insights on Argo Workflows

Argo Workflows has become a popular tool for managing machine learning workflows, with Intuit contributing significantly to its development. The success of Argo lies in its open-source nature, which allows for feedback and contributions from users worldwide. Because the software is open, ideas and innovations flow in from the community, enabling Intuit to improve its solutions based on users' feedback.

Compared to other DAG orchestrators like Airflow, Argo is well suited to training tasks, but it is batch-oriented. Users requested an equivalent system that could handle streaming data, and Intuit responded by creating Numaflow, a streaming-oriented system. The two systems, Argo and Numaflow, can be combined to create an always-on inference system for real-time data processing. With Numaflow, the company has re-architected the Argo system to incorporate more features and improve its functionality. The open-source approach has benefited both Intuit and the wider community, enabling a collaborative effort to improve AIOps workflows.

You can read more about Argo Workflows here:

Argo Workflows - The workflow engine for Kubernetes

Deploying ML Models with Kubernetes and Numaflow

Deploying machine learning (ML) models with Kubernetes and Numaflow can be challenging, especially because latency and traffic patterns vary significantly. Intuit uses a unique serving system in its operational AIOps platform. When data is received, the inference step is treated like any other user-defined function (UDF), whether the vertex performs protobuf-to-data conversion or inference. Numaflow provides SDKs for different languages, with Python being the most complex to support because of its behavior at high throughput, which requires multi-process and procedural Python. For other languages, this is not an issue.

To create a handler, the user only needs to write a function that specifies how to handle a message provided by Numaflow. The function takes a message as input and returns a flat map (zero or more messages) as output. The same function signature applies to any vertex, no matter the task.
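The shape of such a handler can be sketched as follows. The `Message` dataclass and the `handler` signature here are hypothetical stand-ins written for this example; they are not the Numaflow SDK's types, which define their own message classes and registration calls. The JSON payload layout is also an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    """Hypothetical message type; the real SDK defines its own."""
    value: bytes
    keys: List[str] = field(default_factory=list)


def handler(msg: Message) -> List[Message]:
    """One message in, zero or more messages out (flat-map semantics)."""
    try:
        record = json.loads(msg.value)
    except ValueError:
        return []  # malformed input: emit nothing
    if not isinstance(record, dict):
        return []
    metrics = record.get("metrics", {})
    if not isinstance(metrics, dict):
        return []
    # Fan out: one output message per metric found in the record.
    return [
        Message(
            value=json.dumps({"metric": name, "value": val}).encode(),
            keys=[name],
        )
        for name, val in metrics.items()
    ]
```

Returning an empty list drops the message, returning one element maps it, and returning several fans it out, which is why a single signature can serve pre-processing, inference, and post-processing vertices alike.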

Models are pulled and cached based on the problem statement. A message is received and processed, and the resulting inference is pushed to the next vertex. Depending on the use case, the model can be stored in different ways. For a high-throughput, heavily decentralized architecture, a key is used. For a centralized architecture, a reference to S3 is stored in DynamoDB. In general, the goal is to simplify the process for an ML engineer, who only needs to change the class name, as the rest is abstracted away.
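One way to picture the pull-and-cache step is a small time-based cache in front of a loader function. This is a sketch, not Intuit's implementation: the `ModelCache` class, the `loader` callback (which would wrap the actual model store lookup), and the TTL policy are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ModelCache:
    """Caches loaded models per key and reloads them after `ttl_seconds`."""

    def __init__(
        self,
        loader: Callable[[str], Any],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # hypothetical: fetches a model from a store
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        """Return the cached model for `key`, pulling it if missing or stale."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: no remote call
        model = self._loader(key)  # cache miss or stale entry: pull again
        self._entries[key] = (now, model)
        return model
```

An inference vertex would call `cache.get(key)` for each message, so the remote store is only contacted on the first message for a key and again after the entry expires, which is also when a retrained model would be picked up.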

The platform uses gRPC instead of REST, and depending on the problem statement, a combination of techniques is used to manage the model lifecycle. MLflow manages the lifecycle where it is suitable, while other techniques are used for more decentralized architectures where MLflow is not an option. The key takeaway for an ML engineer is to write a handler function that takes an input and returns an output, and let the system take care of the rest.

You can read more about Numaflow here:

Numaflow - Data/streaming processing platform on Kubernetes

Retraining systems, Numaflow vs Flink

The retraining system used with Numaflow varies depending on the use case. For more complex cases with 20 requests per second, Numaflow deploys a full-blown, multi-step Argo workflow to fetch data and update the model store. For lighter systems, Numaflow uses a user-defined function (UDF) that performs the retraining directly.

Difference between Numaflow and Flink

Processing Speed: Numaflow prioritizes decoupling message processing speed from latency, while Flink focuses on high throughput with low latency. The difference arises because Numaflow is designed for heavy number crunching and input/output (I/O) intensive activities, while Flink is better suited to high-throughput data processing.
Data Serialization Format: Flink uses its own efficient and well-defined serialization format, while Numaflow treats the message payload as a black box, which makes it difficult to define hashCode and equals for efficient message storage and retrieval.

You can read more about Apache Flink here:

Use Cases

Apache Flink

Security and Compliance Measures in AIOps at Intuit

Intuit has strict security measures in place, including application-level encryption algorithms.
The AIOps system at Intuit follows a watertight compartmentalization approach: each namespace is isolated, and data is encrypted both at rest and in transit, with TLS protecting data in transit.
The AIOps team at Intuit follows the security principles of Argo, an open-source project under CNCF, for encrypting data at all layers, including metrics endpoints.
The AIOps system for customer data at Intuit has even tighter security constraints, with well-audited and well-kept data that even users cannot access. Operational data is decoupled from customer data for this reason, but security measures are still in place.

MLOps vs AIOps

Machine Learning Operations (MLOps) and Artificial Intelligence Operations (AIOps) are two terms that are often used interchangeably, but they have distinct principles and processes.
MLOps primarily focuses on managing the model lifecycle, whereas AIOps is centered on the operational domain.

AIOps typically uses technologies like HyperLogLog and latency-based sketches, which are designed to work with operational data. These techniques trade exactness for efficiency and can have error percentages of around 0.89. AIOps also relies on statistical significance to detect and isolate problems, with the goal of reducing Mean Time to Resolution (MTTR).
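To show what such an approximation looks like, here is a compact HyperLogLog sketch for counting distinct items. It follows the textbook algorithm (register array, harmonic mean, small-range correction) and uses the commonly cited standard error of roughly 1.04 / sqrt(m) for m registers. The class name, the choice of SHA-1 as the hash, and the default precision are assumptions for this example; it is not the implementation used at Intuit, and the error figure quoted in the text is not derived from it.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct counter using m = 2**p small registers."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, m >= 128

    def add(self, item: str) -> None:
        digest = hashlib.sha1(item.encode()).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit hash value
        idx = h >> (64 - self.p)  # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in `rest` (1-based).
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return estimate

    def standard_error(self) -> float:
        """Expected relative error of the estimate for this register count."""
        return 1.04 / math.sqrt(self.m)
```

With `p=12` the sketch keeps 4,096 registers regardless of how many items are added, and the expected relative error is about 1.6 percent; doubling the precision bits trades more memory for a smaller error.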

In contrast, MLOps leverages different technologies, such as MLflow, and other heuristics to manage the lifecycle of a model. Intuit has also developed patterns like feature management to optimize the model lifecycle. The goal in MLOps is to manage the entire model lifecycle, from training to deployment, monitoring, and optimization.

Teaser for Vigith's KubeCon Presentation: Customer-Centric AIOps with Anomaly Detection

Vigith's upcoming presentation at KubeCon is about customer-centric AIOps and anomaly detection. The focus is on alerting based on the customer's experience rather than the system's, which means building complex dependency graphs from tracing data and isolating anomalies rather than just detecting them.

The platform uses a collection of dimensions and metrics to perform composite key anomaly detection on time series data, allowing for pinpointing anomalies at a very specific level. The aim of this project is to provide a generalized solution for anomaly detection, making it a "Do It Yourself anomaly" system.
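Composite-key detection can be pictured as keeping a separate baseline for every combination of dimension values. The sketch below uses a running z-score as a deliberately simple stand-in for the real models; the class names, the dimension names in the usage note, and the threshold of 3 are assumptions for illustration only.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple


class RunningStats:
    """Welford's online mean and variance for one time series."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def zscore(self, x: float) -> float:
        if self.n < 2:
            return 0.0  # not enough history to judge
        std = math.sqrt(self.m2 / (self.n - 1))
        return 0.0 if std == 0 else (x - self.mean) / std


class CompositeKeyDetector:
    """Keeps one baseline per combination of dimension values."""

    def __init__(self, dimensions: Tuple[str, ...], threshold: float = 3.0) -> None:
        self.dimensions = dimensions
        self.threshold = threshold
        self.stats: Dict[tuple, RunningStats] = defaultdict(RunningStats)

    def observe(self, event: dict, metric: str) -> bool:
        """Score the event against its own key's history, then learn from it."""
        key = tuple(event[d] for d in self.dimensions)
        stats = self.stats[key]
        is_anomaly = abs(stats.zscore(event[metric])) > self.threshold
        stats.update(event[metric])
        return is_anomaly
```

With dimensions such as `("service", "region")`, a latency of 500 ms can be flagged for one service and region where the usual value is near 100 ms, while the same reading passes unflagged for another pair whose normal level is near 500 ms.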

Vigith's presentation will showcase the platform's capabilities and demonstrate how it has been implemented at Intuit for AIOps. Don't miss this opportunity to learn about the latest advancements in customer-centric AIOps and anomaly detection.

Keep watching the TrueML YouTube series and reading the TrueML blog series.

TrueFoundry is an ML deployment PaaS over Kubernetes that speeds up developer workflows, giving developers full flexibility in testing and deploying models while ensuring full security and control for the infra team. Through our platform, we enable machine learning teams to deploy and monitor models in 15 minutes with 100% reliability, scalability, and the ability to roll back in seconds, allowing them to save cost, release models to production faster, and realize real business value.
