True ML Talks #4 - Machine Learning Platform @ Salesforce

Published: July 4, 2026


We are back with another episode of True ML Talks. In this episode, we dive deep into Salesforce's ML platform with Arpeet Kale.

Arpeet was part of the engineering team at Salesforce that built the entire ML platform. He is one of the founders of Builders Fund, where he and his colleagues invest in and advise ML/AI companies across the world. He is also the head of infrastructure at Skiff.

📌

Our conversation with Arpeet covers the following topics:
- ML Use Cases in Salesforce
- Salesforce ML Team Structure
- Overview of Salesforce ML Infrastructure
- Prototyping ML models at Salesforce
- Managing Costs for Large-Scale ML Projects in the Cloud
- Automated Flow for Moving Models
- Building a Multi-Tenant Real-Time Prediction Service
- Optimization of models for enterprise AI
- Security and Reliability Measures in Salesforce AI Platform
- ML Infrastructure Platform vs Software Deployment Platform

Watch the full episode below:

Why is ML Important to Salesforce

Personalized customer experiences → ML lets Salesforce analyze customer data and generate insights that improve customer interactions.
Automation of marketing campaigns → ML helps Salesforce customers automate their marketing campaigns by analyzing images, text, and social media data, so they can focus on their customer personas and optimize their marketing strategy.
Chatbots for efficient customer support → ML-powered chatbots help businesses automate customer support, which reduces wait times and lowers costs.
Identifying and mitigating security risks → ML helps Salesforce identify and mitigate potential security risks by analyzing data and detecting anomalies.
Continuously improving products and services → ML lets Salesforce continuously improve its products and services by analyzing customer feedback and using it to develop new features and improvements.

Salesforce ML Team Structure

At Salesforce, the ML team was divided into three teams:

Research team: The research team consisted of hundreds of researchers who focused on novel research problems and published research papers.
Applied science team: The applied science team was responsible for product-focused data science use cases.
Engineering team: The engineering team was responsible for building the ML platform infrastructure that supported the research and applied science teams.

We found this interesting blog on how ML is used by Salesforce:

How Machine Learning is Building Better Salesforce Systems

Rather than simply automating activities such as data processing, machine learning attempts to reproduce how humans think.

AppExchange and the Salesforce Ecosystem, Steve Pogrebivsky

Overview of Salesforce ML Infrastructure

Salesforce ML infrastructure was built on top of a tech stack that was chosen to provide a scalable and reliable platform. Here are some of the most relevant and unique pointers about the infrastructure:

The infrastructure ran on AWS, and Kubernetes was used to manage all of the compute. Kubernetes made it easy to deploy any type of machine learning framework, whether TensorFlow or PyTorch.
There was a separation of clusters between the research team, applied science team, and engineering team. This allowed for better management of compute capacity and resources.
The platform was built around an orchestrator for training, a real-time prediction service, batch prediction service, and a front-end API for managing user operations such as authentication and authorization.
The infrastructure consisted of a structured SQL database and an unstructured file store like S3, which were used for managing data. The platform was responsible for managing the data between the two.
There was a mix of GPU clusters, depending on the use case. This allowed for efficient use of resources and better performance.

Figure: Machine Learning Platform Architecture at Salesforce (source)

📌

The main reasons why it is important to separate clusters in machine learning infrastructure are:

1. Security: Separating clusters reduces the risk of data breaches and unauthorized access to sensitive data. Each team can work in their own environment with the necessary security measures.
2. Data Compliance: Different teams may have different data compliance requirements, which can be met by separating clusters. This ensures that each team is working with data that meets the necessary regulatory requirements.
3. Resource Management: Separating clusters allows teams to have the resources they need to complete their tasks without interfering with the resources of other teams. This ensures efficient use of resources and prevents resource contention.

Prototyping at Salesforce: An Opinionated Approach

At Salesforce, the prototyping framework was built around Jupyter Notebooks, which let data scientists run short-term experiments interactively and in real time. Experiments could then be transitioned to long-running jobs on a large-scale cluster, which produced real-time metrics as they ran.

The training and experimentation SDK was built to abstract the complexity of scheduling jobs, pulling and pushing data, and system dependencies. Data scientists could call a Python API or function to take care of these tasks, and track experiment progress, metrics, logs, and more in the workbench dashboard.

The framework was opinionated, providing an abstracted solution, but still allowing for some flexibility in how data scientists chose to use the platform. However, it was not a completely freeform-style experiment, and there were internal guidelines and standards to follow.
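As a rough illustration of what such an abstraction might look like, the sketch below shows a tiny client that accepts a training function, runs it, and records status and metrics for a dashboard. The names `ExperimentClient`, `JobRun`, `submit`, and `log_metric` are hypothetical and are not taken from Salesforce's SDK; a real platform would schedule the job on a cluster rather than run it in-process.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
import uuid


@dataclass
class JobRun:
    """Record of one submitted job, as a workbench dashboard might display it."""
    job_id: str
    name: str
    status: str = "PENDING"
    metrics: List[Dict[str, float]] = field(default_factory=list)

    def log_metric(self, step: int, **values: float) -> None:
        # Metrics are appended while the job runs so they can be shown live.
        self.metrics.append({"step": step, **values})


class ExperimentClient:
    """Hypothetical SDK facade that hides scheduling and bookkeeping."""

    def __init__(self) -> None:
        self.runs: Dict[str, JobRun] = {}

    def submit(self, name: str, train_fn: Callable[[JobRun], None]) -> JobRun:
        run = JobRun(job_id=uuid.uuid4().hex[:8], name=name)
        self.runs[run.job_id] = run
        run.status = "RUNNING"
        try:
            # Assumption: executed locally here; a platform would dispatch to a cluster.
            train_fn(run)
            run.status = "SUCCEEDED"
        except Exception:
            run.status = "FAILED"
        return run


def toy_training_loop(run: JobRun) -> None:
    """Stand-in for real training: halves a fake loss on each step."""
    loss = 1.0
    for step in range(3):
        loss *= 0.5
        run.log_metric(step, loss=loss)


client = ExperimentClient()
run = client.submit("sentiment-baseline", toy_training_loop)
print(run.status, run.metrics[-1])
```

The data scientist only writes `toy_training_loop`; everything about where and how it runs stays behind `submit`, which is the kind of opinionated boundary the section describes.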

📌

Challenges of Hosting Jupyter Notebooks at Scale with Sensitive Data:
When hosting Jupyter Notebooks at scale with sensitive data, the major challenges involve authentication and approval workflows. Data scientists must obtain approval from a designated person or manager to access the data. The notebook environment is ephemeral and is destroyed after experiments are completed, but all generated artifacts are persisted. Authentication is API-driven and integrated with internal systems.

How to Manage Costs for Large-Scale Machine Learning Projects in the Cloud

Large-scale machine learning projects can quickly become costly, especially when utilizing GPU resources in the cloud. In order to manage costs during the prototyping phase, there are a few strategies that can be employed.

Reserved Capacity: If you know how much capacity you need, you can reserve it in advance and get a discount on pricing. This works well if you have a good idea of what your resource requirements will be in the long run.
Auto-Scaling: If you're not sure how much capacity you'll need or if your resource requirements fluctuate, auto-scaling can help. By automatically scaling resources up or down based on demand, you can avoid paying for unused capacity.

While there are other strategies for reducing costs, such as utilizing spot instances, these often require a lot of engineering effort and may not be practical for long-running jobs. Additionally, spot instances may not always be available in regions with GPU resources.

By utilizing reserved capacity and auto-scaling, you can effectively manage costs while still having the resources you need for your machine learning projects. These strategies continue to be relevant today and can be applied to any public cloud provider.
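To make the reserved-versus-on-demand trade-off concrete, here is a small worked calculation. The hourly rate and discount are made-up numbers for illustration, not real cloud prices; the point is only that reserved capacity is billed for every hour, so it pays off once utilization exceeds `1 - discount`.

```python
HOURS_PER_MONTH = 730.0  # approximate average number of hours in a month


def on_demand_cost(hourly_rate: float, hours_used: float) -> float:
    """On-demand (or auto-scaled) capacity is billed only for hours actually used."""
    return hourly_rate * hours_used


def reserved_cost(hourly_rate: float, discount: float,
                  hours: float = HOURS_PER_MONTH) -> float:
    """Reserved capacity is billed for every hour in the period, at a discount."""
    return hourly_rate * (1.0 - discount) * hours


def break_even_hours(discount: float, hours: float = HOURS_PER_MONTH) -> float:
    """Usage above this many hours per month makes the reservation cheaper."""
    return (1.0 - discount) * hours


RATE = 4.0       # hypothetical GPU instance price per hour
DISCOUNT = 0.25  # hypothetical reservation discount

for hours_used in (300.0, 600.0):
    od = on_demand_cost(RATE, hours_used)
    rs = reserved_cost(RATE, DISCOUNT)
    cheaper = "reserved" if rs < od else "on-demand"
    print(f"{hours_used:.0f}h used: on-demand={od:.2f}, reserved={rs:.2f}, cheaper={cheaper}")

print(f"break-even at {break_even_hours(DISCOUNT):.1f} hours per month")
```

With these assumed numbers, a workload that runs 300 hours a month is cheaper on demand, while one that runs 600 hours a month is cheaper reserved.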

Automated Flow for Moving Models

Salesforce's promotion flow for moving models from one environment to another relied on the notion of golden datasets for every domain. The data scientists could evaluate the model's performance on these datasets and also on randomized datasets to assess the model's capability to perform well on different types of data. This helped them decide whether to promote a model into higher environments or not.

The promotion process was done through the workbench, but it was intentionally kept slightly manual to ensure that the model performed beyond a certain threshold on n+1 types of datasets. This was challenging because Salesforce is a multi-tenant system in which every customer has a different dataset, and the number of datasets can run into the hundreds of thousands. Salesforce built hundreds of thousands of models, each specific to a customer and dataset, and automated the process as much as possible.

Overall, the promotion flow at Salesforce was designed to ensure that models were thoroughly evaluated and performed well on diverse datasets before being promoted to higher environments.
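The threshold check at the heart of such a promotion gate can be sketched in a few lines. The dataset names, scores, and the 0.85 threshold below are invented for illustration; the source does not say which metrics or thresholds Salesforce used.

```python
from typing import Dict, List


def failing_datasets(scores: Dict[str, float], threshold: float) -> List[str]:
    """Names of evaluation datasets on which the model falls below the threshold."""
    return sorted(name for name, score in scores.items() if score < threshold)


def should_promote(scores: Dict[str, float], threshold: float) -> bool:
    """Promote only if there is at least one score and every score clears the bar."""
    return bool(scores) and not failing_datasets(scores, threshold)


# Hypothetical evaluation results: golden datasets plus a randomized sample.
candidate_scores = {
    "golden/support-tickets": 0.91,
    "golden/sales-emails": 0.88,
    "random/sample-01": 0.79,
}
THRESHOLD = 0.85  # assumed minimum acceptable score

print(should_promote(candidate_scores, THRESHOLD))    # False
print(failing_datasets(candidate_scores, THRESHOLD))  # ['random/sample-01']
```

Reporting which datasets failed, rather than only a yes or no, fits the deliberately manual review step described above, since a reviewer can see why a model was held back.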

Building a Multi-Tenant Real-Time Prediction Service for Complex Models

Building a multi-tenant real-time prediction service is a complex task: it involves serving a large number of models of different sizes and architectures in real time while meeting specific SLA requirements. To address this challenge, the engineering team at Salesforce developed a serving layer that went through several iterations.

Initially, the team relied on a structured database for metadata and a file store for model artifacts. However, this approach was not scalable for larger and more complex models. To solve this, they sharded their clusters based on the complexity of the model and the type of compute required. For instance, smaller models ran on CPUs, while larger models needed GPUs. Clusters were dedicated to specific types of models, such as NLP models, LSTM models, transformer models, image classification models, object detection models, and OCR models.
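A minimal sketch of this kind of sharding decision is shown below: each model is routed to a cluster named after its model family and the compute it needs. The family names, the size cutoff, and the cluster naming scheme are assumptions made for the example, not details from the Salesforce platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    family: str   # e.g. "nlp", "lstm", "transformer", "ocr"
    size_mb: int


# Assumption: families that always need GPUs, and a size cutoff for CPU serving.
GPU_FAMILIES = frozenset({"transformer", "object_detection", "ocr"})
CPU_SIZE_LIMIT_MB = 500


def pick_cluster(model: ModelSpec) -> str:
    """Route a model to a cluster dedicated to its family and compute type."""
    needs_gpu = model.family in GPU_FAMILIES or model.size_mb > CPU_SIZE_LIMIT_MB
    compute = "gpu" if needs_gpu else "cpu"
    return f"{model.family}-{compute}"


models = [
    ModelSpec("intent-classifier", "nlp", 40),
    ModelSpec("doc-reader", "ocr", 300),
    ModelSpec("large-tagger", "lstm", 900),
]
for m in models:
    print(m.name, "->", pick_cluster(m))
```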

The team also developed a layer that orchestrated deploying services on different clusters and node groups. They implemented caching to ensure frequently requested models had lower latencies. Initially, data and research scientists were allowed to use their preferred framework, which made it challenging to uniformly serve the models. The team narrowed down the frameworks to one or two and optimized the models for these frameworks.

Finally, the team converted the models into a uniform format regardless of the original training framework, allowing them to optimize the serving code for each type of model. Overall, the team's efforts resulted in a scalable, efficient, and reliable real-time prediction service.
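The caching idea mentioned above, keeping frequently requested models in memory so that they are served with lower latency, can be illustrated with a small least-recently-used cache keyed by tenant and model. This is a generic LRU sketch, not the patented Salesforce design; `ModelCache` and `fake_loader` are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, Tuple


class ModelCache:
    """Keeps the most recently used models in memory, evicting the oldest."""

    def __init__(self, capacity: int, loader: Callable[[str, str], object]) -> None:
        self._capacity = capacity
        self._loader = loader
        self._entries: "OrderedDict[Tuple[str, str], object]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, tenant_id: str, model_id: str) -> object:
        key = (tenant_id, model_id)
        if key in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        # Cache miss: load from slower storage, then evict the oldest if over capacity.
        self.misses += 1
        model = self._loader(tenant_id, model_id)
        self._entries[key] = model
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)
        return model


def fake_loader(tenant_id: str, model_id: str) -> str:
    """Stand-in for fetching a model artifact from a file store."""
    return f"model:{tenant_id}/{model_id}"


cache = ModelCache(capacity=2, loader=fake_loader)
cache.get("acme", "a")
cache.get("acme", "b")
cache.get("acme", "a")    # hit, "a" becomes most recently used
cache.get("globex", "c")  # miss, evicts ("acme", "b")
cache.get("acme", "b")    # miss again, because it was evicted
print(cache.hits, cache.misses)
```

Keying the cache by tenant as well as model keeps one customer's model from ever being returned for another customer's request, which matters in a multi-tenant service.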

The real-time inference was my favorite thing to work on. And I think, by the end of it, we also were able to file a patent on it. So, it was a great engineering feature that we added to the platform. It was the most used feature, actually. We were doing double-digit, millions of predictions per day and so it was very, very satisfying to see that getting used by so many customers.
- Arpeet

We found this interesting blog on ML Lake and the architecture of Salesforce's data platform:

ML Lake: Building Salesforce’s Data Platform for Machine Learning - Salesforce Engineering Blog

Explore some unique challenges Salesforce has in the realm of data management and learn how ML Lake addresses them.

Salesforce Engineering Blog, Eli Levine, Laura Lindeman

Optimization of models for enterprise AI

The team heavily benchmarked models and aimed to stay within the bounds of widely supported operators and operations in each framework, so that models could be converted easily. Custom operators made conversion high-friction and required a high-touch approach, but the team found that 95% of use cases were easily solved by off-the-shelf models that did not require novel techniques. This allowed them to optimize for the majority of use cases and spend their remaining time on the 5% of models that were not as widely used.
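One simple way to enforce a "widely supported operators only" policy is to compare the operators a model uses against an allowlist and send anything else to manual review. The sketch below does that with plain sets; the operator names and the allowlist are illustrative assumptions, not the list the team actually used.

```python
from typing import FrozenSet, Iterable, List

# Assumption: an illustrative allowlist of operators the serving stack handles well.
SUPPORTED_OPS: FrozenSet[str] = frozenset(
    {"Conv", "Relu", "MatMul", "Add", "Softmax", "LSTM"}
)


def unsupported_ops(model_ops: Iterable[str],
                    supported: FrozenSet[str] = SUPPORTED_OPS) -> List[str]:
    """Operators used by the model that are not on the allowlist."""
    return sorted(set(model_ops) - supported)


def conversion_path(model_ops: Iterable[str]) -> str:
    """Models with only supported operators convert automatically; others need review."""
    return "automated" if not unsupported_ops(model_ops) else "manual-review"


standard_model = ["Conv", "Relu", "Conv", "Relu", "MatMul", "Softmax"]
custom_model = ["MatMul", "Add", "MyCustomAttention"]  # hypothetical custom operator

print(conversion_path(standard_model))  # automated
print(conversion_path(custom_model))    # manual-review
print(unsupported_ops(custom_model))    # ['MyCustomAttention']
```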

Arpeet also noted that tools such as ONNX and NVIDIA's Triton Inference Server have made significant strides in standardizing model formats and benchmarking, making them valuable for large real-time inference use cases.

Security and Reliability Measures in Salesforce AI Platform

Approval workflows: Before a model was deployed to production, there were approval workflows around the dataset to ensure data privacy.
Security isolation: The production environment was completely isolated and held compliance certifications such as HIPAA and ISO.
Multi-tenancy: Authentication covered all tenants to ensure that each customer could access only their own data.
Redundancy and high availability: Sufficient redundancy and high-availability measures were built into the platform to ensure reliability.
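The multi-tenancy rule, that each customer can reach only its own data, can be sketched as a store that checks the caller's tenant on every read. The class and field names here (`Principal`, `TenantScopedStore`, `TenantAccessError`) are hypothetical, and a production system would enforce this at the authentication and API gateway layers rather than in an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


class TenantAccessError(PermissionError):
    """Raised when a caller tries to read another tenant's data."""


@dataclass(frozen=True)
class Principal:
    user_id: str
    tenant_id: str  # assumed to be set by the authentication layer


class TenantScopedStore:
    """Stores records per tenant and rejects cross-tenant reads."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], str] = {}

    def put(self, principal: Principal, key: str, value: str) -> None:
        # Writes always land in the caller's own tenant namespace.
        self._records[(principal.tenant_id, key)] = value

    def get(self, principal: Principal, tenant_id: str, key: str) -> str:
        if principal.tenant_id != tenant_id:
            raise TenantAccessError(
                f"user {principal.user_id} may not read tenant {tenant_id}"
            )
        return self._records[(tenant_id, key)]


store = TenantScopedStore()
alice = Principal(user_id="alice", tenant_id="acme")
bob = Principal(user_id="bob", tenant_id="globex")
store.put(alice, "model-v1", "acme-artifact")

print(store.get(alice, "acme", "model-v1"))
try:
    store.get(bob, "acme", "model-v1")
except TenantAccessError as err:
    print("denied:", err)
```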

ML Infrastructure Platform vs Software Deployment Platform

ML infrastructure platforms and software deployment platforms are similar in many respects, according to the discussion between Anuraag and Arpeet. Here are the key takeaways:

Similarities: Both require a data layer, a compute layer, an orchestration layer, an authentication service and API gateway, and a backend service. In a standard infrastructure the backend service may run analytics or data engineering workloads, while in an ML infrastructure it runs ML workloads.
Differences: An ML infrastructure may include a data lake or an ML lake for ML workloads, while a standard infrastructure may not need one. An ML infrastructure may also use specialized orchestration systems.
Tools: The tools used differ between the two kinds of platform. An ML infrastructure may require specialized orchestration tools, while a standard infrastructure may use simpler ones. However, the deployment tool is the same for both in most cases, especially when deploying to a Kubernetes cluster.

Overall, there is little difference between ML infrastructure platforms and software deployment platforms, apart from the nature of the workload and the tools required for orchestration.

Additional Thoughts from Arpeet

MLOps: Build vs Buy

When building ML infrastructure for mid-sized companies and startups, use available alternatives that provide many features out of the box.
For early-stage startups, use simple Python scripts, train on a single machine with multiple GPUs, and apply some form of orchestration to save costs.
For established ML workflows at small and medium-sized companies, use open-source tools directly, but consider a ready-to-use solution such as TrueFoundry.

Advice for ML Engineers

I think focusing on a niche at this point, whether that is operationalizing AI workflows at large scale, is probably going to be one of the next hard challenges.
- Arpeet

Read our previous blogs in the TrueML series

True ML Talks #3 - Machine Learning Platform @ Facebook

In this blog, we dive deep into Facebook’s ML Platform FBLearner Flow. Understand how it solved both Software and ML deployment and understand its architecture.

TrueFoundry Blog, TrueFoundry

Keep watching the TrueML YouTube series and reading the TrueML blog series.

TrueFoundry is an ML deployment PaaS built on Kubernetes that speeds up developer workflows, gives developers full flexibility in testing and deploying models, and ensures full security and control for the infrastructure team. Through our platform, we enable ML teams to deploy and monitor models in 15 minutes with 100% reliability, scalability, and the ability to roll back in seconds, allowing them to save costs and release models to production faster, and so realize real business value.
