TrueML Talks #26: Enterprise GenAI & LLMOps

Built for Speed: ~10ms Latency, Even Under Load

Blazingly fast way to build, track and deploy your models!

Handles 350+ RPS on just 1 vCPU — no tuning needed
Production-ready with full enterprise support

Get Started with Truefoundry Now Talk to the Expert

We are back with another episode of True ML Talks. In this, we again dive deep into MLOps pipelines and LLMs Applications in enterprises as we are speaking with Labhesh Patel.

Labhesh was a CTO and Chief Scientist at Jumio Corporation, where he worked in leveraging ML / AI in identity verification space. He has held multiple leadership positions, both in engineering and science roles in the past, with leading organizations.

📌

Our conversations with Labhesh will cover below aspects:
- Interesting Research Papers and Patents
- Utilizing AI to solve business problems
- Building the MLOps Pipeline
- Breaking Down Silos: Building Cohesive MLOps Teams for Success
- Navigating Cloud Provider Roadblocks
- Future of Generative AI

Watch the full episode below:

Interesting Research Papers and Patents

Research Papers

Attention is All You Need: This paper introduced the transformer network, which revolutionized natural language processing and laid the foundation for many LLMs like ChatGPT.

‍

Attention Is All You Need

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.arXiv.orgAshish Vaswani

arXiv.org Ashish Vaswani

‍

Visual Question Answering with Segmented Guided Attention Networks: This paper proposed a novel method for answering questions about images by utilizing segmentation maps and attention mechanisms. While superseded by newer techniques, it highlights the importance of focusing on specific areas of an image for accurate answers.

‍

Segmentation Guided Attention Networks for Visual Question Answering

Vasu Sharma, Ankita Bishnu, Labhesh Patel. Proceedings of ACL 2017, Student Research Workshop. 2017.

ACL Anthology

‍

CycleGen: This paper explores the idea of generating text summaries based on user reviews and product characteristics. It predates ChatGPT and demonstrates the potential for LLMs to assist with writing tasks.

‍

Cyclegen: Cyclic consistency based product review generator from attributes

Vasu Sharma, Harsh Sharma, Ankita Bishnu, Labhesh Patel. Proceedings of the 11th International Conference on Natural Language Generation. 2018.

ACL Anthology

‍

Patents

Voice over IP Buffering and Negotiation Protocol: This patent arose from a simple bug fix that improved voice quality in VoIP calls. It highlights the potential for innovation in seemingly mundane solutions and the importance of considering defensive patenting strategies.

Utilizing AI to solve business problems

There are a lot of challenges and opportunities in transforming manual processes with AI. Here are some key takeaways:

Start with the Business, Not the Buzz

Identify the core business problem: Why automate? What are the quantifiable benefits (scalability, cost reduction, speed)?
Manage expectations: AI isn't magic. Communicate what's achievable and set realistic performance metrics.
Understand data's role: 90% of the work lies in data management, collection, and quality assurance. Clean data is vital for accurate models.

Building the Right Path

One step at a time: Focus on a single, high-impact use case to prove the concept and build your pipeline.
Compliance first: Ensure proper data consent and usage before even touching a single byte.
Metrics matter: Track relevant metrics (precision, recall, error rates) to evaluate success and guide further decisions.
Teamwork is key: Assemble a team with expertise in ML engineering, data management, and product development.

Beyond the First Step

Iterate and evolve: Continuously evaluate, improve, and expand your AI solutions based on data and feedback.
Embrace the learning curve: Be prepared to invest in talent and education to build a culture of AI understanding within your organization.

Important things to keep in mind

Beware of the 99% trap: High accuracy on isolated cases can mask larger problems. Pay attention to overall performance and error rates.
Think statistically: Metrics like precision and recall provide a more nuanced picture of AI performance than simple accuracy percentages.

By prioritizing business needs, focusing on data quality, and building a strong team, you can navigate the complexities and unlock the true potential of AI to transform your operations.

Building the MLOps Pipeline

For anyone building complex ML systems, there are some things you can keep in mind.

Embrace Cloud-First, But Remain Agile

Leverage your cloud provider's built-in MLOps tools like AWS SageMaker for fast initial setup.
Avoid vendor management and compliance hurdles by staying within the cloud ecosystem.
Move beyond native offerings when limitations arise, seeking out specialized solutions like open-source platforms or vendors.

Importance of Data Quality

Recognize that cloud providers often neglect data quality, requiring additional internal systems or third-party services.
Prioritize automated data cleaning and validation to ensure model accuracy and performance.

Architectural considerations

Model building vs. production: Consider separate teams for model development and deployment, with distinct skill sets and ownership.
Structure for scalability and agility: Design a flexible architecture that can accommodate new tools and integrations as the pipeline evolves.

Breaking Down Silos: Building Cohesive MLOps Teams for Success

In the fast-paced world of MLOps, collaboration is king. But too often, teams become fragmented, with data scientists building models in isolation and engineers struggling to deploy and maintain them. The result? Slow progress, missed opportunities, and frustrated stakeholders.

So how do we break down these silos and build MLOps teams that thrive?

Bringing everyone together

Imagine a cross-functional team of 8-10 individuals, each with unique expertise: product managers, data engineers, DevOps, security, ML engineers, QA, and even customer support. This diverse group, united by a common goal (e.g., reducing fraud), becomes a powerful force for innovation and efficiency.

Here's why this approach works:

Shared ownership: When everyone feels responsible for the entire lifecycle of a model, there's no "over the fence" mentality. Issues get tackled collaboratively, and solutions are optimized for real-world deployment and maintenance.
Informed decisions: Data engineers understand ML needs, and ML engineers appreciate deployment realities. This cross-pollination of knowledge leads to better model selection and feature engineering, avoiding the pitfalls of "research-perfect" models that are impossible to deploy.
Faster iterations: Close collaboration fosters communication and agility. The team can quickly experiment, refine, and iterate on models, maximizing the impact of their efforts.

Tackling skill gaps for building such a team

It is of the utmost importance to do targeted hiring. You need data engineers with a strong understanding of ML pipelines and ML engineers who appreciate software engineering principles. This combination of diverse skills is the secret sauce to a high-performing MLOps team.

Breaking down silos isn't just about structure, it's about culture. Encourage open communication, celebrate diverse perspectives, and create an environment where everyone feels empowered to contribute. By doing so, you'll build a cohesive MLOps team that can turn your ML dreams into reality.

Navigating Cloud Provider Roadblocks

There are a lot of potential roadblocks you can encounter when heavily relying on a Cloud Provider. In such scenarios, it is very important to be able to pivot when such a roadblock arises.

Don’t be afraid to explore alternatives: When cloud providers hit limitations, look for specialized vendors or open-source solutions to fill the gaps.
Proactive communication matters: Don't hesitate to voice your concerns directly to cloud providers. Feedback can lead to improved collaboration and access to exclusive solutions.
Adaptability is key: Be prepared to adjust your approach based on emerging technologies and changing provider offerings.

Here are some common challenges that can arise

Challenge 1: Super-regulated data access

When dealing with sensitive data (PII, healthcare records), strict regulations like GDPR and CCPA come into play. Cloud providers, while compliant with general standards, might not offer specific tools for secure access and audit trails.

The potential solutions to these are:

Alternative vendors: Look for companies specializing in highly regulated environments and offering granular access control and auditability features.
Open-source solutions: Consider open-source tools and customize them to address specific compliance needs.

Challenge 2: Proprietary features & limited access

Sometimes, cloud providers hold back specific features or release them on their schedule, leaving clients waiting for crucial functionalities.

The potential solution to this is to be proactive in communicating with your Point of contact for that cloud provider.

Giving direct feedback to the POC and communicating the blockers you face can sometimes land you and your team early access to private beta programs, ensuring that you don't miss out on future solutions.

Remember, even with roadblocks, a proactive and adaptable mindset can turn challenges into opportunities in the ever-evolving world of cloud-based MLOps.

Future of Generative AI

Generative AI, particularly LLMs (Large Language Models), is all the rage. However, currently, LLMs are in a "hype phase", praised for their magical abilities to handle diverse tasks. Developers resort to throwing API calls at LLMs, leading to issues like rate limiting and high costs.

Challenges for Enterprise Adoption

Cost and scalability: Large models are expensive and computationally demanding, making them unsuitable for widespread enterprise use.
Model safety and bias: Enterprise environments require model safety and control over potential biases, which can be difficult with LLMs.
Inference time: LLMs struggle with latency, causing delays that hinder productivity and user experience.

The Future: Small Language Models to the Rescue?

There might be a shift towards SLMs, trained for specific tasks and domains within enterprises.

This "routered architecture" would direct queries to the appropriate SLM for faster and more efficient responses.

Smaller models also address cost and scalability concerns, making them more accessible to businesses.

Transition Triggers and Considerations

The transition will likely happen gradually, driven by the practical limitations of LLMs and the increasing availability of effective SLMs.

Cost reduction and improved latency will play key roles in accelerating the adoption of SLMs.

Read our previous blogs in the True ML Talks series:

‍

GenAI and LLMOps for GTM (Go-To-Market) @ Twilio‍

Dive deep into Twilio’s GenAI applications like XGPT, and RFP Genie for revolutionizing GTM (Go-To-Market) Strategies. Deep dive into the Backend for these applications.

TrueFoundry Blog TrueFoundry

‍

Keep watching the TrueML YouTube series and reading the TrueML blog series.

TrueFoundry is a ML Deployment PaaS over Kubernetes to speed up developer workflows while allowing them full flexibility in testing and deploying models while ensuring full security and control for the Infra team. Through our platform, we enable Machine learning Teams to deploy and monitor models in 15 minutes with 100% reliability, scalability, and the ability to roll back in seconds - allowing them to save cost and release Models to production faster, enabling real business value realisation.

Discuss About your ML Pipeline Challenges with us here

TrueFoundry AI Gateway delivers ~3–4 ms latency, handles 350+ RPS on 1 vCPU, scales horizontally with ease, and is production-ready, while LiteLLM suffers from high latency, struggles beyond moderate RPS, lacks built-in scaling, and is best for light or prototype workloads.

Built for Speed: ~10ms Latency, Even Under Load

Schedule your Demo Now