TrueML Talks #25 - Generative AI and LLMOps for Go-to-Market (GTM) at Twilio

We are back with another episode of True ML Talks. In this one, we again dive deep into MLOps and LLM applications, this time at Twilio, speaking with Pruthvi Shetty.
Pruthvi is a Staff Data Scientist at Twilio, where he leads the company's GenAI efforts, which we deep dive into today. Before Twilio, he led ML at SAP and at ZapLabs, a startup that was acquired by Anywhere RE.
Our conversations with Pruthvi cover the following aspects:
- ML and GenAI applications and use cases around GTM
- XGPT: Twilio's Powerhouse for Go-to-Market Teams
- Battling OpenAI Rate Limits
- Experimenting with Open-Source LLM
- RFP Genie: Automating RFP Responses
- Workflow for Traditional ML Models
Watch the full episode below:
Leveraging AI for Go-to-Market Teams
Twilio has a long history of using machine learning (ML) and data science to optimize its products and services. Recent advances in generative AI (GenAI), however, have opened up new opportunities to improve how go-to-market (GTM) teams operate.
Traditional ML for GTM
While GenAI is a powerful tool, Twilio has not abandoned its traditional ML roots. The company continues to use ML for various GTM tasks, such as:
- Propensity models: Predict the likelihood of a customer converting into a paying user.
- Cross-sell models: Recommend additional products to existing customers based on their usage data.
- Upsell models: Recommend upgrades to higher tiers of service to existing customers based on their usage data.
- Lead generation models: Identify potential new customers who are likely to be interested in Twilio's products.
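The episode doesn't say which algorithms Twilio uses for these models. As a minimal sketch, a propensity model can be framed as binary classification that outputs a conversion probability. The example below uses logistic regression in pure Python. The features, data, and function names are hypothetical and only illustrate the idea.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this customer: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def propensity(w, b, x):
    """Probability that a customer with features x converts."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical features: [normalized monthly usage, has an active trial project (0/1)]
X = [[0.1, 0], [0.2, 0], [0.3, 0], [0.7, 1], [0.8, 1], [0.9, 1]]
y = [0, 0, 0, 1, 1, 1]  # 1 = converted to a paying user
w, b = train_logistic(X, y)
p_low = propensity(w, b, [0.15, 0])
p_high = propensity(w, b, [0.85, 1])
print(f"low-usage: {p_low:.2f}, high-usage: {p_high:.2f}")
```

Cross-sell and upsell models can reuse the same shape. The label becomes "adopted product X" or "upgraded tier" instead of "converted".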
GenAI for GTM
Twilio recognized the potential of GenAI early on and established a dedicated team to explore its applications. This team has built a suite of GenAI-powered tools specifically for GTM teams, including:
- XGPT: This versatile tool helps GTM teams generate personalized outreach content such as emails, saving significant time and effort. It also answers questions, processing about 15,000 per month, which shows it can handle large volumes of interactions.
- FlexGPT and SegGPT: Tailored for specific products, these AI models generate comprehensive and accurate documentation for both Flex and Segment, ensuring users have readily available information.
- RFP Genie: This transformative tool tackles the tedious task of answering RFP questions. By processing inquiries with 90% accuracy, it reduces completion time from weeks to minutes, freeing up valuable resources for GTM teams.
XGPT: Twilio's Powerhouse for Go-to-Market Teams
The dedicated GenAI team, led by Pruthvi, built XGPT as one of its key tools for GTM teams.
XGPT was developed as a response to two issues with using publicly available GenAI models like ChatGPT:
- Security and Privacy: Public tools such as ChatGPT may use the data users submit for training, which raises security and privacy concerns for Twilio's internal information.
- Limited Customization: Public models cannot incorporate Twilio's specific internal knowledge, such as product release information, sales plays, and competitor positioning.
XGPT tackled these issues by:
- Leveraging Twilio's data: Drawing on internal information such as product releases, sales plays, and competitor analysis, XGPT provides insights relevant to specific roles and situations.
- Ensuring data privacy: XGPT uses Twilio's private API access, so data remains secure and is not used for external training.
We've had it for about four to five months now. Currently, we are answering about 15,000 questions a month, and we've seen a super good lift in the power users of our applications. That's been XGPT so far.
- Pruthvi
XGPT's Functionality and Impact
XGPT is a secure and customizable platform that:
- Answers questions: It provides answers to user queries based on a vast knowledge base of Twilio's internal and external documents.
- Generates content: It helps users create personalized outreach content and emails based on customer conversations.
- Improves GTM efficiency: It empowers GTM teams with readily available information about Twilio's products, competitors, and sales strategies, leading to increased productivity and improved customer experience.
Technical Architecture of XGPT
XGPT is not a single model but a suite of products, each tailored to specific GTM roles and needs. These include FlexGPT, focused on Flex, and SegGPT, focused on Segment.
A custom retrieval-augmented generation (RAG) pipeline gathers all relevant public and private information for XGPT. Sources include content management systems, internal documents, call transcripts, Salesforce notes, and product documentation.
FlexGPT and other applications use offline (precomputed) embeddings, created with tools such as Space and Chroma, with custom tweaks for scalability and control. Beyond text, XGPT also handles audio and visual data through multimodal embeddings: Whisper transcribes product demos, and a vision model extracts information from charts and diagrams. These embeddings are then converted to FAISS embeddings, which lets XGPT link its answers back to the relevant sources.
The main LLM processing is handled by the OpenAI API. In specific cases, such as RFPs, Llama is used for interpretation. Parallelization and batching strategies optimize throughput and avoid rate limits. An interpretation layer filters and contextualizes questions before they reach the LLM. XGPT also provides links to the relevant documentation with each answer so users can explore further.
Heroku hosts the applications for stability and performance, and Docker containers make deployment and scaling straightforward. Data is stored securely in Postgres. Airtable tracks questions and feedback to drive continuous improvement of XGPT, and CloudWatch monitors performance metrics.
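The retrieval step with source links can be sketched as follows. This is a minimal illustration, not Twilio's pipeline. A toy term-count vector stands in for real embedding models and a vector store such as Chroma or FAISS. The chunk texts and `docs/...` source paths are hypothetical, and the final LLM call is omitted.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str  # link shown to the user alongside the answer

def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, index: list, k: int = 2) -> list:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question: str, chunks: list) -> str:
    """Number each retrieved chunk so the LLM can cite sources in its answer."""
    context = "\n".join(f"[{i}] {c.text} (source: {c.source})" for i, c in enumerate(chunks, 1))
    return (
        "Answer using only the context below and cite sources by number.\n"
        f"{context}\n\nQuestion: {question}"
    )

docs = [
    Chunk("Flex is a programmable contact center platform.", "docs/flex/overview"),
    Chunk("Segment collects and unifies customer data.", "docs/segment/overview"),
    Chunk("Rate limits apply per organization.", "docs/api/rate-limits"),
]
index = [(c, embed(c.text)) for c in docs]  # computed ahead of time, like offline embeddings
question = "What is Flex contact center?"
top = retrieve(question, index, k=1)
prompt = build_prompt(question, top)
print(top[0].source)
```

Each chunk carries its source alongside its text, so the answer can cite links without a second lookup.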
Future of XGPT and RAG flow
The team is continually improving XGPT and its RAG flow. Their vision for the future includes:
- Enhanced RAG flow: This includes simplifying the process of creating and maintaining embeddings for all Twilio documentation.
- Automated Documentation Gap Detection: XGPT can help identify areas where documentation is lacking and suggest additional content to fill the gaps.
- Hallucination Mitigation: The team is exploring new techniques to further reduce the occurrence of hallucinations in XGPT's responses.
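The episode doesn't describe how documentation-gap detection would work. One simple approach, assumed here for illustration, is to flag user questions whose best match against the existing documentation scores below a threshold. The threshold, the toy term-count similarity, and the sample questions are all hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def find_doc_gaps(questions, doc_texts, threshold=0.2):
    """Return (question, best_score) for questions no document covers well."""
    doc_vecs = [embed(d) for d in doc_texts]
    gaps = []
    for q in questions:
        qv = embed(q)
        best = max((cosine(qv, d) for d in doc_vecs), default=0.0)
        if best < threshold:  # weak retrieval suggests missing documentation
            gaps.append((q, round(best, 3)))
    return gaps

docs = [
    "Flex is a programmable contact center platform.",
    "Segment collects and unifies customer data.",
]
questions = ["What is Flex?", "How does pricing work for WhatsApp templates?"]
gaps = find_doc_gaps(questions, docs)
print(gaps)
```

With a real embedding model, the same idea applies. You would log low-scoring questions over time and group them by topic to decide what to document next.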
Battling OpenAI Rate Limits: Engineering Tricks for a Parallel XGPT
Twilio's XGPT, a powerhouse for go-to-market teams, faced a significant obstacle: OpenAI's rate limits. Because the initial version answered questions iteratively, it quickly hit these limits. Rotating API keys offered a temporary fix, but OpenAI's organization-level rate limit, which applies across all of an organization's keys, proved harder to work around.
To solve this, the team first applied OpenAI's best practices for avoiding rate limits and parallelizing calls. This provided a solid foundation, but further optimization was needed. The engineers then batched API calls strategically to stay under OpenAI's limits, grouping questions carefully so the application's user experience was preserved. To improve efficiency further, they assigned weights to different tasks, so that critical questions received priority while less urgent requests were still processed.
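The batching and weighting approach above can be sketched with `asyncio`. It combines priority-ordered batches, a concurrency cap, and retries with exponential backoff and jitter, the retry pattern OpenAI recommends for rate-limit errors. This is an assumption-laden sketch, not Twilio's code. `RateLimitError`, `fake_answer_batch`, the priority values, and the batch sizes are hypothetical stand-ins for a real LLM client.

```python
import asyncio
import random
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    priority: int                          # hypothetical weight: lower value = more urgent
    question: str = field(compare=False)   # excluded from ordering

class RateLimitError(Exception):
    """Stand-in for the rate-limit error a real LLM client would raise."""

async def call_with_backoff(fn, *args, max_retries=5, base_delay=0.1):
    """Retry fn on RateLimitError with exponential backoff plus random jitter."""
    for attempt in range(max_retries):
        try:
            return await fn(*args)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))

def make_batches(tasks, batch_size):
    """Sort by priority so urgent questions land in the earliest batches."""
    ordered = sorted(tasks)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

async def run_batches(tasks, answer_batch, batch_size=3, max_concurrency=2):
    """Send batches in priority order, with at most max_concurrency requests in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def worker(batch):
        async with sem:
            return await call_with_backoff(answer_batch, [t.question for t in batch])

    results = await asyncio.gather(*(worker(b) for b in make_batches(tasks, batch_size)))
    return [answer for batch_result in results for answer in batch_result]

# Demo: a fake batch endpoint that rejects its first call with a rate-limit error.
calls = {"n": 0}

async def fake_answer_batch(questions):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RateLimitError("429 Too Many Requests")
    await asyncio.sleep(0)
    return [f"answer to: {q}" for q in questions]

tasks = [Task(2, "q-low"), Task(0, "q-urgent"), Task(1, "q-mid"), Task(2, "q-low2")]
answers = asyncio.run(run_batches(tasks, fake_answer_batch, batch_size=2))
print(answers)
```

Batching reduces the number of requests counted against the limit. The semaphore caps how many are in flight at once, and backoff absorbs the occasional limit error that still slips through.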
Experimenting with Open-Source LLM
While both ChatGPT and Llama are powerful language models, Twilio opted for Llama for part of its XGPT application for a few key reasons:
- Cost-Effectiveness: Llama operates at a significantly lower cost than ChatGPT, making it a more economical choice for a task like interpretation, which requires less complex reasoning and nuance.
- Task fit: The first stage of XGPT involves interpreting user questions. This task suits Llama's capabilities well, given its strength in understanding and translating the meaning of text.
- Avoiding vendor lock-in: Twilio wants to avoid relying entirely on a single vendor for its large language model (LLM) needs. Using Llama alongside ChatGPT gives them a fallback option in case of outages or changes to OpenAI's policies.
By choosing Llama for the first interpretation layer, Twilio arrived at a cost-effective solution that meets the task's requirements, while diversifying its LLM usage and demonstrating its commitment to the open-source community.
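The fallback rationale can be sketched as an ordered list of providers tried in turn. This is a minimal illustration under stated assumptions. The provider order, `ProviderError`, and the wrapper functions `llama_interpret` and `openai_interpret` are hypothetical, and real wrappers would call a hosted Llama endpoint and the OpenAI API.

```python
from typing import Callable, List, Tuple

class ProviderError(Exception):
    """Stand-in for an outage or API error from an LLM provider."""

def interpret(question: str, providers: List[Tuple[str, Callable[[str], str]]]) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, output) from the first that succeeds."""
    errors = []
    for name, fn in providers:
        try:
            return name, fn(question)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # record the failure and try the next provider
    raise ProviderError("all providers failed: " + "; ".join(errors))

def llama_interpret(q: str) -> str:
    # Hypothetical wrapper around a Llama endpoint; simulates an outage here.
    raise ProviderError("endpoint unavailable")

def openai_interpret(q: str) -> str:
    # Hypothetical wrapper around the OpenAI API; returns a normalized question.
    return f"[product question] {q.strip()}"

name, out = interpret("  How does Flex routing work? ",
                      [("llama", llama_interpret), ("openai", openai_interpret)])
print(name, out)
```

Because every provider sits behind the same function signature, the order can be changed or a new model added without touching the calling code.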
RFP Genie: Automating RFP Responses
RFP Genie is another GenAI tool built by Twilio's in-house team. It automates responding to requests for proposals (RFPs), a task that can be time-consuming and tedious for go-to-market (GTM) teams. RFP Genie can:
- Extract key information: Automatically pull key information and requirements from RFP documents.
- Generate responses: Produce comprehensive, accurate responses to each question in an RFP, saving GTM teams countless hours.
- Maintain consistency: Keep all responses consistent with Twilio's brand identity and messaging.
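RFP Genie's extraction step isn't described in detail, and it likely relies on an LLM. As a rough pre-processing sketch, assumed purely for illustration, a rule-based pass can pull candidate questions out of plain RFP text before they go to a model. It keeps numbered or bulleted items and any line ending in a question mark. The pattern and the sample RFP are hypothetical, and the heuristic will have false positives.

```python
import re

# Numbered items like "3.1", "3.2)", "4." or bullets "-"/"*", followed by whitespace and text.
ITEM_PATTERN = re.compile(r"^\s*(?:\d+(?:\.\d+)*[.)]?|[-*])\s+(.+)$")

def extract_rfp_questions(rfp_text: str) -> list:
    """Return candidate questions/requirements from plain RFP text."""
    questions = []
    for line in rfp_text.splitlines():
        match = ITEM_PATTERN.match(line)
        if match:
            questions.append(match.group(1).strip())
        elif line.strip().endswith("?"):
            questions.append(line.strip())
    return questions

rfp = "\n".join([
    "Section 3: Security Requirements",
    "3.1 Describe your data encryption at rest.",
    "3.2) Do you support SSO via SAML?",
    "Is there a 99.95% uptime SLA?",
    "General notes follow.",
])
questions = extract_rfp_questions(rfp)
print(questions)
```

Each extracted item could then be sent through the retrieval and generation flow to draft an answer.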
Workflow for Traditional ML Models
In the introduction, we briefly touched on the traditional ML models that Twilio still uses for GTM, such as propensity models and lead generation models.
The workflow for these models relies on a strong combination of tools and technologies:
- Data storage: Customer data is stored in a variety of databases, including Postgres and Airtable, depending on the specific model.
- Model training: SageMaker pipelines are used to train ML models, providing scalability and efficiency.
- Pipeline and notebook management: Abacus provides a user-friendly platform for managing data pipelines and notebooks, simplifying model development.
- Deployment: Buildkite ensures that all regulatory compliance requirements are met before models are deployed to production.
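The source doesn't say what the compliance checks contain. As a hypothetical illustration, a CI step such as a Buildkite command step could run a small script that blocks deployment when required model metadata is missing. A non-zero exit code fails the step. The required field names and the JSON metadata file are assumptions.

```python
import json
import sys

# Hypothetical checklist; the real compliance requirements are not described in the source.
REQUIRED_FIELDS = ("model_name", "version", "owner", "training_data_source", "approved_by")

def missing_fields(metadata: dict) -> list:
    """Return required fields that are absent or empty in the model's metadata."""
    return [f for f in REQUIRED_FIELDS if not metadata.get(f)]

def main(path: str) -> int:
    with open(path) as fh:
        metadata = json.load(fh)
    missing = missing_fields(metadata)
    if missing:
        print(f"Blocking deploy, missing: {', '.join(missing)}", file=sys.stderr)
        return 1  # a non-zero exit code fails the CI step
    print("Compliance checks passed")
    return 0

if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage in a CI step: python check_model.py model_metadata.json
    sys.exit(main(sys.argv[1]))
```

Keeping the check in a plain script makes it easy to run locally before pushing, as well as in the pipeline.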
Read our previous blogs in the True ML Talks series.
Keep watching the TrueML YouTube series and reading the TrueML blog series.
TrueFoundry is an ML deployment PaaS built on top of Kubernetes that speeds up developer workflows while giving developers full flexibility in testing and deploying models, and ensures full security and control for the infrastructure team. Through our platform, we enable ML teams to deploy and monitor models in 15 minutes with 100% reliability, scalability, and the ability to roll back in seconds. This lets them save costs and release models to production faster, delivering real business value.