Hosted Jupyter Notebooks and VS Code on Kubernetes

Published: July 4, 2026


Jupyter notebooks are a popular tool that provides an interactive computing environment combining code, data visualization, and explanatory text, which makes it easier to work with data and share insights. Data scientists use them throughout the data analysis and machine learning lifecycle: exploratory data analysis (EDA), data preprocessing, visualization, model development, evaluation, and validation. For many of these use cases, installing Jupyter Notebook on your laptop is enough to get started. For many companies and organizations, however, that is not an option, and they need hosted Jupyter notebooks.

Why do we need hosted Jupyter notebooks?

Access to resources: Many data science workloads need heavy compute that the local machines of data scientists (DS) and ML engineers (MLEs) cannot provide. Hosted Jupyter notebooks give DS/MLEs access to powerful machines and GPUs.
Data Access and Security: In many enterprises, DS/MLEs work on projects involving sensitive data that cannot be downloaded to an employee's laptop. Hosted Jupyter notebooks let companies give DS/MLEs access to that data so they can work on sensitive projects.
Reproducibility and Collaboration: Hosted Jupyter notebooks are an excellent way to share executable code and results within a team. Because everyone works in the same development environment, results are much easier to reproduce and team members can collaborate on projects.

How can a company provide hosted Notebooks to its data scientists?

Here are the options that a company can have today to provide access to Jupyter Notebook to its engineers:

Provision a VM for every Data scientist:

DS/MLEs can set up the environment and run a Jupyter server on a VM, which can then be used to run their workloads. Here is a simple guide on how you can run JupyterLab on an EC2 instance.

👍 Pros:
- Gives the DS full control of the machine
- The whole environment is persistent. The VM can be stopped and restarted in the same state.

👎 Cons:
- High cloud computing cost: there is no auto-stop feature, so a DS can start a VM and leave it idle for long stretches, which drives up costs.
- Difficult to manage and track a large number of VMs centrally.
- The DS has to do a lot of setup before the workbench is ready for experimentation.
- Poor reproducibility: the DS may have installed packages that are no longer tracked, and it takes a lot of time to productionize code that only runs on that VM.

Use a Managed Solution:

Another option is a managed solution such as AWS SageMaker, Vertex AI Notebooks, or Azure ML Notebooks. Each has its own advantages; here are the general pros and cons.

Comparison of Different Notebook Solutions

Let us discuss what each of these fields means:

Persistent Environments: Whether the Python environment persists, that is, whether pip package installs and user-created environments are saved across notebook restarts.
Persistent Root Installations: When root access is provided to the user, whether root package installations (e.g. apt install pkgs) persist across notebook restarts.
Resource Usage Monitor: Whether the user can monitor CPU and memory usage from within the notebook screen itself.
Code Server Support: Whether the notebook instance can be connected to a hosted VS Code server.
SSH into the container: The ability to SSH into a running notebook from the local machine.
Auto-Stop: Stops the notebook after an "Inactivity Period". Vertex provides this feature in its "Managed Notebooks" version. SageMaker can also stop notebooks after a period of inactivity, but the user has to configure an "Init Script" for it rather than pick a simple deployment option, which adds friction.
Cross-Cloud Support: This feature is exclusive to Truefoundry, as you can run the notebook on any of the three cloud providers with the exact same user experience.
Customization of Image: Starting the notebook with a particular set of packages pre-installed.
Mount shared Volume in Notebook: Notebooks provided by each of the cloud providers (AWS/Azure/Vertex) allow creating a separate volume for each notebook but don't provide a way to mount a shared volume. For example, suppose an enterprise has a 500 GB dataset that multiple data scientists use for different use cases. Every DS then has to duplicate this data and pay for a separate volume, even when the notebooks are not running. This can waste hundreds or even thousands of dollars in cloud costs!
With Truefoundry you can mount an NFS-type volume in read-only mode to multiple notebooks and thereby avoid the data-duplication costs!
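To make the duplication cost concrete, here is a rough back-of-the-envelope sketch. The team size and the per-GB monthly price are illustrative assumptions, not quotes from any cloud provider; only the 500 GB dataset size comes from the example above.

```python
def monthly_storage_cost(dataset_gb: float, copies: int, price_per_gb_month: float) -> float:
    """Monthly cost of storing `copies` full copies of a dataset."""
    return dataset_gb * copies * price_per_gb_month


DATASET_GB = 500           # dataset size from the example above
TEAM_SIZE = 10             # assumed number of data scientists
PRICE_PER_GB_MONTH = 0.10  # hypothetical block-storage price in $/GB-month

duplicated = monthly_storage_cost(DATASET_GB, TEAM_SIZE, PRICE_PER_GB_MONTH)
shared = monthly_storage_cost(DATASET_GB, 1, PRICE_PER_GB_MONTH)

print(f"one copy per notebook: ${duplicated:.2f}/month")
print(f"one shared volume:     ${shared:.2f}/month")
```

Under these assumed numbers the duplicated volumes cost ten times as much as a single shared one. A shared NFS-type volume may be priced differently from per-notebook disks, so treat the ratio as indicative only.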

Host Notebooks over Kubernetes:

Another option is hosting notebooks over Kubernetes, but it comes with its own set of challenges: data scientists cannot interact with Kubernetes directly and need software in between that provides a simple interface for launching Jupyter notebooks. Let us see what options are available here:

Kubeflow Notebook Operator:
Kubeflow makes deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable. It has a notebook feature that helps to manage and run notebooks easily.
While Kubeflow is a large open-source project that provides a lot of features for machine learning use cases, it is very difficult to install and manage by yourself.

👍 Pros:
- Easy to launch and manage notebooks for DS
- Persistent home directory backed by a disk
- Pre-defined images for sklearn, pytorch, and tensorflow that come with all dependencies installed.
- Open-source code base
- A culling feature that stops notebooks after a period of inactivity.

👎 Cons:
- Difficult to set up Kubeflow on Kubernetes. It takes a lot of time to install and maintain Kubeflow.
- To provide notebooks in multiple regions, separate Kubernetes clusters have to be created and Kubeflow installed on every one of them, leading to high infrastructure and maintenance costs.
- Python packages are not persistent by default, which means you need to reinstall packages each time you restart
- No direct way to gain root access to the container [can be useful for multiple use cases]
- The auto-stop timeout cannot be configured per notebook; it is a global setting.

Host JupyterHub on Kubernetes:
JupyterHub is a great setup for multi-user use cases and helps make optimal use of resources. It can be deployed on Kubernetes with an open-source project called Zero to JupyterHub with Kubernetes.

👍 Pros:
- Multiple users can work together easily with authentication support
- Easy to set up auto-stop for notebooks
- Easy management of environments

👎 Cons:
- Difficult to set up and manage. We must configure Networking, Persistent Volumes, Scaling, and Load Balancing for JupyterHub to work correctly.
- Difficult to run GPU workloads on different types of GPUs on JupyterHub. For instance, read this.
- Environments are not persistent

While a lot of solutions are available today, each comes with its own set of limitations. At Truefoundry we have tried to bridge this gap and build a notebook solution that satisfies the needs of a DS while keeping costs in check. In the next section, we describe our approach to building the notebook solution and the challenges we faced along the way.

Truefoundry's approach towards building a Notebook solution

Truefoundry is a developer platform for ML teams that helps in deploying Models, Services, Jobs, and now Notebooks on Kubernetes. You can read more about what we do here. Our motivation for building a notebook solution was simply to enable experimentation and development on our platform. After studying the available solutions, we decided to address the pain points and missing features in the other platforms so that data scientists can have the best experience without incurring high costs. A few things we wanted to enable are:

Persistent environments
The ability for the user to install root dependencies dynamically and not be limited to a fixed set of dependencies.
The ability to run on spot instances in certain cases, since their prices can be much lower. The experience can be bad in some cases because the VM can be reclaimed, but we found this to be very rare in practice, and the cost benefit justifies these occasional hiccups.
The ability to configure auto-stop to save costs.

Building on Kubeflow Notebooks

Kubeflow supports running notebooks on Kubernetes and provides a number of notebook features out of the box. However, we wanted to address the Kubeflow notebook issues highlighted above and provide a seamless experience for data scientists and developers.

So we had to make changes in the notebook controller, integrate it with the Truefoundry backend, and show the notebooks in our UI.

We installed the notebook controller but ran into a few issues, which forced us to make changes in kubeflow-notebook-controller:

Culling (the auto-stop feature for notebooks) does not work with custom endpoints. The Kubeflow notebook controller relied on specific endpoint formats for culling to work, which limits the user's ability to choose an endpoint of their own.
Same culling timeout across the cluster. This means you can only set the "inactivity timeout" for notebooks at the cluster level. You cannot set different culling timeouts for different notebooks.
The default Python environment is not persistent.

We solved the first two of these problems and released tfy-notebook-controller,
published as a Helm chart in Truefoundry's public charts repository. You can find the chart here.
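The idea of a per-notebook culling timeout can be sketched as follows. This is a minimal illustration and not the actual tfy-notebook-controller logic, which this post does not show; the function name, the fallback default, and the way last activity is passed in are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical cluster-wide fallback, used only when a notebook sets no timeout.
DEFAULT_IDLE_TIMEOUT = timedelta(minutes=30)


def should_stop(
    last_activity: datetime,
    now: datetime,
    idle_timeout: Optional[timedelta] = None,
) -> bool:
    """Return True when the notebook has been idle for at least its timeout."""
    timeout = idle_timeout if idle_timeout is not None else DEFAULT_IDLE_TIMEOUT
    return now - last_activity >= timeout


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = now - timedelta(minutes=45)

# Idle for 45 minutes: stopped under the assumed 30-minute default,
# kept alive when this notebook asks for a 2-hour timeout.
print(should_stop(last_seen, now))
print(should_stop(last_seen, now, idle_timeout=timedelta(hours=2)))
```

Keeping the timeout as a per-notebook value with a cluster-level fallback lets long-running experiments opt into a longer idle window without loosening the setting for everyone.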

UI to create, start, and stop notebooks:

We built an easy-to-understand UI for data scientists to start notebooks. The user can customize the idle timeout (the period of inactivity after which the notebook is stopped), the persistent volume size (the size of the disk that stores the dataset and code files), and the resources (CPU, memory, and GPU requirements), and then launch the notebook!

With all these changes, we launched v0 of our notebooks.

Notebooks on the Truefoundry UI

But we still had a long way to go to deliver a good user experience. Let's look at the pros and cons of this approach:

👍 Pros:
- Persistent home directory [all files and packages stored there are retained]
- The inactivity timeout (culling timeout) can be configured per notebook
- Launch a notebook in a few clicks
- Easily run a notebook with GPUs

👎 Limitations:
- Non-persistent Python environment (all installed packages disappear when the pod restarts)
- No way to install packages that require root access
- No proper way to manage multiple environments for experiments
- Cannot configure an endpoint for the notebook [added in the next release]

These limitations are critical for the solution because they block many data scientist workflows, which can be as simple as installing apt packages such as ffmpeg.

Solving the critical issues in the user workflow

Up to this point, we were using the pre-built JupyterLab images provided by Kubeflow. But since we needed to solve the non-persistent environment problem and allow root access and apt package installation, we needed our own set of Docker images.
Let's look at how we solved these problems!

Non-persistent environments: The goal was to make the base Python environment persistent so that users can easily use notebooks for their use cases.

- Modified the init script of the Docker image to clone the base Conda environment into the home directory and name it jupyter-base
- Added a .condarc file and set the $HOME directory as the default environment path
- Modified the .bashrc file to activate the jupyter-base environment by default
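The three steps above can be sketched as follows. The real init script is not shown in this post and is presumably a shell script; this Python version only illustrates the logic. The `envs` subdirectory under `$HOME`, the function names, and the exact `conda create` invocation are assumptions.

```python
import tempfile
from pathlib import Path
from typing import List

ENV_NAME = "jupyter-base"  # environment name from the post
ENVS_SUBDIR = "envs"       # hypothetical location under $HOME


def write_condarc(home: Path) -> Path:
    """Make conda create and look up environments under the persistent home."""
    condarc = home / ".condarc"
    condarc.write_text(f"envs_dirs:\n  - {home / ENVS_SUBDIR}\n")
    return condarc


def clone_command(home: Path) -> List[str]:
    """Return the conda clone command, or an empty list if the env already exists.

    Skipping the clone on later starts is what keeps user-installed packages.
    """
    if (home / ENVS_SUBDIR / ENV_NAME).exists():
        return []
    return ["conda", "create", "--name", ENV_NAME, "--clone", "base", "--yes"]


def ensure_activation(home: Path) -> Path:
    """Append the activation line to .bashrc once, so reruns are idempotent."""
    bashrc = home / ".bashrc"
    line = f"conda activate {ENV_NAME}"
    existing = bashrc.read_text() if bashrc.exists() else ""
    if line not in existing.splitlines():
        separator = "" if existing == "" or existing.endswith("\n") else "\n"
        bashrc.write_text(existing + separator + line + "\n")
    return bashrc


# Demo against a throwaway directory standing in for the mounted $HOME.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    write_condarc(home)
    print(clone_command(home))   # first start: a clone is needed
    ensure_activation(home)
    ensure_activation(home)      # second call adds nothing
    print((home / ".bashrc").read_text(), end="")
```

Because the home directory is the only persistent volume, everything that should survive a restart (the environment, `.condarc`, `.bashrc`) lives under it, and each step is written so that running it again on the next pod start changes nothing.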

Installing apt packages
The user has the option to provide a list of apt packages they want pre-installed with the image.
We then add a build stage and create a custom image that comes with all the listed packages pre-installed in the notebook!
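A build stage of this kind could be generated along the following lines. This is a sketch under assumptions, not Truefoundry's actual build code: the function name is hypothetical, the base image name is the one mentioned in this post, and `jovyan` is assumed to be the image's default user because it appears in the SSH example later in the post.

```python
import re
from typing import List

BASE_IMAGE = "truefoundrycloud/jupyter:latest"  # image name from the post
NOTEBOOK_USER = "jovyan"                        # assumed default user of the image

# Conservative pattern for Debian package names, optionally pinned as name=version.
_PACKAGE_RE = re.compile(r"^[a-z0-9][a-z0-9+.-]+(=[A-Za-z0-9.+:~-]+)?$")


def render_dockerfile(apt_packages: List[str], base_image: str = BASE_IMAGE) -> str:
    """Render a Dockerfile that layers the requested apt packages on the base image."""
    for pkg in apt_packages:
        # Reject anything that is not a plain package name, since the list
        # ends up inside a shell command.
        if not _PACKAGE_RE.match(pkg):
            raise ValueError(f"invalid apt package name: {pkg!r}")
    if not apt_packages:
        return f"FROM {base_image}\n"
    packages = " ".join(sorted(set(apt_packages)))
    return "\n".join([
        f"FROM {base_image}",
        "USER root",
        "RUN apt-get update \\",
        f"    && apt-get install -y --no-install-recommends {packages} \\",
        "    && rm -rf /var/lib/apt/lists/*",
        f"USER {NOTEBOOK_USER}",
        "",
    ])


print(render_dockerfile(["ffmpeg", "git"]))
```

Baking the packages into the image, rather than installing them at runtime, is what makes them survive pod restarts, since only the home directory is persisted.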

Providing root access
In many cases, the user needs root access to install some packages and quickly try things out. For this purpose, we created two variants of each image type:
truefoundrycloud/jupyter:latest and truefoundrycloud/jupyter:latest-sudo. The sudo images give the user passwordless sudo access.
This was done by installing the sudo binary and adding the user to the sudoers list, as described in this link.

Note: Since we run notebooks on Kubernetes with the home directory mounted, only the home directory is persistent. Root package installations will not persist across pod restarts. Please read this for a better understanding.

By solving these problems, we addressed most of the issues users faced and delivered a good notebook experience. But over time, we noticed that users ran into a few challenges, which we describe in the next section.

Usability issues in notebooks

Difficulty monitoring resource usage (CPU and memory of the notebook): The kernel often crashes because of out-of-memory issues. We frequently come across scenarios where we run a notebook and the kernel dies for one reason or another, which can be hard to debug.

Modifying the Python environment can sometimes lead to a bad state in which the notebook server fails to restart. This can happen for multiple reasons, such as someone uninstalling the jupyterlab package. Since the environment is persistent, the notebook fails to start (once the current notebook is stopped).
Managing multiple environments is possible, but the user has to add the conda environment to the kernelspec manually and make sure the kernelspec is configured correctly, which can cause issues.

Solving the usability issues

Adding resource usage metrics to the notebook:
We added resource usage metrics to the notebook by installing the jupyterlab-system-monitor==0.8.0 extension, and we configured its settings in the startup script by passing arguments when starting the JupyterLab server.

...
jupyter lab \
  ...
  --ResourceUseDisplay.mem_limit=${mem_limit} \
  --ResourceUseDisplay.cpu_limit=${cpu_limit} \
  --ResourceUseDisplay.track_cpu_percent=True \
  --ResourceUseDisplay.mem_warning_threshold=0.8
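The post does not show how `mem_limit` and `cpu_limit` are computed. One plausible approach is to derive them from the notebook's Kubernetes resource limits, sketched below. The helper names are hypothetical, and the sketch assumes that `mem_limit` is expressed in bytes and `cpu_limit` in cores.

```python
from typing import List


def memory_to_bytes(quantity: str) -> int:
    """Convert a Kubernetes-style memory quantity such as '4Gi' to bytes.

    Only binary suffixes (Ki, Mi, Gi, Ti) and plain byte counts are handled.
    """
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)


def cpu_to_cores(quantity: str) -> float:
    """Convert a Kubernetes-style CPU quantity such as '500m' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def resource_display_args(memory: str, cpu: str, warn_at: float = 0.8) -> List[str]:
    """Build the ResourceUseDisplay arguments shown in the command above."""
    return [
        f"--ResourceUseDisplay.mem_limit={memory_to_bytes(memory)}",
        f"--ResourceUseDisplay.cpu_limit={cpu_to_cores(cpu)}",
        "--ResourceUseDisplay.track_cpu_percent=True",
        f"--ResourceUseDisplay.mem_warning_threshold={warn_at}",
    ]


for arg in resource_display_args("4Gi", "500m"):
    print(arg)
```

Feeding the container's own limits into the display means the usage bar reflects what the notebook is actually allowed to use, rather than the capacity of the whole node.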

This is how it looks in the UI:

Separating the kernel that runs the JupyterLab server from the execution kernel

We need to make sure that, whatever changes the user makes in the home directory, the notebook always restarts without problems. For this, we used the anaconda 'base' environment from the /opt/conda directory to start the JupyterLab server.
In addition, we created a separate environment in the $HOME directory, but this adds a kernel for the base conda environment to the kernel list.

To solve this, we installed nb_conda_kernels to manage Jupyter kernels. We configured the init script to ensure that only the persistent Python environments appear in the kernel list.

jupyter lab \
  ...
  --CondaKernelSpecManager.conda_only=True \
  --CondaKernelSpecManager.name_format={environment} \
  --CondaKernelSpecManager.env_filter="/opt/conda/*"
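To see which environments such a filter hides, here is a small sketch. It assumes that `env_filter` is treated as a regular expression searched against each environment path and that matching environments are excluded; the function name and the example paths are hypothetical.

```python
import re
from typing import List

ENV_FILTER = r"/opt/conda/*"  # value from the command above, read as a regex


def visible_environments(env_paths: List[str], env_filter: str = ENV_FILTER) -> List[str]:
    """Keep only environments whose path does not match the filter."""
    pattern = re.compile(env_filter)
    return [path for path in env_paths if not pattern.search(path)]


envs = [
    "/opt/conda",                      # base environment that runs the server
    "/opt/conda/envs/internal",        # hypothetical env baked into the image
    "/home/jovyan/envs/jupyter-base",  # hypothetical persistent env in $HOME
    "/home/jovyan/envs/myenv",
]

for path in visible_environments(envs):
    print(path)
```

Read as a regex, `/opt/conda/*` means "/opt/conda" followed by zero or more slashes, so it matches the base environment and everything beneath it. Only the environments in the persistent home directory remain in the kernel list.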

With this, we ensure that the notebook server always starts, regardless of any changes the user makes inside the notebook.
It also makes managing multiple kernels easy. Just create a new conda environment with the command conda create -n myenv and it starts showing up in the kernel list.

Adding Code-Server support

While Jupyter notebooks solve a number of problems, there are a number of tasks where they stop being helpful:

Developing a service, such as a simple Gradio or Streamlit app. Users can write and run the code in a Jupyter notebook, but they cannot view the service in the browser.
If the user is working on a project with code organized across multiple files, the JupyterLab environment is not well suited to that kind of development.

Given these limitations, we decided to find a solution. We added code-server support to give users a full IDE experience in the browser.

By adding VS Code support, we enable users to do the following:

Develop and test services directly in your browser using VS Code's port-forwarding proxy. This means a service you run on localhost:8000 can be reached at ${NOTEBOOK_URL}/proxy/8000
Manage code easily with proper packages, since you get a full IDE experience.
Take advantage of all the popular VS Code extensions to boost productivity.
Debug applications easily by running them in debug mode and setting breakpoints.
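The port-forwarding mapping described above is simple enough to write down as a helper. This is only an illustration of the URL pattern given in the post; the function name and the example notebook URL are hypothetical.

```python
def proxied_url(notebook_url: str, port: int) -> str:
    """Map a service listening on localhost:<port> to its proxied URL."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"{notebook_url.rstrip('/')}/proxy/{port}"


# Hypothetical notebook URL; the real value is whatever ${NOTEBOOK_URL} holds.
print(proxied_url("https://test-notebook.example.com/", 8000))
```

For example, a Gradio or Streamlit app started on port 8000 inside the notebook would then be opened in the browser at the printed address instead of at localhost.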

This was done by adding another Docker image. Below is a diagram showing Truefoundry's Docker images.

Truefoundry notebook images

SSH access to your notebook/VS Code:

In most cases, hosted VS Code solves the problem. But there can be cases (especially with Jupyter notebooks) where the user gets stuck and needs direct access to the container running their Jupyter notebook / VS Code server.
So we made this simple by installing an SSH server in every notebook. To connect to your container, you run a simple command and enter your password:

ssh -p 2222 jovyan@test-notebook.ctl.truefoundry.tech

Accessing a notebook container using SSH

You can make this even more powerful with the VS Code extension called Remote Explorer, which lets you open all the files directly inside your own VS Code!
Click here to read more about it

Final user experience:

With all the features built into our notebook solution, here is what our notebook deployment form looks like:

Pricing comparison

Finally, let's compare the pricing of each managed solution with Truefoundry.

Since Truefoundry works by deploying on the customer's cloud through their connected Kubernetes cluster, here is the pricing of Truefoundry when running on the different cloud providers.

With Truefoundry, you can save a lot on costs because:

We do not charge any markup on top of the VM cost
We support running notebooks on spot instances for non-production workloads, which gives an additional 50-60% saving on cloud costs.
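As a rough illustration of what a 50-60% spot saving means for a single notebook, consider the sketch below. The hourly rate and the monthly usage are hypothetical numbers chosen for easy arithmetic; only the discount range comes from the post.

```python
def monthly_cost(hourly_rate: float, hours: float, spot_discount: float = 0.0) -> float:
    """Cost for `hours` of usage, with an optional spot discount in [0, 1)."""
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    return hourly_rate * hours * (1.0 - spot_discount)


HOURLY_RATE = 1.00  # hypothetical on-demand price in $/hour
HOURS = 160         # hypothetical usage: about 8 hours on 20 working days

on_demand = monthly_cost(HOURLY_RATE, HOURS)
spot_low = monthly_cost(HOURLY_RATE, HOURS, spot_discount=0.50)
spot_high = monthly_cost(HOURLY_RATE, HOURS, spot_discount=0.60)

print(f"on-demand:       ${on_demand:.2f}/month")
print(f"spot (50% off):  ${spot_low:.2f}/month")
print(f"spot (60% off):  ${spot_high:.2f}/month")
```

The `HOURS` figure already assumes the notebook is stopped when idle. Without auto-stop, an always-on VM would be billed for every hour of the month, so the two savings compound.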

Conclusion

This was a brief account of our efforts in building a notebook solution. You can join our Friends of Truefoundry Slack channel if you would like to discuss our approach in depth or have any suggestions.

If you would like to try our platform, you can sign up here!
