SSH Server Containers for Development on Kubernetes

By Chirag Jain

Published: July 4, 2026

Last year, we discussed running Hosted Jupyter Notebooks and VS Code (Code Server) on your Kubernetes Clusters. We compared several approaches and existing solutions including renting and managing VMs. We then described our approaches to address usability issues to make the experience nicer and abstract away Kubernetes details.

Since then we have received a lot of feedback from our customers, primarily about the lack of a good development experience for full-fledged apps. While Jupyter Lab is great for interactive notebooks and lightweight editing, pushing it to full IDE capabilities can require a lot of fiddling with Jupyter extensions, and it still might not be a great experience for non-Python codebases. To address these shortcomings we launched Code Server support, which is more or less VS Code in the browser. While it is a zero-setup solution and eases many of the developer experience problems with Jupyter Lab, users reported friction when working with VS Code extensions.

For example, Pylance, the extension that provides excellent Python language support in VS Code, cannot be installed on Code Server because of Microsoft's proprietary license. Instead, users have to rely on a combination of Jedi and Pyright, which are still not on par with Pylance. Another example is GitHub Copilot: while it is possible to install it on Code Server, doing so requires manually fiddling with the extension file and upgrading the Code Server version.

While the Code Server editing experience is not bad, sometimes there is a noticeable lag and jumbled-up text in the terminal which can be annoying. We always knew connecting local VS Code with remote VS Code Server via SSH or Tunnels would be a better experience.

The goal is to allow the users to create a deployment running OpenSSH Server in an Ubuntu-based container image with the same disk persistence as Jupyter Lab and let the users connect to it.

Here is how it looks on the platform (docs):

In this post, we'll walk through how we implemented connecting to containers via SSH without providing direct access to the cluster and without sending traffic outside the VPC.

Istio and Routing

When we deploy applications, we usually configure a domain name to reach those services, but buying a domain for each application is prohibitively expensive. Instead we buy a single domain (e.g. acmecorp.com), configure subdomains (docs.acmecorp.com) and/or path prefixes (acmecorp.com/blog/) and then use a router to match rules and route traffic to different applications.

We use Istio for all our ingress routing. Istio, among many features, offers convenient abstractions to configure the underlying Envoy proxy that is actually handling all the routing.

Let's understand how an HTTP request is routed

When a user tries to fetch https://myapp.acmecorp.com/api/v1:

  1. First, *.acmecorp.com is resolved via DNS to the public IP of the external load balancer. The port is inferred as 443 because of the https scheme.
  2. A TCP connection is established to the load balancer and the HTTP request payload is sent.
  3. The load balancer routes the request payload to the Istio Ingress Pods.
  4. Istio Ingress looks at all VirtualService (and Gateway) configurations, matches the hostname (more importantly the subdomain myapp) and path prefix, and routes to the corresponding Kubernetes Service.
  5. Kubernetes routes the request to one of the Endpoints (Pods) for the Service.

This is called Layer 7 Routing because we use actual fields from HTTP spec to do routing.
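As a rough illustration of the host and path-prefix matching described above, here is a minimal Python sketch. It is not how Istio or Envoy is implemented; the rule table and the service names (myapp-api-svc, myapp-web-svc, docs-svc) are made up for the example, and rules are simply checked in order with the first match winning.

```python
from typing import Optional

# Hypothetical routing rules: (host, path prefix, Kubernetes Service name).
# These names are illustrative and do not come from a real cluster.
RULES = [
    ("myapp.acmecorp.com", "/api/", "myapp-api-svc"),
    ("myapp.acmecorp.com", "/", "myapp-web-svc"),
    ("docs.acmecorp.com", "/", "docs-svc"),
]


def route(host: str, path: str) -> Optional[str]:
    """Return the first service whose host and path prefix match, else None."""
    for rule_host, prefix, service in RULES:
        if host == rule_host and path.startswith(prefix):
            return service
    return None


# Both fields used for matching come from the HTTP request itself.
print(route("myapp.acmecorp.com", "/api/v1"))
```

The point is only that the router needs the Host header and the request path, which exist because the traffic is HTTP.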

Routing in context of SSH

SSH uses a custom protocol that uses TCP for transport. A simple SSH connection looks something like the following

ssh user@somemachine.acmecorp.com -p 22

Here we are trying to connect to somemachine.acmecorp.com on port 22, so somemachine.acmecorp.com:22 has to resolve to a unique IP address and port combination to reach the destination. But recall that in our setup all subdomains point to the same load balancer: abc.acmecorp.com, xyz.acmecorp.com and somemachine.acmecorp.com all resolve to the same IP address, and Istio/Envoy is then supposed to look at the subdomain and decide where to route. In the case of SSH this is not possible. After the IP address is resolved and a TCP connection is established, all Istio sees is the load balancer IP and port number; the SSH protocol has no equivalent of the HTTP Host header, and the actual contents of the packets are encrypted by SSH. So how can we route to multiple different SSH destinations in the cluster?
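To make the problem concrete, the sketch below models what a TCP-level router can key on. The IP address is a made-up documentation-range address and the resolve function is a stand-in for the wildcard DNS record, not a real resolver.

```python
# Hypothetical public IP of the external load balancer (documentation range).
LOAD_BALANCER_IP = "203.0.113.10"


def resolve(hostname: str) -> str:
    """Stand-in for the wildcard record *.acmecorp.com."""
    if hostname.endswith(".acmecorp.com"):
        return LOAD_BALANCER_IP
    raise LookupError(hostname)


def l4_key(hostname: str, port: int) -> tuple:
    """What a TCP-level router sees: destination IP and port, never the hostname."""
    return (resolve(hostname), port)


# Three different SSH targets collapse into one indistinguishable key.
keys = {l4_key(h, 22) for h in ("abc.acmecorp.com", "xyz.acmecorp.com", "somemachine.acmecorp.com")}
print(keys)
```

Once the hostname has been resolved it is gone, so the only remaining knobs are the IP address and the port, which is what the two options below play with.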

Option 1: Use unique ports on the same LoadBalancer

Since we only need to ensure unique combinations of IP address and port, we can simply assign different ports on the load balancer to different SSH containers.

We can then configure a TCP route with a port match in Istio.

Here, all TCP traffic coming to port 22 of the LoadBalancer will reach Service A and all TCP traffic on port 23 will reach Service B.

While this works well, there are a few limitations

  • A maximum of 65,535 SSH containers can be reached behind a single load balancer. This is not a big deal because realistically we don't expect this many SSH containers to be deployed at the same time.
  • The trickier problem is reliably opening and freeing ports on the external load balancer on the fly without ever disrupting other, normal traffic. While certainly possible, any bug or race condition could cause serious downtime for other applications. Not to mention that opening up arbitrary ports is a major security risk for many of our customers.
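To give a feel for the bookkeeping Option 1 would need, here is a small Python sketch of a port allocator. It is purely illustrative and is not something described as built in this post: the class name, the port range and the in-memory state are all assumptions, and a real system would also have to reconcile this state with the cloud load balancer and the Istio Gateway.

```python
import threading


class PortAllocator:
    """Hypothetical in-memory mapping of SSH services to load balancer ports."""

    def __init__(self, low: int, high: int):
        self._free = set(range(low, high + 1))
        self._by_service = {}
        # Without the lock, two concurrent deployments could grab the same port.
        self._lock = threading.Lock()

    def allocate(self, service: str) -> int:
        with self._lock:
            if service in self._by_service:
                return self._by_service[service]  # idempotent for retries
            if not self._free:
                raise RuntimeError("no free ports left on the load balancer")
            port = min(self._free)
            self._free.remove(port)
            self._by_service[service] = port
            return port

    def release(self, service: str) -> None:
        with self._lock:
            port = self._by_service.pop(service, None)
            if port is not None:
                self._free.add(port)


allocator = PortAllocator(2200, 2201)
print(allocator.allocate("svc-a"), allocator.allocate("svc-b"))
```

Even in this toy form, every allocate and release has to be mirrored onto shared infrastructure, which is where the risk of disrupting other traffic comes from.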

Option 2: Use a new LoadBalancer for each SSH container

In this case, we have explicitly pointed abc.acmecorp.com and xyz.acmecorp.com to two different external load balancers instead of using the wildcard *.acmecorp.com. Each now points to a unique IP address and can be routed by its own Istio Gateway (linked one-to-one with an external load balancer). The obvious limitation here is that provisioning a new load balancer per SSH container becomes prohibitively expensive.

Is there any way to take advantage of HTTP-level routing yet still only work with TCP traffic? Enter HTTP CONNECT!

Proxying using HTTP CONNECT

The HTTP CONNECT method allows establishing a "tunnel" between two destinations via a Proxy. Imagine the old days of telephone switchboards - you want to call a number but don't have a direct line to reach it, instead, an operator in between facilitates the connection on your behalf and then gets out of the way to let the two parties communicate.

Telephone switchboard - Wikipedia

We recommend watching the following video for a good explanation: https://www.youtube.com/watch?v=PAJ5kK50qp8

Fortunately, in our case, we already use a proxy capable of using CONNECT - Envoy Proxy. Let's look at how it would work in our use case:

  1. The client opens a connection to acmecorp.com:80 - the external load balancer which routes traffic to Envoy.
  2. The client sends an HTTP CONNECT request

CONNECT svc-a.ns.svc.cluster.local:80 HTTP/1.1
Host: svc-a.ns.svc.cluster.local

which instructs Envoy to establish a TCP connection to svc-a.ns.svc.cluster.local:80 on the client's behalf.

  3. Once the connection is established, a 200 OK is returned to the client.
  4. After this point, Envoy stops caring about the traffic contents and acts as a "tunnel" allowing the traffic to flow between the client and the pod. It can be anything that works on top of TCP including but not limited to SSH.

Note that svc-a.ns.svc.cluster.local:80 is a Kubernetes Service and does not point to any public IP address; it can only be resolved inside the Kubernetes cluster. Since Envoy lives inside the cluster, we can configure it to reach the pods behind the Service.

All that is left is to configure Envoy to do such routing. Unfortunately, Istio does not have high-level abstractions to configure this easily. Instead, we have to apply patches to the Envoy configuration using Envoy Filters.
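The client side of this handshake is small enough to sketch in a few lines of Python. The two helper names below are hypothetical. The request mirrors the one shown above (Host header without the port), and the parser only looks at the status line of the proxy's reply.

```python
def build_connect_request(dest_host: str, dest_port: int) -> bytes:
    """Build the CONNECT request a client sends to the proxy."""
    lines = [
        f"CONNECT {dest_host}:{dest_port} HTTP/1.1",
        f"Host: {dest_host}",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_code(response_head: bytes) -> int:
    """Extract the status code from the first line of the proxy response."""
    status_line = response_head.split(b"\r\n", 1)[0].decode("ascii")
    version, code, *_ = status_line.split(" ", 2)
    if not version.startswith("HTTP/"):
        raise ValueError(f"not an HTTP response: {status_line!r}")
    return int(code)


request = build_connect_request("svc-a.ns.svc.cluster.local", 80)
print(request.decode("ascii"))
```

After a 200 response the same TCP connection carries raw bytes in both directions, so an SSH client can start its own handshake over it.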

Envoy Filters

Understanding Envoy capabilities and Envoy Filters is out of the scope of this blog post but just take it as a convenient way to modify Istio routing rules using small patches. To enable CONNECT based routing we need to

  1. Have a publicly exposed port on the LoadBalancer to accept TCP traffic (e.g. 2222) and configure the corresponding Istio Gateway to accept HTTP traffic. We chose to stick with port 80 because we already use it for normal HTTP traffic and SSH traffic is going to be encrypted anyway.
  2. Configure the publicly exposed port on Gateway to accept CONNECT type requests. We found this is already enabled for requests to Port 80. For any other port, you can apply an Envoy Filter like so:
   E.g. Enable CONNECT on Port 2222:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
spec:
 configPatches:
   - applyTo: NETWORK_FILTER
     match:
       context: GATEWAY
       listener:
         filterChain:
           filter:
             name: envoy.filters.network.http_connection_manager
         portNumber: 2222
     patch:
       operation: MERGE
       value:
         typed_config:
           '@type': >-
             type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
           http2_protocol_options:
             allow_connect: true
           upgrade_configs:
             - upgrade_type: CONNECT
 workloadSelector:
   labels:
     app: tfy-istio-ingress
  3. For each SSH container, configure CONNECT-based routing:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
 name: svc-a-ns-ssh-envoy-filter
 namespace: istio-system
spec:
 configPatches:
   - applyTo: NETWORK_FILTER
     match:
       context: GATEWAY
       listener:
         filterChain:
           filter:
             name: envoy.filters.network.http_connection_manager
         portNumber: 80
     patch:
       operation: MERGE
       value:
         typed_config:
           '@type': >-
             type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
           route_config:
             name: local_route
             virtual_hosts:
               - domains:
                   - svc-a.ns.svc.cluster.local:80
                 name: svc-a-ns-ssh-vh
                 routes:
                   - match:
                       connect_matcher: {}
                     route:
                       cluster: >-
                         outbound|80||svc-a.ns.svc.cluster.local
                       upgrade_configs:
                         - connect_config: {}
                           enabled: true
                           upgrade_type: CONNECT
 workloadSelector:
   labels:
     istio: tfy-istio-ingress

This YAML looks quite scary, but all we are doing is modifying the listener on port 80 of the Gateway to match CONNECT requests to svc-a.ns.svc.cluster.local:80 and route them to outbound|80||svc-a.ns.svc.cluster.local, i.e. port 80 of the Kubernetes Service svc-a.ns.svc.cluster.local, where our OpenSSH server is waiting for SSH connections inside the container.
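Because one such filter is needed per SSH container, it is natural to generate it. The Python sketch below builds a manifest with the same shape as the YAML above. The function name and its parameters are hypothetical, the naming scheme is simply copied from the example, and this is not necessarily how the platform generates its filters.

```python
HCM_FILTER = "envoy.filters.network.http_connection_manager"
HCM_TYPE = (
    "type.googleapis.com/envoy.extensions.filters.network."
    "http_connection_manager.v3.HttpConnectionManager"
)


def ssh_envoy_filter(service: str, namespace: str,
                     gateway_port: int = 80, service_port: int = 80) -> dict:
    """Build an EnvoyFilter manifest for one SSH Service (illustrative only)."""
    fqdn = f"{service}.{namespace}.svc.cluster.local"
    connect_route = {
        "match": {"connect_matcher": {}},  # match CONNECT requests only
        "route": {
            "cluster": f"outbound|{service_port}||{fqdn}",
            "upgrade_configs": [
                {"connect_config": {}, "enabled": True, "upgrade_type": "CONNECT"}
            ],
        },
    }
    virtual_host = {
        "domains": [f"{fqdn}:{service_port}"],
        "name": f"{service}-{namespace}-ssh-vh",
        "routes": [connect_route],
    }
    patch_value = {
        "typed_config": {
            "@type": HCM_TYPE,
            "route_config": {"name": "local_route", "virtual_hosts": [virtual_host]},
        }
    }
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "EnvoyFilter",
        "metadata": {
            "name": f"{service}-{namespace}-ssh-envoy-filter",
            "namespace": "istio-system",
        },
        "spec": {
            "configPatches": [{
                "applyTo": "NETWORK_FILTER",
                "match": {
                    "context": "GATEWAY",
                    "listener": {
                        "filterChain": {"filter": {"name": HCM_FILTER}},
                        "portNumber": gateway_port,
                    },
                },
                "patch": {"operation": "MERGE", "value": patch_value},
            }],
            "workloadSelector": {"labels": {"istio": "tfy-istio-ingress"}},
        },
    }


manifest = ssh_envoy_filter("svc-a", "ns")
print(manifest["metadata"]["name"])
```

The resulting dictionary can be serialized to JSON or YAML before being applied to the cluster.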

Initiating the CONNECT connection from the SSH client side

By itself, the SSH client knows nothing about HTTP CONNECT. Instead, it provides a ProxyCommand option that lets other programs facilitate the SSH connection. Here we use the ProxyTunnel project, which does exactly that. The configuration in ~/.ssh/config looks like the following:

Host svc-a-ns
 User jovyan
 HostName svc-a.ns.svc.cluster.local
 Port 80
 ServerAliveInterval 100
 IdentityFile ~/.ssh/my-private-key
 ProxyCommand proxytunnel -v -p ssh.acmecorp.com:80 -o %h -d %h:%p
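In the ProxyCommand line, ssh replaces %h with the HostName and %p with the Port before running the command. The following Python sketch mimics that substitution for just these two tokens (plus %% for a literal percent sign) to show the command that ends up being executed. It is a simplified illustration, not OpenSSH's actual implementation, and the function name is hypothetical.

```python
def expand_proxy_command(template: str, host: str, port: int) -> str:
    """Expand %h, %p and %% in a ProxyCommand template (other tokens are left as-is)."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            token = template[i + 1]
            if token == "h":
                out.append(host)
            elif token == "p":
                out.append(str(port))
            elif token == "%":
                out.append("%")
            else:
                out.append(ch + token)  # not handled in this sketch
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


command = expand_proxy_command(
    "proxytunnel -v -p ssh.acmecorp.com:80 -o %h -d %h:%p",
    host="svc-a.ns.svc.cluster.local",
    port=80,
)
print(command)
```

So ssh svc-a-ns makes proxytunnel connect to the proxy at ssh.acmecorp.com:80 and ask it for a tunnel to svc-a.ns.svc.cluster.local:80; ssh then speaks the SSH protocol over that program's standard input and output.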

With all of that done, users can now easily connect and set up their preferred development workflow, be it Neovim, VS Code, a JetBrains IDE, and so on.

Limitations and Potential Solutions

While this feature greatly improves the developer experience for editing and running code, some limitations still apply because we are still working inside a container.

  • Docker does not work because we are already inside a container. In theory it is possible to get some things running using DIND (Docker-in-Docker), but that comes with its own challenges.
  • Changes made to the container's root file system (/) do not persist across container restarts. We provide a way to extend our SSH server image and start from those custom images.
  • Kubernetes Pods are designed to be ephemeral and can be moved around, but this is undesirable for a development environment. We configure Pod Disruption Budgets to prevent the Pod from being moved.
  • Although the proxying is transparent, traffic still flows through the load balancer and the Istio Envoy Pods. This means that doing something unusual during development, such as uploading or downloading huge files, can consume bandwidth and resources and affect other traffic. It is best to use a separate set of LoadBalancer, Gateway and Envoy Pods for connecting to SSH containers.
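For the Pod Disruption Budget mentioned in the list above, a manifest could look roughly like the one produced by this Python sketch. The exact budget used on the platform is not shown in this post, so the function name, the naming scheme and the choice of maxUnavailable: 0 are assumptions. Note that such a budget only blocks voluntary evictions such as node drains; it cannot protect against node failures.

```python
def ssh_pod_disruption_budget(name: str, namespace: str, pod_labels: dict) -> dict:
    """Build a PodDisruptionBudget that disallows voluntary eviction of the SSH pod."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{name}-pdb", "namespace": namespace},
        "spec": {
            # Assumption: zero pods may be unavailable due to voluntary disruptions.
            "maxUnavailable": 0,
            "selector": {"matchLabels": dict(pod_labels)},
        },
    }


# Hypothetical labels; use whatever labels the SSH server pod actually carries.
pdb = ssh_pod_disruption_budget("svc-a", "ns", {"app": "svc-a"})
print(pdb["spec"])
```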

SSH Server Containers on Kubernetes at TrueFoundry

TrueFoundry is a platform as a service (PaaS) for deploying ML/LLM models on top of Kubernetes. It speeds up developer workflows while giving developers full flexibility in testing and deploying models, and it ensures full security and control for the infrastructure team. Through our platform, we enable teams to deploy and monitor models in 15 minutes with 100% reliability, scalability, and the ability to roll back in seconds. This lets them save cost and release models to production faster, enabling real business value.
