What Is Prompt Engineering?

By TrueFoundry

Published: July 4, 2026

Introduction

A prompt is a clear instruction, or a set of instructions, that you give to a tool or a person. Whether it is a keyword typed into a search engine, a command for a computer program, or a question you ask a friend, a prompt tells the recipient what you are looking for or what you want done.

Prompt engineering, the art and science of crafting effective prompts, has become increasingly important with the rise of Large Language Models (LLMs), because a well-designed prompt is what lets you use an LLM's full capabilities.

This article will help you master prompt engineering through the lens of Language Models.

Prompts and LLMs

When working on prompt engineering, you generally interact with the LLM through an API. These APIs expose a set of hyperparameters that can be adjusted to achieve the desired output. In this discussion, we will look at the Hugging Face Inference API, shown in the code below, and explore what each parameter does.

from huggingface_hub import InferenceClient

# HF Inference Endpoints parameters
endpoint_url = "https://YOUR_ENDPOINT.endpoints.huggingface.cloud"
hf_token = "hf_YOUR_TOKEN"

# Streaming client
client = InferenceClient(endpoint_url, token=hf_token)

# Generation parameters
gen_kwargs = {
    "max_new_tokens": 512,  # upper bound on the number of generated tokens
    "top_k": 50,  # sample only from the 50 most probable tokens
    "top_p": 0.8,  # nucleus sampling threshold
    "temperature": 0.5,  # lower values make the output more predictable
    "repetition_penalty": 1.5,  # values above 1.0 discourage repeated tokens
    "stop_sequences": ["\nUser:", "</s>"],
}

# Prompt
prompt = "What are the effects of climate change on"

# Text generation: with stream=True the call returns an iterator of token events
stream = client.text_generation(prompt, stream=True, details=True, **gen_kwargs)

# Print tokens as they arrive, skipping special tokens such as end-of-sequence
for event in stream:
    if not event.token.special:
        print(event.token.text, end="", flush=True)

As mentioned above, these hyperparameters can be adjusted to influence the quality and diversity of the generated text. Let's take a closer look at the ones set in the gen_kwargs dictionary above, along with two related penalties, frequency and presence, that are not set there:

Hyperparameters

Temperature

Temperature is like the spice level in your cooking: a higher temperature means more randomness, like adding spice for flavour, while a lower temperature keeps things predictable, like sticking to a recipe. For example, in creative writing tasks such as generating poetry or brainstorming story ideas, a higher temperature setting can result in more diverse and imaginative text.
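
To make the effect concrete, here is a minimal sketch of how temperature is commonly applied: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The function name and the three example scores are made up for illustration and are not taken from any particular library.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, temperature=0.5)  # more predictable
flat = softmax_with_temperature(logits, temperature=2.0)   # more random

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

At the lower temperature most of the probability mass sits on the top-scoring token; at the higher temperature the distribution flattens, so less likely tokens are sampled more often.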

Top_k

Think of it as narrowing down the choices in a library to the most popular books. At each generation step, top_k restricts sampling to the k most probable next tokens and discards the rest. Consider a customer service chatbot that answers common queries: a small top_k keeps the model from picking unlikely tokens, so responses stay focused and on topic rather than drifting into unnecessary detail.
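
A small sketch of the idea, assuming the next-token probabilities are already available as a dictionary. The distribution and the function name are hypothetical; real inference servers apply the same filter to the full vocabulary.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}

# Hypothetical next-token distribution.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "xylophone": 0.05}
filtered = top_k_filter(probs, k=2)
print(filtered)

# Sample the next token from the filtered distribution (seeded for repeatability).
rng = random.Random(0)
next_token = rng.choices(list(filtered), weights=list(filtered.values()))[0]
print(next_token)
```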

Top_p

Top_p (nucleus sampling) limits the candidate tokens to the smallest set of most probable tokens whose cumulative probability reaches p. Both top_k and top_p are used to trade off diversity against quality.
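
The cumulative cutoff can be sketched as follows. As before, the distribution and function name are illustrative assumptions rather than a specific library's implementation.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return {token: prob / cumulative for token, prob in kept}

# Hypothetical next-token distribution.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "xylophone": 0.05}
nucleus = top_p_filter(probs, p=0.9)
narrow = top_p_filter(probs, p=0.4)
print(nucleus)
print(narrow)
```

Unlike top_k, the number of surviving tokens varies with the shape of the distribution: a confident model keeps very few candidates, an uncertain one keeps many.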

Max_new_tokens

It's like setting a word limit for an essay. Max_new_tokens caps how many tokens the model can generate, keeping the output within a specified length. For instance, if you're generating responses for a chatbot, a low limit keeps the responses concise and relevant to the conversation context, or you can increase it when longer outputs are needed.

Repetition_penalty

Repetition_penalty discourages the model from reusing tokens that have already appeared, promoting diversity in the generated text. In a conversational AI application, such as a virtual assistant, setting a repetition_penalty helps keep the assistant's responses varied and natural during extended interactions.
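
One common formulation, which to my knowledge is the one used by open-source inference stacks such as Hugging Face Transformers, rescales the logit of every token that has already been generated. The sketch below assumes logits are given as a token-to-score dictionary; the function name and example values are hypothetical.

```python
def apply_repetition_penalty(logits, generated_tokens, penalty=1.5):
    """Make already-generated tokens less likely by rescaling their logits."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    adjusted = dict(logits)
    for token in set(generated_tokens):
        if token not in adjusted:
            continue
        score = adjusted[token]
        # Dividing a positive score or multiplying a negative one both
        # lower the score when penalty > 1.
        adjusted[token] = score / penalty if score > 0 else score * penalty
    return adjusted

# Hypothetical logits; "hello" and "world" were already generated.
logits = {"hello": 3.0, "world": -1.0, "again": 2.0}
adjusted = apply_repetition_penalty(logits, ["hello", "world"], penalty=1.5)
print(adjusted)
```

A penalty of 1.0 leaves the logits unchanged, which is why values slightly above 1.0 are the usual starting point.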

Frequency Penalty

Frequency penalty lowers the likelihood of a token in proportion to how many times it has already appeared in the generated text, which reduces repeated words and phrases. Suppose you're developing a news aggregator app that summarizes articles from various sources. By applying a frequency penalty, the app can keep its summaries from repeating the same terms over and over.

Presence Penalty

Presence penalty applies a one-time penalty to any token that has already appeared in the generated text, regardless of how often, which nudges the model toward new words and topics. In a brainstorming assistant, for example, a positive presence penalty encourages the model to move on to fresh ideas instead of circling back to ones it has already mentioned. It does not filter content: blocking inappropriate or offensive language in an online forum calls for a separate moderation step.
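
The difference between the two penalties is easiest to see side by side. The sketch below follows the additive form documented for OpenAI-style APIs, where the frequency penalty scales with the token's count and the presence penalty is applied once if the token has appeared at all. The function name and example values are assumptions for illustration.

```python
from collections import Counter

def apply_penalties(logits, generated_tokens,
                    frequency_penalty=0.0, presence_penalty=0.0):
    """Subtract count-based and presence-based penalties from each logit."""
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, score in logits.items():
        count = counts.get(token, 0)
        presence = presence_penalty if count > 0 else 0.0
        adjusted[token] = score - count * frequency_penalty - presence
    return adjusted

# Hypothetical logits; "data" was generated three times, "model" once.
logits = {"data": 2.0, "model": 2.0, "river": 2.0}
history = ["data", "data", "data", "model"]
adjusted = apply_penalties(logits, history,
                           frequency_penalty=0.5, presence_penalty=0.25)
print(adjusted)
```

The token used three times is pushed down much further than the token used once, while the unused token keeps its original score.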

General tips for writing better prompts

Starting with Simplicity

Begin with simple, straightforward prompts and introduce complexity gradually through refinement, rather than front-loading all the information. When dealing with a big task, try to break it down into smaller subtasks.

Clear Guidance

Commands should be clear and explicit. For example,

Poor prompt: "The quick brown fox jumps over the lazy dog. Translate this."

Better prompt: "Translate the following English text into Spanish: 'The quick brown fox jumps over the lazy dog.'"

Being Specific

Enhance prompt clarity by including relevant examples and detailed instructions.

Poor prompt: “Write about social media and its effects.”

Better prompt: “Write a 500-word essay discussing social media's impact on teenagers' mental health. Include statistics from reputable sources such as the American Psychological Association and provide real-life examples of individuals affected by excessive social media use.”

Avoiding Confusion

Ensure prompts are clear and direct to prevent ambiguity in model responses.

Continuous Improvement

Iteratively refine prompts based on feedback and performance to optimise results.

Tools, frameworks, and libraries for prompt engineering in LLMs

Choosing the best prompt engineering tools depends on whether you are optimizing prompts for experimentation, production deployment, or model-specific workflows.

Hugging Face Transformers

It provides interfaces for fine-tuning models on specific tasks and allows for the creation of custom prompts.

OpenAI Codex API

OpenAI's API includes capabilities for prompt engineering, allowing users to generate code or text based on prompts.

GPT-3 Playground

Platforms like the GPT-3 Playground offer interactive environments for experimenting with prompts and observing model responses in real-time.

Google's T5 Framework

T5 (Text-To-Text Transfer Transformer) is a framework that casts all NLP tasks into a text-to-text format, simplifying prompt engineering for various tasks.

AllenNLP

AllenNLP provides tools for building and evaluating NLP models, including mechanisms for prompt design and evaluation.

Advanced Techniques for Prompt Engineering

In this section, I will talk about some state-of-the-art (SOTA) techniques that fall under the umbrella of prompt engineering in natural language processing (NLP), along with related methods such as language model fine-tuning. There may be slight overlap with the methods mentioned above; consider this a more formal and structured approach to understanding the techniques:

Zero-shot Prompting

As the term ‘zero-shot’ suggests, the question is asked directly, without giving the LLM any examples or additional context. The model generates its answer from its pre-existing knowledge.

Prompt:

Who was the first man in space?

Output:

The first man in space was Yuri Gagarin, a Soviet cosmonaut.

Few-shot Prompting

Few-shot prompting is all about giving the LLM a heads-up before asking the question, typically a handful of worked examples of the task. This extra context serves two purposes. First, it helps the LLM grasp what is being asked, so it can give a more accurate response. Second, it lets the LLM pick up patterns or formats it has not come across before directly from the prompt, without any retraining.

Prompt:

Compose a poem with three stanzas, each focusing on a different theme: love, nature, and solitude.

Output:

In your embrace, I find my solace,
A symphony of hearts beating as one,
Lost in the dance of passion's embrace.

Behold the whispering trees, their leaves aglow,
Nature's canvas painted with hues divine,
Serenity is found in the murmuring flow.

In the quietude of the night, I am alone,
A solitary soul under the starry sky,
Finding solace in the silence, all on my own.
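
In practice, a few-shot prompt places a handful of input and output pairs before the real query. The sketch below assembles one for a sentiment-labeling task; the task, the example reviews, and the function name are all hypothetical and chosen only to show the layout.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Place labeled examples before the query so the model can copy the pattern."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")  # left open for the model to complete
    return "\n".join(lines)

# Hypothetical demonstrations.
examples = [
    ("The battery lasts all day and charges fast.", "positive"),
    ("The screen cracked within a week.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each text as positive or negative.",
    examples,
    "Setup was painless and support answered quickly.",
)
print(prompt)
```

Ending the prompt on an open "Sentiment:" line signals the expected output format as clearly as the instruction does.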

Prompt Chaining

Prompt chaining is a technique for breaking a complex task into more manageable steps. The output of one prompt is fed into the input of the next, so each prompt handles one well-defined part of the problem.

This approach offers several advantages: simpler instructions, the ability to pinpoint which step is failing for focused troubleshooting, and easier validation of intermediate results. In addition, subtasks that do not depend on each other's output can be run in parallel.

Prompt-1:

There is a workout routine, available on the website <link>:

Please identify any exercises mentioned on the website. Present them as direct quotes, enclosed in XML tags <quotes></quotes>. If there are no exercises, please say "No exercises mentioned".

Output-1:

<quotes> "10 push-ups" "20 squats" </quotes>

Prompt-2:

Below are the exercises mentioned in the routine, enclosed in XML tags <quotes></quotes>:

<quotes> "10 push-ups" "20 squats" </quotes>

Please use these exercises to create a workout routine. Make sure your instructions are clear and easy to follow.

Output-2:

# Generates a comprehensive workout plan
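
The two-step chain above can be sketched in code. To keep the example runnable offline, the model call is replaced by a stand-in function (`fake_llm`) that returns canned text; in a real pipeline it would be swapped for an actual LLM client. All function names here are hypothetical.

```python
import re
from typing import Callable

def extract_quotes(llm_output: str) -> str:
    """Return the text inside <quotes>...</quotes>, or an empty string."""
    match = re.search(r"<quotes>(.*?)</quotes>", llm_output, flags=re.DOTALL)
    return match.group(1).strip() if match else ""

def run_chain(llm: Callable[[str], str], page_text: str) -> str:
    """Step 1 extracts exercises; step 2 turns them into a routine."""
    prompt_1 = (
        f"Here is the text of a web page:\n{page_text}\n\n"
        "Please identify any exercises mentioned on the page. Present them as "
        "direct quotes, enclosed in <quotes></quotes> XML tags. If there are "
        'no exercises, please say "No exercises mentioned".'
    )
    quotes = extract_quotes(llm(prompt_1))
    if not quotes:
        return "No exercises mentioned"  # stop early, nothing to chain
    prompt_2 = (
        "Below are the exercises mentioned in the routine, enclosed in "
        f"<quotes></quotes> XML tags:\n<quotes> {quotes} </quotes>\n\n"
        "Please use these exercises to create a workout routine. Make sure "
        "your instructions are clear and easy to follow."
    )
    return llm(prompt_2)

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs without network access."""
    if "identify any exercises" in prompt:
        return '<quotes> "10 push-ups" "20 squats" </quotes>'
    return "Workout plan: do 10 push-ups, rest, then do 20 squats."

result = run_chain(fake_llm, "Do 10 push-ups and 20 squats every morning.")
print(result)
```

Validating the intermediate output before the second call is where chaining pays off: a missing or malformed `<quotes>` block is caught at the step that produced it.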

Chain-of-Thought Prompting

Chain-of-thought prompting mimics the scenario in which a student is given a worked example problem and is then asked to solve similar problems.

Prompt:

Q: Sally has 50 apples. She gave 15 of them to her friend and then bought three times the number she gave away. How many apples does Sally have now?

A: Sally started with 50 apples. After giving away 15 apples, she has 50 - 15 = 35 apples left. She then bought three times the number she gave away, which is 3 * 15 = 45 apples. Adding the apples she bought to what she had left, Sally now has 35 + 45 = 80 apples. Therefore, Sally has 80 apples.

Q: Joe has 20 eggs. He buys 2 more cartons of eggs. Each carton contains 12 eggs. How many eggs does Joe have now?

Output:

A: Joe started with 20 eggs. Two cartons of 12 eggs make 24 eggs. 20 + 24 = 44. Therefore, Joe has 44 eggs, and the answer is 44.

In cases where you have few or no examples, adding a phrase such as "Let's think step by step" to the original prompt is effective in improving model performance.
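
Both variants, the worked-example form and the "Let's think step by step" form, come down to how the prompt string is assembled. The sketch below builds each one from the Sally and Joe questions; the helper names are hypothetical, and the last two lines simply check the arithmetic in the worked answers.

```python
def few_shot_cot_prompt(demos, question):
    """Prepend worked (question, reasoning) pairs, then ask the new question."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in demos]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)

def zero_shot_cot_prompt(question):
    """No examples: append a cue that asks the model to reason step by step."""
    return f"Q: {question}\nA: Let's think step by step."

sally_demo = (
    "Sally has 50 apples. She gave 15 to her friend and then bought three "
    "times the number she gave away. How many apples does Sally have now?",
    "Sally started with 50 apples. After giving away 15, she has "
    "50 - 15 = 35. She then bought 3 * 15 = 45 apples. "
    "35 + 45 = 80, so Sally has 80 apples.",
)
joe_question = (
    "Joe has 20 eggs. He buys 2 more cartons of eggs. Each carton contains "
    "12 eggs. How many eggs does Joe have now?"
)

prompt = few_shot_cot_prompt([sally_demo], joe_question)
print(prompt)
print(zero_shot_cot_prompt(joe_question))

# The arithmetic in the worked answers can be checked directly.
sally_total = 50 - 15 + 3 * 15
joe_total = 20 + 2 * 12
```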

Automatic Chain-of-Thought (Auto-CoT)

Auto-CoT automatically generates examples that show the LLM how to solve problems. These examples are called "demonstrations" and are created by prompting the LLM to articulate its thought process and show how it approaches the problem.

How does Auto-CoT work?

Auto-CoT works in two stages:

Question clustering: First, it groups similar questions together. This helps ensure that the demonstrations cover a wide range of different problem types.
Demonstration sampling: Then, it selects a representative question from each cluster and asks the LLM to generate a demonstration for it. The LLM does this by thinking out loud and explaining its reasoning step by step.
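
The two stages can be sketched as follows. This is a simplified illustration, not the published method: real implementations cluster sentence embeddings, whereas this sketch swaps in a toy word-overlap (Jaccard) grouping so that it runs with the standard library alone. The representative-selection heuristic (shortest question), the threshold, and the stand-in model function are likewise assumptions.

```python
import re
from typing import Callable, List, Tuple

def tokenize(text: str) -> set:
    """Lowercase word set; a crude substitute for a sentence embedding."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_questions(questions: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Stage 1: greedily group each question with the first similar cluster seed."""
    clusters: List[List[str]] = []
    for question in questions:
        for cluster in clusters:
            if jaccard(tokenize(question), tokenize(cluster[0])) >= threshold:
                cluster.append(question)
                break
        else:
            clusters.append([question])  # no similar cluster found, start a new one
    return clusters

def build_demonstrations(questions: List[str], llm: Callable[[str], str],
                         threshold: float = 0.3) -> List[Tuple[str, str]]:
    """Stage 2: pick one question per cluster and ask the model to reason it out."""
    demos = []
    for cluster in cluster_questions(questions, threshold):
        representative = min(cluster, key=len)  # heuristic: prefer short questions
        rationale = llm(f"Q: {representative}\nA: Let's think step by step.")
        demos.append((representative, rationale))
    return demos

# Hypothetical question pool and a stand-in model so the sketch runs offline.
questions = [
    "How many apples does Sally have after giving away 15 apples?",
    "How many apples does Tom have after buying 10 apples?",
    "What is the capital of France?",
    "What is the capital of Japan?",
]
clusters = cluster_questions(questions)
demos = build_demonstrations(questions, lambda prompt: "(model reasoning goes here)")
print(len(clusters), "clusters")
for question, rationale in demos:
    print(question, "->", rationale)
```

The resulting (question, rationale) pairs are then used as the worked examples in a few-shot chain-of-thought prompt.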

Fine-tuning Large Language Models (LLMs)

Fine-tuning is often not considered part of prompt engineering, but, like prompt engineering, it is a way to adapt LLMs to specific tasks.
It involves further training an already-trained model on our own specialized, labeled dataset, thereby adjusting its parameters. While the final layers are often the ones adjusted to fit the new data, fine-tuning can also modify parameters across multiple layers to better capture domain-specific features while retaining the knowledge gained from the original training.

Traditionally, fine-tuning was a complex and resource-intensive process, requiring powerful hardware, machine learning expertise, and large amounts of labeled data.

However, with platforms like Hugging Face, which provide pre-trained models and easy-to-use fine-tuning pipelines, fine-tuning has become more accessible and efficient. By combining Hugging Face's capabilities with traditional fine-tuning methods, we can use pre-trained models as starting points, reducing the need for massive amounts of labeled data and expertise.

Truefoundry also lets you fine-tune your own LLMs. With its simple, intuitive interface, you can fine-tune your models in 3 easy steps:

Connect your database.
Compare fine-tuning jobs and choose the one that best fits your needs.
Deploy your fine-tuned model.

Retrieval-Augmented Generation (RAG)

In RAG, retrieval is used as a component alongside generation to improve model performance on tasks such as question answering and text generation. RAG adapts to scenarios where facts change, which is valuable because the static knowledge of LLMs cannot keep up with new developments. RAG lets language models skip retraining and access the latest information through retrieval-based generation to produce reliable outputs.

In recent years, RAG systems have evolved from basic Naive RAG to Advanced RAG and the more sophisticated Modular RAG.

Naive RAG retrieves information based on the user's input, but it suffers from accuracy problems caused by outdated data and irrelevant responses. Advanced RAG improves on this by refining the retrieval process, making it more accurate and relevant.

Modular RAG goes further by introducing customizable modules such as search and memory, allowing flexibility in solving specific problems. Overall, these developments aim to make conversational systems smarter and more reliable through better management of information retrieval and response generation.
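
A minimal sketch of the naive retrieve-then-generate flow: score each document against the query, keep the best matches, and place them in the prompt as context. Word-count vectors with cosine similarity stand in for a real embedding model and vector store, and the documents, function names, and prompt wording are all illustrative assumptions.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: word counts. A real system would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, documents, top_n=2):
    """Return the top_n documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:top_n]

def build_rag_prompt(query, documents, top_n=2):
    """Insert the retrieved passages as context ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, top_n))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Hypothetical knowledge base.
documents = [
    "The Eiffel Tower is in Paris.",
    "Python is a programming language.",
    "The Louvre museum is in Paris.",
]
query = "Which tower is in Paris?"
best = retrieve(query, documents, top_n=1)
prompt = build_rag_prompt(query, documents, top_n=2)
print(prompt)
```

Because the knowledge lives in the document list rather than in the model's weights, updating the facts means updating the documents, with no retraining involved.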

Truefoundry also offers a comprehensive RAG interface that can integrate with any metadata store, embeddings, or LLMs.

Reinforcement Learning from Human Feedback (RLHF)

For a long time, the idea of training a language model with reinforcement learning seemed unattainable because of both engineering and algorithmic challenges. Understanding the technical side of RLHF requires prior knowledge of various reinforcement learning prerequisites, so I will try to keep the explanation very general.

Consider a problem where our goal is to train a robot to navigate a maze. Traditionally in reinforcement learning (RL), the robot aims to reach its goal quickly and receives feedback based on how well it performs in the maze. RLHF goes a step further by letting humans provide additional input. They can comment on more than just speed, such as whether the robot avoids obstacles or takes a path that looks good.

For example, if the robot chooses a path that avoids obstacles or follows a route that humans prefer, it may earn some extra points. This way, the robot learns not only to reach the goal quickly but also to take human preferences into account.

In prompt engineering for LLMs, RLHF is very useful. It ensures prompts get better at eliciting the responses we want, improves prompt quality through human review, lets us tailor prompts to our preferences, and keeps up with changes in what is popular over time. By including human input, it helps ensure that results are closer to what we are looking for, across different tasks and domains.
