A prompt is simply a clear instruction, or set of instructions, that you give to a tool or a person. Whether it's a keyword you type into a search engine, a command for a computer program, or a question you ask a friend, prompts help them understand what you're looking for or what you want them to do.
Prompt engineering, the art and science of crafting effective prompts, has become increasingly essential with the rise in popularity of Large Language Models (LLMs), because it lets you draw on the full capabilities of these models.
This article will help you master prompt engineering through the lens of Language Models.
While working on prompt engineering, you generally use an API to interact with the LLM. These APIs expose a set of hyperparameters that can be adjusted to achieve the desired outputs. In this discussion, we will examine the Hugging Face Inference API (as shown in the code snippet below) and explore the importance of each parameter.
from huggingface_hub import InferenceClient

# HF Inference Endpoints parameter
endpoint_url = "https://YOUR_ENDPOINT.endpoints.huggingface.cloud"
hf_token = "hf_YOUR_TOKEN"

# Streaming Client
client = InferenceClient(endpoint_url, token=hf_token)

# Generation parameters
gen_kwargs = {
    "max_new_tokens": 512,        # Cap on the length of the generated text
    "top_k": 50,                  # Adjusting top-k sampling parameter
    "top_p": 0.8,                 # Adjusting nucleus sampling parameter
    "temperature": 0.5,           # Adjusting temperature for randomness
    "repetition_penalty": 1.5,    # Adjusting repetition penalty to avoid repetitive responses
    "stop_sequences": ["\nUser:", "</s>"],  # Sequences that end generation
}

# Prompt
prompt = "What are the effects of climate change on"

# Text generation
stream = client.text_generation(prompt, stream=True, details=True, **gen_kwargs)
As mentioned above, different hyperparameters can be adjusted to influence the quality and diversity of generated text. Let's take a closer look at the various hyperparameters included in the gen_kwargs dictionary above:
It's like adjusting the spice level in your cooking: higher temperature means more randomness, like adding spice for flavour, while lower temperature keeps things predictable, like sticking to a recipe. For example, in creative writing tasks like generating poetry or brainstorming story ideas, a higher temperature setting can result in more diverse and imaginative text.
Think of it as narrowing down choices in a library to the most popular books. It selects the most probable tokens during text generation, refining the output. Consider a customer service chatbot that assists users with common queries. By setting a top_k parameter, the chatbot can prioritize responses based on the most relevant information, ensuring that users receive accurate and helpful assistance without being overwhelmed by unnecessary details.
Top_p sets a limit on the tokens considered by choosing tokens until a cumulative probability of p is reached. Both top_k and top_p are used to control diversity and quality.
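To make the difference concrete, here is a minimal sketch in plain Python of how top-k and top-p (nucleus) filtering narrow the candidate tokens before sampling. The tokens and probabilities below are made up for illustration, not taken from any real model.

# Illustrative sketch of top-k and top-p (nucleus) filtering.
# The token names and probabilities are invented for demonstration only.
probs = {"cat": 0.40, "dog": 0.25, "bird": 0.15, "fish": 0.10, "rock": 0.06, "car": 0.04}

def top_k_filter(probs, k):
    # Keep only the k most probable tokens.
    return dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])

def top_p_filter(probs, p):
    # Keep the smallest set of tokens whose cumulative probability reaches p.
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return kept

print(top_k_filter(probs, 3))    # {'cat': 0.4, 'dog': 0.25, 'bird': 0.15}
print(top_p_filter(probs, 0.8))  # same three tokens here, since 0.40 + 0.25 + 0.15 = 0.80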
It's like setting a word limit for an essay. Max_new_tokens determines how much text the model can generate, keeping it within a specified length. For instance, if you're generating responses for a chatbot, setting a maximum token limit ensures that the responses remain concise and relevant to the conversation context, or you can increase the limit when longer, more detailed outputs are needed.
Repetition_penalty discourages the model from reusing tokens, promoting diversity in the generated text. In a conversational AI application, such as a virtual assistant, setting a repetition_penalty ensures that the assistant's responses remain varied and natural during extended interactions.
Frequency Penalty encourages the model to explore less common tokens, making the text more unique. Suppose you're developing a news aggregator app that summarizes articles from various sources. By applying a frequency penalty, the app can prioritize lesser-known publications or niche topics, providing users with a diverse range of perspectives.
Presence Penalty guides the model to generate text that aligns with specific criteria or avoids certain topics, ensuring relevance. In a content moderation system for online forums, setting a presence penalty helps filter out inappropriate or offensive language. For example, if a user attempts to post discriminatory comments, the presence penalty would guide the system to generate a warning message.
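As a rough illustration of how these knobs map to use cases, here is a hedged sketch of two presets for the client defined earlier. The specific values are assumptions you would tune for your own model and task, and note that frequency and presence penalties are not included because they are not exposed by every API.

# Two illustrative gen_kwargs presets for the client defined above;
# the exact values are assumptions, not recommendations.
creative_kwargs = {
    "max_new_tokens": 256,
    "temperature": 0.9,         # more randomness for brainstorming / poetry
    "top_p": 0.95,              # wider nucleus keeps more candidate tokens
    "repetition_penalty": 1.2,
}

factual_kwargs = {
    "max_new_tokens": 128,
    "temperature": 0.2,         # low randomness for predictable, on-topic answers
    "top_k": 20,                # restrict sampling to the most probable tokens
    "repetition_penalty": 1.1,
}

poem = client.text_generation("Write a short poem about the sea.", **creative_kwargs)
answer = client.text_generation("What is the boiling point of water at sea level?", **factual_kwargs)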
You should begin by crafting simple, straightforward prompts and gradually introduce complexity through refinement, rather than front-loading all the information at once. So when dealing with a big task, try to break it down into smaller subtasks.
Commands should be clear and explicit. For example,
Poor prompt: " The quick brown fox jumps over the lazy dog, Translate this."
Better prompt: "Translate the following English text into Spanish: 'The quick brown fox jumps over the lazy dog.'"
Enhance prompt clarity by including relevant examples and detailed instructions.
Poor prompt: “Write about social media and its effects.”
Better prompt: “Write a 500-word essay discussing social media's impact on teenagers' mental health. Include statistics from reputable sources such as the American Psychological Association and provide real-life examples of individuals affected by excessive social media use. “
Ensure prompts are clear and direct to prevent ambiguity in model responses.
Iteratively refine prompts based on feedback and performance to optimise results.
Hugging Face Transformers provides interfaces for fine-tuning models on specific tasks and allows for the creation of custom prompts.
OpenAI's API includes capabilities for prompt engineering, allowing users to generate code or text based on prompts.
Platforms like the GPT-3 Playground offer interactive environments for experimenting with prompts and observing model responses in real-time.
T5 (Text-To-Text Transfer Transformer) is a framework that casts all NLP tasks into a text-to-text format, simplifying prompt engineering for various tasks.
AllenNLP provides tools for building and evaluating NLP models, including mechanisms for prompt design and evaluation.
In this section, I will talk about some SOTA (state-of-the-art) techniques that fall under the umbrella of prompt engineering in natural language processing (NLP) and language model fine-tuning. There may be slight overlap with the methods mentioned above; consider this a more formal and structured approach to understanding the techniques:
As is evident from the term 'zero-shot', the inference here is direct, without providing additional context to the LLM. Instead, it generates results based on its pre-existing knowledge.
Prompt:
Who was the first man in space?
Output:
The first man in space was Yuri Gagarin, a Soviet cosmonaut.
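Using the InferenceClient from earlier, a zero-shot call might look like the following minimal sketch: the prompt is sent as-is, with no examples or extra context added.

# Zero-shot prompting: the model answers from its pre-existing knowledge,
# with no examples or added context in the prompt.
prompt = "Who was the first man in space?"
response = client.text_generation(prompt, max_new_tokens=50)
print(response)  # e.g. "The first man in space was Yuri Gagarin, a Soviet cosmonaut."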
When it comes to few-shot prompting, it's all about giving the Large Language Model (LLM) a bit of a heads-up before asking the question. This extra context serves two big purposes: first off, it helps the LLM grasp what's going on, so it can give a more spot-on response. Secondly, it enables the LLM to learn new stuff it hasn't come across before, keeping its knowledge fresh and adaptable.
Compose a poem with three stanzas, each focusing on a different theme: love, nature, and solitude.
In your embrace, I find my solace,
A symphony of hearts beating as one,
Lost in the dance of passion's embrace.

Behold the whispering trees, their leaves aglow,
Nature's canvas painted with hues divine,
Serenity is found in the murmuring flow.

In the quietude of the night, I am alone,
A solitary soul under the starry sky,
Finding solace in the silence, all on my own.
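A minimal few-shot sketch with the same client: a few labelled examples are placed in the prompt before the new input, so the model can infer the pattern. The classification task and the reviews below are purely illustrative.

# Few-shot prompting: show the model a few worked examples before the real input.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day and the screen is gorgeous."
Sentiment: Positive

Review: "It stopped working after a week and support never replied."
Sentiment: Negative

Review: "Setup took five minutes and it works flawlessly."
Sentiment:"""

response = client.text_generation(few_shot_prompt, max_new_tokens=5, temperature=0.1)
print(response)  # expected: "Positive"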
Prompt chaining is a powerful technique for breaking down complex tasks into more manageable steps. By feeding the output of one prompt into the input of another, you can streamline processes and tackle intricate problems efficiently.
This approach offers several advantages: simplifying instructions, pinpointing areas of difficulty for focused troubleshooting, and better validation of results. Additionally, prompt chaining enables the parallel execution of tasks, allowing for the simultaneous processing of multiple subtasks.
Prompt-1:
There is a workout routine, present on the website <link>:
Please identify any exercises mentioned on the website. Provide them as direct quotes, enclosed in <quotes></quotes> XML tags. If there are no exercises, please say "There are no exercises mentioned."
Output-1:
<quotes>"10 push-ups""20 squats"</quotes>
Prompt-2:
Here are exercises mentioned in the routine, enclosed in <quotes></quotes> XML tags:
Please use these exercises to construct a workout routine. Ensure that your instructions are clear and easy to follow.
Output-2:
# Generates a comprehensive workout plan
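The same two-step chain can be expressed in code by feeding the first model output into the second prompt. This is a hedged sketch built on the client from earlier; the routine text is a placeholder string standing in for the content of the website.

# Prompt chaining: the output of the first prompt becomes part of the second prompt.
routine_text = "..."  # text taken from the workout page (placeholder)

prompt_1 = (
    "There is a workout routine below:\n"
    f"{routine_text}\n"
    "Please identify any exercises mentioned. Provide them as direct quotes, "
    "enclosed in <quotes></quotes> XML tags. If there are no exercises, say "
    '"There are no exercises mentioned."'
)
quotes = client.text_generation(prompt_1, max_new_tokens=200)

prompt_2 = (
    f"Here are exercises mentioned in the routine, enclosed in XML tags:\n{quotes}\n"
    "Please use these exercises to construct a workout routine. "
    "Ensure that your instructions are clear and easy to follow."
)
plan = client.text_generation(prompt_2, max_new_tokens=400)
print(plan)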
The Chain-of-Thought Prompting technique mirrors the scenario where a student is given an example problem and subsequently challenged to solve similar problems.
Q: Sally has 50 apples. She gives 15 to her friend and then buys three times as many as she gave away. How many apples does Sally have now?
A: Sally started with 50 apples. After giving away 15, she has 50 - 15 = 35 apples left. Then she buys three times as many as she gave away, which is 3 * 15 = 45 apples. Adding the apples she bought to what she had left, Sally now has 35 + 45 = 80 apples. Therefore, Sally has 80 apples.

Q: Joe has 20 eggs. He buys 2 more cartons of eggs. Each carton contains 12 eggs. How many eggs does Joe have now?
A: Joe started with 20 eggs. 2 cartons of 12 eggs is 24 eggs. 20 + 24 = 44. Therefore, Joe has 44 eggs, and the answer is 44.
In situations when you have fewer examples or no examples, adding a phrase like "Let's think step by step" to the original prompt is effective at improving the model's performance.
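Here is a hedged sketch of both variants with the client from earlier: a one-shot chain-of-thought prompt that reuses the apples example above, followed by the zero-shot "Let's think step by step" trigger on its own.

# Chain-of-thought: include a worked example whose answer spells out the reasoning.
cot_prompt = """Q: Sally has 50 apples. She gives 15 to her friend and then buys three times as many as she gave away. How many apples does Sally have now?
A: Sally started with 50 apples. After giving away 15, she has 50 - 15 = 35 apples left. She buys 3 * 15 = 45 apples, so she now has 35 + 45 = 80 apples. The answer is 80.

Q: Joe has 20 eggs. He buys 2 more cartons of eggs. Each carton contains 12 eggs. How many eggs does Joe have now?
A:"""
print(client.text_generation(cot_prompt, max_new_tokens=100))

# Zero-shot chain-of-thought: just append the trigger phrase to the question.
zero_shot_cot = (
    "Joe has 20 eggs. He buys 2 more cartons of 12 eggs each. "
    "How many eggs does Joe have now? Let's think step by step."
)
print(client.text_generation(zero_shot_cot, max_new_tokens=100))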
Auto-CoT (Automatic Chain-of-Thought) automatically generates examples that show the LLM how to solve problems. These examples are called "demonstrations", and they are created by prompting the LLM to articulate its thought process and explain how it would approach a problem.
Auto-CoT works in two stages: first, the questions in a dataset are clustered into groups of similar problems; second, a representative question is sampled from each cluster and its reasoning chain is generated with zero-shot CoT (the "Let's think step by step" trigger). These generated demonstrations then serve as the few-shot examples in the final prompt.
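A rough sketch of those two stages is below, assuming the sentence-transformers and scikit-learn packages for clustering and the client from earlier for generating the reasoning chains; the questions, model name, and cluster count are illustrative choices, not part of any fixed recipe.

from sentence_transformers import SentenceTransformer  # assumed dependency
from sklearn.cluster import KMeans                      # assumed dependency

questions = [
    "How many eggs does Joe have after buying 2 cartons of 12?",
    "Sally gives away 15 of her 50 apples and buys 45 more. How many now?",
    "A train travels 60 km/h for 2 hours. How far does it go?",
    "A car drives 80 km in 1 hour. What is its average speed?",
]

# Stage 1: cluster the questions so the demonstrations cover diverse problem types.
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(questions)
labels = KMeans(n_clusters=2, n_init="auto").fit_predict(embeddings)

# Stage 2: take one representative question per cluster and let the model
# generate its own reasoning chain with zero-shot CoT.
demonstrations = []
for cluster in set(labels):
    representative = questions[list(labels).index(cluster)]
    chain = client.text_generation(
        representative + " Let's think step by step.", max_new_tokens=150
    )
    demonstrations.append(f"Q: {representative}\nA: {chain}")

# The demonstrations are prepended to the new question at inference time.
auto_cot_prompt = "\n\n".join(demonstrations) + "\n\nQ: <new question>\nA:"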
Fine-tuning is often not regarded as part of prompt engineering, but like prompt engineering it is a method for adapting large language models (LLMs) to specific tasks. It involves training an already pre-trained model on a specialised labelled dataset, thus adjusting its parameters. While the last layers are often the ones adjusted to suit the new data, fine-tuning can involve tweaking parameters across multiple layers to better capture domain-specific features while retaining the knowledge learned from the original training.
Traditionally, fine-tuning was a complex and resource-intensive process that required powerful hardware, expertise in machine learning, and large amounts of labelled data.
However, now with platforms like Hugging Face, which provide pre-trained models and easy-to-use fine-tuning pipelines, fine-tuning has become more accessible and efficient. By integrating the capabilities of Hugging Face with traditional fine-tuning approaches, we can leverage pre-trained models as starting points, reducing the need for vast amounts of labelled data and expertise.
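A hedged sketch of that workflow with the Hugging Face transformers Trainer is shown below; the model name, dataset, subset sizes, and hyperparameters are illustrative placeholders you would swap for your own specialised labelled data.

from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Illustrative setup: a small pre-trained model fine-tuned on a sentiment dataset.
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb")  # swap in your own labelled dataset

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="finetuned-model",
    num_train_epochs=1,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),  # small subset for the sketch
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()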
Truefoundry also provides the facility of fine-tuning your LLMs; with its intuitive and simple interface, you can fine-tune your models in three simple steps.
In RAG, retrieval is used as a component alongside generation to enhance the model's performance in tasks such as question answering and text generation. RAG is adaptable for scenarios with evolving facts, which is valuable because LLMs' fixed knowledge can't keep up. RAG lets language models skip retraining, accessing the latest information through retrieval-based generation to produce dependable outputs.
In recent years, RAG systems have progressed from basic Naive RAG to more sophisticated Advanced RAG and Modular RAG models.
Naive RAG retrieves information based on user input but struggles with accuracy due to outdated data and irrelevant responses. Advanced RAG improves this by fine-tuning the retrieval process, making it more precise and relevant.
Modular RAG takes it further by offering different customizable modules like search and memory, allowing for flexibility in solving specific problems. Overall, these advancements aim to make conversation systems smarter and more reliable by better managing information retrieval and response generation.
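A minimal sketch of the retrieve-then-generate loop is shown below, assuming sentence-transformers for embeddings and the InferenceClient from earlier for generation; the documents, model name, and question are placeholders, and a real system would use a vector database rather than an in-memory list.

import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency

# A toy document store; in practice this would be a vector database.
documents = [
    "The company's refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm CET.",
    "Premium subscribers get priority email support.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query, k=2):
    # Rank documents by cosine similarity to the query embedding.
    query_embedding = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_embeddings @ query_embedding
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

question = "Can I return a product after three weeks?"
context = "\n".join(retrieve(question))

# Generation step: the retrieved passages are injected into the prompt.
rag_prompt = (
    f"Answer the question using only the context below.\n\nContext:\n{context}\n\n"
    f"Question: {question}\nAnswer:"
)
print(client.text_generation(rag_prompt, max_new_tokens=100))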
Truefoundry also offers an end-to-end interface for RAG with the ability to integrate with any metadata store, embeddings, or LLM models.
For quite a while, the idea of training a language model using reinforcement learning seemed unfeasible due to both engineering and algorithmic challenges. Understanding the technicalities of RLHF requires various reinforcement learning prerequisites, so I will keep the explanation fairly general.
Consider a problem where our goal is to train a robot to navigate a maze. Traditionally in Reinforcement Learning (RL), the robot aims to reach its goal quickly and gets feedback based on how well it performs in the maze. But Reinforcement Learning from Human Feedback (RLHF) takes it a step further by letting humans give extra input. They can comment on more than just speed, like whether the robot avoids obstacles or takes a path that looks good.
For example, if the robot picks a path that dodges obstacles or follows a route that humans like, it might get some bonus points. This way, the robot learns not just to reach the goal fast, but also to consider what humans prefer.
In prompt engineering for large language models (LLMs), RLHF is pretty handy. It makes sure prompts get better at getting the responses we want, improves prompt quality with human checks, lets us customize prompts to fit our preferences, and keeps up with changes in what's popular over time. By including human input, it helps make sure the results are closer to what we're looking for, across different tasks and fields.