There is a plethora of prompting techniques, and plenty of scientific literature benchmarking their effectiveness. Here, I just want to introduce a few well-known concepts. I believe that once you get the general idea, you will be able to expand your prompting repertoire and even develop and test new techniques yourself.
Ask and it will be given to you
Before going into specific prompting concepts, I would like to stress a general idea that, in my opinion, cannot be stressed enough:
The quality of your prompt largely determines the quality of the model’s response.
And by quality I don’t necessarily mean a sophisticated prompt construction. I mean the basic idea of asking a precise question, giving well-structured instructions, and providing the necessary context. I touched on this already when we met Sam, the piano player, in my previous article. If you ask a bar piano player to play some random jazz tune, chances are he won’t play what you had in mind. If instead you ask for exactly what you want to hear, your satisfaction with the result is likely to increase.
Similarly, if you have ever hired someone to do work around your house and your contract specification only said “bathroom renovation”, you might have been surprised that in the end your bathroom did not look like what you had in mind. The contractor, just like the model, will only draw on what they have learned about renovations and bathroom tastes and will take that learned route to deliver.
So here are some general guidelines for prompting:
· Be clear and specific.
· Be complete.
· Provide context.
· Specify the desired output style, length, etc.
This way, your prompt contains sufficient and relevant reference material that the model can relate to when generating its response.
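To make these guidelines concrete, here is a minimal sketch in Python; the product-announcement scenario and all wording are made up purely for illustration:

```python
# A vague prompt: the model has to guess scope, audience and format.
vague_prompt = "Write something about our new product."

# A clear, complete prompt: context, task, audience and output format are explicit.
specific_prompt = (
    "You are writing for the newsletter of a small software company.\n"
    "Context: we just released version 2.0 of our invoicing tool, "
    "which adds multi-currency support and a redesigned dashboard.\n"
    "Task: write an announcement for existing customers.\n"
    "Style: friendly but professional.\n"
    "Length: at most 120 words, formatted as two short paragraphs."
)
```

Both prompts ask for “something about the product”, but only the second one gives the model a realistic chance of producing what you actually had in mind.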
Roleplay prompting — simple, but overrated
In the early days of ChatGPT, the idea of roleplay prompting was everywhere: instead of asking the assistant for an immediate answer (i.e. a simple query), you first assign it a specific role, such as “teacher” or “consultant”. Such a prompt could look like this [2]:
From now on, you are an excellent math teacher and always teach your students math problems correctly. And I am one of your students.
It has been shown that this concept can yield superior results. One paper reports that through this roleplay, the model implicitly triggers a step-by-step reasoning process, which is exactly what you want it to do when applying the CoT technique (see below). However, this approach has also been shown to sometimes perform sub-optimally and needs to be well designed.
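In a chat-based API, such a roleplay prompt is typically placed in the system message. Below is a minimal sketch using the OpenAI Python client; the model name is only a placeholder and the follow-up question is made up for illustration:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder; any chat model works
    messages=[
        # The roleplay instruction goes into the system message.
        {"role": "system", "content": (
            "From now on, you are an excellent math teacher and always "
            "teach your students math problems correctly. And I am one "
            "of your students."
        )},
        # The actual task is then asked as a regular user message.
        {"role": "user", "content": "What is 7% of 250, and how do you compute it?"},
    ],
)
print(response.choices[0].message.content)
```

If no system role is available, the same role text can simply be prepended to the user message, as in the quoted prompt above.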
In my experience, simply assigning a role doesn’t do the trick. I have experimented with the example task from the paper referred to above. Unlike in that research, GPT-3.5 (as of today the free version of OpenAI’s ChatGPT, so you can try it yourself) gave the correct result using a simple query:
I have also experimented with different logical challenges, with both simple queries and roleplay, using a prompt similar to the one above. In my experiments, one of two things happens:
· either the simple query provides the correct answer on the first attempt, or
· both the simple query and roleplay come up with false, yet different, answers.
Roleplay did not outperform the simple queries in any of my (admittedly not scientifically sound) experiments. Hence, I conclude that the models must have improved recently and that the impact of roleplay prompting is diminishing.
Looking at other research, and without extensive further experimenting of my own, I believe that roleplay prompts need to be embedded into a sound and thoughtful design in order to outperform the most basic approaches, or they may not be valuable at all.
I am happy to read your experiences on this in the comments below.
Few-Shot aka in-context learning
Another intuitive and relatively simple concept is Few-Shot prompting, also referred to as in-context learning. Unlike with a Zero-Shot prompt, we not only ask the model to perform a task and expect it to deliver; we additionally provide a few examples of the desired solutions. Even though it may seem obvious that providing examples leads to better performance, this is quite a remarkable ability: LLMs are able to learn in context, i.e. perform new tasks via inference alone by conditioning on a few input-label pairs and making predictions for new inputs [3].
Setting up a few-shot prompt involves
(1) collecting examples of the desired responses, and
(2) writing your prompt with instructions on what to do with these examples.
Let’s look at a typical classification example. Here the model is given several example statements, each labelled as a positive, neutral or negative judgement. The model’s task is to rate the final statement:
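The structure of such a few-shot prompt could be assembled as in the following sketch; the statements and labels are invented purely for illustration:

```python
# Labelled examples ("shots") the model can condition on in context.
examples = [
    ("The singer hit every note and the crowd loved it.", "positive"),
    ("The venue opened its doors at 8 pm.", "neutral"),
    ("The sound system kept cutting out all evening.", "negative"),
]

# The statement we actually want classified.
new_statement = "The drummer never managed to keep time."

prompt = "Classify each statement as positive, neutral or negative.\n\n"
for text, label in examples:
    prompt += f"Statement: {text}\nSentiment: {label}\n\n"
prompt += f"Statement: {new_statement}\nSentiment:"

print(prompt)  # send this as the user message of a chat completion
```

The model is expected to continue the pattern and output only the missing label for the final statement.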
Again, even though this is a simple and intuitive approach, I am sceptical about its value for state-of-the-art language models. In my (again, not scientifically sound) experiments, Few-Shot prompts have not outperformed Zero-Shot in any case. (The model already knew that a drummer who doesn’t keep time is a negative experience, without me teaching it…) My finding seems to be consistent with recent research, where even the opposite effect (Zero-Shot outperforming Few-Shot) has been shown [4].
Against this empirical background, it is worth considering whether the design effort as well as the computational, API and latency costs of this approach are a worthwhile investment.
CoT prompting, or “Let’s think step-by-step”
Chain of Thought (CoT) prompting aims to make our models better at solving complex, multi-step reasoning problems. It can be as simple as adding the CoT instruction “Let’s think step-by-step” to the input query to improve accuracy significantly [5][6].
Instead of just providing the final query, or adding one or a few examples within your prompt as in the Few-Shot approach, you prompt the model to break its reasoning process down into a series of intermediate steps. This is akin to how a human would (ideally) approach a challenging problem.
Remember your math exams in school? Often, in more advanced classes, you were asked not only to solve a mathematical equation but also to write down the logical steps by which you arrived at the final solution. And even if the result was incorrect, you might have gotten some credit for mathematically sound solution steps. Just like your teacher in school, you expect the model to break the task down into sub-tasks, perform intermediate reasoning, and arrive at the final answer.
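In its simplest, zero-shot form, CoT prompting just appends the trigger phrase to the task, as in this sketch (the word problem is invented for illustration):

```python
task = (
    "A band plays 3 sets of 12 songs each. A quarter of all songs are "
    "covers. How many original songs do they play?"
)

# Zero-shot CoT: append the trigger phrase so the model spells out
# its intermediate reasoning before giving the final answer.
cot_prompt = task + "\n\nLet's think step-by-step."
```

Few-shot variants of CoT instead include worked examples with explicit reasoning chains, but the trigger phrase alone is the cheapest way to try the technique.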
Again, I have experimented with CoT myself quite a bit. And again, most of the time, simply adding “Let’s think step-by-step” didn’t improve the quality of the response. In fact, the CoT approach seems to have become an implicit standard of recent fine-tuned, chat-based LLMs like ChatGPT, and responses are frequently broken down into chunks of reasoning without any explicit command to do so.
However, I came across one instance where the explicit CoT command did in fact improve the answer significantly. I used a CoT example from this article, but altered it into a trick question. Here you can see how ChatGPT fell into my trap when not explicitly asked for a CoT approach (even though the response shows step-wise reasoning):
When I added “Let’s think step-by-step” to the same prompt, it solved the trick question correctly (well, it is unsolvable, which ChatGPT rightfully pointed out):
To summarize, Chain of Thought prompting aims at building up reasoning skills that are otherwise difficult for language models to acquire implicitly. It encourages models to articulate and refine their reasoning process rather than attempting to jump directly from question to answer.
Again, my experiments have revealed only limited benefits of the simple CoT approach (adding “Let’s think step-by-step”). CoT outperformed a simple query on only one occasion, but at the same time the extra effort of adding the CoT command is minimal. This cost-benefit ratio is one of the reasons why this approach is one of my favorites. Another reason I personally like it is that it not only helps the model, but can also help us humans reflect on, and maybe even iteratively refine, the necessary reasoning steps while crafting the prompt.
As before, we will likely see diminishing benefits of this simple CoT approach as models become more and more fine-tuned and accustomed to this reasoning process.
In this article, we have taken a journey into the world of prompting chat-based Large Language Models. Rather than just giving you the most popular prompting techniques, I have encouraged you to begin the journey with the question of why prompting matters at all. Along the way we have discovered that the importance of prompting is diminishing thanks to the evolution of the models. Rather than requiring users to continuously improve their prompting skills, currently evolving model architectures will likely further reduce the relevance of those skills. Agent-based frameworks, where different “routes” are taken while processing specific queries and tasks, are one example.
This does not mean, however, that being clear and specific and providing the necessary context within your prompts isn’t worth the effort. On the contrary, I am a strong advocate of this, since it not only helps the model but also yourself to figure out what exactly it is you’re trying to achieve.
Just like in human communication, multiple factors determine the appropriate approach for reaching a desired outcome. Often, it is a mix and iteration of different approaches that yield optimal results for the given context. Try, test, iterate!
And finally, unlike in human interactions, you can test virtually without limit on your personal trial-and-error prompting journey. Enjoy the ride!