How to Create Powerful LLM Applications with Context Engineering

Context engineering is a powerful concept you can use to increase the effectiveness of your LLM applications. In this article, I elaborate on context engineering techniques and how to succeed with AI applications through effective context management. If you are working on AI applications that use LLMs, I highly recommend reading the full article.

I first wrote about the topic of context engineering in my article: How You Can Enhance LLMs with Context Engineering, where I discussed some context engineering techniques and important notes. In this article, I expand on the topic by discussing more context engineering techniques and how to do evaluations on your context management.

In this article, I discuss how you can utilize context engineering to increase the efficiency of your LLMs. Image by ChatGPT.

If you haven’t read it already, I recommend you first read my initial article on context engineering, or you can read about ensuring reliability in LLM applications.

Motivation

My motivation for writing this article is similar to that of my last article on context engineering. LLMs have become central to many applications since the release of ChatGPT in 2022, yet they are often not used to their full potential because of poor context management. Proper context management requires the context engineering skills and techniques I'll discuss in this article. So if you are working on any application that uses LLMs, I highly recommend taking notes and integrating these techniques into your own application.

Context engineering techniques

In my last article, I discussed context engineering techniques such as:

  • Zero/few-shot prompting
  • RAG
  • Tools (MCP)

I’ll now elaborate on more techniques that are important to proper context management.

Prompt structuring

With prompt structuring, I'm referring to how your prompt is organized. A messy prompt will, for example, cram all its text together without line breaks, repeat instructions, and lack clear sectioning. Check out the example below of a messy prompt versus a properly structured one:

```
# unstructured prompt: no line breaks, repetitive instructions, unclear sectioning
"You are an AI assistant specializing in question answering. You answer the users queries in a helpful, concise manner, always trying to be helpful. You respond concisely, but also avoid single-word answers."

# structured prompt:
"""
## Role
You are an **AI assistant specializing in question answering**.

## Objectives
1. Answer user queries in a **helpful** and **concise** manner.
2. Always prioritize **usefulness** in responses.

## Style Guidelines
- **Concise, but not overly brief**: Avoid single-word answers.
- **Clarity first**: Keep responses straightforward and easy to understand.
- **Balanced tone**: Professional, helpful, and approachable.

## Response Rules
- Provide **complete answers** that cover the essential information.
- Avoid unnecessary elaboration or filler text.
- Ensure answers are **directly relevant** to the user's question.
"""
```

Prompt structuring is important for two reasons:

  1. It makes the instructions clearer to the AI.
  2. It increases the (human) readability of the prompt, which helps you detect potential issues, spot repeated instructions, and so on.

You should always avoid repetitive instructions. One way to catch them is to feed your prompt into another LLM and ask for feedback; you'll typically receive back a much cleaner prompt with clearer instructions. Anthropic also has a prompt generator in their dashboard, and many other tools exist to improve your prompts.
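To make structure repeatable, you can also assemble the prompt programmatically from named sections, so each instruction lives in exactly one place. A minimal sketch (the `build_prompt` helper and the section names are illustrative, not from any particular framework):

```python
def build_prompt(sections: dict[str, str]) -> str:
    """Join named sections into one markdown-structured system prompt."""
    return "\n\n".join(
        f"## {heading}\n{body.strip()}" for heading, body in sections.items()
    )

# Each instruction lives in exactly one section, which makes
# repetition easy to spot and sections easy to reorder or remove.
prompt = build_prompt({
    "Role": "You are an **AI assistant specializing in question answering**.",
    "Style Guidelines": "- Concise, but not overly brief: avoid single-word answers.",
    "Response Rules": "- Ensure answers are directly relevant to the user's question.",
})
print(prompt)
```

Keeping the sections in a dictionary like this also makes it easy to A/B test a single section without touching the rest of the prompt.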

Context window management

Two main points for context management. Keep the context short, and if the context gets too long, you can utilize context compression by summarizing. Image by ChatGPT.

Another important point to keep in mind is context window management. By this, I mean the number of tokens you are feeding into your LLM. It's important to remember that while recent LLMs have very long context windows (for example, Llama 4 Scout with a 10M-token context window), they are not necessarily able to make good use of all those tokens. You can, for example, read this article highlighting how LLMs perform worse with more input tokens, even when the difficulty of the problem stays the same.

It’s thus important to properly manage your context window. I recommend focusing on two points:

  1. Keep the prompt as short as possible while including all relevant information. Look through the prompt and determine whether any text is irrelevant; if so, removing it will likely improve LLM performance.
  2. You might experience problems where the LLM runs out of context window, either because of the hard context size limit or because too many input tokens make the LLM slow to respond. In these cases, you should consider context compression.

For point one, it's important to note that this irrelevant information is often not part of your static system prompt, but rather the dynamic information you feed into the context. For example, if you are fetching information using RAG, you should consider excluding chunks whose similarity falls below a specific threshold. This threshold varies from application to application, and empirical testing typically works well for choosing it.
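The threshold idea can be sketched in a few lines. Note that the 0.75 cutoff and the chunk format are illustrative assumptions; the right value should come from testing against real queries in your application:

```python
def filter_chunks(chunks: list[tuple[str, float]], threshold: float = 0.75) -> list[str]:
    """Keep only retrieved chunks whose similarity score meets the threshold."""
    return [text for text, score in chunks if score >= threshold]

retrieved = [
    ("Relevant passage about billing", 0.91),
    ("Loosely related FAQ entry", 0.62),  # falls below the cutoff
]
print(filter_chunks(retrieved))  # only the high-similarity chunk remains
```

Dropping the low-similarity chunk both saves tokens and avoids distracting the model with loosely related text.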

Context compression is another powerful technique for managing your LLM's context. It is typically done by prompting another LLM to summarize part of the context, so the same information is retained using fewer tokens. This approach is, for example, used to handle the context window of agents, which can quickly grow as the agent performs more actions.
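A sketch of this summarization-based compression, assuming a placeholder `call_llm` function that stands in for your provider's chat-completion call:

```python
SUMMARY_PROMPT = (
    "Summarize the following conversation, keeping all facts, "
    "decisions, and open questions:\n\n{history}"
)

def compress_context(history: list[str], call_llm,
                     max_messages: int = 20, keep_recent: int = 5) -> list[str]:
    """Replace older messages with an LLM-written summary once the history
    exceeds max_messages; the most recent turns stay verbatim."""
    if len(history) <= max_messages:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary = call_llm(SUMMARY_PROMPT.format(history="\n".join(older)))
    return [f"[Summary of earlier conversation]\n{summary}"] + recent
```

The `max_messages` and `keep_recent` values are illustrative; in practice, agents usually trigger compression based on token counts rather than message counts.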

Keyword search (vs RAG)

This image shows RAG architecture from https://github.com/infiniflow/ragflow (Apache 2 license). You can improve the RAG flow by implementing contextual retrieval.

Another topic I think is worth highlighting is keyword search as a complement to retrieval augmented generation (RAG). In most AI applications the focus is on RAG, since it can fetch information based on semantic similarity.

Semantic similarity is powerful because, in many cases, the user doesn't know the exact wording of what they are looking for. However, keyword search also works very well in plenty of scenarios. I thus recommend integrating an option to fetch documents using some form of keyword search in addition to your RAG; in some cases, keyword search retrieves more relevant documents than RAG can.

Anthropic highlighted this approach in their article on Contextual Retrieval from September 2024, where they show how you can use BM25 to fetch relevant information effectively in your RAG system.
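To make the idea concrete, here is a self-contained BM25 scoring sketch. In production you would typically rely on a library such as rank_bm25 or a search engine rather than hand-rolling the scoring; this implementation is purely illustrative, with standard k1 and b parameter values:

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with the BM25 ranking function."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)  # document frequency
            if df == 0:
                continue
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            f = tf[term]  # term frequency in this document
            # Saturating term-frequency weight, normalized by document length
            score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * len(tokens) / avgdl))
        scores.append(score)
    return scores
```

A hybrid retriever can then merge these keyword scores with embedding-similarity scores (for example, by rank fusion) before selecting the final chunks for the context.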

Evaluation

Evaluation is an important part of any machine-learning system. If you don’t know how well your LLMs are performing, it’s hard to improve your system.

The first step to evaluation is observability. I thus recommend implementing prompt management software. You can find a series of such tools on this GitHub page.

One way to evaluate your context management is to perform A/B testing. You simply run two different versions of a prompt, using different context management techniques, and then, for example, gather user feedback to determine which approach works better. Another way is to prompt an LLM with the problem you are trying to solve (for example, a RAG query) and the context you used to answer it. The LLM can then provide feedback on how to improve your context management.
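For the A/B testing part, a common trick is deterministic bucketing: hashing a stable user id means each user always sees the same prompt variant, which keeps feedback comparisons clean. A sketch, where the experiment name and the 50/50 split are assumptions:

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "prompt-structure-test") -> str:
    """Deterministically assign a user to prompt variant A or B."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"
```

Log the assigned variant alongside each piece of user feedback so that results can later be aggregated per variant.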

Furthermore, an underrated approach to improving the quality of the contexts is to manually inspect them. I believe a lot of engineers working with LLMs spend too little time on manual inspections, and analyzing the input tokens fed into LLMs falls under this category. I thus recommend setting aside time to go through a series of different contexts that are fed into your LLM, to determine how you can improve. Manual inspection provides you with the opportunity to properly understand the data you are working with and what you are feeding into your LLMs.

Conclusion

In this article, I have elaborated on the topic of context engineering. Working on context engineering is a powerful approach to improving your LLM application. There are several techniques you can use to better manage your LLM's context, for example improving your prompt structure, managing the context window, utilizing keyword search, and applying context compression. I also discussed how to evaluate your context management.
