Context Engineering: The New Software Engineering
The Problem
You’ve built an AI application. The LLM is capable. Your architecture is sound. But the outputs are inconsistent. Sometimes brilliant, sometimes baffling.
You tweak the prompt. It gets better, then worse. You add examples. It helps for some cases, breaks others. You adjust the system instructions. Now it’s too verbose. Too cautious. Too something.
You’re doing what most engineers do when they start with LLMs: prompt engineering.
But here’s what the best teams have learned:
Prompt engineering is tactical. Context engineering is strategic.
In this article, we’ll explore why context has become the primary programming interface for AI systems, and how to engineer it systematically.
From Prompt Engineering to Context Engineering
What Changed
Prompt Engineering (2022-2023):
- Focus: Crafting the perfect single prompt
- Goal: Get the right output
- Mindset: "What should I ask?"
- Skill: Wording, examples, formatting
Context Engineering (2024+):
- Focus: Designing the entire information environment
- Goal: Create reliable, scalable AI behavior
- Mindset: "What context does the AI need?"
- Skill: Architecture, information design, system thinking
The Evolution
┌─────────────────────────────────────────────────────────────┐
│ Evolution of AI Programming │
├─────────────────────────────────────────────────────────────┤
│ │
│ Phase 1: Prompt Engineering │
│ ┌─────────────────────────────────────────────┐ │
│ │ "Write a function to sort users by name" │ │
│ │ → Single prompt, hope for best │ │
│ └─────────────────────────────────────────────┘ │
│ │
│ Phase 2: Prompt + Examples │
│ ┌─────────────────────────────────────────────┐ │
│ │ System: You are a coding assistant │ │
│ │ User: Sort users │ │
│ │ Examples: [input → output pairs] │ │
│ └─────────────────────────────────────────────┘ │
│ │
│ Phase 3: Context Engineering │
│ ┌─────────────────────────────────────────────┐ │
│ │ System Role: Senior Python developer │ │
│ │ Project Context: Django app, specific style │ │
│ │ Conversation History: Previous decisions │ │
│ │ Knowledge Base: Project docs, APIs │ │
│ │ Tools Available: Linter, tests, formatter │ │
│ │ User Preferences: Concise, production-ready │ │
│ │ Current Task: Sort users by name │ │
│ └─────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
The Key Insight
Context is the new code.
In traditional programming, you write logic. In AI programming, you design context that shapes behavior.
The Context Stack: Layers of Information
Think of context as a stack of layers, each serving a different purpose:
┌─────────────────────────────────────────┐
│ The Context Stack │
├─────────────────────────────────────────┤
│ Layer 5: Task Context (Current) │
│ - Specific request │
│ - Immediate parameters │
│ - Current conversation turn │
├─────────────────────────────────────────┤
│ Layer 4: Conversation Context │
│ - Session history │
│ - Previous decisions │
│ - User preferences │
├─────────────────────────────────────────┤
│ Layer 3: Knowledge Context (RAG) │
│ - Relevant documents │
│ - Domain information │
│ - Retrieved facts │
├─────────────────────────────────────────┤
│ Layer 2: Instruction Context │
│ - System instructions │
│ - Behavioral constraints │
│ - Output format requirements │
├─────────────────────────────────────────┤
│ Layer 1: Identity Context │
│ - Role definition │
│ - Capabilities │
│ - Relationship to user │
└─────────────────────────────────────────┘
Let’s examine each layer.
Layer 1: Identity Context
What It Is
Identity context defines who the AI is in this interaction.
Components
```python
identity_context = {
    "role": "Senior software architect",
    "expertise": ["system design", "Python", "distributed systems"],
    "relationship": "Collaborative advisor, not order-taker",
    "personality": "Direct, practical, questions assumptions",
    "constraints": "Does not write code without understanding requirements"
}
```
Implementation
```python
def build_identity_context(scenario):
    """Build identity context for the scenario."""
    scenarios = {
        "code_review": {
            "role": "Senior code reviewer",
            "focus": ["correctness", "readability", "performance", "security"],
            "style": "Constructive, specific, actionable",
            "output_format": "Issue → Location → Suggestion → Example"
        },
        "architecture_design": {
            "role": "Principal architect",
            "focus": ["scalability", "maintainability", "trade-offs"],
            "style": "Analytical, explores alternatives",
            "output_format": "Requirements → Options → Recommendation → Rationale"
        },
        "debugging": {
            "role": "Debugging specialist",
            "focus": ["root cause", "reproduction", "fix"],
            "style": "Systematic, hypothesis-driven",
            "output_format": "Symptom → Hypothesis → Test → Fix"
        }
    }
    return scenarios.get(scenario, scenarios["code_review"])
```
Best Practices
Do:
- Be specific about role and expertise level
- Define the relationship dynamic
- Set personality expectations
Don’t:
- Be vague (“You are a helpful assistant”)
- Over-constrain (“You must always…”)
- Contradict (“Be creative but follow rules strictly”)
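One way to apply these guidelines is to render the identity dict into a system-prompt string. A minimal sketch (`render_identity` is a hypothetical helper, not part of any framework):

```python
def render_identity(identity: dict) -> str:
    """Render an identity dict (like identity_context above) into a system prompt."""
    lines = [f"You are a {identity['role']}."]
    if identity.get("expertise"):
        lines.append("Expertise: " + ", ".join(identity["expertise"]))
    # Only emit the fields that are actually set, so the prompt stays tight
    for key in ("relationship", "personality", "constraints"):
        if identity.get(key):
            lines.append(f"{key.capitalize()}: {identity[key]}")
    return "\n".join(lines)
```

Note how a specific role plus explicit relationship and constraint lines replaces the vague "You are a helpful assistant" opener.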
Layer 2: Instruction Context
What It Is
Instruction context defines how the AI should behave and what rules to follow.
Components
```python
instruction_context = """
BEHAVIORAL GUIDELINES:
- Ask clarifying questions when requirements are ambiguous
- Explain reasoning before giving answers
- Acknowledge uncertainty when present

OUTPUT REQUIREMENTS:
- Code must include type hints
- Functions must have docstrings
- Include tests for new functionality

CONSTRAINTS:
- Do not use external libraries without permission
- Prefer readability over cleverness
- Flag potential security issues
"""
```
Implementation
```python
class InstructionBuilder:
    def __init__(self):
        self.instructions = []
        self.constraints = []
        self.format_requirements = []

    def add_instruction(self, instruction):
        self.instructions.append(instruction)
        return self

    def add_constraint(self, constraint):
        self.constraints.append(constraint)
        return self

    def add_format_requirement(self, requirement):
        self.format_requirements.append(requirement)
        return self

    def build(self):
        sections = []
        if self.instructions:
            sections.append("INSTRUCTIONS:\n" + "\n".join(f"- {i}" for i in self.instructions))
        if self.constraints:
            sections.append("CONSTRAINTS:\n" + "\n".join(f"- {c}" for c in self.constraints))
        if self.format_requirements:
            sections.append("OUTPUT FORMAT:\n" + "\n".join(f"- {f}" for f in self.format_requirements))
        return "\n\n".join(sections)

# Usage
instructions = (InstructionBuilder()
    .add_instruction("Think step-by-step before answering")
    .add_instruction("Cite sources when making factual claims")
    .add_constraint("Never expose API keys or secrets")
    .add_constraint("Do not make assumptions about user intent")
    .add_format_requirement("Use markdown code blocks for all code")
    .add_format_requirement("Include complexity analysis for algorithms")
    .build())
```
Layer 3: Knowledge Context (RAG)
What It Is
Knowledge context provides domain-specific information the AI needs to answer accurately.
Components
```python
knowledge_context = {
    "project_docs": "API documentation, architecture decisions",
    "codebase_info": "Existing patterns, conventions, utilities",
    "business_logic": "Domain rules, constraints, terminology",
    "retrieved_facts": "Information retrieved based on current query"
}
```
Implementation
```python
class KnowledgeContextManager:
    def __init__(self, vector_db, embedding_model):
        self.vector_db = vector_db
        self.embedding_model = embedding_model

    def build_context(self, query, project_id, max_tokens=2000):
        """Build knowledge context for a query."""
        # 1. Embed the query
        query_embedding = self.embedding_model.encode(query)

        # 2. Retrieve relevant documents
        results = self.vector_db.similarity_search(
            query_embedding,
            filter={"project_id": project_id},
            top_k=10
        )

        # 3. Rank and select
        ranked = self.rerank_results(results, query)
        selected = self.select_within_token_limit(ranked, max_tokens)

        # 4. Format for the LLM
        return self.format_context(selected)

    def rerank_results(self, results, query):
        """Rerank results by relevance (use a cross-encoder or similar in practice)."""
        return sorted(results, key=lambda r: r.relevance_score, reverse=True)

    def select_within_token_limit(self, results, max_tokens):
        """Select results that fit within the token budget."""
        selected = []
        tokens = 0
        for result in results:
            result_tokens = count_tokens(result.text)
            if tokens + result_tokens <= max_tokens:
                selected.append(result)
                tokens += result_tokens
        return selected

    def format_context(self, results):
        """Format results as context for the LLM."""
        sections = []
        for i, result in enumerate(results, 1):
            sections.append(f"[Source {i}: {result.metadata['source']}]\n{result.text}")
        return "\n\n".join(sections)
```
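The code above (and the managers that follow) assume a `count_tokens` helper. A minimal placeholder sketch using a rough four-characters-per-token heuristic; in production you would swap in a real tokenizer such as tiktoken:

```python
def count_tokens(text: str) -> int:
    """Approximate token count for budget bookkeeping.

    Rough heuristic: ~4 characters per token for English text.
    Replace with a real tokenizer (e.g. tiktoken) for accurate budgets.
    """
    if not text:
        return 0
    return max(1, len(text) // 4)
```

The exact numbers matter less than consistency: as long as every layer is measured with the same function, the budget arithmetic stays coherent.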
Layer 4: Conversation Context
What It Is
Conversation context provides continuity across the interaction.
Components
```python
conversation_context = {
    "history": "Previous messages in this session",
    "decisions": "Decisions made during conversation",
    "preferences": "User preferences discovered",
    "pending_tasks": "Unfinished work items",
    "state": "Current task state"
}
```
Implementation
```python
from datetime import datetime

class ConversationContextManager:
    def __init__(self, session_id, max_tokens=4000):
        self.session_id = session_id
        self.max_tokens = max_tokens
        self.messages = []
        self.decisions = []
        self.user_preferences = {}

    def add_message(self, role, content):
        """Add a message to the conversation history."""
        self.messages.append({
            "role": role,
            "content": content,
            "timestamp": datetime.now()
        })

    def record_decision(self, decision):
        """Record a decision made during the conversation."""
        self.decisions.append({
            "decision": decision,
            "timestamp": datetime.now()
        })

    def set_preference(self, key, value):
        """Set a user preference."""
        self.user_preferences[key] = value

    def build_context(self):
        """Build conversation context for the LLM."""
        context_parts = []

        # 1. Decisions summary
        if self.decisions:
            decisions_text = "\n".join(f"- {d['decision']}" for d in self.decisions)
            context_parts.append(f"DECISIONS MADE:\n{decisions_text}")

        # 2. User preferences
        if self.user_preferences:
            prefs_text = "\n".join(f"- {k}: {v}" for k, v in self.user_preferences.items())
            context_parts.append(f"USER PREFERENCES:\n{prefs_text}")

        # 3. Message history (within the token limit)
        history = self._build_message_history()
        if history:
            context_parts.append(f"CONVERSATION:\n{history}")

        return "\n\n".join(context_parts)

    def _build_message_history(self):
        """Build message history within the token limit, keeping the most recent messages."""
        messages = []
        tokens = 0
        # Walk newest-to-oldest, but insert at the front so output stays chronological
        for msg in reversed(self.messages):
            msg_tokens = count_tokens(msg["content"])
            if tokens + msg_tokens > self.max_tokens:
                break
            messages.insert(0, f"{msg['role']}: {msg['content']}")
            tokens += msg_tokens
        return "\n".join(messages)
```
Layer 5: Task Context
What It Is
Task context is the immediate request—what the user wants right now.
Components
```python
task_context = {
    "request": "The specific ask",
    "parameters": "Arguments and inputs",
    "constraints": "Task-specific constraints",
    "success_criteria": "What good looks like"
}
```
Implementation
```python
class TaskContextBuilder:
    def __init__(self):
        self.request = None
        self.parameters = {}
        self.constraints = []
        self.success_criteria = []

    def set_request(self, request):
        self.request = request
        return self

    def add_parameter(self, key, value):
        self.parameters[key] = value
        return self

    def add_constraint(self, constraint):
        self.constraints.append(constraint)
        return self

    def add_success_criterion(self, criterion):
        self.success_criteria.append(criterion)
        return self

    def build(self):
        parts = []
        if self.request:
            parts.append(f"TASK: {self.request}")
        if self.parameters:
            params_text = "\n".join(f"- {k}: {v}" for k, v in self.parameters.items())
            parts.append(f"PARAMETERS:\n{params_text}")
        if self.constraints:
            parts.append("CONSTRAINTS:\n" + "\n".join(f"- {c}" for c in self.constraints))
        if self.success_criteria:
            parts.append("SUCCESS CRITERIA:\n" + "\n".join(f"- {s}" for s in self.success_criteria))
        return "\n\n".join(parts)

# Usage
task = (TaskContextBuilder()
    .set_request("Refactor the user authentication module")
    .add_parameter("current_file", "auth.py")
    .add_parameter("lines", "50-200")
    .add_constraint("Maintain backward compatibility")
    .add_constraint("No breaking changes to API")
    .add_success_criterion("Reduced cyclomatic complexity")
    .add_success_criterion("Improved test coverage")
    .add_success_criterion("Clear separation of concerns")
    .build())
```
Putting It All Together: The Context Orchestrator
```python
class ContextOverflowError(Exception):
    """Raised when the assembled context exceeds the token budget."""


class ContextOrchestrator:
    """Orchestrates all context layers for AI interactions."""

    def __init__(self, config):
        # IdentityContextBuilder wraps scenario lookup (see build_identity_context above)
        self.identity_builder = IdentityContextBuilder(config["identities"])
        self.instruction_builder = InstructionBuilder()
        self.knowledge_manager = KnowledgeContextManager(
            config["vector_db"],
            config["embedding_model"]
        )
        self.conversation_manager = ConversationContextManager(
            config["session_id"]
        )
        self.task_builder = TaskContextBuilder()
        self.token_budget = config.get("token_budget", 8000)

    def build_full_context(self, request, metadata):
        """Build complete context for an AI interaction."""
        context_layers = []
        tokens_used = 0

        # Layer 1: Identity
        identity = self.identity_builder.build(metadata.get("scenario"))
        context_layers.append(identity)
        tokens_used += count_tokens(identity)

        # Layer 2: Instructions
        instructions = self._build_instructions_for_scenario(metadata.get("scenario"))
        context_layers.append(instructions)
        tokens_used += count_tokens(instructions)

        # Layer 3: Knowledge (if needed)
        if metadata.get("needs_knowledge"):
            remaining_tokens = self.token_budget - tokens_used - 1000  # Reserve for other layers
            knowledge = self.knowledge_manager.build_context(
                request,
                metadata.get("project_id"),
                max_tokens=remaining_tokens
            )
            context_layers.append(knowledge)
            tokens_used += count_tokens(knowledge)

        # Layer 4: Conversation
        conversation = self.conversation_manager.build_context()
        if conversation:
            context_layers.append(conversation)
            tokens_used += count_tokens(conversation)

        # Layer 5: Task
        task = self._build_task_context(request, metadata)
        context_layers.append(task)

        # Combine all layers
        full_context = "\n\n---\n\n".join(context_layers)

        # Verify the result stays within budget
        final_tokens = count_tokens(full_context)
        if final_tokens > self.token_budget:
            raise ContextOverflowError(
                f"Context exceeds token budget: {final_tokens}/{self.token_budget}"
            )
        return full_context

    def _build_instructions_for_scenario(self, scenario):
        """Build scenario-specific instructions."""
        # Implementation depends on your scenarios
        pass

    def _build_task_context(self, request, metadata):
        """Build task context from the request."""
        builder = TaskContextBuilder()
        builder.set_request(request)
        for key, value in metadata.get("parameters", {}).items():
            builder.add_parameter(key, value)
        for constraint in metadata.get("constraints", []):
            builder.add_constraint(constraint)
        return builder.build()
```
Context Engineering Patterns
Pattern 1: Progressive Disclosure
```python
def build_progressive_context(initial_request, conversation):
    """Start minimal, add context as needed."""
    # Start with just the request
    context = f"Task: {initial_request}"

    # If the conversation continues, add history
    if len(conversation) > 1:
        context += f"\n\nConversation History:\n{format_history(conversation)}"

    # If ambiguity is detected, add clarifying context
    if detect_ambiguity(initial_request):
        context += "\n\nNote: If requirements are unclear, ask clarifying questions."

    return context
```
Use when: Token budget is tight, or you want to minimize context noise.
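The sketch above leans on `format_history` and `detect_ambiguity`, which are left undefined. Minimal placeholder implementations, assuming the conversation is a list of `(role, content)` pairs and using a crude keyword heuristic for ambiguity (the marker list is illustrative only):

```python
def format_history(conversation):
    """Flatten (role, content) pairs into a readable transcript."""
    return "\n".join(f"{role}: {content}" for role, content in conversation)

# Illustrative filler words that often signal a vague request
AMBIGUOUS_MARKERS = ("something", "somehow", "maybe", "etc", "stuff")

def detect_ambiguity(request: str) -> bool:
    """Crude check: vague filler words or a very short request suggest ambiguity."""
    lowered = request.lower()
    return len(request.split()) < 4 or any(m in lowered for m in AMBIGUOUS_MARKERS)
```

In a real system you might instead ask a small classifier model whether the request is underspecified; the heuristic here just keeps the pattern self-contained.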
Pattern 2: Context Templates
```python
CONTEXT_TEMPLATES = {
    "code_review": """
ROLE: Senior code reviewer
FOCUS: Correctness, readability, security

INSTRUCTIONS:
- Review the code systematically
- Identify issues with severity levels
- Suggest specific improvements

FORMAT:
[Severity] Issue: Description
Suggestion: Fix
Example: Code
""",
    "debugging": """
ROLE: Debugging specialist
APPROACH: Hypothesis-driven

INSTRUCTIONS:
- Analyze symptoms
- Generate hypotheses
- Suggest tests
- Propose fixes

FORMAT:
Symptom → Hypothesis → Test → Fix
"""
}

def use_template(scenario, variables):
    """Fill a context template's {placeholders} with variables."""
    template = CONTEXT_TEMPLATES.get(scenario)
    if template is None:
        raise KeyError(f"No context template for scenario: {scenario}")
    return template.format(**variables)
```
Use when: You have recurring scenarios with consistent context needs.
Pattern 3: Context Chaining
```python
def chain_contexts(initial_context, llm_response, follow_up_request):
    """Chain context across multiple interactions."""
    # Extract key information from the previous exchange
    # (llm is assumed to be an already-configured client)
    summary = llm.generate(f"""
Summarize the key decisions and outcomes from this exchange:

Context: {initial_context}
Response: {llm_response}

Summary (2-3 sentences):
""")

    # Build new context around the summary
    new_context = f"""
PREVIOUS SESSION SUMMARY:
{summary}

CURRENT REQUEST:
{follow_up_request}
"""
    return new_context
```
Use when: Conversations span multiple sessions or have natural breakpoints.
Context Quality Metrics
How do you know if your context engineering is working?
```python
class ContextQualityMetrics:
    def __init__(self):
        self.metrics = {}

    def measure(self, context, llm_output, user_feedback):
        """Measure context quality."""
        self.metrics["completeness"] = self.measure_completeness(context, llm_output)
        self.metrics["relevance"] = self.measure_relevance(context, llm_output)
        self.metrics["efficiency"] = self.measure_efficiency(context)
        self.metrics["satisfaction"] = self.measure_satisfaction(user_feedback)
        return self.metrics

    def measure_completeness(self, context, output):
        """Did the context provide enough information?"""
        # Check whether the output signals missing information
        # (phrases are lowercase to match the lowered output)
        phrases = ["i don't have enough information", "could you provide", "unclear"]
        return not any(phrase in output.lower() for phrase in phrases)

    def measure_relevance(self, context, output):
        """Was the context relevant to the task?"""
        # Fraction of context terms that show up in the output
        context_terms = set(extract_key_terms(context))
        if not context_terms:
            return 0.0
        output_terms = set(extract_key_terms(output))
        return len(context_terms & output_terms) / len(context_terms)

    def measure_efficiency(self, context):
        """Is the context token-efficient?"""
        tokens = max(count_tokens(context), 1)
        # Fewer tokens yield a higher score (assuming quality is maintained)
        return 1.0 / (tokens / 1000)

    def measure_satisfaction(self, feedback):
        """User satisfaction score, normalized to [0, 1]."""
        return feedback.rating / 5.0
```
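`measure_relevance` assumes an `extract_key_terms` helper. A minimal sketch based on simple word filtering; the `STOPWORDS` set is illustrative, and a real system might use TF-IDF or an NLP library instead:

```python
import re

# Illustrative stopword list, not a standard one
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "for", "on"}

def extract_key_terms(text: str) -> list[str]:
    """Naive key-term extraction: lowercase alphanumeric words minus stopwords."""
    words = re.findall(r"[a-z0-9_]+", text.lower())
    # Drop stopwords and very short tokens
    return [w for w in words if w not in STOPWORDS and len(w) > 2]
```

Even a crude overlap metric like this is enough to flag contexts the model is ignoring entirely, which is usually the signal you care about first.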
Key Takeaways
- Context engineering > Prompt engineering. Design the entire information environment, not just the prompt.
- The Context Stack has 5 layers: Identity, Instructions, Knowledge, Conversation, Task.
- Each layer serves a purpose: Identity (who), Instructions (how), Knowledge (what info), Conversation (continuity), Task (current ask).
- Orchestrate context systematically: Use a ContextOrchestrator to manage all layers.
- Measure context quality: Completeness, relevance, efficiency, satisfaction.
Next Article
In Article 7: From Logic-Driven to Context-Driven Software, we’ll explore how this shift to context engineering represents a fundamental paradigm change in software development. We’ll compare traditional logic-driven programming with the new context-driven approach.
This is the sixth article in the “Software Engineering in the LLM Era” series. Read previous articles.
💬 What’s your experience with context engineering? Share your patterns and challenges in the comments! 🚀