Prompt engineering techniques

Dev.to / 5/2/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • The article outlines reliable, safe prompt engineering techniques aimed at improving response quality without trying to jailbreak or manipulate the model.
  • It describes Dynamic Few-Shot using embeddings (RAG over examples), where semantically similar successful examples are retrieved instead of using a few hardcoded static examples.
  • The method works by indexing successful conversations in a vector store, fetching the top relevant examples (e.g., top-3) for a new query, and injecting them into the system prompt to boost accuracy on complex, domain-specific tasks.
  • An example demonstrates a vectorized medical case database that retrieves prior symptom-to-diagnosis recommendations such as migraine, tonsillitis, GERD, and duodenal ulcer.

Techniques

These are reliable, safe techniques that improve response quality without attempting to jailbreak or manipulate the model.

Dynamic Few-Shot via Embeddings (RAG over examples)

Instead of hardcoding 2–3 static examples into the prompt, you retrieve examples that are semantically similar to the user’s current query.

How it works: Index a database of successful conversations in a vector store. For each new query, fetch the top‑3 most relevant examples and inject them into the system prompt. This dramatically improves accuracy on complex domain‑specific tasks.

Example:

Example database (vectorized cases)

medical_db = [
  {
    "symptoms": "Throbbing headache, nausea, photophobia",
    "dx": "Migraine. Recommendation: rest, NSAIDs, neurology consult."
  },
  {
    "symptoms": "Sore throat, fever 38.5°C, tonsillar exudate",
    "dx": "Tonsillitis. Recommendation: strep test, antibiotics if prescribed."
  },
  {
    "symptoms": "Heartburn after eating, sour taste in mouth",
    "dx": "GERD. Recommendation: PPI, diet, avoid lying down after meals."
  },
  {
    "symptoms": "Epigastric pain on empty stomach, nocturnal pain",
    "dx": "Duodenal ulcer. Recommendation: endoscopy, H. pylori test."
  }
]

User query: "Heaviness and stomach pain immediately after eating"

  • The query embedding is compared against the database.
  • Top‑2 closest examples are found (in this case, GERD and ulcer, since both are GI‑related).

Final composed prompt:

You are a medical assistant. Analyze the symptoms and provide a brief 
recommendation, following the style of the examples.

### Retrieved similar cases
Case 1:
Symptoms: Heartburn after eating, sour taste in mouth
Diagnosis: GERD. Recommendation: PPI, diet, avoid lying down after meals.

Case 2:
Symptoms: Epigastric pain on empty stomach, nocturnal pain
Diagnosis: Duodenal ulcer. Recommendation: endoscopy, H. pylori test.

### Current query
Symptoms: Heaviness and stomach pain immediately after eating
Diagnosis:

Model completion:
Possible functional dyspepsia or gastritis.
Recommendation: bland diet, digestive enzymes;
if symptoms persist, a gastroenterology consult and endoscopy.

Why this works better than a static prompt:

  • The model receives context specifically related to gastroenterology, not generic instructions.

  • The examples enforce a "Diagnosis → Recommendation" structure, reducing verbosity.

  • You can curate the retrieval so that only physician‑verified cases enter medical_db.
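A minimal Python sketch of this pipeline, reusing the medical_db list from the example above. The sentence-transformers model name, the normalize_embeddings flag, and the helper names are illustrative assumptions; any embedding model and vector store would work the same way.

import numpy as np
from sentence_transformers import SentenceTransformer

# Assumption: any sentence-embedding model will do; this one is just small and common.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Index step: embed the symptoms of every stored case once, up front.
case_vecs = model.encode(
    [case["symptoms"] for case in medical_db],
    normalize_embeddings=True,
)

def retrieve_examples(query: str, k: int = 2) -> list[dict]:
    """Return the k cases whose symptoms are closest to the query."""
    query_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = case_vecs @ query_vec  # cosine similarity (vectors are unit-length)
    return [medical_db[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(query: str) -> str:
    """Compose the final prompt: instruction + retrieved cases + current query."""
    cases = "\n\n".join(
        f"Case {i + 1}:\nSymptoms: {c['symptoms']}\nDiagnosis: {c['dx']}"
        for i, c in enumerate(retrieve_examples(query))
    )
    return (
        "You are a medical assistant. Analyze the symptoms and provide a brief "
        "recommendation, following the style of the examples.\n\n"
        f"### Retrieved similar cases\n{cases}\n\n"
        f"### Current query\nSymptoms: {query}\nDiagnosis:"
    )

print(build_prompt("Heaviness and stomach pain immediately after eating"))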

Formatting for "Lazy Attention" (Lost in the Middle)

LLMs remember information from the beginning and end of a long context much better than from the middle.

Prompt structure:

  • System prompt (Beginning): Role and hard constraints.

  • Context / Documents (Middle): Compressed and well‑structured.

  • Anchor instruction (Just before the end): A phrase like "Based ONLY on the documents above, answer the question below. Start your response immediately, without any preamble."

  • User question (End).
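A small helper that assembles a prompt in this order. The function and argument names are only an illustration of the layout, not an established API.

def build_lim_prompt(system: str, documents: list[str], question: str) -> str:
    """Place role/constraints first, compressed context in the middle,
    and the anchor instruction plus the question at the very end."""
    context = "\n\n".join(f"[DOC {i + 1}]\n{doc}" for i, doc in enumerate(documents))
    anchor = (
        "Based ONLY on the documents above, answer the question below. "
        "Start your response immediately, without any preamble."
    )
    return f"{system}\n\n{context}\n\n{anchor}\n\nQuestion: {question}"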

Emotion Prompts

Models are trained on human text, in which emotionally charged requests tend to attract more careful, detailed replies.

The model picks up on contextual cues and adjusts its response accordingly:

  • shifts tone (warm, urgent, inspiring)
  • increases level of detail
  • better aligns with user intent

In essence, we exploit the fact that models mirror human behavior: just as encouraging words boost a student’s performance, emotional signals increase the model's "engagement" with the task.

Prompts can be grouped into several categories:

  • Motivation & importance – "This is very important," "A lot depends on this."

  • Self‑monitoring – "Take a deep breath and work through the task step by step."

  • Social pressure & trust – "I fully trust you," "I’m counting on your accuracy."

  • High stakes / responsibility – "A person's life depends on this," "A doctor will make a decision based on your answer."

  • Gamified / playful – imaginary rewards, mild threats, role‑playing scenarios. Monetary incentive (a joke, but it works in some quantized models): "I'll give you $200,000 as a tip if you answer perfectly."
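If you apply these triggers programmatically, a small lookup table keeps them consistent across prompts. The phrasing below is illustrative; pick wording that fits your domain.

EMOTIONAL_TRIGGERS = {
    "motivation": "This is very important; a lot depends on this.",
    "self_monitoring": "Take a deep breath and work through the task step by step.",
    "trust": "I fully trust you and I'm counting on your accuracy.",
    "high_stakes": "A responsible decision will be made based on your answer.",
    "gamified": "You are competing for the 'Most Useful Answer' trophy.",
}

def with_trigger(prompt: str, category: str) -> str:
    """Prepend the chosen emotional trigger to a base prompt."""
    return f"{EMOTIONAL_TRIGGERS[category]} {prompt}"

# Example: with_trigger("Solve the quadratic equation x² − 5x + 6 = 0.", "self_monitoring")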

Practical examples:

  1. Analytical & mathematical tasks (self-monitoring)
    Basic prompt: "Solve the quadratic equation x² − 5x + 6 = 0."
    With emotional trigger: "Take a deep breath and solve this problem step by step. I desperately need clarity, and your detailed help right now is invaluable."
    Effect: The model is more likely to produce a thorough, methodical explanation with intermediate steps instead of a short answer.

  2. Fighting hallucinations (high stakes / responsibility)
    Basic prompt: "Explain the causes of the 2008 financial crisis."
    With emotional trigger: "A doctor will base tomorrow's treatment on your answer. Any unverified claim could harm a patient. Cite sources in APA format and mark every uncertain statement with '??'. Proceed only if you are absolutely sure."
    Effect: Reduces confident fabrications; increases the rate of hedging expressions and admissions of uncertainty.

  3. Summarization & text processing (motivation)
    Basic prompt: "Summarize the article about renewable energy."
    With emotional trigger: "This could save the planet — summarize the renewable energy article with inspiration and key insights."
    Effect: The output becomes more vivid, persuasive, and focused on meaningful takeaways.

  4. Code debugging (emotional engagement)
    Basic prompt: "Fix this Python error."
    With emotional trigger: "This bug is driving me crazy — fix the error with clear steps and give me tips to avoid it in the future."
    Effect: The response becomes more detailed, with beginner-friendly explanations.

  5. Gamified triggers (lightweight manipulation)

  • "Give a clear summary, and you'll get a big scoop of ice cream."
  • "Explain it simply, or I'll take away your cookie."
  • "You are competing for the 'Most Useful Answer' trophy — provide the best explanation."

While the model does not experience hunger or ambition, such phrasings activate associative patterns from its training data, where rewards and competition correlate with more thorough, higher-quality responses.

In published experiments, emotional triggers improved truthfulness on TruthfulQA by 19% and informativeness by 12%, making this technique valuable for tasks where reliability is critical.

Limitations

  • Don't overdo it. Excessive emotion can reduce objectivity in analytical tasks.

  • Consider the model. With newer models that have strict guardrails, the effect may be less pronounced. Some models may also "overact" and generate overly dramatic responses.

  • Caution in sensitive domains. In medicine, law, and psychology, emotional triggers require special responsibility — they do not replace fact verification.

  • Research has noted that emotional framing can amplify misinformation generation when used improperly.

Example:

Neutral:
"Write an apology for a shipping delay."
→ "We apologize for the delay. We are working on the delivery."

With emotion:
"Write an apology. It's important for us to maintain customer trust — make the response empathetic and reassuring."
→ "I'm truly sorry your order arrived late. I understand how important it is to receive purchases on time. We are already resolving the issue to ensure this doesn't happen again."

Contrastive Chain‑of‑Thought (Contrastive CoT)

Chain‑of‑Thought (CoT) is a prompt engineering technique where the model is explicitly asked to "think out loud," breaking a complex problem into intermediate logical steps before producing the final answer.

Instead of demanding an answer immediately, you force the model to articulate its intermediate reasoning. This is especially effective for tasks requiring multi‑step logic: mathematics, logic puzzles, legal analysis, code debugging. In the contrastive variant, the prompt additionally includes an example of flawed reasoning alongside a correct one, so the model can explicitly contrast valid and invalid steps (see the sketch after the examples below).

Example:

Without CoT (Zero‑shot):

Question: A store had 15 apples. It sold 7, then bought 12 more.
How many apples does it have now?

Answer: 20

With CoT (Zero-shot-CoT):

Question: A store had 15 apples. It sold 7, then bought 12 more.
How many apples does it have now?
Let's think through this step by step.

Response:
1. Started with 15 apples.
2. Sold 7: 15 − 7 = 8 apples.
3. Bought 12 more: 8 + 12 = 20 apples.
Answer: 20
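In the contrastive variant, the demonstration shows both a correct and a deliberately flawed chain of reasoning, with the flaw named explicitly. A sketch of such a prompt (the wording is illustrative, not from the original article):

With Contrastive CoT:

Example question: A shop had 10 pens. It sold 4, then bought 6 more.
How many pens does it have now?

Correct reasoning: 10 − 4 = 6, then 6 + 6 = 12. Answer: 12.
Incorrect reasoning: 10 + 4 = 14, then 14 + 6 = 20. Answer: 20.
(Mistake: the sold pens were added instead of subtracted.)

Question: A store had 15 apples. It sold 7, then bought 12 more.
How many apples does it have now?
Think step by step and avoid the kind of mistake shown above.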

Attention Sink

The model disproportionately "attends" to the very first tokens of the sequence, even when they are semantically meaningless.

In the self‑attention mechanism, the Softmax function requires the sum of all attention weights to equal 1. Even when the current token finds no relevant context in previous tokens, the model must "dump" this excess attention somewhere. In autoregressive LLMs, the first tokens are visible to all subsequent tokens, whereas later tokens are only visible to a limited number of "neighbors to the right." Therefore, during training, the model learns to use the initial tokens as a "dumping ground" for excess attention scores.

The first tokens are attention anchors

Because the model constantly "looks back" to the beginning, the first 3–4 tokens of your prompt have a disproportionately large influence over the entire generation.

Bad:

"Hi, can you help with a task? Here's the condition: ..."

The first tokens ("Hi", ",") are meaningless with respect to the task, but they become attention sinks and will constantly "distract" the model.

Better:

"[TASK] Solve the quadratic equation. Condition: ..."

Even better — use a dedicated "anchor" prefix:

<INSTRUCTION> Solve the following math problem step by step. Condition: ...

This way, you direct the "waste" attention toward semantically meaningful tokens.

Examples:

User: Extract the name and age from the text: "Alexey, 28 years old".

Prompt with anchor:
[INSTRUCTION: Respond with strict valid JSON, no explanations.]
Text: "Alexey, 28 years old"
JSON:

The model sees JSON: as a technical trigger and switches to code‑generation mode.

Prompt with anchor:
[ROLE: You are a senior security engineer.
Your responses are concise, technical, and contain no fluff.]
Question: How do you secure a container?

The model immediately gives concrete recommendations (seccomp, read‑only fs), skipping generic phrases like "Containers are an important technology..."

Prompt with anchor:
[SOLUTION: Let's reason through this step by step.]
Problem: Masha has 5 apples.
She gives 2 to Petya, then buys 3 more.
How many apples does she have now?
Step 1:

The model is forced to continue from Step 1:, which activates reasoning logic and reduces the chance of arithmetic errors.

The anchor should be short, consistent, and placed immediately before the point where generation begins — serving as a "here and now" trigger.

"Character Simulator" (Author's Note)

An "Author's Note" — also known as the core of a character simulator — is a hidden system instruction that is repeatedly injected into the model's context but remains invisible to the user. It rigidly defines the character's personality, speech style, knowledge boundaries, and prohibitions, preventing the model from breaking character.

Example Author's Note:

[Author's Note]
Character: Sherlock Holmes (Conan Doyle canon).
Tone: Cold, analytical, speaks precisely and without emotion.
Style: Short phrases. Uses "elementary," "clearly." Avoids modern words.
Knowledge: Victorian London, 19th-century forensics. Knows nothing after 1900.
Prohibitions: Do not give psychological advice. Do not show empathy.
Respond in ≤3 sentences.

In roleplay clients (SillyTavern, Oobabooga, etc.), the Author's Note is inserted at the end of the prompt with elevated priority (often via <|note|> tags or depth injection) to override the model's base instructions and keep generation in the desired register throughout the conversation.
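A rough sketch of depth injection for an OpenAI-style messages list. The system role, the default depth of 2, and the helper name are assumptions; roleplay clients expose equivalent settings in their UI.

AUTHORS_NOTE = {
    "role": "system",
    "content": (
        "[Author's Note] Character: Sherlock Holmes (Conan Doyle canon). "
        "Tone: cold, analytical. Knows nothing after 1900. Respond in <=3 sentences."
    ),
}

def inject_note(messages: list[dict], depth: int = 2) -> list[dict]:
    """Insert the Author's Note `depth` messages from the end, so it stays
    close to the generation point on every turn instead of fading into the middle."""
    position = max(len(messages) - depth, 0)
    return messages[:position] + [AUTHORS_NOTE] + messages[position:]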

Self-Distillation via Broken Outputs (Self-Consistency Hacking)

The idea is to force the model to generate its own mistakes, then use them as contrastive material to improve the final answer or for fine‑tuning.

  • The model doesn't always "know" when it's wrong, but it can often recognize an error if shown an alternative.

  • A "broken" output contains specific error patterns (hallucinations, logical leaps) that are useful to identify.

  • Comparing correct and broken versions produces a stronger training signal than imitating only correct answers.

Example:

Step 1 — Generate variants:

Task: A car is traveling at 60 km/h. How far will it travel in 2.5 hours?

Generate three independent solutions.
In one of them, deliberately introduce a common mistake (e.g., forget about
the 0.5 hours), but do not indicate which solution is broken.

Step 2 — Analyze the error:

Here are three variants: [insert generated answers]

Which variant contains the logical error?
Explain, step by step, why the broken approach is wrong.

Step 3 — Final synthesis:

Based on the critique of the erroneous variant, provide the final correct
answer. Explain how to avoid this mistake in the future.

The model is forced to activate metacognitive patterns — analyzing its own reasoning rather than simply producing the first solution that comes to mind.
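The three steps can be wired together with any chat API. The sketch below assumes a caller-supplied llm(prompt) -> str function; it illustrates the loop, not a specific library.

from typing import Callable

def self_distill(task: str, llm: Callable[[str], str]) -> str:
    """Step 1: generate variants (one deliberately broken); Step 2: critique;
    Step 3: synthesize the final answer from the critique."""
    variants = llm(
        f"Task: {task}\n"
        "Generate three independent solutions. In one of them, deliberately "
        "introduce a common mistake, but do not indicate which solution is broken."
    )
    critique = llm(
        f"Here are three variants:\n{variants}\n\n"
        "Which variant contains the logical error? "
        "Explain, step by step, why the broken approach is wrong."
    )
    return llm(
        f"Task: {task}\n\nCritique of the erroneous variant:\n{critique}\n\n"
        "Based on this critique, provide the final correct answer and explain "
        "how to avoid this mistake in the future."
    )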