VZ editorial frame
Read this piece through one operating lens: AI does not automate first, it amplifies first. If the underlying decision architecture is clear, AI scales clarity. If it is noisy, AI scales noise and cost.
VZ Lens
Through a VZ lens, the value is not information abundance but actionable signal clarity. Prompting isn’t a technical skill—it’s an X-ray of the mind. Certainty is misleading in 32% of cases, and coherence is more important than accuracy. Strategic value emerges when insight becomes execution protocol.
TL;DR — The structure of your thinking lies in the structure of your question
- Prompting isn’t a technical skill, but a mirror of your thinking: what you ask of the machine reveals the most about how you organize your own thoughts.
- Coherence is more important than accuracy: a coherent but incorrect line of reasoning can improve the model’s performance—because the form triggers existing knowledge.
- The danger zone of certainty: large language models are confidently wrong in 32% of cases—and humans do exactly the same thing.
- Complexity is not an obstacle, but a foothold: harder examples lead to better results because they slow things down—and slowing down demands attention.
- The question of fidelity is the darkest mirror of human thought: the explanation we give for a decision is rarely the story of the decision itself—it’s mostly a story told in hindsight.
The question you ask isn’t important for the answer—but because it shows where you’re standing.
I. The question behind which the questioner stands
There is a moment in a conversation when one realizes that one has asked the wrong question. Not because the answer was wrong, but because the question did not lead where one wanted to go. The words were appropriate, the grammar correct, the intention clear. Yet something was missing. It's as if we were asking for directions in a foreign city, and the passerby answered exactly the question we asked—but at the same time, we sensed that they knew, and we knew: that wasn't the real question.
A bad question doesn’t lead to a bad answer. It leads in the wrong direction.
Prompting (the art and science of communicating with large language models) is exactly about this. Not about correct sentence structure, not about choosing the right keywords—but about the gap that yawns between the question and the underlying intent. And about how to narrow that gap.
Researchers at Stanford and Meta recently published a study on self-reflective prompting—the method that incorporates thinking about thinking into dialogue with machines. A team from Nanjing University conducted a comprehensive analysis of various strategies for step-by-step thinking. Joint research by Princeton and Google DeepMind examined the “Tree of Thoughts,” where branching paths appear instead of a straight line. If you just read the titles, you might easily think these are technical papers for technical people. But at their core, they’re about something entirely different. About how we think. How we understand something. And what is the difference between knowing the answer and understanding the problem?
The structure of the question is the structure of thought. Those who ask one-step questions see only one step.
II. The Gap Between Conclusion and Understanding
Chain-of-Thought prompting (a step-by-step thinking technique, Wei et al., 2022) has revolutionized the use of large language models. The idea is simple: instead of expecting the answer in a single leap, we ask the model to show the intermediate steps. It’s as if a math teacher were writing not just the final result on the board, but the entire derivation.
It works. It works surprisingly well. For complex math problems, logic puzzles, and symbolic reasoning, performance improves dramatically. But there is a limit where the method hits a wall.
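What this looks like in practice is easiest to show. Below is a minimal sketch of the two prompt styles, using the canonical worked examples from Wei et al. (2022); the strings are illustrative, not a fixed recipe:

```python
# Direct prompting: the model must leap to the final number in one step.
direct_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A:"
)

# Chain-of-Thought prompting: one worked example demonstrates the intermediate
# steps, so the model imitates the derivation instead of guessing the result.
cot_prompt = (
    "Q: A cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. "
    "How many apples do they have?\n"
    "A: The cafeteria started with 23 apples. They used 20, so 23 - 20 = 3. "
    "They bought 6 more, so 3 + 6 = 9. The answer is 9.\n\n"
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A:"
)
```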
Asking better questions doesn’t yield smarter answers. It uncovers deeper problems.
Daniel Kahneman distinguishes between two modes of thinking in his book Thinking, Fast and Slow. The first—System 1—is fast, intuitive, and effortless. The second—System 2—is slow, deliberate, and requires effort. When someone asks what two plus two is, System 1 answers. When they ask what thirteen times twenty-seven is, System 2 is needed. In their default state, large language models resemble System 1: they respond immediately, without thinking. Step-by-step prompting forces them not to rely on intuition, but to think things through.
Behind every question lies another question that the asker has not yet asked themselves.
Wang and Zhao’s study examines precisely this boundary. The conclusion, they write, logically connects steps. Understanding, however, captures depth of meaning. The two are not the same. One can be excellent at proceeding step by step while missing the point. This pattern is familiar. I’ve seen it in a conference room when someone precisely laid out their argument, but I could see on the other party’s face that that wasn’t the point. I’ve seen it in myself as well, when I ran through every aspect of a problem and realized in the end that I had to rethink the whole thing because I was looking in the wrong place.
Self-reflective prompting offers a bridge across this gap. It guides the model through five steps, and these steps will be familiar to anyone who has ever worked in a team: interpretation, preliminary judgment, critical review, decision and justification, and finally self-assessment. It doesn't just probe the "how," but also the "why." It's not enough to calculate the answer. You have to understand why that is the answer.
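The five steps translate almost verbatim into a prompt skeleton. A sketch follows; the numbering mirrors the steps above, but the exact wording is my paraphrase, not the template from the paper:

```python
SELF_REFLECTIVE_TEMPLATE = """Task: {problem}

Work through the following stages, labeling each one explicitly:
1. Interpretation: restate the problem in your own words.
2. Preliminary judgment: give your first-pass answer and the intuition behind it.
3. Critical review: challenge your own reasoning; list what could be wrong with it.
4. Decision and justification: state your final answer and why it survived the review.
5. Self-assessment: rate your confidence from 0 to 100 and say what that rating rests on.
"""

prompt = SELF_REFLECTIVE_TEMPLATE.format(problem="Is 1,001 a prime number?")
```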
III. The Trap of First Impressions
The first answer is often accurate. We just don’t dare to accept it.
According to research, thirty-one percent of Chain-of-Thought errors occur because the model overrides its initial, correct intuition with a more complex—but erroneous—train of thought. At first glance, this seems contradictory. How can thinking go wrong? How can thinking something through more thoroughly lead to a worse result?
Thinking and talking about thinking: they are not the same thing.
But anyone who has worked on themselves or with others is familiar with this pattern. The first intuition is often accurate; we just don’t dare to accept it. It seems too simple, too quick, too unsubstantiated. We start to overthink, and that overthinking takes us somewhere where we can no longer hear the original voice.
The other side is “overthinking.” Sixty-eight percent of mistakes fall into this category. With simple tasks, the model overcomplicates the problem, introduces irrelevant factors, and drifts away from the correct solution. It’s as if, when answering a simple question, someone starts analyzing the assumptions behind the question, the cultural background, and possible alternative interpretations—while the person asking simply wanted to know what time it is.
Thinking more isn’t always better thinking. Sometimes, stopping is exactly what helps.
Together, these two types of errors reveal something about how understanding works. It is not a linear process. It is not always true that thinking more leads to better results. Sometimes it is precisely pausing and returning to the simple that helps. And there are times when we need to look beyond the first answer, because the quick reaction has remained on the surface. The real question isn’t whether we should think more or less. The real question is when to do which. And this decision isn’t made by an algorithm—but by attention.
IV. The Mystery of Coherence — When Form Matters More Than Truth
A research team at Nanjing University made a surprising discovery while examining the anatomy of step-by-step thinking. The thought chain—the reasoning through which the model guides the solution—consists of two components: “bridging objects” and “linguistic templates.”
Bridging objects are critical elements of the logical process. They are the points through which thinking must pass. Linguistic templates, on the other hand, are the connective tissue that links these points, providing context and background knowledge.
The surprising part: even a completely flawed line of reasoning can improve performance. Provided it is coherent.
Coherence is more important than accuracy. Structure precedes content.
It’s worth dwelling on this. It’s not the content that matters most, but the internal harmony. If the steps follow logically from one another, if the argument is consistent within itself, then the model performs better, even if the individual statements are false. It is as if the form of thinking were more important than its content. As if the connection itself created the possibility of understanding.
This suggests that step-by-step thinking does not “teach” the model to draw conclusions. Rather, it brings out what it already knows. The coherent structure is the key that unlocks existing knowledge. It does not provide new information, but access.
Form validates content. That is why a well-crafted lie is dangerous.
If we apply this to the human world: it is not enough to tell someone what the correct answer is. It is not enough to list the facts. Understanding requires a framework into which the facts fit. And this framework is not built from the facts themselves, but from the relationships between the facts. Anyone trained in systems thinking—whether in software design or organizational development—knows this: the relationship is always more important than the element. The element is replaceable. It is the relationship that turns a pile into an architecture.
V. Why does the instruction “Let’s think step by step” work?
There is a simple instruction that works with surprising effectiveness: “Let’s think step by step.” That’s it. No examples, no detailed instructions, just these five words. And the model’s performance improves significantly.
Kojima et al. (2022) called this zero-shot Chain-of-Thought prompting—and its results surprised the research community. There’s no need to craft complicated prompts or provide examples. All it takes is a single sentence, and the model generates the intermediate steps on its own.
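In code, the whole technique is a suffix. A sketch, using the juggler problem from Kojima et al. (2022):

```python
question = (
    "A juggler can juggle 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?"
)

# Zero-shot Chain-of-Thought: no examples, just the trigger sentence.
zero_shot_cot = question + "\n\nA: Let's think step by step."
```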
But even this simple instruction has its weaknesses. Researchers at Singapore Management University analyzed a hundred incorrect answers and identified three patterns. The most common is misunderstanding: the model fails to grasp what the task is about—this accounts for twenty-seven percent of errors. The second is a missing step: the model skips part of the thought process, especially when there are many steps—this accounts for twelve percent. The third is a calculation error: a simple arithmetic mistake—this accounts for seven percent.
The problem of missing steps is the most serious, because the model doesn’t know what it doesn’t know. The answer to this is to separate planning from execution: “First, let’s understand the problem and make a plan. Then let’s execute the plan step by step.” The plan is like a map: it shows where we need to go before we set out. This “plan-and-solve” approach consistently yields better results than simply “thinking step by step.”
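The plan-and-solve variant only swaps the trigger sentence. A sketch using the same question and the phrasing quoted above:

```python
question = (
    "A juggler can juggle 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?"
)

# Plan-and-solve: separate planning from execution in the instruction itself.
plan_and_solve = (
    question
    + "\n\nA: First, let's understand the problem and make a plan. "
    "Then let's execute the plan step by step."
)
```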
Why does either of them work? Researchers can’t explain it exactly. There are a few hypotheses. One suggests that the instruction activates patterns the model encountered in similar situations during training. Another suggests it slows down the inference process, thereby reducing the likelihood of “fast and wrong” answers. A third suggests that it simply generates more text units (tokens), and more tokens provide more opportunities for correction.
But none of these is entirely satisfactory.
Perhaps because the question itself is poorly framed. The point isn’t what happens inside the model, but what happens in the communication. The instruction changes the rules of the game. We’re not expecting a single answer, but a process. We’re not expecting a result, but a path. And this shift in expectations changes what we get.
It’s as if we were telling someone: “Don’t answer right away; think out loud.” Most people behave differently when they know that their thought process is being observed, not just the final result. They become more cautious, but not in the paralyzing sense of caution. Rather, they become more attentive. They pay closer attention to whether what they say truly follows from what came before.
The instruction is not just content. The instruction itself is form. And the form works.
VI. The Tree of Thoughts — A Branch Against Dead Ends
Step-by-step thinking has a hidden limitation: it moves forward along a single path. If the path leads to a dead end, there is no turning back.
Yao et al. (2023), researchers at Princeton and Google DeepMind, offer a solution to this problem. In the "Tree of Thoughts" (ToT) approach, the model does not generate a single chain of thoughts, but multiple ones in parallel. At every branching point, it evaluates which branch seems more promising. If one branch leads to a dead end, it backtracks and heads in another direction.
This is how a chess player thinks: they don't just see the next move, but look several moves ahead and weigh the possibilities. The results are surprising. In a mathematical puzzle—the Game of 24, where four given numbers must be combined to reach twenty-four—traditional step-by-step thinking yielded a four percent success rate. The Tree of Thoughts yielded seventy-four percent. Nearly a twentyfold improvement. Not a gradual gain, but a leap.
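A compressed sketch of the search loop follows. The `propose` and `score` helpers are hypothetical placeholders for model calls, and the beam-search pruning below is only one of the strategies in Yao et al. (2023), which also supports depth-first search with explicit backtracking:

```python
import heapq

def propose(state: str, k: int = 3) -> list[str]:
    """Ask the model for k candidate next thoughts (hypothetical placeholder)."""
    raise NotImplementedError

def score(state: str) -> float:
    """Ask the model how promising a partial solution looks (hypothetical placeholder)."""
    raise NotImplementedError

def tree_of_thoughts(problem: str, beam_width: int = 3, depth: int = 4) -> str:
    # Keep only the most promising partial chains at each level; weak branches
    # are pruned, which plays the role of backtracking in aggregate.
    frontier = [problem]
    for _ in range(depth):
        candidates = [s + "\n" + t for s in frontier for t in propose(s)]
        frontier = heapq.nlargest(beam_width, candidates, key=score)
    return frontier[0]
```

Every level multiplies the number of model calls by the branching factor, which is exactly the cost discussed next.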
But it comes at a cost. Traversing the tree takes time and computational resources. Every branch, every backtrack, every evaluation consumes resources. There is no free lunch between speed and accuracy—economists and mathematicians call this TANSTAAFL (There Ain’t No Such Thing As A Free Lunch), and it applies to the physics of thinking as well.
The same is true of human thinking. A quick, instinctive response is often enough. But there are situations where it’s worth pausing, running through the possibilities, taking a step back, and rethinking. The question is: which approach is appropriate when? Neither a machine nor a human can always know this in advance. Experience—and perhaps wisdom—lies precisely in the quality of this decision.
VII. Complexity Is Not the Enemy
One recurring lesson from research: harder examples are better. If we want the model to perform well on a task, the examples included in the prompt (few-shot examples, the handful of worked demonstrations given before the actual question) should be complex rather than simple.
This contradicts what one might instinctively think. Wouldn’t it be more logical to start with easy examples and gradually increase the difficulty? Wouldn’t it be more practical to present problems that are similar in difficulty to the task at hand?
The answer: not necessarily.
A complex example generates a longer chain of thought. The longer chain gives the model more “space” to work in. More intermediate steps, more checkpoints, more opportunities for correction. After a simple example, the model tends to “cut corners”—jumping quickly to the final result. The complex example teaches that this is not acceptable.
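One way to operationalize this, in the spirit of complexity-based exemplar selection, is to rank candidate few-shot examples by the length of their reasoning chains. A rough sketch, with line count as a crude proxy for the number of steps:

```python
def reasoning_steps(worked_example: str) -> int:
    # Crude proxy: each non-empty line of the worked solution counts as one step.
    return sum(1 for line in worked_example.splitlines() if line.strip())

def pick_exemplars(pool: list[str], k: int = 4) -> list[str]:
    # Prefer the most complex worked examples: longer chains leave the model
    # more room for intermediate checks and discourage corner-cutting.
    return sorted(pool, key=reasoning_steps, reverse=True)[:k]
```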
Complexity slows things down. Slowing down demands attention. Attention creates depth.
There is something here that goes beyond artificial intelligence. When we train someone for a job, we often start with simple tasks so as not to scare them off. But perhaps by doing so, we are actually teaching superficiality. Perhaps the easy examples shown at the beginning convey the message: “You don’t have to think deeply here.” And this message persists even when more difficult tasks come along.
Google researchers discovered something surprising: it matters who writes the examples. Three different people solved the same math problem step by step, and the model’s performance differed depending on whose solution it saw. Not because one was wrong—all three were correct. But the style, the order of the steps, and the level of detail in the intermediate explanations differed. The model is sensitive to how we present our thinking, not just to what we present.
This is also familiar from teaching, leadership, and mentoring. When I give an example to a group, I face the same questions. It should be difficult enough to be a challenge, but not so much that it discourages them. It should be relevant, but not so much that it’s “all about us”—because then the distance needed for self-reflection is lost. Choosing good examples is part of the art of teaching. In dialogue with the machine, this hasn’t changed. It’s just become visible.
Whoever teaches always makes a choice: what to show and what to leave unsaid. The choice itself is the teaching.
VIII. What does certainty reveal about us?
The final step in self-reflective prompting is self-assessment. The model states how certain it is of its answer—with a number, a percentage, or in words.
Certainty is not a sign of knowledge. Often, it is a sign of ignorance.
The researchers compared the level of confidence expressed with the actual accuracy. The results are nuanced—and unsettling.
In 55 percent of cases, the model was correctly confident: it indicated high certainty, and the answer was indeed correct. In 6 percent of cases, it was correctly uncertain: it indicated low certainty, and it was indeed wrong. But in 32 percent of cases, it was wrongly confident: it gave a high certainty rating while providing the wrong answer.
That 32 percent is the danger zone.
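A quick back-of-the-envelope calculation shows why. Among all the answers the model delivers with high confidence, barely two-thirds are actually right:

```python
confidently_right = 0.55  # high confidence, correct answer
confidently_wrong = 0.32  # high confidence, wrong answer: the danger zone

# Of everything stated with high confidence, what fraction is actually true?
precision_of_confidence = confidently_right / (confidently_right + confidently_wrong)
print(f"{precision_of_confidence:.0%}")  # ~63%: far less than the tone implies
```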
Anyone who has ever worked as a consultant is familiar with this pattern. The client states the problem with complete conviction. They are precise, decisive, and clear. And that is precisely the problem. Internal certainty and actual accuracy do not move in tandem. Confidence and accuracy correlate far more weakly than we would like.
The self-reflective approach attempts to provide an answer to this as well. The certainty-assessment step does not solve the problem, but at least it makes it visible. If the model gives a number, if it openly assesses its own reliability, then there is something to work on. There is something to fine-tune. There is a place to start the dialogue about what lies behind the certainty.
It is the same in human work. The goal is not for everyone to always know exactly how certain they are about something. The goal is to have a practice, a question that regularly brings us back to: “How certain am I?” and “Based on what?”
IX. The Problem of Faithfulness — When the Explanation Does Not Tell the Truth
One of the most important chapters in the review studies is about “faithfulness.” Faithfulness here means whether the reasoning generated by the model truly reflects the decision-making process. Whether the chain of thought we see is really how the model “thought”—rather than a retrospective explanation.
This is by no means self-evident.
Most current methods work by having the model generate both the thought process and the final answer simultaneously. Nothing guarantees that the two are actually connected. It may be that the model already “knows” the answer, and the thought process is merely a fabricated explanation added after the fact. It may be plausible, coherent, and convincing—but it is not true in the sense that it did not lead to the answer.
Fidelity is not a given, but a practice. It is continuous, laborious, and never complete.
If this sounds familiar, it's no coincidence. The human mind works exactly this way. The decision happens—often below the level of consciousness—and then the explanation follows. The story we use to convince ourselves and others that a rational process took place, when in reality something else happened. This has been documented by numerous psychological studies: from research on split-brain patients (Michael Gazzaniga's classic experiments) to choice blindness (the work of Lars Hall and Petter Johansson). People rationalize in hindsight; they do not think in real time.
Artificial intelligence has not solved this problem. Rather, it has held up a mirror to us.
The explanation we give for our decision is rarely the story of the decision itself. It is mostly a story told in hindsight.
This isn’t merely a technical issue. It’s also an ethical issue. If a system provides a justification, but that justification is just a facade, then the system is lying. Not intentionally, not maliciously, but it is lying. And anyone who accepts that facade as the truth is placing their trust on the wrong foundation. This applies to machine justifications—and it applies just as much to the story we tell ourselves in the mirror each morning about why we made the decisions we did.
X. The Shadow of Hallucinations — When the Form Is Perfect, but the Content Is Empty
There is a phenomenon that researchers refer to as a “hallucination.” The model confidently asserts something that is not true. It does not lie in the human sense of the word, because it has no intention to do so. It simply states the untrue in a fluent, grammatically correct, and convincing tone.
This is particularly dangerous because the form is perfect—only the content is flawed. It is as if a well-dressed, confident speaker were at the podium, pronouncing every sentence with authority, but the data is incorrect. The audience is inclined to believe him because his appearance and tone suggest that everything is in order.
A comprehensive study from the Harbin Institute of Technology distinguishes between two types of hallucinations. One is internal contradiction: the model provides an answer that is inconsistent with itself. The other is factual inaccuracy: the model asserts something that is simply not true in the real world. The two may overlap, but their roots are different.
It doesn’t matter what we say. What matters is how what we say fits together.
Step-by-step thinking does not solve the problem of hallucinations. In fact, it sometimes exacerbates it. The model arrives at a false statement through a step-by-step, logically constructed line of reasoning, and it is precisely the logic of that reasoning that makes the lie convincing. Form validates content—and that is precisely why the coherence trap is so dangerous. Referring back to what we discussed in Part IV: coherence improves performance. But coherence also improves false performance. Structure is a weapon that fires in both directions.
This is a warning. It is not enough that the answer sounds good. It is not enough that the line of reasoning is followable. The ultimate standard is always whether what we say is true. And the model cannot apply this standard to itself. We must apply it.
XI. Where is the threshold at which thinking “unfolds”?
There is a limit below which step-by-step thinking does not work: roughly ten billion (10B) parameters. Below this, models do not improve from chain-of-thought prompts. They often get worse. They talk nonsense: they generate text that is fluent, grammatically correct, and sounds logical, but is completely meaningless.
What we see as emergence in machines has long been present in humans. We just hadn’t seen it so clearly.
Researchers call this an “emergent ability.” It doesn’t appear gradually as the model grows. It comes suddenly, once a certain threshold is crossed. Like self-awareness in childhood, or the ability to walk: not learning—rather, emergence. One moment it isn’t there, the next it is.
Wei et al. (2022) systematically documented this phenomenon. GPT-3 (175 billion parameters) and PaLM (540 billion parameters) improved dramatically with Chain-of-Thought prompts. Smaller models, however, did not—or even performed worse. The boundary is not sharp, but it exists. Somewhere around the ten-billion-parameter mark, something happens that researchers do not yet fully understand.
But even sub-threshold models can be trained. If not through prompting, then through fine-tuning. If not through fine-tuning, then by transferring knowledge from larger models—this is called distillation. The smaller ones learn from the larger ones, and what emerged naturally in the larger models can be taught to the smaller ones.
There is something reassuring and something unsettling about this. Reassuring, because the capability can be disseminated, transferred, and made widely accessible. Unsettling, because the ten-billion-parameter limit seems arbitrary. Why exactly there? What happens at that point? No one knows. And what cannot be understood cannot be controlled.
XII. Layers and Depth
If we step back and look at the studies as a whole, a pattern emerges. Prompting—every successful method for improving communication with the model—does the same thing: it adds layers.
Step-by-step thinking adds a layer on top of “what is the answer”: “how do I get to the answer.” The self-reflective approach adds a layer on top of that: “why do I think this is the right path.” Confidence assessment adds a layer on top of all of these: “how much do I trust what I think.” And the question of authenticity adds yet another layer: “Do I really think this way, or am I just talking about how I think?”
This isn’t overcomplication. It’s depth.
The difference is the same as that between looking at a map and walking the terrain. A map is a single layer: it shows what is where. Walking the terrain, we get to know the soil, the vegetation, the waterways, the smells, the sounds, the light. We see multiple layers at once, and the connections between them as well.
The history of prompting is, in fact, the history of the archaeology of thought. We dig down layer by layer—and beneath every layer lies another. The question is not how many layers there are. The question is whether we dig deep enough to understand what we are looking for.
XIII. Ensemble as Collective Wisdom
There is a method that draws conclusions from running the model multiple times: ensemble (self-consistency). We ask the same question multiple times, with different settings or different examples, and form some kind of collective result from the answers. We take a vote. We look for consensus. Or we examine the extremes.
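A minimal sketch of the voting variant, assuming a hypothetical `sample_answer` helper that runs one reasoning chain at a nonzero temperature and extracts only its final answer:

```python
from collections import Counter

def sample_answer(question: str) -> str:
    """One independently sampled chain, reduced to its final answer (hypothetical placeholder)."""
    raise NotImplementedError

def self_consistency(question: str, n: int = 20) -> str:
    # Sample many independent chains, then keep the most common final answer.
    # Chains that stumble tend to stumble differently, so their errors scatter,
    # while correct chains converge on the same result.
    votes = Counter(sample_answer(question) for _ in range(n))
    return votes.most_common(1)[0][0]
```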
A good summary is not an abbreviation. A good summary knows what is the core and what is the shell, what is the figure and what is the background.
This idea is not new. It has long been known in statistics, machine learning, and decision theory: the sum of many weak opinions is often stronger than a single strong opinion. The wisdom of crowds (a concept popularized by James Surowiecki) works, under certain conditions.
Google researchers measured an improvement of nearly eighteen percent in mathematical tasks compared to traditional step-by-step thinking. Not because the model became smarter, but because if multiple paths can lead to the answer for a complex question, and these paths arrive at the same destination, then that answer is likely correct. If they lead to different places, then we have reason to be skeptical.
But there’s a catch here, too. If the problem is simple to begin with and the model gives the right answer at first, averaging only introduces confusion. If the problem is very difficult and the model consistently gives the wrong answer, averaging doesn’t help—it only reinforces the error. Aggregation works where there is uncertainty, but the uncertainty is not total. Where errors are random, not systematic. Where different paths lead to different errors, and the errors cancel each other out.
The same logic applies to human decision-making. The wisdom of the crowd doesn’t always work. But when it does, it’s stronger than any individual decision. The question is: how can we recognize which situation we’re in?
XIV. The Art of Breaking Down Problems into Subproblems
Sometimes the task is too big. When the model—any model, with any prompt—cannot solve it all at once. In such cases, decomposition helps: breaking the big problem down into smaller ones, solving the smaller ones one by one, and then putting them back together.
Decomposition is not a trick. Decomposition is an act of understanding.
There is a pedagogical principle that comes from early childhood education: from the smallest amount of help to the greatest—this is called scaffolding. First, let the child try on their own. If they get stuck, give a little help. If they’re still stuck, give more. In prompting, this works the other way around: first we solve the simplest subproblem, then the next one, and so on. Every solved part helps with the next one. The answer builds on the previous answer.
This way, the model may be able to solve tasks that are harder than anything it has seen in the examples. This is no small feat. Most machine learning systems can only handle what they’ve already seen. Here, something different is happening: mastering simple steps enables the solution of complex ones. Google researchers tested this on a test called SCAN, where traditional methods achieved 16% accuracy—while the incremental approach achieved 99%.
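A sketch of this incremental chaining, with `llm` standing in for any completion call (a hypothetical placeholder) and the decomposition passed in by hand; in the full method, the model is asked to produce the decomposition as well:

```python
def llm(prompt: str) -> str:
    """Any text-completion call (hypothetical placeholder)."""
    raise NotImplementedError

def solve_incrementally(problem: str, subproblems: list[str]) -> str:
    # Solve the subproblems in order, simplest first. Every answer is appended
    # to the context of the next question, so the final, hardest step stands
    # on everything already worked out.
    context = f"Problem: {problem}\n"
    answer = ""
    for sub in subproblems:
        answer = llm(f"{context}\nSubproblem: {sub}\nAnswer:")
        context += f"\nSubproblem: {sub}\nAnswer: {answer}"
    return answer
```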
The key: decomposition doesn’t happen on its own. It’s not self-evident where the boundaries lie between subproblems. It’s not obvious what order to proceed in. And it’s not simple how the whole thing comes together in the end. Good decomposition is a creative process in itself. You have to understand the structure of the problem to break it down into good pieces. If we break it down poorly, solving the parts will not lead to solving the whole.
In programming, we’ve known this for a long time. Much of good software design is about how we break the system down into units. Where we draw the boundaries. What belongs together, and what doesn’t. A bad breakdown is worse than not breaking it down at all. A good breakdown makes possible what would otherwise be impossible.
XV. What does all this mean for the person asking the questions?
The art of prompting is ultimately the art of asking questions. And the art of asking questions is ultimately the practice of self-knowledge.
When I ask questions, I reveal how I think. I reveal what I consider important, what I take for granted, and where my blind spots lie. The structure of my questions is the structure of my thinking. If I ask a one-step question, I think in one-step terms. If I break the question down into steps, I break the problem down into steps.
A good prompt isn’t a trick. It isn’t a keyword, a formula, or a secret recipe. A good prompt is a reflection of how I want to think. And if the prompt improves, it improves along with me, not in my place.
Four lessons emerge from the entire field of research:
First: Thinking is not a single leap. It is not a flash of insight, nor an epiphany—but a series of steps that must be taken consciously, deliberately, and consistently. Understanding is sensitive to structure. Facts do not come together on their own. They must be put together. And the quality of that assembly matters.
It’s not a problem if we’re wrong. The problem is when we don’t know when we’re wrong.
Second: complexity is not the enemy. Slowing down is not a loss. When the easy answer is tempting, it’s worth pausing to check if we’ve missed something. Speed is often just a cover for staying on the surface. Kahneman’s System 1 is fast and efficient—but without System 2, the price of speed is accuracy.
Third: certainty is suspicious. Not because we’re always wrong, but because we rarely know when we’re not wrong. The link between confidence and accuracy is weaker than we’d like. The word “I know” often masks the truth of “I assume.”
And finally: fidelity is not a given, but a practice. That what we say truly reflects what we think. That the story we use to explain our decisions is truly the story of the decision—not an after-the-fact justification. This is not a natural state. It is work. Continuous, laborious, and never finished.
XVI. The Machine and the Mirror
Large language models hold up a peculiar mirror to us. When we want to improve their performance, we unwittingly learn about our own thinking. When we try to understand why they answer certain questions better—we are, in fact, learning how we understand anything.
The machine is a mirror. But the one who stands before it is the one looking into the mirror.
Thinking about thinking—metacognition—is not an invention of artificial intelligence. It has been around for millennia. From Socrates to Zen Buddhism, from Descartes to cognitive psychology. The difference: now it can be put into practice. It can be broken down into steps, tested, and measured. We find out what works and what doesn’t. We find out where the error patterns are and how they can be reduced.
This doesn’t mean the machine thinks. It doesn’t mean the machine understands. It means that, through the machine’s responses, it becomes clearer what it means to think and to understand. It’s as if we had to explain something in a foreign language, and in the process, we realized just how much we took for granted in our native tongue. Prompting is exactly this: translating thought into a foreign language—where the “foreign language” is that ruthlessly unambiguous form that tolerates no looseness, no tacit assumptions, no comfortably vague phrasing.
Prompting isn’t important because it makes the machine’s responses better. It’s important because the path toward better prompts also makes the questioner better. And a better questioner gets better answers—not just from the machine, but from everyone.
There’s a downside to this knowledge, too. What we can use to ask better questions can also be used to ask worse ones. Prompting is not a neutral tool. Researchers have shown that with well-chosen prompts, a model’s security constraints can be circumvented—this is called jailbreaking. Anyone who understands how the system works may be able to make it do things its creators did not intend.
So the question is not just how to ask better questions. It is also what we use our knowledge for. The machine is a mirror, but it is the person standing in front of it who looks into the mirror. And what they see depends on them.
Key Takeaways
- The structure of the question is the structure of your thinking: the prompt isn’t meant for the machine—it’s a reflection of your thinking. If you improve the prompt, you improve your thinking.
- Coherence precedes truth: a coherent structure evokes existing knowledge—but it can equally validate false knowledge. Form is a double-edged sword.
- Certainty and accuracy do not go hand in hand: in thirty-two percent of cases, the model is confidently wrong. So are humans. “I know” is the most dangerous phrase in the dictionary.
- Thinking is not a single layer, but many: step → justification → self-reflection → certainty assessment → fidelity check. Each layer adds depth, not complexity.
- The problem of fidelity applies to every thinking system: what we say about a decision is rarely the story of the decision. This is true for machines, and it is true for us as well.
- Decomposition is not a trick, but understanding: if I know how to break down the problem, then on some level I already understand what I am doing.
- Prompting is the training ground for metacognition: thinking about thinking—now, for the first time, it can be operationalized, measured, and improved.
FAQ
What is Chain-of-Thought prompting, and why is it important? Chain-of-Thought prompting means asking the language model to show the intermediate steps of thinking—not just the final result. Since Wei et al.’s 2022 research, we know that this dramatically improves performance on complex tasks. But its importance extends beyond the machine: the technique teaches us that the quality of thinking depends on the quality of the intermediate steps—not on the speed of the final result.
Why is prompting said to be a practice in self-awareness? Because the structure of your questions reflects the structure of your thinking. If you ask one-step questions, you think in one-step increments. If you don’t break things down into steps, you don’t break down the problem. Improving your prompt isn’t technical optimization—it’s recognizing where your blind spots in thinking lie. What you ask of the machine is about how you organize your own mind.
What is the difference between Chain-of-Thought and Tree of Thoughts? Chain-of-Thought moves forward along a single path—like someone walking down a trail, hoping they’re heading in the right direction. The Tree of Thoughts (Yao et al., 2023) sets off down multiple paths at once, evaluates the branches, and backtracks if it reaches a dead end—like a chess player thinking several moves ahead. The result can be dramatically better (4% vs. 74% in a benchmark test), but the cost is higher computational overhead and slower response times.
Related Thoughts
- The Metacognitive Revolution — If prompting is the mirror of thought, metacognition is the wall behind the mirror. It is the only area where humans remain unbeatable.
- The Decision Tsunami — The neurobiological foundations of Kahneman’s System 1/System 2 dichotomy — and why the prefrontal cortex goes offline precisely when it’s needed most.
- Artificial Mentors — If the prompt is an X-ray of thought, the chatbot mentor is the gym for thought. It doesn’t answer — it asks questions. And in doing so, it sometimes provokes more than any advisor.
Key Takeaways — CORPUS Perspective
- Prompting is not merely a technical trick, but a reflection of the structure of our own thinking. As CORPUS emphasizes, prompt engineering is partly an art, and it depends on the context of the task and the nuances of the model. The question we ask reveals more about our own mental architecture than the machine’s response does.
- Coherence and the clarity of the train of thought are often more important than initial accuracy. The effectiveness of Chain-of-Thought prompting shows that a coherent but potentially flawed path can lead to better results because it activates the model’s deeper knowledge—similar to how a well-structured example (the Provide Examples principle) can aid understanding.
- Overconfidence and overriding first impressions are dangerous pitfalls for both models and humans. Research shows that models often abandon their correct intuitions in favor of a more complex but flawed line of reasoning, which highlights the tension between Kahneman’s System 1 and System 2 thinking.
- The steps of self-reflective prompting—interpretation, critical review, self-assessment—essentially represent the formalization of conscious, slow thinking (System 2). This focuses not only on the “how” but also on the “why,” which is the key to true understanding.
- The essence of effective prompting lies in accurately conveying context and intent, not in perfect sentence structure. As noted in CORPUS, there is no universal formula; success depends on taking into account the task’s context, the goal, and the model’s specific characteristics, which can be directly paralleled with the challenges of human communication.
Zoltán Varga - LinkedIn
Neural • Knowledge Systems Architect | Enterprise RAG architect
PKM • AI Ecosystems | Neural Awareness • Consciousness & Leadership
Where the prompt meets the mirror — that’s where thinking begins.
Strategic Synthesis
- Translate the thesis into one operating rule your team can apply immediately.
- Monitor one outcome metric and one quality metric in parallel.
- Run a short feedback cycle: measure, refine, and re-prioritize based on evidence.
Next step
If you want your brand to be represented with context quality and citation strength in AI systems, start with a practical baseline and a priority sequence.