Metacognitive abilities are skills that allow you to monitor and improve how you think and learn. These include things like recognizing your strengths and weaknesses, being aware of the limits of your knowledge (epistemic humility), asking yourself the right questions (self-questioning), spotting your own mistakes (error detection), and adjusting your strategies when something isn’t working (self-correction).
Developing these skills can make a huge difference in life, yet they hardly get the attention they deserve. They’re the unsung heroes of intelligence—running quietly in the background, making everything work better. One of them, however, has fascinated me ever since I first made its acquaintance.
1. Calibration
For those not in the know, calibration is basically a way to measure how well your confidence matches reality. In other words, a person is well-calibrated if, when they say they’re 70% confident about something, they’re right 70% of the time.
This isn’t just some abstract concept—you can actually test how good you are at it. Here’s a free test you can try. Remember, calibration isn’t about always being right; it’s about matching your confidence levels to reality.
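To make that concrete, here's a minimal sketch (in Python, with made-up predictions) of what a calibration check looks like: group your predictions by stated confidence and compare each group's hit rate to the confidence that was claimed.

```python
# Minimal sketch (hypothetical data): bucket predictions by stated confidence
# and compare each bucket's hit rate to the confidence that was claimed.
from collections import defaultdict

predictions = [
    # (stated confidence, did the claim turn out to be true?)
    (0.7, True), (0.7, True), (0.7, False),
    (0.9, True), (0.9, True), (0.9, True), (0.9, False),
    (0.5, True), (0.5, False),
]

buckets = defaultdict(list)
for confidence, correct in predictions:
    buckets[confidence].append(correct)

for confidence in sorted(buckets):
    outcomes = buckets[confidence]
    hit_rate = sum(outcomes) / len(outcomes)
    print(f"stated {confidence:.0%} -> right {hit_rate:.0%} of the time "
          f"({len(outcomes)} predictions)")
```

A well-calibrated person's buckets line up: the 70% bucket lands near 70%, the 90% bucket near 90%, and so on.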
As far as I know, the concept of calibration was popularized by notorious nerd Julia Galef in her book The Scout Mindset: Why Some People See Things Clearly and Others Don’t.
As you can imagine, having your confidence align with reality is an unbelievably useful skill; I can hardly overstate it. It lets you plug trustworthy probabilities into expected-value calculations and make better decisions, and since life is ultimately a series of decisions, this skill plays a huge role in determining how well things turn out for you.
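Here's a toy example, with invented numbers, of how a calibrated probability feeds straight into an expected-value calculation for a yes/no decision:

```python
# Toy numbers, invented for illustration: a calibrated 70% estimate turns
# directly into an expected-value calculation for a yes/no decision.
p_success = 0.70   # your (hopefully calibrated) confidence
gain = 1000        # payoff if you turn out to be right
loss = 1500        # cost if you turn out to be wrong

expected_value = p_success * gain - (1 - p_success) * loss
print(f"Expected value: {expected_value:+.0f}")  # +250, so worth taking

# If that 70% were really overconfidence and the true rate were 50%,
# the expected value would be 0.5 * 1000 - 0.5 * 1500 = -250.
```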
One extraordinary aspect of calibration tests is that they're content-agnostic. Because calibration applies universally to any probabilistic domain, the validity of a calibration test doesn't depend on the nature of its questions, whether they're about sports, politics, or flatulists. All that matters is whether your confidence judgments correspond to actual outcomes.
This content-agnostic nature makes calibration tests easy to generate dynamically using algorithms or randomization, and it also makes them harder to game or rig in favor of any particular individual. It’s the kind of quality that could make them incredibly useful for evaluating decision-makers in industry and society. But, of course, we never use them.
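To illustrate (this is just a sketch, not how any existing test is actually built), you could auto-generate calibration questions from random numeric comparisons. The content is meaningless, but the scoring still works, because all that's measured is whether your confidence tracks your accuracy.

```python
# Sketch only: randomly generated, content-free true/false questions.
# The questions are meaningless, but they still work for calibration,
# because all that's scored is whether confidence tracks accuracy.
import random

def random_question(rng):
    """Return (prompt, correct_answer) for a random numeric comparison."""
    a, b = rng.randint(100, 999), rng.randint(100, 999)
    c, d = rng.randint(100, 999), rng.randint(100, 999)
    prompt = f"True or false: {a} x {b} > {c} x {d}"
    return prompt, (a * b) > (c * d)

rng = random.Random()
for _ in range(3):
    prompt, answer = random_question(rng)
    print(prompt, "| correct answer:", answer)
```

Because the question bank can be regenerated endlessly, there's nothing to memorize and nothing to leak.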
My wet dream would be a public database cataloging predictions made by political and financial pundits, complete with their stated confidence levels and historical track records. I believe it would be profoundly illuminating for humanity, especially in our times.
2. Are We Sleeping on Calibration?
Given how important calibration is, and how easy it is to test, I was a little surprised to find only a handful of recent studies on the topic, and even those were hardly cited. Perhaps my Google-fu is failing me. An interesting question is whether calibration skills can be improved. There's some evidence suggesting that training on tests with performance feedback (like the one linked above) can help. However, a recent study reported a negative result in this area, so the evidence is mixed.
Alongside the question of how to train calibration, it seems to me that there are plenty of other interesting open questions:
What other traits or measures does calibration correlate with? Does a higher IQ mean better calibration, or is it unrelated?
Preliminary evidence appears to point towards the two being unrelated, which to me seems quite extraordinary.
Are people more miscalibrated on political questions?1
I’m thinking of questions like: “Is the percentage of trans people in America right now over or under 5%?”
How does calibration vary across different political affiliations?
Can political miscalibration be corrected through feedback faster or slower than non-political miscalibration?
But I’m probably tanking the signal-to-noise ratio of this article at this point. So let me just say this: if anyone out there wants to plan a study and needs a thoroughly unimportant academic statistician, feel free to hit me up.
Too obvious? (yes)
You can do this at the prediction-aggregation website Metaculus. I used to be a top-20 forecaster there for some years and found their tools quite helpful. You get your own calibration curve, broken into 21 segments, with grey bands indicating the 90% credible/confidence interval you'd expect under perfect calibration (for those wondering, I have 13). This is great, but it used to be even better: they would give you the raw numbers instead of leaving you to estimate them visually. There are other platforms, but they aren't as widely used.
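For a rough sense of where grey bands like those come from (this is just a sketch of the idea, not Metaculus's actual method), you can ask what range of hit rates would be consistent with perfect calibration in a bin of n predictions, using a binomial interval. This assumes scipy is installed.

```python
# Sketch, not Metaculus's actual method: for a bin of n predictions stated at
# confidence p, the hit rates consistent with perfect calibration are roughly
# the central 90% of a Binomial(n, p) distribution.
from scipy.stats import binom

def perfect_calibration_band(n, p, level=0.90):
    lo, hi = binom.interval(level, n, p)
    return lo / n, hi / n  # as hit-rate fractions

n_predictions = 40
for p in (0.55, 0.75, 0.95):
    lo, hi = perfect_calibration_band(n_predictions, p)
    print(f"stated {p:.0%}: hit rates from {lo:.0%} to {hi:.0%} "
          f"are consistent with perfect calibration (n={n_predictions})")
```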
One problem I see, though, with wishes like these...
> My wet dream would be a public database cataloging predictions made by political and financial pundits, complete with their stated confidence levels and historical track records.
...is that it's quite easy to game. If calibration became a prominent metric, people would just make a lot of "easy" predictions (e.g. will the sun rise tomorrow?) and cash that in for their "hard" predictions (this meme-stock I own will rise in value, trust me, I have a calibration of 99%). So we need a second metric for how hard the questions are. But to calibrate that, we would have to incentivize a lot of people to spend a lot of time answering a lot of questions. Some people use that as an argument for prediction markets, but given that we are terrible at solving both the problems with our information landscape *and* the problems with market failures, I shudder to think what a prediction market (which is rightly also called an information market) would do to the world.
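A tiny made-up illustration of that padding problem: lump ninety trivially correct 99%-confidence predictions together with ten hard calls at the same stated confidence, and the bucket's overall hit rate hides how overconfident the hard calls were.

```python
# Made-up numbers illustrating the padding trick described above.
easy = [True] * 90               # "will the sun rise?"-type calls, stated at 99%
hard = [True] * 3 + [False] * 7  # genuinely hard calls, also stated at 99%

bucket = easy + hard
hit_rate = sum(bucket) / len(bucket)
print(f"stated 99%, observed hit rate {hit_rate:.0%}")                   # 93% overall
print(f"hit rate on the hard calls alone: {sum(hard) / len(hard):.0%}")  # 30%
```

The aggregate number looks respectable; the hard calls, pulled out on their own, do not.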
This is such a handy, easy-to-understand concept.
Whether or not it's possible to improve your own calibration, the notion underscores why we should be wary of people who habitually express certainty and favor hyperbolic language.