These are my notes from the book "The Signal and the Noise" by Nate Silver. The first 25% of the book starts a little slow in my opinion, talking about things that don't seem terribly useful. However, as the book progresses, it becomes much more relevant and demonstrates the power, and manipulations, of predictions and how they influence our society, economy, and personal decision making. I'd highly recommend this book to anyone who wants to understand how humanity makes decisions and how to make better ones by drawing meaning from data and avoiding the trap of deriving meaning from noise.
The whole purpose of this book is to describe how society can make better predictions. But making better first guesses under conditions of uncertainty is an entirely different enterprise than second guessing. Asking what a decision maker believed, given the information available to her at the time, is a better paradigm than pretending she should’ve been oracular.
Every time you’re making a trade, whether it’s a stock or a baseball player, there’s another, probably smart and equally well-trained person on the other side of that trade. Or to expand the metaphor further, suppose you are engaging in an exchange of ideas. Are you sure you know something they don’t? Are you sure they don’t know something that you don’t? If your side of the bargain is so appealing, then why are they willing to trade with you in the first place? There is obviously some sort of balance here. On the one hand, it is presumptuous to think that you, as just one person, are wiser than the consensus around a subject, which, in theory, reflects many smart people’s views and experiences combined. On the other hand, society would never move forward if nobody questioned that consensus. And it’s not as though that consensus has delivered us to a particularly good place.
We face danger whenever information growth outpaces the rate at which we can process it.
If the quantity of information is increasing by 2.5 quintillion bytes per day, the amount of useful information almost certainly isn’t. Most of it is just noise, and the noise is increasing faster than the signal. There are so many hypotheses to test, so many data sets to mine, but a relatively constant amount of objective truth.
The printing press changed the way in which we make mistakes. Routine errors of transcription became less common. But when there was a mistake it would be reproduced many times over.
Appreciate the distinction between risk and uncertainty. Risk is something you can put a price on. Uncertainty is risk that is hard to measure. You might have some vague awareness of the demons lurking out there, you might even be acutely concerned about them, but you have no idea how many of them there are or when they might strike. Your back-of-the-envelope assessment might be off by a factor of 100 or a factor of 1,000. There’s no good way to know. This is uncertainty. Risk greases the wheels of a free-market economy. Uncertainty grinds them to a halt.
If you are in a market and someone’s trying to sell you something that you don’t understand, you should think that they’re selling you a lemon. George Akerlof wrote a famous paper on the subject, “The Market for Lemons,” that won him a Nobel Prize. In the paper he wrote that in a market plagued by asymmetries of information, the quality of goods will decrease and the market will come to be dominated by crooked sellers and gullible or desperate buyers.
Greed and fear are volatile quantities however and the balance can get out of whack. When there is an excess of greed in a system there is a bubble. When there is an excess of fear, there is a panic.
Ordinarily we benefit from consulting our friends and neighbors before we make a decision. But when their judgment is compromised that means ours will be too.
When you layer the large uncertainty intrinsic in measuring stimulus on top of the large uncertainty of an economic forecast of any kind, you have the potential for a prediction that goes very badly.
Beware the “out of sample” problem in making predictions.
One of the pervasive risks that we face in the Information Age is that even if the amount of knowledge in the world is increasing, the gap between what we know and what we think we know may be widening. This syndrome is often associated with very precise seeming predictions that are not at all accurate.
News coverage is produced every day. Most of it is filler packaged in the form of stories that are designed to obscure its unimportance.
Wherever there is human judgment there is the potential for bias. The way to become more objective is to recognize the influence our assumptions play in our forecast and to question ourselves about them.
Good innovators typically think very big and very small. New ideas are often found in the most granular details of problems where most people don’t bother to look. And they are sometimes found when you are doing your most abstract and philosophical thinking. Considering why the world is the way it is and whether or not there may be an alternative to the dominant paradigm. Rarely can they be found in the temperate latitudes between these two spaces where we spend 99% of our lives.
Earthquakes cannot be predicted. They can however be forecasted.
In statistics, mistaking noise for signal is called overfitting.
A prediction interval is a range of the most likely outcomes that a prediction provides for, much like the margin of error in a poll. For example, a 90% prediction interval is supposed to cover 90% of the possible real-world outcomes.
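As a minimal sketch of this idea, the interval can be read off the distribution of simulated outcomes by trimming equal tails on each side. The data here are hypothetical (simulated draws standing in for, say, runs of an economic model):

```python
import random

random.seed(42)

# Hypothetical simulated forecast outcomes, e.g. 1,000 runs of a model
# predicting GDP growth in percent (illustrative numbers only).
outcomes = [random.gauss(2.0, 1.5) for _ in range(1000)]

def prediction_interval(samples, coverage=0.90):
    """Return the (lo, hi) range covering `coverage` of the samples,
    trimming equal tails on each side."""
    s = sorted(samples)
    tail = (1 - coverage) / 2
    lo = s[int(tail * len(s))]
    hi = s[int((1 - tail) * len(s)) - 1]
    return lo, hi

lo, hi = prediction_interval(outcomes)
# Fraction of simulated outcomes the interval actually covers:
inside = sum(lo <= x <= hi for x in outcomes) / len(outcomes)
```

A well-calibrated 90% interval should see roughly 90% of realized outcomes fall inside it; Silver's point is that many forecasters quote intervals far narrower than their real accuracy justifies.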
If you just look at the economy as a series of variables and equations without any underlying structure, you are almost certain to mistake noise for a signal. And you may delude yourself and gullible investors into thinking you are making good forecasts when you are not.
Statistical inferences are much stronger when they are backed up by some theory or at least deeper thinking about their root causes.
One of the most useful quantities for predicting disease spread is called the basic reproduction number, usually designated R0, which measures the number of individuals who can be expected to be infected by a single person with the disease. An R0 of four, for example, means that an infected individual could be expected to pass the disease along to four other individuals before recovering or dying from it.
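A toy deterministic sketch of what R0 implies: if every case infects R0 others (ignoring immunity and population depletion), cases multiply by R0 each generation. The function name and setup are my own illustration, not from the book:

```python
def expected_cases(r0, generations, initial=1):
    """New infections per generation in a toy branching process where
    every case infects exactly r0 others (no immunity, no depletion)."""
    cases = [initial]
    for _ in range(generations):
        cases.append(cases[-1] * r0)
    return cases

# With R0 = 4, one case yields 4, then 16, then 64 new cases per generation.
growth = expected_cases(4, 3)  # [1, 4, 16, 64]
```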
Diseases and other medical conditions can have a self fulfilling property. When medical conditions are widely discussed in the media people are more likely to identify their symptoms, and doctors are more likely to diagnose or misdiagnose them. The best known case of this in recent years is autism.
In diseases that have no causal mechanism, news events precipitate increased reporting.
What we find again and again is that the more a particular condition is on people’s minds and the more it’s a topic of discussion, the closer the reporting gets to 100%.
A self-canceling prediction is a case where a prediction tends to undermine itself. GPS navigation systems are an example: as the GPS sends more drivers onto the fastest route, that route ceases to be the fastest.
In epidemiology, the traditional models that doctors use are quite simple, and they are not working that well. The most basic mathematical treatment of infectious disease is called the SIR model. Formulated in 1927, the model posits three compartments, in one of which any given person resides at any given time: “S” stands for being susceptible to the disease, “I” stands for being infected by it, and “R” stands for having recovered from it.
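A minimal sketch of the SIR dynamics, using simple Euler steps over the standard equations dS/dt = −βSI, dI/dt = βSI − γI, dR/dt = γI. The parameter values are hypothetical, chosen only to illustrate an epidemic running its course:

```python
def sir_step(s, i, r, beta, gamma, dt=0.1):
    """One Euler step of the SIR model (s, i, r are population fractions,
    so s + i + r stays equal to 1)."""
    new_infections = beta * s * i * dt
    recoveries = gamma * i * dt
    return s - new_infections, i + new_infections - recoveries, r + recoveries

# Hypothetical parameters: beta = transmission rate, gamma = recovery rate.
# Their ratio beta/gamma plays the role of R0 (here 5).
s, i, r = 0.99, 0.01, 0.0
for _ in range(1000):  # simulate 100 time units
    s, i, r = sir_step(s, i, r, beta=0.5, gamma=0.1)
```

Even this crude integration reproduces the model's qualitative behavior: the susceptible pool is nearly exhausted and most of the population ends up in the recovered compartment.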
One fairly well-established principle is that people’s willingness to engage in inconvenient but healthful measures like vaccination is linked to the risk they perceive of acquiring the disease.
As the statistician George EP Box wrote, “All models are wrong, but some models are useful.” What he meant by that was that all models are simplifications of the universe, as they necessarily must be. The best model of a cat, is a cat.
It should be a given that whatever forecast we make, on average, will be wrong. So usually it’s about understanding how it’s wrong, or what to do when it’s wrong, and minimizing the cost to us when it’s wrong. The key is remembering that a model is a tool for helping us understand the complexities of the universe, never a substitute for the universe itself.
This is why it’s so crucial to develop a better understanding of ourselves and the way we distort and interpret the signals we receive.
When we fail to think like a Bayesian, false positives are a problem for all of science.
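The base-rate arithmetic behind this claim can be made concrete with Bayes’ Theorem. The numbers below are hypothetical but conventional: a study with 80% power, a 5% false-positive threshold, and a field where only 1 in 10 tested hypotheses is actually true:

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(hypothesis is true | positive result) via Bayes' Theorem."""
    p_positive = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / p_positive

# Hypothetical rates: 80% power, 5% significance threshold, 10% prior.
p = posterior(prior=0.10, true_positive_rate=0.80, false_positive_rate=0.05)
# p ≈ 0.64: even a "statistically significant" finding is wrong
# more than a third of the time under these assumptions.
```

Ignoring the prior, as frequentist significance testing invites us to do, is exactly the failure to "think like a Bayesian" that lets false positives accumulate.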
This is why our predictions may be more prone to failure in the era of big data. As there is an exponential increase in the amount of information, there is likewise an exponential increase in the number of hypotheses to investigate.
Why is the error rate so high? There are many reasons for it, some having to do with our psychological biases, some having to do with common methodological errors, and some having to do with misaligned incentives. Close to the root of the problem, however, is some type of flawed statistical thinking that these researchers are applying.
If you use a biased instrument, it doesn’t matter how many measurements you take. You’re aiming at the wrong target.
Essentially, the frequentist approach to statistics seeks to wash its hands of the reasons that statistics often go wrong - human error. It treats uncertainty as something intrinsic to the experiment rather than something intrinsic to our ability to understand the real world.
The frequentist method also implies that as you collect more data, the error approaches zero.
The bigger problem is that frequentist methods, in striving for immaculate statistical procedures that cannot be contaminated by the researcher’s bias, keep the researcher hermetically sealed off from the real world. These methods discourage the researcher from considering the underlying context or plausibility of his hypothesis - something that the Bayesian method demands in the form of a prior probability. Thus you will see apparently serious papers published on how toads can supposedly predict earthquakes, or how big box stores like Target beget racial hate groups, which apply frequentist tests to produce statistically significant but manifestly ridiculous findings. Data is useless without context.
Absolutely nothing useful is realized when one person who holds that there is a 0% probability of something happening argues with another person who holds that the probability is 100%.
In accordance with Bayes’ Theorem, prediction is fundamentally an information processing activity – a matter of using new data to test a hypothesis about the objective world, with the goal of coming to truer and more accurate conceptions about it.
The heuristic approach to problem-solving consists of applying rules of thumb when a deterministic solution to a problem is beyond our practical capacities. Heuristics are very useful things but they necessarily produce biases and blind spots.
The blind spots in our thinking are usually of our own making and they can grow worse as we age.
Computers are very very fast at making calculations. But this does not mean that computers make perfect forecasts or even necessarily good ones. The acronym GIGO, or garbage in garbage out, sums up this problem. If you give a computer bad data or devise a foolish set of instructions for it to analyze, it won’t spin straw into gold. Meanwhile computers are not very good at tasks that require creativity and imagination, like devising strategies or developing theories about the way the world works. Computers are most useful to forecasters, therefore, in fields like weather forecasting and chess, where the system abides by relatively simple and well understood laws, but where the equations that govern the system must be solved many times over in order to produce a good forecast. They seem to have helped very little in fields like economics or earthquake forecasting, where our understanding of root causes is blurrier and the data is noisier.
As Arthur Conan Doyle wrote, in the voice of Sherlock Holmes, “Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth.”
It’s important in most areas of life to come up with a probability rather than a yes or no. It’s a huge flaw that many people make in the areas they analyze, whether it be trying to form a fiscal union, paying for groceries, or hoping they don’t get fired.
It is often possible to make a profit in fields where the competition succumbs to poor incentives, bad habits, or blind adherence to tradition, or because you have better technology or data than they do. It is much harder to be very good in fields where everyone else is getting the basics right, and you may be fooling yourself if you think you have much of an edge. In general society does need to make the extra effort at prediction even though it may entail a lot of hard work with no immediate reward. And we need to be more aware that the approximations that we make come with trade-offs. But if you are approaching prediction from a business perspective you’re better off finding a place where you can be a big fish in a small pond.
In the 1950s, the average stock was held for six years. By the 2000s, the average stock was held for only six months. Trading is rational only when it makes both parties better off, as when an investor who is getting ready to retire cashes out of stocks by selling them to an investor who is just getting his feet wet in the market. But very little of the trading that occurs on Wall Street today conforms to this view. Most of it reflects true differences of opinion - contrasting predictions about the future return of a stock. Never before in human history have so many predictions been made so quickly and for such high stakes.
If you and I can’t come to a consensus on our forecasts, then the law of the land says we must place a bet to settle our differences. In Bayes’ land you must make one of these two choices: come to a consensus, or bet. Otherwise, to a Bayesian, you’re not really being rational. If after we have our little chat you still think your forecast is better than mine, you should be happy to bet on it, because you stand to make money. If you don’t, you should have taken my forecast and adopted it as your own. Of course this whole process would be extremely inefficient. We’d have to maintain forecasts on thousands and thousands of events and keep a ledger of the hundreds of bets we had outstanding at any given time. In the real world this is the function that markets play. They allow us to make transactions at one fixed price, at a consensus price, rather than having to barter or bet on everything.
Adam Smith’s invisible hand may be thought of as a Bayesian process, in which prices are gradually updated in response to supply and demand, eventually reaching some equilibrium. Or Bayesian reasoning might be thought of as an invisible hand wherein we gradually update and improve our beliefs as we debate our ideas, sometimes placing bets on them when we can’t agree. Both are consensus-seeking processes that take advantage of the wisdom of crowds. It might follow, then, that markets are an especially good way to make predictions. That’s really what the stock market is – a series of predictions about the future earnings and dividends of companies. My view is that this notion is mostly right most of the time.
A bubble is something that has a predictable ending. If you can’t tell you’re in a bubble, it’s not a bubble. In order for a bubble to violate efficient market hypothesis it needs to be predictable in real time. Some investors need to identify it as it is happening and then exploit it for a profit. Identifying a bubble is of course much easier with the benefit of hindsight.
Simply comparing recent stock prices against the historical average rate at which they have increased can give you some inkling of a bubble.
In theory, the value of a stock is a prediction of the company’s earnings and dividends.
One conceit of economics is that markets can perform rationally even if the participants within them perform irrationally. But irrational behavior in the markets can arise precisely because people are behaving rationally according to their incentives. So long as most traders are judged on the basis of short-term performance, bubbles involving large deviations of prices from their long-term values are possible and perhaps even inevitable. Herding can also result from deeper psychological reasons.
Bubbles are easier to detect than to burst.
Any investor can do as well as the average investor with almost no effort. Just buy an index fund that tracks the market.
Noisy data can obscure the signal even when there is virtually no doubt that the signal exists.
Predictions are potentially much stronger when backed up by the knowledge of the root causes behind a phenomenon.
But even if you believe, as Bayesian reasoning would have it, that almost all scientific hypotheses should be thought of probabilistically, we should have a greater degree of confidence in a hypothesis backed up by clear and strong causal relationships. Newly discovered evidence that seems to militate against the theory should lower our estimate of its likelihood, but it should be weighed in the context of the other things we know, or think we know, about the planet and its climate. Healthy skepticism needs to proceed from this basis. It needs to weigh the strength of new evidence against the overall strength of the theory, rather than rummaging through fact and theory alike for ideological convenience.
Climate refers to the long-term equilibria that the planet achieves. Weather describes short-term deviations from it.
It is right to correct a forecast when you think it might be wrong, rather than to fight to the quixotic death for it.
What distinguishes science and what makes a forecast scientific is that it is concerned with the objective world. What makes a forecast fail is when it is only concerned with the method, maxim, or model.
Over the very long run, the stock market essentially always moves upward. But this tells you almost nothing about how it will behave in the next day, week, or year.
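This asymmetry can be sketched with a toy geometric random walk: a slight upward drift dominates over years, while a single day is close to a coin flip. All parameters here are hypothetical, chosen only to illustrate the contrast:

```python
import random

def simulate(days, start=100.0, drift=0.0003, vol=0.01):
    """Toy geometric random walk with a slight upward daily drift
    (hypothetical parameters, not market estimates)."""
    price = start
    for _ in range(days):
        price *= 1 + random.gauss(drift, vol)
    return price

random.seed(1)
# How often does the price end above its start...
long_run = sum(simulate(2520) > 100.0 for _ in range(200))   # ~10 trading years
short_run = sum(simulate(1) > 100.0 for _ in range(200))     # a single day
```

Over ten simulated years the drift wins in the large majority of runs; over a single day the outcome is nearly 50/50, which is why the long-run trend tells you almost nothing about tomorrow.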
This book encourages readers to think about the signal and the noise and to seek out forecasts that couch their predictions in percentage or probabilistic terms. They are a more honest representation of the limits of our predictive abilities. When a prediction about a complex phenomenon is expressed with a great deal of confidence, it may be a sign that the forecaster has not thought through the problem carefully, has an overfitted statistical model, or is more interested in making a name for himself than in getting at the truth.
Uncertainty is an essential and non-negotiable part of a forecast.
Under Bayes’ Theorem, no theory is perfect. Rather, it is a work in progress, always subject to further refinement and testing. This is what scientific skepticism is all about. In politics, one is expected to give no quarter to his opponents. It is seen as a gaffe when one says something inconvenient and true. Partisans are expected to show equal conviction on a set of beliefs on social, economic, and foreign policy issues that have little intrinsic relation to one another. As far as approximations of the world go, the platforms of the Democratic and Republican parties are about as crude as it gets.
This book thinks of a signal as an indication of the underlying truth behind a statistical or predictive problem. Noise is random patterns that might easily be mistaken for signals.
There is a tendency in our planning to confuse the unfamiliar with the improbable. The contingency we have not considered seriously looks strange. And what looks strange is thought improbable. What is improbable need not be considered seriously.
“There are known knowns. These are things we know we know. We also know there are known unknowns. That is to say, we know there are some things we don’t know. But there are also unknown unknowns. There are things we don’t know we don’t know.” – Donald Rumsfeld
We must always accept some degree of risk from terrorism if we want to live in a free society, whether or not we want to admit it. To the extent we alter our behavior as free people, they’ve won.
In signal analysis, as with other types of prediction, it is very easy to see what you want in the mess of tangled data.
If we expect to find the world divided into the contours of the possible and the impossible, with little room for anything else in between, we will end up with overconfident predictions on the one hand and unknown unknowns on the other.
Where our enemies will strike us is predictable. It is where we least expect them to.
The Bayesian approach towards thinking about probability is more compatible with decision making under high uncertainty. It encourages us to hold a large number of hypotheses in our head at once, think about them probabilistically, and to update them frequently when we come across new information that may be consistent with them.
When we are making predictions, we need a balance between curiosity and skepticism. They can be compatible. The more eagerly we commit to scrutinizing and testing our theories, the more readily we accept that our knowledge of the world is uncertain, the more willingly we accept that perfect prediction is impossible, the less we will live in fear of our failures and the more freedom we will have to let our minds flow freely. By knowing more about what we don’t know, we may get a few more predictions right.
Whatever range of abilities we have acquired, there will always be tasks sitting right at the edge of them. If we judge ourselves by what is hardest for us, we may take for granted what we do easily and routinely.
Nature’s laws do not change very much. So long as human knowledge continues to expand, as it has since Gutenberg’s printing press, we will slowly come to a better understanding of nature’s signals, but never all its secrets. If science and technology are the heroes of this book, there is a risk in the age of big data of being too starry-eyed about what they can accomplish. There is no reason to conclude that the affairs of men are becoming more predictable. The opposite may well be true.
Technology is completely changing the way we relate to one another. Because of the Internet, “the whole context, all the equations, all the dynamics of the propagation of information change,” I was told by Tim Berners-Lee, who invented the World Wide Web in 1990. The volume of information is increasing exponentially. But relatively little of this information is useful. The signal-to-noise ratio may be waning. We need better ways of distinguishing the two. This book is less about what we know than the difference between what we know and what we think we know. It recommends a strategy so that we might close that gap. The strategy requires one giant leap and then some small steps forward. The leap is into the Bayesian way of thinking about prediction and probability. Think probabilistically. Bayes’ Theorem begins and ends with a probabilistic expression of the likelihood of a real-world event. It does not require you to believe that the world is intrinsically uncertain. It does require you to accept, however, that your subjective perceptions of the world are approximations of the truth.
In many walks of life expressions of uncertainty are mistaken for admissions of weakness. When you first start to make these probability estimates you may be quite poor. But there are two pieces of favorable news. First, these estimates are just a starting point. Bayes’ Theorem will have you revise and improve them with new information. Second, there is evidence that this is something we can learn to improve.
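How a rough starting estimate gets refined is just repeated application of Bayes’ Theorem. A minimal sketch, assuming a hypothetical binary hypothesis where each new observation is three times as likely if the hypothesis is true:

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """One application of Bayes' Theorem to a binary hypothesis."""
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence

# Start from a crude 50/50 guess, then revise on each new observation.
# Hypothetical likelihoods: the observation is 3x as likely if true.
belief = 0.5
history = [belief]
for _ in range(5):
    belief = bayes_update(belief, likelihood_if_true=0.75,
                          likelihood_if_false=0.25)
    history.append(belief)
```

The initial estimate barely matters after enough evidence: the belief climbs from 0.5 toward certainty with each consistent observation, which is exactly the sense in which poor first estimates are "just a starting point."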
Our brains process information by means of approximation. This is less an existential fact than a biological necessity. We receive far more inputs than we can consciously consider. We handle this problem by breaking them down into regularities and patterns.
Consider the following set of seven statements, which are related to the idea of the efficient market hypothesis and whether an individual investor can beat the stock market. Each statement is an approximation, but each statement builds on the last one to become slightly more accurate.
1. No investor can beat the stock market.
2. No investor can beat the stock market over the long run.
3. No investor can beat the stock market over the long run relative to his level of risk.
4. No investor can beat the stock market over the long run relative to his level of risk and accounting for transaction costs.
5. No investor can beat the stock market over the long run relative to his level of risk and accounting for transaction costs unless he has inside information.
6. Few investors beat the stock market over the long run relative to their level of risk and accounting for transaction costs unless they have inside information.
7. It is hard to tell how many investors beat the stock market over the long run because the data is very noisy, but we know that most cannot relative to their level of risk, since trading produces no net excess return and entails transaction costs. So unless you have inside information, you’re probably better off investing in an index fund.
Information becomes knowledge only when it’s placed in context. Otherwise we have no way to differentiate the signal from the noise. The search for the truth might be swamped by false positives.
What isn’t acceptable under Bayes’ Theorem is to pretend that you don’t have any prior beliefs. You should work to reduce your biases, but to say you have none is a sign that you have many. To state your beliefs upfront, to say “here’s where I’m coming from”, is a way to operate in good faith, and to recognize that you perceive reality through a subjective filter.
Bayes’ Theorem encourages us to be disciplined about how we weigh new information. If our ideas are worthwhile, we ought to be willing to test them by establishing falsifiable hypotheses and subjecting them to a prediction.
Prediction is difficult for us for the same reason it is so important. It is where objective and subjective reality intersect. Distinguishing the signal and the noise requires both scientific knowledge and self knowledge.