The Plastiras Lake in Greece is a typical example where everyone wants something different. The electricity company wants to use it to produce energy. The farmers want to use it to water their fields. The nearby city is supplied with water from it, and therefore wants the water to be of good quality. The tourist resorts around the lake would prefer it if the others didn’t take any water at all, because drawing water makes the lake less beautiful. So fifteen years ago we were given the task of studying these conflicting objectives and proposing a way to manage the lake better.
We can’t quantify beauty
The value of hydroelectric power is easy to calculate; the value of irrigation water is just a bit harder. But can we quantify beauty? We had lots of disagreement on that, even becoming, at times, angry with each other. In the end we published a scientific paper about it. Our answer was no.
Or can we?
However, I recently read an interesting book: “How to Measure Anything” by Douglas Hubbard.
When the author says “anything”, he really means anything, even the value of beauty of the Plastiras Lake. While I am not ready to change my mind yet, the author really knows his stuff and he has good arguments. It’s a pity this book didn’t exist at the time we published the paper.
I found the book long and hard to read. So here is a summary of it to get you started.
The bad parts
The book is long and dense. I feel that the author talks too much. He often interrupts his presentation of a methodology in order to tell you a story. I suspect he does that in order to make reading easier, but he overdoes it and makes it harder. Take, for example, the presentation of the Monte Carlo method on page 125. First paragraph, he presents the problem. Second paragraph, he provides a very short explanation of what Monte Carlo is. At this point you would expect him to give an example to make it clearer. But no; in the third paragraph, he gives you a relatively long story of how Fermi, Ulam, von Neumann, and Metropolis devised the Monte Carlo method. Fourth paragraph, he tells you another related story from his own experience. He only starts to say more on Monte Carlo a couple of pages later.
After a few chapters I became so tired of this that I no longer knew which chapter I was in or how it fit into the book as a whole; in other words, I had lost the main story line. I’m not certain whether that’s because of the stories or because the book could be a bit better organized; maybe both. In any case, I started keeping notes in order to track where I was. They evolved into this blog post.
The good parts
I wanted to measure humidity in my home, and humidity is a notoriously difficult thing to measure accurately. Most humidity meters for home use come together with a thermometer, and the manufacturers will typically advertise a precision of ±1 °C, conveniently failing to mention the precision of the humidity meter. I didn’t know what to do. After I read Chapter 3 of the book, I just grabbed a temperature/humidity meter that I found somewhere and started to use it. I realized that a measurement doesn’t need to be perfect, and that it’s OK even if I don’t know the sensor error. What is important is that the measurement adds to my knowledge. I know that if it reads 40% on one day, and 50% the next day, then it is very likely that humidity is higher the second day (although the readings are probably very far from reality). I also know that it underestimates humidity, maybe by 15% or more. But it does give me useful information. Obvious? It seems so. Had I thought about it? No! I only thought about it after reading the book.
A reservoir engineer of a water company once told me that their annual budget request is justified by the need to be safe. When I heard that, having recently read the book, a bell rang. What do you mean by “safety”? Do you mean the risk that an aqueduct breaks? The risk that we run out of water? The risk that the water becomes contaminated? The risk that a journalist asks us how much of our water comes from inflow and how much from rainfall, and we get negative publicity from an inability to reply? “Safety” is such a vague term that it can mean anything, and as such you can’t measure it; you need to clarify what you mean. After you are told of this need for clarification, it seems obvious; but I only thought about it after reading the book.
So, the author does a really good job of stating things that are obvious, after you’re told. In addition, although I disagree with some of his views, I appreciate that he knows what he’s talking about. I am a bit wary of people who propose methods just because these methods are fashionable, without really understanding what they do.
Summary of “How to measure anything”
Chapters 1 and 2 are introductory and contain little beyond advocacy (but it’s necessary to read Chapter 2 because the author keeps referring to the stories mentioned in it throughout the rest of the book). The real beginning of the book is Chapter 3, with its definition of a measurement:
A measurement is a quantitatively expressed reduction of uncertainty based on one or more observations.
A measurement doesn’t need to be given in physical units such as miles or volts; it can state set membership, or it can be ordinal, like the four-star system for movies.
Besides “measurement”, we also need to define precisely the object of measurement. I already mentioned that “safety” is vague, and the same is true for other terms such as “strategic alignment”, “flexibility”, “customer satisfaction”, “mentorship”, “quality”, “risk”, “security”, “public image”, “employee empowerment”, and “ecological sustainability”. The author offers two neat solutions for clarifying a term. One is to keep asking why we want to measure something, and what might happen if we don’t measure it. The second solution is to pretend we are an alien being that has the capability of cloning organizations to perform controlled experiments; if the alien cloned our organization, what is the answer he would be looking for?
So, we have defined what a measurement is, and our object of measurement. We also need to define (Chapter 4) the decision that the measurement is supposed to support. The decision must have 1) two or more realistic alternatives; 2) uncertainty; 3) potentially negative consequences if it turns out you took the wrong position; and 4) a decision maker. It is also important to have clear definitions of uncertainty, risk, measurement of uncertainty, and measurement of risk. The author provides these definitions on page 84.
Chapter 3 also deals with the objections people often have against measuring something. These objections are either that it’s too expensive to measure, or that there is no usefulness in measuring, or that it is unethical. The first two are mostly caused by not using good definitions of “measurement” and the object of measurement, and unfamiliarity with measurement methods. There are simple methods that can often tell you much with very low cost. For example, the rule of five says there’s a 94% chance that the median of a population is between the smallest and largest values in any random sample of five from that population; the single sample majority rule says that, given maximum uncertainty about a population proportion, such that you believe the proportion could be anything between 0% and 100% with all values being equally likely, there is a 75% chance that a single randomly selected item is from the majority of the population. In fact, when fearing the cost of a measurement, it’s usually safe to make these four assumptions: 1) It’s been measured before; 2) You have far more data than you think; 3) You need far less data than you think; 4) Useful, new observations are more accessible than you think.
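Both rules are easy to check. Here is a minimal Python sketch (my own illustration, not code from the book): the rule of five follows from the fact that the median is missed only when all five samples land on the same side of it, and both rules can also be verified by simulation. The function names, the uniform test population, and the number of trials are assumptions made for the example.

```python
import random


def rule_of_five_exact(n=5):
    # The sample range misses the median only if all n samples fall
    # on the same side of it: probability 2 * (1/2)**n.
    return 1 - 2 * 0.5 ** n


def rule_of_five_simulated(trials=100_000, n=5, seed=0):
    # Assumed test population: uniform on (0, 1), whose median is 0.5.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if min(sample) < 0.5 < max(sample):
            hits += 1
    return hits / trials


def single_sample_majority_simulated(trials=100_000, seed=0):
    # Maximum uncertainty: the share p of group A is uniform on (0, 1).
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p = rng.random()
        item_is_a = rng.random() < p   # draw one random item
        majority_is_a = p > 0.5
        if item_is_a == majority_is_a:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(rule_of_five_exact())                # 0.9375, i.e. about 94%
    print(rule_of_five_simulated())            # close to 0.9375
    print(single_sample_majority_simulated())  # close to 0.75
```

The exact value for the rule of five is 93.75%, which the book rounds to 94%.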
As for the ethical objections, the argumentation is quite convincing. Take this example of his: Should a 99-year-old with several health problems be worth the same effort to save as a 5-year-old? The author points out that whatever your answer is, it is a measurement of the relative value you hold for each.
Preparing for the measurement
If a measurement costs 100 thousand, and the value of the information it gives us is 10 thousand, we will obviously not perform the measurement. So the main purpose of our preparation is to determine what we already know and find out to what extent we need a measurement.
We first need an estimate as a starting point, and a good way to get it is to ask people to guess. In a fascinating Chapter 5, the author explains that people often say they have absolutely no idea, but with the proper guidance they can make fairly good estimates of a 90% confidence interval for a variable. Most people are overconfident (they choose too narrow intervals), and most of the rest are underconfident (too wide). The author describes several methods for calibrating these estimates.
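To see what “calibrated” means in practice, here is a minimal Python sketch (my own illustration, not one of the author's calibration methods). It scores a set of 90% confidence intervals against the true values: a well-calibrated estimator should capture the truth about 90% of the time. The intervals and true values below are made-up numbers.

```python
def calibration_hit_rate(intervals, truths):
    """Return the fraction of (low, high) intervals containing the true value."""
    if len(intervals) != len(truths):
        raise ValueError("need exactly one true value per interval")
    hits = sum(low <= truth <= high
               for (low, high), truth in zip(intervals, truths))
    return hits / len(truths)


# Hypothetical answers to ten trivia questions, each meant as a 90% interval.
my_intervals = [(10, 20), (5, 8), (100, 200), (0, 1), (30, 40),
                (50, 60), (1, 3), (70, 90), (15, 25), (200, 300)]
true_values = [15, 9, 150, 0.5, 45, 55, 4, 80, 30, 250]

rate = calibration_hit_rate(my_intervals, true_values)
print(rate)  # 0.6
```

A hit rate of 60% on intervals that were supposed to be 90% intervals suggests overconfidence: the intervals are too narrow. With only ten questions the score is noisy, so it is a hint rather than a verdict.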
Recall that the measurement is supposed to help support a decision, and that the decision has at least two realistic alternatives. The decision has potentially negative consequences if it turns out you took the wrong position; in one word, it has risk. In Chapter 6, the author presents a methodology for quantifying that risk with Monte Carlo. In Chapter 7, he presents some methods for estimating the value of information of a measurement.
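The two ideas fit together, as the following minimal Python sketch shows (my own illustration under stated assumptions, not the author's exact procedure). It assumes a single yes/no investment decision with a known cost and a benefit given as a calibrated 90% confidence interval, treated as normally distributed; a 90% interval of a normal distribution spans about 3.29 standard deviations. The function name and all numbers are hypothetical.

```python
import random


def simulate_investment(cost, benefit_low, benefit_high,
                        trials=100_000, seed=0):
    """Monte Carlo sketch: returns (expected net, chance of loss, EVPI)."""
    rng = random.Random(seed)
    mean = (benefit_low + benefit_high) / 2
    sd = (benefit_high - benefit_low) / 3.29  # 90% interval of a normal
    nets = [rng.gauss(mean, sd) - cost for _ in range(trials)]

    expected_net = sum(nets) / trials
    chance_of_loss = sum(net < 0 for net in nets) / trials

    # Expected opportunity loss of each alternative: what we lose, on
    # average, in the scenarios where that alternative was the wrong call.
    eol_invest = sum(max(0.0, -net) for net in nets) / trials
    eol_decline = sum(max(0.0, net) for net in nets) / trials

    # The expected value of perfect information is the opportunity loss
    # of the alternative we would pick without measuring any further.
    evpi = min(eol_invest, eol_decline)
    return expected_net, chance_of_loss, evpi


if __name__ == "__main__":
    # Hypothetical figures, in thousands: cost 100, benefit between 80 and 160.
    net, risk, evpi = simulate_investment(100, 80, 160)
    print(f"expected net {net:.1f}, chance of loss {risk:.0%}, EVPI {evpi:.1f}")
```

With these made-up figures the investment looks good on average, but there is roughly a one-in-five chance of losing money, and even perfect information would be worth only a few thousand. A measurement that costs more than that is not worth performing.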
How to measure
Chapter 8 is a collection of hints. In order to measure something, the first step is to decompose it. Instead of asking how many piano tuners there are in Chicago, it is better to ask what Chicago’s population is, what is the percentage of the population that has a piano, and how many pianos a tuner can manage. The second step is to search for prior work, e.g. on the Internet. The third is to find a way to observe the object. The “measurement instrument” that we need to design can be a poll, for example. We need to be asking what we’d see if the value is very high or very low (e.g. for a quality measurement problem, if the quality is better, what should I see? Fewer customer complaints?). We need to be iterative; i.e. to make a few observations and recalculate the information value. We need to consider multiple approaches. We need to ask what is the really simple question that could make the rest of the measurement moot. Finally, we need to just do it; even the first few observations might surprise.
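The piano tuner decomposition can be written down in a few lines. This is a minimal Python sketch of the idea; every number in it is a placeholder guess chosen for illustration, not data from the book or from any survey.

```python
def estimate_tuners(population, pianos_per_person, pianos_per_tuner):
    """Decompose 'how many tuners?' into three easier questions."""
    return population * pianos_per_person / pianos_per_tuner


# Placeholder guesses (assumptions): a low and a high value for each factor.
population = (2_500_000, 3_000_000)   # people in the city
pianos_per_person = (0.005, 0.02)     # share of the population with a piano
pianos_per_tuner = (500, 1000)        # pianos one tuner can manage

# Crude bounds: the lowest estimate pairs low demand with high tuner capacity.
low = estimate_tuners(population[0], pianos_per_person[0], pianos_per_tuner[1])
high = estimate_tuners(population[1], pianos_per_person[1], pianos_per_tuner[0])
print(low, high)
```

The resulting range is wide, but each of the three factors is something you can look up or observe separately, which is the point of decomposing.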
Chapters 9 and 10 present some statistical methods for drawing conclusions from studying a sample. Such methods are the t-distribution, an extension of the rule of five, an investigation of the concept of statistical significance, sampling methods, and Bayesian statistics. If it is not physically possible to sample the population (e.g. because the population doesn’t exist yet, as in the case of creating a new product), then you need to create an experiment. In experiments it is often necessary to have a test group and a control group.
In the chapter about Bayesian statistics, the author also deals with four myths: 1) that absence of evidence is not evidence of absence (it is not proof, but it is evidence); 2) that correlation is not evidence of causation (it is not proof, but it is evidence); 3) that ambiguous results tell us nothing (this is a variation of the first myth); 4) that a single piece of information alone tells us nothing (it may fall short of what we’d want, but it does tell us something).
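The first myth is easy to test with Bayes' theorem. Here is a minimal Python sketch (my own example, not taken from the book): suppose we suspect a leak in a pipe and run an inspection that finds nothing. The prior and the inspection's detection and false alarm rates are hypothetical numbers.

```python
def posterior(prior, p_obs_given_h, p_obs_given_not_h):
    """Bayes' theorem: P(H | observation) for a yes/no hypothesis H."""
    numerator = prior * p_obs_given_h
    return numerator / (numerator + (1 - prior) * p_obs_given_not_h)


# Hypothetical inspection: it detects an existing leak 80% of the time
# and raises a false alarm 10% of the time.
prior_leak = 0.5
p_nothing_found_if_leak = 1 - 0.8      # the inspection missed the leak
p_nothing_found_if_no_leak = 1 - 0.1   # correctly found nothing

after_inspection = posterior(prior_leak,
                             p_nothing_found_if_leak,
                             p_nothing_found_if_no_leak)
print(round(after_inspection, 3))  # 0.182
```

Finding nothing lowers the probability of a leak from 50% to about 18%. That is not proof of absence, but it is clearly evidence of absence.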
Dealing with subjective stuff
Chapters 11 and 12 contain some fascinating stuff. Here is an extract from Chapter 11 that deserves to be quoted as is:
It’s not uncommon for managers to feel that concepts such as “quality,” “image,” or “value” are immeasurable. In some cases, this is because they can’t find what they feel to be “objective” estimates of these quantities. But that is simply a mistake of expectations. All quality assessment problems—public image, brand value, and the like—are about human preferences. In that sense, human preferences are the only source of measurement. If that means such a measurement is subjective, then that is simply the nature of the measurement. It’s not a physical feature of any object. It is only how humans make choices about that thing. Once we accept this class of measurements as measurements of human choices alone, then our only question is how to observe these choices.
When we assess subjective stuff, the author says, we can use the Likert scale, multiple choice, rank order, and open-ended questions in questionnaires. He presents the Willingness-To-Pay method for measuring in specific units; he points out some problems it has, but overall, he says, it works. We can quantify risk tolerance by asking ourselves what risk would be acceptable for a given Return-On-Investment; thus, we create the investment boundary for our organization. And we can deal with multiple conflicting preferences by creating utility curves.
The human mind, the author says in Chapter 12, has some incredible abilities (such as recognizing a face or a voice instantly), but is susceptible to fallacies. It can be used as a measurement instrument if the fallacies are accounted for. Some effects that affect judgement are anchoring, halo/horns, bandwagon bias, and emerging preferences. When experts trust their own expert judgement, they don’t measure how often it is correct, and several studies show that they are way overconfident. Simple models often do better than them at predictions. Organizing the information can help, but it can be insufficient. Simple weighted linear models often help more, but they need to be applied correctly. Rasch models can help with several problems when there is no “invariant comparison” (invariant comparison means that if one measurement instrument says A>B, another measurement instrument should give the same answer; this is not true, for example, in IQ tests). The Lens model can remove expert inconsistency by creating a model calibrated by looking at expert choices.
The author warns against using traditional cost-benefit analysis, subjective weighted score, “information economics”, and “analytic hierarchy process”. He emphasizes that a measurement is a measurement only if there is reason to believe it reduces uncertainty, and casts several doubts on these.
The remaining two chapters contain suggestions on using new technologies (GPS, the Internet as a whole, mobile phones) as instruments of measurement and also present case studies and examples.
The battle of the giants
As I said, I’m not changing my mind (yet) about quantifying the beauty of the Plastiras Lake. For example, I still have my doubts about the Willingness-To-Pay method, for the reasons I explain in the paper. However, the author really knows his stuff, and this is probably the first time I have seen someone use these methods and make serious arguments for them. So this is the battle of the giants, Hubbard vs. the Dreyfuses. The latter approach the problem of decision making from a very different point of view and seem to attack the wisdom of the former.
I’m not certain, however, that the authors disagree. Maybe they just present different sides of the same coin; but it’s good to know both sides of that coin, so I highly recommend both.
I thank Jonathan Stark and, especially, the author of the book, Douglas Hubbard, for reviewing the first draft of this post. Douglas Hubbard made the following comment, which I publish with his permission (I made several changes to the post before publishing it, which explain any apparent discrepancies):
Thanks for spending the time to review my book. I’ve looked at your review.
In terms of reactions to writing styles, I have no comment. Your reaction is your reaction and I won’t argue that you should have had a different reaction. All I can say is it’s not a universal or even common reaction to the book. The style is what I call a “narrative how to” where we stop to tell stories to make things tangible. Basically the way I think is the way I write. The difficulty in writing any book like this is anticipating the needs of every reader. Some need to take it much slower and some think of it as trivial. Some will appreciate the pragmatic stories and some may feel distracted by it.
The rest of your review is mostly a fairly objective listing of the content without judgement one way or the other. The only point you make which could be addressed is your last point. You say you are suspicious of quantifying the beauty of a lake. Yet your opening paragraph indicates the need to make choices among alternatives presented by groups with contradictory goals. Whatever the choice ends up being as long as someone has done something to it’s “beauty” in exchange for something else, it is a de facto measurement of the value of the lake. Imagine more extreme scenarios.
What if putting a large industrial plant right on the favorite shore lake could create and turning the lake into a huge aquaculture farm could employ 200 people and feed 500 starving families? How about 2000 people and 50,000 starving families? This may seem like a contrived “lifeboat” scenario but it is exactly the choice policy makers and other decision makers (for whom this book is really written) make routinely.
You can be suspicious of the willingness-to-pay and yet you make such decisions in your daily life. I bet you have already made a WTP choice today. In fact, you made one just to visit that lake by deciding to give up activities that could have led to higher income or you might have paid for travel costs. Either way, you said “Visiting this beautiful lake was worth at least X.” And, remember, even a wide range can count as a measurement.
Not only is the beauty of the lake measurable but you’ve already measured it for yourself with your own WTP.
Douglas W. Hubbard
Hubbard Decision Research