Naked Statistics: Stripping the Dread from the Data
At the same time, the underlying probabilities for the relevant events—drawing 21 at blackjack or spinning red in roulette—are known. This turns out to be a powerful phenomenon in areas of life far beyond casinos.

Many businesses must assess the risks associated with assorted adverse outcomes. However, any business facing uncertainty can manage these risks by engineering processes so that the probability of an adverse outcome, anything from an environmental catastrophe to a defective product, becomes acceptably low.

Wall Street firms will often evaluate the risks posed to their portfolios under different scenarios, with each of those scenarios weighted based on its probability. The financial crisis of was precipitated in part by a series of market events that had been deemed extremely unlikely, as if every player in a casino drew blackjack all night. I will argue later in the book that these Wall Street models were flawed and that the data they used to assess the underlying risks were too limited, but the point here is that any model to deal with risk must have probability as its foundation.

The entire insurance industry is built upon charging customers to protect them against some adverse outcome, such as a car crash or a house fire.

The insurance industry does not make money by eliminating these events; cars crash and houses burn every day.

Sometimes cars even crash into houses, causing them to burn. Instead, the insurance industry makes money by charging premiums that are more than sufficient to pay for the expected payouts from car crashes and house fires. The insurance company may also try to lower its expected payouts by encouraging safe driving, fences around swimming pools, installation of smoke detectors in every bedroom, and so on.
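To make the arithmetic concrete, here is a minimal sketch in Python of how expected payouts relate to premiums; the claim probability, claim cost, and margin are invented for illustration, not real actuarial figures.

```python
# Illustrative only: invented probabilities and costs, not real actuarial data.
claim_probability = 0.05      # chance a policyholder files a claim in a given year
average_claim_cost = 8_000    # average payout when a claim is filed
overhead_and_margin = 1.25    # premiums must cover expected payouts plus a margin

expected_payout = claim_probability * average_claim_cost   # $400 per policyholder
premium = expected_payout * overhead_and_margin            # $500 per policyholder

print(f"Expected payout per policyholder: ${expected_payout:,.0f}")
print(f"Premium covering payouts plus margin: ${premium:,.0f}")
```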

Probability can even be used to catch cheats in some situations. The mathematical logic stems from the fact that we cannot learn much when a large group of students all answer a question correctly. But when those same test takers get an answer wrong, they should not all consistently have the same wrong answer.

If they do, it suggests that they are copying from one another or sharing answers via text. Of course, you can see the limitations of using probability. A large group of test takers might have the same wrong answers by coincidence; in fact, the more schools we evaluate, the more likely it is that we will observe such patterns just as a matter of chance.
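A rough sketch of why identical wrong answers look suspicious, and why screening many schools will still produce some coincidences. The setup here (ten students missing a question that has three wrong options, a thousand schools screened, independent guessing) is invented purely for illustration.

```python
# Illustrative numbers only: 10 students miss a question with 4 answer choices
# (3 wrong options). If their errors were independent, how often would all 10
# land on the SAME wrong option?
students = 10
wrong_options = 3

# Probability that every student picks one particular wrong option, summed over options.
p_same_wrong_answer = wrong_options * (1 / wrong_options) ** students
print(f"P(all {students} share one wrong answer) = {p_same_wrong_answer:.6f}")

# Screen many schools and the expected number of such coincidences grows.
schools_screened = 1000
expected_coincidences = schools_screened * p_same_wrong_answer
print(f"Expected coincidences across {schools_screened} schools = {expected_coincidences:.2f}")
```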

A statistical anomaly does not prove wrongdoing. We cannot arrest Mr. Kinney for fraud on the basis of that calculation alone, though we might inquire whether he has any relatives who work for the state lottery.

Probability is one weapon in an arsenal that requires good judgment. Does smoking cause cancer? We have an answer for that question—but the process of answering it was not nearly as straightforward as one might think.

The scientific method dictates that if we are testing a scientific hypothesis, we should conduct a controlled experiment in which the variable of interest (e.g., smoking) is the only thing that differs between the experimental group and the control group. If we observe a marked difference in some outcome between the two groups (e.g., lung cancer), we can safely infer that the variable of interest is what caused that outcome.

We cannot do that kind of experiment on humans. If our working hypothesis is that smoking causes cancer, it would be unethical to assign recent college graduates to two groups, smokers and nonsmokers, and then see who has cancer at the twentieth reunion.

Smokers and nonsmokers are likely to be different in ways other than their smoking behavior. For example, smokers may be more likely to have other habits, such as drinking heavily or eating badly, that cause adverse health outcomes. If the smokers are particularly unhealthy at the twentieth reunion, we would not know whether to attribute this outcome to smoking or to other unhealthy things that many smokers happen to do.

We would also have a serious problem with the data on which we are basing our analysis. Smokers who have become seriously ill with cancer are less likely to attend the twentieth reunion. As a result, any analysis of the health of the attendees at the twentieth reunion related to smoking or anything else will be seriously flawed by the fact that the healthiest members of the class are the most likely to show up.

The further the class gets from graduation, say, a fortieth or a fiftieth reunion, the more serious this bias will be. We cannot treat humans like laboratory rats.

As a result, statistics is a lot like good detective work. The data yield clues and patterns that can ultimately lead to meaningful conclusions. You have probably watched one of those impressive police procedural shows like CSI: New York in which very attractive detectives and forensic experts pore over minute clues—DNA from a cigarette butt, teeth marks on an apple, a single fiber from a car floor mat—and then use the evidence to catch a violent criminal.

The appeal of the show is that these experts do not have the conventional evidence used to find the bad guy, such as an eyewitness or a surveillance videotape. So they turn to scientific inference instead. Statistics does basically the same thing.

The data present unorganized clues—the crime scene. Statistical analysis is the detective work that crafts the raw data into some meaningful conclusion.

After Chapter 11, you will appreciate the television show I hope to pitch: CSI: Regression Analysis, which would be only a small departure from those other action-packed police procedurals. When you read in the newspaper that eating a bran muffin every day will reduce your chances of getting colon cancer, you need not fear that some unfortunate group of human experimental subjects has been force-fed bran muffins in the basement of a federal laboratory somewhere while the control group in the next building gets bacon and eggs.

Instead, researchers will gather detailed information on thousands of people, including how frequently they eat bran muffins, and then use regression analysis to do two crucial things: (1) quantify the association observed between eating bran muffins and contracting colon cancer, and (2) isolate or control for other factors that might explain that association. Of course, CSI: Regression Analysis will star actors and actresses who are much better looking than the academics who typically pore over such data.
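As a sketch of what those two steps can look like in practice, here is a toy regression on simulated data; the variable names, effect sizes, and the use of plain least squares are all assumptions for illustration, not the actual studies described here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated data: exercise is a confounder that drives both muffin eating and health.
exercise = rng.normal(size=n)
muffins_per_week = 2 + 0.8 * exercise + rng.normal(size=n)
health_score = 50 + 3.0 * exercise + 0.0 * muffins_per_week + rng.normal(size=n)

# A naive regression of health on muffins alone overstates the association...
X_naive = np.column_stack([np.ones(n), muffins_per_week])
coef_naive, *_ = np.linalg.lstsq(X_naive, health_score, rcond=None)

# ...while controlling for exercise shrinks the muffin coefficient toward its true value (zero).
X_adjusted = np.column_stack([np.ones(n), muffins_per_week, exercise])
coef_adjusted, *_ = np.linalg.lstsq(X_adjusted, health_score, rcond=None)

print(f"Muffin coefficient, no controls:   {coef_naive[1]:.2f}")
print(f"Muffin coefficient, with controls: {coef_adjusted[1]:.2f}")
```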

What individuals are most likely to become terrorists? Imagine a researcher who looks as if she moonlights for the Olympic beach volleyball team. When she gets the printout from her statistical analysis, she sees exactly what she has been looking for: a large and statistically significant relationship in her data set between some variable that she had hypothesized might be important and the onset of autism.

She must share this breakthrough immediately! The researcher takes the printout and runs down the hall, slowed somewhat by the fact that she is wearing high heels and a relatively small, tight black skirt. She finds her male partner, who is inexplicably fit and tan for a guy who works fourteen hours a day in a basement computer lab, and shows him the results.

Together the regression analysis experts walk briskly to see their boss, a grizzled veteran who has overcome failed relationships and a drinking problem. Just about every social challenge that we care about has been informed by the systematic analysis of large data sets. In many cases, gathering the relevant data, which is expensive and time-consuming, plays a crucial role in this process as will be explained in Chapter 7.

I may have embellished my characters in CSI: Regression Analysis but not the kind of significant questions they could examine. There is an academic literature on terrorists and suicide bombers—a subject that would be difficult to study by means of human subjects or lab rats for that matter.

One such book, What Makes a Terrorist, was written by Alan Krueger, one of my graduate school statistics professors. The book draws its conclusions from data gathered on terrorist attacks around the world. A sample finding: terrorists are not desperately poor or poorly educated. Why not? Well, that exposes one of the limitations of regression analysis.

We can isolate a strong association between two variables by using statistical analysis, but we cannot necessarily explain why that relationship exists, and in some cases, we cannot know for certain that the relationship is causal, meaning that a change in one variable is really causing a change in the other.

In the case of terrorism, Professor Krueger hypothesizes that since terrorists are motivated by political goals, those who are most educated and affluent have the strongest incentive to change society. These individuals may also be particularly rankled by suppression of freedom, another factor associated with terrorism.

This discussion leads me back to the question posed by the chapter title: What is the point? The point is not to do math, or to dazzle friends and colleagues with advanced statistical techniques.

The point is to learn things that inform our lives. As a result, there are numerous reasons that intellectually honest individuals may disagree about statistical results or their implications. At the most basic level, we may disagree on the question that is being answered. As the next chapter will point out, more socially significant questions fall prey to the same basic challenge.

What is happening to the economic health of the American middle class? Nor can we create two identical nations —except that one is highly repressive and the other is not—and then compare the number of suicide bombers that emerge in each. Even when we can conduct large, controlled experiments on human beings, they are neither easy nor cheap.

Researchers did a large-scale study on whether or not prayer reduces postsurgical complications, which was one of the questions raised earlier in this chapter.

We conduct statistical analysis using the best data, methodologies, and resources available. Statistical analysis is more like good detective work; hence the commercial potential of CSI: Regression Analysis. Smart and honest people will often disagree about what the data are trying to tell us. But who says that everyone using statistics is smart or honest? As mentioned, this book began as an homage to How to Lie with Statistics, which was first published in 1954 and has sold over a million copies.

The reality is that you can lie with statistics. Or you can make inadvertent errors. In either case, the mathematical precision attached to statistical analysis can dress up some serious nonsense. This book will walk through many of the most common statistical errors and misrepresentations so that you can recognize them, not put them to use.

So, to return to the chapter title, what is the point of learning statistics? To summarize huge quantities of data. To make better decisions. To answer important social questions. To recognize patterns that can refine how we do everything from selling diapers to catching criminals.

To catch cheaters and prosecute criminals. To evaluate the effectiveness of policies, programs, drugs, medical procedures, and other innovations. And to spot the scoundrels who use these very same powerful tools for nefarious ends. If you can do all of that while looking great in a Hugo Boss suit or a short black skirt, then you might also be the next star of CSI: Regression Analysis.

The first question is profoundly important. It tends to be at the core of presidential campaigns and other social movements. The second question is trivial in the literal sense of the word, but baseball enthusiasts can argue about it endlessly. What the two questions have in common is that they can be used to illustrate the strengths and limitations of descriptive statistics, which are the numbers and calculations we use to summarize raw data.

That would be raw data, and it would take a while to digest, given that Jeter has played seventeen seasons with the New York Yankees and taken more than nine thousand at bats. Or I can just tell you that at the end of the 2011 season Derek Jeter had a career batting average of .313. It is easy to understand, elegant in its simplicity—and limited in what it can tell us.

Baseball experts have a bevy of descriptive statistics that they consider to be more valuable than the batting average. I called Steve Moyer, president of Baseball Info Solutions (a firm that provides a lot of the raw data for the Moneyball types), to ask him, (1) what are the most important statistics for evaluating baseball talent, and (2) who was the best baseball player of all time?

Ideally we would like to find the economic equivalent of a batting average, or something even better. We would like a simple but accurate measure of how the economic well-being of the typical American worker has been changing in recent years.

Are the people we define as middle class getting richer, poorer, or just running in place? Per capita income is a simple average: total income divided by the size of the population. Congratulations to us. There is just one problem. My quick calculation is technically correct and yet totally wrong in terms of the question I set out to answer.

To begin with, the figures above are not adjusted for inflation. Per capita income merely takes all of the income earned in the country and divides by the number of people, which tells us absolutely nothing about who is earning how much of that income. As the Occupy Wall Street folks would point out, explosive growth in the incomes of the top 1 percent can raise per capita income significantly without putting any more money in the pockets of the other 99 percent.

In other words, average income can go up without helping the average American. As with the baseball statistic query, I have sought outside expertise on how we ought to measure the health of the American middle class. From baseball to income, the most basic task when working with data is to summarize a great deal of information. There are more than 300 million residents in the United States. A spreadsheet with the name and income history of every American would contain all the information we could ever want about the economic health of the country—yet it would also be so unwieldy as to tell us nothing at all.

The irony is that more data can often present less clarity. So we simplify. We perform calculations that reduce a complex array of data into a handful of numbers that describe those data, just as we might encapsulate a complex, multifaceted Olympic gymnastics performance with one number: 9.8.
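A toy illustration of that simplification, using simulated incomes rather than real census data: a million-row list collapses into a couple of numbers, and a windfall for the top 1 percent moves the mean while leaving the median alone.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated (not real) incomes for one million people, skewed like real income data.
incomes = rng.lognormal(mean=10.8, sigma=0.7, size=1_000_000)

print(f"Mean (per capita) income: {incomes.mean():,.0f}")
print(f"Median income:            {np.median(incomes):,.0f}")

# Give the top 1 percent a huge raise and watch the mean move while the median does not.
cutoff = np.quantile(incomes, 0.99)
incomes[incomes > cutoff] *= 5

print(f"Mean after top-1% windfall:   {incomes.mean():,.0f}")
print(f"Median after top-1% windfall: {np.median(incomes):,.0f}")
```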

The good news is that these descriptive statistics give us a manageable and meaningful summary of the underlying phenomenon. The bad news is that any simplification invites abuse. Descriptive statistics can be like online dating profiles: technically accurate and yet pretty darn misleading. You have finished reading about day seven of the marriage when your boss shows up with two enormous files of data.

One file has warranty claim information for each of the roughly 57,000 laser printers that your firm sold last year. For each printer sold, the file documents the number of quality problems that were reported during the warranty period.

The other file has the same information for each of the laser printers that your chief competitor sold during the same stretch. In this case, we want to know the average number of quality problems per printer sold for your firm and for your competitor. You would simply tally the total number of quality problems reported for all printers during the warranty period and then divide by the total number of printers sold.

Remember, the same printer can have multiple problems while under warranty. You would do that for each firm, creating an important descriptive statistic: the average number of quality problems per printer sold.
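A quick sketch of that calculation; the claim counts below are invented stand-ins for the two warranty files.

```python
# Hypothetical warranty data: one entry per printer sold, giving the number of
# quality problems reported for that printer during the warranty period.
our_printers = [0, 0, 1, 0, 2, 0, 0, 3, 0, 1]         # stand-in for our records
competitor_printers = [0, 1, 0, 0, 0, 2, 0, 0, 1, 0]  # stand-in for theirs

def problems_per_printer(claims):
    """Total reported problems divided by total printers sold."""
    return sum(claims) / len(claims)

print(f"Our average:        {problems_per_printer(our_printers):.2f} problems per printer")
print(f"Competitor average: {problems_per_printer(competitor_printers):.2f} problems per printer")
```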

That was easy. Or maybe not. Suppose ten guys are sitting on bar stools, each earning a modest income, so that the mean and the median income in the bar are both modest as well. Then Bill Gates walks into the bar with a talking parrot perched on his shoulder. The parrot has nothing to do with the example, but it kind of spices things up. When Gates sits down on the eleventh stool, the mean income in the bar soars; obviously none of the original ten drinkers is any richer (though it might be reasonable to expect Bill Gates to buy a round or two).

The sensitivity of the mean to outliers is why we should not gauge the economic health of the American middle class by looking at per capita income.

Because there has been explosive growth in incomes at the top end of the distribution—CEOs, hedge fund managers, and athletes like Derek Jeter—the average income in the United States could be heavily skewed by the megarich, making it look a lot like the bar stools with Bill Gates at the end.

The median is the point that divides a distribution in half, meaning that half of the observations lie above the median and half lie below. If there is an even number of observations, the median is the midpoint between the two middle observations.

If you literally envision lining up the bar patrons on stools in ascending order of their incomes, the income of the guy sitting on the sixth stool represents the median income for the group. If Warren Buffett comes in and sits down on the twelfth stool next to Bill Gates, the median still does not change.

Now back to the printers. In a frequency distribution of quality problems, the number of quality problems per printer is arrayed along the bottom, and the height of each bar represents the percentage of printers sold with that number of quality problems.

Because the distribution includes all possible quality outcomes, including zero defects, the proportions must sum to 1, or 100 percent. The distribution is slightly skewed to the right by the small number of printers with many reported quality defects. These outliers move the mean slightly rightward but have no impact on the median. With a few keystrokes, you get the result. Because the Kardashian marriage is getting monotonous, and because you are intrigued by this finding, you print a frequency distribution for your own quality problems.

These outliers inflate the mean but not the median. More important from a production standpoint, you do not need to retool the whole manufacturing process; you need only figure out where the egregiously low-quality printers are coming from and fix that.
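A tiny sketch of the point with invented defect counts: a couple of lemons drag the mean up while the median stays put.

```python
import statistics

# Problems per printer for a hypothetical batch; most are fine, two are lemons.
problems = [0, 0, 0, 1, 0, 1, 0, 0, 12, 15]

print(f"Mean:   {statistics.mean(problems):.1f}")    # pulled up by the two lemons
print(f"Median: {statistics.median(problems):.1f}")  # unaffected by them

# Remove the lemons and the mean falls sharply; the median barely moves.
without_lemons = [x for x in problems if x < 10]
print(f"Mean without lemons:   {statistics.mean(without_lemons):.1f}")
print(f"Median without lemons: {statistics.median(without_lemons):.1f}")
```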

Meanwhile, the median has some useful relatives. The distribution can be further divided into quarters, or quartiles. The first quartile consists of the bottom 25 percent of the observations; the second quartile consists of the next 25 percent of the observations; and so on.

Or the distribution can be divided into deciles, each with 10 percent of the observations. If your income is in the top decile of the American income distribution, you would be earning more than 90 percent of your fellow workers. We can go even further and divide the distribution into hundredths, or percentiles. The benefit of these kinds of descriptive statistics is that they describe where a particular observation lies compared with everyone else.
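Here is a short sketch of the idea with simulated incomes; the distribution and the 150,000-dollar income being located within it are both invented.

```python
import numpy as np

rng = np.random.default_rng(2)
incomes = rng.lognormal(mean=10.8, sigma=0.7, size=100_000)  # simulated, not real, incomes

my_income = 150_000

# Decile and percentile cutoffs describe the whole distribution...
top_decile_cutoff = np.percentile(incomes, 90)

# ...while a percentile rank describes where one observation sits within it.
percentile_rank = (incomes < my_income).mean() * 100

print(f"90th percentile (top-decile cutoff): {top_decile_cutoff:,.0f}")
print(f"An income of {my_income:,} beats about {percentile_rank:.0f}% of the group")
```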

If I tell you that your child scored in the 3rd percentile on a reading comprehension test, you should know immediately that the family should be logging more time at the library. If the test was easy, then most test takers will have a high number of answers correct, but your child will have fewer correct than most of the others. Here is a good point to introduce some useful terminology.

If I shoot 83 for eighteen holes of golf, that is an absolute figure. I may do that on a day that is 58 degrees, which is also an absolute figure. Absolute figures can usually be interpreted without any context or additional information. The exception might be if the conditions are particularly awful, or if the course is especially difficult or easy. If I place ninth in the golf tournament, that is a relative statistic.

Most standardized tests produce results that have meaning only as a relative statistic. A raw score of 43 correct answers on an Illinois third-grade math test means little on its own. But when I convert it to a percentile—meaning that I put that raw score into a distribution with the math scores for all other Illinois third graders—then it acquires a great deal of meaning. If 43 correct answers falls into the 83rd percentile, then this student is doing better than most of his peers statewide.

In this case, the percentile (the relative score) is more meaningful than the number of correct answers (the absolute score). Another statistic that can help us describe what might otherwise be a jumble of numbers is the standard deviation, which is a measure of how dispersed the data are from their mean. In other words, how spread out are the observations? Suppose I collected data on the weights of people on an airplane headed for Boston, and I also collected the weights of a sample of qualifiers for the Boston Marathon.

Now assume that the mean weight for both groups is roughly the same. Anyone who has been squeezed into a row on a crowded flight, fighting for the armrest, knows that many people on a typical commercial flight weigh well more than that average.

But you may recall from those same unpleasant, overcrowded flights that there were lots of crying babies and poorly behaved children, all of whom have enormous lung capacity but not much mass. When it comes to calculating the average weight on the flight, the heft of the football players on either side of your middle seat is likely offset by the tiny screaming infant across the row and the six-year-old kicking the back of your seat from the row behind.

On the basis of the descriptive tools introduced so far, the weights of the airline passengers and the marathoners are nearly identical.

My eight-year-old son might point out that the marathon runners look like they all weigh the same amount, while the airline passengers have some tiny people and some bizarrely large people. The standard deviation is the descriptive statistic that allows us to assign a single number to this dispersion around the mean. The formulas for calculating the standard deviation and the variance another common measure of dispersion from which the standard deviation is derived are included in an appendix at the end of the chapter.
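For a feel for how the calculation behaves, here is a minimal sketch with invented weights for a dozen airline passengers and a dozen marathon qualifiers.

```python
import statistics

# Invented weights (pounds). Similar means, very different spreads.
passengers = [25, 40, 95, 130, 145, 155, 160, 170, 185, 210, 250, 275]
marathoners = [132, 138, 140, 145, 148, 150, 152, 155, 158, 160, 163, 169]

for label, weights in [("Passengers", passengers), ("Marathoners", marathoners)]:
    mean = statistics.mean(weights)
    sd = statistics.pstdev(weights)  # population standard deviation, as in the chapter appendix
    print(f"{label:11s} mean = {mean:5.1f} lb, standard deviation = {sd:5.1f} lb")
```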

Your doctor draws blood, and a few days later her assistant leaves a message on your answering machine to inform you of your HCb2 count (a fictitious blood chemical). You rush to the Internet and look up the mean HCb2 count for a person your age (the median is about the same), and your count is nowhere close.

Holy crap! You might take up skydiving or try to write a novel very fast. None of these things may be necessary, and the e-mail to your boss could turn out very badly. But how could that be? What the heck does that mean?

There is natural variation in the HCb2 count, as there is with most biological phenomena. Whatever the mean count for the fake chemical might be, plenty of healthy people have counts that are higher or lower. The danger arises only when the HCb2 count gets excessively high or low. For many typical distributions of data, a high proportion of the observations lie within one standard deviation of the mean (meaning that they are in the range from one standard deviation below the mean to one standard deviation above the mean).

To illustrate with a simple example, the mean height for American adult men is 5 feet 10 inches. The standard deviation is roughly 3 inches. A high proportion of adult men are between 5 feet 7 inches and 6 feet 1 inch. Or, to put it slightly differently, any man in this height range would not be considered abnormally short or tall. Which brings us back to your troubling HCb2 results. Of course, far fewer observations lie two standard deviations from the mean, and fewer still lie three or four standard deviations away.

In the case of height, an American man who is three standard deviations above average in height would be 6 feet 7 inches or taller. Some distributions are more dispersed than others.

Hence, the standard deviation of the weights of the airline passengers will be higher than the standard deviation of the weights of the marathon runners. Once we know the mean and standard deviation for any collection of data, we have some serious intellectual traction. For example, suppose I tell you that the mean score on the SAT math test is 500 with a standard deviation of 100. As with height, the bulk of students taking the test will be within one standard deviation of the mean, or between 400 and 600. How many students do you think score 700 or higher?
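Assuming the normal, mean-500, standard-deviation-100 scale just described, a rough calculation of how rare a 700 is:

```python
import math

mean, sd = 500, 100          # SAT-style scale assumed in the text
score = 700

z = (score - mean) / sd      # 700 is two standard deviations above the mean

# For a normal distribution, P(Z > z) = 0.5 * erfc(z / sqrt(2)).
share_above = 0.5 * math.erfc(z / math.sqrt(2))
print(f"Roughly {share_above:.1%} of test takers score {score} or higher")
```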

Probably not very many, since that is more than two standard deviations above the mean. Data that are distributed normally are symmetrical around their mean in a bell shape that will look familiar to you. The normal distribution describes many common phenomena. Imagine a frequency distribution describing popcorn popping on a stove top.

Some kernels start to pop early, maybe one or two pops per second; after ten or fifteen seconds, the kernels are exploding frenetically. Then gradually the number of kernels popping per second fades away at roughly the same rate at which the popping began. The heights of American men are distributed more or less normally, meaning that they are roughly symmetrical around the mean of 5 feet 10 inches. Each SAT test is specifically designed to produce a normal distribution of scores with a mean of 500 and a standard deviation of 100. The beauty of the normal distribution—its Michael Jordan power, finesse, and elegance—comes from the fact that we know by definition exactly what proportion of the observations in a normal distribution lie within one standard deviation of the mean: 68.2 percent. This may sound like trivia.

In fact, it is the foundation on which much of statistics is built. We will come back to this point in much greater depth later in the book. (In the bell-curve picture, each band represents one standard deviation.)

Descriptive statistics are often used to compare two figures or quantities. Those comparisons make sense because most of us recognize the scale of the units involved. Conversely, nine degrees is a significant temperature deviation in just about any climate at any time of year, so nine degrees above average makes for a day that is much hotter than usual.

Unless you know an awful lot about sodium (and the serving sizes for granola cereal), that statement is not going to be particularly informative. Should we be worried about Al? The easiest way to give meaning to these relative comparisons is by using percentages. Measuring change as a percentage gives us some sense of scale. You probably learned how to calculate percentages in fourth grade and will be tempted to skip the next few paragraphs.

Fair enough. But first do one simple exercise for me. The assistant manager marks down all merchandise by 25 percent. What is the final price of the dress? This is not merely a fun parlor trick that will win you applause and adulation at cocktail parties.

Percentages are useful—but also potentially confusing or even deceptive. The numerator (the part on the top of the fraction) gives us the size of the change in absolute terms; the denominator (the bottom of the fraction) is what puts this change in context by comparing it with our starting point.
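That formula fits in one line of code; the starting and ending values below are invented to show how much the denominator matters.

```python
def percentage_change(start, end):
    """Absolute change (numerator) relative to the starting point (denominator)."""
    return (end - start) / start * 100

# Invented example: the same $10,000 increase looks very different
# depending on the starting point.
print(f"{percentage_change(20_000, 30_000):.0f}% increase")    # 50%
print(f"{percentage_change(200_000, 210_000):.0f}% increase")  # 5%
```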

The point is that a percentage change always gives the value of some figure relative to something else. Therefore, we had better understand what that something else is. I once invested some money in a company that my college roommate started.

Since it was a private venture, there were no requirements as to what information had to be provided to shareholders.

A number of years went by without any information on the fate of my investment; my former roommate was fairly tight-lipped on the subject. Eventually word came that profits were up by a healthy percentage. There was no information on the size of those profits in absolute terms, meaning that I still had absolutely no idea how my investment was performing. Suppose that last year the firm earned 27 cents—essentially nothing. To be fair to my roommate, he eventually sold the company for hundreds of millions of dollars, earning me a very healthy return on my investment.

Since you have no idea how much I invested, you also have no idea how much money I made—which reinforces my point here very nicely! Let me make one additional distinction. Percentage change must not be confused with a change in percentage points. Rates are often expressed in percentages. The sales tax rate in Illinois is 6.25 percent. I pay my agent 15 percent of my book royalties. These rates are levied against some quantity, such as income in the case of the income tax rate.

Obviously the rates can go up or down; less intuitively, the changes in the rates can be described in vastly dissimilar ways. The best example of this was a recent change in the Illinois personal income tax, which was raised from 3 percent to 5 percent.
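The same rate change can be described in more than one way; here is a small sketch of the arithmetic using the Illinois figures just mentioned.

```python
old_rate = 3.0   # Illinois personal income tax, percent
new_rate = 5.0

change_in_points = new_rate - old_rate                   # one framing
percent_change = (new_rate - old_rate) / old_rate * 100  # the other framing

print(f"Increase of {change_in_points:.0f} percentage points")
print(f"Increase of {percent_change:.0f} percent")
```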

There are two ways to express this tax change, both of which are technically accurate. The Democrats, who engineered this tax increase, pointed out (correctly) that the state income tax rate was increased by 2 percentage points, from 3 percent to 5 percent. The Republicans pointed out (also correctly) that the state income tax had been raised by 67 percent.

Many phenomena defy perfect description with a single statistic. Suppose quarterback Aaron Rodgers throws for a lot of yards but no touchdowns. Meanwhile, Peyton Manning throws for meager yardage but three touchdowns.

Who played better? The NFL's passer rating, which combines completion percentage, yards per attempt, touchdowns, and interceptions, is an example of an index, which is a descriptive statistic made up of other descriptive statistics.

Once these different measures of performance are consolidated into a single number, that statistic can be used to make comparisons, such as ranking quarterbacks on a particular day, or even over a whole career.

If baseball had a similar index, then the question of the best player ever would be solved. Or would it? The advantage of any index is that it consolidates lots of complex information into a single number.

We can then rank things that otherwise defy simple comparison—anything from quarterbacks to colleges to beauty pageant contestants. In the Miss America pageant, the overall winner is a combination of five separate competitions: personal interview, swimsuit, evening wear, talent, and onstage question.

Miss Congeniality is voted on separately by the participants themselves. Alas, the disadvantage of any index is that it consolidates lots of complex information into a single number.

There are countless ways to do that; each has the potential to produce a different outcome. Malcolm Gladwell makes this point brilliantly in a New Yorker piece critiquing our compelling need to rank things.

Using a formula that includes twenty-one different variables, Car and Driver ranked the Porsche number one. If styling is given more weight in the overall ranking (25 percent), then the Lotus comes out on top.

But wait. Gladwell also points out that the sticker price of the car gets relatively little weight in the Car and Driver formula. If value is weighted more heavily (so that the ranking is based equally on price, exterior styling, and vehicle characteristics), the Chevy Corvette is ranked number one. Any index is highly sensitive to the descriptive statistics that are cobbled together to build it, and to the weight given to each of those components. As a result, indices range from useful but imperfect tools to complete charades.
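To see how much the weights matter, here is a small sketch with three hypothetical cars and invented component scores (not Car and Driver's actual data or formula).

```python
# Invented 0-100 scores for three hypothetical cars on three components.
cars = {
    "Car A": {"vehicle": 95, "styling": 80, "value": 60},
    "Car B": {"vehicle": 85, "styling": 95, "value": 70},
    "Car C": {"vehicle": 80, "styling": 75, "value": 98},
}

def rank(weights):
    """Combine component scores into a single index score per car."""
    scores = {
        name: sum(weights[k] * components[k] for k in weights)
        for name, components in cars.items()
    }
    return max(scores, key=scores.get), scores

weighting_schemes = [
    ("Performance-heavy", {"vehicle": 0.60, "styling": 0.25, "value": 0.15}),
    ("Styling-heavy",     {"vehicle": 0.40, "styling": 0.45, "value": 0.15}),
    ("Equal weights",     {"vehicle": 1 / 3, "styling": 1 / 3, "value": 1 / 3}),
]

for label, weights in weighting_schemes:
    winner, scores = rank(weights)
    pretty = ", ".join(f"{name}={score:.1f}" for name, score in scores.items())
    print(f"{label:17s} winner: {winner}   ({pretty})")
```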

The Human Development Index, or HDI, was created as a measure of economic well-being that is broader than income alone. The HDI uses income as one of its components but also includes measures of life expectancy and educational attainment. The United States ranks eleventh in the world in terms of per capita economic output (behind several oil-rich nations like Qatar, Brunei, and Kuwait) but fourth in the world in human development. The HDI provides a handy and reasonably accurate snapshot of living standards around the globe.

Descriptive statistics give us insight into phenomena that we care about. In that spirit, we can return to the questions posed at the beginning of the chapter. Who is the best baseball player of all time? More important for the purposes of this chapter, what descriptive statistics would be most helpful in answering that question? According to Steve Moyer, president of Baseball Info Solutions, the three most valuable statistics (other than age) for evaluating any player who is not a pitcher would be the following:

1. On-base percentage (OBP), sometimes called the on-base average (OBA): measures the proportion of the time that a player reaches base successfully, including walks (which are not counted in the batting average).

2. Slugging percentage (SLG): measures power hitting by calculating the total bases reached per at bat. A single counts as 1, a double is 2, a triple is 3, and a home run is 4 (a quick sketch of both calculations appears below).

3. At bats (AB): puts the above in context. Any mope can have impressive statistics for a game or two.
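As promised in the list above, here is a quick sketch of the OBP and SLG calculations, using the simplified definitions given there and an invented stat line; the official on-base formula also counts hit-by-pitches and sacrifice flies.

```python
# Invented season line for a hypothetical player.
at_bats = 550
hits = 170
walks = 60
doubles, triples, home_runs = 35, 3, 25
singles = hits - doubles - triples - home_runs

# Simplified on-base percentage: times on base (hits plus walks) per plate opportunity.
obp = (hits + walks) / (at_bats + walks)

# Slugging percentage: total bases per at bat.
total_bases = singles * 1 + doubles * 2 + triples * 3 + home_runs * 4
slg = total_bases / at_bats

print(f"Batting average: {hits / at_bats:.3f}")
print(f"On-base pct:     {obp:.3f}")
print(f"Slugging pct:    {slg:.3f}")
```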

Babe Ruth still holds the Major League career record for slugging percentage, at .690. As for the economic health of the American middle class, again I deferred to the experts. Both gave variations on the same basic answer: track what is happening to the median wage. They also recommended examining changes to wages at the 25th and 75th percentiles (which can reasonably be interpreted as the upper and lower bounds for the middle class).

One more distinction is in order. When assessing economic health, we can examine income or wages. They are not the same thing. A wage is what we are paid for some fixed amount of labor, such as an hourly or weekly wage. Income is the sum of all payments from different sources.

If workers take a second job or work more hours, their income can go up without a change in the wage. For that matter, income can go up even if the wage is falling, provided a worker logs enough hours on the job. The wage is a less ambiguous measure of how Americans are being compensated for the work they do; the higher the wage, the more workers take home for every hour on the job. Having said all that, here is a graph of American wages over the past three decades.

A variety of conclusions can be drawn from these data. Workers at the 90th percentile have done much, much better. Descriptive statistics help to frame the issue. What we do about it, if anything, is an ideological and political question.

The variance, from which the standard deviation is derived, also measures how dispersed the data are around their mean; the twist is that the difference between each observation and the mean is squared, and the sum of those squared terms is then divided by the number of observations. Specifically, for observations x1, x2, . . . , xn with mean x̄, the variance is [(x1 − x̄)² + (x2 − x̄)² + · · · + (xn − x̄)²] / n. Because the difference between each term and the mean is squared, the formula for calculating variance puts particular weight on observations that lie far from the mean, or outliers, as the following example of student heights illustrates.

In this case, the deviation represents the number of inches between the height of the individual and the mean. Both groups of students have a mean height of 70 inches. The heights of students in both groups also differ from the mean by the same total number of inches; by that measure of dispersion, the two distributions are identical.

However, the variance for Group 2 is higher because of the weight given in the variance formula to values that lie particularly far from the mean—Sahar and Narciso in this case. Variance is rarely used as a descriptive statistic on its own. Instead, the variance is most useful as a step toward calculating the standard deviation of a distribution, which is a more intuitive tool as a descriptive statistic.
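Here is a small sketch of both calculations, using invented heights for two groups of five students that mirror the comparison above: same mean, same total deviation from it, but different spreads.

```python
import math

# Invented heights (inches). Both groups: mean = 70, total absolute deviation = 6 inches.
group_1 = [68, 69, 70, 71, 72]   # deviations: 2, 1, 0, 1, 2
group_2 = [67, 70, 70, 70, 73]   # deviations: 3, 0, 0, 0, 3

def variance(xs):
    """Average of squared deviations from the mean (the formula in this appendix)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

for name, xs in [("Group 1", group_1), ("Group 2", group_2)]:
    var = variance(xs)
    print(f"{name}: variance = {var:.1f}, standard deviation = {math.sqrt(var):.2f}")
```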

The standard deviation for a set of observations is the square root of the variance: for any set of n observations x1, x2, . . . , xn with mean x̄, the standard deviation is the square root of [(x1 − x̄)² + (x2 − x̄)² + · · · + (xn − x̄)²] / n.

Both the perpetually drunk employees and the random missing pieces on the assembly line appear to have compromised the quality of the printers being produced there. Go figure! And so it is with statistics. Although the field of statistics is rooted in mathematics, and mathematics is exact, the use of statistics to describe complex phenomena is not exact. That leaves plenty of room for shading the truth.
