Showing posts with label Statistics. Show all posts

Thursday, May 31, 2012

Margin of Error -- What Talking Heads Don't Know

This could apply in the benefits or compensation world. Why? Because we look at lots of data. And, when we look at data, we often do sampling, whether we realize it or not.

But, this is my turn to rant. Why? Because it's been a while since I last ranted.

It is mind-boggling to me that most of the talking heads (newscasters if you prefer) on TV have college degrees, or for that matter, high school diplomas. They don't understand basic math.

From one of them today: "Scott Walker is ahead in all the polls we are seeing, but they are all within the margin of error, so the race is too close to call."

Think about it. If you take one poll and the margin of error is, say, plus or minus 5% and Walker leads by 51.8 to 48.2, then that poll is within the margin of error.

Note that the margin of error is based on the sample size not the whim of the pollster. 

Now, suppose that you have 20 polls and Walker leads in each one and in each one, his lead is near the outer limits of the margin of error. What does that tell you?

Well, the sample size has grown. If the sample size in poll #1 is n1 and the sample size in poll #2 is n2, etc., then the total sample size, assuming that no single person was surveyed twice among the twenty polls, is n1 + n2 + n3 + ... + n20. The margin of error is inversely proportional to the square root of the sample size. Or, in lay terms, every time the sample size gets multiplied by 4, the margin of error gets cut in half.

So, for simplicity, if each of n1 through n20 is 192, then the margin of error for each poll (at the usual 95% confidence level) is roughly 7%. But, if there are no duplicates among the 20 samples, then the total sample size of the 20 is 3840, leading to a margin of error in the vicinity of 1.5%.
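For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes a simple random sample, a 95% confidence level (z = 1.96), and a race near 50/50. None of that is stated by any pollster here, and the function name is just mine.

```python
import math

Z_95 = 1.96  # assumed 95% confidence level; no poll here states one


def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


single_poll = margin_of_error(192)   # one small poll
pooled = margin_of_error(20 * 192)   # twenty such polls, no one surveyed twice
ratio = margin_of_error(4 * 192) / single_poll  # effect of quadrupling the sample

print(f"one poll of 192:       +/- {single_poll:.1%}")
print(f"20 polls, 3840 total:  +/- {pooled:.1%}")
print(f"4x the sample multiplies the margin by {ratio:.2f}")
```

Pooling polls like this is itself a simplification. It treats all twenty as draws from the same population using the same questions at the same time.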

So, if Walker is ahead by 3% in each of the 20 small polls, then each poll shows that the race is within the margin of error. However, taken together, the race is well outside the margin of error and they are predicting that Walker will win.

Duh!

Wednesday, May 2, 2012

Statistics ... Disraeli was Correct

Many Americans attribute the quote to Mark Twain. Twain, on the other hand, attributed it to former British Prime Minister, Benjamin Disraeli. Disraeli was purported to have written (or spoken): "There are three kinds of lies: lies, damned lies, and statistics."

Let's not worry about who actually coined the phrase. Even so, what does it have to do with our usual subject matter here? In the benefits and compensation arena (and many others), people love statistical data. They seem to love to create it, to cite it, to reference it, and to make decisions from it.

I beg them not to. Consider such data, but consider it as a point of reference. Use it intelligently.

I saw a survey last week or the week before that asked three questions of its respondents. Roughly recalling the questions, respondents were asked whether they were going to have enough money in retirement to make it to ages 75, 85, and 95 respectively. Less than half said they would have enough to live happily until age 85. The article reporting on the survey then told its readers that most Americans won't have enough money for their life expectancies.

Excuse me! Where did that come from? What made age 85 the life expectancy of any, let alone all, of the survey respondents? Were the respondents all the same age? Were they all the same gender? How many of them even know how much money they will need?

Just this morning, this summary of an article appeared in my inbox:
Fidelity Investments reported the average 401(k) balance in 401(k) plans it administers rose to $74,600 at the end of the first quarter, an increase of 8% from the end of the fourth quarter 2011. The first quarter balance also represents a 62% increase since the end of the first quarter 2009, often considered the low of the 2008-2009 market downturn, when the average balance was $46,200.
This is not to be construed as a condemnation of Fidelity. In fact, I feel quite certain that what they have said is correct. What concerns me is how others will use the data. Who will cite this information and how will they use it?

Let's consider what has happened between the end of the first quarter of 2009 and the end of the first quarter of 2012. I'm not looking at just investment performance and deferral behaviors. How about external factors?

  • After the severe market losses from 4th quarter 2007 through 1st quarter 2009, many people who had been considering retirement were forced to defer it. These would be almost exclusively in the older group of workers. As a group, we would suspect that these would be people with higher account balances who were not withdrawing their 401(k) money from Fidelity. Does this not skew the data point upward?
  • As more companies freeze defined benefit plans, many of them have increased their 401(k) match. The way I see it, this makes it more likely that participants will take full advantage of the match. So, their deferrals will increase and the matching money will increase. Does this not move the data point upward?
  • Does Fidelity administer the same plans that it did three years ago? I'm sure that many are the same, but not all of them. Vendors gain new clients and lose existing ones. They don't all have the same participant profiles. I don't know which way this moved Fidelity's data, but it certainly moved it.
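To see how much the mix of participants alone can move an average, here is a small sketch. Every number in it is invented for illustration; none of it is Fidelity's data. Nobody's balance changes between the two scenarios. The only difference is who is still in the plan.

```python
def average(balances):
    """Plain arithmetic mean of a list of account balances."""
    return sum(balances) / len(balances)


# Hypothetical participants (made-up balances, not real data)
younger = [20_000] * 6   # six younger workers with small balances
older = [100_000] * 4    # four older workers with large balances

# Scenario A: two older workers retire on schedule and take their money out
retired_as_planned = younger + older[:2]

# Scenario B: after a market crash, those two defer retirement and stay in the plan
retirement_deferred = younger + older

avg_a = average(retired_as_planned)
avg_b = average(retirement_deferred)

print(f"average if they retire: ${avg_a:,.0f}")
print(f"average if they stay:   ${avg_b:,.0f}")
print(f"difference from mix alone: {avg_b / avg_a - 1:.0%}")
```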
Let's go back to life expectancy. Let me ask you some questions. What is your life expectancy? Do you know? If you think you do, on what do you base it? Life expectancy from birth? From now? Is it based on IRS tables? Is it based on some other mortality tables? Newspaper articles? A guess? Does it consider your health? Does it consider the life spans of your ancestors? Does it consider your gender?

I am going to tell you something. Your life expectancy is a nice number, whatever it is. It may be the best estimate (at a point in time) of the age you will be when you die. That said, unless we are using a lot of rounding, you will not die at your life expectancy. That's right, if we are precise, you will not die at your life expectancy. 

If we were to choose a mortality table and calculate your life expectancy (or mine), it might tell you that your life expectancy is another 20 years (rounded). Or, if you are more precise, it might tell you that your life expectancy is another 20.3 years. Or 20.27 years. Or, 20.26637185943 years. 

Another table might generally increase those numbers by several months, or even years. Another one might shorten your life expectancy. In any event, given a table, your life expectancy would have your life ending on a particular day at a particular time. 

It's not going to happen.

But, people using statistical data cite life expectancy as if it were the most relevant point in time.
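Here is a rough sketch of why the single number hides so much. The death probabilities below are made up (1% at age 65, growing 10% a year); they are not the IRS tables or any real mortality table. I also assume that deaths fall mid-year on average. The point is only to compare the life expectancy with the chance of actually dying in that particular year.

```python
def death_probability(age):
    """Made-up chance of dying within the year at a given age. NOT a real table."""
    return min(1.0, 0.01 * 1.1 ** (age - 65))


def years_until_death_distribution(current_age, max_age=120):
    """Map k -> probability of dying during the k-th year from now (k = 0, 1, ...)."""
    alive = 1.0  # probability of still being alive at the start of the year
    dist = {}
    for age in range(current_age, max_age + 1):
        q = death_probability(age)
        dist[age - current_age] = alive * q
        alive *= 1 - q
    return dist


dist = years_until_death_distribution(65)

# Expected remaining years, assuming deaths occur mid-year on average
life_expectancy = sum((k + 0.5) * p for k, p in dist.items())
expected_year = int(life_expectancy)

print(f"remaining life expectancy: {life_expectancy:.2f} years")
print(f"chance of dying in that exact year: {dist[expected_year]:.1%}")
```

Swap in a different table and both numbers move, which is the same point made above about one table adding months or years and another taking them away.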

Statistics are very useful. They are especially useful when they are not misused.

Monday, April 4, 2011

Litigation and Statistics -- The Wal-Mart Case

I had planned to write a long piece by now about the long-awaited ruling in Dukes v Wal-Mart. Why am I so slow? Well, in this case, it's not me; it's our court system. The courts seem to have adopted a rhythm with which to hear cases that is reminiscent of the words of Oliver Wendell Holmes, the suggestions of Felix Frankfurter, and attributed by many to Earl Warren in Brown v Board of Education: "with all deliberate speed."

But that doesn't mean that I can't write about this case yet. There are interesting aspects, whether we have a decision or not. First, some background. If you are not at all familiar with this case, then you probably don't pay much attention to labor issues. A class of more than one million women seeks standing to sue Wal-Mart for gender discrimination. What is at issue before the Supreme Court, at least as I understand it, is not whether there has been gender discrimination, but whether the group of female employees and ex-employees of Wal-Mart has standing to sue.

That's for the lawyers. I'm not one of them.

But, I have read a number of purportedly learned articles on the subject. The more right-leaning of the bunch say that Wal-Mart certainly has the right to compensate its employees based on any reasonable criteria that it finds appropriate. The more left-leaning point to data showing that this class of women, on average, have been compensated $1.17 less per hour than their male counterparts.

I haven't read the pleadings. I haven't heard the oral arguments. So, I don't have all the information that the Supreme Court justices do. And, I'm not here to tell you whether this class should have standing to sue as a class, or whether, if they do have standing to sue, they should prevail on the merits. What I am here to tell you, and I've said this before, is that statistics, at least when used improperly or cherry-picked, lie.

Is a $1.17 per hour pay differential between the genders material? At Gibson, Dunn & Crutcher (the lead firm defending Wal-Mart in this case), it's probably not. Data that I have seen suggests that they pay their employees well enough that $1.17 per hour is just not meaningful. It's a bit like trying to find $500,000,000 in the US budget. Nobody seems to care.

But, similar data suggests that Wal-Mart does not pay its rank and file quite as well as Gibson, Dunn does. Supreme Court Justice Ruth Bader Ginsburg appears particularly passionate that the females of Wal-Mart have been wronged. One of the more conservative justices (I would call him out if I could remember which justice it was, but I don't) has said that $1.17 per hour just isn't material (my choice of words, not his).

I don't know, but I strongly suspect that neither of these justices is qualified to discuss statistics, sampling, margins of error or the like. Don't get me wrong, they are nine brilliant scholars -- even the ones that I personally don't think much of are smarter than the average bear (sorry, Yogi, I know that's really you). But, I have seen the way that they toss around statistics. They generally don't get it.

I suspect that if this case does ever go to trial on the merits, lots of said statistics will be tossed around. And, this particularly cynical observer (who? me?) thinks that all will be meant to deceive rather than to inform. Do you want to do a truly meaningful comparison? Then, compare similarly situated employees. As an example, consider all of the Wal-Mart employees who are between the ages of 40 and 50 who have been with the company as store clerks (my name, not theirs) for between 5 and 7 consecutive years in stores in affluent suburbs in the southeast United States. Now compare the pay of men and women. Is it the same, on average? Is it different? Is it different beyond a margin of error given the sample size?
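Here is a sketch of what that last question looks like in practice. The hourly wages are invented for one narrowly defined group of employees; they have nothing to do with the actual case record. The margin uses a normal approximation at an assumed 95% confidence level. With samples this small, a t-based multiplier would make it somewhat wider.

```python
import math
import statistics

# Hypothetical hourly pay for one group of similarly situated employees (invented numbers)
men = [11.10, 12.40, 10.90, 13.00, 11.75, 12.20, 10.60, 12.90]
women = [10.80, 11.90, 10.20, 12.60, 11.30, 11.00, 10.50, 12.10]


def difference_with_margin(a, b, z=1.96):
    """Difference in mean pay (a minus b) and its approximate margin of error."""
    diff = statistics.mean(a) - statistics.mean(b)
    # Standard error of a difference between two independent sample means
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return diff, z * se


diff, margin = difference_with_margin(men, women)
beyond_margin = abs(diff) > margin

print(f"difference in average pay: ${diff:.2f} per hour")
print(f"margin of error:           +/- ${margin:.2f}")
print("beyond the margin of error" if beyond_margin else "within the margin of error")
```

Run the same comparison group by group and the sample size in each group decides how big a gap has to be before it means anything.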

Methinks the courts don't care. They don't understand such mathematical minutiae and frankly, they are probably not worried about it.

So, the right will throw statistics around. The left will throw statistics around. Little of it will make any sense, and the courts will decide.

It just doesn't seem quite right to me.

Thursday, January 13, 2011

Don't Tell Me About Replacement Ratios

I read an article this morning and I'm not going to link you to it, but I am going to tell you that it made my teeth itch. In it, a head of research from an institutional investment house that is also a defined contribution recordkeeper said that their database shows that only X (they actually stated what X was) % of the participants in their database (the plans that they recordkeep) are on track to retire with a replacement ratio of at least 75%, which is what the head of research said is necessary for retirement.

Bullhonky!

Who said 75% is right? Doesn't that vary by pay level? How many of those people have inheritances? How many have defined benefit plans? How many have plans with other employers? How many have IRAs? How many have significant other assets?

I've gotten 'advice' in the mail from the recordkeeper of one of the 401(k) plans in which I have an account balance. It told me that I need to save more in the plan. Then, I looked at the amount that I was deferring for the year. Based on what were, at the time, my current elections (and I didn't change them during that year), I was on track to defer $22,000 to the plan ($16,500 regular contributions plus $5,500 catch-up contributions). The plan is not allowed to let me defer more than that. Period!

But, the wonderful formula used by the recordkeeper in its calculations showed that based on that plan only, I would never be able to retire if I continued to defer such a pittance. Did they know that I have vested benefits in defined benefit plans? Did they know that I have other investments? Did they know that I plan to win the lottery soon, and it will be a really big one?
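To put the complaint in numbers, here is a sketch of a replacement ratio computed two ways. All of the figures are hypothetical (they are not mine and not any recordkeeper's), and the function name is an assumption for illustration. The recordkeeper can see only the first line of the dictionary.

```python
def replacement_ratio(annual_retirement_income, final_pay):
    """Retirement income as a fraction of pay just before retirement."""
    return annual_retirement_income / final_pay


final_pay = 100_000  # hypothetical pay just before retirement

# Hypothetical annual retirement income by source
income_sources = {
    "this 401(k) plan": 30_000,
    "defined benefit plans": 25_000,
    "plans with other employers": 15_000,
    "IRAs and other assets": 20_000,
}

one_plan_only = replacement_ratio(income_sources["this 401(k) plan"], final_pay)
all_sources = replacement_ratio(sum(income_sources.values()), final_pay)

print(f"what the recordkeeper sees: {one_plan_only:.0%}")
print(f"what is actually there:     {all_sources:.0%}")
```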

I am supportive of communications to plan participants that encourage saving. But, this is ridiculous. The bad news is that most plan participants are not informed enough to understand why these communications may be wrong. The good news is that the communications, no matter how faulty, often cause plan participants to save more.

I guess my beef is with the improper usage of data and statistics. The fact is that many defined contribution consultants, communication experts, and even heads of research for these firms, do not have sufficient training in data handling and statistics to make these statements and to give this advice.

When I become Congress and President (likely not in this lifetime), I am going to pass a law that the misuse of data and statistics resulting in citing some outlandish outcome is a felony punishable by ... something that will be as annoying to the misuser as their misuse is to me. So, if they don't understand what they are saying, I wish they would keep it to themselves.