Peter’s Axioms on Quoting Statistics

Go to any article on same-sex marriage and the controversy around gay unions, and at some point someone is going to make a claim and back it up by citing an academic paper. Knowing that many of you do that, here are a few little “Stats Axioms” to help you out.

  1. Make sure the academic paper exists – Seriously folks, make sure it exists before you copy the reference off someone else’s website. It is HUGELY bad form to just copy a quote from someone and then find out 24 hours later that the paper doesn’t even exist, that someone else made up the reference a decade ago and that everyone has been copying that ever since.
  2. NEVER rely on secondary sources – Unless you read the paper itself (and not just one person’s report of it) you shouldn’t cite it, let alone copy someone else’s report of it. Make sure you have actually seen the real thing in your hands or on your screen. It is not a valid excuse to say “I just relied on what xyz said”.
  3. READ the primary source. Read it again. Read it again. Make notes.
  4. Represent the primary source ACCURATELY and fairly. Remember, other people will read the paper (even if you haven’t) and they will pick up any teensy tiny exaggeration you make or the details you pass over that actually contradict the point you are trying to make.
  5. If the paper is part of a series of papers, READ THEM ALL. Read them all again. Longitudinal Cohort studies often find different results as the research progresses, results which might change those initially reported or clarify other findings.
  6. If you are using the research to claim a comparison between group A and group B, MAKE SURE the research actually covers group A and group B. Yes, you might find something bad about Group A, but if another bit of research says that Group B is just as bad on this particular issue, you’re going to look very silly for having claimed that the original research showed how heinous Group A was.
  7. NEVER FORGET that there are people out there who can calculate standard deviations in their head. Do you even know what a standard deviation is? Are you prepared to defend dismal sample sizes when challenged? Did you even read the paper to find out what the sample size was? I have sometimes dumped papers rather than citing from them because once I examined the sample sizes I realised the confidence intervals were so wide as to be meaningless.
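To see why small samples give confidence intervals too wide to be useful, here is a minimal Python sketch. It assumes a simple random sample and the usual normal approximation for a proportion; the function name `margin_of_error` and the sample sizes are made up purely for illustration.

```python
import math

Z_95 = 1.96  # standard normal critical value for a two-sided 95% interval


def margin_of_error(p_hat: float, n: int, z: float = Z_95) -> float:
    """Half-width of a normal-approximation interval for a sample proportion.

    Assumes a simple random sample; real surveys with weighting or
    clustering will usually have wider intervals than this.
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


if __name__ == "__main__":
    # Illustrative sample sizes only, not taken from any real paper.
    for n in (25, 100, 1000):
        moe = margin_of_error(0.5, n)
        print(f"n={n:5d}: 50% +/- {moe * 100:.1f} percentage points")
```

With only 25 respondents, a reported 50% could plausibly be anywhere from roughly 30% to 70%, which is the kind of interval that makes a finding not worth citing.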

There you go. If you follow these simple rules then you’re going to get 4 or 5 out of 5 every time on my handy Stats Watch ranking! If you don’t, you’re likely to get torn apart as we all see that you didn’t comprehend what the research actually told us and (and this is the absolute worst crime in using statistics) you simply manipulated the figures to fit your bias and desired outcomes. Let the reader understand.

Any others anybody would like to contribute?

4 Comments on “Peter’s Axioms on Quoting Statistics”

  1. Thanks, Peter, for this very necessary guide to how to review, compare, use and quote from research that includes statistical findings. I’m a development economist and if I draw conclusions in my profession that are not shown by my research I expect to get slammed. There should be no difference in our Christian lives and the arguments that we make. We should be just as rigorous.

  2. Peter, that sounds like excellent advice to me. I speak as someone who has trouble with figures of any kind. By the way, what exactly is a confidence interval? I’ve seen that expression before, but I have no idea what it means.

    • A confidence interval is a range that, assuming a particular distribution of a value (go and google “normal distribution” to see one example of a standard assumed distribution), says that we are x% certain the true value lies between the boundaries indicated.

      For example, you might report a finding as: 4.5, CI 0.25, 95%. What this means is that your sample of the overall population had an average value of 4.5, and even though that is probably not the same as the value you would get if you asked absolutely everybody the same question (as opposed to just asking the sample), you are 95% certain that the true value for the whole population lies between 4.25 and 4.75 (your sampled value + or – the CI value).

      95% confidence is the “industry standard” for statistics, though in a lot of the work I do we often use 99% for absolute conservatism.
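As a rough illustration of how a figure like “4.5, CI 0.25, 95%” is produced, here is a minimal Python sketch. The data are invented, the function name `mean_confidence_interval` is hypothetical, and it assumes a normal approximation (for a sample this small a t-distribution would be the more careful choice).

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Return (mean, half_width) for the sample mean.

    Uses a normal approximation: half_width = z * standard error.
    This is an assumption for illustration; small samples usually
    call for a t-distribution instead.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95%.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean, z * std_err


if __name__ == "__main__":
    # Made-up survey scores, purely for illustration.
    sample = [4.1, 4.8, 4.4, 5.0, 4.3, 4.6, 4.7, 4.2, 4.9, 4.5]
    for level in (0.95, 0.99):
        mean, half = mean_confidence_interval(sample, level)
        print(f"{level:.0%}: {mean:.2f} +/- {half:.2f}")
```

Running it shows the trade-off mentioned above: asking for 99% rather than 95% confidence widens the interval around the same sample mean.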

      Confidence intervals are often used to test one distribution against another. For example, in the work I’m doing right at this moment, a previous person has taken a full population and used 70% of it to build a statistical model. That model produces a particular value for field A and with that we have a confidence interval. I then take a different random 70% from the same population and replicate his work. As long as I get a value for field A that is within the previous model’s field A confidence interval I am pretty certain (95% certain in fact) that I am replicating his work EVEN THOUGH I didn’t get the exact same answer.

      This is often a fundamental misunderstanding when lay-statisticians try
      to interpret two different surveys. Just because one doesn’t get the
      same result as the other doesn’t mean that one is wrong and one is right
      – rather, we need to understand the limits of just sampling a large
      population, and measures like confidence intervals help us to do that.

      Another example – three different polling firms issue opinion poll
      results on the same day with different results. Are any of them
      incorrect? Not necessarily as all the results may be within the
      confidence intervals of each other. Often you will hear polling firms
      talk about “Margins of Error”, and nine times out of ten these are 95%
      confidence intervals.
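The polling example can be sketched in a few lines of Python. The firm names, shares and sample sizes below are invented, the helper names are hypothetical, and the margin of error assumes a simple random sample with a normal approximation. Checking whether intervals overlap is only a rough screen, not a formal significance test.

```python
import math
from itertools import combinations

Z_95 = 1.96  # critical value for a 95% margin of error


def poll_interval(share, n, z=Z_95):
    """Return (low, high) for a poll share under a normal approximation."""
    moe = z * math.sqrt(share * (1.0 - share) / n)
    return share - moe, share + moe


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Hypothetical same-day polls: (share in favour, sample size).
    polls = {
        "Firm A": (0.47, 1000),
        "Firm B": (0.50, 1000),
        "Firm C": (0.52, 1000),
    }
    intervals = {name: poll_interval(s, n) for name, (s, n) in polls.items()}
    for (name1, iv1), (name2, iv2) in combinations(intervals.items(), 2):
        verdict = "overlap" if intervals_overlap(iv1, iv2) else "do not overlap"
        print(f"{name1} and {name2} {verdict}")
```

With about 1,000 respondents each, the margin of error is roughly three points either way, so headline figures of 47%, 50% and 52% are all consistent with one another.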

      I’ll be writing a bit about surveys on public attitudes to same-sex
      marriage addressing these issues once we get a few more of them.
