Scientists, Data Scientists And Significance
Written by Mike James
Monday, 15 April 2019
Misusing Significance

So the procedure for significance testing is:

- Decide on the null hypothesis and the alternative you want to detect.
- Before collecting any data, fix the rejection region, which determines the significance of the experiment, and work out its power against the alternative.
- Perform the experiment exactly as designed and reject the null hypothesis only if the result falls in the rejection region.
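To make this concrete, here is a minimal sketch in Python of how significance and power describe the design of the coin-toss experiment rather than any one run of it. It is not code from the article; the toss count, the alternative bias and the use of SciPy's binomial distribution are illustrative assumptions. The rejection region is fixed before any data is collected, the significance is the chance of landing in it when the coin is fair, and the power is the chance of landing in it when the coin has the assumed bias.

```python
# A sketch, not the article's own calculation: significance and power of a
# two-sided coin-toss design. The values of n and p_alt are illustrative.
from scipy.stats import binom

n = 50        # number of tosses, fixed before the experiment
p_null = 0.5  # null hypothesis: the coin is fair
p_alt = 0.75  # a bias we would like the experiment to detect
alpha = 0.05  # target significance level

# Choose the smallest symmetric rejection region (heads >= k or heads <= n - k)
# whose probability under the fair coin is at most alpha.
for k in range(n // 2 + 1, n + 1):
    sig = binom.sf(k - 1, n, p_null) + binom.cdf(n - k, n, p_null)
    if sig <= alpha:
        break

# Power: probability of landing in that same region if the coin is biased.
power = binom.sf(k - 1, n, p_alt) + binom.cdf(n - k, n, p_alt)

print(f"reject the fair-coin hypothesis if heads >= {k} or heads <= {n - k}")
print(f"significance of the design: {sig:.4f}")
print(f"power against a bias of {p_alt}: {power:.4f}")
```

Both numbers are properties of the design - they exist before a single coin is tossed, and they only mean what they claim if the experiment is then carried out exactly as planned.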
Once again, note that it is the experiment that is quantified in terms of significance and power, not a particular realization of the experiment. It is important that the experiment is performed in exactly the way that would produce the calculated probabilities if it were repeated. For example, if you selected the best of three lots of 50 tosses, then the repeated experiment would not be described by the quoted significance and power.

A common distortion of the procedure is to do the experiment and then work out the significance. If it comes out better than 0.05 you quote the result as significant at the 5% level; if it comes out better than 0.01 you quote it as significant at the 1% level. This is not an experiment with a significance of 0.01, simply because if you repeat the procedure you will erroneously reject the null hypothesis 5 times in 100 - the "extra" significance is spurious.

A much bigger problem is the repeated experiment. If you are using experiments that have a significance of 5%, then repeating the experiment 100 times you should expect to see five significant results purely by chance. I was once asked why a ten-by-ten correlation matrix always contained a handful of good, significant correlations (a short simulation of this effect is sketched below). When I explained why this was always the case, I was told that the researcher was going to forget what he had just discovered and that I was never to repeat it. Yes, measuring lots of things and being surprised at a handful of significant results is an important experimental tool. If repeated attempts at finding something significant were replaced by something more reliable, the number of papers in many subjects would drop to a trickle. This is a prime cause of the irreproducibility of results: a repeat generally finds the same number of significant results, just a different set.

Another really big problem is that most researchers don't quote any sort of power figure for their results. The reason is that in many cases the sample sizes are so small that the power is very low - many studies have a power below 0.5. For example, if you toss the coin only 10 times, the power to detect a bias as large as 0.25 or 0.75 is just 0.5. That means half of such experiments will fail to detect quite a sizable bias. There are plenty of real-world cases where null results are very likely due to a lack of power, and when it comes to, say, comparing the negative effects of drugs this can be lethal. Whenever I have tried to explain this, with the advice "you need more data", I have always been met with the response "more data is impossible; what can we do with what we have?" There are many subjects that would dry up if power estimates were made mandatory.

Not Significant Enough

The suggestions for cleaning up significance range from getting researchers to rephrase their conclusions to estimating confidence intervals. All I can say to this is that if significance testing is misunderstood, confidence intervals are an even deeper mystery. Please don't go there. So what is the solution? There is one, but many disciplines will simply be unable to accept it. Consider for a moment physics - often the standard by which scientific procedure is judged. When the apple fell on Newton's head, he didn't have to consider probability. The apple fell to earth with a probability so close to 1 that it wasn't even worth considering.
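Before following that thought into physics, the correlation-matrix effect described above is easy to demonstrate. The sketch below is a minimal illustration rather than anything from the original exchange: it fills ten columns with pure noise, tests all 45 distinct pairs for correlation, and lists the pairs that come out "significant" at the 5% level purely by chance. The sample size, seed and use of SciPy's pearsonr are arbitrary choices.

```python
# A sketch: spurious "significant" correlations among columns of pure noise.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
data = rng.normal(size=(200, 10))  # 200 observations of 10 unrelated variables

hits = []
for i in range(10):
    for j in range(i + 1, 10):
        r, p = pearsonr(data[:, i], data[:, j])
        if p < 0.05:               # "significant" at the 5% level
            hits.append((i, j, r, p))

print(f"{len(hits)} of 45 pairs significant at the 5% level, by chance alone")
for i, j, r, p in hits:
    print(f"  columns {i} and {j}: r = {r:.3f}, p = {p:.4f}")
```

With 45 tests at the 5% level you expect roughly two spurious hits on average, and re-running with a different seed typically flags a different set of pairs - the same number of "significant" results, just not the same ones.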
An old-fashioned physics experiment is so certain that many physicists don't know much about statistics - a bit of error estimation is quite enough. Putting this another way, physics doesn't work with 5% significance; it uses fantastically small significance levels. When physicists do have to take chance into account they keep to this high standard. For example, the discovery of the Higgs boson needed data that was five standard deviations away from the results predicted by a model in which it didn't exist - a rough significance level of 0.0000003, or 1 in 3.5 million. Think about that compared to 5 in 100. Particle physics in general requires a significance of 0.003 to announce evidence of a particle and 0.0000003 to announce a discovery.

If we want reproducible results we need to demand far more stringent significance levels and be aware of the power of the experiments we perform. Of course, data scientists have lots of data and could use significance levels similar to those of physics. The big problem is that, with publications decrying the use of significance and alternatives being suggested, the chances are that they will be seduced by lesser procedures that produce results just as irreproducible. Show me the data, show me the evidence, and make it good.

More Information

Statistical Inference in the 21st Century: A World Beyond p < 0.05

Related Articles

MINE - Finding Patterns in Big Data
How Not To Shuffle - The Knuth Fisher-Yates Algorithm
What's a Sample of Size One Worth?
Reading Your Way Into Big Data
What is a Data Scientist and How Do I Become One?
Data Science Course For Everyone Now Online
Coursera Offers MOOC-Based Master's in Data Science