
Bad science → bad headlines
Today’s #SSHOw (http://goo.gl/0eGrh) will discuss GMO corn and, in particular, the poorly done experiment/publication that sparked the media storm (Séralini et al., Food and Chemical Toxicology, 2012). http://goo.gl/5GOWa
Orac dissects the paper quite nicely, although I think he repeatedly says mice when he means rats. http://goo.gl/SSE2F
The two areas I will comment on are the tumor rat model and a statistical issue dealing with multiple comparisons.
Spontaneous Tumors in Rats
Orac points to a 1979 study in which 81% of Sprague-Dawley rats, the same strain used in the Séralini paper, developed tumors. When you use an animal model to look at tumor development, you need to know the prevalence of spontaneous tumors. The control group(s) have to be designed so that you can differentiate “normal” spontaneous tumor development in the control groups from tumor development in the experimental groups. Part of that design is having a sufficient number of animals for statistical power. Using previously published data, the authors could have done a power analysis to determine the proper sample size. For these types of studies, where you are not doing intricate daily or weekly interventions/experiments, i.e., just keeping the animals for long periods while monitoring mortality, it is not uncommon to use 3-5 times the number of animals in the Séralini study.
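To make the sample-size point concrete, here is a rough power calculation for comparing two tumor rates, using the standard normal approximation for a two-sample proportion test. The 50% vs. 80% rates below are made-up numbers for illustration, not figures from either study:

```python
# Sketch: power of a two-sample proportion z-test with only n = 10 per group.
# The 50% vs. 80% tumor rates are hypothetical, chosen only to illustrate
# how underpowered small groups are for this kind of endpoint.
from statistics import NormalDist
from math import sqrt, ceil

norm = NormalDist()

def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for proportions."""
    z_a = norm.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se0 = sqrt(2 * p_bar * (1 - p_bar))           # SE under the null (pooled)
    se1 = sqrt(p1 * (1 - p1) + p2 * (1 - p2))     # SE under the alternative
    z = (abs(p1 - p2) * sqrt(n) - z_a * se0) / se1
    return norm.cdf(z)

def n_per_group(p1, p2, power=0.80, alpha=0.05):
    """Animals per group needed to reach the requested power."""
    z_a = norm.inv_cdf(1 - alpha / 2)
    z_b = norm.inv_cdf(power)
    num = (z_a * sqrt(2 * ((p1 + p2) / 2) * (1 - (p1 + p2) / 2))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

print(power_two_proportions(0.50, 0.80, n=10))   # roughly 0.28: badly underpowered
print(n_per_group(0.50, 0.80))                   # roughly 39 per group, i.e., ~4x more
```

Even for a large 30-percentage-point difference, 10 rats per group gives well under 50% power, which is consistent with needing 3-5 times as many animals.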
As an example, I had the privilege of collaborating with Prof. Morris Pollard at Notre Dame, who developed the Lobund-Wistar rat model. Lobund-Wistar rats spontaneously develop prostate adenocarcinoma (PA) at a mean age of 26 months. In the publication describing the model, 19 of 72 L-W rats (26%) developed large PAs. Imagine if Prof. Pollard had used only 10 male rats, as in the Séralini study.
The Lobund-Wistar rat model of prostate cancer.
Pollard M.
J Cell Biochem Suppl. 1992;16H:84-8.
http://www.ncbi.nlm.nih.gov/pubmed/1289678
We use a transgenic mouse model of spontaneous ductal carcinoma in situ; invasive carcinomas develop in 100% of the mice. The mice are called SV40-Tag mice, derived from the C3(1)/Tag line. SV40-Tag stands for Simian virus 40 T-antigen, a trans-activating protein that is essential for viral gene expression.
Green et al Oncogene 2000
http://www.ncbi.nlm.nih.gov/pubmed/10713685
The point is you have to know the tumor prevalence in the rodent model you are using and plan the control groups accordingly.
Multiple Comparisons/Sample Size
The study mentions that they used Discriminant Analysis (DA) to partition groups, i.e., you lump all the variables (factors) together and use DA to tease out which factors influence the outcome, e.g., tumor size, biochemical markers, etc. In image analysis we use Linear Discriminant Analysis (LDA, http://en.wikipedia.org/wiki/Discriminant_analysis, http://goo.gl/oyNzh) to segment (classify) pixels. Say you want to automatically segment tumor from normal tissue using several image types of the same sample. You have thousands of pixels to work with, not 10. The method isn’t robust with 10 or fewer samples in each of the 20 groups used (note that I counted the male and female groups in the Séralini study separately). Also, in the context of machine learning, you have to have a training set. In my example, you give the program a set of pixels known to belong to each group before testing the pixels you want to classify.
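As a toy illustration of LDA used as a pixel classifier, here is a sketch with synthetic “pixel intensity” features standing in for real image channels (all of the numbers and class centers are made up):

```python
# Sketch: LDA as a pixel classifier. Two tissue classes, thousands of labeled
# "pixels" (the training set), two image channels each. The data are synthetic
# stand-ins for real multi-modal image intensities.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
tumor  = rng.normal(loc=[5.0, 2.0], scale=1.0, size=(2000, 2))  # class 1
normal = rng.normal(loc=[2.0, 5.0], scale=1.0, size=(2000, 2))  # class 0
X = np.vstack([tumor, normal])
y = np.array([1] * 2000 + [0] * 2000)   # labels for the training set

lda = LinearDiscriminantAnalysis().fit(X, y)   # train on pixels of known class
new_pixels = np.array([[5.2, 1.8], [1.9, 5.1]])
print(lda.predict(new_pixels))                 # classify unseen pixels
```

With 4,000 training samples the class boundary is stable; with 10 samples per group, as in the rat study, it would not be.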
A quick review of null hypothesis testing: a type I error occurs when the null hypothesis is true but is rejected, i.e., a false positive. A type II error occurs when the null hypothesis is false but is incorrectly accepted as true, a false negative. Remember, the null hypothesis can never be proven.
Here are examples from the Wikipedia article:
Suppose the treatment is a new way of teaching writing to students, and the control is the standard way of teaching writing. Students in the two groups can be compared in terms of grammar, spelling, organization, content, and so on. As more attributes are compared, it becomes more likely that the treatment and control groups will appear to differ on at least one attribute by random chance alone.
Suppose we consider the efficacy of a drug in terms of the reduction of any one of a number of disease symptoms. As more symptoms are considered, it becomes more likely that the drug will appear to be an improvement over existing drugs in terms of at least one symptom.
Suppose we consider the safety of a drug in terms of the occurrences of different types of side effects. As more types of side effects are considered, it becomes more likely that the new drug will appear to be less safe than existing drugs in terms of at least one side effect.
http://en.wikipedia.org/wiki/Multiple_comparisons
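The arithmetic behind all three examples is the same: with m independent comparisons each tested at α = 0.05, the chance of at least one false positive is 1 − (1 − α)^m, which grows quickly. A quick sketch:

```python
# Sketch: familywise error rate (FWER) for m independent tests at alpha = 0.05.
alpha = 0.05
for m in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** m       # P(at least one false positive)
    print(m, round(fwer, 3))          # rises from 0.05 toward ~0.64 at m = 20

# A Bonferroni correction tests each of the m comparisons at alpha / m,
# which pulls the familywise rate back near the nominal 0.05:
m = 20
print(round(1 - (1 - alpha / m) ** m, 3))
```

With the many endpoints measured in the Séralini study, an uncorrected α makes at least one “significant” finding nearly inevitable.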
Statistical power is the probability of avoiding a type II error, i.e., the probability of detecting a true effect (1 − β). Prof. Pollard’s study has a statistical power of around 97%, while the Séralini study is probably closer to 45%.
In Memoriam
In the process of digging up the study by Prof. Pollard, I realized he had passed away. I met him when he was in his late 80s, when I did an experiment for him at the University of Chicago. He was an impressive man and scientist. It is really a shame that he is most often known for his son.
Pollard worked at all of these things until his very last days. “I can’t imagine doing anything else,” he said recently. “I think if you are doing something meaningful and important and you stop doing it, you’ll always look back with regret.”
L-W Rat pic: http://goo.gl/hzP6P
For #ScienceSunday, curated by Allison Sekuler, Rajini Rao, Robby Bowles, and me.
September 30, 2012
Excellent review and commentary, Chad Haney . Looking forward to hearing more in a couple of hours.
RIP, Prof. Pollard.
September 30, 2012
Thanks a lot for summing up the whole thing Chad Haney
Looking forward to the SSHOw, if I am able to stay awake till then 🙁
September 30, 2012
Great intro! I’d love to watch the SSHOw but can’t – will I be able to watch it later?
September 30, 2012
Thanks Rajini Rao and Deeksha Tare I would have liked to do more. This is a real rush job. I’m a little under the weather and there just isn’t enough time to dig into these issues enough.
September 30, 2012
I would have loved to have you on the SSHOw, Richard Smith ! Your comments on the EuroTech share on this topic were spot on, in my opinion. Yes, you will be able to watch it later on YouTube.
September 30, 2012
Richard Smith it will be on YouTube afterwards. Please feel free to comment there. I’m sure there will be plenty of questions raised.
September 30, 2012
Thanks Rajini Rao, next time there’s a plant-related thing I’d be very glad to join in. I’ve always got plenty to say :). I’ve spent the whole week trying to do damage control on this Seralini thing – am exhausted!
September 30, 2012
Much appreciated, Richard Smith . Will call on you for sure. We will be trying to do damage control as well.
September 30, 2012
Thanks Gnotic Pasta
September 30, 2012
I think it’s worth noting about Discriminant Analysis that it — like many variants of the General Linear Model — does not establish causality. Rather, it associates attributes with outcomes in a correlative, and not necessarily explanatory manner.
Secondly, while I second, endorse and applaud the consideration of statistical power in these discussions, one must be careful to also consider the impact of too large a sample on statistical significance. In such a case, differences that are trivial in practical terms acquire a certain “allure” by turning up as statistically significant.
September 30, 2012
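To put that second point in numbers: with a large enough sample, even a trivially small effect comes out statistically significant. A quick sketch using a two-sample z-test (the 0.01-standard-deviation effect size is a made-up example):

```python
# Sketch: a trivially small effect (0.01 SD) becomes "significant" once the
# sample is huge, even though the practical difference is negligible.
from statistics import NormalDist
from math import sqrt

norm = NormalDist()

def p_two_sample(delta_sd, n):
    """Two-sided p-value for a standardized mean difference of delta_sd,
    with n subjects per group (known-variance z-test)."""
    z = delta_sd * sqrt(n / 2)
    return 2 * (1 - norm.cdf(z))

print(p_two_sample(0.01, 1_000))       # p ~ 0.82: nowhere near significant
print(p_two_sample(0.01, 1_000_000))   # p ~ 1e-12: "significant", still trivial
```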
Such an important point William! And I always think the most powerful way to make it is: http://xkcd.com/552/
September 30, 2012
Always appreciate your statistical power William McGarvey
September 30, 2012
Chad Haney And to think I’m still a β in disguise…
September 30, 2012
No William McGarvey you are alpha male.
September 30, 2012
Chad Haney That’s because I’m error-prone, as a rule.
September 30, 2012
At least you have confidence…. at intervals.
September 30, 2012
I take your point, sir — at least, your point estimate, anyway.
September 30, 2012
Not sure if this is precisely germane to your later discussion, but Seralini et al.’s results might have also fallen prey to the problem of not setting an “experiment-wise α” and thus the risk of “probability pyramiding” (http://books.google.com/books?id=NnnuOZogrrkC&pg=PA73&lpg=PA73&dq=%22probability+pyramiding%22&source=bl&ots=khKahq0k47&sig=Ji2uoBucJFKUqsWRw7PiZjQE45Y&hl=en&sa=X&ei=2IFoUOLkD_SH0QGto4CgCQ&ved=0CEYQ6AEwBQ#v=onepage&q=%22probability%20pyramiding%22&f=false) in doing multiple comparisons. I don’t have their methods section at hand to make a firm call on that point, I’ll concede.
September 30, 2012
Great point William McGarvey I’ll have a look after the #SSHOw
September 30, 2012
Chad Haney Great — sorry to be so late in arriving. If you do a string search on the phrase “probability pyramiding” , look for the T.X. Barber on Pitfalls. Break a leg, as is said — on with the show.
October 1, 2012
I hate these small-sample studies with inflated claims. Hate. Hate. Hate.
(OK, enough hate speech for today).
November 30, 2013
Thanks for that.
For those who don’t read English, here are roughly the same ideas in French: http://www.sciencepresse.qc.ca/actualite/2012/09/20/letude-anti-ogm-syndrome-recherche-unique