Categories: PAML, molecular evolution

One Debate and Seven PNAS Papers

Originally published: 2013-10-29

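I once wrote a post called "Gossip from the Human Genome Project: a war of words in PNAS", about the two human genome teams squabbling in PNAS. On one side, the Human Genome Project (HGP) argued that Celera Genomics could not have finished its genome with whole-genome shotgun sequencing alone and must have leaned on HGP's data. On the other side, Celera replied that of course they had done it independently...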

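Recently I have been studying PAML. While reading a paper by Ziheng Yang in MBE on the branch-site model, I followed the trail and stumbled onto a very entertaining dispute.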

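Let me start from the beginning. Around 2009, a Japanese postdoc, Masafumi Nozawa, hit upon a "research area": "Reliabilities of statistical methods for detecting positive selection", that is, picking holes in the statistical methods used to detect positive selection in molecular evolution. So in April 2009 Brother M wrote in PNAS that, according to their computer simulations, the likelihood-based branch-site method (BSM) proposed by Ziheng Yang produces a high rate of false positives when the nucleotide substitution rate is low, so the method is NOT! RELIABLE! [1]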

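In September of the same year, Ziheng Yang responded in PNAS: Brother M, if the problem you raise were real it would be a big deal, but your conclusion does not match your own simulation results. The supposedly excessive false-positive rate that Brother M reported was only 0.23%, far below the 5% significance level. His claim that the distribution of the test statistic was "abnormal" did not hold up either [2]: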

The authors’ conclusion, if true, would be important. But it is contradicted by their simulation results.
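Yang's numerical point comes down to comparing an observed false-positive rate with the nominal significance level. Here is a minimal Python sketch of that comparison. The replicate counts are hypothetical, chosen only to reproduce the 0.23% figure; the actual simulation counts are not given in this post.

```python
def false_positive_rate(n_significant: int, n_replicates: int) -> float:
    """Fraction of null-simulated replicates in which the test rejected."""
    if n_replicates <= 0:
        raise ValueError("n_replicates must be positive")
    return n_significant / n_replicates


ALPHA = 0.05  # nominal 5% significance level mentioned in the post

# Hypothetical counts, picked only so that the rate equals 0.23%.
rate = false_positive_rate(n_significant=23, n_replicates=10_000)

# A test is "inflated" only if it rejects more often than ALPHA under the null.
is_inflated = rate > ALPHA

print(f"false-positive rate = {rate:.2%}, inflated = {is_inflated}")
```

A rate below the nominal level means the test is, if anything, conservative in that setting, which is the gist of Yang's objection.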

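Brother M published a reply in the same issue of PNAS (sneaky editors...), titled "Response to Yang et al.: ...". It opens by saying that, regrettably, Yang et al. misrepresent the contents and conclusions of our recent study...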

It is unfortunate that Yang et al. misrepresent the contents and conclusions of our recent study.

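Then he offers an example: when we apply the small-sample method (SSM) to the data from the PNAS study by Bakewell et al. [3] (poor Bakewell, caught in the crossfire for no reason at all!), only 5 of the 13,888 genes turn out to be under positive selection, whereas the branch-site method (BSM) flags a whopping 154! What is more, according to our functional analysis of the proteins, BSM also mispredicts which sites are functionally important. [4]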

Our application of SSM to Bakewell et al.’s (3) data suggested that only 5 of the 13,888 genes studied were under positive selection in the human lineage, though BSM identified 154 genes.
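To get a feel for the size of the gap between the two methods in that quote, the counts can be turned into fractions. This is a small Python sketch that uses only the numbers quoted above; the variable names are my own.

```python
TOTAL_GENES = 13_888  # genes studied, as quoted
SSM_HITS = 5          # genes flagged by the small-sample method
BSM_HITS = 154        # genes flagged by the branch-site method


def fraction(hits: int, total: int) -> float:
    """Share of genes flagged as positively selected."""
    return hits / total


ssm_fraction = fraction(SSM_HITS, TOTAL_GENES)
bsm_fraction = fraction(BSM_HITS, TOTAL_GENES)
fold_difference = BSM_HITS / SSM_HITS  # how many times more genes BSM flags

print(f"SSM: {ssm_fraction:.3%} of genes")
print(f"BSM: {bsm_fraction:.3%} of genes")
print(f"BSM flags {fold_difference:.1f}x as many genes as SSM")
```

The arithmetic only shows how far apart the two counts are. It says nothing about which method is closer to the truth, which is exactly what the two sides were arguing about.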

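That fight died down for the time being, and Brother M picked a new target. In June 2010 he published in PNAS again, this time taking aim at a 2010 PNAS paper on positive selection shaping the evolution of butterfly opsins [5]. He argued that the Bayesian method used by Briscoe et al. has a very high false-positive rate and will report plenty of positives even when there is no selection at all. Your method is not even much better than picking sites at random, he said, and your question is plainly a chicken-and-egg problem [6].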

Unfortunately, however, the mathematical basis of the BS Bayesian statistical method is not well-founded, and the method is known to produce significant false-positive results even when there is no selection (3). In fact, application of this method to opsin gene data from many different vertebrate species indicated that the predictability of λmax-shifting sites by the BS method is not much better than random prediction (3, 4).

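Briscoe, the author, replied rather meekly in the same issue of PNAS (hey, editors, where is your sense of decency??). The reply says that statistical methods for detecting positive selection have a long history but are still evolving, and that we experimental biologists don't really know this stuff; we just use whichever method works!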

As experimental biologists, we welcome all methods that facilitate the detection of interesting parts of the genome for functional exploration.

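On the whole, our results still mean something! Then, as if to smooth things over, Briscoe says they look forward to Big Brother M and Big Brother Ziheng Yang developing better methods in this field for us bench scientists to use... [7]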

In isolation, none of these individual observations would have much meaning, but together, they paint a fascinating picture of the biology of these beautiful butterflies. We look forward to applying statistical work by Nozawa et al. and Yang et al. (2–5) in this field to our system.

Postscript:

The tone here is a bit tongue-in-cheek, purely to make the post more fun to read. Apologies for any disrespect ;-)

I am still a newbie in this field, so discussion is welcome.

References

1 Nozawa et al. Reliabilities of identifying positive selection by the branch-site and the site-prediction methods. PNAS (2009)

2 Yang et al. In defense of statistical methods for detecting positive selection. PNAS (2009)

3 Bakewell et al. More genes underwent positive selection in chimpanzee evolution than in human evolution. PNAS (2007)

4 Nozawa et al. Response to Yang et al.: Problems with Bayesian methods of detecting positive selection at the DNA sequence level. PNAS (2009)

5 Briscoe et al. Positive selection of a duplicated UV-sensitive visual pigment coincides with wing pigment evolution in Heliconius butterflies. PNAS (2010)

6 Nozawa et al. Is positive selection responsible for the evolution of a duplicate UV-sensitive opsin gene in Heliconius butterflies? PNAS (2010)

7 Briscoe et al. Reply to Nozawa et al.: Complementary statistical methods support positive selection of a duplicated UV opsin gene in Heliconius. PNAS (2010)