The importance of being genomic: Non-coding and coding sequences suggest different models of toxin multi-gene family evolution
Research output: Contribution to journal › Article › peer-review
Studies of multi-gene protein families, including many toxins, are crucial for understanding the role of gene duplication in generating protein diversity. However, many evolutionary analyses of gene families are based on coding sequences and do not take into account potentially confounding evolutionary factors, such as recombination and convergence due to selection. We illustrate this using snake venom gene sequences from the Phospholipase A2 (PLA2) subfamily. Novel gene sequences from 20 species of understudied Asian pitvipers were analyzed alongside available genomic PLA2 sequences from another four crotaline and several viperine species. In contrast to previous analyses of this toxin family based on cDNA sequences, we find that duplication events are concentrated at the tips of the tree, suggesting that major functions such as presynaptic neurotoxicity have evolved convergently multiple times in pitvipers. We provide evidence that this discrepancy is due to differing evolutionary patterns between introns and exons. The effects of several well-known sources of bias on the phylogeny were small compared to the effect of basing the analysis on different partitions of the gene (whole gene sequence, non-coding regions, cDNA sequence). Switches of function were found to be largely associated with strong selection and with duplication events. Using coding sequences for phylogeny estimation may therefore produce incorrect inferences about the action of selection on individual lineages and sites. Our results have major implications for phylogenomic methods of functional inference as well as for our understanding of the evolution of multi-gene families.
|Issue number|Part B|
|Publication status|Published - 7 Sep 2015|