专栏名称: 生信百科
依托高校科研平台,面向生物信息科研工作者。生物信息学习资料;常见数据分析技巧、流程;公共数据库分享;科研思路分享;
51好读  ›  专栏  ›  生信百科

一个计算同义突变(Ks)与非同义突变(Ka)的方法

生信百科  · 公众号  · 医学  · 2017-10-31 07:00

正文

Ka,Ks 的概念和意义

Ks(synonymous substitutions):同义替换,即一个密码子中碱基的突变不改变该密码子编码的氨基酸。

Ka(nonsynonymous substitutions):非同义替换,即一个密码子中碱基的突变改变该密码子编码的氨基酸。

dN/dS 相当于 Ka/Ks,其意义如下:

Neutral Evolution (drift): dN/dS ratio = 1 implies there has been equal numbers of synonymous (dna substitutions that do not affect the protein sequence) and non-synonymous changes (dna substitutions that do affect the protein sequence) during the time between ancestral to the modern versions of the protein.

Positive Selection (adaptive evolution): dN/dS ratio > 1 implies there has been more non-synonymous changes than synonymous changes. There has been evolutionary pressure to escape from the ancestral state - i.e. positive selection pressure. This can occur for example in paralogues that are required to serve a novel function, or in proteins of parasites that need to escape host immune recognition (e.g. changes to avoid MHC-1 binding to evade T-cell attack).

Negative Selection (conservation): dN/dS ratio < 1 implies there has been more synonymous changes than non-synonymous changes. There has been evolutionary pressure to conserve the ancestral state - i.e. negative selection pressure. This can occur for example in orthologues that are required to maintain (conserve) some function encoded in the protein sequence, since changes from this state would lead to disruption of function.

计算Ka,Ks

使用软件 KaKs_Calculator (https://code.google.com/archive/p/kaks-calculator/downloads)进行计算,该软件的输入文件是核酸水平的 AXT格式文件。下文讲解 axt 文件的准备和该软件的使用示例。

假如有如下一对同源基因的蛋白质序列以及其对应的 cds 序列

蛋白质序列(example_pep.fas)

>ENSP00000004982

MEIPVPVQPSWLRRASAPLPGLSAPGRLFDQRFGEGLLEAELAALCPTTLAPYYLRAPSVALPVAQVPTD

PGHFSVLLDVKHFSPEEIAVKVVGEHVEVHARHEERPDEHGFVAREFHRRYRLPPGVDPAAVTSALSPEG

VLSIQAAPASAQAPPPAAAK

>ENSMUSP00000039172

MEIPVPVQPSWLRRASAPLPGFSAPGRLFDQRFGEGLLEAELASLCPAAIAPYYLRAPSVALPTAQVSTD

SGYFSVLLDVKHFLPEEISVKVVDDHVEVHARHEERPDEHGFIAREFHRRYRLPPGVDPAAVTSALSPEG

VLSIQATPASAQAQLPSPPAAK

cds 序列(example_cds.fas)

>ENSP00000004982

ATGGAGATCCCTGTGCCTGTGCAGCCGTCTTGGCTGCGCCGCGCCTCGGCCCCGTTGCCCGGACTTTCGG

CGCCCGGACGCCTCTTTGACCAGCGCTTCGGCGAGGGGCTGCTGGAGGCCGAGCTGGCTGCGCTCTGCCC

CACCACGCTCGCCCCCTACTACCTGCGCGCACCCAGCGTGGCGCTGCCCGTCGCCCAGGTGCCGACGGAC

CCCGGCCACTTTTCGGTGCTGCTAGACGTGAAGCACTTCTCGCCGGAGGAAATTGCTGTCAAGGTGGTGG

GCGAACACGTGGAGGTGCACGCGCGCCACGAGGAGCGCCCGGATGAGCACGGATTCGTCGCGCGCGAGTT

CCACCGTCGCTACCGCCTGCCGCCTGGCGTGGATCCGGCTGCCGTGACGTCCGCGCTGTCCCCCGAGGGC

GTCCTGTCCATCCAGGCCGCACCAGCGTCGGCCCAGGCCCCACCGCCAGCCGCAGCCAAGTAG

>ENSMUSP00000039172

ATGGAGATCCCCGTGCCTGTGCAGCCTTCTTGGCTGCGCCGTGCTTCAGCTCCTTTACCAGGTTTCTCTG

CTCCGGGACGCCTCTTTGACCAGCGTTTCGGCGAAGGGCTGCTTGAGGCAGAGCTGGCTTCACTGTGCCC

TGCTGCGATCGCCCCCTACTATCTGCGCGCCCCCAGTGTGGCGTTACCCACAGCCCAGGTGTCCACGGAC

TCTGGGTATTTTTCCGTGCTGCTGGATGTGAAGCACTTCTTGCCAGAGGAAATCTCTGTCAAGGTGGTTG

ACGACCATGTGGAGGTCCATGCTCGGCACGAGGAGCGCCCGGATGAACACGGATTCATTGCTCGAGAGTT

CCACCGCCGATACCGCCTGCCTCCTGGTGTGGACCCTGCTGCTGTGACCTCAGCACTGTCTCCTGAGGGT

GTCCTGTCCATCCAGGCCACACCAGCGTCGGCCCAGGCCCAACTTCCGTCACCACCTGCTGCCAAGTAG

step 1:

蛋白质序列联配(alignment)

mafft --quiet --auto example_pep.fas > example_pep_aln.fas

Step 2:

将蛋白质序列联配转换成核酸序列联配

perl pal2nal.pl example_pep_aln.fas example_cds.fas -output fasta > example_cds_aln.fas

Step 3:

将核酸序列联配转为 AXT 格式

python FastaIntoAXT.py example_cds_aln.fas > example_cds_aln.axt

其中,FastaIntoAXT.py 的内容如下:

import sys

def parseFasta(filename):
    fas = {}
    idlis = []
    id = None
    with open(filename, 'r') as fh:
        for line in fh:
            if line[0] == '>':
                header = line[1:].rstrip()
                id = header.split()[0]
                idlis.append(id)
                fas[id] = []
            else:
                fas[id].append(line.rstrip())
        for id, seq in fas.iteritems():
            fas[id] = ''.join(seq)
    return fas, idlis

ALN, IDlis = parseFasta(sys.argv[1])

outid = "-".join(IDlis)
outseq = "\n".join([ALN[IDlis[0]],ALN[IDlis[1]]])

print outid
print outseq

Step 4:

计算 Ka,Ks

KaKs_Calculator -i example_cds_aln.axt -o example.kaks -m YN

其中输出文件是 example.kaks,其内容如下:

Sequence        Method  Ka      Ks      Ka/Ks   P-Value(Fisher) Length  S-Sites N-Sites Fold-Sites(0:2:4)       Substitutions   S-Substitutions   N-Substitutions Fold-S-Substitutions(0:2:4)     Fold-N-Substitutions(0:2:4)     Divergence-Time





请到「今天看啥」查看全文