Reverse complement of a DNA strand using Python

Question

Asked by user3783999

I have a DNA sequence and would like to get its reverse complement using Python. It is in one of the columns of a CSV file, and I'd like to write the reverse complement to another column in the same file. The tricky part is that a few cells contain something other than A, T, G and C. I was able to get the reverse complement with this piece of code:

def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    bases = list(seq)
    bases = [complement[base] for base in bases]
    return ''.join(bases)

def reverse_complement(s):
    return complement(s[::-1])

print("Reverse Complement:")
print(reverse_complement("TCGGGCCC"))

However, when I try to find the items that are not present in the complement dictionary using the code below, I just get the complement of the last base. It doesn't iterate. I'd like to know how I can fix it.

def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'} 
    bases = list(seq) 
    for element in bases:
        if element not in complement:
            print(element)
        letters = [complement[base] for base in element] 
        return ''.join(letters)
def reverse_complement(seq):
    return complement(seq[::-1])

print("Reverse Complement:")
print(reverse_complement("TCGGGCCCCX"))

Answer 1

Accepted answer by Gabriel

The get method of a dictionary allows you to specify a default value if the key is not in the dictionary. As a preprocessing step, I would map all your non-'ATGC' bases to single letters (or punctuation, or numbers, or anything that won't show up in your sequence), then reverse the sequence, then replace the single-letter placeholders with their originals. Alternatively, you could reverse it first and then search and replace things like sni with ins.

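alt_map = {'ins': '0'}  # multi-character tokens mapped to single placeholder characters
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def reverse_complement(seq):
    # Swap each multi-character token for its placeholder
    for k, v in alt_map.items():
        seq = seq.replace(k, v)
    bases = list(seq)
    # Complement known bases; leave anything else unchanged
    bases = reversed([complement.get(base, base) for base in bases])
    bases = ''.join(bases)
    # Restore the original tokens
    for k, v in alt_map.items():
        bases = bases.replace(v, k)
    return bases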

>>> seq = "TCGGinsGCCC"
>>> print("Reverse Complement:")
Reverse Complement:
>>> print(reverse_complement(seq))
GGGCinsCCGA
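
For the CSV part of the question, a minimal sketch using the standard csv module could look like the following. The column names sequence and reverse_complement and the in-memory sample data are assumptions for illustration only; unknown characters are passed through unchanged, in the same way as complement.get(base, base).

```python
import csv
import io

COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def reverse_complement(seq):
    # Unknown characters (e.g. 'X' or 'N') pass through unchanged
    return ''.join(COMPLEMENT.get(base, base) for base in reversed(seq))

def add_reverse_complement(rows, source="sequence", target="reverse_complement"):
    # 'source' and 'target' are hypothetical column names
    for row in rows:
        row[target] = reverse_complement(row[source])
        yield row

# In-memory stand-in for a real CSV file (assumed layout)
infile = io.StringIO("id,sequence\n1,TCGGGCCC\n2,TCGGGCCCCX\n")
outfile = io.StringIO()

reader = csv.DictReader(infile)
writer = csv.DictWriter(
    outfile,
    fieldnames=reader.fieldnames + ["reverse_complement"],
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(add_reverse_complement(reader))

print(outfile.getvalue())
```

With a real file, the same reader and writer objects can wrap files opened with open(..., newline=""); writing to a separate output file and replacing the original afterwards is safer than editing in place.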

Answer 2

Answered by Jason S

In general, a generator expression is simpler than the original code and avoids creating extra list objects. If there can be multiple-character insertions, go with the other answers.

complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
seq = "TCGGGCCC"
reverse_complement = "".join(complement.get(base, base) for base in reversed(seq))
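
The loop in the question stops early because its return statement sits inside the for loop, so the function exits during the first iteration. A minimal sketch that reports the unmapped characters and still returns the full result might look like this; the function name reverse_complement_with_report and the choice to pass unknown characters through unchanged are assumptions, not part of the original answer.

```python
COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def reverse_complement_with_report(seq):
    # Collect (position, character) for everything missing from the mapping
    unknown = [(i, base) for i, base in enumerate(seq) if base not in COMPLEMENT]
    # Build the whole result first, then return once, outside any loop
    result = "".join(COMPLEMENT.get(base, base) for base in reversed(seq))
    return result, unknown

result, unknown = reverse_complement_with_report("TCGGGCCCCX")
print(result)   # XGGGGCCCGA
print(unknown)  # [(9, 'X')]
```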

Answer 3

Answered by xbello

The other answers are perfectly fine, but if you plan to deal with real DNA sequences I suggest you use Biopython. What if you encounter a character like "-", "*" or an ambiguity code? What if you want to do further manipulations of your sequences? Do you want to write a parser for each file format out there?

The code you asked for is as simple as:

from Bio.Seq import Seq

seq = Seq("TCGGGCCC")

print(seq.reverse_complement())
# GGGCCCGA

Now, if you want to do other transformations:

print(seq.complement())
print(seq.transcribe())
print(seq.translate())

Output:

AGCCCGGG
UCGGGCCC
SG

And if you run into strange characters, there is no need to keep adding code to your program. Biopython deals with them:

seq = Seq("TCGGGCCCX")
print(seq.reverse_complement())
# XGGGCCCGA

Answer 4

Answered by Nathan M

# Python 3: maketrans is a static method of str (Python 2 used string.maketrans)
old_chars = "ACGT"
replace_chars = "TGCA"
tab = str.maketrans(old_chars, replace_chars)
print("AAAACCCGGT".translate(tab)[::-1])

That will give you the reverse complement: ACCGGGTTTT
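
One property of str.translate worth noting is that characters missing from the table are passed through unchanged, so unexpected symbols do not raise an error. A small sketch, assuming Python 3 (str.maketrans) and assuming lower-case bases should also be handled, which the original answer does not mention:

```python
# Map upper- and lower-case bases; anything else (N, X, '-') is left as is
TABLE = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    return seq.translate(TABLE)[::-1]

print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
print(reverse_complement("TCGGGCCCCX"))  # XGGGGCCCGA
print(reverse_complement("acgtN"))       # Nacgt
```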

Answer 5

Answered by niksy

# Despite its name, this only reverses the sequence;
# pass the result to compliment() to get the reverse complement
def ReverseComplement(Pattern):
    revcomp = []
    x = len(Pattern)
    for i in Pattern:
        x = x - 1
        revcomp.append(Pattern[x])
    return ''.join(revcomp)

# This is for the complement

def compliment(Nucleotide):
    comp = []
    for i in Nucleotide:
        if i == "T":
            comp.append("A")
        elif i == "A":
            comp.append("T")
        elif i == "G":
            comp.append("C")
        elif i == "C":
            comp.append("G")
        # any other character is silently dropped

    return ''.join(comp)

Answer 6

Answered by Akansha Rana

Try the code below:

complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
seq = "TCGGGCCC"
reverse_complement = "".join(complement.get(base, base) for base in reversed(seq))

Answer 7

Answered by alphahmed

The fastest one-liner for the reverse complement is the following:

def rev_compl(st):
    nn = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return "".join(nn[n] for n in reversed(st))
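
Speed depends on the Python version and the input, so it is worth measuring rather than assuming. A rough sketch with the standard timeit module, comparing the dictionary-and-join approach with the str.translate approach; the function names, the test sequence, and the repeat count are arbitrary choices for illustration, and no particular result is implied.

```python
import timeit

COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
TABLE = str.maketrans("ACGT", "TGCA")

def rev_compl_join(st):
    # Dictionary lookup per base, joined from a generator
    return "".join(COMPLEMENT[n] for n in reversed(st))

def rev_compl_translate(st):
    # Translation table applied to the whole string, then reversed
    return st.translate(TABLE)[::-1]

seq = "TCGGGCCC" * 1000  # arbitrary test sequence

# Both approaches must agree before timing means anything
assert rev_compl_join(seq) == rev_compl_translate(seq)

for func in (rev_compl_join, rev_compl_translate):
    seconds = timeit.timeit(lambda: func(seq), number=100)
    print(func.__name__, round(seconds, 4))
```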
