Original source: http://stackoverflow.com/questions/46543996/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Blast parsing: AttributeError: 'float' object has no attribute 'split'
Asked by Brindha Lekshmisaran
I am trying to write a script to parse an NCBI BLAST report. The column that is causing this error is the genome GI number.
E.g. LT697097.1
There is a decimal at the end. When I try to split on it and keep only the GI number, I get this error.
The question "Django AttributeError 'float' object has no attribute 'split'" tells me that this error occurs because split is being called on a float value.
So I followed the advice from "Pandas reading csv as string type" and imported the pandas columns as strings.
I am supplying the column names myself, as the report doesn't automatically have column names.
import pandas as pd
df = pd.read_csv("out.txt", sep="\t", dtype=object, names = ['query id','subject ids','query acc.ver','subject acc.ver','% identity','alignment length', 'mismatches','gap opens','q.start','q.end','s.start','s.end','evalue','bit score'])
sacc = df['subject acc.ver']
sacc = [i.split('.',1)[0] for i in sacc]
I still get the error AttributeError: 'float' object has no attribute 'split'.
I then tried astype(str), as suggested by "Convert Columns to String in Pandas".
This fails to read the column; the output contains only the column's name attribute.
Can you please advise me on where I'm going wrong in my approach?
Accepted answer by jezrael
I think you need str.split followed by selecting the first element of each list with str[0], which handles NaNs nicely. Another possible problem is values without a '.', which this approach also handles:
df['subject acc.ver'] = df['subject acc.ver'].str.split('.', n=1).str[0]
Sample:
import numpy as np
import pandas as pd
df = pd.DataFrame({'subject acc.ver': ['LT697097.1', np.nan, None, 'LT6']})
df['subject acc.ver'] = df['subject acc.ver'].str.split('.', n=1).str[0]
print(df)
   subject acc.ver
0         LT697097
1              NaN
2             None
3              LT6
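For context on why the list comprehension in the question fails even with dtype=object: read_csv still stores empty fields as NaN, which is a Python float, so calling .split on it raises the AttributeError. The following is a minimal sketch of that behaviour, not the asker's actual data. The two-column in-memory table `raw` and the variable names `missing` and `accessions` are made up for illustration; only the column name 'subject acc.ver' comes from the question.

```python
import io

import pandas as pd

# Hypothetical BLAST-like table with two columns; the second row has an
# empty accession field. This stands in for out.txt from the question.
raw = "q1\tLT697097.1\nq2\t\nq3\tLT6\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    dtype=object,
    names=["query id", "subject acc.ver"],
)

# Even with dtype=object, the empty field is stored as NaN, which is a float.
missing = df.loc[1, "subject acc.ver"]
print(type(missing))

# The .str accessor passes missing values through instead of raising.
accessions = df["subject acc.ver"].str.split(".", n=1).str[0]
print(accessions.tolist())
```

If the missing rows should be dropped rather than kept as NaN, calling dropna() on the column before splitting is one option.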