Original question: http://stackoverflow.com/questions/28200404/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Pandas read_table use first column as index
Asked by fricadelle
I have a bit of a problem here. I have a txt file containing lines of the form (for line 1, say):
id1-a1-b1-c1
I want to load it into a data frame using pandas, with the ids as the index, 'A', 'B', 'C' as the column names, and the corresponding ai, bi, ci as the values.
In the end I want the dataframe to look like:
     'A'  'B'  'C'
id1  a1   b1   c1
id2  a2   b2   c2
...  ...  ...  ...
I may want to read the file in chunks if it is large, but let's assume I read it all at once:
import pandas as pd

with open('file.txt') as f:
    table = pd.read_table(f, sep='-', index_col=0, header=None, lineterminator='\n')
and rename the columns:
table.columns = ['A','B','C']
My current output is something like:
     'A'  'B'  'C'
0
id1  a1   b1   c1
id2  a2   b2   c2
...  ...  ...  ...
There is an extra row that I can't explain.
Thanks
EDIT
When I try to add the parameter
chunksize=20
and then run:
for chunk in table:
    print(chunk)
I get the following error:
pandas.parser.CParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.
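One plausible cause, stated as an assumption because the full script isn't shown: with chunksize, read_table returns a lazy reader rather than a data frame, so if the for loop runs after the with block has exited, the file is already closed when pandas tries to read from it. A minimal sketch of consuming the chunks while the source is still open follows. The sample data, the io.StringIO stand-in for open('file.txt'), the helper name read_in_chunks, and the 'id' column name are all made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for file.txt: 25 lines of the form id-a-b-c.
sample = "".join(f"id{i}-a{i}-b{i}-c{i}\n" for i in range(1, 26))

def read_in_chunks(source, chunksize=20):
    """Read `source` chunk by chunk and return the chunks as a list.

    The list is built here, so every read happens while `source` is open.
    """
    reader = pd.read_table(source, sep='-', index_col=0, header=None,
                           names=['id', 'A', 'B', 'C'], chunksize=chunksize)
    return [chunk for chunk in reader]

# io.StringIO stands in for open('file.txt'); the reading happens inside the with block.
with io.StringIO(sample) as f:
    chunks = read_in_chunks(f)

for chunk in chunks:
    print(chunk.shape)
```

Passing the file path directly to read_table, instead of an open file object, is another option that sidesteps the question of when the handle gets closed.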
Answered by Bryan
If you know the column names before the file is read, pass them as a list via the names parameter of read_table:
with open('file.txt') as f:
    table = pd.read_table(f, sep='-', index_col=0, header=None, names=['A','B','C'],
                          lineterminator='\n')
This outputs:
     A   B   C
id1  a1  b1  c1
id2  a2  b2  c2
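As for the extra row in the question: with header=None and index_col=0, the columns get integer labels, so the index inherits the label 0 as its name, and pandas prints the index name on a line of its own below the column header. The sketch below reproduces that on made-up two-line sample data (io.StringIO stands in for the real file) and clears the name, which is one way to get rid of the row.

```python
import io
import pandas as pd

# Hypothetical two-line sample in the question's id-a-b-c format.
sample = "id1-a1-b1-c1\nid2-a2-b2-c2\n"

# Same call as in the question: no header row, first column used as the index.
table = pd.read_table(io.StringIO(sample), sep='-', index_col=0, header=None)
table.columns = ['A', 'B', 'C']

# The index is named after the integer label of the column it came from.
index_label = table.index.name
print(repr(index_label))

# Clearing the index name removes the extra line from the printed frame.
table.index.name = None
print(table)
```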