Python Pandas read_table: use the first column as the index

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/28200404/

Time: 2020-08-19 02:54:00  Source: igfitidea

Pandas read_table use first column as index

Tags: python, pandas

Asked by fricadelle

I have a small problem. I have a txt file whose lines have the following form (shown here for line 1):

id1-a1-b1-c1

I want to load it into a data frame using pandas, with the ids as the index, the column names 'A', 'B', 'C', and the corresponding ai, bi, ci as the values.

In the end I want the dataframe to look like:

    'A'   'B'  'C'
id1  a1    b1   c1
id2  a2    b2   c2
...   ...   ...  ...

I may want to read in chunks if the file is large, but let's assume I read it all at once:

import pandas as pd

with open('file.txt') as f:
    table = pd.read_table(f, sep='-', index_col=0, header=None, lineterminator='\n')

and then rename the columns:

table.columns = ['A','B','C']

My current output looks something like this:

    'A'   'B'  'C'
0
id1  a1    b1   c1
id2  a2    b2   c2
...   ...   ...  ...

There is an extra row that I can't explain.

Thanks

EDIT

When I try to add the argument

chunksize=20

to the read_table call and then run:

for chunk in table:
    print(chunk)

I get the following error:

pandas.parser.CParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.
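
One plausible cause, assuming the loop runs after the `with open(...)` block has exited, is that the file handle is already closed by the time pandas tries to read the next chunk. The sketch below passes a path instead of an open handle, so pandas manages the file itself while the chunks are consumed. The sample data, the temporary file, and `chunksize=2` are assumptions made for illustration only.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data in the question's id-a-b-c format.
lines = [f"id{i}-a{i}-b{i}-c{i}" for i in range(1, 6)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

    # Passing the path (not an already opened handle) lets pandas keep
    # the file open for as long as chunks are being read.
    reader = pd.read_table(path, sep="-", header=None, index_col=0, chunksize=2)
    chunks = [chunk for chunk in reader]
    reader.close()

# Stitch the chunks back together into a single data frame.
table = pd.concat(chunks)
table.columns = ["A", "B", "C"]
print(table)
```

If an open handle has to be used, keeping the `for chunk in ...` loop inside the `with` block should have the same effect.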

Answered by Bryan

If you know the column names before the file is read, pass them as a list using the names parameter of read_table:

with open('file.txt') as f:
    table = pd.read_table(f, sep='-', index_col=0, header=None, names=['A','B','C'],
                          lineterminator='\n')

This outputs:

      A   B   C
id1  a1  b1  c1
id2  a2  b2  c2
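
As for the extra row in the question's output, it is most likely not a data row at all but the name of the index. With `header=None` and `index_col=0`, the index takes its name from the position of the column it came from, and pandas prints that name on its own line below the column headers. The following is a minimal sketch of that behaviour, using an in-memory string as hypothetical sample data in place of file.txt.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file.txt.
data = "id1-a1-b1-c1\nid2-a2-b2-c2\n"

# Same call as in the question, reading from an in-memory buffer.
table = pd.read_table(io.StringIO(data), sep="-", index_col=0, header=None)
table.columns = ["A", "B", "C"]

# The index is named after the column position it was taken from,
# which is what shows up as the extra line in the printed frame.
print(repr(table.index.name))

# Clearing the index name removes that line from the printed output.
cleaned = table.copy()
cleaned.index.name = None
print(cleaned)
```

Clearing the name only changes how the frame is displayed; the data and the index labels stay the same.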