How do I load a text file into a pandas DataFrame?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/44157856/

Date: 2020-09-14 03:41:12  Source: igfitidea

How do I load a text file into a pandas dataframe?

python pandas

Asked by deepayan das

I have a text file which looks something like this:

 101   the   323
 103   to    324
 104   is    325

where the delimiter is four spaces. I am trying to use the read_csv function in order to convert it into a pandas DataFrame.

import pandas as pd
data = pd.read_csv('file.txt', sep=" ", header=None)

However, it is giving me a lot of NaN values:

    101\tthe\tthe\t10115  NaN  NaN     NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
     102\tto\tto\t5491  NaN  NaN     NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
     103\tof\tof\t4767  NaN  NaN     NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
       104\ta\ta\t4532  NaN  NaN     NaN  NaN  NaN  NaN  NaN  NaN  NaN  Na

Is there any way I can read the text file in a correct CSV format?

Answered by jezrael

If you need the separator to be exactly 4 whitespace characters:

# A regex separator requires the Python engine; the raw string keeps \s intact
data = pd.read_csv('file.txt', sep=r"\s{4}", header=None, engine='python')
print(data)
     0    1    2
0  101  the  323
1  103   to  324
2  104   is  325
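To try the four-space case without a file on disk, here is a minimal sketch that feeds an in-memory string to read_csv. The sample text and the variable name `raw` are assumptions made for illustration; only the `sep`, `header` and `engine` arguments come from the answer above.

```python
import io

import pandas as pd

# Hypothetical sample data: fields separated by exactly four spaces.
raw = "101    the    323\n103    to    324\n104    is    325\n"

# A multi-character regex separator is handled by the Python parsing engine.
data = pd.read_csv(io.StringIO(raw), sep=r"\s{4}", header=None, engine="python")
print(data)
```

Note that `\s{4}` matches exactly four whitespace characters at a time, so a file whose columns are padded with a varying number of spaces would not split cleanly with this pattern.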

Or use the parameter delim_whitespace=True (thanks carthurs) or \s+ if you need the separator to be one or more whitespace characters:

# Regex form: one or more whitespace characters between fields
data = pd.read_csv('file.txt', sep=r"\s+", header=None)
# Alternative with the same effect
data = pd.read_csv('file.txt', delim_whitespace=True, header=None)
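As a rough illustration of the "one or more whitespace characters" case, the sketch below parses lines whose fields are separated by an uneven mix of spaces and tabs. The sample text is made up for this example. It uses only the `sep` form, because, as far as I know, `delim_whitespace` has been deprecated in newer pandas releases (since 2.2) in favor of `sep=r"\s+"`.

```python
import io

import pandas as pd

# Hypothetical sample data: irregular runs of spaces and tabs between fields.
raw = "101 the   323\n103\tto 324\n104    is\t\t325\n"

# \s+ treats any run of whitespace as a single separator.
data = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)
print(data)
```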

But if the separator is a tab:

data = pd.read_csv('file.txt', sep="\t", header=None)
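If it is unclear whether a file uses spaces or tabs (the `\t` sequences in the question's output suggest the latter), one way to check is to look at the `repr()` of a raw line before parsing. This is only a sketch: the sample string stands in for the real file, and the fallback to `\s+` is an assumption, not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical sample data: looks column-aligned when printed, but is tab-separated.
raw = "101\tthe\t323\n103\tto\t324\n104\tis\t325\n"

first_line = raw.splitlines()[0]
print(repr(first_line))  # repr() makes tabs visible as \t

# Choose the separator from what the raw line actually contains.
sep = "\t" if "\t" in first_line else r"\s+"
data = pd.read_csv(io.StringIO(raw), sep=sep, header=None)
print(data)
```

With a real file, the first line could be obtained with something like `open('file.txt').readline()` instead of the in-memory string.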

Answered by EdChum

You have a fixed-width file, so you can use read_fwf, which will just sniff the format of the file:

In[79]:
pd.read_fwf('file.txt', header=None)

Out[79]: 
     0    1    2
0  101  the  323
1  103   to  324
2  104   is  325
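For comparison, here is a small sketch of read_fwf on an in-memory fixed-width sample, once with the default column inference and once with explicit `widths`. The sample text and the chosen widths are assumptions for illustration; on a real file the widths would have to match its actual layout.

```python
import io

import pandas as pd

# Hypothetical fixed-width sample: each field starts at the same column on every line.
raw = "101  the  323\n103  to   324\n104  is   325\n"

# Default behaviour: column boundaries are inferred from the data.
inferred = pd.read_fwf(io.StringIO(raw), header=None)

# Explicit alternative: give the width of each field in characters.
explicit = pd.read_fwf(io.StringIO(raw), widths=[5, 5, 3], header=None)

print(inferred)
print(explicit)
```

Passing `widths` (or `colspecs`) may be the safer choice when the inference guesses the boundaries wrong, for example when a text field itself contains spaces.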