How to load a text file into a Pandas DataFrame?

Question

Asked by deepayan das

I have a text file which looks something like this:

 101   the   323
 103   to    324
 104   is    325

where the delimiter is four spaces. I am trying to use the read_csv function in order to convert it into a pandas DataFrame.

data = pd.read_csv('file.txt', sep=" ", header=None)

However, it is giving me a lot of NaN values:

    101\tthe\tthe\t10115  NaN  NaN     NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
     102\tto\tto\t5491  NaN  NaN     NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
     103\tof\tof\t4767  NaN  NaN     NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
       104\ta\ta\t4532  NaN  NaN     NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN

Is there any way I can read the text file into a correct CSV format?

Answer 1

Answered by jezrael

If you need the separator to be exactly 4 whitespace characters:

data = pd.read_csv('file.txt', sep=r"\s{4}", header=None, engine='python')
print(data)
     0    1    2
0  101  the  323
1  103   to  324
2  104   is  325
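
To see why the single-space separator from the question produces extra NaN columns, here is a minimal sketch that reads an in-memory sample instead of file.txt. The sample text and the variable names are assumptions made for illustration; only read_csv and its sep, header, and engine parameters come from the answer.

```python
import io

import pandas as pd

# Hypothetical sample: three columns separated by exactly four spaces.
sample = "101    the    323\n103    to    324\n104    is    325\n"

# With sep=" ", every single space is a field boundary, so each run of
# four spaces produces empty fields that show up as all-NaN columns.
naive = pd.read_csv(io.StringIO(sample), sep=" ", header=None)

# A regex separator requires the Python engine; the raw string keeps the
# backslash intact for the regex.
fixed = pd.read_csv(
    io.StringIO(sample), sep=r"\s{4}", header=None, engine="python"
)

print(naive.shape)  # more than three columns
print(fixed.shape)
print(fixed)
```

Comparing the two shapes is a quick way to check whether the separator matches the file.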

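Or, if the separator is one or more whitespace characters, use sep="\s+" or the parameter delim_whitespace=True (thanks carthurs; note that delim_whitespace is deprecated since pandas 2.2 in favor of sep="\s+"):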

data = pd.read_csv('file.txt', sep=r"\s+", header=None)
data = pd.read_csv('file.txt', delim_whitespace=True, header=None)

But if the separator is a tab:

data = pd.read_csv('file.txt', sep="\t", header = None)
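
The whitespace and tab variants can be compared side by side with a small sketch. The two in-memory samples below are assumptions standing in for file.txt (one with uneven runs of spaces, one tab-separated), and the variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical samples: uneven runs of spaces, and a tab-separated variant.
spaces = "101   the   323\n103   to    324\n104   is    325\n"
tabs = "101\tthe\t323\n103\tto\t324\n104\tis\t325\n"

# r"\s+" treats any run of whitespace as one separator.
by_whitespace = pd.read_csv(io.StringIO(spaces), sep=r"\s+", header=None)

# "\t" splits on tab characters only.
by_tab = pd.read_csv(io.StringIO(tabs), sep="\t", header=None)

# Tabs are whitespace too, so r"\s+" also parses the tab-separated sample.
tabs_as_whitespace = pd.read_csv(io.StringIO(tabs), sep=r"\s+", header=None)

print(by_whitespace)
print(by_tab)
print(tabs_as_whitespace)
```

If the fields themselves may contain spaces, the tab separator is probably the safer choice, since r"\s+" would split inside those fields.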

Answer 2

Answered by EdChum

You have a fixed-width file, so you can use read_fwf, which will just sniff the form of the file:

In[79]:
pd.read_fwf('file.txt', header=None)

Out[79]: 
     0    1    2
0  101  the  323
1  103   to  324
2  104   is  325
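
As a rough sketch of the same idea without a file on disk, the example below feeds read_fwf an in-memory copy of the sample from the question. It also shows the widths parameter as an alternative to inference; the specific widths [4, 6, 6] are an assumption derived from this particular sample, not something stated in the answer.

```python
import io

import pandas as pd

# Sample from the question: each field starts at the same character offset.
sample = " 101   the   323\n 103   to    324\n 104   is    325\n"

# By default read_fwf infers the column boundaries from the first rows.
inferred = pd.read_fwf(io.StringIO(sample), header=None)

# The boundaries can also be given explicitly as field widths
# (assumed here: 4, 6, and 6 characters for this sample).
explicit = pd.read_fwf(io.StringIO(sample), widths=[4, 6, 6], header=None)

print(inferred)
print(explicit)
```

Explicit widths may be worth the extra effort when some rows have blank fields, since inference relies on where whitespace happens to fall in the sampled rows.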
