Original source: http://stackoverflow.com/questions/44157856/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How do I load a text file into a pandas dataframe?
Asked by deepayan das
I have a text file which looks something like this:
101    the    323
103    to    324
104    is    325
where the delimiter is four spaces. I am trying to use the read_csv function in order to convert it into a pandas DataFrame.
data = pd.read_csv('file.txt', sep=" ", header=None)
However, it is giving me a lot of NaN values:
101\tthe\tthe\t10115 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
102\tto\tto\t5491 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
103\tof\tof\t4767 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
104\ta\ta\t4532 NaN NaN NaN NaN NaN NaN NaN NaN NaN Na
Is there any way I can read the text file into a correctly parsed DataFrame?
Answered by jezrael
If the separator must be exactly four whitespace characters:
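import pandas as pd
# A regex separator requires the Python parsing engine; the raw string keeps "\s" intact.
data = pd.read_csv('file.txt', sep=r"\s{4}", header=None, engine='python')
print(data)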
     0    1    2
0  101  the  323
1  103   to  324
2  104   is  325
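For anyone who wants to try this without creating file.txt, here is a minimal sketch that feeds the same three rows to read_csv through io.StringIO. The in-memory sample and the column names id, word and count are assumptions made for illustration; they do not appear in the original question.

```python
import io

import pandas as pd

# Inline stand-in for file.txt: fields separated by exactly four spaces.
sample = "101    the    323\n103    to    324\n104    is    325\n"

# A regex separator needs the Python engine; the raw string keeps "\s" intact.
data = pd.read_csv(io.StringIO(sample), sep=r"\s{4}", header=None, engine="python")

# Hypothetical column names, chosen only for illustration.
data.columns = ["id", "word", "count"]
print(data)
```

Assigning names after loading keeps the read_csv call identical to the one in the answer; passing names=[...] to read_csv should work as well.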
Or use the parameter delim_whitespace=True (thanks carthurs) or sep="\s+" if the separator is one or more whitespace characters. Note that delim_whitespace is deprecated as of pandas 2.2 in favor of sep="\s+":
data = pd.read_csv('file.txt', sep=r"\s+", header=None)
data = pd.read_csv('file.txt', delim_whitespace=True, header=None)
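To see why the choice between \s{4} and \s+ matters, the sketch below splits one irregularly spaced line with Python's re module. This only illustrates how the two patterns behave as regular expressions; it is not a claim about how pandas parses files internally, and the sample line is made up for the example.

```python
import re

# Hypothetical line with uneven spacing: two spaces, then six spaces.
line = "101  the      323"

# \s+ treats any run of whitespace as a single delimiter.
flexible = re.split(r"\s+", line)

# \s{4} only matches runs of exactly four whitespace characters,
# so uneven spacing produces wrongly split fields.
strict = re.split(r"\s{4}", line)

print(flexible)
print(strict)
```

If the spacing in the file is not perfectly regular, \s+ is the more forgiving choice.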
But if the separator is a tab:
data = pd.read_csv('file.txt', sep="\t", header=None)
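The output in the question shows literal \t characters inside the first column, which suggests the real file may be tab-delimited rather than space-delimited. A minimal sketch under that assumption, using the four rows from that output as an in-memory stand-in for file.txt:

```python
import io

import pandas as pd

# Inline stand-in for file.txt, built from the rows shown in the question's output.
sample = "101\tthe\tthe\t10115\n102\tto\tto\t5491\n103\tof\tof\t4767\n104\ta\ta\t4532\n"

# A tab separator is a plain single character, so the default C engine handles it.
data = pd.read_csv(io.StringIO(sample), sep="\t", header=None)
print(data)
```

With header=None the columns are labeled 0 through 3, so data[3] selects the last column.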