Using pandas to read a text file with leading whitespace gives a NaN column
Original URL: http://stackoverflow.com/questions/16022094/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by Caleb
I am using pandas.read_csv to read a whitespace-delimited file. The file has a variable number of whitespace characters at the start of every line (the numbers are right-aligned). When I read this file, it creates a column of NaN values. Why does this happen, and what is the best way to prevent it?
Example:
Text file:
    9.0    3.3    4.0
   32.3   44.3    5.1
    7.2    1.1    0.9
Command:
import pandas as pd
pd.read_csv("test.txt", delim_whitespace=True, header=None)
Output:
     0     1     2    3
0  NaN   9.0   3.3  4.0
1  NaN  32.3  44.3  5.1
2  NaN   7.2   1.1  0.9
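One way to picture where the NaN column comes from is the plain-Python analogy below. It is only an illustration, not pandas' actual parsing code: if the run of spaces at the start of a line is treated as a delimiter, the line yields one empty field before the first number, and an empty field is what pandas fills with NaN. Splitting with str.split(), which discards leading whitespace, gives just the three numbers. The variable names are hypothetical.

```python
import re

# A right-aligned line like the ones in the question's test.txt
line = "    9.0    3.3    4.0"

# Treat every run of spaces as a delimiter, including the run at the start
# of the line: this produces an empty string before the first number.
delimiter_fields = re.split(r" +", line)
print(delimiter_fields)  # ['', '9.0', '3.3', '4.0']

# str.split() with no argument ignores leading/trailing whitespace,
# so no empty first field appears.
plain_fields = line.split()
print(plain_fields)  # ['9.0', '3.3', '4.0']
```

Four fields versus three is the same off-by-one-column pattern as the NaN column in the output above.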
Accepted answer by DSM
FWIW, I tend to use sep=r"\s+" instead, and it doesn't suffer from the same problem:
>>> pd.read_csv("wspace.csv", header=None, delim_whitespace=True)
     0     1     2    3
0  NaN   9.0   3.3  4.0
1  NaN  32.3  44.3  5.1
2  NaN   7.2   1.1  0.9
>>> pd.read_csv("wspace.csv", header=None, sep=r"\s+")
      0     1    2
0   9.0   3.3  4.0
1  32.3  44.3  5.1
2   7.2   1.1  0.9
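A self-contained sketch of the sep=r"\s+" approach, assuming the question's right-aligned data held in memory via io.StringIO instead of the wspace.csv file (the variable names are illustrative). In pandas 2.2 and later, delim_whitespace is also deprecated in favour of sep=r"\s+", so this spelling avoids the deprecation warning as well.

```python
import io

import pandas as pd

# The question's right-aligned data, kept in memory instead of a file
text = (
    "    9.0    3.3    4.0\n"
    "   32.3   44.3    5.1\n"
    "    7.2    1.1    0.9\n"
)

# sep=r"\s+" treats any run of whitespace as a single delimiter and skips
# the whitespace at the start of each line, so no empty first column appears.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)
print(df)
```

The resulting frame has three columns labelled 0, 1 and 2, with no NaN values.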

