Using Pandas to read in a txt file with delimiters creates NaN columns
Original question: http://stackoverflow.com/questions/33509627/
License: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors: StackOverflow
Asked by user3120266
I am trying to read a text file into pandas, but it creates NaNs for all of the rows. I tried to use a delimiter to break up the variables, which are separated by a \, but this is not working correctly. Here is what the data looks like in the text file:
Data:
Date Name Group Direction
2015-01-01 Smith.John - In
2015-01-01 Smith.Jan Claims Out
2015-01-01 - Claims In
2015-01-01 Smith.Jessica Other In
Here is my first attempt to read in the data:
pd.read_csv('C:\Users\Desktop\skills.txt',
names=['Date','AgentName','Group','Direction'])
However, this produces:
Date AgentID AssignedWorkGroup CallDirection
0 Date\tAgentID\tAssignedWorkGroup\tCallDire... NaN NaN NaN
1 2015-09-01\Smith.John\t-\tIn NaN NaN NaN
So I tried to get rid of the \ by doing:
pd.read_csv('C:\Users\Desktop\skills.txt',
names=['Date','AgentName','Group','Direction'],delimiter='\')
But this still produces the same results. So, a couple of things. One is that I can't split on the '\'. Additionally, it looks like the first row being read in is the header row. I tried using header=None to get rid of it, but that didn't work out well for me either. It also appears that there is a t (I assume for text?) being placed in front of every variable.
I feel as though I am approaching this incorrectly
Answered by EdChum
Because you passed alternate column names, the CSV parser interprets the first row as a valid data row, so you need to pass skiprows=1 to skip your header. Additionally, the default separator is a comma, but it looks like your data is either tab or multi-space separated, so you can pass sep='\t' or sep='\s+'.
It's unclear whether your data is tab or space separated, but the following worked for me:
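In [18]:
import io
import pandas as pd

t="""Date Name Group Direction
2015-01-01 Smith.John - In
2015-01-01 Smith.Jan Claims Out
2015-01-01 - Claims In
2015-01-01 Smith.Jessica Other In"""
# names= supplies the column labels, so skip the original header row;
# the raw string r'\s+' splits on any run of whitespace
pd.read_csv(io.StringIO(t), names=['Date','AgentName','Group','Direction'], skiprows=1, sep=r'\s+')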
Out[18]:
Date AgentName Group Direction
0 2015-01-01 Smith.John - In
1 2015-01-01 Smith.Jan Claims Out
2 2015-01-01 - Claims In
3 2015-01-01 Smith.Jessica Other In
So I expect
pd.read_csv(r'C:\Users\Desktop\skills.txt', names=['Date','AgentName','Group','Direction'], skiprows=1, sep='\t')
or
pd.read_csv(r'C:\Users\Desktop\skills.txt', names=['Date','AgentName','Group','Direction'], skiprows=1, sep=r'\s+')
to work for you. (The r prefix keeps Python from treating the backslashes in the path and in the pattern as escape sequences.)
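The symptom from the question can be reproduced with an in-memory string. The sketch below assumes the file is tab-separated (the \t sequences in the question's output suggest this, but it is not confirmed). The sample text and the variable names raw, broken, and fixed are illustrative only.

```python
import io
import pandas as pd

# Assumed tab-separated sample modelled on the question's data.
raw = (
    "Date\tName\tGroup\tDirection\n"
    "2015-01-01\tSmith.John\t-\tIn\n"
    "2015-01-01\tSmith.Jan\tClaims\tOut\n"
)
names = ['Date', 'AgentName', 'Group', 'Direction']

# repr() makes the real separator visible: tabs show up as \t.
first_line = raw.splitlines()[0]
print(repr(first_line))

# The default sep=',' finds no commas, so each whole line lands in the
# first column and the remaining named columns are filled with NaN.
broken = pd.read_csv(io.StringIO(raw), names=names)

# With the matching separator and the header row skipped, every column fills in.
fixed = pd.read_csv(io.StringIO(raw), names=names, skiprows=1, sep='\t')
print(fixed)
```

Printing the repr of a line from the real file is a quick way to decide between sep='\t' and sep=r'\s+' before calling read_csv.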
Answered by Mike Müller
Using whitespace as the delimiter works:
# delim_whitespace is deprecated in newer pandas releases; sep=r'\s+' is the equivalent
df = pd.read_csv(r'C:\Users\Desktop\skills.txt', delim_whitespace=True)
df.columns = ['Date','AgentName','Group','Direction']
Output:
Date AgentName Group Direction
0 2015-01-01 Smith.John - In
1 2015-01-01 Smith.Jan Claims Out
2 2015-01-01 - Claims In
3 2015-01-01 Smith.Jessica Other In
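The read and the rename can also be combined into a single call. This is a sketch rather than part of the answer above: passing header=0 together with names tells pandas to treat the first line as a header and replace it with the supplied names. The in-memory sample stands in for skills.txt and assumes whitespace-separated data.

```python
import io
import pandas as pd

# Stand-in for the contents of skills.txt (assumed whitespace-separated).
sample = """Date Name Group Direction
2015-01-01 Smith.John - In
2015-01-01 Smith.Jan Claims Out
2015-01-01 - Claims In
2015-01-01 Smith.Jessica Other In"""

# header=0 marks the first line as the header to be replaced by names=;
# sep=r'\s+' splits on any run of spaces or tabs.
df = pd.read_csv(
    io.StringIO(sample),
    sep=r'\s+',
    header=0,
    names=['Date', 'AgentName', 'Group', 'Direction'],
)
print(df)
```

This avoids the separate df.columns assignment, at the cost of having to know the number of columns up front.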