Using Pandas to read a txt file with a delimiter creates NaN columns

Question

Asked by user3120266

I am trying to read a text file into pandas, but it's creating NaNs for all of the rows. I tried to use a delimiter to break up the variables that are separated by a \ but this is not working correctly. Here is what the data looks like in the text file:

Data:

Date         Name          Group    Direction
2015-01-01  Smith.John      -          In
2015-01-01  Smith.Jan       Claims     Out
2015-01-01     -            Claims     In
2015-01-01  Smith.Jessica   Other      In

Here is my first attempt to read in the data:

pd.read_csv('C:\Users\Desktop\skills.txt',
        names=['Date','AgentName','Group','Direction'])

However, this produces:

    Date    AgentID     AssignedWorkGroup   CallDirection
 0  Date\tAgentID\tAssignedWorkGroup\tCallDire...   NaN     NaN     NaN
 1  2015-09-01\Smith.John\t-\tIn                    NaN     NaN     NaN

So I tried to get rid of the \ by doing:

 pd.read_csv('C:\Users\Desktop\skills.txt',
         names=['Date','AgentName','Group','Direction'],delimiter='\')

But this still produces the same results. So a couple of things. One is that I can't split on the '\'. Additionally, it looks like the first row being read in is the header row. I tried using header=None to get rid of it, but that didn't work out too well for me either. It also appears that there is a t (I assume for text?) being placed in front of every variable.

I feel as though I am approaching this incorrectly

Answer 1

Answered by EdChum

Because you passed alternate column names, the CSV parser interprets the first row as a valid data row, so you need to pass skiprows=1 to skip your header. Additionally, the default separator is a comma, but it looks like your data is either tab or multi-space separated, so you can pass sep='\t' or sep='\s+'.
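To make the cause concrete, here is a minimal sketch that reproduces the NaN columns and then applies the fix described above. The `sample` string is a made-up, tab-separated stand-in for the asker's file, not the real data:

```python
import io
import pandas as pd

# Hypothetical sample: a tab-separated file with a header row.
sample = (
    "Date\tName\tGroup\tDirection\n"
    "2015-01-01\tSmith.John\t-\tIn\n"
    "2015-01-01\tSmith.Jan\tClaims\tOut\n"
)
names = ['Date', 'AgentName', 'Group', 'Direction']

# Default sep=',' finds no commas, so every line becomes a single field.
# That field lands in the first column and the other three are NaN.
# The header line is also read as a data row because names= was given.
broken = pd.read_csv(io.StringIO(sample), names=names)
print(broken)

# Splitting on tabs and skipping the header row gives four real columns.
fixed = pd.read_csv(io.StringIO(sample), names=names, skiprows=1, sep='\t')
print(fixed)
```

The `\t` that shows up in the broken output is just how a tab character is displayed inside a string, which suggests the file is tab-delimited rather than separated by a literal backslash.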

It's unclear if your data is tab or space separated but the following worked for me:

In [18]:
import io
import pandas as pd

t="""Date         Name          Group    Direction
2015-01-01  Smith.John      -          In
2015-01-01  Smith.Jan       Claims     Out
2015-01-01     -            Claims     In
2015-01-01  Smith.Jessica   Other      In"""
pd.read_csv(io.StringIO(t), names=['Date','AgentName','Group','Direction'], skiprows=1, sep=r'\s+')

Out[18]:
         Date      AgentName   Group Direction
0  2015-01-01     Smith.John       -        In
1  2015-01-01      Smith.Jan  Claims       Out
2  2015-01-01              -  Claims        In
3  2015-01-01  Smith.Jessica   Other        In

so I expect

pd.read_csv(r'C:\Users\Desktop\skills.txt', names=['Date','AgentName','Group','Direction'], skiprows=1, sep='\t')

or

pd.read_csv(r'C:\Users\Desktop\skills.txt', names=['Date','AgentName','Group','Direction'], skiprows=1, sep=r'\s+')

to work for you
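If it is unclear which of the two separators applies, one option is to look at the raw header line first. The sketch below uses a hypothetical helper, `guess_sep`, and two made-up sample strings; it is only an illustration of the idea, not part of pandas:

```python
import io
import pandas as pd

def guess_sep(first_line):
    """Hypothetical helper: use a tab if the line contains one, else any whitespace."""
    return '\t' if '\t' in first_line else r'\s+'

names = ['Date', 'AgentName', 'Group', 'Direction']

# Made-up samples: the same two lines, once tab-separated and once space-aligned.
tab_text = "Date\tName\tGroup\tDirection\n2015-01-01\tSmith.John\t-\tIn\n"
space_text = (
    "Date         Name          Group    Direction\n"
    "2015-01-01  Smith.John      -          In\n"
)

frames = []
for text in (tab_text, space_text):
    first_line = text.splitlines()[0]
    # repr() makes tabs visible as \t instead of blank space.
    print(repr(first_line))
    sep = guess_sep(first_line)
    frames.append(pd.read_csv(io.StringIO(text), names=names, skiprows=1, sep=sep))

print(frames[0])
```

For a real file, the same check would be `print(repr(open(path).readline()))`, where `path` is whatever file you are reading.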

Answer 2

Answered by Mike Müller

Using whitespace as the delimiter works:

df = pd.read_csv(r'C:\Users\Desktop\skills.txt', delim_whitespace=True)
df.columns = ['Date','AgentName','Group','Direction']

Output:

         Date      AgentName   Group Direction
0  2015-01-01     Smith.John       -        In
1  2015-01-01      Smith.Jan  Claims       Out
2  2015-01-01              -  Claims        In
3  2015-01-01  Smith.Jessica   Other        In
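As a possible variation, the read and the rename can be done in one call: passing header=0 together with names tells read_csv to discard the file's header row and use the supplied names instead. As far as I know, recent pandas releases deprecate delim_whitespace in favor of sep=r'\s+', so this sketch uses the latter. It reads the sample data from a string rather than from the asker's file:

```python
import io
import pandas as pd

# Sample data copied from the question, read from a string instead of a file.
t = """Date         Name          Group    Direction
2015-01-01  Smith.John      -          In
2015-01-01  Smith.Jan       Claims     Out
2015-01-01     -            Claims     In
2015-01-01  Smith.Jessica   Other      In"""

names = ['Date', 'AgentName', 'Group', 'Direction']

# header=0 marks the first line as the header to be replaced by names;
# sep=r'\s+' splits on any run of whitespace.
df = pd.read_csv(io.StringIO(t), sep=r'\s+', header=0, names=names)
print(df)
```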
