Python Pandas error "Can only use .str accessor with string values"

Question

Asked by eleanora

I have the following input file:

"Name",97.7,0A,0A,65M,0A,100M,5M,75M,100M,90M,90M,99M,90M,0#,0N#,

And I am reading it in with:

#!/usr/bin/env python

import pandas as pd
import sys
import numpy as np

filename = sys.argv[1]
df = pd.read_csv(filename,header=None)
for col in df.columns[2:]:
    df[col] = df[col].str.extract(r'(\d+\.*\d*)').astype(np.float)

print df

However, I get the error

    df[col] = df[col].str.extract(r'(\d+\.*\d*)').astype(np.float)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 2241, in __getattr__
    return object.__getattribute__(self, name)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/base.py", line 188, in __get__
    return self.construct_accessor(instance)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/base.py", line 528, in _make_str_accessor
    raise AttributeError("Can only use .str accessor with string "
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

This worked OK in pandas 0.14 but does not work in pandas 0.17.0.

Answer 1

Accepted answer by EdChum

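This happens because the trailing comma in your input line produces an extra, empty last column, which pandas reads as NaN. That column has a float dtype rather than a string dtype, so the .str accessor is not available on it: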

In [417]:
import io
import numpy as np
import pandas as pd
t="""'Name',97.7,0A,0A,65M,0A,100M,5M,75M,100M,90M,90M,99M,90M,0#,0N#,"""
df = pd.read_csv(io.StringIO(t), header=None)
df

Out[417]:
       0     1   2   3    4   5     6   7    8     9    10   11   12   13  14  \
0  'Name'  97.7  0A  0A  65M  0A  100M  5M  75M  100M  90M  90M  99M  90M  0#   

    15  16  
0  0N# NaN
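
One way to confirm this is to look at the dtypes that pandas inferred. The following is a minimal sketch using the same sample line; the helper name non_string_cols is only illustrative and not part of the original answer.

```python
import io

import pandas as pd

# Same sample line as above; the trailing comma creates an extra, empty column.
t = "'Name',97.7,0A,0A,65M,0A,100M,5M,75M,100M,90M,90M,99M,90M,0#,0N#,"
df = pd.read_csv(io.StringIO(t), header=None)

# The last column contains only NaN, so pandas infers float64 for it.
print(df.dtypes.tail(3))

# Columns that do not hold strings; calling .str on any of them raises AttributeError.
non_string_cols = [col for col in df.columns if not pd.api.types.is_string_dtype(df[col])]
print(non_string_cols)
```

Column 1 (the value 97.7) and column 16 (the empty trailing field) should be the only ones listed, and the loop in the question reaches column 16.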

If you slice the column range so that it stops before the last column, then it works:

In [421]:
for col in df.columns[2:-1]:
    df[col] = df[col].str.extract(r'(\d+\.*\d*)').astype(np.float)
df

Out[421]:
       0     1   2   3   4   5    6   7   8    9   10  11  12  13  14  15  16
0  'Name'  97.7   0   0  65   0  100   5  75  100  90  90  99  90   0   0 NaN

Alternatively, you can select only the columns with object dtype and run the same code (skipping the first column, since it holds the 'Name' entry):

In [428]:
for col in df.select_dtypes([np.object]).columns[1:]:
    df[col] = df[col].str.extract(r'(\d+\.*\d*)').astype(np.float)
df

Out[428]:
       0     1   2   3   4   5    6   7   8    9   10  11  12  13  14  15  16
0  'Name'  97.7   0   0  65   0  100   5  75  100  90  90  99  90   0   0 NaN
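
The snippets above were written for pandas 0.17 and need small changes on current releases: the np.float and np.object aliases were removed in NumPy 1.24, and since pandas 0.23 str.extract returns a DataFrame unless expand=False is passed. A sketch of the same idea under those assumptions, using pd.api.types.is_string_dtype to skip non-string columns (a substitution for the select_dtypes call, not the original author's code):

```python
import io

import pandas as pd

t = "'Name',97.7,0A,0A,65M,0A,100M,5M,75M,100M,90M,90M,99M,90M,0#,0N#,"
df = pd.read_csv(io.StringIO(t), header=None)

# Start at position 2 as in the question, but only convert columns that
# actually hold strings; the all-NaN last column is left untouched.
for col in df.columns[2:]:
    if pd.api.types.is_string_dtype(df[col]):
        # expand=False returns a Series for a single capture group.
        df[col] = df[col].str.extract(r'(\d+\.*\d*)', expand=False).astype(float)

print(df)
```

Checking the dtype inside the loop avoids hard-coding which columns are empty, so the same loop also works for files without a trailing comma.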

Answer 2

Answer by SPRBRN

I got this error while working in Eclipse. It turned out that the project interpreter was somehow (after an update I believe) reset to Python 2.7. Setting it back to Python 3.6 resolved this issue. It all resulted in several crashes, restarts and warnings. After several minutes of troubles it seems fixed now.

While I know this is not a solution to the problem posed here, I thought it might be useful for others, as I came to this page after searching for this error.

Answer 3

Answer by Knowledge Elegance

In this case we have to use the str.replace() method on that series, but first we have to convert it to str type:

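import pandas as pd

# Raw string so the backslashes in the Windows path are not treated as escapes
df1 = pd.read_csv(r"C:\Users\Gangwar\Desktop\competitions\cancer prediction\kaggle_to_students.csv")
# The Patient column holds values such as 's125', 's45', 's588', 's244', 's125', 's123'
df1['Patient'] = df1['Patient'].astype(str)
# Strip the 's' prefix, then convert to integers
df1['Patient'] = df1['Patient'].str.replace('s', '').astype(int)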
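
Since the CSV file from this answer is not available, here is a self-contained sketch of the same conversion. The in-memory DataFrame is a hypothetical stand-in built from the sample values listed in the answer, and regex=False is added so that 's' is treated as a literal string regardless of the pandas version.

```python
import pandas as pd

# Hypothetical stand-in for the Patient column of the CSV file.
df1 = pd.DataFrame({"Patient": ['s125', 's45', 's588', 's244', 's125', 's123']})

# Make sure the values are strings, drop the literal 's', then convert to int.
df1['Patient'] = (
    df1['Patient']
    .astype(str)
    .str.replace('s', '', regex=False)
    .astype(int)
)

print(df1['Patient'].tolist())
```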
