Pandas error: Can only use .str accessor with string values

Question

Asked by Mxracer888

This is my code:

import pandas as pd
import os

os.chdir('path\to\input\file')

xl_file = pd.ExcelFile('newcustomers.xlsx')
df = xl_file.parse('Customers Export 1', index_col='Domain', na_values=['NA'])

df = df[(df["Customer phone"].str.startswith("+1")) & (df["Customer phone"].str.len() == 13)]

print
print "now changing to final CSV output directory"
print

os.chdir('path\to\output\directory')

print "Current working dir : %s" % os.getcwd()

df.to_csv('newcustomers.csv')

The column contains phone numbers. I use this filter to drop incomplete numbers, blank entries, and phone numbers that don't start with +1 (the US/CA country dial code). It worked great for a week, but then I started getting the error below, and I have not updated Python or pandas in between.

raise AttributeError("Can only use .str accessor with string "
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

I am using Anaconda on Windows 8.1. The versions are as follows:

conda update conda
    conda 3.18.8 py27_0 defaults
conda update anaconda
    anaconda 2.4.0 np110py27_0 <unknown>
conda update pandas
    pandas 0.17.1 np110py27_0 defaults

Nothing changed from last week through Sunday, when the code still worked. Then yesterday, without any updates or any changes to the input file, it started getting mad at me :/

EDIT: Adding df.head(2) per @WoodChopper's request

Domain        Customer Name    Customer phone
example.com   John Doe         44.xxxxxx
google.com    Jane Doe         1.xxxxxx

The original XLSX file does list the entire phone number with the '+' sign, but the output above is all that is printed to CMD when I use:

print df.head(2)

That is from running only the xl_file line, the df line, and then the print statement above. I am commenting out the following line with #:

df = df[(df["Registrant phone"].str.startswith("+1")) & (df["Registrant phone"].str.len() == 13)]

EDIT x2

Just to clarify, this is the code as of now:

import pandas as pd
import os

os.chdir('path\to\input\file')

df = pd.read_excel('newcustomers.xlsx', sheetname = 'Customers Export 1')

#xl_file = pd.ExcelFile('newcustomers.xlsx')
#df = xl_file.parse('Customers Export 1', index_col='Domain', na_values=['NA'], convert_float=False)
#df.drop(df.columns[[0]], axis=1, inplace=True)

#print df.head(2)
#print (df["Registrant phone"])
df = df[(df["Registrant phone"].str.startswith("+1")) & (df["Registrant phone"].str.len() == 13)]

print
print "now changing to final CSV output directory"
print

os.chdir('path\to\output\directory')

print "Current working dir : %s" % os.getcwd()

df.to_csv('newcustomers.csv')

It still returns the same results. Just to make sure we're not chasing the wrong rabbit, here is the exact error (imgur).

Could this be something outside the code? pandas, conda, and Anaconda are up to date. Is there another library that pandas depends on that could have gone out of date (which wouldn't entirely make sense, since everything worked one day and not the next)?

Answer 1

Answered by WoodChopper

Basically, the phone numbers are being parsed as floats, but for your code to work they need to be strings.
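To illustrate the cause, here is a minimal sketch that reproduces the error without the Excel file. The DataFrame contents are made up for illustration; the only assumption is that the phone column ended up with a float dtype, as the df.head(2) output in the question suggests.

```python
import pandas as pd

# Hypothetical data standing in for a phone column that was read in as floats
df = pd.DataFrame({"Customer phone": [1.5551234567, 44.2071234567, None]})

print(df["Customer phone"].dtype)  # float64

# The .str accessor is only available on string-like columns,
# so using it on a float column raises AttributeError.
try:
    df["Customer phone"].str.startswith("+1")
    raised = False
except AttributeError as exc:
    raised = True
    print("AttributeError:", exc)
```

Checking df.dtypes right after loading the sheet is a quick way to see whether this is what happened.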

Set convert_float to False:

df = xl_file.parse('Customers Export 1', index_col='Domain',
                                 na_values=['NA'], convert_float=False)

Update

df = pd.read_excel('file.xlsx', sheetname = 'sheet 1')
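Once the column holds text, the filter from the question can be made a little more defensive. The sketch below is one possible way to do that, not the asker's or the answerer's code: keep_us_numbers is a hypothetical helper, and the sample data is invented. It casts the column to str before using .str, so a stray non-string value does not raise. Note that casting after the fact cannot restore a '+' sign that was already lost when the sheet was read, so reading the column as text in the first place is still preferable.

```python
import pandas as pd


def keep_us_numbers(df, column="Customer phone"):
    """Keep rows whose phone number starts with "+1" and is 13 characters long.

    Hypothetical helper for illustration only.
    """
    # Cast to str so the .str accessor works regardless of the column's dtype
    phones = df[column].astype(str)
    starts_with_plus_one = phones.str.startswith("+1", na=False)
    has_expected_length = phones.str.len() == 13
    return df[starts_with_plus_one & has_expected_length]


# Made-up sample data: one valid number, one non-US number,
# one missing value, and one incomplete number
sample = pd.DataFrame(
    {"Customer phone": ["+1.5551234567", "+44.207123456", None, "+1.555"]},
    index=["a.com", "b.com", "c.com", "d.com"],
)

result = keep_us_numbers(sample)
print(result)
```

Only the first row passes both checks here; the others are dropped because of the prefix, the missing value, or the length.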
