原文地址: http://stackoverflow.com/questions/34162036/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Pandas error: Can only use .str accessor with string values
Asked by Mxracer888
This is my code:
import pandas as pd
import os
os.chdir('path\to\input\file')
xl_file = pd.ExcelFile('newcustomers.xlsx')
df = xl_file.parse('Customers Export 1', index_col='Domain', na_values=['NA'])
df = df[(df["Customer phone"].str.startswith("+1")) & (df["Customer phone"].str.len() == 13)]
print
print "now changing to final CSV output directory"
print
os.chdir('path\to\output\directory')
print "Current working dir : %s" % os.getcwd()
df.to_csv('newcustomers.csv')
Basically, the column holds phone numbers, and I am using this to remove incomplete numbers, blank entries, and phone numbers that don't start with +1 (the US/CA country dial code). It worked great for a week, but then I started getting this error, and I have not updated Python or pandas in between.
raise AttributeError("Can only use .str accessor with string "
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
I am using Anaconda on Windows 8.1. The versions are as follows:
conda update conda
conda 3.18.8 py27_0 defaults
conda update anaconda
anaconda 2.4.0 np110py27_0 <unknown>
conda update pandas
pandas 0.17.1 np110py27_0 defaults
Nothing changed from last week through Sunday, when the code worked. Then yesterday, without any updates or changes to the input file or anything else, it started getting mad at me :/
EDIT: Adding df.head(2) per @WoodChopper's request
Domain        Customer Name    Customer phone
example.com   John Doe         44.xxxxxx
google.com    Jane Doe         1.xxxxxx
The original XLSX file that it opens does list the entire phone number with the '+' sign. But the output above is all that comes back in CMD when I use:
print df.head(2)
That output comes from running only the xl_file assignment, the df assignment, and the print statement above. I have commented out (with #) this line:
df = df[(df["Registrant phone"].str.startswith("+1")) & (df["Registrant phone"].str.len() == 13)]
EDIT x2
Just to clarify, this is the code as of now:
import pandas as pd
import os
os.chdir('path\to\input\file')
df = pd.read_excel('newcustomers.xlsx', sheetname = 'Customers Export 1')
#xl_file = pd.ExcelFile('newcustomers.xlsx')
#df = xl_file.parse('Customers Export 1', index_col='Domain', na_values=['NA'], convert_float=False)
#df.drop(df.columns[[0]], axis=1, inplace=True)
#print df.head(2)
#print (df["Registrant phone"])
df = df[(df["Registrant phone"].str.startswith("+1")) & (df["Registrant phone"].str.len() == 13)]
print
print "now changing to final CSV output directory"
print
os.chdir('path\to\output\directory')
print "Current working dir : %s" % os.getcwd()
df.to_csv('newcustomers.csv')
It still returns the same results. Just to make sure we're not chasing down the wrong rabbit, here is the exact error (imgur).
Could this be something outside the code? Pandas, conda, and anaconda are up to date. Is there another library that Pandas is dependent on that could have gone out of date (which wouldn't entirely make sense since everything worked one day and the next it didn't)?
Answered by WoodChopper
Basically, the phone numbers are being parsed as float, but for your code to work they need to be strings.
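To illustrate the point, here is a minimal sketch with made-up data (the values and variable names below are hypothetical, not taken from the asker's file). It assumes the phone numbers look like "+1.4155550123", which would explain both the length check of 13 and the "1.xxxxxx" seen in df.head(2). Once such a column holds floats, `.str` raises the AttributeError quoted in the question, while the same call works on strings.

```python
import pandas as pd

# Hypothetical values: what "+1.4155550123" and "+44.2071234567" look like
# once they have been parsed as numbers instead of text.
as_float = pd.Series([1.4155550123, 44.2071234567])
as_text = pd.Series(["+1.4155550123", "+44.2071234567"])

print(as_float.dtype)  # float64

try:
    # .str is only available on string data, so this raises AttributeError.
    as_float.str.startswith("+1")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)

# The same call is fine when the column holds strings.
starts_with_plus_one = as_text.str.startswith("+1").tolist()
print(starts_with_plus_one)
```

Printing `df["Customer phone"].dtype` right after loading the sheet is a quick way to check which of the two cases applies.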
Set convert_float to False:
df = xl_file.parse('Customers Export 1', index_col='Domain',
na_values=['NA'], convert_float=False)
Update
df = pd.read_excel('file.xlsx', sheetname = 'sheet 1')
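If the column can end up with a mix of strings, floats, and blanks, one defensive option is to treat every non-string cell as a non-match before using `.str`. The sketch below is only an illustration under that assumption: the helper name `filter_us_ca_phones` and the sample data are hypothetical, and it does not recover a '+' sign that was lost when the value was parsed as a number. (As a side note, newer pandas releases spell the `sheetname` argument as `sheet_name`.)

```python
import pandas as pd


def filter_us_ca_phones(df, column):
    """Keep rows whose phone is a 13-character string starting with '+1'."""
    # Replace anything that is not a string (floats, NaN, None) with "",
    # so the .str accessor always sees string values.
    phones = df[column].map(lambda v: v if isinstance(v, str) else "")
    mask = phones.str.startswith("+1") & (phones.str.len() == 13)
    return df[mask]


# Hypothetical sample: one valid number, one float, one blank, one too short.
sample = pd.DataFrame(
    {"Customer phone": ["+1.4155550123", 44.2071234567, None, "+1.41555"]}
)
kept = filter_us_ca_phones(sample, "Customer phone")
print(kept)
```

Only the first row survives here, because it is the only cell that is a string, starts with "+1", and has 13 characters.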