Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/45177209/

Time: 2020-09-14 04:02:13  Source: igfitidea

Error: Unable to parse string "*" at position 6116 - Convert Object Type to Int - Pandas

python pandas dataframe types

Question by i.n.n.m

This question has been asked in many threads and the answers worked for others, but not for me. I am trying to convert an object data type into int to perform a group-by aggregation. Here is what I have tried and the errors I got so far (I am using Python 3). According to this link, I tried these two:

df['my_var'] = df['my_var'].astype(str).astype(int)
df['my_var'] = df['my_var'].astype(int)

Same error for both:

ValueError: invalid literal for int() with base 10: '*'
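
A minimal reproduction on a small, hypothetical column (the values are made up, not the asker's data). It suggests why the two attempts fail the same way: astype(str) leaves '*' as the string '*', so the following astype(int) hits the same unparseable value as the direct cast.

```python
import pandas as pd

# Hypothetical sample column: numeric strings plus one stray '*'
s = pd.Series(["10", "20", "*", "40"], dtype=object)

attempts = {
    "astype(str).astype(int)": lambda x: x.astype(str).astype(int),
    "astype(int)": lambda x: x.astype(int),
}

errors = {}
for label, convert in attempts.items():
    try:
        convert(s)
    except ValueError as exc:
        # astype(str) leaves '*' unchanged, so both paths fail on the same value
        errors[label] = str(exc)

for label, message in errors.items():
    print(f"{label}: ValueError: {message}")
```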

Then I tried:

df['my_var'] = pd.to_numeric(df['my_var'])

I got an error:

ValueError: Unable to parse string "*" at position 6116
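
As far as I can tell, the number in this message is the 0-based position in the column's underlying array, not necessarily the index label, so iloc is the safe way to inspect the offending value. A sketch on a hypothetical column whose index labels differ from its positions:

```python
import pandas as pd

# Hypothetical column whose index labels (100..103) differ from row positions (0..3)
s = pd.Series(["5", "7", "*", "9"], index=[100, 101, 102, 103], dtype=object)

try:
    pd.to_numeric(s)
    message = None
except ValueError as exc:
    message = str(exc)
    print(message)  # names the first unparseable value and its 0-based position

# Look up the value by position with iloc; index[2] gives its label
print(repr(s.iloc[2]), s.index[2])
```

For the question's data, the analogous lookup would be df['my_var'].iloc[6116].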

This is what dtypes looks like:

print(df.dtypes)
my_var object
dtype: object
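
An object column can hold arbitrary Python objects; here it most likely holds strings, which is why the numbers need converting at all. A quick check of what is actually stored, on a hypothetical sample frame:

```python
import pandas as pd

# Hypothetical frame mirroring the question's single object column
df = pd.DataFrame({"my_var": pd.Series(["1", "2", "*", "3"], dtype=object)})

print(df.dtypes)  # my_var is object
# Count the Python type of each stored value; str means the numbers are text
print(df["my_var"].map(type).value_counts())
```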

I know some similar questions have been downvoted; however, none of their answers worked for me. Is this a version issue? I am having trouble understanding this error. Any help or suggestions would be appreciated.

Accepted answer by i.n.n.m

After getting suggestions from DYZ and MaxU, I found that the error was caused by the special character * in one row of my DataFrame. (In hindsight, the error message said exactly that.)

As suggested, I used

df[df['my_var']=='*']

and

df.loc[pd.to_numeric(df['my_var'], errors='coerce').isnull()]
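
Both filters run on a small hypothetical frame. The first catches only the literal string '*'; the errors='coerce' version flags every value that fails to parse (and would also flag values that were already missing, since those are null after coercion too).

```python
import pandas as pd

# Hypothetical data: two different kinds of junk ('*' and 'n/a')
df = pd.DataFrame({"my_var": pd.Series(["3", "*", "4", "n/a", "8"], dtype=object)})

# Rows equal to the literal '*'
star_rows = df[df["my_var"] == "*"]

# Rows to_numeric cannot parse: coerce turns them into NaN, isnull() flags them
bad_rows = df.loc[pd.to_numeric(df["my_var"], errors="coerce").isnull()]

print(star_rows)
print(bad_rows)
```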

With these, I found exactly where the special character was. I then used a regular expression, following this thread, to strip off the special characters.
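
The thread referenced above is not reproduced here, so this is only a sketch of the regex clean-up, assuming every non-digit character is junk (this pattern would also strip minus signs and decimal points, so adjust it if your data has them). A value that was only '*' becomes an empty string, so it is coerced to missing and kept in pandas' nullable Int64 dtype instead of being forced to a number.

```python
import pandas as pd

# Hypothetical values: a digit with trailing junk, clean values, and a lone '*'
s = pd.Series(["12*", "7", "*", "3"], dtype=object)

# Remove every character that is not a digit 0-9
cleaned = s.str.replace(r"[^0-9]", "", regex=True)

# Empty strings (rows that were only '*') cannot become numbers, so coerce them to NaN,
# then use the nullable Int64 dtype so missing values survive the integer cast
as_int = pd.to_numeric(cleaned, errors="coerce").astype("Int64")
print(as_int)
```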

Answer by A.Kot

I used 0 to replace anything that isn't a number, but you can use any other value that makes sense for your data, e.g. -999999 (obviously not a recommended practice, just an example):

pd.to_numeric(df.my_var, errors='coerce').fillna(0).astype(int)
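
Since the original goal was a group-by aggregation, here is the one-liner above applied to a hypothetical frame (the group column is made up). The sentinel 0 is treated as a real value by sum and mean; if that would distort your results, one alternative (not part of the original answer) is the nullable Int64 dtype, which keeps the bad entries as missing, and pandas skips missing values in these aggregations by default.

```python
import pandas as pd

# Hypothetical frame: 'group' is an invented grouping key
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "my_var": pd.Series(["1", "*", "5", "7"], dtype=object),
})

# Answer's approach: unparseable values -> NaN -> 0, then a plain int column
df["my_var_int"] = pd.to_numeric(df["my_var"], errors="coerce").fillna(0).astype(int)

# Alternative: keep unparseable values as <NA> with the nullable Int64 dtype
df["my_var_nullable"] = pd.to_numeric(df["my_var"], errors="coerce").astype("Int64")

means = df.groupby("group")[["my_var_int", "my_var_nullable"]].mean()
print(means)
```

In group "a", the sentinel pulls the mean down to 0.5, while the nullable column averages only the one valid value.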