How to convert string into integer pandas
Notice: this page is based on a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/47927371/
Question by Akash K
I have a dataset with many columns. I want to take the average of each column, grouped by 'Club'.
My data is stored as strings, and some values have the form '60+2' or '58-1'.
I want to convert these strings into integers so that I can use them to calculate the mean.
From what I have searched, I need re to deal with the '+' and '-' signs. .str.split is used to split the string.
In my case pd.to_numeric will not work, because I want to convert many columns at once.
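The question's idea of using re to split on the signs can be sketched in plain Python. This is a minimal sketch, not code from the question or the answers: split_and_sum is a hypothetical helper. It assumes every cell holds non-negative integers joined by '+' or '-'. Anything else, such as '60+' or 'abc', makes int() raise ValueError.

```python
import re


def split_and_sum(text):
    """Hypothetical helper: '92-4' -> 88, '58+2' -> 60, '89' -> 89.

    Assumes the cell holds non-negative integers joined by '+' or '-';
    malformed input such as '60+' or 'abc' makes int() raise ValueError.
    """
    # The capturing group keeps the signs: '92-4' -> ['92', '-', '4']
    parts = re.split(r"([+-])", str(text).replace(" ", ""))
    total = int(parts[0])
    # parts[1::2] are the signs, parts[2::2] the numbers that follow them
    for sign, number in zip(parts[1::2], parts[2::2]):
        total += int(number) if sign == "+" else -int(number)
    return total


raw = ["89", "92-4", "94", "88-6", "58+2", "79", "70+9", "76", "94"]
print([split_and_sum(v) for v in raw])
# [89, 88, 94, 82, 60, 79, 79, 76, 94]
```

Under the same assumptions, you could apply it to all selected columns at once with complete_dataset[cols].apply(lambda col: col.map(split_and_sum)) and then group by 'Club'.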
complete_dataset is a DataFrame:
cols = [i for i in complete_dataset.columns if i not in ['Name', 'Club', 'Nationality', 'Age', 'Overall', 'Potential', 'Special']]
for col in cols:
    col = complete_dataset[col]  # .str.split('+').astype(int)
    print(col)
    for x in col:
        value = x
        print(value)
    # df[col]=pd.to_numeric(df[col])
This gives the following output:
Name: Acceleration, dtype: object
89
92-4
94
88-6
58+2
79
70+9
76
94
Also, I want to use this data for further calculations.
Thank you
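The stated end goal is to convert many columns at once and then average them per 'Club'. A vectorized sketch with Series.str.extract may help here. The toy DataFrame, its values, and the to_number helper are hypothetical and only illustrate the idea. The pattern assumes an integer base with at most one signed integer offset. Cells that do not match become NaN, which groupby(...).mean() skips by default.

```python
import pandas as pd

# Toy stand-in for complete_dataset (column names and values are assumptions)
complete_dataset = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Club": ["X", "X", "Y", "Y"],
    "Acceleration": ["89", "92-4", "58+2", "79"],
    "Agility": ["70+9", "76", "88-6", "94"],
})

excluded = ["Name", "Club", "Nationality", "Age", "Overall", "Potential", "Special"]
cols = [c for c in complete_dataset.columns if c not in excluded]


def to_number(column):
    """Hypothetical helper: '92-4' -> 88.0; cells that do not match -> NaN."""
    # Groups: 0 = base, 1 = optional sign, 2 = optional offset
    parts = column.astype(str).str.extract(r"^\s*(\d+)\s*(?:([+-])\s*(\d+))?\s*$")
    base = pd.to_numeric(parts[0])
    offset = pd.to_numeric(parts[2]).fillna(0)
    sign = parts[1].map({"+": 1, "-": -1}).fillna(0)
    return base + sign * offset


# DataFrame.apply passes each selected column (a Series) to to_number
complete_dataset[cols] = complete_dataset[cols].apply(to_number)
means = complete_dataset.groupby("Club")[cols].mean()
print(means)
```

Because NaN can appear, the converted columns come out as float64 rather than int64.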
Answer by m_____z
Since you're using Pandas, I would recommend using pandas.eval() instead of Python's eval() method, as correctly pointed out by Coldspeed (thanks!).
The advantage of using pandas.eval() is that it only evaluates Python expressions, not Python statements, so it is much safer. Compared to the ast.literal_eval() method, it should also run a little bit faster.
Concretely, you can amend your code to do the following:
import pandas
# Your code to read in the DataFrame goes in here.
complete_dataset['Acceleration'] = pandas.eval(complete_dataset['Acceleration'])
This evaluates all expressions stored in the Acceleration column of complete_dataset. The method should perform much faster, and the output will be an integer or a float depending on the expressions stored in the column.
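pandas.eval is documented as taking an expression string, so a more conservative variant of the call above evaluates each cell separately. This is a sketch, not the answerer's code. engine="python" is chosen here only so the example does not depend on numexpr being installed. Because every cell is parsed on its own, this variant is slower than a vectorized approach.

```python
import pandas as pd

acceleration = pd.Series(["89", "92-4", "94", "88-6", "58+2", "79", "70+9", "76", "94"])

# Evaluate one expression string per cell; engine="python" is an assumption
# made here so the sketch does not depend on numexpr being installed.
values = acceleration.map(lambda expr: pd.eval(expr, engine="python")).astype("int64")
print(values.tolist())
```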
For more details, please take a look at the Pandas documentation.
Answer by cs95
I want to remove symbols and add or subtract the numbers as per symbol and convert the string into integer.
ast.literal_eval
Alright, one good way of doing this is to use Python's safe eval, ast.literal_eval.
import ast
df.Acceleration = df.Acceleration.apply(ast.literal_eval)
df
Acceleration
0 89
1 88
2 94
3 82
4 60
5 79
6 79
7 76
8 94
df.Acceleration.dtype
dtype('int64')
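literal_eval evaluates only certain string expressions (Python literals). Assuming you have a column of strings containing expressions it can evaluate, this will evaluate them and return numeric results. Note that Python 3.7 and later no longer allow addition and subtraction of arbitrary numbers in literal_eval. On those versions, strings such as '92-4' raise a ValueError instead of evaluating to 88.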
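Newer Python releases no longer let literal_eval add or subtract arbitrary numbers. One alternative is a tiny whitelist evaluator built on ast.parse. This technique is swapped in and is not part of the answer. eval_plus_minus is a hypothetical name, and the sketch assumes Python 3.8+, where numbers parse as ast.Constant. It accepts only int and float literals combined with + and -. Names, calls, and every other operator are rejected with ValueError.

```python
import ast
import operator

# Whitelisted operators; every other AST node type is rejected.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def eval_plus_minus(text):
    """Hypothetical helper: evaluate '92-4', '58+2', '89' using only numbers, + and -.

    Raises ValueError for names, calls, other operators or malformed strings.
    """
    source = str(text).strip()
    try:
        tree = ast.parse(source, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"not an expression: {source!r}") from exc

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # type(...) rather than isinstance so that True/False are rejected
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported syntax in {source!r}")

    return walk(tree)


print([eval_plus_minus(s) for s in ["89", "92-4", "88-6", "58+2", "70+9"]])
# [89, 88, 82, 60, 79]
```

It can be used wherever literal_eval is used above, for example df.Acceleration.astype(str).apply(eval_plus_minus).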
Note that if you have a column of mixed integers and strings, the simplest thing to do is to convert the entire column to strings and apply literal_eval.
df['Acceleration'] = df.Acceleration.astype(str).apply(ast.literal_eval)
pd.eval / df.eval
Another good way of doing this is to use pandas' safe eval, pd.eval (or DataFrame.eval), as mentioned in m_____z's answer.
df.Acceleration = df.eval(df.Acceleration)
df
Acceleration
0 89
1 88
2 94
3 82
4 60
5 79
6 79
7 76
8 94
Handling Malformed Data
On the off chance that your data contains invalid strings, a slightly different solution is needed, because everything mentioned above is going to fail. We'll need to define a function that handles these errors accordingly.
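import ast
import numpy as np

def parse(x):
    try:
        return ast.literal_eval(x)  # or pd.eval(x)
    except (ValueError, SyntaxError):
        # Strings that cannot be parsed become NaN instead of raising
        return np.nan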
df.Acceleration = df.Acceleration.apply(parse)
df
Acceleration
0 89
1 88
2 94
3 82
4 60
5 79
6 79
7 76
8 94