pandas: How to convert a string into an integer

Question

Asked by Akash K

I have a dataset with many columns. I want to take the average of each column, grouped by 'Club'.

My data is stored as strings, and some of the values have the form '60+2' or '58-1'.

I want to convert these strings into integers so that I can use them to calculate the mean.

From what I have found, I need re to skip the '+' and '-' characters, and .str.split can be used to split the string.

In my case pd.to_numeric will not work, as I want to edit many columns at once.

    # complete_dataset is a DataFrame

    cols = [i for i in complete_dataset.columns if i not in ['Name','Club', 'Nationality', 'Age', 'Overall', 'Potential', 'Special']]
    for col in cols:
        col = complete_dataset[col] #.str.split('+').astype(int)
        print(col)
        for x in col:
            value = x
            print(value)
    #     df[col]=pd.to_numeric(df[col])

This gives the following output:

    Name: Acceleration, dtype: object
    89
    92-4
    94
    88-6
    58+2
    79
    70+9
    76
    94

I also want to use this data for further calculations.

Thank you

Answer 1

Answered by m_____z

Since you're using Pandas, I would recommend using pandas.eval() instead of Python's eval() function, as correctly pointed out by Coldspeed (thanks!).

The advantage of using pandas.eval() is that it only evaluates Python expressions and not Python statements, so it is much safer than eval(). Compared to the ast.literal_eval() method, it should also run a little faster.

Concretely, you can amend your code to do the following:


    import pandas

    # Your code to read in the DataFrame goes in here.

    complete_dataset['Acceleration'] = pandas.eval(complete_dataset['Acceleration'])

This evaluates all expressions stored in the Acceleration column of complete_dataset. It should perform much faster than a row-by-row loop, and the output will be an integer or a float depending on the expressions stored in the column.

For more details, please take a look at the Pandas documentation.
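Since the question asks about many columns at once, here is a minimal sketch of one way to extend this. It assumes a tiny made-up frame standing in for complete_dataset (the Agility column, the club names and the `skip` list variable are placeholders, not from the real data). It also evaluates each cell on its own with pandas.eval rather than passing the whole column, because as far as I can tell the whole-column call depends on how pandas renders a Series as an expression string and may not hold up for long columns.

```python
import pandas as pd

# Hypothetical sample standing in for complete_dataset.
complete_dataset = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Club": ["X", "X", "Y", "Y"],
    "Acceleration": ["89", "92-4", "94", "88-6"],
    "Agility": ["58+2", "79", "70+9", "76"],
})

# Columns that should be left untouched (taken from the question).
skip = ["Name", "Club", "Nationality", "Age", "Overall", "Potential", "Special"]
cols = [c for c in complete_dataset.columns if c not in skip]

# Evaluate every cell separately: "92-4" becomes 88, "89" stays 89.
for col in cols:
    complete_dataset[col] = complete_dataset[col].astype(str).map(pd.eval)

# Average of each converted column per club.
club_means = complete_dataset.groupby("Club")[cols].mean()
print(club_means)
```

Evaluating cell by cell is slower than a single vectorised call, so this trades some speed for predictability.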


Answer 2

Answered by cs95

I want to remove the symbols, add or subtract the numbers according to the symbol, and convert the string into an integer.

`ast.literal_eval`

Alright, one good way of doing this is using Python's safe eval, ast.literal_eval.

    import ast

    df.Acceleration = df.Acceleration.apply(ast.literal_eval)
    df

       Acceleration
    0            89
    1            88
    2            94
    3            82
    4            60
    5            79
    6            79
    7            76
    8            94

    df.Acceleration.dtype
    dtype('int64')

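literal_eval evaluates only certain string expressions. Assuming you have a column of strings with expressions that can be evaluated, this will evaluate them and return numeric results. Note that this depends on the Python version: since Python 3.7, literal_eval no longer accepts addition or subtraction of plain numbers, so a string like '92-4' raises ValueError there.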

Note that if you have a column of mixed integers and strings, the simplest thing to do would be to convert the entire column to string and apply literal_eval.

    df['Acceleration'] = df.Acceleration.astype(str).apply(ast.literal_eval)
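On Python 3.7 and later, literal_eval rejects strings such as '92-4', so here is a minimal sketch of a stand-in that keeps the same "safe eval" idea. It assumes Python 3.8 or newer (it relies on ast.Constant), and `eval_plus_minus` is a hypothetical helper name, not part of the standard library. It walks the parsed expression and allows only numbers joined by + or -.

```python
import ast
import operator

# Only addition and subtraction are permitted.
_ALLOWED_OPS = {ast.Add: operator.add, ast.Sub: operator.sub}


def eval_plus_minus(text):
    """Evaluate strings such as '60+2' or '58-1' (hypothetical helper)."""

    def _eval(node):
        # Plain int/float literals are returned as they are.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        # a + b or a - b, evaluated recursively.
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        # Anything else (names, calls, other operators) is refused.
        raise ValueError(f"unsupported expression: {text!r}")

    return _eval(ast.parse(str(text).strip(), mode="eval").body)


values = ["89", "92-4", "94", "88-6", "58+2", "79", "70+9", "76", "94"]
results = [eval_plus_minus(v) for v in values]
print(results)
```

With a DataFrame this could be used as `df.Acceleration.apply(eval_plus_minus)`. Strings that are not valid Python at all raise SyntaxError from ast.parse, so that would need catching as well.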

`pd.eval`/`df.eval`

Another good way of doing this is using pandas' safe eval, pd.eval (also available as df.eval), as mentioned in the other answer.

    df.Acceleration = df.eval(df.Acceleration)
    df

       Acceleration
    0            89
    1            88
    2            94
    3            82
    4            60
    5            79
    6            79
    7            76
    8            94

Handling Malformed Data


On the off chance that your data contains invalid strings, a slightly different solution is needed, because everything mentioned above is going to fail. We'll need to define a function that handles these errors accordingly.


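    import ast
    import numpy as np

    def parse(x):
        try:
            return ast.literal_eval(x) # pd.eval(x)
        except (ValueError, SyntaxError):
            # literal_eval raises SyntaxError for text that is not valid Python
            return np.nan

    df.Acceleration = df.Acceleration.apply(parse)
    df

       Acceleration
    0            89
    1            88
    2            94
    3            82
    4            60
    5            79
    6            79
    7            76
    8            94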
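If the values really are limited to a number with an optional '+n' or '-n' suffix, a regular expression can do the same job without evaluating anything, and it turns malformed entries into NaN as a side effect. The following is a sketch under that assumption; the sample frame, the club names and the 'n/a' entry are made up for illustration.

```python
import pandas as pd

# Hypothetical sample with one malformed entry.
df = pd.DataFrame({
    "Club": ["X", "X", "Y", "Y", "Y"],
    "Acceleration": ["89", "92-4", "58+2", "n/a", "70+9"],
})

# Split each value into a base number and an optional signed adjustment.
# Rows that do not match the pattern come back as NaN in both columns.
parts = df["Acceleration"].str.extract(r"^\s*(\d+)\s*([+-]\d+)?\s*$")
base = pd.to_numeric(parts[0], errors="coerce")
adjust = pd.to_numeric(parts[1], errors="coerce").fillna(0)

# Malformed rows stay NaN because their base is NaN.
df["Acceleration"] = base + adjust

# mean() skips NaN by default, so bad rows do not affect the averages.
club_means = df.groupby("Club")["Acceleration"].mean()
print(club_means)
```

The column ends up as float rather than int because NaN cannot be stored in a plain integer column.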
