Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/40531255/

Date: 2020-09-14 02:24:42 Source: igfitidea

How to replace empty cells with 0 and change strings to integers where possible in a pandas dataframe?

python, pandas

Asked by RF_PY

I have a dataframe with 3000+ columns. Many cells in the dataframe are empty strings (' '). Also, I have a lot of numerical values that are strings but should actually be integers. I wrote two functions to fill all the empty cells with 0 and, where possible, change the values to integers, but when I run them nothing changes in my dataframe. The functions:

def recode_empty_cells(dataframe, list_of_columns):

    for column in list_of_columns:
        dataframe[column].replace(r'\s+', np.nan, regex=True)
        dataframe[column].fillna(0)

    return dataframe

def change_string_to_int(dataframe, list_of_columns):

    dataframe = recode_empty_cells(dataframe, list_of_columns)

    for column in list_of_columns:
        try:
            dataframe[column] = dataframe[column].astype(int)
        except ValueError:
            pass

    return dataframe

Note: I'm using a try/except statement because some columns contain text in some form. Thanks in advance for your help.

Edit:

Thanks to your help, I got the first part working: all the empty cells now contain 0. This is my code at the moment:

def recode_empty_cells(dataframe, list_of_columns):

    for column in list_of_columns:
        dataframe[column] = dataframe[column].replace(r'\s+', 0, regex=True)

    return dataframe

def change_string_to_int(dataframe, list_of_columns):

    dataframe = recode_empty_cells(dataframe, list_of_columns)

    for column in list_of_columns:
        try:
            dataframe[column] = dataframe[column].astype(int)
        except ValueError:
            pass

    return dataframe

However, this gives me the following error: OverflowError: Python int too large to convert to C long
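
A note on that error, hedged because the data itself isn't shown: OverflowError is not a subclass of ValueError, so the `except ValueError` clause above lets it propagate. One likely cause is a column of very long digit strings (IDs, for example) that do not fit in a C long; plain `int` maps to NumPy's default integer, which was 32 bits on Windows before NumPy 2.0. A minimal sketch, assuming blank cells have already been recoded, that asks for int64 explicitly and also skips columns that still overflow:

```python
import pandas as pd


def change_string_to_int(dataframe, list_of_columns):
    """Sketch: cast each listed column to int64 when every value fits, else leave it.

    Assumes blank cells were already recoded to 0 by recode_empty_cells().
    """
    for column in list_of_columns:
        try:
            # Ask for a 64-bit integer explicitly instead of plain int,
            # whose width depends on the platform and NumPy version.
            dataframe[column] = dataframe[column].astype('int64')
        except (ValueError, OverflowError):
            # ValueError: the column holds text that is not a number.
            # OverflowError: a value is too large even for int64.
            pass
    return dataframe
```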

Accepted answer by Steven G

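You are not saving your changes in your function. replace() and fillna() return a new Series rather than modifying the column in place, so assign each result back to the column: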

import numpy as np

def recode_empty_cells(dataframe, list_of_columns):

    for column in list_of_columns:
        # replace() and fillna() return new Series, so assign each result back
        dataframe[column] = dataframe[column].replace(r'\s+', np.nan, regex=True)
        dataframe[column] = dataframe[column].fillna(0)

    return dataframe
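
A caveat on the pattern, as I read pandas' `replace` with `regex=True`: when the replacement value is not a string (np.nan or 0 here), any cell in which the regex is found anywhere is replaced as a whole. So r'\s+' also wipes out text such as 'hello world', and it never matches a truly empty ''. A minimal sketch that anchors the pattern so only empty or whitespace-only cells are recoded, assuming the columns hold plain Python strings (object dtype):

```python
import numpy as np
import pandas as pd


def recode_empty_cells(dataframe, list_of_columns):
    """Sketch: set cells that are empty or whitespace-only to 0, leave other text alone."""
    for column in list_of_columns:
        # '^\s*$' only matches when the whole cell is whitespace (or empty),
        # so values like 'hello world' are no longer swept up.
        dataframe[column] = dataframe[column].replace(r'^\s*$', np.nan, regex=True)
        # Existing NaNs and the ones created above both become 0.
        dataframe[column] = dataframe[column].fillna(0)
    return dataframe
```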

Answer by piRSquared

Consider the df:

import pandas as pd

df = pd.DataFrame(dict(A=['2', 'hello'], B=['', '3']))
df

apply

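def convert_fill(df):
    # stack() turns the frame into one long Series; pd.to_numeric converts each
    # value where it can (errors='ignore' leaves text such as 'hello' as-is and
    # '' becomes NaN); fillna(0) fills those NaNs; unstack() restores the shape
    return df.stack().apply(pd.to_numeric, errors='ignore').fillna(0).unstack()

convert_fill(df)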
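
As of pandas 2.2, `errors='ignore'` in `pd.to_numeric` is deprecated, with the recommendation to catch the exception explicitly instead. A sketch of the same stack/apply/unstack approach that does that; `to_number_if_possible` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd


def to_number_if_possible(value):
    """Hypothetical helper: return value as a number if it parses as one, else unchanged."""
    try:
        return pd.to_numeric(value)
    except (ValueError, TypeError):
        return value


def convert_fill(df):
    # stack() yields one long Series, so the helper runs once per cell;
    # '' parses to NaN, which fillna(0) then replaces with 0.
    return df.stack().apply(to_number_if_possible).fillna(0).unstack()
```

Usage is unchanged: on the df above, convert_fill(df) gives 2 and 'hello' in column A, and 0 and 3 in column B.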
