Original source: http://stackoverflow.com/questions/40531255/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to replace empty cells with 0 and change strings to integers where possible in a pandas dataframe?
Asked by RF_PY
I have a dataframe with 3000+ columns. Many cells in the dataframe are empty strings (' '). Also, I have a lot of numerical values that are strings but should actually be integers. I wrote two functions to fill all the empty cells with a 0 and, where possible, change the values to integers, but when I run them nothing changes in my dataframe. The functions:
def recode_empty_cells(dataframe, list_of_columns):
    for column in list_of_columns:
        dataframe[column].replace(r'\s+', np.nan, regex=True)
        dataframe[column].fillna(0)
    return dataframe

def change_string_to_int(dataframe, list_of_columns):
    dataframe = recode_empty_cells(dataframe, list_of_columns)
    for column in list_of_columns:
        try:
            dataframe[column] = dataframe[column].astype(int)
        except ValueError:
            pass
    return dataframe
Note: I'm using a try/except statement because some columns contain text in some form. Thanks in advance for your help.
Edit:
Thanks to your help I got the first part working. All the empty cells have 0s now. This is my code at this moment:
def recode_empty_cells(dataframe, list_of_columns):
    for column in list_of_columns:
        dataframe[column] = dataframe[column].replace(r'\s+', 0, regex=True)
    return dataframe

def change_string_to_int(dataframe, list_of_columns):
    dataframe = recode_empty_cells(dataframe, list_of_columns)
    for column in list_of_columns:
        try:
            dataframe[column] = dataframe[column].astype(int)
        except ValueError:
            pass
    return dataframe
However, this gives me the following error: OverflowError: Python int too large to convert to C long
Accepted answer by Steven G
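You are not saving your changes inside the function. replace() and fillna() return new objects instead of modifying the column in place, so the results have to be assigned back: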
import numpy as np

def recode_empty_cells(dataframe, list_of_columns):
    for column in list_of_columns:
        # replace() and fillna() return new Series, so assign the results back
        dataframe[column] = dataframe[column].replace(r'\s+', np.nan, regex=True)
        dataframe[column] = dataframe[column].fillna(0)
    return dataframe
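
For the OverflowError mentioned in the question's edit, one possible approach (a sketch, not part of the accepted answer) is to catch OverflowError alongside ValueError, so that a column holding a value too large for the integer dtype is simply left as it is. The sketch also assumes that only cells that are entirely blank should become 0, so it uses the anchored pattern ^\s*$ rather than \s+; with a non-string replacement value, the unanchored pattern also replaces cells such as 'hello world' that merely contain whitespace. The sample data and column names are hypothetical.

```python
import pandas as pd


def recode_empty_cells(dataframe, list_of_columns):
    for column in list_of_columns:
        # ^\s*$ matches only cells that are empty or all whitespace, so text
        # such as 'hello world' is left alone (an assumption about the intent).
        dataframe[column] = dataframe[column].replace(r'^\s*$', 0, regex=True)
    return dataframe


def change_string_to_int(dataframe, list_of_columns):
    dataframe = recode_empty_cells(dataframe, list_of_columns)
    for column in list_of_columns:
        try:
            dataframe[column] = dataframe[column].astype(int)
        except (ValueError, OverflowError):
            # ValueError: the column contains non-numeric text.
            # OverflowError: a value does not fit the integer dtype.
            # Either way the column is left unchanged.
            pass
    return dataframe


# Hypothetical sample data, not taken from the original question.
# dtype=object mimics columns that were read in as strings.
df = pd.DataFrame(
    {
        'a': ['1', ' ', '3'],
        'b': ['x', ' ', 'hello world'],
        'c': ['99999999999999999999', '2', ' '],
    },
    dtype=object,
)
df = change_string_to_int(df, ['a', 'b', 'c'])
print(df.dtypes)
```

In this sketch only column 'a' ends up with an integer dtype; 'b' and 'c' keep their object dtype, with blank cells replaced by 0. Whether an oversized value should be skipped, stored as a float, or kept as a Python int depends on the data, so treat the except branch as a placeholder for that decision.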