Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/34844711/

Date: 2020-08-19 15:36:05  Source: igfitidea

convert entire pandas dataframe to integers in pandas (0.17.0)

python, pandas

Question by Bobe Kryant

My question is very similar to this one, but I need to convert my entire dataframe instead of just a series. The to_numeric function only works on one series at a time and is not a good replacement for the deprecated convert_objects command. Is there a way to get results similar to the convert_objects(convert_numeric=True) command in the new pandas release?

Thank you, Mike Müller, for your example. df.apply(pd.to_numeric) works very well if the values can all be converted to integers. What if my dataframe had strings that could not be converted into integers? Example:

df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
df.dtypes
Out[59]: 
Words    object
ints     object
dtype: object

Then I could run the deprecated function and get:

df = df.convert_objects(convert_numeric=True)
df.dtypes
Out[60]: 
Words    object
ints      int64
dtype: object

Running the apply command gives me errors, even with try and except handling.

Accepted answer by Mike Müller

All columns convertible

You can apply the function to all columns:

df.apply(pd.to_numeric)

Example:

>>> df = pd.DataFrame({'a': ['1', '2'], 
                       'b': ['45.8', '73.9'],
                       'c': [10.5, 3.7]})

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 3 columns):
a    2 non-null object
b    2 non-null object
c    2 non-null float64
dtypes: float64(1), object(2)
memory usage: 64.0+ bytes

>>> df.apply(pd.to_numeric).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 3 columns):
a    2 non-null int64
b    2 non-null float64
c    2 non-null float64
dtypes: float64(2), int64(1)
memory usage: 64.0 bytes
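Since the title asks for integers specifically, here is a hedged sketch that goes one step further. It assumes pandas 0.19 or later (newer than the 0.17.0 in the question), where pd.to_numeric also accepts downcast='integer'; apply() forwards that keyword to each column. Columns whose values have a fractional part cannot be represented as integers and stay float64.

```python
import pandas as pd

# Same frame as the example above
df = pd.DataFrame({'a': ['1', '2'],
                   'b': ['45.8', '73.9'],
                   'c': [10.5, 3.7]})

# downcast='integer' requests the smallest integer dtype that holds the values;
# columns with fractional values are left as float64
converted = df.apply(pd.to_numeric, downcast='integer')
print(converted.dtypes)  # a -> int8, b and c -> float64
```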

Not all columns convertible

pd.to_numeric has the keyword argument errors:

  Signature: pd.to_numeric(arg, errors='raise')
  Docstring:
  Convert argument to a numeric type.

Parameters
----------
arg : list, tuple or array of objects, or Series
errors : {'ignore', 'raise', 'coerce'}, default 'raise'
    - If 'raise', then invalid parsing will raise an exception
    - If 'coerce', then invalid parsing will be set as NaN
    - If 'ignore', then invalid parsing will return the input
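To make the docstring concrete, here is a minimal sketch of the 'coerce' and 'raise' modes on a single Series (the sample values are made up for illustration). 'ignore' is left out on purpose: pandas 2.2 deprecated errors='ignore' in to_numeric, so recent versions emit a warning for it.

```python
import pandas as pd

s = pd.Series(['3', '5', 'Kobe'])

# errors='coerce': unparseable values become NaN, so the result is a float column
coerced = pd.to_numeric(s, errors='coerce')
print(coerced.tolist())  # [3.0, 5.0, nan]

# errors='raise' (the default): an unparseable value raises ValueError
try:
    pd.to_numeric(s)
except ValueError as exc:
    print('raised:', exc)
```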

Setting it to 'ignore' will return the column unchanged if it cannot be converted into a numeric type.

As pointed out by Anton Protopopov, the most elegant way is to pass errors='ignore' as a keyword argument to apply(), which forwards it to pd.to_numeric:

>>> df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
>>> df.apply(pd.to_numeric, errors='ignore').info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
Words    2 non-null object
ints     2 non-null int64
dtypes: int64(1), object(1)
memory usage: 48.0+ bytes
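Because apply() passes extra keyword arguments through to the function, df.apply(pd.to_numeric, errors=...) behaves like an explicit lambda. The sketch below uses errors='coerce' instead of 'ignore' to show the difference that matters for this question: coercion does not keep the text column, it turns every value in Words into NaN.

```python
import pandas as pd

df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})

# Extra keyword arguments to apply() are forwarded to pd.to_numeric...
via_kwargs = df.apply(pd.to_numeric, errors='coerce')
# ...so this lambda form produces the same frame
via_lambda = df.apply(lambda col: pd.to_numeric(col, errors='coerce'))

print(via_kwargs.equals(via_lambda))  # True
# With 'coerce', Words is no longer text: both values are NaN
print(via_kwargs)
```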

My previously suggested way, using partial from the module functools, is more verbose:

>>> from functools import partial
>>> df = pd.DataFrame({'ints': ['3', '5'], 
                       'Words': ['Kobe', 'Bryant']})
>>> df.apply(partial(pd.to_numeric, errors='ignore')).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
Words    2 non-null object
ints     2 non-null int64
dtypes: int64(1), object(1)
memory usage: 48.0+ bytes
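In newer pandas (2.2 and later) errors='ignore' is deprecated, and the pandas release notes recommend catching the exception explicitly instead. A sketch of that approach, which also shows one way to make the try/except from the question work: put it inside the function passed to apply(), so a failing column is returned as-is instead of aborting the whole apply() call. The helper name to_numeric_or_keep is hypothetical.

```python
import pandas as pd


def to_numeric_or_keep(col):
    """Convert one column with pd.to_numeric; return it unchanged on failure.

    Hypothetical helper that mimics errors='ignore' without using that option.
    """
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col


df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
df = df.apply(to_numeric_or_keep)
print(df.dtypes)  # ints becomes int64; Words is returned unchanged
```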

Answer by Alon Lavian

Use apply() with pd.to_numeric and errors='ignore', and assign the result back to the DataFrame; apply() returns a new DataFrame rather than modifying df in place:

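import pandas as pd

df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
print("Orig: \n", df.dtypes)

# The result is not assigned, so df keeps its original dtypes
df.apply(pd.to_numeric, errors='ignore')
print("\nto_numeric: \n", df.dtypes)

# Assigning the result back replaces df with the converted frame
df = df.apply(pd.to_numeric, errors='ignore')
print("\nto_numeric with assign: \n", df.dtypes)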

Output:

Orig: 
 ints     object
Words    object
dtype: object

to_numeric: 
 ints     object
Words    object
dtype: object

to_numeric with assign: 
 ints      int64
Words    object
dtype: object
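The same assign-back rule applies when only some columns should be converted. A sketch, assuming the numeric columns are known in advance (numeric_cols is a hypothetical list); selecting them explicitly also avoids needing errors='ignore' at all.

```python
import pandas as pd

df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
original_dtypes = df.dtypes.copy()

# apply() returns a new DataFrame; without assignment df is left as-is
df[['ints']].apply(pd.to_numeric)
print(df.dtypes.equals(original_dtypes))  # True

# Hypothetical list of columns known to hold numeric strings
numeric_cols = ['ints']
# Convert only those columns and write the result back into df
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)
print(df.dtypes)
```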