Python: Convert an entire pandas DataFrame to integers (pandas 0.17.0)

Question

Asked by Bobe Kryant

My question is very similar to this one, but I need to convert my entire dataframe instead of just a series. The to_numeric function only works on one series at a time and is not a good replacement for the deprecated convert_objects command. Is there a way to get similar results to the convert_objects(convert_numeric=True) command in the new pandas release?

Thank you Mike Müller for your example. df.apply(pd.to_numeric) works very well if the values can all be converted to integers. What if my dataframe contains strings that cannot be converted into integers? Example:

df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
df.dtypes
Out[59]: 
Words    object
ints     object
dtype: object

Then I could run the deprecated function and get:

df = df.convert_objects(convert_numeric=True)
df.dtypes
Out[60]: 
Words    object
ints      int64
dtype: object

Running the apply command gives me errors, even with try and except handling.
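
For reference, a minimal sketch of the failure described above, assuming the try/except was wrapped around the whole apply call (the question does not show the exact code). The names converted and error_message are illustrative only. Because pd.to_numeric raises by default, the first column that cannot be parsed aborts the entire call, so no converted frame comes back:

```python
import pandas as pd

df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})

try:
    # errors='raise' is the default, so 'Words' makes the whole call fail
    converted = df.apply(pd.to_numeric)
except ValueError as exc:
    # The exception is caught, but there is no partially converted result
    converted = None
    error_message = str(exc)

print(converted)
print(error_message)
```

Catching the exception outside apply therefore only hides the error; the handling has to happen per column, which is what the errors argument discussed in the answers is for.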

Answer 1

Accepted answer by Mike Müller

All columns convertible

You can apply the function to all columns:

df.apply(pd.to_numeric)

Example:

>>> df = pd.DataFrame({'a': ['1', '2'],
...                    'b': ['45.8', '73.9'],
...                    'c': [10.5, 3.7]})

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 3 columns):
a    2 non-null object
b    2 non-null object
c    2 non-null float64
dtypes: float64(1), object(2)
memory usage: 64.0+ bytes

>>> df.apply(pd.to_numeric).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 3 columns):
a    2 non-null int64
b    2 non-null float64
c    2 non-null float64
dtypes: float64(2), int64(1)
memory usage: 64.0 bytes

Not all columns convertible

pd.to_numeric has the keyword argument errors:

  Signature: pd.to_numeric(arg, errors='raise')
  Docstring:
  Convert argument to a numeric type.

  Parameters
  ----------
  arg : list, tuple or array of objects, or Series
  errors : {'ignore', 'raise', 'coerce'}, default 'raise'
      - If 'raise', then invalid parsing will raise an exception
      - If 'coerce', then invalid parsing will be set as NaN
      - If 'ignore', then invalid parsing will return the input

Setting it to 'ignore' will return the column unchanged if it cannot be converted into a numeric type.
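
A minimal sketch comparing the behaviours, not part of the original answer. The helper to_numeric_or_keep is a hypothetical name that mimics what 'ignore' does by using the default 'raise' mode inside try/except. It is written this way because, as far as I know, pandas 2.2 deprecated errors='ignore', so the literal keyword may warn or fail on newer releases:

```python
import pandas as pd


def to_numeric_or_keep(column):
    """Hypothetical helper: convert a column, or return it unchanged on failure."""
    try:
        return pd.to_numeric(column)  # errors='raise' is the default
    except (ValueError, TypeError):
        return column


df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})

# Same effect as 'ignore': unparseable columns are passed through untouched
kept = df.apply(to_numeric_or_keep)

# 'coerce': unparseable values become NaN instead of being kept
coerced = df.apply(pd.to_numeric, errors='coerce')

print(kept.dtypes)
print(coerced)
```

With 'coerce' the Words column is turned entirely into NaN, which is rarely what is wanted for a mixed frame; keeping the column unchanged is closer to what convert_objects(convert_numeric=True) produced in the question.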

As pointed out by Anton Protopopov, the most elegant way is to supply errors='ignore' as a keyword argument to apply():

>>> df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
>>> df.apply(pd.to_numeric, errors='ignore').info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
Words    2 non-null object
ints     2 non-null int64
dtypes: int64(1), object(1)
memory usage: 48.0+ bytes

My previously suggested way, using partial from the module functools, is more verbose:

>>> from functools import partial
>>> df = pd.DataFrame({'ints': ['3', '5'],
...                    'Words': ['Kobe', 'Bryant']})
>>> df.apply(partial(pd.to_numeric, errors='ignore')).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
Words    2 non-null object
ints     2 non-null int64
dtypes: int64(1), object(1)
memory usage: 48.0+ bytes

Answer 2

Answer by Alon Lavian

Use apply() with pd.to_numeric and errors='ignore', and assign the result back to the DataFrame:

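import pandas as pd

df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
print("Orig: \n", df.dtypes)

# apply() returns a new DataFrame; without assignment df is unchanged
df.apply(pd.to_numeric, errors='ignore')
print("\nto_numeric: \n", df.dtypes)

# Assigning the result back makes the conversion stick
df = df.apply(pd.to_numeric, errors='ignore')
print("\nto_numeric with assign: \n", df.dtypes)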

Output:

Orig: 
 ints     object
Words    object
dtype: object

to_numeric: 
 ints     object
Words    object
dtype: object

to_numeric with assign: 
 ints      int64
Words    object
dtype: object
