convert entire pandas dataframe to integers in pandas (0.17.0)
Original question: http://stackoverflow.com/questions/34844711/
Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Bobe Kryant
My question is very similar to this one, but I need to convert my entire dataframe instead of just a series. The to_numeric function only works on one series at a time and is not a good replacement for the deprecated convert_objects command. Is there a way to get results similar to convert_objects(convert_numeric=True) in the new pandas release?
Thank you Mike Müller for your example. df.apply(pd.to_numeric) works very well if all the values can be converted to integers. What if my dataframe contains strings that cannot be converted to integers? Example:
df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
df.dtypes
Out[59]:
Words object
ints object
dtype: object
Then I could run the deprecated function and get:
df = df.convert_objects(convert_numeric=True)
df.dtypes
Out[60]:
Words object
ints int64
dtype: object
Running the apply command gives me errors, even with try and except handling.
Accepted answer by Mike Müller
All columns convertible
You can apply the function to all columns:
df.apply(pd.to_numeric)
Example:
>>> df = pd.DataFrame({'a': ['1', '2'],
...                    'b': ['45.8', '73.9'],
...                    'c': [10.5, 3.7]})
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 3 columns):
a 2 non-null object
b 2 non-null object
c 2 non-null float64
dtypes: float64(1), object(2)
memory usage: 64.0+ bytes
>>> df.apply(pd.to_numeric).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 3 columns):
a 2 non-null int64
b 2 non-null float64
c 2 non-null float64
dtypes: float64(2), int64(1)
memory usage: 64.0 bytes
Not all columns convertible
pd.to_numeric has the keyword argument errors:
Signature: pd.to_numeric(arg, errors='raise')
Docstring:
Convert argument to a numeric type.

Parameters
----------
arg : list, tuple or array of objects, or Series
errors : {'ignore', 'raise', 'coerce'}, default 'raise'
    - If 'raise', then invalid parsing will raise an exception
    - If 'coerce', then invalid parsing will be set as NaN
    - If 'ignore', then invalid parsing will return the input
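To make the difference between 'raise' and 'coerce' concrete, here is a minimal sketch on a single Series. The sample values and variable names are made up for illustration, and dtype=object is passed explicitly so the input looks like the string columns in the question regardless of pandas version.

```python
import pandas as pd

# Hypothetical sample data: two numeric strings and one word
s = pd.Series(['3', '5', 'Kobe'], dtype=object)

# errors='raise' (the default): the unparseable value raises ValueError
try:
    pd.to_numeric(s)
    raised = False
except ValueError:
    raised = True

# errors='coerce': the unparseable value becomes NaN,
# so the result is a float column
coerced = pd.to_numeric(s, errors='coerce')
print(coerced)
```

Note that 'coerce' silently replaces every value it cannot parse, so applying it to a column of pure text would turn the whole column into NaN.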
Setting it to 'ignore' will return the column unchanged if it cannot be converted into a numeric type.
As pointed out by Anton Protopopov, the most elegant way is to pass errors='ignore' as a keyword argument to apply(), which forwards it to pd.to_numeric:
>>> df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
>>> df.apply(pd.to_numeric, errors='ignore').info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
Words 2 non-null object
ints 2 non-null int64
dtypes: int64(1), object(1)
memory usage: 48.0+ bytes
My previously suggested way, using partial from the functools module, is more verbose:
>>> from functools import partial
>>> df = pd.DataFrame({'ints': ['3', '5'],
...                    'Words': ['Kobe', 'Bryant']})
>>> df.apply(partial(pd.to_numeric, errors='ignore')).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
Words 2 non-null object
ints 2 non-null int64
dtypes: int64(1), object(1)
memory usage: 48.0+ bytes
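As far as I know, errors='ignore' was deprecated in pandas 2.2, with the recommendation to catch the exception explicitly instead. The sketch below is one way to get the same "convert what you can" behaviour without that option. The helper name to_numeric_where_possible is my own and not part of pandas. It also shows why a try/except wrapped around the whole apply() call does not help: the handling has to happen per column.

```python
import pandas as pd

def to_numeric_where_possible(column):
    """Return the column converted to a numeric dtype, or unchanged if that fails."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        # The column holds values that cannot be parsed; keep it as it is
        return column

# dtype=object mirrors the string columns from the question
df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']},
                  dtype=object)

# apply() calls the helper once per column
converted = df.apply(to_numeric_where_possible)
print(converted.dtypes)
```

Here the ints column comes back as an integer column, while Words is returned untouched.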
Answer by Alon Lavian
apply() pd.to_numeric with errors='ignore' and assign the result back to the DataFrame:
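import pandas as pd

df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
print("Orig: \n", df.dtypes)

# apply() returns a new DataFrame; without assignment df is unchanged
df.apply(pd.to_numeric, errors='ignore')
print("\nto_numeric: \n", df.dtypes)

# assign the result back to keep the converted dtypes
df = df.apply(pd.to_numeric, errors='ignore')
print("\nto_numeric with assign: \n", df.dtypes)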
Output:
Orig:
ints object
Words object
dtype: object
to_numeric:
ints object
Words object
dtype: object
to_numeric with assign:
ints int64
Words object
dtype: object
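If you already know which columns are numeric, a variation on the same idea is to convert only those and assign them back, leaving the text columns alone. This is a sketch under that assumption; the numeric_cols list is a hypothetical name used for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']},
                  dtype=object)

# Hypothetical list of the columns known to hold numeric strings
numeric_cols = ['ints']

# Convert just those columns and write the result back into df
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)
print(df.dtypes)
```

Because no errors argument is needed here, a bad value in one of the listed columns raises an exception instead of passing through unnoticed.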