Pandas - How to replace string with zero values in a DataFrame series?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/33440234/

Date: 2020-09-14 00:07:40  Source: igfitidea

python pandas dataframe

Asked by Steve Maughan

I'm importing some csv data into a Pandas DataFrame (in Python). One series is meant to be all numerical values. However, it also contains some spurious "$-" elements represented as strings. These have been left over from previous formatting. If I just import the series, Pandas reports it as a series of 'object'.

What's the best way to replace these "$-" strings with zeros? Or more generally, how can I replace all the strings in a series (which is predominantly numerical), with a numerical value, and convert the series to a floating point type?

  • Steve

Accepted answer by hellpanderr

Use Series.str.replace and Series.astype:

import pandas as pd

df = pd.Series(['2$-32$-4','123$-12','00123','44'])

# regex=True is needed in pandas >= 2.0, where str.replace matches literally by default
df.str.replace(r'\$-', '0', regex=True).astype(float)

0    203204.0
1    123012.0
2       123.0
3        44.0
dtype: float64
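
Since the question describes CSV data in which "$-" fills a whole cell, the placeholder can also be handled at import time. The following is only a minimal sketch under stated assumptions: the column name amount and the inline CSV text are hypothetical stand-ins for the real file, and it relies on the na_values parameter of pd.read_csv.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file;
# "amount" is an assumed column name, not one from the question.
csv_text = "item,amount\na,1.5\nb,$-\nc,3\n"

# na_values tells the parser to treat the "$-" placeholder as missing,
# so the column is parsed as float64 instead of object.
df = pd.read_csv(io.StringIO(csv_text), na_values=["$-"])

# Replace the missing entries with zero.
df["amount"] = df["amount"].fillna(0.0)

print(df["amount"].dtype)
print(df["amount"].tolist())
```

Note that this also turns genuinely empty cells into zeros, which may or may not be what you want.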

Answer by tmdavison

You can use the convert_objects method of the DataFrame, with convert_numeric=True, to change the strings to NaNs

From the docs:

convert_numeric: If True, attempt to coerce to numbers (including strings), with unconvertible values becoming NaN.

In [17]: df
Out[17]: 
    a   b  c
0  1.  2.  4
1  sd  2.  4
2  1.  fg  5

In [18]: df2 = df.convert_objects(convert_numeric=True)

In [19]: df2
Out[19]: 
    a   b  c
0   1   2  4
1 NaN   2  4
2   1 NaN  5
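
convert_objects is not available in recent pandas releases, so the session above may not run as written. A rough equivalent, sketched on the assumption that pd.to_numeric with errors="coerce" is an acceptable substitute, applies the conversion column by column. The sample frame is rebuilt by hand to mirror the one shown above.

```python
import pandas as pd

# Sample frame mirroring the one above; columns a and b hold strings.
df = pd.DataFrame({"a": ["1.", "sd", "1."],
                   "b": ["2.", "2.", "fg"],
                   "c": [4, 4, 5]})

# Apply pd.to_numeric to each column; errors="coerce" turns
# values that cannot be parsed as numbers into NaN.
df2 = df.apply(pd.to_numeric, errors="coerce")

print(df2)
```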

Finally, if you want to convert those NaNs to 0's, you can use df.fillna

In [20]: df2.fillna(0)
Out[20]: 
   a  b  c
0  1  2  4
1  0  2  4
2  1  0  5
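
For the single-series case in the question, the coerce-then-fill idea can be put together in one place. This is a minimal sketch, not a definitive implementation: the helper name strings_to_zero and the sample values are hypothetical, and it uses pd.to_numeric with errors="coerce" in place of convert_objects.

```python
import pandas as pd


def strings_to_zero(series: pd.Series) -> pd.Series:
    """Hypothetical helper: coerce to numbers, then map unparseable entries to 0.0."""
    numeric = pd.to_numeric(series, errors="coerce")  # non-numeric values become NaN
    return numeric.fillna(0.0).astype(float)


# Made-up sample data: two numeric strings and two placeholders.
raw = pd.Series(["12.5", "$-", "7", "n/a"])
clean = strings_to_zero(raw)

print(clean.tolist())
```

Unlike the str.replace approach, this maps every non-numeric string to zero, not just "$-", so it fits the more general form of the question.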