pandas: how can 'replace' work after 'loc'?

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/48314971/

Date: 2020-09-14 05:04:11  Source: igfitidea

Pandas how can 'replace' work after 'loc'?

Tags: python, pandas

Asked by Jonathan Zhou

I have tried many times, but it seems that 'replace' does not work after using 'loc'. For example, I want to apply a regex replacement to 'conlumn_b' in the rows where the 'conlumn_a' value is 'apple'.

Here is my sample code:

df.loc[df['conlumn_a'] == 'apple', 'conlumn_b'].replace(r'^11*', 'XXX',inplace=True, regex=True)

Example:

conlumn_a       conlumn_b
apple           123
banana          11
apple           11
orange          33

The result I expect for 'df' is:

conlumn_a       conlumn_b
apple           123
banana          11
apple           XXX
orange          33

Has anyone run into this issue of needing a regex 'replace' after 'loc'?

Or does anyone have another good solution?

Thank you so much for your help!

Accepted answer by jezrael

I think you need to filter on both sides of the assignment:

m = df['conlumn_a'] == 'apple'
df.loc[m,'conlumn_b'] = df.loc[m,'conlumn_b'].astype(str).replace(r'^(11+)','XXX',regex=True)
print (df)
  conlumn_a conlumn_b
0     apple       123
1    banana        11
2     apple       XXX
3    orange        33
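
A self-contained sketch of the same idea, in case it helps to run it end to end. The DataFrame construction and the up-front cast of the whole column to strings are my assumptions (the question does not show how df was built); casting first avoids writing strings into an integer column, which newer pandas versions complain about.

```python
import pandas as pd

# Sample data from the question (construction assumed for illustration).
df = pd.DataFrame({
    "conlumn_a": ["apple", "banana", "apple", "orange"],
    "conlumn_b": [123, 11, 11, 33],
})

# Assumption: make the whole column text before doing any regex work.
df["conlumn_b"] = df["conlumn_b"].astype(str)

# Use the same boolean mask on both sides of the assignment.
m = df["conlumn_a"] == "apple"
df.loc[m, "conlumn_b"] = df.loc[m, "conlumn_b"].replace(r"^(11+)", "XXX", regex=True)

# For comparison: the question's pattern r'^11*' also matches the
# leading '1' of '123', so it would rewrite that value as well.
loose = pd.Series(["123"]).replace(r"^11*", "XXX", regex=True)
```

Only the apple row holding '11' changes. The stricter pattern matters here: with the question's r'^11*', the value '123' would also be partially replaced.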

Answer by cs95

inplace=True works on the object that it was applied to.

When you call .loc, you're slicing your dataframe object to return a new one.

>>> id(df)
4587248608

And,

>>> id(df.loc[df['conlumn_a'] == 'apple', 'conlumn_b'])
4767716968

Now, calling an in-place replace on this new slice will apply the replace operation, updating the new slice itself, and not the original.
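
To see this in isolation, here is a small sketch. It assumes a string column so that the regex can actually match, and the names mask and selected are mine, purely for illustration.

```python
import pandas as pd

# Sample data with 'conlumn_b' stored as strings (an assumption, so the
# regex has something to match and only the copy behaviour is on show).
df = pd.DataFrame({
    "conlumn_a": ["apple", "banana", "apple", "orange"],
    "conlumn_b": ["123", "11", "11", "33"],
})

mask = df["conlumn_a"] == "apple"

# Boolean-mask selection with .loc hands back a new Series.
selected = df.loc[mask, "conlumn_b"]

# The in-place replace mutates that new Series only.
selected.replace(r"^11$", "XXX", regex=True, inplace=True)
```

After this runs, selected contains 'XXX' but df still holds the original values, which is why the one-liner in the question appears to do nothing.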

Now, note that you're calling replace on a column of int, and nothing is going to happen, because regular expressions work on strings.
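
A quick way to check this yourself; a minimal sketch using a throwaway Series (the name ints is hypothetical, not from the question):

```python
import pandas as pd

# An integer Series standing in for 'conlumn_b'.
ints = pd.Series([123, 11, 33])

# Regex replacement on integers finds nothing to match.
unchanged = ints.replace(r"^11$", "XXX", regex=True)

# After casting to strings, the same pattern matches the value '11'.
as_text = ints.astype(str).replace(r"^11$", "XXX", regex=True)
```

Here unchanged still holds the original integers, while as_text has 'XXX' in place of '11'.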

Here's what I offer you as a workaround. Don't use regex at all.

m = df['conlumn_a'] == 'apple'
df.loc[m, 'conlumn_b'] = df.loc[m, 'conlumn_b'].replace(11, 'XXX')

df

  conlumn_a conlumn_b
0     apple       123
1    banana        11
2     apple       XXX
3    orange        33

Or, if you need regex-based substitution, then:

df.loc[m, 'conlumn_b'] = df.loc[m, 'conlumn_b']\
           .astype(str).replace('^11$', 'XXX', regex=True)

Note, however, that this converts your column to an object column.

Answer by piRSquared

I'm going to borrow from a recent answer of mine. This technique is a general purpose strategy for updating a dataframe in place:

df.update(
    df.loc[df['conlumn_a'] == 'apple', 'conlumn_b']
      .replace(r'^11$', 'XXX', regex=True)
)

df

  conlumn_a conlumn_b
0     apple       123
1    banana        11
2     apple       XXX
3    orange        33

Note that all I did was remove the inplace=True and instead wrap the expression in the pd.DataFrame.update method.
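
For completeness, a runnable sketch of the update approach. It assumes 'conlumn_b' already holds strings (otherwise the regex has nothing to match), and the name patch is mine, for illustration only.

```python
import pandas as pd

# Sample data with 'conlumn_b' as strings (an assumption).
df = pd.DataFrame({
    "conlumn_a": ["apple", "banana", "apple", "orange"],
    "conlumn_b": ["123", "11", "11", "33"],
})

# Build the replaced values for the apple rows only. The result is a
# Series that keeps the original index labels and the column name.
patch = df.loc[df["conlumn_a"] == "apple", "conlumn_b"].replace(
    r"^11$", "XXX", regex=True
)

# update() aligns on index and column name and writes the values back
# into df in place; rows missing from the patch are left untouched.
df.update(patch)
```

Two things worth keeping in mind: update returns None and modifies df directly, and it skips NaN values in the patch, so it cannot be used to blank out cells.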
