pandas: how to make 'replace' work after 'loc'?

Question

Asked by Jonathan Zhou

I have tried many times, but 'replace' does not seem to work after 'loc'. For example, I want to apply a regex replacement to 'conlumn_b' in the rows where the 'conlumn_a' value is 'apple'.

Here is my sample code:

df.loc[df['conlumn_a'] == 'apple', 'conlumn_b'].replace(r'^11*', 'XXX',inplace=True, regex=True)

Example:

conlumn_a       conlumn_b
apple           123
banana          11
apple           11
orange          33

The result I expected for 'df' is:

conlumn_a       conlumn_b
apple           123
banana          11
apple           XXX
orange          33

Has anyone else run into this issue of needing 'replace' with a regex after 'loc'?

Or do you have any other good solutions?

Thank you so much for your help!

Answer 1

Accepted answer by jezrael

I think you need to filter on both sides:

m = df['conlumn_a'] == 'apple'
df.loc[m,'conlumn_b'] = df.loc[m,'conlumn_b'].astype(str).replace(r'^(11+)','XXX',regex=True)
print (df)
  conlumn_a conlumn_b
0     apple       123
1    banana        11
2     apple       XXX
3    orange        33
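
The pattern used above differs from the `^11*` in the question, and the difference matters once the values are strings. The following is a minimal sketch of how three candidate patterns behave, relying on `Series.replace(..., regex=True)` substituting only the matched part of each string. The sample values (including '112') are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample values, stored as strings so regex replacement applies.
s = pd.Series(["123", "11", "112"], dtype=object)

# '^11*' matches a leading '1' followed by zero or more '1's, so it also hits '123'.
loose = s.replace(r"^11*", "XXX", regex=True)

# '^(11+)' needs at least two leading '1's; any trailing characters are kept.
prefix = s.replace(r"^(11+)", "XXX", regex=True)

# '^11$' matches only the exact string '11'.
exact = s.replace(r"^11$", "XXX", regex=True)

print(pd.DataFrame({"original": s, "loose": loose, "prefix": prefix, "exact": exact}))
```

If only cells that are exactly '11' should change, the anchored `^11$` form is probably the safest of the three.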

Answer 2

Answered by cs95

inplace=True works on the object it is applied to.

When you call .loc, you're slicing your dataframe object to return a new one.

>>> id(df)
4587248608

And,

>>> id(df.loc[df['conlumn_a'] == 'apple', 'conlumn_b'])
4767716968

Now, calling an in-place replace on this new slice will apply the replace operation, updating the new slice itself, and not the original.
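
To make this concrete, here is a small sketch that keeps the slice in a variable (the name `subset` is only for illustration) so both objects can be inspected afterwards. It assumes 'conlumn_b' holds strings, so that the regex is not the reason nothing changes in the original frame.

```python
import pandas as pd

df = pd.DataFrame({
    "conlumn_a": ["apple", "banana", "apple", "orange"],
    # Assumption: string values (object dtype), unlike the int column in the question.
    "conlumn_b": pd.Series(["123", "11", "11", "33"], dtype=object),
})

# Boolean-mask selection with .loc returns a new Series, not a view of df.
subset = df.loc[df["conlumn_a"] == "apple", "conlumn_b"]
subset.replace(r"^11$", "XXX", regex=True, inplace=True)

print(subset.tolist())           # the slice was modified
print(df["conlumn_b"].tolist())  # the original frame was not
```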

Also note that you're calling replace on a column of int, and nothing is going to happen, because regular expressions work on strings.
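
A short sketch of that behaviour, using a standalone Series with the same values as 'conlumn_b' (the variable names are illustrative only):

```python
import pandas as pd

ints = pd.Series([123, 11, 11, 33])

# A string pattern is not applied to integer values, so nothing changes.
unchanged = ints.replace(r"^11$", "XXX", regex=True)

# After casting to str, the same pattern matches the exact value '11'.
converted = ints.astype(str).replace(r"^11$", "XXX", regex=True)

print(unchanged.tolist())
print(converted.tolist())
```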

Here's what I offer you as a workaround. Don't use regex at all.

m = df['conlumn_a'] == 'apple'
df.loc[m, 'conlumn_b'] = df.loc[m, 'conlumn_b'].replace(11, 'XXX')

df

  conlumn_a conlumn_b
0     apple       123
1    banana        11
2     apple       XXX
3    orange        33

Or, if you need regex-based substitution, then:

df.loc[m, 'conlumn_b'] = df.loc[m, 'conlumn_b']\
           .astype(str).replace('^11$', 'XXX', regex=True)

Note, however, that this converts your column to an object column.

Answer 3

Answered by piRSquared

I'm going to borrow from a recent answer of mine. This technique is a general purpose strategy for updating a dataframe in place:

df.update(
    df.loc[df['conlumn_a'] == 'apple', 'conlumn_b']
      .replace(r'^11$', 'XXX', regex=True)
)

df

  conlumn_a conlumn_b
0     apple       123
1    banana        11
2     apple       XXX
3    orange        33

Note that all I did was remove the inplace=True and instead wrap the call in the pd.DataFrame.update method.
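
For readers who want to run this end to end, here is a self-contained sketch of the same idea. It assumes 'conlumn_b' holds strings, since a regex has nothing to match in an integer column, and the name `patch` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "conlumn_a": ["apple", "banana", "apple", "orange"],
    # Assumption: values are strings, otherwise the regex has nothing to match.
    "conlumn_b": pd.Series(["123", "11", "11", "33"], dtype=object),
})

# The replaced Series keeps its name ('conlumn_b') and the index labels 0 and 2.
patch = (
    df.loc[df["conlumn_a"] == "apple", "conlumn_b"]
      .replace(r"^11$", "XXX", regex=True)
)

# update() aligns on index and column name and overwrites df in place;
# rows absent from patch (1 and 3) are left alone.
df.update(patch)

print(df)
```

Because update() aligns on labels, the filter only needs to appear once, on the right-hand side.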
