How to remove a substring from strings in a pandas dataframe column?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/38706813/

Time: 2020-09-14 01:43:23  Source: igfitidea

How to Remove a Substring of String in a Dataframe Column?

Tags: python, regex, string, pandas, dataframe

Asked by MEhsan

I have this simplified dataframe:

ID, Date
1 8/24/1995
2 8/1/1899 :00

How can I use pandas to find any date in the dataframe that has an extra :00 and remove it?

Any idea how to solve this problem?

I tried this syntax, but it did not help:

df[df["Date"].str.replace(to_replace="\s:00", value="")]

The output should look like this:

ID, Date
1 8/24/1995
2 8/1/1899

Answered by Psidom

You need to assign the trimmed column back to the original column instead of subsetting the dataframe with it. Also, the str.replace method does not have to_replace and value parameters. It has pat and repl parameters instead:

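# regex=True is needed in pandas 2.0+, where the pattern is treated as a literal string by default
df["Date"] = df["Date"].str.replace(r"\s:00", "", regex=True)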

df
#   ID       Date
#0   1  8/24/1995
#1   2   8/1/1899
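For readers who want to run the fix end to end, here is a minimal self-contained sketch. The frame is rebuilt from the question's sample data, and the trailing `$` anchor is an assumption added here (not part of the original answer) so that only a `:00` at the very end of a value is removed.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"ID": [1, 2], "Date": ["8/24/1995", "8/1/1899 :00"]})

# pat and repl are passed positionally; regex=True makes \s and $ behave as regex tokens.
# The $ anchor is an added assumption: it limits the match to the end of each string.
df["Date"] = df["Date"].str.replace(r"\s:00$", "", regex=True)

print(df["Date"].tolist())
```

Without the anchor, the pattern would also strip a whitespace-plus-`:00` sequence from the middle of a value, which may or may not be what you want.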

Answered by piRSquared

To apply this to an entire dataframe, I'd stack then unstack:

df.stack().str.replace(r'\s:00', '', regex=True).unstack()
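As a possible alternative to stacking, `DataFrame.replace` accepts `to_replace` and `value` (the parameter names the question tried on `str.replace`) and does substring substitution when `regex=True`. The sketch below compares both routes on a small two-column frame; the `Other` column is hypothetical and only there to show that every column is handled.

```python
import pandas as pd

# "Date" comes from the question; "Other" is a made-up second column for illustration
df = pd.DataFrame(
    {"Date": ["8/24/1995", "8/1/1899 :00"], "Other": ["1/2/2000 :00", "3/4/2001"]},
    index=pd.Index([1, 2], name="ID"),
)

# Route 1: stack into one long Series, clean it, then unstack back to a frame
stacked = df.stack().str.replace(r"\s:00", "", regex=True).unstack()

# Route 2: DataFrame.replace with regex=True substitutes inside every string cell
replaced = df.replace(to_replace=r"\s:00", value="", regex=True)

print(replaced)
```

Both routes should give the same cell values here; `DataFrame.replace` avoids reshaping, while the stack route keeps the full `.str` accessor available for other string methods.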

Functionalized:

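import pandas as pd

def dfreplace(df, *args, **kwargs):
    # Flatten every cell into one Series so .str.replace runs once over all values
    s = pd.Series(df.values.flatten())
    s = s.str.replace(*args, **kwargs)
    # Restore the original shape, index, and columns
    return pd.DataFrame(s.to_numpy().reshape(df.shape), df.index, df.columns)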

Examples

df = pd.DataFrame(['8/24/1995', '8/1/1899 :00'], pd.Index([1, 2], name='ID'), ['Date'])

dfreplace(df, r'\s:00', '', regex=True)

rng = range(5)
df2 = pd.concat([pd.concat([df for _ in rng]) for _ in rng], axis=1)

df2

dfreplace(df2, r'\s:00', '', regex=True)
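Since the output screenshots are not available, the following sketch reruns the larger example and checks the result programmatically. It repeats the helper from above so that it runs on its own; using `to_numpy()` and passing `regex=True` explicitly are assumptions made for recent pandas versions.

```python
import pandas as pd

def dfreplace(df, *args, **kwargs):
    # Flatten all cells, apply str.replace once, then rebuild the frame
    s = pd.Series(df.values.flatten())
    s = s.str.replace(*args, **kwargs)
    return pd.DataFrame(s.to_numpy().reshape(df.shape), df.index, df.columns)

df = pd.DataFrame(['8/24/1995', '8/1/1899 :00'], pd.Index([1, 2], name='ID'), ['Date'])

# Tile the two-row frame 5 times down and 5 times across
rng = range(5)
df2 = pd.concat([pd.concat([df for _ in rng]) for _ in rng], axis=1)

cleaned = dfreplace(df2, r'\s:00', '', regex=True)

# Shape is preserved, and no cell should still contain ':00'
print(cleaned.shape)
print(any(':00' in v for v in cleaned.to_numpy().ravel()))
```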
