Python: pandas apply vs. map

Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/42175526/

Date: 2020-09-14 02:57:27  Source: igfitidea

python, pandas, apply

Question by FredMaster

I am struggling to understand how exactly df.apply() works.

My problem is as follows: I have a DataFrame df, and I want to search several columns for certain strings. If a string is found in any of these columns, I want to add a "label" (in a new column) to each row where it is found.

I am able to solve the problem with map and applymap (see below).

However, I would expect the better solution to be apply, since it applies a function to an entire column.

Question: Is this not possible using apply? Where is my mistake?

Here are my solutions using map and applymap:

import re
import pandas as pd
df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")], columns=["h1", "h2", "h3", "h4", "h5"])

Solution using map

def setlabel_func(column):
    return df[column].str.contains("A")

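# sum() adds the boolean Series, giving the number of matching columns per row
mask = sum(map(setlabel_func, ["h1", "h5"]))
# .loc replaces the removed .ix; > 0 (not == 1) also labels rows that match in both columns
df.loc[mask > 0, "New Column"] = "Label"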
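Note that the map in this solution is Python's built-in map, which calls the function once per column *name*. pandas also has Series.map, which calls a function once per *element* of a Series. A small sketch contrasting the two on the same example data (the lambdas are illustrative, not the asker's code):

```python
import pandas as pd

df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")],
                  columns=["h1", "h2", "h3", "h4", "h5"])

# Built-in map: one call per column name, each returning a boolean Series.
per_column = list(map(lambda col: df[col].str.contains("A"), ["h1", "h5"]))

# pandas Series.map: one call per cell value of a single column.
h1_flags = df["h1"].map(lambda value: "A" in value)

print(per_column[0].tolist())  # [True, False, False]
print(per_column[1].tolist())  # [False, False, True]
print(h1_flags.tolist())       # [True, False, False]
```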

Solution using applymap

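# applymap calls the lambda on every cell; any(axis=1) is equivalent to .T.any()
mask = df[["h1", "h5"]].applymap(lambda el: bool(re.match("A", el))).any(axis=1)
df.loc[mask, "New Column"] = "Label"  # .loc replaces .ix, which was removed in pandas 1.0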
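applymap applies a function to every cell. Since pandas 2.1 it has been renamed to DataFrame.map, and applymap emits a deprecation warning. A minimal sketch that uses whichever name the installed pandas provides; the hasattr check is this sketch's own convention, not part of the question:

```python
import re

import pandas as pd

df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")],
                  columns=["h1", "h2", "h3", "h4", "h5"])
subset = df[["h1", "h5"]]

# DataFrame.map is the element-wise method in pandas >= 2.1;
# older versions only provide applymap.
elementwise = subset.map if hasattr(pd.DataFrame, "map") else subset.applymap

# re.match anchors at the start of each cell; re.search would match anywhere.
flags = elementwise(lambda value: bool(re.match("A", value)))
mask = flags.any(axis=1)  # True where at least one selected column matched

df.loc[mask, "New Column"] = "Label"
print(df)
```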

For apply, I don't know how to pass the two columns into the function, or maybe I don't understand the mechanics at all ;-)

def setlabel_func(column):
    return df[column].str.contains("A")

df.apply(setlabel_func(["h1","h5"]),axis = 1)

The above gives me this error:

'DataFrame' object has no attribute 'str'

Any advice? Please note that the search function in my real application is more complex and requires a regex, which is why I use .str.contains in the first place.
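A short sketch of where the error comes from, using the example df (rebuilt here so the block runs on its own). In df.apply(setlabel_func(["h1","h5"]), axis = 1), Python evaluates setlabel_func(["h1","h5"]) before apply is ever called, and df[["h1", "h5"]] is a DataFrame, which has no .str accessor. apply instead expects the function itself; with the default axis=0 it calls it once per column, passing that column as a pd.Series. The rewritten setlabel_func below, which takes a column Series rather than a column name, is an illustrative assumption, not the asker's code:

```python
import pandas as pd

df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")],
                  columns=["h1", "h2", "h3", "h4", "h5"])

# Selecting a list of column names returns a DataFrame, not a Series,
# and the .str accessor only exists on Series.
subset = df[["h1", "h5"]]
print(type(subset).__name__)   # DataFrame
print(hasattr(subset, "str"))  # False

# Hypothetical rewrite: the function receives a column (a Series), not a name.
def setlabel_func(column):
    return column.str.contains("A")

# Pass the function itself; apply calls it once per column (axis=0 default).
contains_a = subset.apply(setlabel_func)
print(contains_a)
```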

Answer by jezrael

Another solution is to use DataFrame.any to check for at least one True per row:

print (df[['h1', 'h5']].apply(lambda x: x.str.contains('A')))
      h1     h5
0   True  False
1  False  False
2  False   True

print (df[['h1', 'h5']].apply(lambda x: x.str.contains('A')).any(axis=1))
0     True
1    False
2     True
dtype: bool


import numpy as np

df['new'] = np.where(df[['h1','h5']].apply(lambda x: x.str.contains('A')).any(axis=1),
                     'Label', '')

print (df)
  h1 h2 h3 h4 h5    new
0  A  B  C  D  Z  Label
1  E  A  G  H  Y       
2  I  J  K  L  A  Label


mask = df[['h1', 'h5']].apply(lambda x: x.str.contains('A')).any(axis=1)
df.loc[mask, 'New'] = 'Label'
print (df)
  h1 h2 h3 h4 h5    New
0  A  B  C  D  Z  Label
1  E  A  G  H  Y    NaN
2  I  J  K  L  A  Label
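Since the asker's real search needs a regex: Series.str.contains treats its pattern as a regular expression by default (regex=True), so the same apply(...).any(axis=1) pattern should work unchanged with a more complex expression. A sketch with an illustrative pattern (^A|Z$, chosen only for this example) and the case=False flag:

```python
import pandas as pd

df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")],
                  columns=["h1", "h2", "h3", "h4", "h5"])

# Illustrative regex: a cell that starts with "A" or ends with "Z".
pattern = r"^A|Z$"

# str.contains interprets pat as a regex by default; case=False ignores case.
matches = df[["h1", "h5"]].apply(lambda col: col.str.contains(pattern, case=False))
mask = matches.any(axis=1)

df.loc[mask, "New"] = "Label"
print(df)
```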

Answer by piRSquared

pd.DataFrame.apply iterates over each column by default, passing the column as a pd.Series to the function being applied. In your case, the function you're trying to apply doesn't lend itself to being used in apply.
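To see what apply actually passes to the function, a small sketch that returns each received Series' index as a string: with the default axis=0 the function gets one column at a time (indexed by row labels), while with axis=1 it gets one row at a time (indexed by column names). The lambdas are purely illustrative:

```python
import pandas as pd

df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")],
                  columns=["h1", "h2", "h3", "h4", "h5"])
subset = df[["h1", "h5"]]

# axis=0 (default): called once per column; the Series index holds row labels.
by_column = subset.apply(lambda s: ",".join(str(label) for label in s.index))

# axis=1: called once per row; the Series index holds column names.
by_row = subset.apply(lambda r: ",".join(r.index), axis=1)

print(by_column.to_dict())  # {'h1': '0,1,2', 'h5': '0,1,2'}
print(by_row.tolist())      # ['h1,h5', 'h1,h5', 'h1,h5']
```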

Do this instead to get your idea to work:

# axis=1: the lambda receives one row (h1 and h5) at a time as a Series
mask = df[['h1', 'h5']].apply(lambda x: x.str.contains('A').any(), axis=1)
df.loc[mask, 'New Column'] = 'Label'

  h1 h2 h3 h4 h5 New Column
0  A  B  C  D  Z      Label
1  E  A  G  H  Y        NaN
2  I  J  K  L  A      Label

Answer by MaxU

IIUC (if I understand correctly), you can do it this way:

In [23]: df['new'] = np.where(df[['h1','h5']].apply(lambda x: x.str.contains('A'))
                                             .sum(1) > 0,
                              'Label', '')

In [24]: df
Out[24]:
  h1 h2 h3 h4 h5    new
0  A  B  C  D  Z  Label
1  E  A  G  H  Y
2  I  J  K  L  A  Label
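MaxU's .sum(1) > 0 and jezrael's .any(1) should select the same rows: summing a boolean DataFrame along each row counts its True values (True counts as 1), and a count above zero means at least one column matched. A quick check of that equivalence on the example data:

```python
import pandas as pd

df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")],
                  columns=["h1", "h2", "h3", "h4", "h5"])
flags = df[["h1", "h5"]].apply(lambda x: x.str.contains("A"))

# Summing booleans counts True values per row.
counts = flags.sum(axis=1)
via_sum = counts > 0
via_any = flags.any(axis=1)

print(counts.tolist())          # [1, 0, 1]
print(via_sum.equals(via_any))  # True
```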

Answer by Burgertron

Others have given good alternative methods. Here is a way to use apply row-wise (axis=1) to get your new column indicating the presence of "A" in a set of columns.

When the function is passed a row, you can join its strings into one big string and then use a substring check ("in"), as shown below. Here I am combining all columns, but you can easily do it with just h1 and h5.

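import pandas as pd

df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")], columns=["h1", "h2", "h3", "h4", "h5"])

def dothat(row):
    # row['h1':'h5'] is a label slice (both ends included); joining gives e.g. "ABCDZ"
    sep = ""
    return "A" in sep.join(row['h1':'h5'])

df['NewColumn'] = df.apply(dothat, axis=1)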

This squashes each row into one string (e.g. ABCDZ) and looks for "A". It is not very efficient, though: if you only want to stop at the first match, combining all the columns can be a waste of time. You could easily change the function to check column by column and return True as soon as it finds a hit.
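A sketch of the early-exit variant described above, assuming the columns to check are passed as a tuple (the name has_a and the columns parameter are illustrative, not from the answer). The built-in any() stops at the first True, so the remaining columns of that row are not examined:

```python
import pandas as pd

df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")],
                  columns=["h1", "h2", "h3", "h4", "h5"])

def has_a(row, columns=("h1", "h5")):
    # any() short-circuits: it returns as soon as one column contains "A".
    return any("A" in row[col] for col in columns)

# Only h1 and h5 are checked by default.
df["NewColumn"] = df.apply(has_a, axis=1)

# args= forwards extra positional arguments to has_a, here all five columns.
df["AnyColumn"] = df.apply(has_a, axis=1, args=(("h1", "h2", "h3", "h4", "h5"),))
print(df)
```

Checking h1 and h5 labels rows 0 and 2; checking all five columns also labels row 1, whose "A" sits in h2.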
