Python: pandas apply vs. map
Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/42175526/
Asked by FredMaster
I am struggling to understand exactly how df.apply() works.
My problem is as follows: I have a dataframe df. I want to search several columns for certain strings. If the string is found in any of those columns, I want to add a "label" (in a new column) to each row where it is found.
I am able to solve the problem with map and applymap (see below).
However, I would expect the better solution to be apply, as it applies a function to an entire column.
Question: Is this not possible using apply? Where is my mistake?
Here are my solutions using map and applymap.
import pandas as pd
df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")], columns=["h1", "h2", "h3", "h4", "h5"])
Solution using map
def setlabel_func(column):
    return df[column].str.contains("A")
mask = sum(map(setlabel_func, ["h1", "h5"]))
# .ix was removed in pandas 1.0, so .loc is used here; mask >= 1 also labels rows that match in both columns
df.loc[mask >= 1, "New Column"] = "Label"
Solution using applymap
import re
mask = df[["h1", "h5"]].applymap(lambda el: bool(re.match("A", el))).T.any()
df.loc[mask, "New Column"] = "Label"
For apply, I don't know how to pass the two columns into the function, or maybe I don't understand the mechanics at all ;-)
def setlabel_func(column):
    return df[column].str.contains("A")
df.apply(setlabel_func(["h1","h5"]),axis = 1)
The above gives me this error:
'DataFrame' object has no attribute 'str'
Any advice? Please note that the search function in my real application is more complex and requires a regex, which is why I use .str.contains in the first place.
Answered by jezrael
Another solution is to use DataFrame.any to check for at least one True per row:
import numpy as np
print (df[['h1', 'h5']].apply(lambda x: x.str.contains('A')))
      h1     h5
0   True  False
1  False  False
2  False   True
# axis is passed by keyword: recent pandas versions no longer accept any(1)
print (df[['h1', 'h5']].apply(lambda x: x.str.contains('A')).any(axis=1))
0     True
1    False
2     True
dtype: bool
df['new'] = np.where(df[['h1','h5']].apply(lambda x: x.str.contains('A')).any(axis=1),
                     'Label', '')
print (df)
  h1 h2 h3 h4 h5    new
0  A  B  C  D  Z  Label
1  E  A  G  H  Y
2  I  J  K  L  A  Label
mask = df[['h1', 'h5']].apply(lambda x: x.str.contains('A')).any(axis=1)
df.loc[mask, 'New'] = 'Label'
print (df)
  h1 h2 h3 h4 h5    New
0  A  B  C  D  Z  Label
1  E  A  G  H  Y    NaN
2  I  J  K  L  A  Label
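The question mentions that the real search needs a regex. A minimal sketch of how the same column-wise apply plus any(axis=1) approach might carry over, assuming a made-up pattern (the name pattern and the expression itself are illustrative, not from the original post):

```python
import pandas as pd

df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")],
                  columns=["h1", "h2", "h3", "h4", "h5"])

# Hypothetical pattern for illustration: a cell that is exactly "A" or "Z".
pattern = r"^[AZ]$"

# str.contains interprets the pattern as a regex (regex=True is the default);
# case=False makes the match case-insensitive.
mask = (df[["h1", "h5"]]
        .apply(lambda col: col.str.contains(pattern, case=False, regex=True))
        .any(axis=1))

# Rows where neither column matches are left as NaN in the new column.
df.loc[mask, "New Column"] = "Label"
```

Only the pattern changes compared with the literal "A" search; the masking and labelling steps stay the same.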
Answered by piRSquared
pd.DataFrame.apply iterates over each column, passing the column as a pd.Series to the function being applied. In your case, the function you're trying to apply doesn't lend itself to being used in apply.
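To make the mechanics concrete, here is a small sketch that records what the applied function receives with the default axis=0 and with axis=1. The helper names record_column and record_row are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")],
                  columns=["h1", "h2", "h3", "h4", "h5"])

seen_by_column = []  # what the function receives with the default axis=0
seen_by_row = []     # what the function receives with axis=1


def record_column(col):
    # col is a pd.Series holding one whole column; col.name is the column label
    seen_by_column.append((type(col).__name__, col.name))
    return col.str.contains("A")


def record_row(row):
    # row is a pd.Series holding one row; row.name is the row's index label
    seen_by_row.append((type(row).__name__, row.name))
    return row.str.contains("A").any()


per_column = df[["h1", "h5"]].apply(record_column)    # DataFrame of booleans
per_row = df[["h1", "h5"]].apply(record_row, axis=1)  # one boolean per row
```

In both cases the function is handed a Series, so .str is available. The call in the question, df.apply(setlabel_func(["h1","h5"]), axis=1), instead evaluates setlabel_func([...]) before apply ever runs, and df[["h1","h5"]] is a DataFrame, which has no .str accessor.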
Do this instead to get your idea to work:
mask = df[['h1', 'h5']].apply(lambda x: x.str.contains('A').any(), axis=1)
df.loc[mask, 'New Column'] = 'Label'
  h1 h2 h3 h4 h5 New Column
0  A  B  C  D  Z      Label
1  E  A  G  H  Y        NaN
2  I  J  K  L  A      Label
Answered by MaxU
IIUC you can do it this way:
In [23]: df['new'] = np.where(df[['h1','h5']].apply(lambda x: x.str.contains('A'))
    ...:                                     .sum(axis=1) > 0,
    ...:                      'Label', '')
In [24]: df
Out[24]:
  h1 h2 h3 h4 h5    new
0  A  B  C  D  Z  Label
1  E  A  G  H  Y
2  I  J  K  L  A  Label
Answered by Burgertron
Others have given good alternative methods. Here is a way to use apply row-wise (axis=1) to get a new column indicating the presence of "A" in a group of columns.
If you are passed a row, you can just join the strings together into one big string and then use a string comparison ("in"), see below. Here I am combining all columns, but you can easily do it with just h1 and h5.
df = pd.DataFrame([list("ABCDZ"),list("EAGHY"), list("IJKLA")], columns = ["h1","h2","h3","h4", "h5"])
def dothat(row):
    sep = ""
    return "A" in sep.join(row['h1':'h5'])
df['NewColumn'] = df.apply(dothat,axis=1)
This just squashes each row into one string (e.g. ABCDZ) and looks for "A". It is not that efficient, though: if you just want to stop the first time you find the string, combining all the columns could be a waste of time. You could easily change the function to look column by column and return True as soon as it finds a hit.
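One possible shape for that column-by-column variant, as a rough sketch. The function name has_a and its columns parameter are made up for illustration and are not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame([list("ABCDZ"), list("EAGHY"), list("IJKLA")],
                  columns=["h1", "h2", "h3", "h4", "h5"])


def has_a(row, columns=("h1", "h5")):
    # any() over a generator stops at the first cell that contains "A",
    # so the remaining columns of that row are never inspected.
    return any("A" in row[col] for col in columns)


# axis=1 passes each row to has_a as a Series indexed by column label.
df["NewColumn"] = df.apply(has_a, axis=1)
```

This checks only h1 and h5 by default, so the second row (which has "A" only in h2) is not flagged.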