Difference between map, applymap and apply methods in Pandas
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/19798153/
Asked by marillion
Can you tell me when to use these vectorization methods with basic examples?
I see that map is a Series method whereas the rest are DataFrame methods. I got confused about the apply and applymap methods, though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples that illustrate the usage would be great!
Accepted answer by jeremiahbuddha
Straight from Wes McKinney's Python for Data Analysis book, p. 132 (I highly recommend this book):
Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame's apply method does exactly this:
In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
In [117]: frame
Out[117]:
b d e
Utah -0.029638 1.081563 1.280300
Ohio 0.647747 0.831136 -1.549481
Texas 0.513416 -0.884417 0.195343
Oregon -0.485454 -0.477388 -0.309548
In [118]: f = lambda x: x.max() - x.min()
In [119]: frame.apply(f)
Out[119]:
b 1.133201
d 1.965980
e 2.829781
dtype: float64
Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.
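Here is a small, deterministic sketch of the same idea. It uses a made-up 3×2 frame instead of random numbers so the results can be checked by hand. `apply` hands each column (default `axis=0`) or each row (`axis=1`) to the function as a Series. For a range statistic like this one, the built-in `max`/`min` methods give the same per-column answer without `apply`.

```python
import pandas as pd

# Hypothetical data, chosen so the results are easy to check by hand.
frame = pd.DataFrame({"b": [1.0, 4.0, 2.0], "d": [0.5, -1.5, 3.0]},
                     index=["Utah", "Ohio", "Texas"])

def spread(x):
    # x is a whole column (axis=0) or a whole row (axis=1), as a Series.
    return x.max() - x.min()

per_column = frame.apply(spread)         # one value per column
per_row = frame.apply(spread, axis=1)    # one value per row

# Built-in reductions give the per-column result without apply.
vectorized = frame.max() - frame.min()

print(per_column)
print(per_row)
print(per_column.equals(vectorized))
```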
Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:
In [120]: format = lambda x: '%.2f' % x
In [121]: frame.applymap(format)
Out[121]:
b d e
Utah -0.03 1.08 1.28
Ohio 0.65 0.83 -1.55
Texas 0.51 -0.88 0.20
Oregon -0.49 -0.48 -0.31
The reason for the name applymap is that Series has a map method for applying an element-wise function:
In [122]: frame['e'].map(format)
Out[122]:
Utah 1.28
Ohio -1.55
Texas 0.20
Oregon -0.31
Name: e, dtype: object
Summing up, apply works on a row/column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.
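To make that summary concrete, this sketch records what kind of object each method passes to the function. The tiny frame is made up. One version assumption applies: `DataFrame.applymap` was renamed `DataFrame.map` in pandas 2.1, where the old name still works but is deprecated. The `cellwise` helper, which is not a pandas function, calls whichever name exists.

```python
import pandas as pd

def cellwise(df, func):
    # DataFrame.applymap became DataFrame.map in pandas 2.1.
    return df.map(func) if hasattr(pd.DataFrame, "map") else df.applymap(func)

frame = pd.DataFrame({"b": [1.5, 2.5], "e": [3.5, 4.5]})
seen = {"apply": set(), "applymap": set(), "map": set()}

def recorder(name):
    # Build a function that logs the type of its argument and returns it unchanged.
    def inner(x):
        seen[name].add(type(x).__name__)
        return x
    return inner

frame.apply(recorder("apply"))          # gets whole columns
cellwise(frame, recorder("applymap"))   # gets one cell at a time
frame["e"].map(recorder("map"))         # gets one value of a Series at a time

print(seen)
```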
Answer by user2921752
@jeremiahbuddha mentioned that apply works on rows/columns, while applymap works element-wise. But it seems you can still use apply for element-wise computation...
frame.apply(np.sqrt)
Out[102]:
b d e
Utah NaN 1.435159 NaN
Ohio 1.098164 0.510594 0.729748
Texas NaN 0.456436 0.697337
Oregon 0.359079 NaN NaN
frame.applymap(np.sqrt)
Out[103]:
b d e
Utah NaN 1.435159 NaN
Ohio 1.098164 0.510594 0.729748
Texas NaN 0.456436 0.697337
Oregon 0.359079 NaN NaN
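The two outputs above match because `np.sqrt` is a NumPy ufunc. It accepts a whole column (a Series) as readily as a single number, so `apply` computes each column at once and `applymap` computes each cell. A scalar-only function exposes the difference.

The sketch below uses `math.sqrt`, which accepts only a single number, on a made-up frame with non-negative values. The `cellwise` helper is not a pandas function. It assumes the pandas 2.1 rename of `applymap` to `DataFrame.map`.

```python
import math

import numpy as np
import pandas as pd

def cellwise(df, func):
    # DataFrame.applymap became DataFrame.map in pandas 2.1.
    return df.map(func) if hasattr(pd.DataFrame, "map") else df.applymap(func)

frame = pd.DataFrame({"b": [4.0, 9.0], "d": [16.0, 25.0]})

# np.sqrt is vectorized, so both methods agree.
same = frame.apply(np.sqrt).equals(cellwise(frame, np.sqrt))

# math.sqrt only takes one number, but apply passes it a whole column.
try:
    frame.apply(math.sqrt)
    apply_failed = False
except TypeError:
    apply_failed = True

cell_result = cellwise(frame, math.sqrt)  # works: one float at a time
print(same, apply_failed)
print(cell_result)
```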
Answer by osa
Adding to the other answers, a Series also has both map and apply.
apply can make a DataFrame out of a Series; however, map will just put a Series in every cell of another Series, which is probably not what you want.
In [40]: p=pd.Series([1,2,3])
In [41]: p
Out[41]:
0 1
1 2
2 3
dtype: int64
In [42]: p.apply(lambda x: pd.Series([x, x]))
Out[42]:
0 1
0 1 1
1 2 2
2 3 3
In [43]: p.map(lambda x: pd.Series([x, x]))
Out[43]:
0 0 1
1 1
dtype: int64
1 0 2
1 2
dtype: int64
2 0 3
1 3
dtype: int64
dtype: object
Also, if I had a function with side effects, such as "connect to a web server", I'd probably use apply just for the sake of clarity.
series.apply(download_file_for_every_element)
map can use not only a function, but also a dictionary or another Series. Let's say you want to manipulate permutations.
Take
1 2 3 4 5
2 1 4 5 3
The square of this permutation is
1 2 3 4 5
1 2 5 3 4
You can compute it using map. Not sure if self-application is documented, but it works in 0.15.1.
In [39]: p=pd.Series([1,0,3,4,2])
In [40]: p.map(p)
Out[40]:
0 0
1 1
2 4
3 2
4 3
dtype: int64
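Because `map` also accepts a dictionary, the same square can be computed directly in the 1-based two-line notation used above. This minimal sketch writes the permutation 1→2, 2→1, 3→4, 4→5, 5→3 as a plain dict and applies it twice. It also repeats the self-application trick with a 1-based Series.

```python
import pandas as pd

# The permutation from the two-line notation above, keyed by position (1-based).
perm = {1: 2, 2: 1, 3: 4, 4: 5, 5: 3}

# Mapping with a dict: send each point through the permutation twice.
points = pd.Series([1, 2, 3, 4, 5])
square_from_dict = points.map(perm).map(perm)

# Mapping a Series through itself (the self-application trick above).
step = pd.Series(perm)            # index 1..5, values 2, 1, 4, 5, 3
square_from_series = step.map(step)

print(square_from_dict.tolist())    # [1, 2, 5, 3, 4]
print(square_from_series.tolist())  # [1, 2, 5, 3, 4]
```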
Answer by muon
Just wanted to point this out, as I struggled with it for a bit:
def f(x):
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x

df.applymap(f)
df.describe()
This does not modify the DataFrame itself; the result has to be reassigned:
df = df.applymap(f)
df.describe()
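Here is a self-contained version of the pitfall on a made-up frame, since the answer's `df` is not shown. `applymap` returns a new DataFrame and leaves the original alone. As an aside not taken from the original answer, clamping to a range is also what the built-in `DataFrame.clip` does, in vectorized form. The `cellwise` helper is not a pandas function. It assumes the pandas 2.1 rename of `applymap` to `DataFrame.map`.

```python
import pandas as pd

def cellwise(df, func):
    # DataFrame.applymap became DataFrame.map in pandas 2.1.
    return df.map(func) if hasattr(pd.DataFrame, "map") else df.applymap(func)

def f(x):
    # Clamp a single value into [0, 100000].
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x

raw = pd.DataFrame({"a": [-5, 50, 250000], "b": [10, -1, 99999]})
df = raw.copy()

cellwise(df, f)                 # result discarded: df is unchanged
unchanged = df["a"].tolist()    # still [-5, 50, 250000]

df = cellwise(df, f)            # reassign to keep the result
print(df)

# Vectorized equivalent (an aside, not from the original answer).
clipped = raw.clip(lower=0, upper=100000)
print((clipped == df).all().all())
```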
Answer by Kath
Probably the simplest explanation of the difference between apply and applymap:
apply takes the whole column as a parameter and then assigns the result to this column.
applymap takes the separate cell value as a parameter and assigns the result back to this cell.
NB: If the function passed to apply returns a single value, you get that value instead of the column, so the result ends up being just one row instead of a matrix.
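This sketch illustrates the NB on a made-up frame. When the function returns one value per column, `apply` collapses the DataFrame into a Series with one entry per column. When the function returns a same-length Series, `apply` keeps the full table.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

totals = df.apply(lambda col: col.sum())   # scalar per column -> Series
doubled = df.apply(lambda col: col * 2)    # Series per column -> DataFrame

print(type(totals).__name__, totals.to_dict())
print(type(doubled).__name__, doubled.shape)
```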
Answer by MarredCheese
There's great information in these answers, but I'm adding my own to clearly summarize which methods work array-wise versus element-wise. jeremiahbuddha mostly did this but did not mention Series.apply. I don't have the rep to comment.
DataFrame.apply operates on entire rows or columns at a time.
DataFrame.applymap, Series.apply, and Series.map operate on one element at a time.
There is a lot of overlap between the capabilities of Series.apply and Series.map, meaning that either one will work in most cases. They do have some slight differences, though, some of which were discussed in osa's answer.
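This sketch shows where `Series.apply` and `Series.map` overlap and where they differ, using a made-up Series:

- Both accept a plain one-argument function.
- `apply` can forward extra positional arguments through its `args` parameter.
- `map` treats a dict as a lookup table of values.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

def scale(x, factor=1):
    return x * factor

# Overlap: a one-argument call works with both.
same = s.map(scale).equals(s.apply(scale))

# apply forwards extra positional arguments via args.
tripled = s.apply(scale, args=(3,))

# map uses a dict as a value lookup table.
labels = s.map({1: "one", 2: "two", 3: "three"})

print(same, tripled.tolist(), labels.tolist())
```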
Answer by Vicky Miao
My understanding:
From the function point of view:
If the function needs to compare values within a column/row, use apply.
e.g.: lambda x: x.max()-x.mean().
If the function is to be applied to each element:
1> If it is applied to a selected column/row, use apply
2> If it is applied to the entire DataFrame, use applymap
majority = lambda x: x > 17
df2['legal_drinker'] = df2['age'].apply(majority)

def times10(x):
    if type(x) is int:
        x *= 10
    return x

df2.applymap(times10)
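The answer's `df2` is not shown, so this sketch invents a small hypothetical one with `name` and `age` columns. It runs both snippets: `apply` on the single `age` column, then `times10` over the whole frame.

Two assumptions differ from the answer's exact code:

- The type check is loosened to also accept NumPy integers while skipping booleans.
- The `cellwise` helper, which is not a pandas function, covers the pandas 2.1 rename of `applymap` to `DataFrame.map`.

```python
import numpy as np
import pandas as pd

def cellwise(df, func):
    # DataFrame.applymap became DataFrame.map in pandas 2.1.
    return df.map(func) if hasattr(pd.DataFrame, "map") else df.applymap(func)

# Hypothetical data standing in for the answer's df2.
df2 = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [15, 21, 30]})

majority = lambda x: x > 17
df2["legal_drinker"] = df2["age"].apply(majority)   # element-wise on one column

def times10(x):
    # Multiply integers only; bools (a subclass of int) and strings pass through.
    if isinstance(x, (int, np.integer)) and not isinstance(x, (bool, np.bool_)):
        x *= 10
    return x

result = cellwise(df2, times10)   # element-wise on the whole DataFrame
print(result)
```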
Answer by prosti
FOMO:
The following example shows apply and applymap applied to a DataFrame.
map is something you can apply on a Series only. You cannot apply map on a DataFrame (in pandas 2.1 and later, DataFrame does have a map method, which replaces applymap).
The thing to remember is that apply can do anything applymap can, but apply has eXtra options.
The X factor options are axis and result_type, where result_type only takes effect when axis=1 (axis='columns', i.e. the function is applied to each row).
import numpy as np
from pandas import DataFrame

df = DataFrame(1, columns=list('abc'),
               index=list('1234'))
print(df)
f = lambda x: np.log(x)
print(df.applymap(f))  # apply to the whole dataframe
print(np.log(df))  # applied to the whole dataframe
print(df.applymap(np.sum))  # applymap cannot reduce: np.sum of one cell just returns that cell
# apply can take different options (vs. applymap cannot)
print(df.apply(f))  # same as applymap
print(df.apply(sum, axis=1))  # reducing example
print(df.apply(np.log, axis=1))  # cannot reduce
print(df.apply(lambda x: [1, 2, 3], axis=1, result_type='expand'))  # expand result
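To see what `result_type` changes, this sketch uses a made-up 2×3 frame and a row function (`axis=1`) that returns a list. With the default, each row's list is kept as one object, so the result is a Series of lists. `'expand'` turns the list elements into new columns labelled 0, 1, 2. `'broadcast'` keeps the original column labels.

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 1], [2, 2, 2]], columns=list("abc"))

def row_stats(row):
    # Return a plain list (not a Series) for each row.
    return [row.sum(), row.min(), row.max()]

default = df.apply(row_stats, axis=1)                              # Series of lists
expanded = df.apply(row_stats, axis=1, result_type="expand")       # columns 0, 1, 2
broadcast = df.apply(row_stats, axis=1, result_type="broadcast")   # columns a, b, c

print(default)
print(expanded)
print(broadcast)
```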
As a side note, the Series map method should not be confused with the built-in Python map function.
The first is applied to a Series to map its values; the second applies a function to every item of an iterable.
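A quick sketch of that contrast on made-up data: the built-in `map` returns a lazy iterator over the values of any iterable, while `Series.map` returns a new Series that keeps the original index.

```python
import pandas as pd

words = pd.Series(["a", "b", "c"], index=["x", "y", "z"])

builtin_result = list(map(str.upper, words))   # iterates over the values only
series_result = words.map(str.upper)           # Series, index x/y/z preserved

print(builtin_result)
print(series_result)
```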
Lastly, don't confuse the DataFrame apply method with the groupby apply method.
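This sketch uses a made-up scores table to show the difference. `DataFrame.apply` passes each whole column (or row) to the function, whereas `groupby(...).apply` passes each group. Selecting the `score` column before `apply` means each group arrives as a Series.

```python
import pandas as pd

df = pd.DataFrame({"team": ["A", "A", "B", "B"], "score": [1, 5, 2, 8]})

def spread(x):
    return x.max() - x.min()

# DataFrame.apply: the function sees the whole score column.
per_column = df[["score"]].apply(spread)

# GroupBy.apply: the function sees the scores of one team at a time.
per_team = df.groupby("team")["score"].apply(spread)

print(per_column)
print(per_team)
```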
Answer by cs95
Comparing map, applymap and apply: Context Matters
First major difference: DEFINITION
map is defined on Series ONLY
applymap is defined on DataFrames ONLY
apply is defined on BOTH
Second major difference: INPUT ARGUMENT
map accepts dicts, Series, or callables
applymap and apply accept callables only
Third major difference: BEHAVIOR
map is elementwise for Series
applymap is elementwise for DataFrames
apply also works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depend on the function.
Fourth major difference (the most important one): USE CASE
map is meant for mapping values from one domain to another, so it is optimised for performance (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))
applymap is good for elementwise transformations across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip))
apply is for applying any function that cannot be vectorised (e.g., df['sentences'].apply(nltk.sent_tokenize))
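Here is a runnable sketch of those three use cases on made-up data. It makes three substitutions, all assumptions:

- To avoid an extra dependency, `nltk.sent_tokenize` is replaced by a deliberately naive, hypothetical splitter on `". "`, which is far less robust than NLTK.
- Only the string columns are stripped, because `str.strip` fails on an integer column.
- The `cellwise` helper, which is not a pandas function, covers the pandas 2.1 rename of `applymap` to `DataFrame.map`.

```python
import pandas as pd

def cellwise(df, func):
    # DataFrame.applymap became DataFrame.map in pandas 2.1.
    return df.map(func) if hasattr(pd.DataFrame, "map") else df.applymap(func)

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [" x", "y ", " z "],
    "C": ["p ", " q", "r"],
    "sentences": ["Hi there. Bye.", "One.", "Two here. Three here."],
})

# map: translate values from one domain to another via a dict.
letters = df["A"].map({1: "a", 2: "b", 3: "c"})

# applymap / DataFrame.map: one element-wise function over several columns.
stripped = cellwise(df[["B", "C"]], str.strip)

def naive_sentences(text):
    # Hypothetical stand-in for nltk.sent_tokenize: split on ". " only.
    return [p if p.endswith(".") else p + "." for p in text.split(". ")]

# apply: an arbitrary Python function that pandas cannot vectorize.
tokens = df["sentences"].apply(naive_sentences)

print(letters.tolist())
print(stripped.values.tolist())
print(tokens.tolist())
```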
Summarising
Footnotes
map, when passed a dictionary/Series, will map elements based on the keys in that dictionary/Series. Values with no matching key will be recorded as NaN in the output.
applymap in more recent versions has been optimised for some operations. You will find applymap slightly faster than apply in some cases. My suggestion is to test them both and use whatever works better.
map is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.
Series.apply returns a scalar for aggregating operations, Series otherwise. Similarly for DataFrame.apply. Note that apply also has fast paths when called with certain NumPy functions such as mean, sum, etc.
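This sketch puts the first footnote into code on made-up data. Values with no key in the dict come back as NaN. `na_action='ignore'` leaves existing missing values untouched instead of passing them to a function.

```python
import pandas as pd

s = pd.Series([1, 2, 4])
mapped = s.map({1: "one", 2: "two"})            # 4 has no key -> NaN

t = pd.Series(["a", None, "c"])
upper = t.map(str.upper, na_action="ignore")    # the missing value is skipped, not passed to str.upper

print(mapped.tolist())
print(upper.tolist())
```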
Answer by Alpha
Based on cs95's answer:
map is defined on Series ONLY
applymap is defined on DataFrames ONLY
apply is defined on BOTH
Here are some examples:
In [3]: frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
In [4]: frame
Out[4]:
b d e
Utah 0.129885 -0.475957 -0.207679
Ohio -2.978331 -1.015918 0.784675
Texas -0.256689 -0.226366 2.262588
Oregon 2.605526 1.139105 -0.927518
In [5]: myformat=lambda x: f'{x:.2f}'
In [6]: frame.d.map(myformat)
Out[6]:
Utah -0.48
Ohio -1.02
Texas -0.23
Oregon 1.14
Name: d, dtype: object
In [7]: frame.d.apply(myformat)
Out[7]:
Utah -0.48
Ohio -1.02
Texas -0.23
Oregon 1.14
Name: d, dtype: object
In [8]: frame.applymap(myformat)
Out[8]:
b d e
Utah 0.13 -0.48 -0.21
Ohio -2.98 -1.02 0.78
Texas -0.26 -0.23 2.26
Oregon 2.61 1.14 -0.93
In [9]: frame.apply(lambda x: x.apply(myformat))
Out[9]:
b d e
Utah 0.13 -0.48 -0.21
Ohio -2.98 -1.02 0.78
Texas -0.26 -0.23 2.26
Oregon 2.61 1.14 -0.93
In [10]: myfunc=lambda x: x**2
In [11]: frame.applymap(myfunc)
Out[11]:
b d e
Utah 0.016870 0.226535 0.043131
Ohio 8.870453 1.032089 0.615714
Texas 0.065889 0.051242 5.119305
Oregon 6.788766 1.297560 0.860289
In [12]: frame.apply(myfunc)
Out[12]:
b d e
Utah 0.016870 0.226535 0.043131
Ohio 8.870453 1.032089 0.615714
Texas 0.065889 0.051242 5.119305
Oregon 6.788766 1.297560 0.860289


