Difference between map, applymap and apply methods in Python Pandas
Warning: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/19798153/
Difference between map, applymap and apply methods in Pandas
Asked by marillion
Can you tell me when to use these vectorization methods with basic examples?
I see that map is a Series method whereas the rest are DataFrame methods. I got confused about the apply and applymap methods though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples which illustrate the usage would be great!
Accepted answer by jeremiahbuddha
Straight from Wes McKinney's Python for Data Analysis book, pg. 132 (I highly recommend this book):
Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame's apply method does exactly this:
In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
In [117]: frame
Out[117]:
               b         d         e
Utah   -0.029638  1.081563  1.280300
Ohio    0.647747  0.831136 -1.549481
Texas   0.513416 -0.884417  0.195343
Oregon -0.485454 -0.477388 -0.309548
In [118]: f = lambda x: x.max() - x.min()
In [119]: frame.apply(f)
Out[119]:
b 1.133201
d 1.965980
e 2.829781
dtype: float64
Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.
Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:
In [120]: format = lambda x: '%.2f' % x
In [121]: frame.applymap(format)
Out[121]:
            b      d      e
Utah    -0.03   1.08   1.28
Ohio     0.65   0.83  -1.55
Texas    0.51  -0.88   0.20
Oregon  -0.49  -0.48  -0.31
The reason for the name applymap is that Series has a map method for applying an element-wise function:
In [122]: frame['e'].map(format)
Out[122]:
Utah 1.28
Ohio -1.55
Texas 0.20
Oregon -0.31
Name: e, dtype: object
Summing up, apply works on a row / column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.
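As a compact recap of the three behaviours, here is a minimal sketch on a tiny hand-made frame (the values and the names col_range, as_text and b_text are made up for illustration). One assumption to flag: DataFrame.applymap was renamed to DataFrame.map in pandas 2.1, so the sketch picks whichever of the two the installed version provides.

```python
import pandas as pd

frame = pd.DataFrame({"b": [1.0, 4.0], "d": [9.0, 16.0]}, index=["Utah", "Ohio"])

# apply: the function receives a whole column (a Series) and may reduce it.
col_range = frame.apply(lambda col: col.max() - col.min())

# Element-wise over the whole DataFrame. DataFrame.applymap was renamed to
# DataFrame.map in pandas 2.1, so use whichever this version provides.
elementwise = getattr(frame, "map", None) or frame.applymap
as_text = elementwise(lambda x: f"{x:.1f}")

# map: element-wise over a single Series.
b_text = frame["b"].map(lambda x: f"{x:.1f}")

print(col_range)
print(as_text)
print(b_text)
```

The first result has one value per column, while the other two keep the shape of their input.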
Answer by user2921752
@jeremiahbuddha mentioned that apply works on row/columns, while applymap works element-wise. But it seems you can still use apply for element-wise computation....
frame.apply(np.sqrt)
Out[102]:
               b         d         e
Utah         NaN  1.435159       NaN
Ohio    1.098164  0.510594  0.729748
Texas        NaN  0.456436  0.697337
Oregon  0.359079       NaN       NaN
frame.applymap(np.sqrt)
Out[103]:
               b         d         e
Utah         NaN  1.435159       NaN
Ohio    1.098164  0.510594  0.729748
Texas        NaN  0.456436  0.697337
Oregon  0.359079       NaN       NaN
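The two calls agree because np.sqrt is vectorised: handed a whole column, it still computes the root of each element. A rough way to see what each method actually passes to the function is to record the argument types. The sketch below is only illustrative; spy_sqrt and seen are hypothetical names, and the applymap/map fallback is an assumption made to cope with the rename in pandas 2.1.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"b": [1.0, 4.0], "d": [9.0, 16.0]})

seen = []  # type name of every argument the function receives

def spy_sqrt(x):
    seen.append(type(x).__name__)
    return np.sqrt(x)

# apply hands the function one whole column at a time.
by_column = frame.apply(spy_sqrt)
column_types = set(seen)

seen.clear()
# DataFrame.applymap was renamed to DataFrame.map in pandas 2.1;
# use whichever one this pandas version provides.
elementwise = getattr(frame, "map", None) or frame.applymap
by_cell = elementwise(spy_sqrt)
cell_types = set(seen)

print(column_types)   # the column-wise call sees Series objects
print(cell_types)     # the element-wise call sees individual scalars
print(np.allclose(by_column, by_cell))
```

With a function that is not vectorised (for example one using an if statement on its argument), the column-wise call would typically fail or behave differently, which is where the distinction starts to matter.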
Answer by osa
Adding to the other answers, a Series also has map and apply.
apply can make a DataFrame out of a Series; however, map will just put a Series in every cell of another Series, which is probably not what you want.
In [40]: p=pd.Series([1,2,3])
In [41]: p
Out[31]:
0    1
1    2
2    3
dtype: int64
In [42]: p.apply(lambda x: pd.Series([x, x]))
Out[42]:
   0  1
0  1  1
1  2  2
2  3  3
In [43]: p.map(lambda x: pd.Series([x, x]))
Out[43]:
0    0    1
1    1
dtype: int64
1    0    2
1    2
dtype: int64
2    0    3
1    3
dtype: int64
dtype: object
Also, if I had a function with side effects, such as "connect to a web server", I'd probably use apply just for the sake of clarity.
series.apply(download_file_for_every_element)
map can use not only a function, but also a dictionary or another Series. Let's say you want to manipulate permutations.
Take
1 2 3 4 5
2 1 4 5 3
The square of this permutation is
1 2 3 4 5
1 2 5 3 4
You can compute it using map. Not sure if self-application is documented, but it works in 0.15.1.
In [39]: p=pd.Series([1,0,3,4,2])
In [40]: p.map(p)
Out[40]:
0 0
1 1
2 4
3 2
4 3
dtype: int64
Answer by muon
Just wanted to point out, as I struggled with this for a bit:
def f(x):
    # clip each value to the range [0, 100000]
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x
df.applymap(f)
df.describe()
This does not modify the DataFrame itself; the result has to be reassigned:
df = df.applymap(f)
df.describe()
Answer by Kath
Probably the simplest explanation of the difference between apply and applymap:
apply takes the whole column as a parameter and then assigns the result to this column.
applymap takes each separate cell value as a parameter and assigns the result back to this cell.
NB: If apply returns a single value, you will have this value instead of the column after assigning, and eventually will have just a row instead of a matrix.
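The NB above can be illustrated with a small sketch (the frame and the names doubled and totals are invented for the example): when the function returns a column, the shape is preserved, and when it returns a single value per column, the result collapses to one value per column.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Column in, column out: the result keeps the DataFrame's shape.
doubled = df.apply(lambda col: col * 2)

# Column in, single value out: each column collapses to one number,
# so the result is a Series indexed by column name.
totals = df.apply(lambda col: col.sum())

print(doubled)
print(totals)
```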
Answer by MarredCheese
There's great information in these answers, but I'm adding my own to clearly summarize which methods work array-wise versus element-wise. jeremiahbuddha mostly did this but did not mention Series.apply. I don't have the rep to comment.
DataFrame.apply operates on entire rows or columns at a time.
DataFrame.applymap, Series.apply, and Series.map operate on one element at a time.
There is a lot of overlap between the capabilities of Series.apply and Series.map, meaning that either one will work in most cases. They do have some slight differences though, some of which were discussed in osa's answer.
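To make the overlap and two of the differences concrete, here is a small sketch (the Series, the scale function and the variable names are made up for illustration). It shows the two methods agreeing for a plain one-argument function, map accepting a dict as a lookup table, and apply forwarding extra arguments through its args parameter.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# With a plain one-argument function the two methods agree.
same = s.map(lambda x: x + 1).equals(s.apply(lambda x: x + 1))

# Series.map also accepts a dict (or another Series) as a lookup table.
labels = s.map({1: "one", 2: "two", 3: "three"})

# Series.apply forwards extra positional arguments to the function.
def scale(x, factor):
    return x * factor

scaled = s.apply(scale, args=(10,))

print(same)
print(labels.tolist())
print(scaled.tolist())
```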
Answer by Vicky Miao
My understanding:
From the function point of view:
If the function needs to compare values within a column/row, use apply, e.g. lambda x: x.max()-x.mean().
If the function is to be applied to each element:
1> If it applies to a single column/row, use apply
2> If it applies to the entire dataframe, use applymap
majority = lambda x: x > 17
df2['legal_drinker'] = df2['age'].apply(majority)
def times10(x):
    # multiply integers by 10; leave every other value unchanged
    if type(x) is int:
        x *= 10
    return x
df2.applymap(times10)
Answer by prosti
FOMO:
The following example shows apply and applymap applied to a DataFrame.
map is something you apply to a Series only. You cannot apply map to a DataFrame.
The thing to remember is that apply can do anything applymap can, but apply has eXtra options.
The X factor options are axis and result_type, where result_type only works when axis=1 (for columns).
import numpy as np
from pandas import DataFrame

df = DataFrame(1, columns=list('abc'),
               index=list('1234'))
print(df)
f = lambda x: np.log(x)
print(df.applymap(f))            # applied element-wise to the whole dataframe
print(np.log(df))                # same result: NumPy ufuncs accept a dataframe directly
print(df.applymap(np.sum))       # np.sum of a single cell is the cell itself, so applymap cannot reduce
# apply can take different options (vs. applymap cannot)
print(df.apply(f))               # same as applymap
print(df.apply(sum, axis=1))     # reducing example: one sum per row
print(df.apply(np.log, axis=1))  # element-wise function, so nothing is reduced
print(df.apply(lambda x: [1, 2, 3], axis=1, result_type='expand'))  # expand each returned list into columns
As a sidenote, the Series map function should not be confused with the Python map function.
The first one is applied on a Series, to map the values, and the second one to every item of an iterable.
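A short sketch of that distinction, using an invented three-element Series: Series.map gives back a Series with the original index, whereas the built-in map yields a lazy iterator over plain values.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Series.map returns a new Series and keeps the index.
via_series_map = s.map(lambda x: x * x)

# The built-in map works on any iterable and returns a lazy iterator;
# the index is lost unless you rebuild the Series yourself.
via_builtin_map = list(map(lambda x: x * x, s))

print(via_series_map)
print(via_builtin_map)
```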
Lastly, don't confuse the DataFrame apply method with the groupby apply method.
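As a hedged illustration of why the two should not be confused, the sketch below runs the same function through both; the frame, the column names team and score, and the spread helper are all invented for the example. DataFrame.apply sees each column in full, while the groupby version sees one group's values at a time.

```python
import pandas as pd

df = pd.DataFrame({"team": ["x", "x", "y", "y"], "score": [1, 5, 2, 10]})

def spread(values):
    # difference between the largest and smallest value
    return values.max() - values.min()

# DataFrame.apply: the function sees each selected column in full.
overall = df[["score"]].apply(spread)

# groupby(...).apply: the function sees one group's values at a time.
per_team = df.groupby("team")["score"].apply(spread)

print(overall)
print(per_team)
```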
Answer by cs95
Comparing map, applymap and apply: Context Matters
First major difference: DEFINITION
map is defined on Series ONLY
applymap is defined on DataFrames ONLY
apply is defined on BOTH
Second major difference: INPUT ARGUMENT
map accepts dicts, Series, or callable
applymap and apply accept callables only
Third major difference: BEHAVIOR
map is elementwise for Series
applymap is elementwise for DataFrames
apply also works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depends on the function.
Fourth major difference (the most important one): USE CASE
map is meant for mapping values from one domain to another, so is optimised for performance (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))
applymap is good for elementwise transformations across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip))
apply is for applying any function that cannot be vectorised (e.g., df['sentences'].apply(nltk.sent_tokenize))
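The three use cases can be tried together on a toy frame. This is only a sketch under stated assumptions: the data is invented, split_sentences is a hypothetical stand-in for a real tokenizer such as nltk.sent_tokenize (so that the example has no extra dependency), and the applymap/map fallback is there because DataFrame.applymap was renamed to DataFrame.map in pandas 2.1.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [" x ", "y ", " z"],
    "C": [" p", "q ", " r "],
    "sentences": ["Hello there. Bye.", "One.", "A. B. C."],
})

# map: translate values from one domain to another via a lookup dict.
letters = df["A"].map({1: "a", 2: "b", 3: "c"})

# Element-wise transformation across several columns. DataFrame.applymap
# was renamed to DataFrame.map in pandas 2.1; use whichever exists.
text_cols = df[["B", "C"]]
elementwise = getattr(text_cols, "map", None) or text_cols.applymap
stripped = elementwise(str.strip)

# apply: run an arbitrary Python function that has no vectorised form.
def split_sentences(text):
    # naive stand-in for a real sentence tokenizer
    return [part.strip() for part in text.split(".") if part.strip()]

tokens = df["sentences"].apply(split_sentences)

print(letters.tolist())
print(stripped)
print(tokens.tolist())
```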
Summarising
Footnotes
map when passed a dictionary/Series will map elements based on the keys in that dictionary/Series. Missing values will be recorded as NaN in the output.
applymap in more recent versions has been optimised for some operations. You will find applymap slightly faster than apply in some cases. My suggestion is to test them both and use whatever works better.
map is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.
Series.apply returns a scalar for aggregating operations, Series otherwise. Similarly for DataFrame.apply. Note that apply also has fastpaths when called with certain NumPy functions such as mean, sum, etc.
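The first footnote is easy to check with a small sketch (the Series and the lookup table are invented for the example): a value with no matching key comes back as NaN, whether the lookup is a dict or another Series.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
lookup = {1: "a", 2: "b", 3: "c"}   # deliberately has no entry for 4

# Values are looked up by key; 4 has no key, so it becomes NaN.
mapped = s.map(lookup)

# A Series works the same way: its index plays the role of the keys.
mapped_via_series = s.map(pd.Series(lookup))

print(mapped)
print(mapped_via_series)
```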
Answer by Alpha
Based on the answer of cs95:
map is defined on Series ONLY
applymap is defined on DataFrames ONLY
apply is defined on BOTH
Here are some examples:
In [3]: frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
In [4]: frame
Out[4]:
               b         d         e
Utah    0.129885 -0.475957 -0.207679
Ohio   -2.978331 -1.015918  0.784675
Texas  -0.256689 -0.226366  2.262588
Oregon  2.605526  1.139105 -0.927518
In [5]: myformat=lambda x: f'{x:.2f}'
In [6]: frame.d.map(myformat)
Out[6]:
Utah      -0.48
Ohio      -1.02
Texas     -0.23
Oregon     1.14
Name: d, dtype: object
In [7]: frame.d.apply(myformat)
Out[7]:
Utah      -0.48
Ohio      -1.02
Texas     -0.23
Oregon     1.14
Name: d, dtype: object
In [8]: frame.applymap(myformat)
Out[8]:
            b      d      e
Utah     0.13  -0.48  -0.21
Ohio    -2.98  -1.02   0.78
Texas   -0.26  -0.23   2.26
Oregon   2.61   1.14  -0.93
In [9]: frame.apply(lambda x: x.apply(myformat))
Out[9]:
            b      d      e
Utah     0.13  -0.48  -0.21
Ohio    -2.98  -1.02   0.78
Texas   -0.26  -0.23   2.26
Oregon   2.61   1.14  -0.93
In [10]: myfunc=lambda x: x**2
In [11]: frame.applymap(myfunc)
Out[11]:
               b         d         e
Utah    0.016870  0.226535  0.043131
Ohio    8.870453  1.032089  0.615714
Texas   0.065889  0.051242  5.119305
Oregon  6.788766  1.297560  0.860289
In [12]: frame.apply(myfunc)
Out[12]:
               b         d         e
Utah    0.016870  0.226535  0.043131
Ohio    8.870453  1.032089  0.615714
Texas   0.065889  0.051242  5.119305
Oregon  6.788766  1.297560  0.860289