Difference between map, applymap and apply methods in Python Pandas

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/19798153/

Date: 2020-08-19 14:40:06  Source: igfitidea

Difference between map, applymap and apply methods in Pandas

Tags: python, pandas, dataframe, vectorization

Asked by marillion

Can you tell me when to use these vectorization methods with basic examples?

I see that map is a Series method whereas the rest are DataFrame methods. I got confused about the apply and applymap methods though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples which illustrate the usage would be great!

Accepted answer by jeremiahbuddha

Straight from Wes McKinney's Python for Data Analysis book, pg. 132 (I highly recommend this book):

Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame's apply method does exactly this:

In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [117]: frame
Out[117]: 
               b         d         e
Utah   -0.029638  1.081563  1.280300
Ohio    0.647747  0.831136 -1.549481
Texas   0.513416 -0.884417  0.195343
Oregon -0.485454 -0.477388 -0.309548

In [118]: f = lambda x: x.max() - x.min()

In [119]: frame.apply(f)
Out[119]: 
b    1.133201
d    1.965980
e    2.829781
dtype: float64

Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.
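
As a small, self-contained sketch of the two points above (the data and the names `scores` and `value_range` are made up for illustration), the following applies a range function per column and per row, then checks that a built-in reduction such as sum matches what you get by going through apply:

```python
import pandas as pd

# Hypothetical data, chosen so the results are easy to verify by hand.
scores = pd.DataFrame(
    {"b": [1.0, 4.0, 2.0], "d": [10.0, 30.0, 20.0]},
    index=["Utah", "Ohio", "Texas"],
)

def value_range(values):
    """Return max - min of a 1D Series (one column or one row)."""
    return values.max() - values.min()

per_column = scores.apply(value_range)        # axis=0 (default): one result per column
per_row = scores.apply(value_range, axis=1)   # axis=1: one result per row

# Built-in reductions make apply unnecessary for common statistics.
sum_via_apply = scores.apply(lambda col: col.sum())
sum_builtin = scores.sum()

print(per_column)
print(per_row)
print(sum_via_apply.equals(sum_builtin))
```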

Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:

In [120]: format = lambda x: '%.2f' % x

In [121]: frame.applymap(format)
Out[121]: 
            b      d      e
Utah    -0.03   1.08   1.28
Ohio     0.65   0.83  -1.55
Texas    0.51  -0.88   0.20
Oregon  -0.49  -0.48  -0.31

The reason for the name applymap is that Series has a map method for applying an element-wise function:

In [122]: frame['e'].map(format)
Out[122]: 
Utah       1.28
Ohio      -1.55
Texas      0.20
Oregon    -0.31
Name: e, dtype: object

Summing up, apply works on a row / column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.
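
To make the summary concrete, here is a minimal sketch on made-up data. One assumption worth flagging: pandas 2.1 deprecated DataFrame.applymap in favour of DataFrame.map, so the sketch looks up whichever of the two is available. The names `two_decimals` and `elementwise` are purely illustrative.

```python
import pandas as pd

frame = pd.DataFrame({"b": [0.5, -1.25], "d": [2.0, 3.75]}, index=["Utah", "Ohio"])

def two_decimals(x):
    """Format a single number with two decimal places."""
    return f"{x:.2f}"

# DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map;
# pick whichever exists so the sketch runs on older and newer versions.
elementwise = frame.map if hasattr(frame, "map") else frame.applymap

col_ranges = frame.apply(lambda col: col.max() - col.min())  # one value per column
formatted = elementwise(two_decimals)                        # one value per cell
formatted_b = frame["b"].map(two_decimals)                   # one value per Series element

print(col_ranges)
print(formatted)
print(formatted_b)
```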

Answer by user2921752

@jeremiahbuddha mentioned that apply works on rows/columns, while applymap works element-wise. But it seems you can still use apply for element-wise computation:

    frame.apply(np.sqrt)
    Out[102]: 
                   b         d         e
    Utah         NaN  1.435159       NaN
    Ohio    1.098164  0.510594  0.729748
    Texas        NaN  0.456436  0.697337
    Oregon  0.359079       NaN       NaN

    frame.applymap(np.sqrt)
    Out[103]: 
                   b         d         e
    Utah         NaN  1.435159       NaN
    Ohio    1.098164  0.510594  0.729748
    Texas        NaN  0.456436  0.697337
    Oregon  0.359079       NaN       NaN
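
A plausible reading of why both calls agree: apply hands each whole column to np.sqrt, and np.sqrt, being a NumPy ufunc, already works element by element on an array. With a function that treats its argument as a whole, the two diverge. The sketch below uses len on made-up string data to show this; `elementwise` is an illustrative helper that picks DataFrame.map where it exists (pandas 2.1+) and applymap otherwise.

```python
import pandas as pd

words = pd.DataFrame({"x": ["a", "bb", "ccc"], "y": ["dddd", "e", "ff"]})

elementwise = words.map if hasattr(words, "map") else words.applymap

# apply passes each column (a Series of 3 items) to len.
lengths_by_column = words.apply(len)

# The elementwise method passes each cell (a single string) to len.
lengths_by_cell = elementwise(len)

print(lengths_by_column)
print(lengths_by_cell)
```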

Answer by osa

Adding to the other answers, a Series also has map and apply.

apply can build a DataFrame out of a Series when the function returns a Series; map, however, will just put a Series in every cell of another Series, which is probably not what you want.

In [40]: p=pd.Series([1,2,3])
In [41]: p
Out[31]:
0    1
1    2
2    3
dtype: int64

In [42]: p.apply(lambda x: pd.Series([x, x]))
Out[42]: 
   0  1
0  1  1
1  2  2
2  3  3

In [43]: p.map(lambda x: pd.Series([x, x]))
Out[43]: 
0    0    1
1    1
dtype: int64
1    0    2
1    2
dtype: int64
2    0    3
1    3
dtype: int64
dtype: object

Also, if I had a function with side effects, such as "connect to a web server", I'd probably use apply just for the sake of clarity.

series.apply(download_file_for_every_element) 

map can take not only a function, but also a dictionary or another Series. Let's say you want to manipulate permutations.

Take

1 2 3 4 5
2 1 4 5 3

The square of this permutation is

1 2 3 4 5
1 2 5 3 4

You can compute it using map (the session that follows uses the same permutation with zero-based labels). Not sure if self-application is documented, but it works in pandas 0.15.1.

In [39]: p=pd.Series([1,0,3,4,2])

In [40]: p.map(p)
Out[40]: 
0    0
1    1
2    4
3    2
4    3
dtype: int64
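
For completeness, a short sketch of the lookup-style uses of map mentioned above, on made-up data (`codes`, `lookup` and the other names are illustrative). It maps through a dict, maps through another Series, and composes the zero-based permutation with itself twice:

```python
import pandas as pd

codes = pd.Series([1, 2, 3])

# Mapping through a dict: values with no matching key become NaN.
letters = codes.map({1: "a", 2: "b"})

# Mapping through another Series: its index plays the role of the dict keys.
lookup = pd.Series(["one", "two", "three"], index=[1, 2, 3])
names = codes.map(lookup)

# Composing a permutation with itself (zero-based labels, as in the answer).
p = pd.Series([1, 0, 3, 4, 2])
p_squared = p.map(p)
p_cubed = p_squared.map(p)

print(letters)
print(names)
print(p_squared.tolist(), p_cubed.tolist())
```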

Answer by muon

Just wanted to point out, as I struggled with this for a bit:

def f(x):
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x

df.applymap(f)
df.describe()

This does not modify the DataFrame itself; the result has to be reassigned:

df = df.applymap(f)
df.describe()
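
The same point in a self-contained form, on made-up data. The sketch also shows DataFrame.clip, a vectorised way to get the same clamping; it is offered as an alternative, not as what the answer used. `clamp` and `elementwise` are illustrative names, and `elementwise` picks DataFrame.map where it exists (pandas 2.1+) and applymap otherwise.

```python
import pandas as pd

df = pd.DataFrame({"a": [-5, 50, 200000], "b": [10, -1, 100001]})

def clamp(x):
    """Clamp a single value to the range [0, 100000]."""
    if x < 0:
        return 0
    if x > 100000:
        return 100000
    return x

elementwise = df.map if hasattr(df, "map") else df.applymap

# The call returns a new DataFrame; df keeps its original values.
result = elementwise(clamp)

# Vectorised equivalent of the clamping above.
clipped = df.clip(lower=0, upper=100000)

print(df)
print(result)
print(clipped)
```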

Answer by Kath

Probably the simplest explanation of the difference between apply and applymap:

apply takes the whole column as a parameter and then assigns the result to this column.

applymap takes each separate cell value as a parameter and assigns the result back to this cell.

NB: If apply returns a single value, you will have this value instead of the column after assigning, and will eventually have just a row instead of a matrix.
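
A minimal sketch of the NB above, on made-up data: whether apply gives back a DataFrame or collapses to a Series depends on what the function returns for each column.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# The function returns a column of the same length: the result is a DataFrame.
doubled = df.apply(lambda col: col * 2)

# The function returns a single value per column: the result collapses to a
# Series with one entry per column.
maxima = df.apply(lambda col: col.max())

print(type(doubled).__name__, doubled.shape)
print(type(maxima).__name__, maxima.tolist())
```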

Answer by MarredCheese

There's great information in these answers, but I'm adding my own to clearly summarize which methods work array-wise versus element-wise. jeremiahbuddha mostly did this but did not mention Series.apply. I don't have the rep to comment.

  • DataFrame.apply operates on entire rows or columns at a time.

  • DataFrame.applymap, Series.apply, and Series.map operate on one element at a time.

There is a lot of overlap between the capabilities of Series.apply and Series.map, meaning that either one will work in most cases. They do have some slight differences though, some of which were discussed in osa's answer.
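
A short sketch of the overlap and of two differences I am aware of: Series.apply can forward extra positional or keyword arguments to the function, while Series.map also accepts a dict or Series as a lookup table. The data and the helper `add` are made up for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

def add(x, n):
    """Add n to a single element."""
    return x + n

# Both call the function once per element and give the same result here.
via_map = s.map(lambda x: x + 10)
via_apply = s.apply(lambda x: x + 10)

# Series.apply can forward extra arguments to the function.
with_args = s.apply(add, args=(10,))
with_kwargs = s.apply(add, n=10)

# Series.map also accepts a dict (or Series) as a lookup table.
labels = s.map({1: "low", 2: "mid", 3: "high"})

print(via_map.tolist(), via_apply.tolist())
print(with_args.tolist(), with_kwargs.tolist())
print(labels.tolist())
```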

Answer by Vicky Miao

My understanding:

From the function's point of view:

If the function needs to compare values within a column/row, use apply.

e.g.: lambda x: x.max()-x.mean().

If the function is to be applied to each element:

1> If a single column/row is selected, use apply

2> If it applies to the entire dataframe, use applymap

majority = lambda x : x > 17
df2['legal_drinker'] = df2['age'].apply(majority)

def times10(x):
  if type(x) is int:
    x *= 10 
  return x
df2.applymap(times10)

Answer by prosti

FOMO:

The following example shows apply and applymap applied to a DataFrame.

The map function is something you apply to a Series only. You cannot apply map to a DataFrame (in pandas versions before 2.1).

The thing to remember is that apply can do anything applymap can, but apply has eXtra options.

The X factor options are axis and result_type, where result_type only works when axis=1 (for columns).

import numpy as np
from pandas import DataFrame

df = DataFrame(1, columns=list('abc'),
                  index=list('1234'))
print(df)

f = lambda x: np.log(x)
print(df.applymap(f)) # applied to each element of the dataframe
print(np.log(df)) # vectorised equivalent on the whole dataframe
print(df.applymap(np.sum)) # elementwise, so nothing is reduced: each cell is "summed" on its own

# apply can take different options (vs. applymap cannot)
print(df.apply(f)) # same result as applymap
print(df.apply(sum, axis=1))  # reducing example: one sum per row
print(df.apply(np.log, axis=1)) # np.log is not a reduction, so the shape is unchanged
print(df.apply(lambda x: [1, 2, 3], axis=1, result_type='expand')) # expand each list into columns
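
To isolate what result_type='expand' changes, here is a small sketch on made-up data; the helper `stats` is hypothetical. Without the option, each returned list ends up as a single object in a Series; with it, the list items are spread across columns.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

def stats(row):
    """Return several values for one row as a plain list."""
    return [row.min(), row.max(), row.sum()]

# Without result_type, each list is stored as a single object in a Series.
as_lists = df.apply(stats, axis=1)

# With result_type='expand', each list is spread across new columns.
expanded = df.apply(stats, axis=1, result_type="expand")

print(as_lists)
print(expanded)
```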

As a side note, the Series map function should not be confused with the Python built-in map function.

The first one is applied to a Series to map its values, and the second one is applied to every item of an iterable.
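
A tiny side-by-side sketch of the two, on made-up data: the built-in map works on any iterable and returns a lazy iterator, while Series.map returns a new Series that keeps the original index.

```python
import pandas as pd

words = ["apple", "kiwi"]

# Built-in map: works on any iterable and returns a lazy iterator.
upper_list = list(map(str.upper, words))

# Series.map: returns a new Series and keeps the original index.
series = pd.Series(words, index=["x", "y"])
upper_series = series.map(str.upper)

print(upper_list)
print(upper_series)
```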



Lastly, don't confuse the DataFrame apply method with the groupby apply method.
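
A minimal sketch of that distinction, on made-up data with a hypothetical `value_range` helper: DataFrame.apply hands the function a whole column, whereas apply on a groupby hands it one group's values at a time.

```python
import pandas as pd

df = pd.DataFrame({"team": ["a", "a", "b"], "points": [1, 5, 10]})

def value_range(values):
    """Return max - min of the values passed in."""
    return values.max() - values.min()

# DataFrame.apply: the function sees each selected column in full.
whole_column = df[["points"]].apply(value_range)

# groupby(...).apply: the function sees one group's values at a time.
per_team = df.groupby("team")["points"].apply(value_range)

print(whole_column)
print(per_team)
```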

Answer by cs95

Comparing map, applymap and apply: Context Matters

First major difference: DEFINITION

  • map is defined on Series ONLY
  • applymap is defined on DataFrames ONLY
  • apply is defined on BOTH

Second major difference: INPUT ARGUMENT

  • map accepts dicts, Series, or callables
  • applymap and apply accept callables only

Third major difference: BEHAVIOR

  • map is elementwise for Series
  • applymap is elementwise for DataFrames
  • apply also works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depend on the function.

Fourth major difference (the most important one): USE CASE

  • map is meant for mapping values from one domain to another, so it is optimised for performance (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))
  • applymap is good for elementwise transformations across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip))
  • apply is for applying any function that cannot be vectorised (e.g., df['sentences'].apply(nltk.sent_tokenize))
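
The three use cases can be sketched in runnable form as follows. The data is made up, `split_sentences` is a hypothetical stand-in for a real tokenizer such as nltk.sent_tokenize (so the sketch has no extra dependency), and `elementwise` picks DataFrame.map where it exists (pandas 2.1+) and applymap otherwise.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [" x ", "y  ", "  z"],
    "C": [" p", "q ", " r "],
    "sentences": ["Hello there. Bye.", "One.", "A. B. C."],
})

# map: translate values from one domain to another via a dict.
df["A_label"] = df["A"].map({1: "a", 2: "b", 3: "c"})

# Elementwise transformation across several columns.
text_cols = df[["B", "C"]]
elementwise = text_cols.map if hasattr(text_cols, "map") else text_cols.applymap
stripped = elementwise(str.strip)

# apply: an arbitrary Python function per element. split_sentences is a
# hypothetical stand-in for a real tokenizer such as nltk.sent_tokenize.
def split_sentences(text):
    return [part.strip() + "." for part in text.split(".") if part.strip()]

tokens = df["sentences"].apply(split_sentences)

print(df["A_label"].tolist())
print(stripped)
print(tokens.tolist())
```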


Summarising

[Image: summary comparison of map, applymap and apply; not reproduced here]

Footnotes

  1. map, when passed a dictionary/Series, will map elements based on the keys in that dictionary/Series. Missing values will be recorded as NaN in the output.
  2. applymap in more recent versions has been optimised for some operations. You will find applymap slightly faster than apply in some cases. My suggestion is to test them both and use whatever works better.
  3. map is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.
  4. Series.apply returns a scalar for aggregating operations, Series otherwise. Similarly for DataFrame.apply. Note that apply also has fastpaths when called with certain NumPy functions such as mean, sum, etc.

Answer by Alpha

Based on the answer of cs95:

  • map is defined on Series ONLY
  • applymap is defined on DataFrames ONLY
  • apply is defined on BOTH

Some examples:

In [3]: frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [4]: frame
Out[4]:
            b         d         e
Utah    0.129885 -0.475957 -0.207679
Ohio   -2.978331 -1.015918  0.784675
Texas  -0.256689 -0.226366  2.262588
Oregon  2.605526  1.139105 -0.927518

In [5]: myformat=lambda x: f'{x:.2f}'

In [6]: frame.d.map(myformat)
Out[6]:
Utah      -0.48
Ohio      -1.02
Texas     -0.23
Oregon     1.14
Name: d, dtype: object

In [7]: frame.d.apply(myformat)
Out[7]:
Utah      -0.48
Ohio      -1.02
Texas     -0.23
Oregon     1.14
Name: d, dtype: object

In [8]: frame.applymap(myformat)
Out[8]:
            b      d      e
Utah     0.13  -0.48  -0.21
Ohio    -2.98  -1.02   0.78
Texas   -0.26  -0.23   2.26
Oregon   2.61   1.14  -0.93

In [9]: frame.apply(lambda x: x.apply(myformat))
Out[9]:
            b      d      e
Utah     0.13  -0.48  -0.21
Ohio    -2.98  -1.02   0.78
Texas   -0.26  -0.23   2.26
Oregon   2.61   1.14  -0.93


In [10]: myfunc=lambda x: x**2

In [11]: frame.applymap(myfunc)
Out[11]:
            b         d         e
Utah    0.016870  0.226535  0.043131
Ohio    8.870453  1.032089  0.615714
Texas   0.065889  0.051242  5.119305
Oregon  6.788766  1.297560  0.860289

In [12]: frame.apply(myfunc)
Out[12]:
            b         d         e
Utah    0.016870  0.226535  0.043131
Ohio    8.870453  1.032089  0.615714
Texas   0.065889  0.051242  5.119305
Oregon  6.788766  1.297560  0.860289