Using Apply in a Pandas Lambda Function with Multiple if Statements

Question

Asked by abutremutante

I'm trying to infer a classification according to the size of a person in a dataframe like this one:

      Size
1     80000
2     8000000
3     8000000000
...

I want it to look like this:

      Size        Classification
1     80000       <1m
2     8000000     1-10m
3     8000000000  >1bi
...

I understand that the ideal process would be to apply a lambda function like this:

df['Classification']=df['Size'].apply(lambda x: "<1m" if x<1000000 else "1-10m" if 1000000<x<10000000 else ...)

I checked a few posts about multiple ifs in a lambda function (here is an example link), but for some reason that syntax does not work for me with multiple if statements, although it works with a single if condition.

So I tried this "very elegant" solution:

df['Classification']=df['Size'].apply(lambda x: "<1m" if x<1000000 else pass)
df['Classification']=df['Size'].apply(lambda x: "1-10m" if 1000000 < x < 10000000 else pass)
df['Classification']=df['Size'].apply(lambda x: "10-50m" if 10000000 < x < 50000000 else pass)
df['Classification']=df['Size'].apply(lambda x: "50-100m" if 50000000 < x < 100000000 else pass)
df['Classification']=df['Size'].apply(lambda x: "100-500m" if 100000000 < x < 500000000 else pass)
df['Classification']=df['Size'].apply(lambda x: "500m-1bi" if 500000000 < x < 1000000000 else pass)
df['Classification']=df['Size'].apply(lambda x: ">1bi" if 1000000000 < x else pass)

It turns out that "pass" cannot be used inside a lambda function either:

df['Classification']=df['Size'].apply(lambda x: "<1m" if x<1000000 else pass)
SyntaxError: invalid syntax

Any suggestions on the correct syntax for multiple if statements inside a lambda function passed to the apply method in Pandas? Either multi-line or single-line solutions work for me.

Answer 1

Answered by Anton vBR

Here is a small example that you can build upon:

Basically, lambda x: ... is a short one-liner form of a function. What apply really asks for is a function, which you can easily write yourself.

import pandas as pd

# Recreate the dataframe
data = dict(Size=[80000,8000000,800000000])
df = pd.DataFrame(data)

# Create a function that returns desired values
# You only need to check upper bound as the next elif-statement will catch the value
def func(x):
    if x < 1e6:
        return "<1m"
    elif x < 1e7:
        return "1-10m"
    elif x < 5e7:
        return "10-50m"
    # Add further elif branches here for the remaining size classes
    else:
        return 'N/A'

df['Classification'] = df['Size'].apply(func)

print(df)

Returns:

        Size Classification
0      80000            <1m
1    8000000          1-10m
2  800000000            N/A
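For comparison, the single-line form the question was reaching for can be sketched with chained conditional expressions. This is not part of the original answer. It assumes the same sample data as the question, and the final ">50m" label is a placeholder catch-all chosen for illustration. Every conditional expression needs an else branch that yields a value, which is why a statement such as pass cannot appear there.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Size": [80000, 8000000, 8000000000]})

# Each "else" continues with another conditional expression, so the chain
# reads like if / elif / else. Only the upper bound is checked at each step.
# ">50m" is a placeholder catch-all label, not one of the question's classes.
df["Classification"] = df["Size"].apply(
    lambda x: "<1m" if x < 1e6
    else "1-10m" if x < 1e7
    else "10-50m" if x < 5e7
    else ">50m"
)

print(df)
```

With more than two or three branches, the named function above tends to be easier to read and extend.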

Answer 2

Answered by MaxU

You can use the pd.cut function:

bins = [0, 1000000, 10000000, 50000000, ...]
labels = ['<1m','1-10m','10-50m', ...]

df['Classification'] = pd.cut(df['Size'], bins=bins, labels=labels)
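A filled-in version of this idea might look like the sketch below. The bin edges and labels beyond 50m are taken from the ranges listed in the question, not from this answer. The infinite upper edge and right=False are assumptions: right=False makes each bin include its lower edge and exclude its upper edge, which matches the question's x < 1000000 comparison.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Size": [80000, 8000000, 8000000000]})

# Bin edges; there must be exactly one more edge than labels.
# The open-ended last bin uses infinity as its upper edge (an assumption).
bins = [0, 1e6, 1e7, 5e7, 1e8, 5e8, 1e9, float("inf")]
labels = ["<1m", "1-10m", "10-50m", "50-100m", "100-500m", "500m-1bi", ">1bi"]

# right=False gives intervals of the form [lower, upper)
df["Classification"] = pd.cut(df["Size"], bins=bins, labels=labels, right=False)

print(df)
```

The resulting column has a categorical dtype, and values that fall outside every bin (negative sizes here) become NaN.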

Answer 3

Answered by piRSquared

Using NumPy's searchsorted:

import numpy as np

labels = np.array(['<1m', '1-10m', '10-50m', '>50m'])
bins = np.array([1E6, 1E7, 5E7])

# Using assign is my preference as it produces a copy of df with new column
df.assign(Classification=labels[bins.searchsorted(df['Size'].values)])

If you want to add the new column to the existing dataframe instead:

df['Classification'] = labels[bins.searchsorted(df['Size'].values)]

Some Explanation

From the docs for np.searchsorted:

Find indices where elements should be inserted to maintain order.
Find the indices into a sorted array a such that, if the corresponding elements in v were inserted before the indices, the order of a would be preserved.

The labels array is one element longer than bins. When a value is greater than the maximum value in bins, searchsorted returns len(bins) (3 in this example), and indexing labels with that result picks the last label.
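To see which indices searchsorted actually produces, here is a small check using the labels and bins from this answer. The sizes array is an assumption that reuses the sample values from the first answer. A value above the largest bin edge maps to index len(bins), which is the position of the last label.

```python
import numpy as np

labels = np.array(['<1m', '1-10m', '10-50m', '>50m'])
bins = np.array([1e6, 1e7, 5e7])

# Sample sizes (assumed for illustration)
sizes = np.array([80000, 8000000, 800000000])

# For each size, find the position where it would be inserted into bins
idx = bins.searchsorted(sizes)
print(idx)          # [0 1 3]

# Use those positions to look up the matching labels
print(labels[idx])  # ['<1m' '1-10m' '>50m']
```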
