Source: Stack Overflow, http://stackoverflow.com/questions/43757977/
License: CC BY-SA 4.0. You are free to use and share this content, but you must keep the same license and attribute it to the original authors.

Replacing values greater than a number in pandas dataframe

Tags: python, database, pandas

Asked by Zanam

I have a large DataFrame that looks like this:

df1['A'].iloc[1:3]
2017-01-01 02:00:00    [33, 34, 39]
2017-01-01 03:00:00    [3, 43, 9]

I want to replace each element greater than 9 with 11.

So the desired output for the example above is:

df1['A'].iloc[1:3]
2017-01-01 02:00:00    [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]

Edit:

My actual DataFrame has about 20,000 rows, and each row holds a list of 2,000 elements.

Is there a way to use the numpy.minimum function on each row? I assume it would be faster than the list comprehension method.

Accepted answer by jezrael

You can use apply with a list comprehension:

df1['A'] = df1['A'].apply(lambda x: [y if y <= 9 else 11 for y in x])
print (df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]
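
For anyone who wants to try this, here is a minimal self-contained sketch. The way df1 is built is an assumption (the question does not show how the frame was created), and replace_above is a hypothetical helper name used only for illustration.

```python
import pandas as pd

# Assumed reconstruction of the question's data: each cell holds a list.
index = pd.to_datetime(["2017-01-01 02:00:00", "2017-01-01 03:00:00"])
df1 = pd.DataFrame({"A": [[33, 34, 39], [3, 43, 9]]}, index=index)


def replace_above(values, threshold=9, new_value=11):
    """Return a new list with every element above threshold replaced."""
    return [new_value if v > threshold else v for v in values]


# apply() calls the helper once per row (once per list).
df1["A"] = df1["A"].apply(replace_above)
print(df1["A"].tolist())
```

The helper does the same thing as the lambda above; naming it just makes the threshold and the replacement value easier to change.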

A faster solution is to first convert to a NumPy array and then use numpy.where:

import numpy as np
a = np.array(df1['A'].values.tolist())
print (a)
[[33 34 39]
 [ 3 43  9]]

df1['A'] = np.where(a > 9, 11, a).tolist()
print (df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]
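
The same steps as one runnable sketch. It assumes every row's list has the same length, which stacking into a 2-D array requires; the construction of df1 is again an assumption, not taken from the question.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data.
index = pd.to_datetime(["2017-01-01 02:00:00", "2017-01-01 03:00:00"])
df1 = pd.DataFrame({"A": [[33, 34, 39], [3, 43, 9]]}, index=index)

# Stack the per-row lists into one 2-D array (needs equal-length lists).
a = np.array(df1["A"].tolist())

# Vectorised replacement: 11 where the element exceeds 9, else the original.
replaced = np.where(a > 9, 11, a)

# Convert back to one Python list per row and store it in the column.
df1["A"] = replaced.tolist()
print(df1["A"].tolist())
```

If the data can stay in a 2-D array, skipping the conversion back to lists is likely to save time on a frame of the size described in the question.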

Answer by Edouard Cuny

Very simply: df[df > 9] = 11
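
Note that this boolean-mask assignment applies when each cell holds a single number, not a list as in the question. A minimal sketch with made-up data (the frame and its column names are assumptions), which also shows DataFrame.mask as a variant that returns a copy:

```python
import pandas as pd

# Hypothetical numeric frame; every cell is a scalar, not a list.
df = pd.DataFrame({"a": [33, 3, 9], "b": [34, 43, 10]})

# Non-mutating variant: mask() returns a copy with matching cells replaced.
masked = df.mask(df > 9, 11)

# In-place variant from the answer: boolean-mask assignment.
df[df > 9] = 11
print(df)
```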

Answer by D.Griffiths

You can use NumPy indexing, accessed through the .values attribute.

df['col'].values[df['col'].values > x] = y

Here any value greater than x is replaced with y.

So for the example in the question:

df1['A'].values[df1['A'] > 9] = 11
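
Like the previous answer, this works for a column of scalar numbers rather than lists. Writing through .values also relies on pandas handing back a writable view of the column, which may not hold in every version or configuration (for example, with Copy-on-Write enabled the array is read-only). The sketch below shows label-based alternatives that do not depend on that; the data and the names x and y are made up for illustration.

```python
import pandas as pd

# Hypothetical column of scalars.
df = pd.DataFrame({"col": [33, 3, 43, 9]})
x, y = 9, 11

# Non-mutating: Series.mask returns a new Series with matches replaced.
replaced = df["col"].mask(df["col"] > x, y)

# In place: select the rows with a boolean mask and assign through .loc.
df.loc[df["col"] > x, "col"] = y
print(df["col"].tolist())
```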

Answer by CFW

I came here looking for a way to replace each element larger than h with 1 and every other element with 0, which has a simple solution:

df = (df > h) * 1

(This does not solve the OP's question, since all values <= h are replaced by 0.)
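
A small sketch of this thresholding on made-up data (the frame and the value of h are assumptions). It uses astype(int), which should give the same 0/1 values as multiplying by 1 while stating the intent more directly:

```python
import pandas as pd

# Hypothetical numeric frame and threshold.
df = pd.DataFrame({"a": [1, 5, 10], "b": [7, 2, 8]})
h = 5

# The comparison yields booleans; astype(int) maps True -> 1, False -> 0.
binary = (df > h).astype(int)
print(binary)
```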