Randomly insert NA values in a pandas DataFrame

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/39059032/

Date: 2020-09-14 01:51:23  Source: igfitidea

Tags: python, pandas, numpy, missing-data

Asked by mitsi

How can I randomly insert np.nan values into a DataFrame? Say I want 10% of the values in my DataFrame to be null.

My data looks like this:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3),
                  index=['a', 'b', 'c', 'd', 'e'],
                  columns=['one', 'two', 'three'])

        one       two     three
a  0.695132  1.044791 -1.059536
b -1.075105  0.825776  1.899795
c -0.678980  0.051959 -0.691405
d -0.182928  1.455268 -1.032353
e  0.205094  0.714192 -0.938242

Is there an easy way to insert the null values?

Answered by Kodiologist

Here's a way to clear exactly 10% of cells (or rather, as close to 10% as can be achieved with the existing data frame's size).

import random

# Enumerate every (row, column) position, then blank a random 10% of them.
ix = [(row, col) for row in range(df.shape[0]) for col in range(df.shape[1])]
for row, col in random.sample(ix, int(round(.1*len(ix)))):
    df.iat[row, col] = np.nan  # modifies df in place
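
For larger frames, the same "exact count" idea can be done without a Python loop. The following is only a sketch under assumptions, not part of the original answer: the helper name insert_nan_exact and its seed parameter are made up for illustration, and it assumes every column is numeric so the values can be held in a float array.

```python
import numpy as np
import pandas as pd


def insert_nan_exact(df, frac=0.1, seed=None):
    """Return a copy of df with round(frac * df.size) cells set to NaN.

    Hypothetical helper; assumes all columns are numeric.
    """
    rng = np.random.default_rng(seed)
    n_missing = int(round(frac * df.size))
    # Pick distinct flat positions, then convert them to (row, col) pairs.
    flat = rng.choice(df.size, size=n_missing, replace=False)
    rows, cols = np.unravel_index(flat, df.shape)
    values = df.to_numpy(dtype=float, copy=True)
    values[rows, cols] = np.nan
    return pd.DataFrame(values, index=df.index, columns=df.columns)


df = pd.DataFrame(np.random.default_rng(0).standard_normal((5, 3)),
                  index=['a', 'b', 'c', 'd', 'e'],
                  columns=['one', 'two', 'three'])
out = insert_nan_exact(df, frac=0.1, seed=42)
print(out)
```

Unlike the loop, this version returns a new frame and leaves the input untouched, which makes it easier to compare the data before and after.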

Here's a way to clear cells independently with a per-cell probability of 10%.

df = df.mask(np.random.random(df.shape) < .1)
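
Because each cell is decided independently, the number of blanked cells varies from run to run and is only about 10% on average. A small sketch of that behavior, assuming a seeded NumPy generator (the seed and the 1000-row frame are arbitrary choices for illustration, not from the question):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
df = pd.DataFrame(rng.standard_normal((1000, 3)),
                  columns=['one', 'two', 'three'])

# True where a cell should be blanked; every cell is an independent draw.
drop = rng.random(df.shape) < 0.1
masked = df.mask(drop)

# The realized share of NaN cells is random and only close to 0.1.
print(masked.isna().to_numpy().mean())
```

Keeping the boolean array around (here called drop) also records which cells were removed, which helps if the goal is to test an imputation method against the known original values.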

Answered by Jaroslav Bezděk

I think you can simply iterate over the data frame's columns and, for each column, assign NaN to the rows selected by the pandas.DataFrame.sample() method.

The code is as follows.

for col in df.columns:
    # Draw a fresh 10% sample of rows for each column and blank those cells.
    df.loc[df.sample(frac=0.1).index, col] = np.nan
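
To see what this per-column approach does, here is a rough sketch on a larger frame. The 100-row frame and the seeded RandomState are assumptions made for illustration. On a frame as small as the five-row example in the question, 10% of the rows is less than one row, so sample() may select nothing at all. The sketch also assumes the row labels are unique, since df.loc would otherwise blank every row sharing a sampled label.

```python
import numpy as np
import pandas as pd

rs = np.random.RandomState(0)  # arbitrary seed, only for repeatability
df = pd.DataFrame(rs.randn(100, 3), columns=['one', 'two', 'three'])

for col in df.columns:
    # Each column draws its own 10% of the row labels.
    rows = df.sample(frac=0.1, random_state=rs).index
    df.loc[rows, col] = np.nan

# Number of NaN cells per column.
print(df.isna().sum())
```

The missing share is fixed per column here, which differs from both earlier answers: the first fixes the count over the whole frame, and the second leaves the count random.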