pandas: Randomly insert NA values in a pandas DataFrame
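Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and the authors, and attribute it to the original authors (not me): StackOverflow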
Original URL: http://stackoverflow.com/questions/39059032/
Question by mitsi
How can I randomly insert np.nan values into a DataFrame?
Let's say I want 10% null values inside my DataFrame.
My data looks like this :
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3),
                  index=['a', 'b', 'c', 'd', 'e'],
                  columns=['one', 'two', 'three'])
        one       two     three
a  0.695132  1.044791 -1.059536
b -1.075105  0.825776  1.899795
c -0.678980  0.051959 -0.691405
d -0.182928  1.455268 -1.032353
e  0.205094  0.714192 -0.938242
Is there an easy way to insert the null values?
Answer by Kodiologist
Here's a way to clear exactly 10% of cells (or rather, as close to 10% as can be achieved with the existing data frame's size).
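import random
import numpy as np

# All (row, column) positions in the frame
ix = [(row, col) for row in range(df.shape[0]) for col in range(df.shape[1])]
# Pick round(10% of all cells) distinct positions and blank each one
for row, col in random.sample(ix, int(round(.1*len(ix)))):
    df.iat[row, col] = np.nan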
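The same exact-count idea can also be written without a Python loop over cell positions. The following is a minimal sketch, not the answer's code: it assumes NumPy 1.17+ for `np.random.default_rng`, and the helper name `insert_exact_nans` is hypothetical. It draws the cell positions as flat indices without replacement and then hands a boolean mask to `DataFrame.mask`:

```python
import numpy as np
import pandas as pd

def insert_exact_nans(df, frac=0.1, seed=None):
    """Hypothetical helper: return a copy of df with round(frac * df.size) random cells set to NaN."""
    rng = np.random.default_rng(seed)
    n_cells = df.size
    k = int(round(frac * n_cells))
    # Choose k distinct flat cell positions (no cell is picked twice)
    flat = rng.choice(n_cells, size=k, replace=False)
    mask = np.zeros(n_cells, dtype=bool)
    mask[flat] = True
    # Reshape the flat mask to (rows, cols); mask() sets True cells to NaN
    return df.mask(mask.reshape(df.shape))

df = pd.DataFrame(np.random.randn(5, 3),
                  index=['a', 'b', 'c', 'd', 'e'],
                  columns=['one', 'two', 'three'])
out = insert_exact_nans(df, frac=0.1, seed=0)
print(out.isna().sum().sum())  # 2, because round(0.1 * 15) == 2
```

Because `mask` returns a new frame, the original `df` is left untouched, which makes it easy to compare several NaN patterns against the same data.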
Here's a way to clear cells independently with a per-cell probability of 10%.
df = df.mask(np.random.random(df.shape) < .1)
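Because `np.random.random` draws from NumPy's global random state, the blanked cells change on every run, and the share of NaNs only approximates 10%. The sketch below uses the same per-cell approach with a seeded `np.random.default_rng` generator. The seed and the 1000-row frame are arbitrary assumptions for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed, so the NaN pattern is reproducible
df = pd.DataFrame(rng.standard_normal((1000, 3)), columns=['one', 'two', 'three'])

# Each cell is blanked independently with probability 0.1
masked = df.mask(rng.random(df.shape) < 0.1)

# The realized share of NaNs is random: close to, but usually not exactly, 0.1
print(masked.isna().to_numpy().mean())
```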
Answer by Jaroslav Bezděk
I think you can easily iterate over the data frame's columns and, for each column, assign a NaN value to the rows selected by the pandas.DataFrame.sample() method.
The code is as follows:
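import numpy as np  # pd.np is deprecated (removed in pandas 2.0); use NumPy directly
for col in df.columns:
    # Blank a random 10% of rows in this column, drawn independently per column
    df.loc[df.sample(frac=0.1).index, col] = np.nan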
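Unlike the per-cell mask, this per-column loop blanks exactly round(0.1 * len(df)) rows in every column. With the 5-row frame from the question that rounds to zero, so no cell is changed. The sketch below makes the loop reproducible by passing `random_state` to `DataFrame.sample`. The seed and the 100-row frame are assumptions for illustration. It uses a single shared `np.random.RandomState` object rather than an int seed. An int seed passed to every call would select the same rows in each column. A shared generator advances its state between calls, so each column gets its own draw:

```python
import numpy as np
import pandas as pd

# One shared generator: its state advances between sample() calls,
# so each column gets its own random rows
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randn(100, 3), columns=['one', 'two', 'three'])

for col in df.columns:
    df.loc[df.sample(frac=0.1, random_state=rng).index, col] = np.nan

print(df.isna().sum())  # 10 NaNs in each column, since round(0.1 * 100) == 10
```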