pandas: fill missing values within a group

Question

Asked by Marius

I have some data from an experiment, and within each trial there are some single values, surrounded by NA's, that I want to fill out to the entire trial:

import numpy as np
import pandas as pd

df = pd.DataFrame({'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'cs_name': [np.nan, 'A1', np.nan, np.nan, np.nan, np.nan, 'B2',
                np.nan, 'A1', np.nan, np.nan, np.nan]})
Out[177]: 
   cs_name  trial
0      NaN      1
1       A1      1
2      NaN      1
3      NaN      1
4      NaN      2
5      NaN      2
6       B2      2
7      NaN      2
8       A1      3
9      NaN      3
10     NaN      3
11     NaN      3

I'm able to fill these values within the whole trial by using both bfill() and ffill(), but I'm wondering if there is a better way to achieve this.

df['cs_name'] = df.groupby('trial')['cs_name'].ffill()
df['cs_name'] = df.groupby('trial')['cs_name'].bfill()

Expected output:

   cs_name  trial
0       A1      1
1       A1      1
2       A1      1
3       A1      1
4       B2      2
5       B2      2
6       B2      2
7       B2      2
8       A1      3
9       A1      3
10      A1      3
11      A1      3
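
For reference, the two-step fill described above can be written as a single pass. This is only a minimal, self-contained sketch of the same ffill/bfill idea; the variable name `filled` is an illustrative choice, not something from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'cs_name': [np.nan, 'A1', np.nan, np.nan, np.nan, np.nan, 'B2',
                np.nan, 'A1', np.nan, np.nan, np.nan],
})

# Forward-fill, then back-fill, inside each trial so that values
# never leak across trial boundaries.
filled = df.groupby('trial')['cs_name'].transform(lambda s: s.ffill().bfill())
```

Assigning `filled` back to `df['cs_name']` should give the expected output shown above.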

Answer 1

Answered by Andy Hayden

An alternative approach is to use first_valid_index and a transform:

In [11]: g = df.groupby('trial')

In [12]: g['cs_name'].transform(lambda s: s.loc[s.first_valid_index()])
Out[12]: 
0     A1
1     A1
2     A1
3     A1
4     B2
5     B2
6     B2
7     B2
8     A1
9     A1
10    A1
11    A1
Name: cs_name, dtype: object

This ought to be more efficient than using ffill followed by bfill...

And use this to change the cs_name column:

df['cs_name'] = g['cs_name'].transform(lambda s: s.loc[s.first_valid_index()])
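
One thing to be aware of: the two approaches agree only when each group holds a single distinct value, as in the question. The sketch below uses hypothetical data (not from the question) in which trial 2 contains two different values, to show where they diverge. The names `first_valid` and `nearest` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: trial 2 has two distinct non-null values.
df = pd.DataFrame({
    'trial': [1, 1, 1, 2, 2, 2],
    'cs_name': [np.nan, 'A1', np.nan, 'B1', np.nan, 'B2'],
})
g = df.groupby('trial')['cs_name']

# Broadcasts the first non-null value of each group to every row.
first_valid = g.transform(lambda s: s.loc[s.first_valid_index()])

# Fills each gap from the previous value, then any leading gap from the next.
nearest = g.transform(lambda s: s.ffill().bfill())
```

Here `first_valid` overwrites the 'B2' in trial 2 with 'B1', while `nearest` keeps it, so which one is appropriate depends on whether a group can legitimately contain more than one value.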

Note: I think it would be a nice enhancement to have a method that grabs the first non-null object in pandas. In numpy it's an open request, and I don't think there is currently a method for it (I could be wrong!)...
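
On the groupby side, the built-in `first` aggregation may already cover this case: as documented, `GroupBy.first` returns the first non-null entry of each group, and passing its name to `transform` broadcasts that value back to every row. A minimal sketch on the question's data, with `filled` as an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'cs_name': [np.nan, 'A1', np.nan, np.nan, np.nan, np.nan, 'B2',
                np.nan, 'A1', np.nan, np.nan, np.nan],
})

# 'first' skips nulls within each group; transform aligns the
# per-group result back onto the original index.
filled = df.groupby('trial')['cs_name'].transform('first')
```

I would expect a group that is entirely NaN to stay missing here rather than raise, but that is worth verifying on your pandas version.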

Answer 2

Answered by Federico De Cillia

If you want to avoid the error that appears when some groups contain only NaN, you could do the following (note that I changed the df so that the group with trial=1 contains only NaN):

import numpy as np
import pandas as pd

df = pd.DataFrame({'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1],
    'cs_name': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'B2', np.nan,
                'A3', np.nan, np.nan, np.nan, np.nan, np.nan]})

g = df.groupby('trial')

# Preview: placeholder string for all-NaN groups, first valid value otherwise
g['cs_name'].transform(lambda s: 'No values to aggregate' if
    pd.isnull(s).all() else s.loc[s.first_valid_index()])

# Write the result back to the column
df['cs_name'] = g['cs_name'].transform(lambda s: 'No values to aggregate' if
    pd.isnull(s).all() else s.loc[s.first_valid_index()])

This way you insert 'No values to aggregate' (or whatever you want) when the program finds only NaN in a particular group, instead of getting an error.
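
The same guard can be pulled into a small named function, which makes the fallback value easy to change. This is just a sketch: `first_valid_or` and its `default` parameter are hypothetical names, not part of pandas. It relies on `first_valid_index()` returning None when a Series has no valid values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'trial': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1],
    'cs_name': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'B2', np.nan,
                'A3', np.nan, np.nan, np.nan, np.nan, np.nan],
})

def first_valid_or(s, default=np.nan):
    """Hypothetical helper: first non-null value of s, or default if none."""
    idx = s.first_valid_index()  # None when every value in s is null
    return default if idx is None else s.loc[idx]

g = df.groupby('trial')['cs_name']

# Leave all-NaN groups as NaN ...
filled = g.transform(first_valid_or)

# ... or label them explicitly, as in the answer above.
labelled = g.transform(lambda s: first_valid_or(s, 'No values to aggregate'))
```

Keeping NaN as the default may be preferable when later steps should still recognise the group as missing; a sentinel string is easier to spot when inspecting the data by eye.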

Hope this helps :)

Federico
