pandas: Python error "cannot do a non empty take from an empty axes"

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/45138917/

Date: 2020-09-14 04:01:16  Source: igfitidea

Python error cannot do a non empty take from an empty axes

Tags: python, python-2.7, pandas

Asked by ELI

I have a pandas DataFrame with more than 400 thousand rows, and I want to calculate the interquartile range for each row, but my code produced the following error:

cannot do a non empty take from an empty axes

My code:

def calIQR(x):
    x=x.dropna()
    return (np.percentile(x,75),np.percentile(x,25))

df["count"]=df.iloc[:,2:64].apply(calIQR,axis=1)

I am running Python 2.7.13

I searched online but still had no idea why this error occurred.

Columns 2 to 64 of the dataset basically look like this: (image: dataset)

Each row contains some NaN values, but I am sure that no row is all NaN.

Accepted answer by jezrael

I think the problem is that some row has all NaN values in columns 2 to 63, so x = x.dropna() returns an empty Series for that row.
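
One way to check this diagnosis is to count the rows that are entirely NaN inside the slice passed to apply. The sketch below uses a made-up toy frame (not the asker's data) and the slice 2:4 in place of 2:64. Note that a row can hold values in other columns and still be all NaN within the slice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, standing in for the real 400k-row frame.
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [np.nan, np.nan, np.nan, np.nan],   # all NaN everywhere
     [9.0, 10.0, np.nan, np.nan]]        # has values, but not in columns 2:4
)

# Same kind of positional slice that is handed to apply (2:64 in the question).
block = df.iloc[:, 2:4]

# True for every row whose values in the slice are all NaN.
all_nan = block.isna().all(axis=1)

print(all_nan.sum())                     # how many rows become empty after dropna()
print(all_nan[all_nan].index.tolist())   # labels of those rows
```

If the count is greater than zero on the real data, those rows are the ones for which dropna() leaves nothing to compute a percentile from.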

So you need to add dropna(how='all') after iloc:

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.random((5,5)))
df.loc[3, [3,4]] = np.nan
df.loc[2] = np.nan
print (df)
         0         1         2         3         4
0  0.543405  0.278369  0.424518  0.844776  0.004719
1  0.121569  0.670749  0.825853  0.136707  0.575093
2       NaN       NaN       NaN       NaN       NaN
3  0.978624  0.811683  0.171941       NaN       NaN
4  0.431704  0.940030  0.817649  0.336112  0.175410


def calIQR(x):
    x = x.dropna()
    return (np.percentile(x,75),np.percentile(x,25))

df["count"]=df.iloc[:,2:4].dropna(how='all').apply(calIQR,axis=1)
print (df)
          0         1         2         3         4  \
0  0.543405  0.278369  0.424518  0.844776  0.004719   
1  0.121569  0.670749  0.825853  0.136707  0.575093   
2       NaN       NaN       NaN       NaN       NaN   
3  0.978624  0.811683  0.171941       NaN       NaN   
4  0.431704  0.940030  0.817649  0.336112  0.175410   

                              count  
0  (0.739711496927, 0.529582226142)  
1    (0.65356621375, 0.30899313104)  
2                               NaN  
3  (0.171941012733, 0.171941012733)  
4  (0.697265021613, 0.456496307285)  
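
As an alternative to filtering the rows first, the function itself can guard against an empty input. This is only a sketch: cal_iqr_safe is a hypothetical name, the data is made up, and returning a pair of NaNs for an empty row is an assumption about what is wanted.

```python
import numpy as np
import pandas as pd

def cal_iqr_safe(x):
    # Hypothetical variant of calIQR: skip np.percentile when nothing is left.
    values = x.dropna().values
    if values.size == 0:
        return (np.nan, np.nan)
    return (np.percentile(values, 75), np.percentile(values, 25))

# Made-up example data; row 1 is all NaN in columns 2:4.
df = pd.DataFrame(
    [[0.0, 0.0, 1.0, 3.0],
     [0.0, 0.0, np.nan, np.nan],
     [0.0, 0.0, 2.0, np.nan]]
)

result = df.iloc[:, 2:4].apply(cal_iqr_safe, axis=1)
print(result)
```

With this guard every row gets a tuple, so the all-NaN rows show up as (nan, nan) rather than as a missing value.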

Or use Series.quantile, which skips NaN values and returns NaN for an all-NaN row:

def calIQR(x):
    return (x.quantile(.75),x.quantile(.25))

# with real data, change 2:4 to 2:64
df["count"]=df.iloc[:,2:4].apply(calIQR,axis=1)
print (df)
          0         1         2         3         4  \
0  0.543405  0.278369  0.424518  0.844776  0.004719   
1  0.121569  0.670749  0.825853  0.136707  0.575093   
2       NaN       NaN       NaN       NaN       NaN   
3  0.978624  0.811683  0.171941       NaN       NaN   
4  0.431704  0.940030  0.817649  0.336112  0.175410   

                                       count  
0   (0.7397114969272109, 0.5295822261418257)  
1    (0.653566213750024, 0.3089931310399766)  
2                                 (nan, nan)  
3   (0.1719410127325942, 0.1719410127325942)  
4  (0.6972650216127702, 0.45649630728485585)
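
With several hundred thousand rows, a row-wise apply may be slow. A possible alternative, not part of the original answer, is to call DataFrame.quantile once with axis=1 and subtract the two quantiles to get the interquartile range itself. The frame and the column names q25, q75 and iqr below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: columns 2:7 play the role of columns 2:64.
df = pd.DataFrame(
    [[0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
     [0.0, 0.0, 1.0, 3.0, np.nan, np.nan, np.nan],
     [0.0, 0.0, np.nan, np.nan, np.nan, np.nan, np.nan]]
)

block = df.iloc[:, 2:7]

# quantile(axis=1) works row by row and ignores NaN. The result has one row
# per requested quantile, so transpose it to get one row per original row.
q = block.quantile([0.25, 0.75], axis=1).T

df["q25"] = q[0.25]
df["q75"] = q[0.75]
df["iqr"] = q[0.75] - q[0.25]   # NaN where the whole slice is NaN
print(df[["q25", "q75", "iqr"]])
```

This keeps the quartiles in separate numeric columns instead of a column of tuples, which is usually easier to filter and aggregate afterwards.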