pandas: Find target values in a pandas dataframe

Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/32037802/

Time: 2020-09-13 23:46:46  Source: igfitidea

Find target values in pandas dataframe

python pandas

Asked by Nic

I have a multilevel dataframe df. The columns are the different "objects" I analyze. The row index consists of a case ID lc and a time t.

I need to find, for each case lc, the time t (ideally interpolated, but the closest value is fine) at which each object reached a target value.

This target value is a function of the given object's value at time t==0.

import pandas as pd
print(pd.__version__)

0.16.2

Dummy data set example:

data = {1: {(1014, 0.0): 20.25,
     (1014, 0.0991): 19.08,
     (1014, 0.1991): 18.43,
     (1014, 0.2991): 19.03,
     (1014, 0.3991): 18.71,
     (1015, 0.0): 20.22,
     (1015, 0.0991): 19.3,
     (1015, 0.1991): 18.68,
     (1015, 0.2991): 18.22,
     (1015, 0.3991): 17.84,
     (1016, 0.0): 21.75,
     (1016, 0.0991): 19.97,
     (1016, 0.1991): 19.65,
     (1016, 0.2991): 19.29,
     (1016, 0.3991): 18.94
    },
 2: {(1014, 0.0): 29.11,
     (1014, 0.0991): 28.68,
     (1014, 0.1991): 28.27,
     (1014, 0.2991): 27.46,
     (1014, 0.3991): 26.96,
     (1015, 0.0): 29.22,
     (1015, 0.0991): 28.64,
     (1015, 0.1991): 28.18,
     (1015, 0.2991): 27.74,
     (1015, 0.3991): 27.25,
     (1016, 0.0): 29.17,
     (1016, 0.0991): 28.68,
     (1016, 0.1991): 28.17,
     (1016, 0.2991): 27.68,
     (1016, 0.3991): 27.18
    },
 3: {(1014, 0.0): 22.01,
     (1014, 0.0991): 21.5,
     (1014, 0.1991): 21.18,
     (1014, 0.2991): 20.58,
     (1014, 0.3991): 20.21,
     (1015, 0.0): 21.81,
     (1015, 0.0991): 21.46,
     (1015, 0.1991): 21.11,
     (1015, 0.2991): 20.78,
     (1015, 0.3991): 20.42,
     (1016, 0.0): 21.82,
     (1016, 0.0991): 21.49,
     (1016, 0.1991): 21.11,
     (1016, 0.2991): 20.75,
     (1016, 0.3991): 20.37
    }}

df = pd.DataFrame(data).sort_index()  # DataFrame.sort() was removed in later pandas versions
df.index.names=['case', 't']

The dataframe thus looks like:

                 1      2      3
case t                          
1014 0.0000  20.25  29.11  22.01
     0.0991  19.08  28.68  21.50
     0.1991  18.43  28.27  21.18
     0.2991  19.03  27.46  20.58
     0.3991  18.71  26.96  20.21
1015 0.0000  20.22  29.22  21.81
     0.0991  19.30  28.64  21.46
     0.1991  18.68  28.18  21.11
     0.2991  18.22  27.74  20.78
     0.3991  17.84  27.25  20.42
1016 0.0000  21.75  29.17  21.82
     0.0991  19.97  28.68  21.49
     0.1991  19.65  28.17  21.11
     0.2991  19.29  27.68  20.75
     0.3991  18.94  27.18  20.37

Target values are a function of the values at time t==0. Typically, this would be k=0.5 for a half-time period. For the current sample, we will take k=0.926.

Since the values are sorted by time, it is fine to take the first row of each case.

targets = df.groupby(level='case').first() * 0.926
print(targets)

             1         2         3
case                              
1014  18.75150  26.95586  20.38126
1015  18.72372  27.05772  20.19606
1016  20.14050  27.01142  20.20532
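
As a side note, the t==0 rows can also be selected explicitly rather than relying on sort order. The following is only a minimal sketch on a reduced copy of the data (two cases, two objects, two time steps); the names targets_first and targets_xs are made up for this illustration, and it assumes every case has a row at exactly t == 0.0.

```python
import pandas as pd

# Reduced copy of the sample data (assumption: shortened for illustration).
idx = pd.MultiIndex.from_product(
    [[1014, 1015], [0.0, 0.0991]], names=["case", "t"]
)
df = pd.DataFrame(
    {1: [20.25, 19.08, 20.22, 19.30],
     2: [29.11, 28.68, 29.22, 28.64]},
    index=idx,
)

k = 0.926

# Approach from the question: first row of each case (relies on sorting by t).
targets_first = df.groupby(level="case").first() * k

# Alternative: take the cross-section at t == 0.0 explicitly.
targets_xs = df.xs(0.0, level="t") * k

print(targets_xs)
```

Both approaches should give the same table here; the cross-section version would raise a KeyError if a case had no sample at t == 0.0, which may or may not be the behavior you want.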

Now, how could I simply build the following dataframe, which shows the time t at which each object reaches the target value calculated above?

             1         2         3
case                              
1014    0.3991    0.3991    0.2991
1015    0.1991    0.3991    0.3991
1016    0.0991    0.3991    0.3991

Accepted answer by CT Zhu

This is somewhat of a hack; let's see if there are better solutions:

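In [36]:
import numpy as np  # needed for np.min below
targets['t']=0  # so that subtracting targets leaves the t column unchanged

In [37]:
# subtract each case's targets from every row of that case
df2 = df.reset_index().set_index('case') - targets

In [38]:
# flag, per case and column, the row with the smallest absolute difference
df3 = df2.groupby(df2.index).transform(lambda x: x.abs()==np.min(x.abs()))

In [39]:
# pick the time t of the flagged row for each object
df4 = pd.DataFrame({'1': df2.t[df3[1]],
                    '2': df2.t[df3[2]],
                    '3': df2.t[df3[3]]})

print(df4)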

           1       2       3
case                        
1014  0.3991  0.3991  0.3991
1015  0.1991  0.3991  0.3991
1016  0.0991  0.3991  0.3991
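
For comparison, here is a rough sketch of two other ways to get these times, written against a recent pandas rather than 0.16.2. It is not part of the answer above. It uses only object 1 for two cases from the question's data, and the helper first_crossing and the variable names nearest and interp are invented for this illustration. The interpolated variant assumes the samples are sorted by t and that the signal drops toward the target, so it reports the first crossing, which can differ from the closest sample when the signal is not monotonic.

```python
import numpy as np
import pandas as pd

# Object 1 for two cases, copied from the question's sample data.
idx = pd.MultiIndex.from_product(
    [[1014, 1015], [0.0, 0.0991, 0.1991, 0.2991, 0.3991]],
    names=["case", "t"],
)
df = pd.DataFrame(
    {1: [20.25, 19.08, 18.43, 19.03, 18.71,
         20.22, 19.30, 18.68, 18.22, 17.84]},
    index=idx,
)

k = 0.926
targets = df.xs(0.0, level="t") * k  # value at t == 0, scaled by k


def first_crossing(t, y, target):
    """Linearly interpolated time at which y first drops to target.

    Hypothetical helper; returns NaN if y never reaches the target.
    """
    below = np.flatnonzero(y <= target)
    if below.size == 0:
        return np.nan
    i = below[0]
    if i == 0:
        return t[0]
    # Linear interpolation between samples i-1 and i.
    return t[i - 1] + (target - y[i - 1]) * (t[i] - t[i - 1]) / (y[i] - y[i - 1])


nearest_rows, interp_rows = {}, {}
for case, group in df.groupby(level="case"):
    g = group.droplevel("case")  # index is now just t
    # Closest sample: t label of the smallest absolute difference per column.
    nearest_rows[case] = (g - targets.loc[case]).abs().idxmin()
    # Interpolated first crossing per column.
    t = g.index.to_numpy()
    interp_rows[case] = {
        col: first_crossing(t, g[col].to_numpy(), targets.loc[case, col])
        for col in g.columns
    }

nearest = pd.DataFrame(nearest_rows).T.rename_axis("case")
interp = pd.DataFrame(interp_rows).T.rename_axis("case")

print(nearest)
print(interp)
```

The closest-sample table should agree with the first column of the output above, while the interpolated times are generally earlier for case 1014, because object 1 dips below its target before rising again.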