Original question: http://stackoverflow.com/questions/32037802/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Find target values in pandas dataframe
Asked by Nic
I have a multilevel dataframe df. The columns are the different "objects"
I analyze. As row index, I have a case ID lc and a time t.
For each case lc, I need to find the time t (ideally interpolated, but the
closest value is good enough) at which each object reaches a target value.
This target value is a function of the given object's value at time t==0.
import pandas as pd
print(pd.__version__)
0.16.2
Dummy data set example:
data = {1: {(1014, 0.0): 20.25,
            (1014, 0.0991): 19.08,
            (1014, 0.1991): 18.43,
            (1014, 0.2991): 19.03,
            (1014, 0.3991): 18.71,
            (1015, 0.0): 20.22,
            (1015, 0.0991): 19.3,
            (1015, 0.1991): 18.68,
            (1015, 0.2991): 18.22,
            (1015, 0.3991): 17.84,
            (1016, 0.0): 21.75,
            (1016, 0.0991): 19.97,
            (1016, 0.1991): 19.65,
            (1016, 0.2991): 19.29,
            (1016, 0.3991): 18.94
            },
        2: {(1014, 0.0): 29.11,
            (1014, 0.0991): 28.68,
            (1014, 0.1991): 28.27,
            (1014, 0.2991): 27.46,
            (1014, 0.3991): 26.96,
            (1015, 0.0): 29.22,
            (1015, 0.0991): 28.64,
            (1015, 0.1991): 28.18,
            (1015, 0.2991): 27.74,
            (1015, 0.3991): 27.25,
            (1016, 0.0): 29.17,
            (1016, 0.0991): 28.68,
            (1016, 0.1991): 28.17,
            (1016, 0.2991): 27.68,
            (1016, 0.3991): 27.18
            },
        3: {(1014, 0.0): 22.01,
            (1014, 0.0991): 21.5,
            (1014, 0.1991): 21.18,
            (1014, 0.2991): 20.58,
            (1014, 0.3991): 20.21,
            (1015, 0.0): 21.81,
            (1015, 0.0991): 21.46,
            (1015, 0.1991): 21.11,
            (1015, 0.2991): 20.78,
            (1015, 0.3991): 20.42,
            (1016, 0.0): 21.82,
            (1016, 0.0991): 21.49,
            (1016, 0.1991): 21.11,
            (1016, 0.2991): 20.75,
            (1016, 0.3991): 20.37
            }}
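# sort_index() replaces the original .sort(), which was deprecated and later removed from pandas
df = pd.DataFrame(data).sort_index()
df.index.names = ['case', 't']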
The dataframe then looks like this:
                 1      2      3
case t
1014 0.0000  20.25  29.11  22.01
     0.0991  19.08  28.68  21.50
     0.1991  18.43  28.27  21.18
     0.2991  19.03  27.46  20.58
     0.3991  18.71  26.96  20.21
1015 0.0000  20.22  29.22  21.81
     0.0991  19.30  28.64  21.46
     0.1991  18.68  28.18  21.11
     0.2991  18.22  27.74  20.78
     0.3991  17.84  27.25  20.42
1016 0.0000  21.75  29.17  21.82
     0.0991  19.97  28.68  21.49
     0.1991  19.65  28.17  21.11
     0.2991  19.29  27.68  20.75
     0.3991  18.94  27.18  20.37
Target values are a function of the values at time t==0: the value at t==0 multiplied by a factor k.
Typically, this would be k=0.5 for a half-time period. For the current sample, we will take k=0.926.
Since the values are sorted, it is fine to take the first row of each case.
targets = df.groupby(level='case').first() * 0.926
print(targets)
             1         2         3
case
1014  18.75150  26.95586  20.38126
1015  18.72372  27.05772  20.19606
1016  20.14050  27.01142  20.20532
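As a side note, the targets can also be computed without relying on the sort order, by selecting the t==0 rows explicitly. The snippet below is only a minimal sketch: it assumes every case has a row at exactly t == 0.0, and it uses a small stand-in dataframe (two cases, two objects, three time steps taken from the sample above) rather than the full data.

```python
import pandas as pd

# Small stand-in for the question's df (values copied from the sample data)
index = pd.MultiIndex.from_product([[1014, 1015], [0.0, 0.0991, 0.1991]],
                                   names=['case', 't'])
df = pd.DataFrame({1: [20.25, 19.08, 18.43, 20.22, 19.30, 18.68],
                   2: [29.11, 28.68, 28.27, 29.22, 28.64, 28.18]},
                  index=index)

k = 0.926  # factor used in the question

# xs selects the rows where level 't' equals 0.0 and drops that level,
# leaving one row per case; this assumes each case has an exact t == 0.0 row
targets = df.xs(0.0, level='t') * k
print(targets)
```

With sorted data this gives the same values as groupby(level='case').first() * k, but it does not depend on t==0 being the first row of each case.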
Now, how can I simply build the following dataframe, which shows the
time t at which each object reaches the target value calculated above?
           1       2       3
case
1014  0.3991  0.3991  0.2991
1015  0.1991  0.3991  0.3991
1016  0.0991  0.3991  0.3991
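Since the question mentions that an interpolated time would be ideal, here is a rough sketch of what that could look like for a single object in a single case. The helper first_crossing_time is a hypothetical name, not part of pandas or NumPy, and it assumes the values start above the target and that "reaching the target" means the first time the curve falls to it; linear interpolation between the two surrounding samples is also an assumption.

```python
import numpy as np

def first_crossing_time(t, y, target):
    """Linearly interpolated time at which y first falls to `target`.

    Hypothetical helper: assumes y starts above the target; returns NaN
    if the target is never reached.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    below = np.nonzero(y <= target)[0]   # positions at or below the target
    if below.size == 0:
        return np.nan                    # target never reached
    i = below[0]
    if i == 0:
        return t[0]                      # already at/below target at the start
    # Fraction of the way from sample i-1 to sample i where y equals target
    frac = (y[i - 1] - target) / (y[i - 1] - y[i])
    return t[i - 1] + frac * (t[i] - t[i - 1])

# Case 1014, object 1 from the sample data
t = [0.0, 0.0991, 0.1991, 0.2991, 0.3991]
y = [20.25, 19.08, 18.43, 19.03, 18.71]
t_cross = first_crossing_time(t, y, target=0.926 * y[0])
print(t_cross)
```

Note that for this particular series the first crossing lies between t=0.0991 and t=0.1991, whereas the closest sampled value is at t=0.3991, because the values dip below the target, rebound, and then drop again. The two definitions only agree when the curve is monotonic.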
Accepted answer by CT Zhu
This is somewhat of a hack; let's see if there are better solutions:
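import numpy as np

# Give targets a zero 't' column so the subtraction below leaves t unchanged
targets['t'] = 0
# Distance of every value from its case's target; the index is now 'case' only
df2 = df.reset_index().set_index('case') - targets
# Within each case, flag the row(s) whose absolute distance is the smallest
df3 = df2.groupby(df2.index).transform(lambda x: x.abs() == np.min(x.abs()))
# For each object, keep the t values of the flagged rows
df4 = pd.DataFrame({'1': df2.t[df3[1]],
                    '2': df2.t[df3[2]],
                    '3': df2.t[df3[3]]})
print(df4)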
           1       2       3
case
1014  0.3991  0.3991  0.3991
1015  0.1991  0.3991  0.3991
1016  0.0991  0.3991  0.3991
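For comparison, the same closest-value lookup can be written without adding a 't' column to targets. This is just one possible sketch, not the answer's method: closest_times is a hypothetical helper name, the data is a two-object subset of the sample (cases 1014 and 1015), and ties are resolved by taking the earliest t.

```python
import pandas as pd

def closest_times(df, k):
    """For each case and object, the sampled t whose value is closest to
    k * (value at the first row of that case). Hypothetical helper."""
    # Target broadcast to every row of its case (first row assumed to be t == 0)
    target_rows = df.groupby(level='case').transform('first') * k
    dist = (df - target_rows).abs()
    rows = {}
    for case, grp in dist.groupby(level='case'):
        t = grp.index.get_level_values('t').to_numpy()
        # argmin along the rows gives, per object, the position of the closest value
        rows[case] = pd.Series(t[grp.to_numpy().argmin(axis=0)], index=df.columns)
    out = pd.DataFrame(rows).T
    out.index.name = 'case'
    return out

# Subset of the sample data: cases 1014 and 1015, objects 1 and 2
index = pd.MultiIndex.from_product(
    [[1014, 1015], [0.0, 0.0991, 0.1991, 0.2991, 0.3991]], names=['case', 't'])
df = pd.DataFrame(
    {1: [20.25, 19.08, 18.43, 19.03, 18.71, 20.22, 19.30, 18.68, 18.22, 17.84],
     2: [29.11, 28.68, 28.27, 27.46, 26.96, 29.22, 28.64, 28.18, 27.74, 27.25]},
    index=index)

out = closest_times(df, k=0.926)
print(out)
```

On the full sample, the closest-value rule gives 0.3991 for case 1014, object 3, since 20.21 is nearer to the target 20.38126 than 20.58 is. That matches the output above rather than the 0.2991 shown in the question's expected table.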

