Python 获取列名,其中值是熊猫数据框中的内容
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/14734695/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Get column name where value is something in pandas dataframe
提问by leroygr
I'm trying to find, at each timestamp, the column name in a dataframe for which the value matches with the one in a timeseries at the same timestamp.
我试图在每个时间戳处查找数据框中的列名,其值与同一时间戳的时间序列中的值相匹配。
Here is my dataframe:
这是我的数据框:
>>> df
col5 col4 col3 col2 col1
1979-01-01 00:00:00 1181.220328 912.154923 648.848635 390.986156 138.185861
1979-01-01 06:00:00 1190.724461 920.767974 657.099560 399.395338 147.761352
1979-01-01 12:00:00 1193.414510 918.121482 648.558837 384.632475 126.254342
1979-01-01 18:00:00 1171.670276 897.585930 629.201469 366.652033 109.545607
1979-01-02 00:00:00 1168.892579 900.375126 638.377583 382.584568 132.998706
>>> df.to_dict()
{'col4': {<Timestamp: 1979-01-01 06:00:00>: 920.76797370744271, <Timestamp: 1979-01-01 00:00:00>: 912.15492332839756, <Timestamp: 1979-01-01 18:00:00>: 897.58592995700656, <Timestamp: 1979-01-01 12:00:00>: 918.1214819496729}, 'col5': {<Timestamp: 1979-01-01 06:00:00>: 1190.7244605667831, <Timestamp: 1979-01-01 00:00:00>: 1181.2203275146587, <Timestamp: 1979-01-01 18:00:00>: 1171.6702763228691, <Timestamp: 1979-01-01 12:00:00>: 1193.4145103184442}, 'col2': {<Timestamp: 1979-01-01 06:00:00>: 399.39533771666561, <Timestamp: 1979-01-01 00:00:00>: 390.98615646597591, <Timestamp: 1979-01-01 18:00:00>: 366.65203285812231, <Timestamp: 1979-01-01 12:00:00>: 384.63247469269874}, 'col3': {<Timestamp: 1979-01-01 06:00:00>: 657.09956023625466, <Timestamp: 1979-01-01 00:00:00>: 648.84863460462293, <Timestamp: 1979-01-01 18:00:00>: 629.20146872682449, <Timestamp: 1979-01-01 12:00:00>: 648.55883747413225}, 'col1': {<Timestamp: 1979-01-01 06:00:00>: 147.7613518219286, <Timestamp: 1979-01-01 00:00:00>: 138.18586102094068, <Timestamp: 1979-01-01 18:00:00>: 109.54560722575859, <Timestamp: 1979-01-01 12:00:00>: 126.25434189361377}}
And the time series with values I want to match at each timestamp:
以及我想在每个时间戳匹配的值的时间序列:
>>> ts
1979-01-01 00:00:00 1181.220328
1979-01-01 06:00:00 657.099560
1979-01-01 12:00:00 126.254342
1979-01-01 18:00:00 109.545607
Freq: 6H
>>> ts.to_dict()
{<Timestamp: 1979-01-01 06:00:00>: 657.09956023625466, <Timestamp: 1979-01-01 00:00:00>: 1181.2203275146587, <Timestamp: 1979-01-01 18:00:00>: 109.54560722575859, <Timestamp: 1979-01-01 12:00:00>: 126.25434189361377}
Then the result would be:
那么结果将是:
>>> df_result
value Column
1979-01-01 00:00:00 1181.220328 col5
1979-01-01 06:00:00 657.099560 col3
1979-01-01 12:00:00 126.254342 col1
1979-01-01 18:00:00 109.545607 col1
I hope my question is clear enough. Anyone has an idea how to get df_result?
我希望我的问题足够清楚。有人知道如何获得 df_result 吗?
Thanks
谢谢
Greg
格雷格
采纳答案by Andy Hayden
Here is one, perhaps inelegant, way to do it:
这是一种可能不优雅的方法:
df_result = pd.DataFrame(ts, columns=['value'])
Set up a function which grabs the column name which contains the value (from ts):
设置一个函数来获取包含值的列名(来自ts):
def get_col_name(row):
b = (df.ix[row.name] == row['value'])
return b.index[b.argmax()]
for each row, test which elements equal the value, and extract column name of a True.
对于每一行,测试哪些元素等于值,并提取 True 的列名。
And applyit (row-wise):
而且apply它(按行):
In [3]: df_result.apply(get_col_name, axis=1)
Out[3]:
1979-01-01 00:00:00 col5
1979-01-01 06:00:00 col3
1979-01-01 12:00:00 col1
1979-01-01 18:00:00 col1
i.e. use df_result['Column'] = df_result.apply(get_col_name, axis=1).
即使用 df_result['Column'] = df_result.apply(get_col_name, axis=1)。
.
.
Note: there is quite a lot going on in get_col_nameso perhaps it warrants some further explanation:
注意:有很多事情发生,get_col_name所以也许需要进一步解释:
In [4]: row = df_result.irow(0) # an example row to pass to get_col_name
In [5]: row
Out[5]:
value 1181.220328
Name: 1979-01-01 00:00:00
In [6]: row.name # use to get rows of df
Out[6]: <Timestamp: 1979-01-01 00:00:00>
In [7]: df.ix[row.name]
Out[7]:
col5 1181.220328
col4 912.154923
col3 648.848635
col2 390.986156
col1 138.185861
Name: 1979-01-01 00:00:00
In [8]: b = (df.ix[row.name] == row['value'])
#checks whether each elements equal row['value'] = 1181.220328
In [9]: b
Out[9]:
col5 True
col4 False
col3 False
col2 False
col1 False
Name: 1979-01-01 00:00:00
In [10]: b.argmax() # index of a True value
Out[10]: 0
In [11]: b.index[b.argmax()] # the index value (column name)
Out[11]: 'col5'
It might be there is more efficient way to do this...
可能有更有效的方法来做到这一点......
回答by Mike
Following on from Andy's detailed answer, the solution to selecting the column name of the highest value per row can be simplified to a single line:
继安迪的详细回答之后,选择每行最高值的列名的解决方案可以简化为一行:
df['column'] = df.apply(lambda x: df.columns[x.argmax()], axis = 1)

