Python 获取列名,其中值是熊猫数据框中的内容

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/14734695/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-18 12:16:45  来源:igfitidea点击:

Get column name where value is something in pandas dataframe

pythondataframepandas

提问by leroygr

I'm trying to find, at each timestamp, the column name in a dataframe for which the value matches with the one in a timeseries at the same timestamp.

我试图在每个时间戳处查找数据框中的列名,其值与同一时间戳的时间序列中的值相匹配。

Here is my dataframe:

这是我的数据框:

>>> df
                            col5        col4        col3        col2        col1
1979-01-01 00:00:00  1181.220328  912.154923  648.848635  390.986156  138.185861
1979-01-01 06:00:00  1190.724461  920.767974  657.099560  399.395338  147.761352
1979-01-01 12:00:00  1193.414510  918.121482  648.558837  384.632475  126.254342
1979-01-01 18:00:00  1171.670276  897.585930  629.201469  366.652033  109.545607
1979-01-02 00:00:00  1168.892579  900.375126  638.377583  382.584568  132.998706

>>> df.to_dict()
{'col4': {<Timestamp: 1979-01-01 06:00:00>: 920.76797370744271, <Timestamp: 1979-01-01 00:00:00>: 912.15492332839756, <Timestamp: 1979-01-01 18:00:00>: 897.58592995700656, <Timestamp: 1979-01-01 12:00:00>: 918.1214819496729}, 'col5': {<Timestamp: 1979-01-01 06:00:00>: 1190.7244605667831, <Timestamp: 1979-01-01 00:00:00>: 1181.2203275146587, <Timestamp: 1979-01-01 18:00:00>: 1171.6702763228691, <Timestamp: 1979-01-01 12:00:00>: 1193.4145103184442}, 'col2': {<Timestamp: 1979-01-01 06:00:00>: 399.39533771666561, <Timestamp: 1979-01-01 00:00:00>: 390.98615646597591, <Timestamp: 1979-01-01 18:00:00>: 366.65203285812231, <Timestamp: 1979-01-01 12:00:00>: 384.63247469269874}, 'col3': {<Timestamp: 1979-01-01 06:00:00>: 657.09956023625466, <Timestamp: 1979-01-01 00:00:00>: 648.84863460462293, <Timestamp: 1979-01-01 18:00:00>: 629.20146872682449, <Timestamp: 1979-01-01 12:00:00>: 648.55883747413225}, 'col1': {<Timestamp: 1979-01-01 06:00:00>: 147.7613518219286, <Timestamp: 1979-01-01 00:00:00>: 138.18586102094068, <Timestamp: 1979-01-01 18:00:00>: 109.54560722575859, <Timestamp: 1979-01-01 12:00:00>: 126.25434189361377}}

And the time series with values I want to match at each timestamp:

以及我想在每个时间戳匹配的值的时间序列:

>>> ts
1979-01-01 00:00:00    1181.220328
1979-01-01 06:00:00    657.099560
1979-01-01 12:00:00    126.254342
1979-01-01 18:00:00    109.545607
Freq: 6H

>>> ts.to_dict()
{<Timestamp: 1979-01-01 06:00:00>: 657.09956023625466, <Timestamp: 1979-01-01 00:00:00>: 1181.2203275146587, <Timestamp: 1979-01-01 18:00:00>: 109.54560722575859, <Timestamp: 1979-01-01 12:00:00>: 126.25434189361377}

Then the result would be:

那么结果将是:

>>> df_result
                             value  Column
1979-01-01 00:00:00    1181.220328  col5
1979-01-01 06:00:00    657.099560   col3
1979-01-01 12:00:00    126.254342   col1
1979-01-01 18:00:00    109.545607   col1

I hope my question is clear enough. Anyone has an idea how to get df_result?

我希望我的问题足够清楚。有人知道如何获得 df_result 吗?

Thanks

谢谢

Greg

格雷格

采纳答案by Andy Hayden

Here is one, perhaps inelegant, way to do it:

这是一种可能不优雅的方法:

df_result = pd.DataFrame(ts, columns=['value'])

Set up a function which grabs the column name which contains the value (from ts):

设置一个函数来获取包含值的列名(来自ts):

def get_col_name(row):    
    b = (df.ix[row.name] == row['value'])
    return b.index[b.argmax()]

for each row, test which elements equal the value, and extract column name of a True.

对于每一行,测试哪些元素等于值,并提取 True 的列名。

And applyit (row-wise):

而且apply它(按行):

In [3]: df_result.apply(get_col_name, axis=1)
Out[3]: 
1979-01-01 00:00:00    col5
1979-01-01 06:00:00    col3
1979-01-01 12:00:00    col1
1979-01-01 18:00:00    col1

i.e. use df_result['Column'] = df_result.apply(get_col_name, axis=1).

即使用 df_result['Column'] = df_result.apply(get_col_name, axis=1)

.

.

Note: there is quite a lot going on in get_col_nameso perhaps it warrants some further explanation:

注意:有很多事情发生,get_col_name所以也许需要进一步解释:

In [4]: row = df_result.irow(0) # an example row to pass to get_col_name

In [5]: row
Out[5]: 
value    1181.220328
Name: 1979-01-01 00:00:00

In [6]: row.name # use to get rows of df
Out[6]: <Timestamp: 1979-01-01 00:00:00>

In [7]: df.ix[row.name]
Out[7]: 
col5    1181.220328
col4     912.154923
col3     648.848635
col2     390.986156
col1     138.185861
Name: 1979-01-01 00:00:00

In [8]: b = (df.ix[row.name] == row['value'])
        #checks whether each elements equal row['value'] = 1181.220328  

In [9]: b
Out[9]: 
col5     True
col4    False
col3    False
col2    False
col1    False
Name: 1979-01-01 00:00:00

In [10]: b.argmax() # index of a True value
Out[10]: 0

In [11]: b.index[b.argmax()] # the index value (column name)
Out[11]: 'col5'

It might be there is more efficient way to do this...

可能有更有效的方法来做到这一点......

回答by Mike

Following on from Andy's detailed answer, the solution to selecting the column name of the highest value per row can be simplified to a single line:

继安迪的详细回答之后,选择每行最高值的列名的解决方案可以简化为一行:

df['column'] = df.apply(lambda x: df.columns[x.argmax()], axis = 1)