Python pandas -> select by condition in column names

Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow. Original address: http://stackoverflow.com/questions/43218364/

Time: 2020-09-14 03:20:38  Source: igfitidea

python, python-3.x, pandas, data-science

Asked by CezarySzulc

I have a df with column names 'a', 'b', 'c' ... 'z'.

print(my_df.columns)
Index(['a', 'b', 'c', ... 'y', 'z'],
  dtype='object', name=0)

I have functions that determine which columns should be displayed. For example:

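start = con_start()
stop = con_stop()
# Compare the column labels against both bounds; each comparison needs its own parentheses.
print((my_df.columns >= start) & (my_df.columns <= stop))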

My result is:

[False False ... False False False False  True  True
True  True False False]

My goal is to display the DataFrame with only the columns that satisfy my condition. If start = 'a' and stop = 'b', I want:

0                                      a              b         
index1       index2                                                  
New York     New York           0.000000       0.000000          
California   Los Angeles   207066.666667  214466.666667     
Illinois     Chicago       138400.000000  143633.333333     
Pennsylvania Philadelphia   53000.000000   53633.333333      
Arizona      Phoenix       111833.333333  114366.666667 

Accepted answer by piRSquared

I want to make this robust and with as few assumptions as possible.

Option 1
Use iloc with array slicing.
Assumptions:

  • my_df.columns.is_unique evaluates to True
  • columns are already in order


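# Integer positions of the start and stop labels (valid because the labels are unique).
start = df.columns.get_loc(con_start())
stop = df.columns.get_loc(con_stop())

# Positional slices exclude the end position, so add 1 to keep the stop column.
df.iloc[:, start:stop + 1]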
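A small self-contained run of option 1 may help. It assumes a toy frame with unique, alphabetically ordered columns, and the bodies of con_start() and con_stop() are hypothetical stand-ins (the question does not show them) that return 'b' and 'd'.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's functions (not shown in the question).
def con_start():
    return "b"

def con_stop():
    return "d"

# Toy frame whose columns are unique and already in order.
df = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], columns=list("abcde"))

# get_loc maps each label to its integer position: 'b' -> 1, 'd' -> 3.
start = df.columns.get_loc(con_start())
stop = df.columns.get_loc(con_stop())

# iloc[:, 1:4] takes positions 1, 2 and 3.
subset = df.iloc[:, start:stop + 1]
print(list(subset.columns))  # ['b', 'c', 'd']
```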

Option 2
Use loc with boolean slicing (a boolean mask).
Assumptions:

  • column values are comparable


start = con_start()
stop = con_stop()

c = df.columns.values
# Boolean mask: True for every column label between start and stop (inclusive).
m = (start <= c) & (stop >= c)

df.loc[:, m]
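To illustrate why option 2 needs fewer assumptions, here is a sketch on a toy frame whose columns are deliberately out of order; the literal bounds 'a' and 'b' stand in for con_start() and con_stop().

```python
import pandas as pd

# Columns deliberately out of order: the mask does not rely on ordering.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["c", "a", "d", "b"])

start, stop = "a", "b"  # stand-ins for con_start() / con_stop()

c = df.columns.values
# Element-wise comparison of each label with both bounds.
m = (start <= c) & (stop >= c)  # [False, True, False, True]

subset = df.loc[:, m]
print(list(subset.columns))  # ['a', 'b']
```

With this column order, option 1 would return columns a, d and b, because get_loc gives positions 1 and 3 and everything in between is included.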

Answer by Scott Boston

You can use slicing to achieve this with .loc:

 df.loc[:,'a':'b']
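One detail worth knowing: label-based slices with .loc include both end points, unlike positional slices. A minimal sketch on a toy frame, with start and stop as stand-ins for the asker's con_start() and con_stop():

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "c"])
start, stop = "a", "b"  # stand-ins for con_start() / con_stop()

# Label slices keep BOTH end labels, so no "+ 1" is needed here.
subset = df.loc[:, start:stop]
print(list(subset.columns))  # ['a', 'b']
```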

Answer by Petr Matuska

If your conditions are as simple as the ones in your example, there is no need for any additional function; just filter directly, e.g.:

sweet_and_red_fruit = fruit[(fruit["sweet"] == 1) & (fruit["colour"] == "red")]
print(sweet_and_red_fruit)

Or, if you just want to print it:

print(fruit[(fruit["sweet"] == 1) & (fruit["colour"] == "red")])
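Note that this pattern selects rows whose values satisfy the conditions, rather than columns by name. A runnable sketch follows; the sample data is invented for illustration, and only the column names "sweet" and "colour" come from the answer.

```python
import pandas as pd

# Invented sample data; only the column names "sweet" and "colour" come from the answer.
fruit = pd.DataFrame({
    "name": ["apple", "lemon", "cherry", "cranberry"],
    "sweet": [1, 0, 1, 0],
    "colour": ["red", "yellow", "red", "red"],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than ==.
sweet_and_red_fruit = fruit[(fruit["sweet"] == 1) & (fruit["colour"] == "red")]
print(sweet_and_red_fruit)
```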

Answer by acidtobi

Generate a list of columns to display:

cols = [x for x in my_df.columns if start <= x <= stop]

Use only these columns in your DataFrame:

my_df[cols]
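A self-contained version of this approach, assuming a toy frame and literal bounds standing in for con_start() and con_stop():

```python
import pandas as pd

my_df = pd.DataFrame([[1, 2, 3, 4]], columns=["a", "b", "c", "d"])
start, stop = "b", "c"  # stand-ins for con_start() / con_stop()

# The chained comparison keeps labels in the closed interval [start, stop].
cols = [x for x in my_df.columns if start <= x <= stop]
print(cols)  # ['b', 'c']
print(my_df[cols])
```

Like option 2, this does not depend on the column order; the selected columns appear in the order they have in my_df.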

Answer by Binyamin Even

Assuming result is your [True/False] array and letters is [a...z]:

res=[letters[i] for i,r in enumerate(result) if r]
new_df=df[res]
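A runnable sketch of this answer, assuming a toy 26-column frame and building result the way the question does, with 'a' and 'b' as the bounds:

```python
import string

import pandas as pd

letters = list(string.ascii_lowercase)  # ['a', 'b', ..., 'z']
df = pd.DataFrame([list(range(26))], columns=letters)

# Plays the role of the question's boolean array: True only for 'a' and 'b'.
result = (df.columns >= "a") & (df.columns <= "b")

# Keep the letter at every position where the mask is True.
res = [letters[i] for i, r in enumerate(result) if r]
new_df = df[res]
print(res)  # ['a', 'b']
```

This relies on letters lining up position by position with df.columns; passing the mask straight to df.loc[:, result] selects the same columns without that extra list.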