Original source: http://stackoverflow.com/questions/43218364/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Python pandas -> select by condition in columns name
Asked by CezarySzulc
I have a DataFrame with column names 'a', 'b', 'c' ... 'z'.
print(my_df.columns)
Index(['a', 'b', 'c', ... 'y', 'z'],
dtype='object', name=0)
I have functions that determine which columns should be displayed. For example:
start = con_start()
stop = con_stop()
print((my_df.columns >= start) & (my_df.columns <= stop))
My result is:
[False False ... False False False False True True
True True False False]
My goal is to display the DataFrame with only the columns that satisfy my condition. If start = 'a' and stop = 'b', I want to get:
0 a b
index1 index2
New York New York 0.000000 0.000000
California Los Angeles 207066.666667 214466.666667
Illinois Chicago 138400.000000 143633.333333
Pennsylvania Philadelphia 53000.000000 53633.333333
Arizona Phoenix 111833.333333 114366.666667
Accepted answer by piRSquared
I want to make this robust and with as few assumptions as possible.
Option 1
Use iloc with array slicing.
Assumptions:
- my_df.columns.is_unique evaluates to True
- columns are already in order
start = df.columns.get_loc(con_start())
stop = df.columns.get_loc(con_stop())
df.iloc[:, start:stop + 1]
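A minimal runnable sketch of option 1. The sample data and the bodies of con_start() and con_stop() are hypothetical stand-ins, since the question does not show them:

```python
import pandas as pd

# Hypothetical sample data; the real frame has columns 'a' ... 'z'.
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=list("abcd"),
)

# Hypothetical stand-ins for the asker's functions.
def con_start():
    return "b"

def con_stop():
    return "c"

# With unique column labels, get_loc returns an integer position.
start = df.columns.get_loc(con_start())
stop = df.columns.get_loc(con_stop())

# Positional slices exclude the end position, hence the + 1.
selected = df.iloc[:, start:stop + 1]
print(selected.columns.tolist())
```

If the column labels were not unique, get_loc could return a slice or a boolean mask instead of an integer, which is why the uniqueness assumption matters here.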
Option 2
Use loc with boolean slicing.
Assumptions:
- column values are comparable
start = con_start()
stop = con_stop()
c = df.columns.values
m = (start <= c) & (stop >= c)
df.loc[:, m]
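A small sketch of option 2 on made-up data whose columns are deliberately out of order, to illustrate that a boolean mask does not depend on column order. It compares against df.columns directly (as the question does) rather than df.columns.values; the sample values and the start/stop letters are assumptions:

```python
import pandas as pd

# Hypothetical data with unsorted column labels.
df = pd.DataFrame([[1, 2, 3, 4]], columns=list("dacb"))

# Assumed results of con_start() and con_stop().
start, stop = "b", "c"

# Element-wise comparisons on the column Index give boolean arrays.
mask = (df.columns >= start) & (df.columns <= stop)

# loc keeps the columns where the mask is True, in their existing order.
selected = df.loc[:, mask]
print(selected.columns.tolist())
```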
Answer by Scott Boston
You can use slicing to achieve this with .loc:
df.loc[:,'a':'b']
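For illustration, here is the same label slice on a small made-up frame. Unlike positional slicing, a label slice with .loc includes both endpoints:

```python
import pandas as pd

# Hypothetical sample frame with ordered, unique column labels.
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=list("abcd"))

# Label-based slice: both 'b' and 'c' are included in the result.
selected = df.loc[:, "b":"c"]
print(selected.columns.tolist())
```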
Answer by Petr Matuska
If your conditions are about as complex as the ones in your example, there is no need for any additional function; just filter directly, e.g.:
sweet_and_red_fruit = fruit[(fruit["sweet"] == 1) & (fruit["colour"] == "red")]
print(sweet_and_red_fruit)
Or, if you just want to print:
print(fruit[(fruit["sweet"] == 1) & (fruit["colour"] == "red")])
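A self-contained sketch of this kind of filtering, assuming a made-up fruit table (the answer does not show its data). Note that a mask like this selects rows by their values, not columns by their names:

```python
import pandas as pd

# Hypothetical fruit table; column names follow the answer's snippet.
fruit = pd.DataFrame(
    {
        "name": ["apple", "banana", "chilli"],
        "sweet": [1, 1, 0],
        "colour": ["red", "yellow", "red"],
    }
)

# Combine two row-wise conditions; each needs its own parentheses.
sweet_and_red_fruit = fruit[(fruit["sweet"] == 1) & (fruit["colour"] == "red")]
print(sweet_and_red_fruit)
```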
Answer by acidtobi
Generate a list of columns to display:
cols = [x for x in my_df.columns if start <= x <= stop]
Use only these columns in your DataFrame:
my_df[cols]
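Put together, the two steps might look like this on a small made-up frame. The start and stop values are assumed here; in the question they come from con_start() and con_stop():

```python
import pandas as pd

# Hypothetical sample frame.
my_df = pd.DataFrame([[1, 2, 3, 4]], columns=list("abcd"))

# Assumed bounds for illustration.
start, stop = "b", "c"

# Chained comparison keeps the labels that fall within [start, stop].
cols = [x for x in my_df.columns if start <= x <= stop]

# Indexing with a list of labels returns only those columns.
selected = my_df[cols]
print(cols)
```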
Answer by Binyamin Even
Assuming result is your [true/false] array and that letters is [a...z]:
res=[letters[i] for i,r in enumerate(result) if r]
new_df=df[res]
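A runnable sketch of the same idea, assuming a one-row frame with columns 'a' to 'z' and a mask built the way the question builds it. The bounds 'w' and 'y' are arbitrary choices for illustration:

```python
import string

import pandas as pd

# letters is ['a', ..., 'z']; the frame is hypothetical sample data.
letters = list(string.ascii_lowercase)
df = pd.DataFrame([list(range(26))], columns=letters)

# Boolean array like the one printed in the question (bounds assumed).
result = (df.columns >= "w") & (df.columns <= "y")

# Keep the letter at every position where the mask is True.
res = [letters[i] for i, r in enumerate(result) if r]
new_df = df[res]
print(res)
```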