Filter columns of a dataframe between a range using a for loop?
Original question: http://stackoverflow.com/questions/41281442/
Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Dheeraj
I have a DataFrame like this:
     Total_Production  Utilization_rate  Avg_Count
0            6.503907         96.027778  26.194017
9            6.171308         95.638889  31.500943
18           6.144897         95.986111  27.494776
27           6.056882         95.916667  27.525495
36           6.107343        105.541667  21.500208
45           2.139576         96.166667  27.480307
54           6.161222         96.486111  27.498256
63           1.034555         56.388889  27.568885
72           5.021524         91.069444  30.931702
81           5.831919         96.277778  28.284872
90           2.689860         62.486111  18.691440
99           5.227672         95.555556  31.441761
108          1.465271         95.541667  30.064098
The range limits are stored in two Series.
Highest range: Total Production 7.744379, Utilization rate 104.534796, Avg Count 29.691733
Lowest range: Total Production 3.880623, Utilization rate 64.315015, Avg Count 22.652148
What is the best way to filter the data so that each column stays within its range? Can I do it with a for loop that iterates over the rows?
Answered by musically_ut
You can use the & operator to limit the ranges of individual columns:
df[
(3.880623 < df['Total_Production']) & (df['Total_Production'] < 7.744379) &
(64.315015 < df['Utilization_rate']) & (df['Utilization_rate'] < 104.534796) &
(22.652148 < df['Avg_Count']) & (df['Avg_Count'] < 29.691733)
]
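Since the question says the limits already live in two Series, the same mask can be built from them rather than from hard-coded numbers. The following is only a sketch: the names low and high are made up here, the DataFrame holds just five of the question's rows, and it assumes both Series are indexed by the exact column names of df. The loop runs over columns, not rows, so each step is still a vectorized comparison.

```python
import pandas as pd

# Five rows taken from the question's DataFrame.
df = pd.DataFrame(
    {
        "Total_Production": [6.503907, 6.171308, 6.107343, 2.139576, 5.831919],
        "Utilization_rate": [96.027778, 95.638889, 105.541667, 96.166667, 96.277778],
        "Avg_Count": [26.194017, 31.500943, 21.500208, 27.480307, 28.284872],
    },
    index=[0, 9, 36, 45, 81],
)

# Assumption: both limit Series are indexed by the DataFrame's column names.
low = pd.Series(
    {"Total_Production": 3.880623, "Utilization_rate": 64.315015, "Avg_Count": 22.652148}
)
high = pd.Series(
    {"Total_Production": 7.744379, "Utilization_rate": 104.534796, "Avg_Count": 29.691733}
)

# Start with every row selected, then AND in one range check per column.
mask = pd.Series(True, index=df.index)
for col in df.columns:
    mask &= (low[col] < df[col]) & (df[col] < high[col])

filtered = df[mask]

# Loop-free equivalent: compare all columns at once, then require every column to pass.
vectorized = df[(df.gt(low, axis="columns") & df.lt(high, axis="columns")).all(axis=1)]

print(filtered)
```

Both forms keep a row only when all three columns fall strictly inside their limits.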
Answered by Zero
You could use query:
In [233]: df.query('3.880623 < Total_Production < 7.744379 and 64.315015 < Utilization_rate < 104.534796 and 22.652148 < Avg_Count < 29.691733')
Out[233]:
    Total_Production  Utilization_rate  Avg_Count
0           6.503907         96.027778  26.194017
18          6.144897         95.986111  27.494776
27          6.056882         95.916667  27.525495
54          6.161222         96.486111  27.498256
81          5.831919         96.277778  28.284872
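If the limits are kept in two Series, the query string does not have to be typed by hand. The sketch below assembles it from hypothetical low and high Series (names invented for illustration, assumed to be indexed by the column names) on a small sample of the question's rows. It relies on the column names being valid Python identifiers, which holds for this DataFrame.

```python
import pandas as pd

# Four rows taken from the question's DataFrame.
df = pd.DataFrame(
    {
        "Total_Production": [6.503907, 6.171308, 1.034555, 5.831919],
        "Utilization_rate": [96.027778, 95.638889, 56.388889, 96.277778],
        "Avg_Count": [26.194017, 31.500943, 27.568885, 28.284872],
    },
    index=[0, 9, 63, 81],
)

# Assumption: both limit Series are indexed by the DataFrame's column names.
low = pd.Series(
    {"Total_Production": 3.880623, "Utilization_rate": 64.315015, "Avg_Count": 22.652148}
)
high = pd.Series(
    {"Total_Production": 7.744379, "Utilization_rate": 104.534796, "Avg_Count": 29.691733}
)

# Build one "low < column < high" clause per column and join the clauses with "and".
expr = " and ".join(
    f"{float(low[col])} < {col} < {float(high[col])}" for col in df.columns
)

filtered = df.query(expr)
print(expr)
print(filtered)
```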
Answered by MYGz
def foo():
    # Boolean mask built from strict comparisons
    df[
        (3.880623 < df['Total_Production']) & (df['Total_Production'] < 7.744379) &
        (64.315015 < df['Utilization_rate']) & (df['Utilization_rate'] < 104.534796) &
        (22.652148 < df['Avg_Count']) & (df['Avg_Count'] < 29.691733)
    ]

def foo1():
    # Series.between includes both endpoints by default
    df[df.Total_Production.between(left=3.880623, right=7.744379) &
       df.Utilization_rate.between(left=64.315015, right=104.534796) &
       df.Avg_Count.between(left=22.652148, right=29.691733)]

def foo2():
    # DataFrame.query with chained comparisons
    df.query("3.880623 < Total_Production < 7.744379 and "
             "64.315015 < Utilization_rate < 104.534796 and "
             "22.652148 < Avg_Count < 29.691733")

%timeit foo()
%timeit foo1()
%timeit foo2()
Output:
100 loops, best of 3: 2.95 ms per loop
100 loops, best of 3: 2.92 ms per loop
100 loops, best of 3: 3.67 ms per loop
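One caveat when comparing the three variants: Series.between treats both endpoints as inside the range by default, while the & and query versions use strict comparisons. For the data in the question no value sits exactly on a limit, so all three should select the same rows, but the results can differ on boundary values. A small illustration with a made-up Series:

```python
import pandas as pd

# Made-up values: the first and last sit exactly on the Total_Production limits.
s = pd.Series([3.880623, 5.0, 7.744379])

# between() counts the endpoints as inside the range by default.
inclusive = s.between(3.880623, 7.744379)

# Strict comparisons, as in the & and query versions, exclude the endpoints.
strict = (3.880623 < s) & (s < 7.744379)

print(inclusive.tolist())
print(strict.tolist())
```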