Selecting columns with pandas dataframe.query()

Question

Asked by javadba

The documentation on dataframe.query() is very terse: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.query.html. I was also unable to find examples of projections (selecting a subset of columns) by web search.

So I tried simply providing the column names: that gave a syntax error. The same happened when typing select followed by the column names. So how can this be done?

Answer 1

Accepted answer by Max Power

After playing around with this for a while and reading through the source code for DataFrame.query, I can't figure out a way to do it.

If it's not impossible, it is apparently at least strongly discouraged. When this question came up on GitHub, prolific pandas dev/maintainer jreback suggested using df.eval() for selecting columns and df.query() for filtering on rows.

UPDATE:

javadba points out that the return value of eval is not a DataFrame. For example, to flesh out jreback's example a bit more:

df.eval('A')

returns a pandas Series, but

df.eval(['A', 'B'])

does not return a DataFrame; it returns a list (of pandas Series).

So ultimately the best way to keep the flexibility to filter on both rows and columns seems to be iloc/loc, e.g.

df.loc[0:4, ['A', 'C']]

Output:

          A         C
0 -0.497163 -0.046484
1  1.331614  0.741711
2  1.046903 -2.511548
3  0.314644 -0.526187
4 -0.061883 -0.615978
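
To make the difference between these return types concrete, here is a minimal sketch. The frame and its random values are made up for illustration and are not the data from the answer above; only the calls themselves (eval, double-bracket selection, loc) come from the discussion.

```python
import numpy as np
import pandas as pd

# Hypothetical data, only for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 4)), columns=['A', 'B', 'C', 'D'])

# A single column name passed to eval() gives back a Series.
col_a = df.eval('A')
print(type(col_a).__name__)

# A list of labels in double brackets gives back a DataFrame.
both = df[['A', 'B']]
print(type(both).__name__)

# .loc takes rows and columns in one call. Label slices are inclusive,
# so 0:4 selects five rows on the default integer index.
first_rows = df.loc[0:4, ['A', 'C']]

# A boolean mask works in the row position as well.
masked = df.loc[df['A'] > 0, ['A', 'C']]
print(first_rows.shape, masked.shape)
```

Note that .loc slices by label, while .iloc slices by position and excludes the end point, so df.iloc[0:4] would return four rows rather than five.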

Answer 2

Answered by Scott Boston

DataFrame.query is more like the where clause in a SQL statement than the select part.

import pandas as pd
import numpy as np
np.random.seed(123)
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])

To select a column or columns you can use the following:

df['A'] or df.loc[:,'A']

or

df[['A','B']] or df.loc[:,['A','B']]

To use the .query method you do something like

df.query('A > B'), which returns all the rows where the value in column A is greater than the value in column B.

                   A         B         C         D
2000-01-03  1.265936 -0.866740 -0.678886 -0.094709
2000-01-04  1.491390 -0.638902 -0.443982 -0.434351
2000-01-05  2.205930  2.186786  1.004054  0.386186
2000-01-08 -0.140069 -0.861755 -0.255619 -2.798589

In my opinion this is more readable than boolean index selection with

df[df['A'] > df['B']]
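
As a rough check of the where/select analogy, the sketch below rebuilds the same seeded frame, confirms that the two row filters agree, and then adds the "select" step as an ordinary column list. The variable names by_query, by_mask and subset are mine, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same construction as above, repeated so the snippet runs on its own.
np.random.seed(123)
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])

# "WHERE A > B" written two ways.
by_query = df.query('A > B')
by_mask = df[df['A'] > df['B']]
print(by_query.equals(by_mask))

# "SELECT A, B ... WHERE A > B": filter rows first, then pick columns.
subset = df.query('A > B')[['A', 'B']]
print(subset.shape)
```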

Answer 3

Answered by javadba

pandasql

https://pypi.python.org/pypi/pandasql/0.1.0

Here is an example from the following blog: http://blog.yhat.com/posts/pandasql-sql-for-pandas-dataframes.html. The inputs are two DataFrames, meat and births, and this approach gives the projections, filtering, aggregation and sorting expected from SQL.

@maxpower did mention that this package is buggy, so let's see. At least the code from the blog, shown below, works fine.

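from pandasql import sqldf, load_meat, load_births

# Helper that runs a SQL query against the DataFrames in the global namespace.
pysqldf = lambda q: sqldf(q, globals())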

q  = """
SELECT
  m.date
  , m.beef
  , b.births
FROM
  meat m
LEFT JOIN
  births b
    ON m.date = b.date
WHERE
    m.date > '1974-12-31';
"""

meat = load_meat()
births = load_births()

df = pysqldf(q)

The output is a pandas DataFrame, as desired.

It is working great for my particular use case (evaluating US crimes):

odf = pysqldf("select %s from df where sweapons > 10 order by sweapons desc limit 10" %scols)
p('odf\n', odf)

 odf
:    SMURDER  SRAPE  SROBBERY  SAGASSLT  SOTHASLT  SVANDLSM  SWEAPONS
0        0      0         0         1         1        10        54
1        0      0         0         0         1         0        52
2        0      0         0         0         1         0        46
3        0      0         0         0         1         0        43
4        0      0         0         0         1         0        33
5        1      0         2        16        28         4        32
6        0      0         0         7        17         4        30
7        0      0         0         0         1         0        29
8        0      0         0         7        16         3        29
9        0      0         0         1         0         5        28

Update: I have done a bunch of stuff with pandasql now: calculated fields, limits, aliases, cascaded dataframes. It is just so productive.
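
For comparison, the select / where / order by / limit query above can also be written without pandasql. This is only a sketch: the small frame below is a made-up stand-in for the crime data, and scols is assumed here to be a list of column names (in the pandasql call it is a string interpolated into the SQL). Note that pandas column names are case sensitive, so SWEAPONS has to be spelled as it appears in the frame.

```python
import pandas as pd

# Hypothetical stand-in for the crime data, only for illustration.
df = pd.DataFrame({
    'SMURDER': [0, 1, 0, 0],
    'SROBBERY': [0, 2, 0, 1],
    'SWEAPONS': [54, 32, 5, 29],
})
scols = ['SMURDER', 'SWEAPONS']  # assumed: the columns to project

odf = (
    df.query('SWEAPONS > 10')                      # where sweapons > 10
      .sort_values('SWEAPONS', ascending=False)    # order by sweapons desc
      .head(10)[scols]                             # limit 10, select scols
      .reset_index(drop=True)
)
print(odf)
```

Whether this or the SQL string reads better is mostly a matter of taste; the SQL version tends to win once joins and aggregations are involved.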

Answer 4

Answered by gonkan

How about

df_new = df.query('col1==1 & col2=="x" ')[['col1', 'col3']]

This filters for rows where col1 equals 1 and col2 equals "x", and returns only the columns col1 and col3.

However, you need a row filter in the query for this to work.

For selecting columns only, it is better to use .loc or .iloc.
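
A small runnable sketch of both cases, using a made-up three-row frame whose column names match the snippet above (the data itself is an assumption for illustration):

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({
    'col1': [1, 1, 2],
    'col2': ['x', 'y', 'x'],
    'col3': [10.0, 20.0, 30.0],
})

# Row filter with query(), then column selection on the result.
df_new = df.query('col1 == 1 & col2 == "x"')[['col1', 'col3']]
print(df_new)

# Columns only: by label with .loc, or by position with .iloc.
cols_only = df.loc[:, ['col1', 'col3']]
by_position = df.iloc[:, [0, 2]]
print(cols_only.equals(by_position))
```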
