Dynamic Expression Evaluation in pandas using pd.eval()
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/53779986/
Asked by cs95
Given two DataFrames
np.random.seed(0)
df1 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
df1
A B C D
0 5 0 3 3
1 7 9 3 5
2 2 4 7 6
3 8 8 1 6
4 7 7 8 1
df2
A B C D
0 5 9 8 9
1 4 3 0 3
2 5 0 2 3
3 8 1 3 3
4 3 7 0 1
I would like to perform arithmetic on one or more columns using pd.eval. Specifically, I would like to port the following code:
x = 5
df2['D'] = df1['A'] + (df1['B'] * x)
...to code using eval. The reason for using eval is that I would like to automate many workflows, so creating them dynamically will be useful to me.
I am trying to better understand the engine and parser arguments to determine how best to solve my problem. I have gone through the documentation, but the difference was not made clear to me.
- What arguments should be used to ensure my code is working at max performance?
- Is there a way to assign the result of the expression back to df2?
- Also, to make things more complicated, how do I pass x as an argument inside the string expression?
Answered by cs95
This answer dives into the various features and functionality offered by pd.eval, df.query, and df.eval.
Setup
Examples will involve these DataFrames (unless otherwise specified).
np.random.seed(0)
df1 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
df3 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
df4 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
pandas.eval - The "Missing Manual"
Note
Of the three functions being discussed, pd.eval is the most important. df.eval and df.query call pd.eval under the hood. Behaviour and usage is more or less consistent across the three functions, with some minor semantic variations which will be highlighted later. This section will introduce functionality that is common across all three functions - this includes (but is not limited to) allowed syntax, precedence rules, and keyword arguments.
pd.eval can evaluate arithmetic expressions which can consist of variables and/or literals. These expressions must be passed as strings. So, to answer the question as stated, you can do
x = 5
pd.eval("df1.A + (df1.B * x)")
Some things to note here:
- The entire expression is a string.
- df1, df2, and x refer to variables in the global namespace; these are picked up by eval when parsing the expression.
- Specific columns are accessed using the attribute accessor. You can also use "df1['A'] + (df1['B'] * x)" to the same effect.
I will be addressing the specific issue of reassignment in the section explaining the target=... argument below. But for now, here are more simple examples of valid operations with pd.eval:
pd.eval("df1.A + df2.A") # Valid, returns a pd.Series object
pd.eval("abs(df1) ** .5") # Valid, returns a pd.DataFrame object
...and so on. Conditional expressions are also supported in the same way. The statements below are all valid expressions and will be evaluated by the engine.
pd.eval("df1 > df2")
pd.eval("df1 > 5")
pd.eval("df1 < df2 and df3 < df4")
pd.eval("df1 in [1, 2, 3]")
pd.eval("1 < 2 < 3")
A list detailing all the supported features and syntax can be found in the documentation. In summary,
- Arithmetic operations except for the left shift (<<) and right shift (>>) operators, e.g., df + 2 * pi / s ** 4 % 42 - the_golden_ratio
- Comparison operations, including chained comparisons, e.g., 2 < df < df2
- Boolean operations, e.g., df < df2 and df3 < df4 or not df_bool
- list and tuple literals, e.g., [1, 2] or (1, 2)
- Attribute access, e.g., df.a
- Subscript expressions, e.g., df[0]
- Simple variable evaluation, e.g., pd.eval('df') (this is not very useful)
- Math functions: sin, cos, exp, log, expm1, log1p, sqrt, sinh, cosh, tanh, arcsin, arccos, arctan, arccosh, arcsinh, arctanh, abs and arctan2.
This section of the documentation also specifies syntax rules that are not supported, including set/dict literals, if-else statements, loops and comprehensions, and generator expressions.
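For instance, a dict literal is rejected at parse time. The sketch below only checks that parsing fails, since the exact exception type may vary across pandas versions:

```python
import pandas as pd

# dict literals are among the unsupported constructs; pd.eval rejects them
# when parsing the expression.
try:
    pd.eval("{'a': 1}")
    raised = False
except Exception:
    raised = True
print(raised)
```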
From the list, it is obvious you can also pass expressions involving the index, such as
pd.eval('df1.A * (df1.index > 1)')
Parser Selection: The parser=... argument
pd.eval supports two different parser options when parsing the expression string to generate the syntax tree: pandas and python. The main difference between the two is highlighted by slightly differing precedence rules.
Using the default parser pandas, the overloaded bitwise operators & and | (which implement vectorized AND and OR operations with pandas objects) will have the same operator precedence as and and or. So,
pd.eval("(df1 > df2) & (df3 < df4)")
Will be the same as
pd.eval("df1 > df2 & df3 < df4")
# pd.eval("df1 > df2 & df3 < df4", parser='pandas')
And also the same as
pd.eval("df1 > df2 and df3 < df4")
Here, the parentheses are necessary. To do this conventionally, parentheses would be required to override the higher precedence of the bitwise operators:
(df1 > df2) & (df3 < df4)
Without that, we end up with
df1 > df2 & df3 < df4
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Use parser='python' if you want to maintain consistency with python's actual operator precedence rules while evaluating the string.
pd.eval("(df1 > df2) & (df3 < df4)", parser='python')
The other difference between the two types of parsers is the semantics of the == and != operators with list and tuple nodes, which have semantics similar to in and not in respectively when using the 'pandas' parser. For example,
pd.eval("df1 == [1, 2, 3]")
Is valid, and will run with the same semantics as
pd.eval("df1 in [1, 2, 3]")
OTOH, pd.eval("df1 == [1, 2, 3]", parser='python')
will throw a NotImplementedError
error.
OTOH,pd.eval("df1 == [1, 2, 3]", parser='python')
会抛出NotImplementedError
错误。
Backend Selection: The engine=... argument
There are two options - numexpr (the default) and python. The numexpr option uses the numexpr backend, which is optimized for performance.
With the 'python' backend, your expression is evaluated similarly to just passing the expression to python's eval function. You have the flexibility of doing more inside expressions, such as string operations, for instance.
df = pd.DataFrame({'A': ['abc', 'def', 'abacus']})
pd.eval('df.A.str.contains("ab")', engine='python')
0 True
1 False
2 True
Name: A, dtype: bool
Unfortunately, this method offers no performance benefits over the numexpr engine, and there are very few security measures to ensure that dangerous expressions are not evaluated, so USE AT YOUR OWN RISK! It is generally not recommended to change this option to 'python' unless you know what you're doing.
local_dict and global_dict arguments
Sometimes, it is useful to supply values for variables used inside expressions but not currently defined in your namespace. You can pass a dictionary to local_dict.
For example,
pd.eval("df1 > thresh")
UndefinedVariableError: name 'thresh' is not defined
This fails because thresh is not defined. However, this works:
pd.eval("df1 > thresh", local_dict={'thresh': 10})
This is useful when you have variables to supply from a dictionary. Alternatively, with the 'python' engine, you could simply do this:
mydict = {'thresh': 5}
# Dictionary values with *string* keys cannot be accessed without
# using the 'python' engine.
pd.eval('df1 > mydict["thresh"]', engine='python')
But this is going to possibly be much slower than using the 'numexpr' engine and passing a dictionary to local_dict or global_dict. Hopefully, this should make a convincing argument for the use of these parameters.
The target (+ inplace) argument, and Assignment Expressions
This is not often a requirement because there are usually simpler ways of doing this, but you can assign the result of pd.eval to an object that implements __getitem__, such as dicts, and (you guessed it) DataFrames.
Consider the example in the question
x = 5
df2['D'] = df1['A'] + (df1['B'] * x)
To assign a column "D" to df2, we do
pd.eval('D = df1.A + (df1.B * x)', target=df2)
A B C D
0 5 9 8 5
1 4 3 0 52
2 5 0 2 22
3 8 1 3 48
4 3 7 0 42
This is not an in-place modification of df2 (but it can be... read on). Consider another example:
pd.eval('df1.A + df2.A')
0 10
1 11
2 7
3 16
4 10
dtype: int32
If you wanted to (for example) assign this back to a DataFrame, you could use the target argument as follows:
df = pd.DataFrame(columns=list('FBGH'), index=df1.index)
df
F B G H
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
df = pd.eval('B = df1.A + df2.A', target=df)
# Similar to
# df = df.assign(B=pd.eval('df1.A + df2.A'))
df
F B G H
0 NaN 10 NaN NaN
1 NaN 11 NaN NaN
2 NaN 7 NaN NaN
3 NaN 16 NaN NaN
4 NaN 10 NaN NaN
If you wanted to perform an in-place mutation on df, set inplace=True.
pd.eval('B = df1.A + df2.A', target=df, inplace=True)
# Similar to
# df['B'] = pd.eval('df1.A + df2.A')
df
F B G H
0 NaN 10 NaN NaN
1 NaN 11 NaN NaN
2 NaN 7 NaN NaN
3 NaN 16 NaN NaN
4 NaN 10 NaN NaN
If inplace is set without a target, a ValueError is raised.
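A minimal sketch of that failure mode, using the seeded DataFrames from the setup:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))

# Asking for an assignment without telling pd.eval where to put the
# result is an error:
try:
    pd.eval('B = df1.A + df2.A', inplace=True)  # no target=... supplied
    raised = False
except ValueError:
    raised = True
print(raised)
```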
While the target argument is fun to play around with, you will seldom need to use it.
If you wanted to do this with df.eval, you would use an expression involving an assignment:
df = df.eval("B = @df1.A + @df2.A")
# df.eval("B = @df1.A + @df2.A", inplace=True)
df
F B G H
0 NaN 10 NaN NaN
1 NaN 11 NaN NaN
2 NaN 7 NaN NaN
3 NaN 16 NaN NaN
4 NaN 10 NaN NaN
Note
One of pd.eval's unintended uses is parsing literal strings in a manner very similar to ast.literal_eval:
pd.eval("[1, 2, 3]")
array([1, 2, 3], dtype=object)
It can also parse nested lists with the 'python' engine:
pd.eval("[[1, 2, 3], [4, 5], [10]]", engine='python')
[[1, 2, 3], [4, 5], [10]]
And lists of strings:
pd.eval(["[1, 2, 3]", "[4, 5]", "[10]"], engine='python')
[[1, 2, 3], [4, 5], [10]]
The problem, however, is for lists with length larger than 100:
pd.eval(["[1]"] * 100, engine='python') # Works
pd.eval(["[1]"] * 101, engine='python')
AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'
More information on this error, its causes, fixes, and workarounds can be found here.
DataFrame.eval - A Juxtaposition with pandas.eval
As mentioned above, df.eval calls pd.eval under the hood. The v0.23 source code shows this:
def eval(self, expr, inplace=False, **kwargs):
    from pandas.core.computation.eval import eval as _eval

    inplace = validate_bool_kwarg(inplace, 'inplace')
    resolvers = kwargs.pop('resolvers', None)
    kwargs['level'] = kwargs.pop('level', 0) + 1
    if resolvers is None:
        index_resolvers = self._get_index_resolvers()
        resolvers = dict(self.iteritems()), index_resolvers
    if 'target' not in kwargs:
        kwargs['target'] = self
    kwargs['resolvers'] = kwargs.get('resolvers', ()) + tuple(resolvers)
    return _eval(expr, inplace=inplace, **kwargs)
eval creates arguments, does a little validation, and passes the arguments on to pd.eval.
For more, you can read on: when to use DataFrame.eval() versus pandas.eval() or python eval()
Usage Differences
Expressions with DataFrames v/s Series Expressions
For dynamic queries associated with entire DataFrames, you should prefer pd.eval. For example, there is no simple way to specify the equivalent of pd.eval("df1 + df2") when you call df1.eval or df2.eval.
Specifying Column Names
The other major difference is how columns are accessed. For example, to add the two columns "A" and "B" in df1, you would call pd.eval with the following expression:
pd.eval("df1.A + df1.B")
With df.eval, you need only supply the column names:
df1.eval("A + B")
Since, within the context of df1, it is clear that "A" and "B" refer to column names.
You can also refer to the index and columns using index (unless the index is named, in which case you would use the name).
df1.eval("A + index")
Or, more generally, for any DataFrame with an index having 1 or more levels, you can refer to the kth level of the index in an expression using the variable "ilevel_k", which stands for "index at level k". IOW, the expression above can be written as df1.eval("A + ilevel_0").
These rules also apply to query.
Accessing Variables in Local/Global Namespace
Variables supplied inside expressions must be preceded by the "@" symbol, to avoid confusion with column names.
A = 5
df1.eval("A > @A")
The same goes for query.
It goes without saying that your column names must follow the rules for valid identifier naming in python to be accessible inside eval. See here for a list of rules on naming identifiers.
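As an aside (not covered in the original answer): for column names that are not valid python identifiers, newer pandas versions (0.25+) let query escape them with backticks. A small sketch with a hypothetical column name:

```python
import pandas as pd

df = pd.DataFrame({'my col': [1, 5, 10], 'B': [2, 4, 6]})

# "my col" is not a valid identifier, but backticks make it usable
# inside query (pandas >= 0.25):
out = df.query("`my col` > 2")
print(out)
```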
Multiline Queries and Assignment
A little known fact is that eval supports multiline expressions that deal with assignment. For example, to create two new columns "E" and "F" in df1 based on some arithmetic operations on some columns, and a third column "G" based on the previously created "E" and "F", we can do
df1.eval("""
E = A + B
F = @df2.A + @df2.B
G = E >= F
""")
A B C D E F G
0 5 0 3 3 5 14 False
1 7 9 3 5 16 7 True
2 2 4 7 6 6 5 True
3 8 8 1 6 16 9 True
4 7 7 8 1 14 10 True
...Nifty! However, note that this is not supported by query.
eval v/s query - Final Word
It helps to think of df.query as a function that uses pd.eval as a subroutine.
Typically, query (as the name suggests) is used to evaluate conditional expressions (i.e., expressions that result in True/False values) and return the rows corresponding to the True result. The result of the expression is then passed to loc (in most cases) to return the rows that satisfy the expression. According to the documentation,
The result of the evaluation of this expression is first passed to DataFrame.loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to DataFrame.__getitem__().

This method uses the top-level pandas.eval() function to evaluate the passed query.
In terms of similarity, query and df.eval are both alike in how they access column names and variables.
The key difference between the two, as mentioned above, is how they handle the expression result. This becomes obvious when you actually run an expression through these two functions. For example, consider
df1.A
0 5
1 7
2 2
3 8
4 7
Name: A, dtype: int32
df1.B
0 9
1 3
2 0
3 1
4 7
Name: B, dtype: int32
To get all rows where "A" >= "B" in df1, we would use eval like this:
m = df1.eval("A >= B")
m
0 True
1 False
2 False
3 True
4 True
dtype: bool
m represents the intermediate result generated by evaluating the expression "A >= B". We then use the mask to filter df1:
df1[m]
# df1.loc[m]
A B C D
0 5 0 3 3
3 8 8 1 6
4 7 7 8 1
However, with query, the intermediate result "m" is directly passed to loc, so with query, you would simply need to do
df1.query("A >= B")
A B C D
0 5 0 3 3
3 8 8 1 6
4 7 7 8 1
Performance wise, it is exactly the same.
df1_big = pd.concat([df1] * 100000, ignore_index=True)
%timeit df1_big[df1_big.eval("A >= B")]
%timeit df1_big.query("A >= B")
14.7 ms ± 33.9 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
14.7 ms ± 24.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
But the latter is more concise, and expresses the same operation in a single step.
Note that you can also do weird stuff with query like this (to, say, return all rows indexed by df1.index)
df1.query("index")
# Same as df1.loc[df1.index] # Pointless,... I know
A B C D
0 5 0 3 3
1 7 9 3 5
2 2 4 7 6
3 8 8 1 6
4 7 7 8 1
But don't.
Bottom line: Please use query when querying or filtering rows based on a conditional expression.
Answered by astro123
Great tutorial already, but bear in mind that before jumping wildly into the usage of eval/query, attracted by its simpler syntax, it has severe performance issues if your dataset has fewer than 15,000 rows.
In that case, simply use df.loc[mask1, mask2].
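A sketch of the suggested alternative: on a small frame, plain boolean indexing returns exactly the same rows as query, without the eval machinery's fixed overhead (the frame and seed follow the question's setup):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame(np.random.choice(10, (5, 4)), columns=list('ABCD'))

# Equivalent filters; boolean .loc avoids the expression-parsing overhead
# that dominates on small frames:
via_query = df1.query("A >= B")
via_loc = df1.loc[df1.A >= df1.B]
print(via_query.equals(via_loc))
```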
Refer: https://pandas.pydata.org/pandas-docs/version/0.22/enhancingperf.html#enhancingperf-eval