Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/28252585/

Date: 2020-08-19 03:00:00  Source: igfitidea

Functional pipes in Python like %>% from R's magrittr

Tags: python, functional-programming, pipeline

Asked by cantdutchthis

In R (thanks to magrittr) you can now perform operations with a more functional piping syntax via %>%. This means that instead of coding this:

> as.Date("2014-01-01")
> as.character(sqrt(12)^2)

You could also do this:

> "2014-01-01" %>% as.Date 
> 12 %>% sqrt %>% .^2 %>% as.character

To me this is more readable and this extends to use cases beyond the dataframe. Does the python language have support for something similar?

Accepted answer by Dunes

One possible way of doing this is by using a module called macropy. Macropy allows you to apply transformations to the code that you have written. Thus a | b can be transformed to b(a). This has a number of advantages and disadvantages.

In comparison to the solution mentioned by Sylvain Leroux, the main advantage is that you do not need to create infix objects for the functions you are interested in using -- just mark the areas of code where you intend to use the transformation. Secondly, since the transformation is applied at compile time, rather than runtime, the transformed code suffers no overhead during runtime -- all the work is done when the byte code is first produced from the source code.

The main disadvantage is that macropy must be activated in a particular way for it to work (mentioned later). The faster runtime also comes at a cost: parsing the source code is more computationally complex, so the program will take longer to start. Finally, it adds a syntactic style that programmers who are not familiar with macropy may find harder to understand.

Example Code:

run.py

import macropy.activate 
# Activates macropy, modules using macropy cannot be imported before this statement
# in the program.
import target
# import the module using macropy

target.py

from fpipe import macros, fpipe
from macropy.quick_lambda import macros, f
# The `from module import macros, ...` must be used for macropy to know which 
# macros it should apply to your code.
# Here two macros have been imported `fpipe`, which does what you want
# and `f` which provides a quicker way to write lambdas.

from math import sqrt

# Using the fpipe macro in a single expression.
# The code between the square braces is interpreted as - str(sqrt(12))
print fpipe[12 | sqrt | str] # prints 3.46410161514

# using a decorator
# All code within the function is examined for `x | y` constructs.
x = 1 # global variable
@fpipe
def sum_range_then_square():
    "expected value (1 + 2 + 3)**2 -> 36"
    y = 4 # local variable
    return range(x, y) | sum | f[_**2]
    # `f[_**2]` is macropy syntax for -- `lambda x: x**2`, which would also work here

print sum_range_then_square() # prints 36

# using a with block.
# same as a decorator, but for limited blocks.
with fpipe:
    print range(4) | sum # prints 6
    print 'a b c' | f[_.split()] # prints ['a', 'b', 'c']

And finally the module that does the hard work. I've called it fpipe, for functional pipe, as it's emulating shell syntax for passing output from one process to another.

fpipe.py

from macropy.core.macros import *
from macropy.core.quotes import macros, q, ast

macros = Macros()

@macros.decorator
@macros.block
@macros.expr
def fpipe(tree, **kw):

    @Walker
    def pipe_search(tree, stop, **kw):
        """Search code for bitwise or operators and transform `a | b` to `b(a)`."""
        if isinstance(tree, BinOp) and isinstance(tree.op, BitOr):
            operand = tree.left
            function = tree.right
            newtree = q[ast[function](ast[operand])]
            return newtree

    return pipe_search.recurse(tree)
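
For readers who cannot install macropy, the same a | b to b(a) rewrite can be sketched with only the standard-library ast module. This is an illustration, not macropy's implementation: it assumes every | in the expression is meant as a pipe, and the names PipeRewriter and eval_piped are hypothetical.

```python
import ast
from math import sqrt


class PipeRewriter(ast.NodeTransformer):
    """Rewrite every `a | b` in a syntax tree into the call `b(a)`."""

    def visit_BinOp(self, node):
        # Rewrite the operands first so that chained pipes nest correctly.
        self.generic_visit(node)
        if isinstance(node.op, ast.BitOr):
            call = ast.Call(func=node.right, args=[node.left], keywords=[])
            return ast.copy_location(call, node)
        return node


def eval_piped(source, namespace):
    """Parse an expression, rewrite its pipes, then evaluate the result."""
    tree = ast.parse(source, mode="eval")
    tree = ast.fix_missing_locations(PipeRewriter().visit(tree))
    return eval(compile(tree, "<piped>", "eval"), namespace)


# Evaluated as str(sqrt(12)) after the rewrite.
result = eval_piped("12 | sqrt | str", {"sqrt": sqrt, "str": str})
print(result)
```

As with the macro, the rewrite happens before the code runs, so a genuine bitwise or inside the rewritten expression would be turned into a call as well.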

Answer by Sylvain Leroux

Does the python language have support for something similar?

"more functional piping syntax": is this really a more "functional" syntax? I would say it adds an "infix" syntax to R instead.

That being said, Python's grammar does not have direct support for infix notation beyond the standard operators.

If you really need something like that, you should take the code from Tomer Filiba as a starting point to implement your own infix notation:

Code sample and comments by Tomer Filiba (http://tomerfiliba.com/blog/Infix-Operators/) :

from functools import partial

class Infix(object):
    def __init__(self, func):
        self.func = func
    def __or__(self, other):
        return self.func(other)
    def __ror__(self, other):
        return Infix(partial(self.func, other))
    def __call__(self, v1, v2):
        return self.func(v1, v2)

Using instances of this peculiar class, we can now use a new "syntax" for calling functions as infix operators:

>>> @Infix
... def add(x, y):
...     return x + y
...
>>> 5 |add| 6
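
To make the mechanism explicit, here is a sketch that evaluates 5 |add| 6 in two visible steps. It repeats the core of the Infix class above so that it runs on its own; the names half and total are only illustrative.

```python
from functools import partial


class Infix(object):
    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Second half: `bound | right` calls the stored function.
        return self.func(other)

    def __ror__(self, other):
        # First half: `left | op` binds the left operand.
        return Infix(partial(self.func, other))


@Infix
def add(x, y):
    return x + y


# int.__or__ does not know Infix, so Python falls back to add.__ror__(5).
half = 5 | add
# Infix.__or__(6) now calls the partially applied function: add(5, 6).
total = half | 6
print(total)  # 11
```

The whole trick relies on Python trying the reflected method of the right operand when the left operand's own method returns NotImplemented.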

Answer by smci

PyToolz [doc] allows arbitrarily composable pipes; they just aren't defined with that pipe-operator syntax.

Follow the above link for the quickstart. And here's a video tutorial: http://pyvideo.org/video/2858/functional-programming-in-python-with-pytoolz

In [1]: from toolz import pipe

In [2]: from math import sqrt

In [3]: pipe(12, sqrt, str)
Out[3]: '3.4641016151377544'
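
If adding a dependency is not an option, the same left-to-right application can be sketched with functools.reduce from the standard library. This is not how toolz implements pipe; pipe_values is a hypothetical name chosen to avoid confusion with the toolz function.

```python
from functools import reduce
from math import sqrt


def pipe_values(value, *funcs):
    """Thread `value` through `funcs` from left to right."""
    return reduce(lambda acc, func: func(acc), funcs, value)


# Same computation as pipe(12, sqrt, str) above.
print(pipe_values(12, sqrt, str))
```

With no functions given, the value is returned unchanged, which makes the helper easy to use with a dynamically built list of steps.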

Answer by shadowtalker

Pipes are a new feature in Pandas 0.16.2.

Example:

import pandas as pd
from sklearn.datasets import load_iris

x = load_iris()
x = pd.DataFrame(x.data, columns=x.feature_names)

def remove_units(df):
    df.columns = [name.replace(" (cm)", "") for name in df.columns]
    return df

def length_times_width(df):
    df['sepal length*width'] = df['sepal length'] * df['sepal width']
    df['petal length*width'] = df['petal length'] * df['petal width']

x.pipe(remove_units).pipe(length_times_width)
x

NB: The Pandas version retains Python's reference semantics. That's why length_times_width doesn't need a return value; it modifies x in place.
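
One consequence of the in-place style is that a step without a return value has to be the last one in the chain, because the next .pipe call would be made on None. The sketch below shows the alternative where every step returns its result. It uses a tiny stand-in class instead of a DataFrame so that it runs without pandas; Box, scale and total are hypothetical names, and the pipe method only mirrors the calling convention pipe(func, *args, **kwargs).

```python
class Box(object):
    """Tiny stand-in for an object with a pandas-style .pipe method."""

    def __init__(self, values):
        self.values = list(values)

    def pipe(self, func, *args, **kwargs):
        # Call func with this object as the first positional argument.
        return func(self, *args, **kwargs)


def scale(box, factor):
    # Returns a new Box, so the chain can continue after this step.
    return Box(v * factor for v in box.values)


def total(box):
    return sum(box.values)


result = Box([1, 2, 3]).pipe(scale, factor=10).pipe(total)
print(result)  # 60
```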

Answer by yardsale8

Building pipe with Infix

As hinted at by Sylvain Leroux, we can use the Infix operator to construct an infix pipe. Let's see how this is accomplished.

First, here is the code from Tomer Filiba

Code sample and comments by Tomer Filiba (http://tomerfiliba.com/blog/Infix-Operators/) :

from functools import partial

class Infix(object):
    def __init__(self, func):
        self.func = func
    def __or__(self, other):
        return self.func(other)
    def __ror__(self, other):
        return Infix(partial(self.func, other))
    def __call__(self, v1, v2):
        return self.func(v1, v2)

Using instances of this peculiar class, we can now use a new "syntax" for calling functions as infix operators:

>>> @Infix
... def add(x, y):
...     return x + y
...
>>> 5 |add| 6

The pipe operator passes the preceding object as an argument to the object that follows the pipe, so x %>% f can be transformed into f(x). Consequently, the pipe operator can be defined using Infix as follows:

In [1]: @Infix
   ...: def pipe(x, f):
   ...:     return f(x)
   ...:
   ...:

In [2]: from math import sqrt

In [3]: 12 |pipe| sqrt |pipe| str
Out[3]: '3.4641016151377544'

A note on partial application

The %>% operator from dplyr pushes its left-hand side into the first argument of a function, so

df %>% 
filter(x >= 2) %>%
mutate(y = 2*x)

corresponds to

df1 <- filter(df, x >= 2)
df2 <- mutate(df1, y = 2*x)

The easiest way to achieve something similar in Python is to use currying. The toolz library provides a curry decorator function that makes constructing curried functions easy.

In [2]: from toolz import curry

In [3]: from datetime import datetime

In [4]: @curry
    def asDate(format, date_string):
        return datetime.strptime(date_string, format)
    ...:
    ...:

In [5]: "2014-01-01" |pipe| asDate("%Y-%m-%d")
Out[5]: datetime.datetime(2014, 1, 1, 0, 0)

Notice that |pipe| pushes the arguments into the last argument position, that is

x |pipe| f(2)

corresponds to

f(2, x)

When designing curried functions, static arguments (i.e. arguments that might be used for many examples) should be placed earlier in the parameter list.
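
For comparison, the same "static arguments first" idea can be sketched without toolz by using functools.partial, which fixes the leading arguments explicitly. The Infix class is reduced to the two methods that |pipe| needs, and as_date is a hypothetical helper modelled on asDate above.

```python
from datetime import datetime
from functools import partial


class Infix(object):
    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        return self.func(other)

    def __ror__(self, other):
        return Infix(partial(self.func, other))


@Infix
def pipe(x, f):
    return f(x)


def as_date(fmt, date_string):
    # The static argument (fmt) comes first, the piped value comes last.
    return datetime.strptime(date_string, fmt)


# partial(as_date, "%Y-%m-%d") plays the role of the curried asDate("%Y-%m-%d").
parsed = "2014-01-01" | pipe | partial(as_date, "%Y-%m-%d")
print(parsed)
```

The trade-off is verbosity: curry lets the function itself be called with fewer arguments, while partial has to be spelled out at each call site.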

Note that toolz includes many pre-curried functions, including various functions from the operator module.

In [11]: from toolz.curried import map

In [12]: from toolz.curried.operator import add

In [13]: range(5) |pipe| map(add(2)) |pipe| list
Out[13]: [2, 3, 4, 5, 6]

which roughly corresponds to the following in R

> library(dplyr)
> add2 <- function(x) {x + 2}
> 0:4 %>% sapply(add2)
[1] 2 3 4 5 6

Using other infix delimiters

You can change the symbols that surround the Infix invocation by overriding other Python operator methods. For example, switching __or__ and __ror__ to __mod__ and __rmod__ will change the | operator to the % (mod) operator.

In [5]: 12 %pipe% sqrt %pipe% str
Out[5]: '3.4641016151377544'
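
A minimal sketch of that switch, assuming a separate class named ModInfix (a hypothetical name) rather than editing Infix in place. It also shows one caveat worth knowing: a string on the left-hand side never reaches __rmod__, because str already implements % for formatting.

```python
from functools import partial
from math import sqrt


class ModInfix(object):
    """Same idea as Infix, but driven by % instead of |."""

    def __init__(self, func):
        self.func = func

    def __mod__(self, other):
        return self.func(other)

    def __rmod__(self, other):
        return ModInfix(partial(self.func, other))


@ModInfix
def pipe(x, f):
    return f(x)


result = 12 % pipe % sqrt % pipe % str
print(result)

# str handles % itself (string formatting), so this raises before
# ModInfix.__rmod__ is ever tried.
string_error = None
try:
    "2014-01-01" % pipe % len
except TypeError as exc:
    string_error = exc
print(type(string_error).__name__)  # TypeError
```

So the choice of delimiter matters: an operator that common left-hand types already implement for arbitrary operands is a poor fit.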

Answer by shadowtalker

If you just want this for personal scripting, you might want to consider using Coconut instead of Python.

Coconut is a superset of Python. You could therefore use Coconut's pipe operator |>, while completely ignoring the rest of the Coconut language.

For example:

def addone(x):
    return x + 1

3 |> addone

compiles to

# lots of auto-generated header junk

# Compiled Coconut: -----------------------------------------------------------

def addone(x):
    return x + 1

(addone)(3)

Answer by Eli Korvigo

Adding my 2c. I personally use the package fn for functional-style programming. Your example translates into

from fn import F, _
from math import sqrt

(F(sqrt) >> _**2 >> str)(12)

F is a wrapper class with functional-style syntactic sugar for partial application and composition. _ is a Scala-style constructor for anonymous functions (similar to Python's lambda); it represents a variable, hence you can combine several _ objects in one expression to get a function with more arguments (e.g. _ + _ is equivalent to lambda a, b: a + b). F(sqrt) >> _**2 >> str results in a Callable object that can be used as many times as you want.
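
The composition half of this can be sketched in a few lines by overloading >>. This is not the fn library's implementation and it leaves out the _ placeholder entirely; Compose is a hypothetical name, and an ordinary lambda stands in for _**2.

```python
from math import sqrt


class Compose(object):
    """Wrap a callable so that `a >> b` builds 'call a, then b'."""

    def __init__(self, func):
        self.func = func

    def __rshift__(self, other):
        # Build a new wrapper instead of mutating this one, so that
        # intermediate pipelines stay reusable.
        return Compose(lambda *args, **kwargs: other(self.func(*args, **kwargs)))

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)


to_text = Compose(sqrt) >> (lambda x: x ** 2) >> str

# The composed object is an ordinary callable and can be reused.
print(to_text(12))
print(to_text(16))  # 16.0
```

Unlike the infix pipes earlier in the thread, nothing is computed until the composed object is called with a value.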

Answer by Robin Hilliard

I missed the |> pipe operator from Elixir, so I created a simple function decorator (~50 lines of code) that reinterprets the Python >> right shift operator as a very Elixir-like pipe at compile time, using the ast library and compile/exec:

from pipeop import pipes

def add3(a, b, c):
    return a + b + c

def times(a, b):
    return a * b

@pipes
def calc():
    print(1 >> add3(2, 3) >> times(4))  # prints 24

All it's doing is rewriting a >> b(...) as b(a, ...).
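
The rewrite itself can be sketched on a single expression with the standard-library ast module. This is not the pipeop source: it skips the decorator machinery and only handles the case where the right-hand side is already a call; RShiftPipe is a hypothetical name.

```python
import ast


class RShiftPipe(ast.NodeTransformer):
    """Rewrite `a >> b(...)` into `b(a, ...)`."""

    def visit_BinOp(self, node):
        # Handle inner pipes first so that chains nest left to right.
        self.generic_visit(node)
        if isinstance(node.op, ast.RShift) and isinstance(node.right, ast.Call):
            node.right.args.insert(0, node.left)
            return node.right
        return node


def add3(a, b, c):
    return a + b + c


def times(a, b):
    return a * b


tree = ast.parse("1 >> add3(2, 3) >> times(4)", mode="eval")
tree = ast.fix_missing_locations(RShiftPipe().visit(tree))
# The tree now represents times(add3(1, 2, 3), 4).
value = eval(compile(tree, "<pipes>", "eval"), {"add3": add3, "times": times})
print(value)  # 24
```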

https://pypi.org/project/pipeop/

https://github.com/robinhilliard/pipes

Answer by Legit Stack

One alternative solution would be to use the workflow tool dask. Though it's not as syntactically fun as...

var
| do this
| then do that

...it still allows your variable to flow down the chain and using dask gives the added benefit of parallelization where possible.

Here's how I use dask to accomplish a pipe-chain pattern:

import dask

def a(foo):
    return foo + 1
def b(foo):
    return foo / 2
def c(foo,bar):
    return foo + bar

# pattern = 'name_of_behavior': (method_to_call, variables_to_pass_in, variables_can_be_task_names)
workflow = {'a_task':(a,1),
            'b_task':(b,'a_task',),
            'c_task':(c,99,'b_task'),}

#dask.visualize(workflow) #visualization available. 

dask.get(workflow,'c_task')

# returns 100
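
To see what the dictionary format expresses, here is a plain-Python sketch that resolves the same workflow recursively. It is not dask's scheduler: it runs sequentially, caches nothing, and assumes that any string argument matching a key is a task name; resolve is a hypothetical name.

```python
def a(foo):
    return foo + 1


def b(foo):
    return foo / 2


def c(foo, bar):
    return foo + bar


def resolve(graph, key):
    """Compute the value of one task, resolving its dependencies first."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, args = task[0], task[1:]
        values = [
            resolve(graph, arg) if isinstance(arg, str) and arg in graph else arg
            for arg in args
        ]
        return func(*values)
    # Anything that is not a (callable, args...) tuple is a literal value.
    return task


workflow = {
    "a_task": (a, 1),
    "b_task": (b, "a_task"),
    "c_task": (c, 99, "b_task"),
}

print(resolve(workflow, "c_task"))
```

Each task only names the tasks it depends on, which is what lets a real scheduler decide which ones can run at the same time.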

After having worked with Elixir I wanted to use the piping pattern in Python. This isn't exactly the same pattern, but it's similar and, like I said, comes with the added benefit of parallelization: if you tell dask to get tasks in your workflow that don't depend on others running first, they'll run in parallel.

If you wanted easier syntax you could wrap it in something that would take care of the naming of the tasks for you. Of course in this situation you'd need all functions to take the pipe as the first argument, and you'd lose any benefit of parallelization. But if you're ok with that you could do something like this:

def dask_pipe(initial_var, functions_args):
    '''
    call the dask_pipe with an init_var, and a list of functions
    workflow, last_task = dask_pipe(initial_var, {function_1:[], function_2:[arg1, arg2]})
    workflow, last_task = dask_pipe(initial_var, [function_1, function_2])
    dask.get(workflow, last_task)
    '''
    workflow = {}
    if isinstance(functions_args, list):
        for ix, function in enumerate(functions_args):
            if ix == 0:
                workflow['task_' + str(ix)] = (function, initial_var)
            else:
                workflow['task_' + str(ix)] = (function, 'task_' + str(ix - 1))
        return workflow, 'task_' + str(ix)
    elif isinstance(functions_args, dict):
        for ix, (function, args) in enumerate(functions_args.items()):
            if ix == 0:
                workflow['task_' + str(ix)] = (function, initial_var, *args)
            else:
                workflow['task_' + str(ix)] = (function, 'task_' + str(ix - 1), *args )
        return workflow, 'task_' + str(ix)

# piped functions
def foo(df):
    return df[['a','b']]
def bar(df, s1, s2):
    return df.columns.tolist() + [s1, s2]
def baz(df):
    return df.columns.tolist()

# setup 
import dask
import pandas as pd
df = pd.DataFrame({'a':[1,2,3],'b':[1,2,3],'c':[1,2,3]})

Now, with this wrapper, you can make a pipe following either of these syntactical patterns:

# wf, lt = dask_pipe(initial_var, [function_1, function_2])
# wf, lt = dask_pipe(initial_var, {function_1:[], function_2:[arg1, arg2]})

like this:

# test 1 - lists for functions only:
workflow, last_task =  dask_pipe(df, [foo, baz])
print(dask.get(workflow, last_task)) # returns ['a','b']

# test 2 - dictionary for args:
workflow, last_task = dask_pipe(df, {foo:[], bar:['string1', 'string2']})
print(dask.get(workflow, last_task)) # returns ['a','b','string1','string2']

Answer by mhsekhavat

You can use the sspipe library. It exposes two objects, p and px. Similar to x %>% f(y,z), you can write x | p(f, y, z), and similar to x %>% .^2 you can write x | px**2.

from sspipe import p, px
from math import sqrt

12 | p(sqrt) | px ** 2 | p(str)
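
The x | p(f, y, z) form can be approximated with a small class that implements __ror__. This is only a sketch of the idea, not sspipe's implementation, and it does not attempt the px placeholder; P is a hypothetical name, and pow with a fixed exponent stands in for px ** 2.

```python
from math import sqrt


class P(object):
    """Hold a function plus extra arguments; `value | P(f, y)` calls f(value, y)."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __ror__(self, value):
        # The piped value becomes the first positional argument.
        return self.func(value, *self.args, **self.kwargs)


result = 12 | P(sqrt) | P(pow, 2) | P(str)
print(result)
```

Because the wrapper sits on the right-hand side, this works for any left-hand value whose own | does not already accept arbitrary operands.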