pandas: How to redirect all methods of a contained class in Python?

Question

Asked by Yariv

How do I implement the composition pattern? I have a class Container which has an attribute holding a Contained object. I would like to redirect (i.e. allow access to) all methods of the Contained class from Container, by simply calling my_container.some_contained_method(). Am I doing the right thing, and in the right way?

I use something like:

class Container:
   def __init__(self):
       self.contained = Contained()
   def __getattr__(self, item):
       if item in self.__dict__: # some overridden
           return self.__dict__[item] 
       else:
           return self.contained.__getattr__(item) # redirection

Background:

I am trying to build a class (Indicator) that adds to the functionality of an existing class (pandas.DataFrame). Indicator will have all the methods of DataFrame. I could use inheritance, but I am following the "favor composition over inheritance" advice (see, e.g., the answers in: python: inheriting or composition). One reason not to inherit is that the base class is not serializable, and I need to serialize.

I have found this, but I am not sure if it fits my needs.

Answer 1

Answered by unutbu

Caveats:

DataFrames have a lot of attributes. If a DataFrame attribute is a number, you probably just want to return that number. But if the DataFrame attribute is a DataFrame, you probably want to return a Container. What should we do if the DataFrame attribute is a Series or a descriptor? To implement Container.__getattr__ properly, you really have to write unit tests for each and every attribute.
Unit testing is also needed for __getitem__.
You'll also have to define and unit test __setattr__ and __setitem__, __iter__, __len__, etc.
Pickling is a form of serialization, so if DataFrames are picklable, I'm not sure how Containers really help with serialization.
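
The need to hand-write __getitem__, __setitem__, __iter__, __len__ and friends comes from how Python looks up special methods: implicit operations such as len(c), iteration, or c[k] = v look the method up on the type, never on the instance, so __getattr__ is never consulted. A minimal standard-library sketch of this; Delegating and DelegatingWithDunders are hypothetical names, and a plain list stands in for the DataFrame:

```python
class Delegating:
    """Forwards ordinary attribute access only."""

    def __init__(self, contained):
        self.contained = contained

    def __getattr__(self, item):
        # Reached only when normal lookup on the wrapper fails.
        return getattr(self.contained, item)


class DelegatingWithDunders(Delegating):
    """Also forwards the special methods that implicit syntax relies on."""

    def __len__(self):
        return len(self.contained)

    def __iter__(self):
        return iter(self.contained)

    def __getitem__(self, key):
        return self.contained[key]

    def __setitem__(self, key, value):
        self.contained[key] = value


plain = Delegating([1, 2, 3])
print(plain.count(2))     # 1: explicit attribute access is forwarded
print(plain.__len__())    # 3: spelling the name out also goes through __getattr__
try:
    len(plain)            # implicit lookup happens on the type and bypasses __getattr__
except TypeError as exc:
    print("TypeError:", exc)

full = DelegatingWithDunders([1, 2, 3])
full[0] = 10
print(len(full), list(full))  # 3 [10, 2, 3]
```

Only the implicit protocol calls need the hand-written forwarders; each one still has to decide whether to re-wrap its result, which is why they need their own tests.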

Some comments:

__getattr__ is only called when normal attribute lookup fails, i.e. when the attribute is not found in self.__dict__ (nor on the class or its bases). So you do not need the check if item in self.__dict__ in your __getattr__.
self.contained.__getattr__(item) calls self.contained's __getattr__ method directly. That is usually not what you want to do, because it circumvents the whole Python attribute lookup mechanism. For example, it ignores the possibility that the attribute could be in self.contained.__dict__, or in the __dict__ of one of the bases of self.contained.__class__, or that item refers to a descriptor. Instead, use getattr(self.contained, item).
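
A small standard-library sketch of both points, using a hypothetical Contained class with a class attribute and a property (a descriptor). Names already in the wrapper's own __dict__ never reach __getattr__, while getattr() on the contained object honours class attributes and descriptors. It also shows that an ordinary class defines no __getattr__ at all, so calling it directly would raise AttributeError (pandas' DataFrame happens to define one, which is why the question's code appears to work there):

```python
class Contained:
    """Hypothetical stand-in for the wrapped class."""

    greeting = "hello"  # class attribute: lives in Contained.__dict__

    @property
    def shout(self):  # a property is a descriptor defined on the class
        return self.greeting.upper()


class Container:
    def __init__(self):
        self.contained = Contained()
        self.calls = []  # names that actually reached __getattr__

    def __getattr__(self, item):
        # Only runs after normal lookup (instance __dict__, class, bases) fails.
        self.calls.append(item)
        return getattr(self.contained, item)


c = Container()
print(c.contained is not None, c.calls)  # True []  (both found normally)
print(c.greeting, c.shout)               # hello HELLO
print(c.calls)                           # ['greeting', 'shout']

# Contained, like most classes, defines no __getattr__ at all, so
# self.contained.__getattr__(item) would fail with AttributeError.
print(hasattr(c.contained, "__getattr__"))  # False
```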

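import pandas
import numpy as np

def tocontainer(func):
    # Wrap a bound method so that its result, whatever its type
    # (DataFrame, Series, scalar, even None), comes back as a Container.
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return Container(result)
    return wrapper

class Container(object):
    def __init__(self, df):
        self.contained = df

    def __getitem__(self, item):
        # Special methods such as __getitem__ are looked up on the class,
        # so indexing has to be forwarded explicitly. Only results of the
        # contained object's type (DataFrame) are re-wrapped.
        result = self.contained[item]
        if isinstance(result, type(self.contained)):
            result = Container(result)
        return result

    def __getattr__(self, item):
        # Reached only when normal lookup on the Container fails;
        # methods are wrapped so that their results become Containers.
        result = getattr(self.contained, item)
        if callable(result):
            result = tocontainer(result)
        return result

    def __repr__(self):
        # Used by repr() and, since no __str__ is defined, by print().
        return repr(self.contained)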
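
One possible way to tighten this design with respect to the caveats at the top of this answer is sketched below, under stated assumptions: re-wrap only results whose type matches the contained object (as __getitem__ already does), and refuse to delegate the name contained and dunder names. The second point matters for serialization: pickle rebuilds instances without calling __init__ and then asks for __setstate__, so the original __getattr__ looks up self.contained, which is missing, which calls __getattr__ again, and so on until RecursionError. The _wrap helper is a hypothetical name, and a list stands in for the DataFrame so the sketch runs without pandas:

```python
import functools
import pickle


class Container:
    """Delegating wrapper; a list stands in for a DataFrame in this sketch."""

    def __init__(self, contained):
        self.contained = contained

    def _wrap(self, value):
        # Hypothetical helper: re-wrap only values of the contained object's
        # type (DataFrame -> Container); scalars, Series, None pass through.
        if isinstance(value, type(self.contained)):
            return Container(value)
        return value

    def __getattr__(self, item):
        # pickle creates instances without calling __init__, so 'contained'
        # may be missing: raising AttributeError here (instead of looking
        # up self.contained again) avoids infinite recursion. Dunder names
        # such as __getstate__/__setstate__ are not forwarded, so pickle
        # never picks up the contained object's hooks.
        if item == "contained" or (item.startswith("__") and item.endswith("__")):
            raise AttributeError(item)
        result = getattr(self.contained, item)
        if callable(result):
            @functools.wraps(result)
            def wrapper(*args, **kwargs):
                return self._wrap(result(*args, **kwargs))
            return wrapper
        return self._wrap(result)

    def __getitem__(self, item):
        return self._wrap(self.contained[item])

    def __repr__(self):
        return f"Container({self.contained!r})"


c = Container([3, 1, 2])
print(c.count(1))      # 1 (an int, returned as-is)
print(c.copy())        # Container([3, 1, 2]) (a list, so re-wrapped)
print(c[0], c[0:2])    # 3 Container([3, 1])

restored = pickle.loads(pickle.dumps(c))
print(restored)        # Container([3, 1, 2])
```

The trade-off is that results of any other type come back bare: with a real DataFrame, df.groupby(...) would return a plain GroupBy object, so chains like the groupby/aggregate call in the test code would no longer produce Containers. Explicit dunder access such as c.__len__() is no longer forwarded either.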

Here is some random code to test whether, at least superficially, Container delegates to DataFrames properly and returns Containers:

df = pandas.DataFrame(
    [(1, 2), (1, 3), (1, 4), (2, 1),(2,2,)], columns=['col1', 'col2'])
df = Container(df)
df['col1'][3] = 0  # chained assignment; with pandas Copy-on-Write enabled this does not modify df
print(df)
#    col1  col2
# 0     1     2
# 1     1     3
# 2     1     4
# 3     2     1
# 4     2     2
gp = df.groupby('col1').aggregate(np.count_nonzero)
print(gp)
#       col2
# col1      
# 1        3
# 2        2
print(type(gp))
# <class '__main__.Container'>

print(type(gp[gp.col2 > 2]))
# <class '__main__.Container'>

tf = gp[gp.col2 > 2].reset_index()
print(type(tf))
# <class '__main__.Container'>

result = df[df.col1.isin(tf.col1)]  # == between differently-indexed Series raises ValueError in modern pandas
print(type(result))
# <class '__main__.Container'>

Answer 2

Answered by Waylon Walker

I found unutbu's answer very useful for my own application, but I ran into issues displaying it properly in a Jupyter notebook. Adding the following methods to the class solved the issue.

def _repr_html_(self):
    return self.contained._repr_html_()

def _repr_latex_(self):
    return self.contained._repr_latex_()
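
Why defining these hooks on the class helps can be sketched without pandas or Jupyter: with the first answer's __getattr__, the _repr_html_ hook is found on the contained object but passed through tocontainer, so calling it yields a Container instead of the HTML string a notebook front end expects. Defining the method on the class means normal lookup finds it and __getattr__ is never consulted. FakeFrame and DisplayContainer are hypothetical names for this illustration:

```python
def tocontainer(func):
    # Same blanket wrapping as in the first answer.
    def wrapper(*args, **kwargs):
        return Container(func(*args, **kwargs))
    return wrapper


class FakeFrame:
    """Hypothetical stand-in for a DataFrame's notebook display hook."""

    def _repr_html_(self):
        return "<table></table>"


class Container:
    def __init__(self, contained):
        self.contained = contained

    def __getattr__(self, item):
        result = getattr(self.contained, item)
        if callable(result):
            result = tocontainer(result)
        return result


class DisplayContainer(Container):
    # The fix from this answer: the hook lives on the class, so normal
    # lookup finds it and tocontainer never wraps its result.
    def _repr_html_(self):
        return self.contained._repr_html_()


print(type(Container(FakeFrame())._repr_html_()).__name__)         # Container
print(type(DisplayContainer(FakeFrame())._repr_html_()).__name__)  # str
```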
