Original URL: http://stackoverflow.com/questions/24928306/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
pandas Immutable DataFrame
Asked by sanguineturtle
I am interested in an immutable DataFrame to use in a program as a reference table. It should have read-only properties enforced after it has been initially constructed (which in my case happens during a class's __init__() method).
I see that Index objects are frozen.
Is there a way to make an entire DataFrame immutable?
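The "frozen" behaviour mentioned in the question is easy to see directly. A pandas Index raises TypeError on item assignment, while a DataFrame accepts the same kind of write in place. The sketch below demonstrates this; the variable names are illustrative, and the exact error message may differ between pandas versions.

```python
import pandas as pd

# Index objects refuse item assignment
idx = pd.Index([1, 2, 3])
index_is_frozen = False
try:
    idx[0] = 5
except TypeError as exc:
    index_is_frozen = True
    print("TypeError:", exc)

# A DataFrame, by contrast, accepts the same kind of write in place
df = pd.DataFrame({"a": [1, 2, 3]})
df.loc[0, "a"] = 5
print(df.loc[0, "a"])
```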
Answer by Joop
Try something like this:
import pandas as pd

class Bla(object):
    def __init__(self):
        self._df = pd.DataFrame(index=[1, 2, 3])

    @property
    def df(self):
        # Return a copy so callers cannot modify the internal DataFrame
        return self._df.copy()
This lets you get the df back via b.df, but you will not be able to assign to it. In short, the class holds a df that behaves as an "immutable DataFrame" only in the sense that it blocks changes to the original. The returned object is still a mutable DataFrame, though, so it will not behave like an immutable one in other ways. For example, you cannot use it as a dictionary key.
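Here is a quick check of what the property approach does and does not guarantee. This sketch repeats the Bla class with one column added (the column x is hypothetical) so there is something to modify.

```python
import pandas as pd

class Bla(object):
    def __init__(self):
        # Column "x" is added only for this demonstration
        self._df = pd.DataFrame({"x": [1, 2, 3]}, index=[1, 2, 3])

    @property
    def df(self):
        # Defensive copy: callers never see self._df itself
        return self._df.copy()

b = Bla()
view = b.df
view.loc[1, "x"] = 100          # modifies only the copy
print(b._df.loc[1, "x"])        # the internal frame still holds 1

assign_blocked = False
try:
    b.df = pd.DataFrame()       # the property defines no setter
except AttributeError:
    assign_blocked = True

unhashable = False
try:
    {b.df: "value"}             # still an ordinary, unhashable DataFrame
except TypeError:
    unhashable = True
```

Note that a new copy is made on every access to b.df, which may matter for large tables.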
Answer by flexatone
The StaticFrame package (of which I am an author) implements a Pandas-like interface and many common Pandas operations, while enforcing immutability both in the underlying NumPy arrays and in its Series and Frame containers.
You can make an entire Pandas DataFrame immutable by converting it to a StaticFrame Frame with static_frame.Frame.from_pandas(df). You can then use it as a truly read-only table.
See the StaticFrame documentation of this method: https://static-frame.readthedocs.io/en/latest/api_creation.html#static_frame.Series.from_pandas
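The answer does not show how StaticFrame enforces immutability internally. To illustrate what a read-only NumPy array is, the sketch below uses plain NumPy and its writeable flag. It does not use StaticFrame. The assumption is that a comparable flag-based mechanism is what "immutability in underlying NumPy arrays" refers to.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Take an independent copy of the data, then mark it read-only
arr = df.to_numpy(copy=True)
arr.flags.writeable = False

write_blocked = False
try:
    arr[0, 0] = 99
except ValueError as exc:
    write_blocked = True
    print("ValueError:", exc)
```

Because arr is a copy, the original DataFrame is unaffected and stays writable. Only the array is protected.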
Answer by deinonychusaur
If you truly want to make the DataFrame behave as immutable, instead of using the copy solution by @Joop (which I would recommend), you could build upon the following structure.
Note that it is just a starting point.
It is basically a proxy data object that hides everything that would change the state, and that allows itself to be hashed. All instances wrapping the same original data get the same hash. There are probably modules that do the below in cooler ways, but I figured it could be educational as an example.
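The hashing idea can be seen in isolation. Hashing the string representation of two separately built DataFrames with equal contents gives the same value, and different visible contents give a different value. This sketch uses hashlib.md5, because the stand-alone md5 module exists only in Python 2. repr_hash is a hypothetical helper name.

```python
import hashlib

import pandas as pd

def repr_hash(obj):
    # Hypothetical helper: integer hash of the object's string representation
    return int(hashlib.md5(str(obj).encode("utf-8")).hexdigest(), 16)

df1 = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
df2 = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
df3 = pd.DataFrame({"a": [1, 3], "b": ["x", "y"]})

print(df1 is df2)                        # distinct objects
print(repr_hash(df1) == repr_hash(df2))  # same data, same hash
print(repr_hash(df1) == repr_hash(df3))  # different data, different hash
```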
Some warnings:
- Depending on how the string representation of the proxied object is constructed, two different proxied objects could get the same hash. However, the implementation is compatible with DataFrames, among other objects.
- Changes to the original object will affect the proxy object.
- Equality will lead to some nasty infinite recursions if the other object tosses the equality question back (this is why list has a special case).
- The DataFrame proxy maker helper is just a start. The problem is that any method that changes the state of the original object must be disallowed, be manually overridden by the helper, or be masked entirely via the extraFilter parameter when instantiating _ReadOnly. See DataFrameProxy.sort.
- The proxies won't show as derived from the proxied object's type.
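The first caveat is easy to trigger with a DataFrame longer than pandas' display limit. The string form then shows only the first and last few rows, so a change in the middle is invisible to a hash built from str(obj). In the sketch below, the display options are set explicitly so the truncation is predictable, and repr_hash is a hypothetical helper.

```python
import hashlib

import pandas as pd

def repr_hash(obj):
    # Hypothetical helper: integer hash of the object's string representation
    return int(hashlib.md5(str(obj).encode("utf-8")).hexdigest(), 16)

df1 = pd.DataFrame({"a": range(100)})
df2 = df1.copy()
df2.loc[50, "a"] = 77  # a middle row that the truncated repr will not show

with pd.option_context("display.max_rows", 60, "display.min_rows", 10):
    same_text = str(df1) == str(df2)
    same_hash = repr_hash(df1) == repr_hash(df2)

print(df1.equals(df2))       # the data differ...
print(same_text, same_hash)  # ...but the string form and its hash do not
```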
The Generic Read Only Proxy
This could be used on any object.
import hashlib
import warnings


class _ReadOnly(object):

    def __init__(self, obj, extraFilter=tuple()):
        # __setattr__ is blocked below, so store state directly in __dict__
        self.__dict__['_obj'] = obj
        self.__dict__['_d'] = None
        self.__dict__['_extraFilter'] = extraFilter
        # Hash of the string representation (see the warnings above);
        # the md5 module is Python 2 only, hashlib works in Python 3
        self.__dict__['_hash'] = int(
            hashlib.md5(str(obj).encode('utf-8')).hexdigest(), 16)

    @staticmethod
    def _cloak(obj):
        # Hashable objects are returned as-is, unhashable ones get wrapped
        try:
            hash(obj)
            return obj
        except TypeError:
            return _ReadOnly(obj)

    def __getitem__(self, value):
        return _ReadOnly._cloak(self._obj[value])

    def __setitem__(self, key, value):
        raise TypeError(
            "{0} has a _ReadOnly proxy around it".format(type(self._obj)))

    def __delitem__(self, key):
        raise TypeError(
            "{0} has a _ReadOnly proxy around it".format(type(self._obj)))

    def __getattr__(self, value):
        # Only called when normal lookup fails, i.e. for names of the wrapped object
        if value in self.__dir__():
            return _ReadOnly._cloak(getattr(self._obj, value))
        elif value in dir(self._obj):
            raise AttributeError("{0} attribute {1} is cloaked".format(
                type(self._obj), value))
        else:
            raise AttributeError("{0} has no {1}".format(
                type(self._obj), value))

    def __setattr__(self, key, value):
        raise TypeError(
            "{0} has a _ReadOnly proxy around it".format(type(self._obj)))

    def __delattr__(self, key):
        raise TypeError(
            "{0} has a _ReadOnly proxy around it".format(type(self._obj)))

    def __dir__(self):
        # Hide set* methods and anything listed in extraFilter (cached)
        if self._d is None:
            self.__dict__['_d'] = [
                i for i in dir(self._obj) if not i.startswith('set')
                and i not in self._extraFilter]
        return self._d

    def __repr__(self):
        return self._obj.__repr__()

    def __call__(self, *args, **kwargs):
        if callable(self._obj):
            return self._obj(*args, **kwargs)
        else:
            raise TypeError("{0} not callable".format(type(self._obj)))

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        try:
            return hash(self) == hash(other)
        except TypeError:
            if isinstance(other, list):
                try:
                    # Compare element-wise; zip() alone yields truthy tuples
                    return all(a == b for a, b in zip(self, other))
                except Exception:
                    return False
            return other == self
The DataFrame proxy
It should really be extended with more overrides like sort, and it should filter out every other state-changing method you don't need.
You can either instantiate it with a DataFrame instance as the only argument, or pass it the same arguments you would use to create a DataFrame.
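import pandas as pd


class DataFrameProxy(_ReadOnly):

    # State-changing methods masked entirely (extend as needed)
    EXTRA_FILTER = ('drop', 'drop_duplicates', 'dropna')

    def __init__(self, *args, **kwargs):
        # Wrap a single DataFrame argument, otherwise build one from the arguments
        if (len(args) == 1 and
                not len(kwargs) and
                isinstance(args[0], pd.DataFrame)):
            super(DataFrameProxy, self).__init__(args[0],
                                                 DataFrameProxy.EXTRA_FILTER)
        else:
            super(DataFrameProxy, self).__init__(pd.DataFrame(*args, **kwargs),
                                                 DataFrameProxy.EXTRA_FILTER)

    def sort_values(self, *args, inplace=False, **kwargs):
        # Never sort in place: always return a new, sorted DataFrame.
        # DataFrame.sort() no longer exists in current pandas; sort_values() replaces it.
        if inplace:
            warnings.warn("Inplace sorting overridden")
        return self._obj.sort_values(*args, **kwargs)

    # Keep the old name available as well
    sort = sort_values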
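DataFrameProxy decides between wrapping an existing DataFrame and building a new one from constructor arguments. The type check must look at args[0], because args itself is always a tuple and isinstance(args, pd.DataFrame) is therefore never true. The sketch below isolates that dispatch as a hypothetical helper named as_dataframe.

```python
import pandas as pd

def as_dataframe(*args, **kwargs):
    # Hypothetical helper mirroring the proxy's constructor dispatch:
    # reuse a lone DataFrame argument, otherwise build a new DataFrame
    if len(args) == 1 and not kwargs and isinstance(args[0], pd.DataFrame):
        return args[0]
    return pd.DataFrame(*args, **kwargs)

existing = pd.DataFrame({"a": [1, 2]})
print(as_dataframe(existing) is existing)            # wrapped as-is
print(as_dataframe({"a": [1, 2]}).equals(existing))  # built from arguments
```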
Finally:
However, though it was fun making this contraption, why not simply have a DataFrame that you don't alter? If it is only exposed to you, you are better off just making sure not to alter it yourself...

