Notice: this page reproduces a popular StackOverflow question (originally published as a Chinese-English parallel translation) under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/30198973/

Time: 2020-09-13 23:20:03  Source: igfitidea

Python how to index multidimensional array with string key, like a dict

Tags: python, numpy, dictionary, indexing, pandas

Asked by ericksonla

I would like to combine the functionality of numpy's array with native Python's dict, namely creating a multidimensional array that can be indexed with strings.

For example, I would like to be able to do this:

dict_2d = {'a': {'x': 1, 'y': 2},
           'b': {'x': 3, 'y': 4}}
print(dict_2d['a', 'y'])  # desired result: 2 (a plain dict raises KeyError here)

I know I could do dict_2d['a']['x'], but long term I'd like to be able to treat them like numpy arrays, including doing matrix multiplication and such, and that's not possible with layered dicts.

It's also not that hard to write a simple version of the class myself, where the class just converts all the strings to int indexes and then uses numpy, but I'd like to use something that already exists if possible.

Edit: I don't need incredible performance. I'll be working with maybe 10x10 arrays. My goal is to make writing the code simple and robust. Working with numpy arrays is not really much different than just writing it in Fortran. I've spent enough of my life tracking down Fortran indexing errors...

Accepted answer by BrenBarn

You may be looking for pandas, which provides handy datatypes that wrap numpy arrays, allowing you to access rows and columns by name instead of just by number.
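As a rough illustration of what that could look like for the question's data, here is a minimal sketch. The variable names (`df`, `value`, `product`) are made up for the example, and it assumes a reasonably recent pandas that provides `DataFrame.to_numpy()`.

```python
import pandas as pd

# Same nested dict as in the question
dict_2d = {'a': {'x': 1, 'y': 2},
           'b': {'x': 3, 'y': 4}}

# orient='index' makes the outer keys ('a', 'b') the row labels
# and the inner keys ('x', 'y') the column labels
df = pd.DataFrame.from_dict(dict_2d, orient='index')

# Label-based lookup: row label first, then column label
value = df.loc['a', 'y']

# The values are still available as a plain NumPy array,
# so matrix multiplication works as usual
product = df.to_numpy() @ df.to_numpy()
```

The matrix product is taken on the underlying array rather than on `df` itself because pandas aligns labels when multiplying DataFrames, and here the column labels ('x', 'y') do not match the row labels ('a', 'b').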

Answer by jsbueno

I dislike giving ready-made answers, but I think it would take much more time to explain it in English.

The basic idea for fetching objects the way numpy does is to customize the __getitem__ method. Comma-separated values are presented to the method as a tuple, so you then just use the values in the tuple, in sequence, as indexes into your nested dictionaries.
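The tuple behavior can be seen in isolation with a small sketch. `ShowKey` and `lookup` are hypothetical names used only for this illustration; they are not part of the answer's class.

```python
class ShowKey:
    """Toy class that simply returns whatever key it is indexed with."""
    def __getitem__(self, key):
        return key


probe = ShowKey()
single = probe['a']        # a single key arrives unchanged
pair = probe['a', 'y']     # comma-separated keys arrive as one tuple


def lookup(nested, key):
    """Walk nested dicts, using each element of a tuple key in turn."""
    if not isinstance(key, tuple):
        key = (key,)
    for subkey in key:
        nested = nested[subkey]
    return nested


dict_2d = {'a': {'x': 1, 'y': 2},
           'b': {'x': 3, 'y': 4}}
found = lookup(dict_2d, ('a', 'y'))
```

Here `pair` is the tuple `('a', 'y')`, and `lookup` performs the same walk that a custom `__getitem__` would.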

Beyond that, Python makes it easy to create fully functional dict equivalents with the collections.abc classes: if you implement a minimal set of methods (__getitem__, __setitem__, __delitem__, __iter__, __len__) when inheriting from collections[.abc].MutableMapping, all dictionary behavior is emulated. Then it is just a matter of iterating properly through the key components and creating new, empty, regular dictionaries to store the needed values.

try:
    # Python 3.3+: the abstract base classes live in collections.abc
    from collections.abc import MutableMapping
except ImportError:
    # Python 2 fallback
    from collections import MutableMapping

class NestedDict(MutableMapping):
    def __init__(self, *args, **kw):
        self.data = dict(*args, **kw)

    def get_last_key_levels(self, key, create=False):
        if not isinstance(key, tuple):
            key = (key,)
        current_data = self.data
        for subkey in key:
            previous = current_data
            current_data = current_data[subkey] if not create else current_data.setdefault(subkey, {})
        return previous, current_data, subkey

    def __getitem__(self, key):
        previous, current_data, lastkey = self.get_last_key_levels(key)
        return current_data

    def __setitem__(self, key, value):
        previous, current_data, lastkey = self.get_last_key_levels(key, True)
        previous[lastkey] = value

    def __delitem__(self, key):
        previous, current_data, lastkey = self.get_last_key_levels(key)
        del previous[lastkey]

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        return len(self.data)

    def __repr__(self):
        return "NestedDict({})".format(repr(self.data))

And you are set to go:

>>> from nesteddict import NestedDict  # assumes the class above is saved as nesteddict.py
>>> x = NestedDict(a={})
>>> x
NestedDict({'a': {}})
>>> x["a", "b"] = 10
>>> x
NestedDict({'a': {'b': 10}})
>>> x["a", "c", "e"] = 25
>>> x
NestedDict({'a': {'b': 10, 'c': {'e': 25}}})
>>> x["a", "c", "e"]
25

Please note that this is a high-level implementation. It will just work, but it comes nowhere near the optimization level you get with NumPy; quite the opposite. If you need to perform fast data operations on these objects, you could look into Cython, or resort to your idea of translating the dict keys to numeric keys and using NumPy (that approach could still borrow some ideas from this answer).
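The alternative mentioned above, translating string keys to integer positions and delegating storage to NumPy, might look roughly like the sketch below. `LabeledArray` and its attributes are hypothetical names invented for this example; it only handles two-dimensional, single-cell access.

```python
import numpy as np


class LabeledArray:
    """Hypothetical wrapper that maps string labels to integer positions."""

    def __init__(self, data, row_labels, col_labels):
        self.data = np.asarray(data)
        # Build label -> position lookups once, up front
        self.rows = {label: i for i, label in enumerate(row_labels)}
        self.cols = {label: j for j, label in enumerate(col_labels)}

    def __getitem__(self, key):
        row, col = key  # the key arrives as a (row_label, col_label) tuple
        return self.data[self.rows[row], self.cols[col]]

    def __setitem__(self, key, value):
        row, col = key
        self.data[self.rows[row], self.cols[col]] = value


arr = LabeledArray([[1, 2], [3, 4]], ['a', 'b'], ['x', 'y'])
cell = arr['a', 'y']             # looks up position (0, 1)
arr['b', 'x'] = 30               # writes to position (1, 0)
squared = arr.data @ arr.data    # ordinary NumPy matrix multiplication
```

Keeping the values in a single array means every NumPy operation remains available through `arr.data`, at the cost of requiring the labels to be fixed when the object is created.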

Answer by Ishan Tomar

Use pandas. Let's say the file is like this:

test.csv:

Params, Val1, Val2, Val3
Par1,23,58,412
Par2,56,45,123
Par3,47,89,984

So you can do something like this in python:

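import pandas as pd

# skipinitialspace=True strips the blanks after the commas in the header row,
# so the columns are named 'Val1', not ' Val1'
x = pd.read_csv('test.csv', index_col='Params', skipinitialspace=True)
print(x['Val1']['Par3'])      # column first, then row label: prints 47
print(x.loc['Par3', 'Val1'])  # same cell, row label first: prints 47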
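Beyond single cells, the same labels can select whole rows or columns. The sketch below reads the sample data from an in-memory string (via `io.StringIO`) instead of a file so that it is self-contained; the variable names are chosen only for illustration.

```python
import io

import pandas as pd

# Same content as test.csv above, held in memory for the example
csv_text = """Params, Val1, Val2, Val3
Par1,23,58,412
Par2,56,45,123
Par3,47,89,984
"""

x = pd.read_csv(io.StringIO(csv_text), index_col='Params',
                skipinitialspace=True)

row = x.loc['Par3']      # every value in the row labelled 'Par3'
col = x['Val2']          # every value in the column labelled 'Val2'
values = x.to_numpy()    # plain NumPy array for numerical work
```

`row` and `col` are pandas Series that keep their labels, while `values` drops the labels and behaves like any other 3x3 NumPy array.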