pandas python: storing dictionaries inside a dataframe

Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/16510492/

Date: 2020-09-13 20:49:09  Source: igfitidea

python, storing dictionaries inside a dataframe

python dictionary dataframe pandas

Asked by Sylvansight

I've built a pandas dataframe which is storing a simple dictionary in each cell. For example:

{'Sales':0,'Revenue':0}

I can retrieve a specific value from the dataframe via:

df[columnA][index100]['Revenue']

But now I'd like to plot a graph of all the Revenue values from the dictionaries in columnA. What is the best way of achieving this?

Would life be easier in the long run if I dropped the dictionaries and instead used two identically sized dataframes? (Am very new to pandas so not sure of best practice).

Accepted answer by BrenBarn

A simple way to get all the Revenue values from column A is df[columnA].map(lambda v: v['Revenue']).
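
As a minimal sketch of that one-liner in context (the sample values and the literal column name "columnA" are made up for illustration, they are not from the question):

```python
import pandas as pd

# Hypothetical data: one column whose cells are small dictionaries.
df = pd.DataFrame({
    "columnA": [
        {"Sales": 1, "Revenue": 10},
        {"Sales": 2, "Revenue": 25},
        {"Sales": 3, "Revenue": 40},
    ]
})

# Pull the 'Revenue' entry out of every dictionary in the column.
revenue = df["columnA"].map(lambda v: v["Revenue"])

print(revenue.tolist())
# revenue.plot() would then draw these values (requires matplotlib).
```

The result is an ordinary numeric Series, so plotting and arithmetic work on it as usual.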

Depending on what you're doing, life may indeed be easier if you tweak your structure a bit. For instance, you could use a hierarchical index with "Sales" and "Revenue" as the keys in one level.
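
One possible reading of the hierarchical-index suggestion, sketched with invented numbers and column names: keep plain numbers in the cells and put "Sales"/"Revenue" in a second column level. This is only one way to lay it out, not necessarily the layout the answer had in mind.

```python
import pandas as pd

# Hypothetical layout: outer level = original column, inner level = metric.
columns = pd.MultiIndex.from_product(
    [["columnA", "columnB"], ["Sales", "Revenue"]]
)
df = pd.DataFrame(
    [[1, 10, 4, 30],
     [2, 25, 5, 45]],
    columns=columns,
)

# One cell, the counterpart of df[columnA][index]['Revenue'] in the question.
single = df.loc[0, ("columnA", "Revenue")]

# Every 'Revenue' value, one column per original column.
revenue = df.xs("Revenue", axis=1, level=1)
print(revenue)
```

With this structure `revenue.plot()` needs no per-cell lambda, since the values are already stored as numeric columns.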

Answer by Andy Hayden

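For the majority of use cases it's not a good idea to store dictionaries in a DataFrame.
Another data structure worth mentioning is a Panel (note that Panel was deprecated in pandas 0.20 and removed in 0.25, so the Panel code below only runs on older versions).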
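
To illustrate one reason dictionaries in cells get awkward (a small sketch with made-up values): pandas stores them as opaque Python objects, so numeric operations only become available once the dictionaries are expanded into real columns.

```python
import pandas as pd

# Hypothetical column of dictionaries.
cells = pd.Series([{"a": 1, "b": 2}, {"a": 5, "b": 6}])
print(cells.dtype)  # object dtype: pandas cannot see inside the dicts

# Expanding the dictionaries gives one numeric column per key.
expanded = pd.DataFrame(cells.tolist(), index=cells.index)
print(expanded.dtypes)

# Vectorised operations now work directly.
total = expanded["a"].sum()
```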

Suppose you have a DataFrame of dictionaries (with fairly consistent keys):

In [11]: df = pd.DataFrame([[{'a': 1, 'b': 2}, {'a': 3, 'b': 4}], [{'a': 5, 'b': 6}, {'a': 7, 'b': 8}]], columns=list('AB'))

In [12]: df
Out[12]:
                  A                 B
0  {'a': 1, 'b': 2}  {'a': 3, 'b': 4}
1  {'a': 5, 'b': 6}  {'a': 7, 'b': 8}

You can create a Panel (note there are more direct/preferable ways to construct this!):

In [13]: wp = pd.Panel({'A': df['A'].apply(pd.Series), 'B': df['B'].apply(pd.Series)})

In [14]: wp
Out[14]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 2 (major_axis) x 2 (minor_axis)
Items axis: A to B
Major_axis axis: 0 to 1
Minor_axis axis: a to b

Sections of it can be accessed efficiently as DataFrames in a variety of ways, for example:

In [15]: wp.A
Out[15]:
   a  b
0  1  2
1  5  6

In [16]: wp.minor_xs('a')
Out[16]:
   A  B
0  1  3
1  5  7

In [17]: wp.major_xs(0)
Out[17]:
   A  B
a  1  3
b  2  4
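
On pandas versions where Panel is no longer available, the same three views can be approximated with a DataFrame that has two column levels. This is a sketch of a substitute technique, not part of the original answer, and the name `wide` is invented here:

```python
import pandas as pd

df = pd.DataFrame(
    [[{'a': 1, 'b': 2}, {'a': 3, 'b': 4}],
     [{'a': 5, 'b': 6}, {'a': 7, 'b': 8}]],
    columns=list('AB'),
)

# Expand each column of dicts, then concatenate side by side; the dict keys
# 'A'/'B' become the outer column level and 'a'/'b' the inner one.
wide = pd.concat(
    {col: pd.DataFrame(df[col].tolist(), index=df.index) for col in df.columns},
    axis=1,
)

item_A = wide["A"]                          # counterpart of wp.A
minor_a = wide.xs("a", axis=1, level=1)     # counterpart of wp.minor_xs('a')
major_0 = wide.loc[0].unstack(level=0)      # counterpart of wp.major_xs(0)
```

Each of the three results is a regular DataFrame, so something like `item_A.plot()` should behave much like `wp.A.plot()` does in the Panel version.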

So you can do all the pandas DataFrame whizziness:

In [18]: wp.A.plot()  # easy!
Out[18]: <matplotlib.axes.AxesSubplot at 0x1048342d0>

There are also ("experimental") higher dimensional Panels.
