
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/37950087/

Date: 2020-09-14 01:26:13  Source: igfitidea

Pandas: Storing a DataFrame object inside another DataFrame i.e. nested DataFrame

Tags: python, pandas, dataframe

Asked by wolframalpha

I want to store a DataFrame object as the value of one column in a row. Here's a simplified analogy of what I want to achieve.

>>> import pandas as pd
>>> df = pd.DataFrame([[1,2,3],[2,4,6]], columns=list('DEF'))
>>> df
   D  E  F
0  1  2  3
1  2  4  6

I created a new DataFrame and tried to add a new column on the fly by inserting the new DataFrame object as the value of that column. Please refer to the code.

>>> df_in_df = pd.DataFrame([[11,13,17],[19, 23, 31]], columns=list('XYZ'))
>>> df.loc[df['F'] == 6, 'G'] = df_in_df
>>> df
   D  E  F   G
0  1  2  3 NaN
1  2  4  6 NaN
>>> df.loc[df['F'] == 6, 'G'].item()
nan
>>> # But the below works fine, i.e. when I insert an integer
>>> df.loc[df['F'] == 6, 'G'] = 4
>>> df
   D  E  F    G
0  1  2  3  NaN
1  2  4  6  4.0
>>> # and to verify
>>> df.loc[df['F'] == 6, 'G'].item()
4.0

BTW, I have managed to find a workaround for this by pickling the DataFrame into a string, but I don't feel good about it:

>>> import pickle
>>> df.loc[df['F'] == 6, 'G'] = pickle.dumps(df_in_df)
>>> df
   D  E  F                                                  G
0  1  2  3                                                NaN
1  2  4  6  ccopy_reg\n_reconstructor\np0\n(cpandas.core.f...

>>> revive_df_from_df = pickle.loads(df.loc[df['F'] == 6, 'G'].item())
>>> revive_df_from_df
    X   Y   Z
0  11  13  17
1  19  23  31

I started using pandas only today, after going through "pandas in 10 mins", so I don't know the conventions. Any better ideas? Thanks!

Answered by Merlin

Create a dict first:

import pandas as pd

x = pd.DataFrame()

y =  {'a':[5,4,5],'b':[6,9,7], 'c':[7,3,x]}

# {'a': [5, 4, 5], 'b': [6, 9, 7], 'c': [7, 3, Empty DataFrame
#   Columns: []
#   Index: []]}

z = pd.DataFrame(y)

#   a  b                                      c
# 0  5  6                                      7
# 1  4  9                                      3
# 2  5  7  Empty DataFrame
# Columns: []
# Index: []

(Or convert the DataFrame to a dict and try to insert that. A lot happens when pandas creates objects. You are torturing pandas. Your use case implies nested dicts; I would use those.)
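One way to read the nested-dict suggestion is to keep the inner frames outside the outer DataFrame, in a plain dict keyed by row label. The following is only a minimal sketch under that assumption; the `nested` dict and the variable names are hypothetical and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 4, 6]], columns=list('DEF'))
df_in_df = pd.DataFrame([[11, 13, 17], [19, 23, 31]], columns=list('XYZ'))

# Hypothetical side store: maps a row label of df to its nested DataFrame.
nested = {}
for label in df.index[df['F'] == 6]:
    nested[label] = df_in_df

# Look the nested frame up again through the same row selection.
row_label = df.index[df['F'] == 6][0]
revived = nested[row_label]
```

The outer frame keeps ordinary scalar columns, so pandas never has to guess how a DataFrame-valued cell should be aligned or broadcast.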

Answered by piRSquared

You are on shaky ground relying on this behavior. pandas does a lot of work trying to infer what you mean or want when you pass array-like things to its constructors and assignment functions. This is pressing on those boundaries, seemingly intentionally.

It seems that direct assignment via loc doesn't work. This is a workaround I've found. Again, I would not expect this behavior to be robust across pandas versions.

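import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,3],[2,4,6]], columns=list('DEF'))

df_in_df = pd.DataFrame([[11,13,17],[19, 23, 31]], columns=list('XYZ'))

# Create column G first, then map the selected cell to the DataFrame object.
# Note: applymap is deprecated since pandas 2.1 in favor of DataFrame.map.
df.loc[df['F'] == 6, 'G'] = np.nan
df.loc[df['F'] == 6, 'G'] = df.loc[df['F'] == 6, ['G']].applymap(lambda x: df_in_df)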

df

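An alternative that avoids assigning through loc altogether is to build the new column as a NumPy object array and fill its slots one by one. This is a sketch of a different technique from the one in the answer above, and the names `cells` and `revived` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 4, 6]], columns=list('DEF'))
df_in_df = pd.DataFrame([[11, 13, 17], [19, 23, 31]], columns=list('XYZ'))

# An object array holds one arbitrary Python object per slot (None by default).
cells = np.empty(len(df), dtype=object)

# Assigning with a single integer index stores the object itself,
# so the DataFrame is not unpacked or aligned.
for pos in np.flatnonzero((df['F'] == 6).to_numpy()):
    cells[pos] = df_in_df

df['G'] = cells

# Read the nested frame back by position.
revived = df['G'].iloc[1]
```

Filling the array slot by slot keeps the DataFrame from being treated as array-like data, which is the inference the answer warns about.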

Answered by Vincent Appiah

First create the column where you want to insert the dictionary. Then convert your dictionary to a string using the repr function and insert that string into the column. To query it later, first select the string, then use eval on it to convert it back to a dictionary.
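The steps above might look roughly like the sketch below. It assumes the nested data is a plain dict of Python literals, and it swaps `eval` for `ast.literal_eval`, which only parses literals and is the safer choice for strings read back from data. The `nested` dict and variable names are illustrative.

```python
import ast
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 4, 6]], columns=list('DEF'))
nested = {'X': [11, 19], 'Y': [13, 23], 'Z': [17, 31]}

# Create the column first, then store the dict as its repr string.
df['G'] = None
df.loc[df['F'] == 6, 'G'] = repr(nested)

# Select the string and parse it back into a dict.
text = df.loc[df['F'] == 6, 'G'].item()
restored = ast.literal_eval(text)

# Rebuild a DataFrame from the restored dict if needed.
restored_df = pd.DataFrame(restored)
```

This round trip only works for values whose repr is a valid Python literal, so it suits small dicts of numbers and strings rather than arbitrary objects.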