Pandas: Storing a DataFrame object inside another DataFrame, i.e. a nested DataFrame
Source: http://stackoverflow.com/questions/37950087/ (Stack Overflow). This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep the same license.
Asked by wolframalpha
I want to store a DataFrame object as the value of a column in a row. Here's a simplified analogy of what I want to achieve:
>>> import pandas as pd
>>> df = pd.DataFrame([[1,2,3],[2,4,6]], columns=list('DEF'))
>>> df
   D  E  F
0  1  2  3
1  2  4  6
I created a new DataFrame and tried to add a new column on the fly by inserting the new DataFrame object as the value of that column. Please refer to the code:
>>> df_in_df = pd.DataFrame([[11,13,17],[19, 23, 31]], columns=list('XYZ'))
>>> df.loc[df['F'] == 6, 'G'] = df_in_df
>>> df
   D  E  F   G
0  1  2  3 NaN
1  2  4  6 NaN
>>> df.loc[df['F'] == 6, 'G'].item()
nan
>>> # But the below works fine, i.e. when I insert an integer
>>> df.loc[df['F'] == 6, 'G'] = 4
>>> df
   D  E  F    G
0  1  2  3  NaN
1  2  4  6  4.0
>>> # and to verify
>>> df.loc[df['F'] == 6, 'G'].item()
4.0
By the way, I have found a workaround by pickling the DataFrame into a string, but I don't feel good about it:
>>> import pickle
>>> df.loc[df['F'] == 6, 'G'] = pickle.dumps(df_in_df)
>>> df
   D  E  F                                                  G
0  1  2  3                                                NaN
1  2  4  6  ccopy_reg\n_reconstructor\np0\n(cpandas.core.f...
>>> revive_df_from_df = pickle.loads(df.loc[df['F'] == 6, 'G'].item())
>>> revive_df_from_df
    X   Y   Z
0  11  13  17
1  19  23  31
I only started using pandas today, after going through "10 Minutes to pandas", so I don't know the conventions. Any better ideas? Thanks!
Answered by Merlin
Create a dict first:
import pandas as pd

x = pd.DataFrame()
y = {'a':[5,4,5],'b':[6,9,7], 'c':[7,3,x]}
# {'a': [5, 4, 5], 'b': [6, 9, 7], 'c': [7, 3, Empty DataFrame
# Columns: []
# Index: []]}
z = pd.DataFrame(y)
# a b c
# 0 5 6 7
# 1 4 9 3
# 2 5 7 Empty DataFrame
# Columns: []
# Index: []
(Or convert the DataFrame to a dict and try to insert that. A lot happens when pandas creates objects, and you are torturing pandas here. Your use case implies nested dicts, so I would use those.)
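The dict-based suggestion above can be sketched as follows. This is only an illustration under assumptions: it reuses the `df` and `df_in_df` frames from the question, builds the new column from a plain Python list, and chooses `to_dict(orient="list")` as the storage form. None of these choices come from the answer itself.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 4, 6]], columns=list("DEF"))
df_in_df = pd.DataFrame([[11, 13, 17], [19, 23, 31]], columns=list("XYZ"))

# Build the whole column as a Python list: rows where F == 6 get the inner
# frame converted to a dict of column -> list, all other rows get None.
df["G"] = [df_in_df.to_dict(orient="list") if f == 6 else None for f in df["F"]]

# Pull the stored dict back out and rebuild a DataFrame from it.
stored = df.loc[df["F"] == 6, "G"].iloc[0]
restored = pd.DataFrame(stored)
```

Storing a dict keeps the cell contents as ordinary Python data, at the cost of losing the inner frame's index, which `orient="list"` does not preserve.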
Answered by piRSquared
You are on shaky ground relying on this behavior. pandas does a lot of work trying to infer what you mean or want when you pass array-like things to its constructors and assignment functions. This is pressing on those boundaries, seemingly intentionally.
It seems that direct assignment via loc doesn't work. This is a workaround I've found. Again, I would not expect this behavior to be robust across pandas versions.
import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,3],[2,4,6]], columns=list('DEF'))
df_in_df = pd.DataFrame([[11,13,17],[19, 23, 31]], columns=list('XYZ'))
df.loc[df['F'] == 6, 'G'] = np.nan
df.loc[df['F'] == 6, 'G'] = df.loc[df['F'] == 6, ['G']].applymap(lambda x: df_in_df)
df
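Since `applymap` has been deprecated in newer pandas releases in favor of `DataFrame.map`, a different route may be worth sketching. The example below is not the answerer's method. It assumes it is acceptable to build the column yourself as a NumPy object array, so pandas never has to guess how to align or broadcast the inner frame. The names `col` and `inner` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 4, 6]], columns=list("DEF"))
df_in_df = pd.DataFrame([[11, 13, 17], [19, 23, 31]], columns=list("XYZ"))

# Pre-allocate an object-dtype array (filled with None) with one slot per row.
col = np.empty(len(df), dtype=object)

# Put the inner frame into each slot whose row satisfies F == 6.
# Integer indexing on an object array stores the Python object itself.
for pos in np.flatnonzero((df["F"] == 6).to_numpy()):
    col[pos] = df_in_df

# Attach the prepared array as a new column.
df["G"] = col

# Retrieve the nested frame from the matching row.
inner = df.loc[df["F"] == 6, "G"].iloc[0]
```

The same caveat applies here: holding DataFrames in cells is outside what pandas is designed for, so behavior such as display, copying, and serialization of this column may vary between versions.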
Answered by Vincent Appiah
First create the column where you want to insert the dictionary. Then convert your dictionary to a string using the repr function and insert that string into your column. If you want to query the stored value, first select the string and then use eval on it to convert it back to a dictionary.
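A minimal sketch of that round trip, under a few assumptions: the frame and the dictionary contents are borrowed from the question for illustration, the column is created with an empty-string placeholder, and `ast.literal_eval` is swapped in for plain `eval` because it only parses Python literals rather than executing arbitrary code.

```python
import ast
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 4, 6]], columns=list("DEF"))
inner = {"X": [11, 19], "Y": [13, 23], "Z": [17, 31]}  # illustrative data

# Step 1: create the column first, using an empty string as a placeholder.
df["G"] = ""

# Step 2: store the dictionary as its repr() string in the matching row.
df.loc[df["F"] == 6, "G"] = repr(inner)

# Step 3: select the string and parse it back into a dictionary.
text = df.loc[df["F"] == 6, "G"].iloc[0]
restored = ast.literal_eval(text)
```

This only works for values whose repr is a valid Python literal (dicts, lists, numbers, strings). The repr of a DataFrame cannot be parsed back this way, so a frame would first have to be converted with something like `to_dict()`.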