Original question: http://stackoverflow.com/questions/39640936/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not to this page): Stack Overflow
parsing a dictionary in a pandas dataframe cell into new row cells (new columns)
Asked by r_g_r
I have a pandas DataFrame in which one column holds a dictionary of key:value pairs in each cell, like this:
{"name":"Test Thorton","company":"Test Group","address":"10850 Test #325\r\n","city":"Test City","state_province":"CA","postal_code":"95670","country":"USA","email_address":"[email protected]","phone_number":"999-888-3333","equipment_description":"I'm a big red truck\r\n\r\nRSN# 0000","response_desired":"week","response_method":"email"}
I'm trying to parse the dictionary so that the resulting DataFrame contains a new column for each key, and each row is populated with the corresponding values, like this:
//Before
1  2  3  4  5
a  b  c  d  {6:y, 7:v}
//After
1  2  3  4  5           6  7
a  b  c  d  {6:y, 7:v}  y  v
Suggestions much appreciated.
Answered by piRSquared
Consider the following DataFrame df:
import pandas as pd

df = pd.DataFrame([
    ['a', 'b', 'c', 'd', dict(F='y', G='v')],
    ['a', 'b', 'c', 'd', dict(F='y', G='v')],
], columns=list('ABCDE'))
df
   A  B  C  D                     E
0  a  b  c  d  {'F': 'y', 'G': 'v'}
1  a  b  c  d  {'F': 'y', 'G': 'v'}
Option 1
Use pd.Series.apply to expand the dictionaries, then assign the new columns in place.
df.E.apply(pd.Series)
   F  G
0  y  v
1  y  v
Assign the new columns and drop the original one like this:
df[['F', 'G']] = df.E.apply(pd.Series)
df.drop('E', axis=1)  # returns a new frame; df itself still contains E
   A  B  C  D  F  G
0  a  b  c  d  y  v
1  a  b  c  d  y  v
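The assignment above hard-codes the new column names F and G. A minimal sketch that avoids this, assuming the same df as in this answer, joins the expanded columns back on the index so the names come from the dictionary keys. The variable names expanded and result are illustrative only and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame([
    ['a', 'b', 'c', 'd', dict(F='y', G='v')],
    ['a', 'b', 'c', 'd', dict(F='y', G='v')],
], columns=list('ABCDE'))

# Expand each dict in column E into its own columns (one per key)
expanded = df['E'].apply(pd.Series)

# drop() returns a new frame, so its result has to be kept;
# join() aligns on the index and takes the column names from the dict keys
result = df.drop('E', axis=1).join(expanded)
print(result)
```

Because the result is bound to a new name, the original df is left unchanged.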
Option 2
Pipeline the whole thing using the pd.DataFrame.assign method.
df.drop('E', axis=1).assign(**pd.DataFrame(df.E.values.tolist()))
   A  B  C  D  F  G
0  a  b  c  d  y  v
1  a  b  c  d  y  v
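One caveat with Option 2: the ** unpacking passes the column names as keyword arguments, and Python requires keyword names to be strings. The sketch below assumes a frame with integer dictionary keys (6 and 7, as in the question's example) to show that assign raises a TypeError in that case, while a plain join still works. The names expanded, assign_worked and result are illustrative only.

```python
import pandas as pd

# Hypothetical frame whose dictionary keys are integers rather than strings
df = pd.DataFrame({1: ['a', 'h'], 2: ['b', 'h'],
                   5: [{6: 'y', 7: 'v'}, {6: 'u', 7: 't'}]})

expanded = pd.DataFrame(df[5].tolist())  # columns are the integers 6 and 7

try:
    # ** unpacking needs string keys, so this fails for the columns 6 and 7
    df.drop(5, axis=1).assign(**expanded)
    assign_worked = True
except TypeError:
    assign_worked = False

# join() does not care about the type of the column labels
result = df.drop(5, axis=1).join(expanded)
print(assign_worked)
print(result)
```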
Answered by jezrael
I think you can use concat:
df = pd.DataFrame({1:['a','h'],2:['b','h'], 5:[{6:'y', 7:'v'},{6:'u', 7:'t'}] })
print(df)
   1  2                 5
0  a  b  {6: 'y', 7: 'v'}
1  h  h  {6: 'u', 7: 't'}
print(df.loc[:,5].values.tolist())
[{6: 'y', 7: 'v'}, {6: 'u', 7: 't'}]
df1 = pd.DataFrame(df.loc[:,5].values.tolist())
print(df1)
   6  7
0  y  v
1  u  t
print(pd.concat([df, df1], axis=1))
   1  2                 5  6  7
0  a  b  {6: 'y', 7: 'v'}  y  v
1  h  h  {6: 'u', 7: 't'}  u  t
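The concat approach lines rows up by index label, and a frame built from a plain list gets a fresh 0..n-1 index. That is fine for the df above, but if the original frame has any other index the rows may no longer match. The following sketch assumes a hypothetical index of [10, 20] to illustrate the effect and one possible remedy, passing index=df.index when building the expanded frame. The names misaligned and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame({1: ['a', 'h'], 2: ['b', 'h'],
                   5: [{6: 'y', 7: 'v'}, {6: 'u', 7: 't'}]},
                  index=[10, 20])  # hypothetical non-default index

# Default index (0, 1) does not match (10, 20): concat produces 4 rows with NaN
misaligned = pd.concat([df, pd.DataFrame(df[5].tolist())], axis=1)

# Reusing the original index keeps every dict on its own row
df1 = pd.DataFrame(df[5].tolist(), index=df.index)
out = pd.concat([df, df1], axis=1)
print(misaligned.shape, out.shape)
print(out)
```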
Timings (len(df)=2k):
In [2]: %timeit (pd.concat([df, pd.DataFrame(df.loc[:,5].values.tolist())], axis=1))
100 loops, best of 3: 2.99 ms per loop
In [3]: %timeit (pir(df))
1 loop, best of 3: 625 ms per loop
# Setup used for the timings above: repeat the 2-row frame 1000 times
df = pd.concat([df]*1000).reset_index(drop=True)
print(pd.concat([df, pd.DataFrame(df.loc[:,5].values.tolist())], axis=1))
def pir(df):
    # piRSquared's Option 1 applied to column 5
    df[['F', 'G']] = df[5].apply(pd.Series)
    df.drop(5, axis=1)  # result is discarded, so column 5 stays in df
    return df
print(pir(df))
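The %timeit lines above only run inside IPython. As a rough sketch of how the same comparison could be reproduced in a plain Python script, the following uses the standard-library timeit module. The function names via_concat and via_apply are made up for this example, and the numbers it prints will depend on the machine and the pandas version, so they should not be expected to match the figures quoted above.

```python
import timeit
import pandas as pd

small = pd.DataFrame({1: ['a', 'h'], 2: ['b', 'h'],
                      5: [{6: 'y', 7: 'v'}, {6: 'u', 7: 't'}]})
df = pd.concat([small] * 1000).reset_index(drop=True)  # 2,000 rows

def via_concat(frame):
    # Build the new columns from a list of dicts in one step
    return pd.concat([frame, pd.DataFrame(frame[5].tolist())], axis=1)

def via_apply(frame):
    # Build the new columns by turning each dict into a Series, row by row
    return pd.concat([frame, frame[5].apply(pd.Series)], axis=1)

# Best of three single runs for each approach
t_concat = min(timeit.repeat(lambda: via_concat(df), number=1, repeat=3))
t_apply = min(timeit.repeat(lambda: via_apply(df), number=1, repeat=3))
print(f"concat + tolist: {t_concat:.4f} s")
print(f"apply(pd.Series): {t_apply:.4f} s")
```

Neither function modifies its input, so each timed run starts from the same frame.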