pandas: Parse a dictionary in a Pandas DataFrame cell into new row cells (new columns)

Question

Asked by r_g_r

I have a Pandas DataFrame in which one column holds a dictionary of key:value pairs in each cell, like this:

{"name":"Test Thorton","company":"Test Group","address":"10850 Test #325\r\n","city":"Test City","state_province":"CA","postal_code":"95670","country":"USA","email_address":"[email protected]","phone_number":"999-888-3333","equipment_description":"I'm a big red truck\r\n\r\nRSN# 0000","response_desired":"week","response_method":"email"}

I'm trying to parse the dictionary so that the resulting DataFrame has a new column for each key, with each row populated by the corresponding values, like this:

//Before

1  2  3  4  5
a  b  c  d  {6:y, 7:v}

//After

1  2  3  4  5           6  7
a  b  c  d  {6:y, 7:v}  y  v

Suggestions would be much appreciated.

Answer 1

Answered by piRSquared

Consider this df:

import pandas as pd

df = pd.DataFrame([
    ['a', 'b', 'c', 'd', dict(F='y', G='v')],
    ['a', 'b', 'c', 'd', dict(F='y', G='v')],
], columns=list('ABCDE'))

df

   A  B  C  D                     E
0  a  b  c  d  {'F': 'y', 'G': 'v'}
1  a  b  c  d  {'F': 'y', 'G': 'v'}

Option 1
Use pd.Series.apply to expand each dictionary into columns, then assign the new columns in place.

df.E.apply(pd.Series)

   F  G
0  y  v
1  y  v

Assign the result to new columns, then drop E:

df[['F', 'G']] = df.E.apply(pd.Series)
df.drop('E', axis=1)

   A  B  C  D  F  G
0  a  b  c  d  y  v
1  a  b  c  d  y  v
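
Option 1 hard-codes the new column names. As a minimal sketch of a variant that avoids this (my assumption, not part of the original answer), the expanded frame can be joined back on the index so the column names come straight from the dictionary keys:

```python
import pandas as pd

df = pd.DataFrame([
    ['a', 'b', 'c', 'd', dict(F='y', G='v')],
    ['a', 'b', 'c', 'd', dict(F='y', G='v')],
], columns=list('ABCDE'))

# Expand each dictionary into a row of a new frame; columns come from the keys.
expanded = df['E'].apply(pd.Series)

# Drop the dictionary column and attach the expanded columns by index.
# df itself is left unchanged.
result = df.drop(columns='E').join(expanded)
print(result)
```

Because join aligns on the index, this should also work when df does not have a default 0..n-1 index.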

Option 2
Pipeline the whole thing using the pd.DataFrame.assign method. Note that the ** unpacking requires the dictionary keys to be strings.

df.drop('E', axis=1).assign(**pd.DataFrame(df.E.values.tolist()))

   A  B  C  D  F  G
0  a  b  c  d  y  v
1  a  b  c  d  y  v
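
One caveat with Option 2: a frame built from a plain list gets a fresh 0..n-1 index, and assign aligns on index labels. The sketch below assumes a hypothetical frame with index labels 10 and 20 (not from the original answer) and passes index=df.index so the expanded columns line up with the original rows; without it, the new columns would come back as NaN.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', dict(F='y', G='v')], ['b', dict(F='u', G='t')]],
    columns=['A', 'E'],
    index=[10, 20],  # hypothetical non-default index, for illustration only
)

# Reuse df.index so the expanded columns align with the original rows.
expanded = pd.DataFrame(df['E'].tolist(), index=df.index)

# ** unpacking passes each column as a keyword argument,
# which is why the keys must be strings.
result = df.drop('E', axis=1).assign(**expanded)
print(result)
```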

Answer 2

Answered by jezrael

I think you can use concat:

df = pd.DataFrame({1:['a','h'],2:['b','h'], 5:[{6:'y', 7:'v'},{6:'u', 7:'t'}] })

print (df)
   1  2                 5
0  a  b  {6: 'y', 7: 'v'}
1  h  h  {6: 'u', 7: 't'}

print (df.loc[:,5].values.tolist())
[{6: 'y', 7: 'v'}, {6: 'u', 7: 't'}]

df1 = pd.DataFrame(df.loc[:,5].values.tolist())
print (df1)
   6  7
0  y  v
1  u  t

print (pd.concat([df, df1], axis=1))
   1  2                 5  6  7
0  a  b  {6: 'y', 7: 'v'}  y  v
1  h  h  {6: 'u', 7: 't'}  u  t
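
The example above assumes every dictionary has the same keys. A small sketch of what happens otherwise, using a made-up second row that lacks key 7 (an assumption for illustration): the DataFrame constructor takes the union of the keys and fills the gaps with NaN. It also drops the original dictionary column, which the question's "after" layout keeps, so omit the drop if you want to retain it.

```python
import pandas as pd

df = pd.DataFrame({
    1: ['a', 'h'],
    2: ['b', 'h'],
    5: [{6: 'y', 7: 'v'}, {6: 'u'}],  # second dict has no key 7 (hypothetical)
})

# Build the expanded frame from the list of dicts; missing keys become NaN.
# index=df.index keeps the rows aligned for concat along axis=1.
df1 = pd.DataFrame(df[5].tolist(), index=df.index)

result = pd.concat([df.drop(columns=5), df1], axis=1)
print(result)
```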

Timings (len(df) = 2k; the setup code follows the results):

In [2]: %timeit (pd.concat([df, pd.DataFrame(df.loc[:,5].values.tolist())], axis=1))
100 loops, best of 3: 2.99 ms per loop

In [3]: %timeit (pir(df))
1 loop, best of 3: 625 ms per loop

df = pd.concat([df]*1000).reset_index(drop=True)

print (pd.concat([df, pd.DataFrame(df.loc[:,5].values.tolist())], axis=1))


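def pir(df):
    # Option 1 from the other answer, adapted to this frame's column 5
    df[['F', 'G']] = df[5].apply(pd.Series)
    # drop returns a new frame; the result is discarded here, so column 5 stays in df
    df.drop(5, axis=1)
    return df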

print (pir(df))
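
The %timeit lines only run inside IPython. As a rough sketch for reproducing the comparison in a plain script, the standard timeit module can be used instead. The helper names with_concat and with_apply are mine, and both helpers return a new frame rather than mutating df, so this is not an exact replay of the measurements above; absolute numbers will vary with machine and pandas version.

```python
import timeit

import pandas as pd

small = pd.DataFrame({
    1: ['a', 'h'],
    2: ['b', 'h'],
    5: [{6: 'y', 7: 'v'}, {6: 'u', 7: 't'}],
})
# Repeat the two rows 1000 times to get a 2k-row frame.
df = pd.concat([small] * 1000).reset_index(drop=True)


def with_concat(frame):
    # Expand via a list of dicts, then concatenate column-wise.
    return pd.concat([frame, pd.DataFrame(frame[5].tolist())], axis=1)


def with_apply(frame):
    # Expand via apply(pd.Series), then concatenate column-wise.
    return pd.concat([frame, frame[5].apply(pd.Series)], axis=1)


runs = 3
t_concat = timeit.timeit(lambda: with_concat(df), number=runs) / runs
t_apply = timeit.timeit(lambda: with_apply(df), number=runs) / runs
print(f"concat + tolist:  {t_concat * 1e3:.2f} ms per call")
print(f"apply(pd.Series): {t_apply * 1e3:.2f} ms per call")
```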
