Python: Dictionary column in pandas dataframe

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow, original question: http://stackoverflow.com/questions/29325458/

Time: 2020-08-19 04:24:15  Source: igfitidea

Tags: python, dictionary, pandas

Asked by user1274037

I've got a CSV that I'm reading into a pandas dataframe. However, one of the columns is in the form of a dictionary. Here is an example:

ColA, ColB, ColC, ColdD
20, 30, {"ab":"1", "we":"2", "as":"3"},"String"

How can I turn this into a dataframe that looks like this:

ColA, ColB, AB, WE, AS, ColdD
20, 30, "1", "2", "3", "String"

Edit: I fixed up the question. The column looks like this, but it is a string that needs to be parsed, not a dict object.
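Because the column holds text, each cell has to be parsed into a dict before it can be expanded. A minimal sketch, assuming the cell text is exactly the sample shown above: its double-quoted keys and values make it valid JSON as well as a valid Python literal, so json.loads and ast.literal_eval both work, but only ast.literal_eval also accepts Python-style single quotes.

```python
import ast
import json

# The ColC cell as it arrives from the CSV: text, not a dict
raw = '{"ab":"1", "we":"2", "as":"3"}'

# Double-quoted keys/values make the text valid JSON...
parsed_json = json.loads(raw)
# ...and also a valid Python literal
parsed_literal = ast.literal_eval(raw)

# A Python-repr style string (single quotes) is not valid JSON,
# but ast.literal_eval still parses it
python_style = "{'ab': '1', 'we': '2', 'as': '3'}"
parsed_python = ast.literal_eval(python_style)

print(parsed_json)
```

Unlike eval, ast.literal_eval only evaluates literals (strings, numbers, dicts, lists and the like), so it is the safer choice for text read from a file.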

Answer by jedwards

What about something like:

import pandas as pd

# Create mock dataframe
df = pd.DataFrame([
    [20, 30, {'ab':1, 'we':2, 'as':3}, 'String1'],
    [21, 31, {'ab':4, 'we':5, 'as':6}, 'String2'],
    [22, 32, {'ab':7, 'we':8, 'as':9}, 'String2'],
], columns=['Col A', 'Col B', 'Col C', 'Col D'])

# Create dataframe where you'll store the dictionary values
ddf = pd.DataFrame(columns=['AB','WE','AS'])

# Populate ddf dataframe
for (i,r) in df.iterrows():
    e = r['Col C']
    ddf.loc[i] = [e['ab'], e['we'], e['as']]

# Replace df with the output of concat(df, ddf)
df = pd.concat([df, ddf], axis=1)

# New column order, also drops old Col C column
df = df[['Col A', 'Col B', 'AB', 'WE', 'AS', 'Col D']]

print(df)

Output:

   Col A  Col B  AB  WE  AS    Col D
0     20     30   1   2   3  String1
1     21     31   4   5   6  String2
2     22     32   7   8   9  String2
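The row-by-row iterrows loop works, but the same expansion can be done in one step by passing the list of dicts to the DataFrame constructor. A sketch reusing the mock data above: index=df.index keeps the new rows aligned for the concat, and renaming with str.upper is only there to match the requested AB/WE/AS headers.

```python
import pandas as pd

# Same mock dataframe as above: 'Col C' holds real dicts
df = pd.DataFrame([
    [20, 30, {'ab': 1, 'we': 2, 'as': 3}, 'String1'],
    [21, 31, {'ab': 4, 'we': 5, 'as': 6}, 'String2'],
    [22, 32, {'ab': 7, 'we': 8, 'as': 9}, 'String2'],
], columns=['Col A', 'Col B', 'Col C', 'Col D'])

# One column per dict key; keep the original index so concat lines up
expanded = pd.DataFrame(df['Col C'].tolist(), index=df.index)
expanded = expanded.rename(columns=str.upper)  # ab -> AB, we -> WE, as -> AS

# Drop the dict column, attach the new ones, then fix the column order
out = pd.concat([df.drop(columns='Col C'), expanded], axis=1)
out = out[['Col A', 'Col B', 'AB', 'WE', 'AS', 'Col D']]
print(out)
```

Building the frame once avoids growing ddf with .loc inside a Python loop, which gets slow as the number of rows increases.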

Answer by Bob Haffner

So, starting with your one-row df:

    Col A   Col B   Col C                           Col D
0   20      30      {u'we': 2, u'ab': 1, u'as': 3}  String1

EDIT: Based on the comment by the OP, I'm assuming we need to convert the string to a dict first:

import ast

# Parse each string cell into a dict (column name matches the df above)
df["Col C"] = df["Col C"].map(ast.literal_eval)

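Then we convert Col C to a dict of dicts keyed by row index, build a DataFrame from it (one column per row), transpose it so that each dict key becomes a column, and join it to the original df: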

dfNew = df.join(pd.DataFrame(df["Col C"].to_dict()).T)
dfNew

which gives you this

    Col A   Col B   Col C                           Col D   ab  as  we
0   20      30      {u'we': 2, u'ab': 1, u'as': 3}  String1 1   3   2

Then we just select the columns we want from dfNew:

dfNew[["Col A", "Col B", "ab", "we", "as", "Col D"]]

    Col A   Col B   ab  we  as  Col D
0   20      30      1   2   3   String1
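For the CSV case in the question, the parsing can also happen while the file is read, through the converters argument of read_csv. A sketch, assuming the dict field is quoted in the file with its inner quotes doubled (the way Python's csv module writes it); the unquoted sample in the question would otherwise be split at every comma inside the braces. io.StringIO only stands in for the real file, and its contents are hypothetical.

```python
import ast
import io

import pandas as pd

# Hypothetical file contents: the dict field is quoted, inner quotes doubled
csv_text = (
    'ColA,ColB,ColC,ColdD\n'
    '20,30,"{""ab"":""1"", ""we"":""2"", ""as"":""3""}",String\n'
)

# Parse each ColC cell from text into a dict as it is read
df = pd.read_csv(io.StringIO(csv_text), converters={'ColC': ast.literal_eval})

# Expand the dicts into columns and upper-case the new headers
expanded = pd.DataFrame(df['ColC'].tolist(), index=df.index).rename(columns=str.upper)

# Replace ColC with the expanded columns, in the order the question asks for
result = pd.concat([df.drop(columns='ColC'), expanded], axis=1)
result = result[['ColA', 'ColB', 'AB', 'WE', 'AS', 'ColdD']]
print(result)
```

The values in AB, WE and AS stay strings ("1", "2", "3") because they are quoted inside the dict, which matches the desired output in the question.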

Answer by psychemedia

As per https://stackoverflow.com/a/38231651/454773, you can use .apply(pd.Series) to map the dict-containing column onto new columns, and then concatenate these new columns back onto the original dataframe with the dict-containing column dropped:

import pandas as pd

dw = pd.DataFrame([[20, 30, {"ab":"1", "we":"2", "as":"3"}, "String"]],
                  columns=['ColA', 'ColB', 'ColC', 'ColdD'])
# Expand each dict into its own columns, then attach them in place of ColC
pd.concat([dw.drop(['ColC'], axis=1), dw['ColC'].apply(pd.Series)], axis=1)

Returns:

ColA    ColB    ColdD   ab  as  we
20      30      String  1   3   2
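.apply(pd.Series) builds one Series per row, which tends to be slow on large frames. An alternative sketch uses pd.json_normalize (a top-level pandas function since 1.0), which flattens a list of dicts into columns and, with its default sep=".", would also turn nested dicts into dotted column names. The index is reassigned explicitly because json_normalize returns a fresh default index.

```python
import pandas as pd

dw = pd.DataFrame([[20, 30, {"ab": "1", "we": "2", "as": "3"}, "String"]],
                  columns=['ColA', 'ColB', 'ColC', 'ColdD'])

# Flatten the list of dicts into one column per key
flat = pd.json_normalize(dw['ColC'].tolist())
# Realign with the original rows before concatenating side by side
flat.index = dw.index

out = pd.concat([dw.drop(columns='ColC'), flat], axis=1)
print(out)
```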