python pandas dataframe to dictionary

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original address on StackOverflow: http://stackoverflow.com/questions/18695605/

Time: 2020-08-19 11:30:21  Source: igfitidea

Tags: python, dictionary, pandas

Asked by perigee

I have a two-column DataFrame and want to convert it to a Python dictionary: the first column will be the key and the second will be the value. Thank you in advance.

Dataframe:

    id    value
0    0     10.2
1    1      5.7
2    2      7.4

Answered by joris

See the docs for to_dict. You can use it like this:

df.set_index('id').to_dict()

If you have only one value column and want to avoid the column name becoming an extra level in the dict, select that column first (in this case you are actually calling Series.to_dict()):

df.set_index('id')['value'].to_dict()
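
To make the difference between the two calls concrete, here is a minimal sketch using the data from the question. The variable names nested and flat are only illustrative.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# DataFrame.to_dict() keeps the column name as an outer level.
nested = df.set_index("id").to_dict()
print(nested)  # {'value': {0: 10.2, 1: 5.7, 2: 7.4}}

# Selecting the column first calls Series.to_dict() and gives a flat mapping.
flat = df.set_index("id")["value"].to_dict()
print(flat)    # {0: 10.2, 1: 5.7, 2: 7.4}
```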

Answered by dalloliogm

The answers by joris in this thread and by punchagan in the duplicated thread are very elegant; however, they will not give correct results if the column used for the keys contains any duplicated value.

For example:

>>> import pandas as pd
>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3

# note that in both cases the association a->1 is lost:
>>> ptest.set_index('id')['value'].to_dict()
{'a': 2, 'b': 3}
>>> dict(zip(ptest.id, ptest.value))
{'a': 2, 'b': 3}

If you have duplicated entries and do not want to lose them, you can use this ugly but working code:

>>> mydict = {}
>>> for x in range(len(ptest)):
...     currentid = ptest.iloc[x,0]
...     currentvalue = ptest.iloc[x,1]
...     mydict.setdefault(currentid, [])
...     mydict[currentid].append(currentvalue)
>>> mydict
{'a': [1, 2], 'b': [3]}

Answered by DSM

If you want a simple way to preserve duplicates, you could use groupby:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) 
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3
>>> {k: g["value"].tolist() for k,g in ptest.groupby("id")}
{'a': [1, 2], 'b': [3]}
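
A variation on the same idea, shown only as a sketch: instead of iterating over the groups in a comprehension, the lists can be built by the groupby itself and the resulting Series converted with to_dict(). The name grouped is an illustrative choice.

```python
import pandas as pd

ptest = pd.DataFrame([["a", 1], ["a", 2], ["b", 3]], columns=["id", "value"])

# Collect each group's values into a list, then turn the
# resulting Series (indexed by id) into a plain dict.
grouped = ptest.groupby("id")["value"].apply(list).to_dict()
print(grouped)  # {'a': [1, 2], 'b': [3]}
```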

Answered by praful gupta

mydict = dict(zip(df.id, df.value))

Answered by user1376377

Another (slightly shorter) solution for not losing duplicate entries:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3

>>> pdict = dict()
>>> for i in ptest['id'].unique().tolist():
...     ptest_slice = ptest[ptest['id'] == i]
...     pdict[i] = ptest_slice['value'].tolist()
...

>>> pdict
{'b': [3], 'a': [1, 2]}

Answered by Vincent Appiah

In some versions, the code below might not work:

mydict = dict(zip(df.id, df.value))

So make it explicit:

id_ = df.id.values
value = df.value.values
mydict = dict(zip(id_, value))

Note: I used id_ because id is the name of a Python built-in function (it is not strictly a reserved word, but shadowing it is best avoided).
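
The answer does not say which versions are affected, so the following is only a sketch of a form that avoids one known fragility. Attribute access such as df.id works only when the column name is a valid Python identifier and does not collide with an existing DataFrame attribute, while bracket indexing has no such restriction.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# Bracket indexing works for any column name, including names that
# clash with DataFrame attributes or contain spaces.
mydict = dict(zip(df["id"], df["value"]))
print(mydict)  # {0: 10.2, 1: 5.7, 2: 7.4}
```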

Answered by Dmitry

You need a list as a dictionary value. This code will do the trick.

from collections import defaultdict
mydict = defaultdict(list)
for k, v in zip(df.id.values,df.value.values):
    mydict[k].append(v)
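
As a self-contained run of this approach, the sketch below reuses the duplicated-key frame from the earlier answers. The final conversion to a plain dict is an optional extra step, not part of the original answer; it is there so that looking up a missing key raises KeyError again instead of silently creating an empty list.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame([["a", 1], ["a", 2], ["b", 3]], columns=["id", "value"])

# Every new key starts with an empty list, so append never fails.
mydict = defaultdict(list)
for k, v in zip(df["id"], df["value"]):
    mydict[k].append(v)

# Optional: freeze the result into an ordinary dict.
result = dict(mydict)
print(result)  # {'a': [1, 2], 'b': [3]}
```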

Answered by Dongwan Kim

You can use a dict comprehension:

my_dict = {row[0]: row[1] for row in df.values}
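
One caveat worth illustrating: df.values returns a single NumPy array with one common dtype, so with the question's data (integer ids, float values) the keys come back as floats. The sketch below contrasts this with itertuples, which keeps each column's own type. The names via_values and via_tuples are illustrative.

```python
import pandas as pd

# Data copied from the question: integer ids, float values.
df = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# df.values upcasts both columns to float, so the keys are 0.0, 1.0, 2.0.
via_values = {row[0]: row[1] for row in df.values}

# itertuples yields one tuple per row and preserves the integer ids.
via_tuples = {row[0]: row[1] for row in df.itertuples(index=False)}
```

Because 0.0 == 0 in Python, lookups behave the same either way; the difference matters mainly when the keys are printed or serialized.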

Answered by Gil Baggio

Simplest solution:

df.set_index('id').T.to_dict('records')

Example:

df= pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
df.set_index('id').T.to_dict('records')

If you have multiple value columns, like val1, val2, val3, etc., and you want them as lists, then use the code below:

df.set_index('id').T.to_dict('list')
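
To show what the two orientations return, here is a small sketch. The data is hypothetical (unique ids and two value columns named val1 and val2, following the naming in this answer). Note that 'records' returns a list of dicts, one per row of the transposed frame, rather than a single dict.

```python
import pandas as pd

# Hypothetical data: unique ids and two value columns.
df = pd.DataFrame({"id": ["a", "b"], "val1": [1, 2], "val2": [3, 4]})

# After transposing, the rows are val1/val2 and the columns are the ids.
transposed = df.set_index("id").T

as_lists = transposed.to_dict("list")       # one list of values per id
as_records = transposed.to_dict("records")  # one dict per value column

print(as_lists)    # {'a': [1, 3], 'b': [2, 4]}
print(as_records)  # [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
```

With a single value column, as_records would be a one-element list, so the flat mapping is its first element.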

Answered by SummersKing

def get_dict_from_pd(df, key_col, row_col):
    result = dict()
    for i in set(df[key_col].values):
        is_i = df[key_col] == i
        result[i] = list(df[is_i][row_col].values)
    return result

This is my solution, a basic loop.
