Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/26716616/

Time: 2020-08-19 00:54:45  Source: igfitidea

Convert a Pandas DataFrame to a dictionary

Tags: python, pandas, dictionary, dataframe

Asked by Prince Bhatti

I have a DataFrame with four columns. I want to convert this DataFrame to a Python dictionary. I want the elements of the first column to be the keys and the elements of the other columns in the same row to be the values.

DataFrame:

    ID   A   B   C
0   p    1   3   2
1   q    4   3   2
2   r    4   0   9  

Output should be like this:

Dictionary:

{'p': [1,3,2], 'q': [4,3,2], 'r': [4,0,9]}

Accepted answer by Alex Riley

The to_dict() method sets the column names as dictionary keys, so you'll need to reshape your DataFrame slightly. Setting the 'ID' column as the index and then transposing the DataFrame is one way to achieve this.

to_dict() also accepts an 'orient' argument, which you'll need in order to output a list of values for each column. Otherwise, a dictionary of the form {index: value} will be returned for each column.

These steps can be done with the following line:

>>> df.set_index('ID').T.to_dict('list')
{'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}
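A self-contained sketch of the one-liner above, assuming the question's DataFrame is rebuilt by hand (the original data source is not shown). The comments spell out what each step of the chain does:

```python
import pandas as pd

# The question's DataFrame, rebuilt by hand for illustration.
df = pd.DataFrame({
    "ID": ["p", "q", "r"],
    "A": [1, 4, 4],
    "B": [3, 3, 0],
    "C": [2, 2, 9],
})

# 1. set_index("ID"): the ID values become row labels.
# 2. .T: transpose, so each ID becomes a column label
#    (to_dict uses column labels as the outer keys).
# 3. to_dict("list"): each column becomes a plain list of its values.
result = df.set_index("ID").T.to_dict("list")
print(result)
```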


In case a different dictionary format is needed, here are examples of the possible orient arguments. Consider the following simple DataFrame:

>>> df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0.5, 0.25, 0.125]})
>>> df
        a      b
0     red  0.500
1  yellow  0.250
2    blue  0.125

Then the options are as follows.

dict - the default: column names are keys, values are dictionaries of index:data pairs

>>> df.to_dict('dict')
{'a': {0: 'red', 1: 'yellow', 2: 'blue'}, 
 'b': {0: 0.5, 1: 0.25, 2: 0.125}}

list - keys are column names, values are lists of column data

>>> df.to_dict('list')
{'a': ['red', 'yellow', 'blue'], 
 'b': [0.5, 0.25, 0.125]}

series - like 'list', but values are Series

>>> df.to_dict('series')
{'a': 0       red
      1    yellow
      2      blue
      Name: a, dtype: object, 

 'b': 0    0.500
      1    0.250
      2    0.125
      Name: b, dtype: float64}

split - 'columns', 'data' and 'index' become the keys; their values are the column names, the data values row by row, and the index labels, respectively

>>> df.to_dict('split')
{'columns': ['a', 'b'],
 'data': [['red', 0.5], ['yellow', 0.25], ['blue', 0.125]],
 'index': [0, 1, 2]}

records - each row becomes a dictionary where the key is the column name and the value is the data in the cell

>>> df.to_dict('records')
[{'a': 'red', 'b': 0.5}, 
 {'a': 'yellow', 'b': 0.25}, 
 {'a': 'blue', 'b': 0.125}]

index - like 'records', but a dictionary of dictionaries keyed by index label (rather than a list)

>>> df.to_dict('index')
{0: {'a': 'red', 'b': 0.5},
 1: {'a': 'yellow', 'b': 0.25},
 2: {'a': 'blue', 'b': 0.125}}

Answer by Prince Bhatti

Try using zip:

import pandas as pd

df = pd.read_csv("file")
d = {i: [a, b, c] for i, a, b, c in zip(df.ID, df.A, df.B, df.C)}
print(d)

Output:

{'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}
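If the number of value columns may change, a variation on the zip approach avoids hard-coding A, B and C. This is a sketch that assumes every column other than ID belongs in the value list; the DataFrame is rebuilt by hand instead of being read from "file":

```python
import pandas as pd

df = pd.DataFrame({"ID": ["p", "q", "r"], "A": [1, 4, 4], "B": [3, 3, 0], "C": [2, 2, 9]})

# Assumption: every column except ID goes into the value list, in frame order.
value_cols = [c for c in df.columns if c != "ID"]

# .values.tolist() yields one plain Python list per row,
# so zip pairs each ID with the list for its row.
d = dict(zip(df["ID"], df[value_cols].values.tolist()))
print(d)
```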

Answer by Farhad Maleki

Follow these steps:

Suppose your dataframe is as follows:

>>> df
   A  B  C ID
0  1  3  2  p
1  4  3  2  q
2  4  0  9  r

1. Use set_index to set the ID column as the dataframe index.

    df.set_index("ID", drop=True, inplace=True)

2. Use the orient="index" parameter to make the index values the dictionary keys.

    dictionary = df.to_dict(orient="index")

The results will be as follows:

    >>> dictionary
    {'p': {'A': 1, 'B': 3, 'C': 2}, 'q': {'A': 4, 'B': 3, 'C': 2}, 'r': {'A': 4, 'B': 0, 'C': 9}}

3. If you need each row's values as a list, run the following code, choosing the column order explicitly:

column_order= ["A", "B", "C"] #  Determine your preferred order of columns
d = {} #  Initialize the new dictionary as an empty dictionary
for k in dictionary:
    d[k] = [dictionary[k][column_name] for column_name in column_order]
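The three steps can also be combined into one runnable sketch. It rebuilds the example DataFrame by hand and avoids inplace=True so the original df is left unchanged; column_order is the same illustrative choice as above:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 4], "B": [3, 3, 0], "C": [2, 2, 9], "ID": ["p", "q", "r"]})

# Steps 1 and 2: ID becomes the index, then orient="index"
# produces {ID: {column: value}}.
dictionary = df.set_index("ID").to_dict(orient="index")

# Step 3: flatten each inner dict into a list in an explicit column order.
column_order = ["A", "B", "C"]
d = {k: [row[col] for col in column_order] for k, row in dictionary.items()}
print(d)
```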

Answer by Kamil Sindi

If you don't mind the dictionary values being tuples, you can use itertuples:

>>> {x[0]: x[1:] for x in df.itertuples(index=False)}
{'p': (1, 3, 2), 'q': (4, 3, 2), 'r': (4, 0, 9)}
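This relies on ID being the first column. A slightly more defensive sketch (assuming the same column names as the question) selects the columns explicitly, so it also works when ID is stored last, and shows how to get lists instead of tuples:

```python
import pandas as pd

# ID deliberately stored last to show why explicit column selection helps.
df = pd.DataFrame({"A": [1, 4, 4], "B": [3, 3, 0], "C": [2, 2, 9], "ID": ["p", "q", "r"]})

# Put the key column first so x[0] is always the ID.
cols = ["ID", "A", "B", "C"]

# Slicing a namedtuple returns a plain tuple.
as_tuples = {x[0]: x[1:] for x in df[cols].itertuples(index=False)}

# Wrap the slice in list() if list values are preferred.
as_lists = {x[0]: list(x[1:]) for x in df[cols].itertuples(index=False)}
```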

Answer by Umer

DataFrame.to_dict() converts a DataFrame to a dictionary.

Example

>>> df = pd.DataFrame(
    {'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['a', 'b'])
>>> df
   col1  col2
a     1  0.50
b     2  0.75
>>> df.to_dict()
{'col1': {'a': 1, 'b': 2}, 'col2': {'a': 0.5, 'b': 0.75}}

See the pandas DataFrame.to_dict documentation for details.

Answer by Victoria Stuart

For my use (node names with xy positions), I found @user4179775's answer to be the most helpful/intuitive:

import pandas as pd

df = pd.read_csv('glycolysis_nodes_xy.tsv', sep='\t')

df.head()
    nodes    x    y
0  c00033  146  958
1  c00031  601  195
...

xy_dict_list=dict([(i,[a,b]) for i, a,b in zip(df.nodes, df.x,df.y)])

xy_dict_list
{'c00022': [483, 868],
 'c00024': [146, 868],
 ... }

xy_dict_tuples=dict([(i,(a,b)) for i, a,b in zip(df.nodes, df.x,df.y)])

xy_dict_tuples
{'c00022': (483, 868),
 'c00024': (146, 868),
 ... }
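zip can also build the (x, y) tuples directly. In this sketch the TSV file is replaced by a small hypothetical DataFrame with the same nodes/x/y columns, using the two rows shown by df.head():

```python
import pandas as pd

# Hypothetical stand-in for glycolysis_nodes_xy.tsv (the real file is not shown).
df = pd.DataFrame({"nodes": ["c00033", "c00031"], "x": [146, 601], "y": [958, 195]})

# zip(df.x, df.y) already yields (x, y) tuples,
# so the outer zip pairs them with node names directly.
xy_dict_tuples = dict(zip(df.nodes, zip(df.x, df.y)))

# Same idea with list values, written as a dict comprehension.
xy_dict_list = {node: [x, y] for node, x, y in zip(df.nodes, df.x, df.y)}
```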


Addendum

I later returned to this issue for other, related work. Here is an approach that more closely mirrors the [excellent] accepted answer.

node_df = pd.read_csv('node_prop-glycolysis_tca-from_pg.tsv', sep='\t')

node_df.head()
   node  kegg_id kegg_cid            name  wt  vis
0  22    22       c00022   pyruvate        1   1
1  24    24       c00024   acetyl-CoA      1   1
...

Convert Pandas dataframe to a [list], {dict}, {dict of {dict}}, ...

Per the accepted answer:

node_df.set_index('kegg_cid').T.to_dict('list')

{'c00022': [22, 22, 'pyruvate', 1, 1],
 'c00024': [24, 24, 'acetyl-CoA', 1, 1],
 ... }

node_df.set_index('kegg_cid').T.to_dict('dict')

{'c00022': {'kegg_id': 22, 'name': 'pyruvate', 'node': 22, 'vis': 1, 'wt': 1},
 'c00024': {'kegg_id': 24, 'name': 'acetyl-CoA', 'node': 24, 'vis': 1, 'wt': 1},
 ... }
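For the dict-of-dicts case, orient='index' gives the same result without the transpose. A minimal sketch, using a hypothetical two-row stand-in for the TSV file built from the rows shown by node_df.head(). Note that orient='index' raises a ValueError if the index (here kegg_cid) contains duplicates:

```python
import pandas as pd

# Hypothetical stand-in for node_prop-glycolysis_tca-from_pg.tsv.
node_df = pd.DataFrame({
    "node": [22, 24],
    "kegg_id": [22, 24],
    "kegg_cid": ["c00022", "c00024"],
    "name": ["pyruvate", "acetyl-CoA"],
    "wt": [1, 1],
    "vis": [1, 1],
})

# Approach from the accepted answer: transpose, then key by column label.
via_transpose = node_df.set_index("kegg_cid").T.to_dict("dict")

# orient="index" keys the outer dict by index label directly, no transpose.
via_index = node_df.set_index("kegg_cid").to_dict("index")
```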

In my case, I wanted to do the same thing but with selected columns from the Pandas dataframe, so I needed to slice the columns. There are two approaches.

  1. Directly:

(see: Convert pandas to dictionary defining the columns used for the key values)

node_df.set_index('kegg_cid')[['name', 'wt', 'vis']].T.to_dict('dict')

{'c00022': {'name': 'pyruvate', 'vis': 1, 'wt': 1},
 'c00024': {'name': 'acetyl-CoA', 'vis': 1, 'wt': 1},
 ... }
  2. "Indirectly": first, slice the desired columns/data from the Pandas dataframe (again, there are two approaches):
node_df_sliced = node_df[['kegg_cid', 'name', 'wt', 'vis']]

or

node_df_sliced2 = node_df.loc[:, ['kegg_cid', 'name', 'wt', 'vis']]

These can then be used to create a dictionary of dictionaries:

node_df_sliced.set_index('kegg_cid').T.to_dict('dict')

{'c00022': {'name': 'pyruvate', 'vis': 1, 'wt': 1},
 'c00024': {'name': 'acetyl-CoA', 'vis': 1, 'wt': 1},
 ... }
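A quick sketch checking that the direct approach and both indirect slicing approaches produce the same dictionary, again on a hypothetical two-row stand-in for node_df:

```python
import pandas as pd

# Hypothetical stand-in for the TSV-backed node_df.
node_df = pd.DataFrame({
    "node": [22, 24],
    "kegg_id": [22, 24],
    "kegg_cid": ["c00022", "c00024"],
    "name": ["pyruvate", "acetyl-CoA"],
    "wt": [1, 1],
    "vis": [1, 1],
})

# Direct: set the key column as index, then pick the value columns.
direct = node_df.set_index("kegg_cid")[["name", "wt", "vis"]].T.to_dict("dict")

# Indirect: slice first (with [] or .loc), then set the index.
node_df_sliced = node_df[["kegg_cid", "name", "wt", "vis"]]
node_df_sliced2 = node_df.loc[:, ["kegg_cid", "name", "wt", "vis"]]
indirect = node_df_sliced.set_index("kegg_cid").T.to_dict("dict")
indirect2 = node_df_sliced2.set_index("kegg_cid").T.to_dict("dict")
```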

Answer by Muhammad Moiz Ahmed

If a dictionary like:

{'red': 0.5, 'yellow': 0.25, 'blue': 0.125}

is required from a DataFrame like:

        a      b
0     red  0.500
1  yellow  0.250
2    blue  0.125

the simplest way is:

dict(df.values.tolist())

Working snippet:

import pandas as pd
df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0.5, 0.25, 0.125]})
dict(df.values.tolist())
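dict(df.values.tolist()) works here only because each row has exactly two values, a key and a value; with a third column, dict() would raise a ValueError. A sketch of two alternatives that name the key and value columns explicitly, using the same example DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0.5, 0.25, 0.125]})

# Works only because every row is exactly [key, value].
d1 = dict(df.values.tolist())

# Explicit key/value columns, independent of any other columns in df.
d2 = dict(zip(df['a'], df['b']))

# Same result via a Series indexed by the key column.
d3 = df.set_index('a')['b'].to_dict()
```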
