Construct NetworkX graph from Pandas DataFrame

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original StackOverflow URL: http://stackoverflow.com/questions/21207872/

Date: 2020-09-13 21:35:25  Source: igfitidea

Tags: python, pandas, networkx

Asked by urschrei

I'd like to create some NetworkX graphs from a simple Pandas DataFrame:

        Loc 1   Loc 2   Loc 3   Loc 4   Loc 5   Loc 6   Loc 7
Foo     0       0       1       1       0       0       0
Bar     0       0       1       1       0       1       1
Baz     0       0       1       0       0       0       0
Bat     0       0       1       0       0       1       0
Quux    1       0       0       0       0       0       0

Where Foo… is the index, and Loc 1 to Loc 7 are the columns. But converting to Numpy matrices or recarrays doesn't seem to work for generating input for nx.Graph(). Is there a standard strategy for achieving this? I'm not averse to reformatting the data in Pandas --> dumping to CSV --> importing to NetworkX, but it seems as if I should be able to generate the edges from the index and the nodes from the values.

Accepted answer by Andy Hayden

NetworkX expects a square matrix (of nodes and edges), so perhaps* you want to pass it:

In [11]: df2 = pd.concat([df, df.T]).fillna(0)

Note: It's important that the index and columns are in the same order!

In [12]: df2 = df2.reindex(df2.columns)

In [13]: df2
Out[13]: 
       Bar  Bat  Baz  Foo  Loc 1  Loc 2  Loc 3  Loc 4  Loc 5  Loc 6  Loc 7  Quux
Bar      0    0    0    0      0      0      1      1      0      1      1     0
Bat      0    0    0    0      0      0      1      0      0      1      0     0
Baz      0    0    0    0      0      0      1      0      0      0      0     0
Foo      0    0    0    0      0      0      1      1      0      0      0     0
Loc 1    0    0    0    0      0      0      0      0      0      0      0     1
Loc 2    0    0    0    0      0      0      0      0      0      0      0     0
Loc 3    1    1    1    1      0      0      0      0      0      0      0     0
Loc 4    1    0    0    1      0      0      0      0      0      0      0     0
Loc 5    0    0    0    0      0      0      0      0      0      0      0     0
Loc 6    1    1    0    0      0      0      0      0      0      0      0     0
Loc 7    1    0    0    0      0      0      0      0      0      0      0     0
Quux     0    0    0    0      1      0      0      0      0      0      0     0

In [14]: graph = nx.from_numpy_array(df2.values)  # from_numpy_matrix was removed in NetworkX 3.0

This doesn't pass the column/index names to the graph. If you want to do that, you can use relabel_nodes (you may have to be wary of duplicates, which are allowed in pandas' DataFrames):

In [15]: graph = nx.relabel_nodes(graph, dict(enumerate(df2.columns)))  # is there a nicer way than dict(enumerate(...))?

*It's unclear exactly what the columns and index represent for the desired graph.
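
The steps above can be put together as a self-contained sketch. It rebuilds the table from the question and, as an assumption that goes beyond the original answer, uses nx.from_pandas_adjacency (available in recent NetworkX releases), which keeps the row and column labels as node names so no separate relabelling step is needed. The variable names are illustrative only.

```python
import pandas as pd
import networkx as nx

# The table from the question: names on the index, locations on the columns.
df = pd.DataFrame(
    [[0, 0, 1, 1, 0, 0, 0],
     [0, 0, 1, 1, 0, 1, 1],
     [0, 0, 1, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 1, 0],
     [1, 0, 0, 0, 0, 0, 0]],
    index=["Foo", "Bar", "Baz", "Bat", "Quux"],
    columns=[f"Loc {i}" for i in range(1, 8)],
)

# Stack the table on its transpose to get a square, symmetric matrix.
square = pd.concat([df, df.T]).fillna(0).astype(int)
# Put the rows in the same order as the columns.
square = square.reindex(square.columns)

# Assumption: from_pandas_adjacency is used here instead of
# from_numpy_array + relabel_nodes; it carries the labels over itself.
graph = nx.from_pandas_adjacency(square)

print(sorted(graph.neighbors("Bar")))
```

Locations that nobody belongs to (Loc 2 and Loc 5 here) still appear in the graph, as isolated nodes.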

Answer by Agoston T

A slightly late answer, but NetworkX can now read data from pandas DataFrames directly. In that case, the ideal format for a simple directed graph is the following:

+----------+---------+---------+
|   Source |  Target |  Weight |
+==========+=========+=========+
| Node_1   | Node_2  |   0.2   |
+----------+---------+---------+
| Node_2   | Node_1  |   0.6   |   
+----------+---------+---------+
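
As a minimal sketch, a table in this format could be loaded with nx.from_pandas_edgelist. The column names follow the table above; passing create_using=nx.DiGraph is my assumption about how to request a directed graph, since the answer does not show the call.

```python
import pandas as pd
import networkx as nx

# The two-row edge list from the table above.
edges = pd.DataFrame({
    "Source": ["Node_1", "Node_2"],
    "Target": ["Node_2", "Node_1"],
    "Weight": [0.2, 0.6],
})

# create_using=nx.DiGraph keeps the two directions as separate edges.
digraph = nx.from_pandas_edgelist(
    edges,
    source="Source",
    target="Target",
    edge_attr="Weight",
    create_using=nx.DiGraph,
)

print(digraph.edges(data=True))
```

With the default undirected nx.Graph, the two rows would collapse into a single edge and only one of the weights would survive.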

If you are using adjacency matrices, then Andy Hayden is right: you need to get the format right. Since your question uses 0 and 1, I guess you want an undirected graph. This may seem counterintuitive at first, since you said the index represents e.g. a person and the columns represent groups to which a given person belongs, but the relation also holds the other way: a group (membership) belongs to a person. Following this logic, you should put the groups in the index and the persons in the columns as well.
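
If the goal is just an undirected person-to-group graph, one alternative to building the square matrix is to list each 1 in the table as an edge. This is a sketch of that idea rather than something the answer prescribes; the small table and the names memberships and pairs are made up for illustration.

```python
import pandas as pd
import networkx as nx

# A cut-down membership table: persons on the index, groups on the columns.
memberships = pd.DataFrame(
    [[0, 0, 1, 1],
     [0, 0, 1, 0],
     [1, 0, 0, 0]],
    index=["Foo", "Baz", "Quux"],
    columns=["Loc 1", "Loc 2", "Loc 3", "Loc 4"],
)

# Every non-zero cell becomes one (person, group) pair.
pairs = [
    (person, group)
    for person, row in memberships.iterrows()
    for group, flag in row.items()
    if flag
]

# An undirected graph treats (person, group) and (group, person) as the same edge.
graph = nx.Graph()
graph.add_edges_from(pairs)

print(sorted(graph.edges()))
```

Groups without any member (Loc 2 here) do not show up at all with this approach, because nodes are only created from edges.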

Just a side note: you can also frame this problem as a directed graph, for example if you would like to visualize an association network of hierarchical categories. There, the association from e.g. Samwise Gamgee to Hobbits is usually stronger than in the other direction (since Frodo Baggins is more likely the Hobbit prototype).

Answer by tmsss

You can also use scipy to create the square matrix like this:

import networkx as nx
import pandas as pd
import scipy.sparse as sp

cols = df.columns
X = sp.csr_matrix(df.astype(int).values)
Xc = X.T @ X  # sparse matrix product: column-by-column co-occurrence counts
Xc.setdiag(0)  # zero the diagonal so no node links to itself

# create a dataframe from the co-occurrence matrix in dense format
df = pd.DataFrame(Xc.todense(), index=cols, columns=cols)
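
To see what the product computes, here is a small dense NumPy sketch on a made-up three-row table (not the data from the question). Entry (i, j) of X.T @ X counts the rows in which column i and column j are both 1.

```python
import numpy as np

# Rows: Foo, Baz, Quux. Columns: Loc 1, Loc 2, Loc 3, Loc 4.
X = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
])

# Co-occurrence counts between columns.
co = X.T @ X
# The diagonal holds each column's own total; clear it to avoid self-loops.
np.fill_diagonal(co, 0)

print(co)
```

Only Loc 3 and Loc 4 ever share a row (Foo), so the only non-zero entries are the two symmetric cells for that pair.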

Later on, you can create an edge list from the DataFrame and import it into NetworkX:

df = df.stack().reset_index()
df.columns = ['source', 'target', 'weight']

df = df[df['weight'] != 0]  # remove non-connected nodes

g = nx.from_pandas_edgelist(df, 'source', 'target', ['weight'])
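
For reference, the whole pipeline can be run end to end on the table from the question. This is a sketch under the assumption that df is the 0/1 table shown there; the names co and edges are introduced here only to avoid overwriting df.

```python
import pandas as pd
import scipy.sparse as sp
import networkx as nx

# The table from the question: names on the index, locations on the columns.
df = pd.DataFrame(
    [[0, 0, 1, 1, 0, 0, 0],
     [0, 0, 1, 1, 0, 1, 1],
     [0, 0, 1, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 1, 0],
     [1, 0, 0, 0, 0, 0, 0]],
    index=["Foo", "Bar", "Baz", "Bat", "Quux"],
    columns=[f"Loc {i}" for i in range(1, 8)],
)

# Column-by-column co-occurrence counts, with the diagonal cleared.
X = sp.csr_matrix(df.astype(int).values)
Xc = X.T @ X
Xc.setdiag(0)
co = pd.DataFrame(Xc.toarray(), index=df.columns, columns=df.columns)

# Long format: one row per (source, target) pair, keeping only real links.
edges = co.stack().reset_index()
edges.columns = ["source", "target", "weight"]
edges = edges[edges["weight"] != 0]

g = nx.from_pandas_edgelist(edges, "source", "target", ["weight"])

print(sorted(g.edges(data="weight")))
```

The stacked table lists every pair in both directions, but an undirected graph stores each pair once. Note that the result links locations to each other (weighted by how many names they share) and no longer contains the names themselves.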