pandas: loading nodes with attributes and edges from DataFrames into NetworkX

Question

Asked by José

I am new to using Python for working with graphs: NetworkX. Until now I have used Gephi. There the standard steps (but not the only possible ones) are:

Load the node information from a table/spreadsheet; one of the columns should be the ID and the rest are metadata about the nodes (the nodes are people, so gender, groups... normally used for coloring). For example:
```
id;NormalizedName;Gender
per1;Jesús;male
per2;Abraham;male
per3;Isaac;male
per4;Jacob;male
per5;Judá;male
per6;Tamar;female
...
```
Then load the edges, also from a table/spreadsheet, using the same node names as in the ID column of the node spreadsheet. The edge table normally has four columns (Target, Source, Weight and Type):
```
Target;Source;Weight;Type
per1;per2;3;Undirected
per3;per4;2;Undirected
...
```
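If both tables are exported as semicolon-delimited text, pandas can read them directly with read_csv and sep=';'. A minimal sketch, assuming the delimiter shown above; io.StringIO stands in for the real file paths, and only the rows listed above are included:

```python
import io

import pandas as pd

# Rows copied from the tables above; with real files, pass the file path instead.
NODES_CSV = """id;NormalizedName;Gender
per1;Jesús;male
per2;Abraham;male
per3;Isaac;male
per4;Jacob;male
per5;Judá;male
per6;Tamar;female
"""

EDGES_CSV = """Target;Source;Weight;Type
per1;per2;3;Undirected
per3;per4;2;Undirected
"""

# sep=';' matches the semicolon-delimited export format.
nodes = pd.read_csv(io.StringIO(NODES_CSV), sep=";")
edges = pd.read_csv(io.StringIO(EDGES_CSV), sep=";")

print(nodes.shape)          # (6, 3)
print(list(edges.columns))  # ['Target', 'Source', 'Weight', 'Type']
```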

These are the two DataFrames that I have and want to load in Python. From what I have read about NetworkX, it does not seem possible to load two tables (one for nodes, one for edges) into the same graph, and I am not sure what the best approach would be:

Should I create a graph using only the node information from the DataFrame, and then add (append) the edges from the other DataFrame? If so, since nx.from_pandas_dataframe() expects edge information, I guess I shouldn't use it to create the nodes... Should I just pass the information as lists?
Should I create a graph using only the edge information from the DataFrame, and then add the information from the other DataFrame to each node as attributes? Is there a better way to do that than iterating over the DataFrame and the nodes?

Answer 1

Answered by harryscholes

Create the weighted graph from the edge table using nx.from_pandas_dataframe:

```python
import networkx as nx
import pandas as pd

edges = pd.DataFrame({'source' : [0, 1],
                      'target' : [1, 2],
                      'weight' : [100, 50]})

nodes = pd.DataFrame({'node' : [0, 1, 2],
                      'name' : ['Foo', 'Bar', 'Baz'],
                      'gender' : ['M', 'F', 'M']})

# NetworkX 1.x API: in 2.0 this function was renamed from_pandas_edgelist
G = nx.from_pandas_dataframe(edges, 'source', 'target', 'weight')
```

Then add the node attributes from dictionaries using set_node_attributes:

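```python
# NetworkX 1.x argument order: (G, name, values).
# set_index('node') keys each dictionary by node id rather than by row label.
nx.set_node_attributes(G, 'name', nodes.set_index('node')['name'].to_dict())
nx.set_node_attributes(G, 'gender', nodes.set_index('node')['gender'].to_dict())
```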

Or iterate over the graph to add the node attributes:

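```python
# NetworkX 1.x: G.node was removed in 2.4 (use G.nodes[i] there).
# Assumes node ids equal the row labels of the nodes DataFrame (0, 1, 2 here).
for i in sorted(G.nodes()):
    G.node[i]['name'] = nodes.name[i]
    G.node[i]['gender'] = nodes.gender[i]
```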
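In NetworkX 2.x the per-node attribute dictionary is reached through G.nodes[n]. A minimal sketch of the same loop for NetworkX 2.x, reusing the edges/nodes frames from this answer; it looks each node up by the value in the node column instead of assuming that the row position equals the node id:

```python
import networkx as nx
import pandas as pd

edges = pd.DataFrame({'source': [0, 1],
                      'target': [1, 2],
                      'weight': [100, 50]})

nodes = pd.DataFrame({'node': [0, 1, 2],
                      'name': ['Foo', 'Bar', 'Baz'],
                      'gender': ['M', 'F', 'M']})

# NetworkX 2.x replacement for from_pandas_dataframe; 'weight' becomes an edge attribute.
G = nx.from_pandas_edgelist(edges, 'source', 'target', 'weight')

# Walk the node table row by row; G.nodes[n] is the attribute dict of node n.
for row in nodes.itertuples(index=False):
    if row.node in G:  # skip table rows whose node has no edges (not in G)
        G.nodes[row.node]['name'] = row.name
        G.nodes[row.node]['gender'] = row.gender

print(G.nodes[2])  # {'name': 'Baz', 'gender': 'M'}
```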

Update:

As of NetworkX 2.0, the argument order of nx.set_node_attributes has changed to (G, values, name=None).

Using the example from above:

```python
# NetworkX 2.x argument order: (G, values, name)
nx.set_node_attributes(G, nodes.set_index('node')['gender'].to_dict(), 'gender')
```

Answer 2

Answered by Aaron Bramson

Here's basically the same answer, updated with some details filled in. We'll start with essentially the same setup, but here the nodes won't have numeric indices, just names, to address @LancelotHolmes's comment and make it more general:

```python
import networkx as nx
import pandas as pd

linkData = pd.DataFrame({'source' : ['Amy', 'Bob'],
                         'target' : ['Bob', 'Cindy'],
                         'weight' : [100, 50]})

nodeData = pd.DataFrame({'name' : ['Amy', 'Bob', 'Cindy'],
                         'type' : ['Foo', 'Bar', 'Baz'],
                         'gender' : ['M', 'F', 'M']})

G = nx.from_pandas_edgelist(linkData, 'source', 'target',
                            edge_attr=True, create_using=nx.DiGraph())
```

Here the True parameter (edge_attr) tells NetworkX to keep all the remaining columns in linkData as link properties. In this case I've made it a DiGraph, but if you don't need that, pass a different graph type as create_using, for example nx.Graph().
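A quick way to see what edge_attr=True and create_using did is to inspect the resulting edges. A minimal sketch using the linkData frame from this answer, building both the directed and the undirected variant for comparison:

```python
import networkx as nx
import pandas as pd

linkData = pd.DataFrame({'source': ['Amy', 'Bob'],
                         'target': ['Bob', 'Cindy'],
                         'weight': [100, 50]})

# edge_attr=True copies every column except source/target (here only 'weight').
D = nx.from_pandas_edgelist(linkData, 'source', 'target',
                            edge_attr=True, create_using=nx.DiGraph())
U = nx.from_pandas_edgelist(linkData, 'source', 'target',
                            edge_attr=True, create_using=nx.Graph())

print(D.edges['Amy', 'Bob']['weight'])  # 100
print(D.has_edge('Bob', 'Amy'))         # False: a DiGraph keeps the direction
print(U.has_edge('Bob', 'Amy'))         # True: a Graph does not
```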

Now, since you need to match nodeData to the nodes created from linkData by their names, set the index of the nodeData DataFrame to the name column before converting it to a dictionary, so that NetworkX 2.x can load it as node attributes.

```python
nx.set_node_attributes(G, nodeData.set_index('name').to_dict('index'))
```

This converts the whole nodeData DataFrame into a dictionary whose keys are the node names and whose values are dictionaries of the remaining columns (attribute: value pairs); set_node_attributes then stores those as ordinary node properties of the node with that name.
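One caveat of building the graph from the edges first: from_pandas_edgelist only creates nodes that appear in some edge, and set_node_attributes silently skips dictionary keys that are not nodes of G, so a person with no links never makes it into the graph. The question's first option (nodes first, then edges) avoids that. A minimal sketch, reusing this answer's frames plus a hypothetical isolated node 'Dave' added only for illustration:

```python
import networkx as nx
import pandas as pd

linkData = pd.DataFrame({'source': ['Amy', 'Bob'],
                         'target': ['Bob', 'Cindy'],
                         'weight': [100, 50]})

# 'Dave' is a hypothetical node with no edges, added to show the difference.
nodeData = pd.DataFrame({'name': ['Amy', 'Bob', 'Cindy', 'Dave'],
                         'type': ['Foo', 'Bar', 'Baz', 'Qux'],
                         'gender': ['M', 'F', 'M', 'M']})

G = nx.DiGraph()

# 1. Nodes first: (node, attribute_dict) pairs taken straight from the node table.
G.add_nodes_from(nodeData.set_index('name').to_dict('index').items())

# 2. Then edges: plain (source, target, weight) tuples from the edge table.
G.add_weighted_edges_from(
    linkData[['source', 'target', 'weight']].itertuples(index=False, name=None))

print(G.number_of_nodes())              # 4: 'Dave' is kept despite having no edges
print(G.nodes['Dave'])                  # {'type': 'Qux', 'gender': 'M'}
print(G.edges['Amy', 'Bob']['weight'])  # 100
```

For comparison, running the edges-first version from this answer on the same two frames yields a graph without 'Dave'.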

Answer 3

Answered by Ioanna

A small remark:

from_pandas_dataframe doesn't work in NetworkX 2.x; I'm referring to this call from the first answer:

```python
G = nx.from_pandas_dataframe(edges, 'source', 'target', 'weight')
```

In NetworkX 2.0 and later, the function is called from_pandas_edgelist:

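```python
# Column names as in the first answer's edges DataFrame; edge_attr keeps the weights.
G = nx.from_pandas_edgelist(edges, source='source', target='target', edge_attr='weight')
```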
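Combining from_pandas_edgelist with the NetworkX 2.x argument order of set_node_attributes, the full example from the first answer can be written for NetworkX 2.x roughly as follows. A sketch, using that answer's lowercase column names (the question's own files use Source/Target) and set_index to key the attribute dictionaries by node id:

```python
import networkx as nx
import pandas as pd

edges = pd.DataFrame({'source': [0, 1],
                      'target': [1, 2],
                      'weight': [100, 50]})

nodes = pd.DataFrame({'node': [0, 1, 2],
                      'name': ['Foo', 'Bar', 'Baz'],
                      'gender': ['M', 'F', 'M']})

# edge_attr='weight' keeps the weight column as an edge attribute.
G = nx.from_pandas_edgelist(edges, source='source', target='target',
                            edge_attr='weight')

# NetworkX 2.x order: set_node_attributes(G, values, name).
# set_index('node') makes the node id the dictionary key.
for column in ['name', 'gender']:
    nx.set_node_attributes(G, nodes.set_index('node')[column].to_dict(), column)

print(G.edges[1, 2]['weight'])  # 50
print(G.nodes[0])               # {'name': 'Foo', 'gender': 'M'}
```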