Original question: http://stackoverflow.com/questions/42558165/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Load nodes with attributes and edges from DataFrame to NetworkX
Asked by José
I am new to using Python for working with graphs (NetworkX). Until now I have used Gephi, where the standard steps (though not the only possible ones) are:
Load the node information from a table/spreadsheet; one of the columns should be the ID and the rest are metadata about the nodes (the nodes are people, so gender, groups, etc., normally used for coloring). For example:
id;NormalizedName;Gender
per1;Jesús;male
per2;Abraham;male
per3;Isaac;male
per4;Jacob;male
per5;Judá;male
per6;Tamar;female
...
Then load the edges, also from a table/spreadsheet, using the same node names as in the ID column of the nodes spreadsheet. It normally has four columns (Target, Source, Weight and Type):
Target;Source;Weight;Type
per1;per2;3;Undirected
per3;per4;2;Undirected
...
These are the two DataFrames that I have and that I want to load in Python. From reading about NetworkX, it seems that it is not directly possible to load two tables (one for nodes, one for edges) into the same graph, and I am not sure what the best way would be:
Should I create a graph with only the node information from one DataFrame, and then add (append) the edges from the other DataFrame? If so, and since nx.from_pandas_dataframe() expects information about the edges, I guess I shouldn't use it to create the nodes... Should I just pass the information as lists?
Should I create a graph with only the edge information from one DataFrame, and then add the information from the other DataFrame to each node as attributes? Is there a better way of doing that than iterating over the DataFrame and the nodes?
Answered by harryscholes
Create the weighted graph from the edge table using nx.from_pandas_dataframe:
import networkx as nx
import pandas as pd
edges = pd.DataFrame({'source' : [0, 1],
'target' : [1, 2],
'weight' : [100, 50]})
nodes = pd.DataFrame({'node' : [0, 1, 2],
'name' : ['Foo', 'Bar', 'Baz'],
'gender' : ['M', 'F', 'M']})
G = nx.from_pandas_dataframe(edges, 'source', 'target', 'weight')
Then add the node attributes from dictionaries using set_node_attributes:
# NetworkX 1.x argument order: (G, name, values)
nx.set_node_attributes(G, 'name', pd.Series(nodes.name, index=nodes.node).to_dict())
nx.set_node_attributes(G, 'gender', pd.Series(nodes.gender, index=nodes.node).to_dict())
Or iterate over the graph to add the node attributes:
# G.node was removed in NetworkX 2.4; use G.nodes[i] there instead
for i in sorted(G.nodes()):
    G.node[i]['name'] = nodes.name[i]
    G.node[i]['gender'] = nodes.gender[i]
Update:
As of nx 2.0 the argument order of nx.set_node_attributes has changed to (G, values, name=None).
Using the example from above:
nx.set_node_attributes(G, pd.Series(nodes.gender, index=nodes.node).to_dict(), 'gender')
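For reference, the whole first answer can be written against the NetworkX 2.x API roughly as follows. This is a minimal sketch rather than the answerer's own code: it assumes NetworkX 2.x or later, uses nx.from_pandas_edgelist in place of nx.from_pandas_dataframe, and introduces a helper variable by_id (a hypothetical name) that keys the node table by node id with set_index, so the lookup does not rely on the DataFrame's default index happening to match the node ids.

```python
import networkx as nx
import pandas as pd

edges = pd.DataFrame({'source': [0, 1],
                      'target': [1, 2],
                      'weight': [100, 50]})
nodes = pd.DataFrame({'node': [0, 1, 2],
                      'name': ['Foo', 'Bar', 'Baz'],
                      'gender': ['M', 'F', 'M']})

# Build the weighted graph from the edge table (NetworkX 2.x function name).
G = nx.from_pandas_edgelist(edges, 'source', 'target', edge_attr='weight')

# Key the node table by node id (by_id is an illustrative name),
# then attach one attribute per call using the 2.x order (G, values, name).
by_id = nodes.set_index('node')
nx.set_node_attributes(G, by_id['name'].to_dict(), 'name')
nx.set_node_attributes(G, by_id['gender'].to_dict(), 'gender')

print(G.nodes[0])          # attribute dict of node 0
print(G[0][1]['weight'])   # weight of the edge between nodes 0 and 1
```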
Answered by Aaron Bramson
Here's basically the same answer, but updated with some details filled in. We'll start with basically the same setup, but here the nodes have no integer indices, only names, which addresses @LancelotHolmes's comment and makes the example more general:
import networkx as nx
import pandas as pd
linkData = pd.DataFrame({'source' : ['Amy', 'Bob'],
'target' : ['Bob', 'Cindy'],
'weight' : [100, 50]})
nodeData = pd.DataFrame({'name' : ['Amy', 'Bob', 'Cindy'],
'type' : ['Foo', 'Bar', 'Baz'],
'gender' : ['M', 'F', 'M']})
G = nx.from_pandas_edgelist(linkData, 'source', 'target', True, nx.DiGraph())
Here the True parameter tells NetworkX to keep all the properties in linkData as link properties. In this case I've made it a DiGraph type, but if you don't need that, you can pass another graph type instead (for example nx.Graph() for an undirected graph).
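To make the positional arguments easier to read, the same call can be spelled with keywords. The sketch below assumes NetworkX 2.x; the variable names G_directed and G_undirected are only illustrative. It builds a directed and an undirected graph from the same edge table so the difference is visible.

```python
import networkx as nx
import pandas as pd

linkData = pd.DataFrame({'source': ['Amy', 'Bob'],
                         'target': ['Bob', 'Cindy'],
                         'weight': [100, 50]})

# edge_attr=True keeps every remaining column (here only 'weight') as an edge attribute.
G_directed = nx.from_pandas_edgelist(linkData, source='source', target='target',
                                     edge_attr=True, create_using=nx.DiGraph())

# Same table, but undirected: the edge can be looked up from either end.
G_undirected = nx.from_pandas_edgelist(linkData, source='source', target='target',
                                       edge_attr=True, create_using=nx.Graph())

print(G_directed.has_edge('Bob', 'Amy'))    # reverse direction in the DiGraph
print(G_undirected.has_edge('Bob', 'Amy'))  # reverse direction in the Graph
```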
Now, since you need to match the nodeData to the nodes generated from the linkData by name, set the index of the nodeData dataframe to the name column before converting it to a dictionary, so that NetworkX 2.x can load it as the node attributes.
nx.set_node_attributes(G, nodeData.set_index('name').to_dict('index'))
This loads the whole nodeData dataframe into a dictionary in which the key is the name, and the other properties are key:value pairs within that key (i.e., normal node properties where the node index is its name).
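To see what that dictionary looks like, here is a small self-contained sketch. It assumes NetworkX 2.x; the variable name attrs is illustrative, and the graph is built by hand here only to keep the example short.

```python
import networkx as nx
import pandas as pd

nodeData = pd.DataFrame({'name': ['Amy', 'Bob', 'Cindy'],
                         'type': ['Foo', 'Bar', 'Baz'],
                         'gender': ['M', 'F', 'M']})

# One inner dict per node, keyed by the node's name:
# {'Amy': {'type': 'Foo', 'gender': 'M'}, 'Bob': {...}, 'Cindy': {...}}
attrs = nodeData.set_index('name').to_dict('index')

G = nx.DiGraph()
G.add_edge('Amy', 'Bob', weight=100)
G.add_edge('Bob', 'Cindy', weight=50)

# With a dict of dicts and no attribute name, each inner dict is merged
# into the attributes of the matching node.
nx.set_node_attributes(G, attrs)

print(G.nodes['Amy'])
```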
Answered by Ioanna
A small remark:
from_pandas_dataframe doesn't work in nx 2. I'm referring to this line:
G = nx.from_pandas_dataframe(edges, 'source', 'target', 'weight')
I think that in nx 2.0 it goes like this:
G = nx.from_pandas_edgelist(edges, source='source', target='target', edge_attr='weight')
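Applied to the semicolon-separated tables from the question, where the column names are capitalised (Source, Target, Weight, Type), the same approach might look like the sketch below. It assumes NetworkX 2.x and reads the sample rows from in-memory strings; with real files you would pass the file paths to pd.read_csv instead. The variable names nodes_csv and edges_csv are illustrative. Using add_nodes_from rather than set_node_attributes is a design choice here: it also adds people who appear in the node table but have no edges.

```python
import io

import networkx as nx
import pandas as pd

# Sample rows copied from the question (in-memory stand-ins for the two files).
nodes_csv = """id;NormalizedName;Gender
per1;Jesús;male
per2;Abraham;male
per3;Isaac;male
per4;Jacob;male
per5;Judá;male
"""
edges_csv = """Target;Source;Weight;Type
per1;per2;3;Undirected
per3;per4;2;Undirected
"""

nodes = pd.read_csv(io.StringIO(nodes_csv), sep=';')
edges = pd.read_csv(io.StringIO(edges_csv), sep=';')

# Edges first: the column names must match the table's capitalisation exactly.
G = nx.from_pandas_edgelist(edges, source='Source', target='Target',
                            edge_attr=['Weight', 'Type'])

# Then the nodes: (node, attribute_dict) pairs update existing nodes
# and create any node that has no edges (per5 in this sample).
G.add_nodes_from(nodes.set_index('id').to_dict('index').items())

print(G.nodes['per1'])
print(G['per1']['per2'])
```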