Neo4j: create nodes and relationships from a pandas dataframe with py2neo

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow

Original source: http://stackoverflow.com/questions/45738180/
Asked by Fabio Lamanna
Getting the results of a Cypher query on a Neo4j database into a pandas dataframe with py2neo is really straightforward:
>>> from pandas import DataFrame
>>> DataFrame(graph.data("MATCH (a:Person) RETURN a.name, a.born LIMIT 4"))
a.born a.name
0 1964 Keanu Reeves
1 1967 Carrie-Anne Moss
2 1961 Laurence Fishburne
3 1960 Hugo Weaving
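All the snippets on this page assume an existing py2neo Graph connection. A minimal sketch (the URI and credentials are placeholders for your own server, and a py2neo version that accepts the auth argument is assumed):

from py2neo import Graph

# Hypothetical connection details: replace the URI and credentials with your own
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))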
Now I am trying to create (or better MERGE) a set of nodes and relationships from a pandas dataframe into a Neo4j database with py2neo. Imagine I have a dataframe like:
LABEL1 LABEL2
p1 n1
p2 n1
p3 n2
p4 n2
where the labels are the column headers and the properties are the values. I would like to reproduce the following Cypher query (using the first row as an example) for every row of my dataframe:
query="""
MATCH (a:Label1 {property:p1))
MERGE (a)-[r:R_TYPE]->(b:Label2 {property:n1))
"""
I know I can tell py2neo just to graph.run(query), or even run a LOAD CSV Cypher script in the same way, but I wonder whether I can iterate through the dataframe and apply the above query row by row WITHIN py2neo.
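(The LOAD CSV route I mean would be a sketch like the one below, assuming the dataframe has been exported to a CSV file in Neo4j's import directory; the file name and column names are placeholders:)

# Hypothetical: assumes the dataframe was exported to data.csv in Neo4j's import directory
graph.run('''
    LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS line
    MATCH (a:Label1 {property: line.label1})
    MERGE (a)-[r:R_TYPE]->(b:Label2 {property: line.label2})
''')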
Answered by William Lyon
You can use DataFrame.iterrows() to iterate through the DataFrame and execute a query for each row, passing in the values from the row as parameters.
for index, row in df.iterrows():
    graph.run('''
        MATCH (a:Label1 {property:$label1})
        MERGE (a)-[r:R_TYPE]->(b:Label2 {property:$label2})
    ''', parameters={'label1': row['label1'], 'label2': row['label2']})
That will execute one transaction per row. We can batch multiple queries into one transaction for better performance.
tx = graph.begin()
for index, row in df.iterrows():
    tx.evaluate('''
        MATCH (a:Label1 {property:$label1})
        MERGE (a)-[r:R_TYPE]->(b:Label2 {property:$label2})
    ''', parameters={'label1': row['label1'], 'label2': row['label2']})
tx.commit()
Typically we can batch ~20k database operations in a single transaction.
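A minimal sketch of that batching, committing every 20k rows (BATCH_SIZE and the pending counter are illustrative names, not part of py2neo):

BATCH_SIZE = 20000  # roughly the per-transaction limit suggested above

tx = graph.begin()
pending = 0
for index, row in df.iterrows():
    tx.evaluate('''
        MATCH (a:Label1 {property:$label1})
        MERGE (a)-[r:R_TYPE]->(b:Label2 {property:$label2})
    ''', parameters={'label1': row['label1'], 'label2': row['label2']})
    pending += 1
    if pending == BATCH_SIZE:
        tx.commit()         # flush this batch to the database
        tx = graph.begin()  # start a fresh transaction for the next batch
        pending = 0
if pending > 0:
    tx.commit()             # flush the final, partial batch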
Answered by Anna Yashina
I found out that the proposed solution doesn't work for me. The code above creates new nodes even though the nodes already exist: MERGE matches or creates the entire pattern, so when only node a is bound, every row whose relationship does not yet exist produces a brand-new b node. To make sure you don't create any duplicates, I suggest matching both the a and b nodes before the MERGE:
tx = graph.begin()
for index, row in df.iterrows():
    tx.evaluate('''
        MATCH (a:Label1 {property:$label1}), (b:Label2 {property:$label2})
        MERGE (a)-[r:R_TYPE]->(b)
    ''', parameters={'label1': row['label1'], 'label2': row['label2']})
tx.commit()
Also, in my case I had to add relationship properties at the same time (see the code below). Moreover, I had 500k+ relationships to add, so I expectedly ran into a Java heap memory error. I solved the problem by placing begin() and commit() inside the loop, so a new transaction is created for each new relationship:
for index, row in df.iterrows():
    tx = graph.begin()
    tx.evaluate('''
        MATCH (a:Label1 {property:$label1}), (b:Label2 {property:$label2})
        MERGE (a)-[r:R_TYPE {property_name:$p}]->(b)
    ''', parameters={'label1': row['label1'], 'label2': row['label2'], 'p': row['property']})
    tx.commit()
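For reference, the dataframe this last snippet assumes has three columns; the values below are made up:

import pandas as pd

# Hypothetical input: one row per relationship, plus a property value for each
df = pd.DataFrame({
    'label1':   ['p1', 'p2', 'p3'],
    'label2':   ['n1', 'n1', 'n2'],
    'property': [0.5, 0.7, 0.9],
})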