Neo4j: create nodes and relationships from a pandas dataframe with py2neo

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow

Original source: http://stackoverflow.com/questions/45738180/
Asked by Fabio Lamanna
Getting the results of a Cypher query on a Neo4j database into a pandas dataframe with py2neo is really straightforward:
>>> from pandas import DataFrame
>>> DataFrame(graph.data("MATCH (a:Person) RETURN a.name, a.born LIMIT 4"))
a.born a.name
0 1964 Keanu Reeves
1 1967 Carrie-Anne Moss
2 1961 Laurence Fishburne
3 1960 Hugo Weaving
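All the snippets on this page assume an existing py2neo Graph connection. A minimal sketch (the URI and credentials are placeholders for your own server, and a py2neo version that accepts the auth argument is assumed):

from py2neo import Graph

# Hypothetical connection details: replace the URI and credentials with your own
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))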
Now I am trying to create (or better MERGE) a set of nodes and relationships from a pandas dataframe into a Neo4j database with py2neo. Imagine I have a dataframe like:
LABEL1 LABEL2
p1 n1
p2 n1
p3 n2
p4 n2
where the labels are the column headers and the properties are the values. I would like to reproduce the following Cypher query (using the first row as an example) for every row of my dataframe:
query="""
MATCH (a:Label1 {property:p1))
MERGE (a)-[r:R_TYPE]->(b:Label2 {property:n1))
"""
I know I can tell py2neo just to graph.run(query), or even run a LOAD CSV Cypher script in the same way, but I wonder whether I can iterate through the dataframe and apply the above query row by row WITHIN py2neo.
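(The LOAD CSV route I mean would be a sketch like the one below, assuming the dataframe has been exported to a CSV file in Neo4j's import directory; the file name and column names are placeholders:)

# Hypothetical: assumes the dataframe was exported to data.csv in Neo4j's import directory
graph.run('''
    LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS line
    MATCH (a:Label1 {property: line.label1})
    MERGE (a)-[r:R_TYPE]->(b:Label2 {property: line.label2})
''')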
Answered by William Lyon
You can use DataFrame.iterrows() to iterate through the DataFrame and execute a query for each row, passing in the values from the row as parameters.
for index, row in df.iterrows():
    graph.run('''
        MATCH (a:Label1 {property:$label1})
        MERGE (a)-[r:R_TYPE]->(b:Label2 {property:$label2})
    ''', parameters={'label1': row['label1'], 'label2': row['label2']})
That will execute one transaction per row. We can batch multiple queries into one transaction for better performance.
tx = graph.begin()
for index, row in df.iterrows():
    tx.evaluate('''
        MATCH (a:Label1 {property:$label1})
        MERGE (a)-[r:R_TYPE]->(b:Label2 {property:$label2})
    ''', parameters={'label1': row['label1'], 'label2': row['label2']})
tx.commit()
Typically we can batch ~20k database operations in a single transaction.
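A minimal sketch of that batching, committing every 20k rows (BATCH_SIZE and the pending counter are illustrative names, not part of py2neo):

BATCH_SIZE = 20000  # roughly the per-transaction limit suggested above

tx = graph.begin()
pending = 0
for index, row in df.iterrows():
    tx.evaluate('''
        MATCH (a:Label1 {property:$label1})
        MERGE (a)-[r:R_TYPE]->(b:Label2 {property:$label2})
    ''', parameters={'label1': row['label1'], 'label2': row['label2']})
    pending += 1
    if pending == BATCH_SIZE:
        tx.commit()         # flush this batch to the database
        tx = graph.begin()  # start a fresh transaction for the next batch
        pending = 0
if pending > 0:
    tx.commit()             # flush the final, partial batch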
Answered by Anna Yashina
I found out that the proposed solution doesn't work for me. The code above creates new nodes even though the nodes already exist: MERGE matches or creates the entire pattern, so when only node a is bound, every row whose relationship does not yet exist produces a brand-new b node. To make sure you don't create any duplicates, I suggest matching both the a and b nodes before the MERGE:
tx = graph.begin()
for index, row in df.iterrows():
    tx.evaluate('''
        MATCH (a:Label1 {property:$label1}), (b:Label2 {property:$label2})
        MERGE (a)-[r:R_TYPE]->(b)
    ''', parameters={'label1': row['label1'], 'label2': row['label2']})
tx.commit()
Also, in my case I had to add relationship properties at the same time (see the code below). Moreover, I had 500k+ relationships to add, so I expectedly ran into a Java heap memory error. I solved the problem by placing begin() and commit() inside the loop, so a new transaction is created for each new relationship:
for index, row in df.iterrows():
    tx = graph.begin()
    tx.evaluate('''
        MATCH (a:Label1 {property:$label1}), (b:Label2 {property:$label2})
        MERGE (a)-[r:R_TYPE {property_name:$p}]->(b)
    ''', parameters={'label1': row['label1'], 'label2': row['label2'], 'p': row['property']})
    tx.commit()
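For reference, the dataframe this last snippet assumes has three columns; the values below are made up:

import pandas as pd

# Hypothetical input: one row per relationship, plus a property value for each
df = pd.DataFrame({
    'label1':   ['p1', 'p2', 'p3'],
    'label2':   ['n1', 'n1', 'n2'],
    'property': [0.5, 0.7, 0.9],
})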