Original question: http://stackoverflow.com/questions/23302270/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
How do I run graphx with Python / pyspark?
Asked by Glenn Strycker
I am attempting to run Spark GraphX with Python using pyspark. My installation appears correct, as I am able to run the pyspark tutorials and the (Java) GraphX tutorials just fine. Presumably, since GraphX is part of Spark, pyspark should be able to interface with it, correct?
Here are the tutorials for pyspark: http://spark.apache.org/docs/0.9.0/quick-start.html and http://spark.apache.org/docs/0.9.0/python-programming-guide.html
Here are the ones for GraphX: http://spark.apache.org/docs/0.9.0/graphx-programming-guide.html and http://ampcamp.berkeley.edu/big-data-mini-course/graph-analytics-with-graphx.html
Can anyone convert the GraphX tutorial to be in Python?
Accepted answer by Misty Nodine
It looks like the Python bindings for GraphX have been delayed: first to at least Spark 1.4, then to 1.5, and now indefinitely (∞). They are waiting on the Java API.
You can track the status in the ASF JIRA issue SPARK-3789, "Python bindings for GraphX".
Answer by Wildfire
GraphX 0.9.0 doesn't have a Python API yet. One is expected in upcoming releases.
Answer by zhibo
You should look at GraphFrames (https://github.com/graphframes/graphframes), which wraps GraphX algorithms under the DataFrames API and provides a Python interface.
Here is a quick example from https://graphframes.github.io/graphframes/docs/_site/quick-start.html, slightly modified so that it works.
First, start pyspark with the graphframes package loaded:
pyspark --packages graphframes:graphframes:0.1.0-spark1.6
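If you run a standalone script rather than the interactive shell, one commonly used alternative is to pass the same --packages flag through the PYSPARK_SUBMIT_ARGS environment variable before the SparkContext is created. The sketch below assumes pyspark is importable from your Python environment; the helper names build_submit_args and make_sql_context are hypothetical and introduced only for illustration.

```python
import os

# Package coordinate taken from the pyspark command above.
GRAPHFRAMES_PACKAGE = "graphframes:graphframes:0.1.0-spark1.6"


def build_submit_args(package):
    """Build a PYSPARK_SUBMIT_ARGS value (hypothetical helper).

    The trailing "pyspark-shell" token is what pyspark expects at the
    end of this variable when it launches the JVM gateway.
    """
    return "--packages {} pyspark-shell".format(package)


def make_sql_context(app_name="graphframes-example"):
    """Create a SQLContext after requesting the package (hypothetical helper).

    Assumes pyspark is installed; the imports are local so this file
    can be loaded on a machine without Spark.
    """
    # Must be set before the SparkContext (and its JVM) starts.
    os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(GRAPHFRAMES_PACKAGE)
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName=app_name)
    return SQLContext(sc)
```

Calling sqlContext = make_sql_context() would then stand in for the sqlContext that the shell provides. Whether the graphframes Python module becomes importable this way may depend on your Spark version and deployment, so treat this as a starting point rather than a guaranteed recipe.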
Python code:
from graphframes import *
# Create a Vertex DataFrame with unique ID column "id"
v = sqlContext.createDataFrame([
    ("a", "Alice", 34),
    ("b", "Bob", 36),
    ("c", "Charlie", 30),
], ["id", "name", "age"])
# Create an Edge DataFrame with "src" and "dst" columns
e = sqlContext.createDataFrame([
    ("a", "b", "friend"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
], ["src", "dst", "relationship"])
# Create a GraphFrame
g = GraphFrame(v, e)
# Query: Get in-degree of each vertex.
g.inDegrees.show()
# Query: Count the number of "follow" connections in the graph.
g.edges.filter("relationship = 'follow'").count()
# Run PageRank algorithm, and show results.
results = g.pageRank(resetProbability=0.01, maxIter=20)
results.vertices.select("id", "pagerank").show()
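To sanity-check the first two queries without a cluster, the same toy graph can be worked through in plain Python. This is only an illustrative sketch that does not use GraphFrames, and the variable names are hypothetical. It counts incoming edges per vertex and the "follow" edges, which is what the in-degree and filter queries above are asked to compute.

```python
from collections import Counter

# Same edge list as the Edge DataFrame above: (src, dst, relationship).
edges = [
    ("a", "b", "friend"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
]

# In-degree: number of edges pointing at each vertex.
in_degrees = Counter(dst for _src, dst, _rel in edges)

# Number of edges whose relationship is "follow".
follow_count = sum(1 for _src, _dst, rel in edges if rel == "follow")

print(dict(in_degrees))
print(follow_count)
```

On this data, b has two incoming edges, c has one, and a has none; two of the three edges are "follow" edges.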