Showing tables from specific database with Pyspark and Hive

Note: this Q&A is reproduced from StackOverflow under the CC BY-SA 4.0 license; if you reuse it, you must keep the same license and attribute the original authors. Original: http://stackoverflow.com/questions/42489130/



Tags: python, apache-spark, hive, pyspark, beeline

Asked by Keithx

I have some databases and tables in a Hive instance, and I'd like to show the tables for one specific database (let's say 3_db).


+------------------+--+
|  database_name   |
+------------------+--+
| 1_db             |
| 2_db             |
| 3_db             |
+------------------+--+

If I enter beeline from bash (nothing complex there), I just do the following:


show databases;
show tables from 3_db;

When I'm using pyspark via an IPython notebook, my cheap tricks don't work there, and the second line (show tables from 3_db) gives me an error instead:


sqlContext.sql('show databases').show()
sqlContext.sql('show tables from 3_db').show()

What seems to be wrong, and why does the same code work in one place and not in the other?


Answered by David Markovitz

sqlContext.sql("show tables in 3_db").show()

Answered by aelesbao

Another possibility is to use the Catalog methods:


from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.catalog.listTables("3_db")

Just be aware that in PySpark this method returns a list, while in Scala it returns a DataFrame.

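A minimal sketch of consuming that list in PySpark, again using the question's 3_db database; each element is a pyspark.sql.catalog.Table with name, database, tableType, and isTemporary fields:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# In PySpark, listTables returns a plain Python list of Table objects
for table in spark.catalog.listTables("3_db"):
    print(table.name, table.tableType, table.isTemporary)

# Just the table names, e.g. for a membership check
names = [t.name for t in spark.catalog.listTables("3_db")]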