Scala: Read csv as Data Frame in Spark 1.6

Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, link to the original, and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/38595893/

Date: 2020-10-22 08:30:27  Source: igfitidea

Read csv as Data Frame in spark 1.6

Tags: scala, apache-spark

Asked by user2145299

I have Spark 1.6 and trying to read a csv (or tsv) file as a dataframe. Here are the steps I take:


scala>  val sqlContext= new org.apache.spark.sql.SQLContext(sc)
scala> import sqlContext.implicits._
scala> val df = sqlContext.read
scala> .format("com.databricks.spark.csv")
scala> .option("header", "true")
scala> .option("inferSchema", "true")
scala> .load("data.csv")
scala> df.show()

Error:


<console>:35: error: value show is not a member of org.apache.spark.sql.DataFrameReader df.show()

The last command is supposed to show the first few lines of the dataframe, but I get the error message. Any help will be much appreciated.


Answered by MrChristine

Looks like your functions are not chained together properly, so it's attempting to run "show()" on the val df, which is a reference to the DataFrameReader class. If I run the following, I can reproduce your error:


val df = sqlContext.read
df.show()
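The underlying issue is how the REPL evaluates input: each `scala>` prompt is a complete statement, so the leading-dot lines never attach to `sqlContext.read`, and `df` ends up bound to the bare reader instead of a DataFrame. A minimal sketch of the same effect using a plain-Scala stand-in (the `Reader` class and its methods below are illustrative, not Spark's API):

```scala
// Hypothetical fluent builder mimicking DataFrameReader's shape
// (Reader, format, option, load are made-up names, not Spark's classes).
case class Reader(fmt: String = "", opts: Map[String, String] = Map.empty) {
  def format(f: String): Reader = copy(fmt = f)
  def option(k: String, v: String): Reader = copy(opts = opts + (k -> v))
  def load(path: String): String = s"loaded $path as $fmt with $opts"
}

object ChainDemo extends App {
  // Entered as separate REPL statements, only the first line binds df,
  // so df is still a Reader -- calling a DataFrame-only method fails:
  val df = Reader()
  // df.show()   // does not compile: Reader has no `show` method

  // Chained as a single expression, the calls apply in order and load runs:
  val result = Reader()
    .format("csv")
    .option("header", "true")
    .load("data.csv")
  println(result)
}
```

In a script, or inside the REPL's `:paste` mode, leading-dot continuation lines do chain onto the previous expression, which is why the single-expression version in the answer below works.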

If you restructure the code, it would work:


val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("data.csv")
df.show()

Answered by Rajeev Rathor

In Java, first add the dependency to the POM.xml file, then run the following code to read a csv file.


<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-csv_2.10</artifactId>
    <version>1.4.0</version>
</dependency>

Dataset<Row> df = sparkSession.read()
        .format("com.databricks.spark.csv")
        .option("header", true)
        .option("inferSchema", true)
        .load("hdfs://localhost:9000/usr/local/hadoop_data/loan_100.csv");

Answered by user3521180

Use the following instead:


val sqlContext = new SQLContext(sc);

It should resolve your issue.
