
Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/37271474/


Spark & Scala: Read in CSV file as DataFrame / Dataset

Tags: scala, shell, csv, apache-spark

Asked by Boern

Coming from the R world, I want to import a .csv file into Spark (v1.6.1) using the Scala shell (./spark-shell).

My .csv has a header and looks like this:

"col1","col2","col3"
1.4,"abc",91
1.3,"def",105
1.35,"gh1",104

Thanks.

Answered by Boern

Spark 2.0+

Since databricks/spark-csv has been integrated into Spark, reading CSV files is straightforward using the SparkSession:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
   .master("local")
   .appName("Word Count")
   .getOrCreate()
// path points at the CSV file to read
val df = spark.read.option("header", true).csv(path)
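
Without further options every column comes back as a string. If you want typed columns, you can either let Spark infer the schema or declare one up front. A minimal sketch, assuming a file shaped like the sample above (the variable names are placeholders, not from the original answer):

import org.apache.spark.sql.types.{StructType, StructField, DoubleType, StringType, IntegerType}

// Option 1: infer the types -- costs an extra pass over the data
val inferred = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(path)

// Option 2: declare the schema explicitly -- no inference pass needed
val schema = StructType(Seq(
  StructField("col1", DoubleType, nullable = true),
  StructField("col2", StringType, nullable = true),
  StructField("col3", IntegerType, nullable = true)
))
val typed = spark.read
  .option("header", "true")
  .schema(schema)
  .csv(path)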

Older versions

After restarting my spark-shell I figured it out by myself; this may be of help for others:

After installing the package as described here and starting the spark-shell using ./spark-shell --packages com.databricks:spark-csv_2.11:1.4.0:

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
scala> val df = sqlContext.read.format("com.databricks.spark.csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("/home/vb/opt/spark/data/mllib/mydata.csv")
scala> df.printSchema()
root
 |-- col1: double (nullable = true)
 |-- col2: string (nullable = true)
 |-- col3: integer (nullable = true)
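
Since the question title also mentions Datasets: on Spark 2.0+ the resulting DataFrame can be converted to a typed Dataset via a case class whose fields match the CSV columns by name and type. A minimal sketch, assuming the column names from the sample file above (Record and the filter are illustrative, not from the original answer):

// Fields must line up with the CSV columns by name and type
case class Record(col1: Double, col2: String, col3: Int)

import spark.implicits._  // encoders for case classes (already in scope in spark-shell)
val ds = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(path)
  .as[Record]

// Dataset operations are typed: _.col3 is an Int here
ds.filter(_.col3 > 100).show()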