Scala Spark: retrieving more than 20 records

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/31478695/

spark retrieving more than 20 records

scala, apache-spark-sql

Asked by user1342645

I have a file called students.json that looks like this:

{"uid":1,"name":"Michael","sid":1}
{"uid":2,"name":"Andy", "age":30,"sid":1}
{"uid":3,"name":"Jsaxsustin", "age":19,"sid":1}
{"uid":4,"name":"Andasxsay", "age":30,"sid":2}
{"uid":5,"name":"Jusewedtin", "age":19,"sid":1}
{"uid":6,"name":"Anwereddy", "age":30,"sid":3}
{"uid":7,"name":"Justdwedwein", "age":19,"sid":2}
{"uid":8,"name":"Andwedewy", "age":30,"sid":1}
{"uid":9,"name":"Justidedwn", "age":19,"sid":1}
{"uid":10,"name":"Anddwdey", "age":30,"sid":3}
{"uid":11,"name":"Michael","sid":1}
{"uid":12,"name":"Andy", "age":30,"sid":1}
{"uid":13,"name":"Jsaxsustin", "age":19,"sid":1}
{"uid":14,"name":"Andasxsay", "age":30,"sid":2}
{"uid":15,"name":"Jusewedtin", "age":19,"sid":1}
{"uid":16,"name":"Anwereddy", "age":30,"sid":3}
{"uid":17,"name":"Justdwedwein", "age":19,"sid":2}
{"uid":18,"name":"Andwe2fr3fdewy", "age":30,"sid":1}
{"uid":19,"name":"Justide4y45y54dwn", "age":19,"sid":1}
{"uid":20,"name":"Anddwd45y45yey", "age":30,"sid":3}
{"uid":21,"name":"Justdw45y45yedin", "age":19,"sid":1}
{"uid":22,"name":"An45y45ydy", "age":30,"sid":1}
{"uid":23,"name":"Jsaxsus4y54ytin", "age":19,"sid":1}
{"uid":24,"name":"Andas45y4y5xsay", "age":30,"sid":2}
{"uid":25,"name":"Jusewe4y5dtin", "age":19,"sid":1}
{"uid":26,"name":"Anwere45y45yddy", "age":30,"sid":3}
{"uid":27,"name":"Justdwe4y4y5dwein", "age":19,"sid":2}
{"uid":28,"name":"Andwede45ywy", "age":30,"sid":1}
{"uid":29,"name":"Justided45y45wn", "age":19,"sid":1}
{"uid":30,"name":"Anddwde4t4y", "age":30,"sid":3}
{"uid":31,"name":"Mich4y554ael","sid":1}
{"uid":32,"name":"An45ydy", "age":30,"sid":1}
{"uid":33,"name":"Jsaxsudfsstin", "age":19,"sid":1}
{"uid":34,"name":"Andasxssdfdsay", "age":30,"sid":2}
{"uid":35,"name":"Jusewedtsdfdsin", "age":19,"sid":1}
{"uid":36,"name":"Anweredsfdsdy", "age":30,"sid":3}
{"uid":37,"name":"Justdwedfsdwein", "age":19,"sid":2}
{"uid":38,"name":"Andwedewy", "age":30,"sid":1}
{"uid":39,"name":"Jdsfdsfustidedwn", "age":19,"sid":1}
{"uid":40,"name":"Ansdfdsey", "age":30,"sid":3}
{"uid":41,"name":"Jussdsdtdwedin", "age":19,"sid":1}

then I am using Spark to import the data and run a SQL query on it using sqlContext. To retrieve the information I am using the following Scala code (almost taken from a Spark example in Scala):

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
// load the line-delimited JSON file as a DataFrame
val students = sqlContext.read.json("../myTests/students.json")
// register it as a temporary table so it can be queried with SQL
students.registerTempTable("students")
sqlContext.sql("SELECT * FROM students").show

Instead of returning all the items in the list, this only returns 20 records (the first 20)... If on the last line, instead of

sqlContext.sql("SELECT * FROM students").show;

I put

sqlContext.sql("SELECT * FROM students where uid>10").show;

it gives me 20 records, from 10 to 30. In short, I want to see more than 20 records; how can I do that? Is it possible? I have looked in the documentation and have not found anything about this... I know it might be a bit silly, but I am doing some tests before I get into anything more serious... I downloaded Spark and am running it standalone just to try out Scala...

My Spark initialization shows this (errors only; I changed the config so that it would not show INFO messages):

15/07/17 11:37:55 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
        at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:363)
        at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
        at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
        at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
        at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
        at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
        at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
        at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
        at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
        at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName.apply(Utils.scala:2162)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName.apply(Utils.scala:2162)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2162)
        at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:212)
        at org.apache.spark.repl.SparkIMain.<init>(SparkIMain.scala:118)
        at org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.<init>(SparkILoop.scala:187)
        at org.apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.scala:217)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process.apply$mcZ$sp(SparkILoop.scala:949)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:169)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) Client VM, Java 1.7.0_21)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

scala> students

Thanks

Answered by WoodChopper

Let's say you want to see 100 rows; you could also use:

df.show(100)

sqlContext.sql("SELECT * FROM students where uid>10").show(100)

link to the method
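As a side note, show also takes a second truncate argument, show(numRows, truncate); passing false prints full cell values instead of cutting them off at 20 characters. If I recall correctly, this two-argument variant only arrived in Spark 1.5, so it may not be available in the 1.4.0 shell shown above. A minimal sketch against the students table from the question:

// print up to 100 rows; the second call additionally disables the
// 20-character column truncation (Spark 1.5+)
sqlContext.sql("SELECT * FROM students").show(100)
sqlContext.sql("SELECT * FROM students").show(100, false)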

Answered by Haiying Wang

sqlContext.sql("SELECT * FROM students") returns a DataFrame instance, refer to DataFrame API. To retrieve different # of rows, you can call ".take(n)":

sqlContext.sql("SELECT * FROM Students") 返回一个DataFrame实例,参考DataFrame API。要检索不同的行数,您可以调用“.take(n)”:

sqlContext.sql("SELECT * FROM students where uid>10").take(30).foreach(println)