Scala: Is there a way to take the first 1000 rows of a Spark Dataframe?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/34206508/



Tags: scala, apache-spark

Asked by Michael Discenza

I am using the randomSplit function to get a small portion of a dataframe to use for dev purposes, and I end up just taking the first df that is returned by this function.


val df_subset = data.randomSplit(Array(0.00000001, 0.01), seed = 12345)(0) // keep only the first (tiny) split

If I use df.take(1000) then I end up with an array of Rows, not a dataframe, so that won't work for me.

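For illustration, a minimal sketch of the type you get back (assuming the data dataframe from the snippet above):

import org.apache.spark.sql.Row

val rows: Array[Row] = data.take(1000) // take returns Array[Row], not a new DataFrame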

Is there a better, simpler way to take, say, the first 1000 rows of the df and store it as another df?


Answered by Markon

The method you are looking for is .limit.


Returns a new Dataset by taking the first n rows. The difference between this function and head is that head returns an array while limit returns a new Dataset.

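For example, a minimal sketch (the SparkSession setup and the spark.range sample data are hypothetical; data stands in for the question's dataframe):

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("limit-example").getOrCreate()

// hypothetical sample data standing in for the question's dataframe
val data: DataFrame = spark.range(1000000).toDF("id")

// limit returns a new DataFrame, unlike take/head, which return Array[Row]
val df_subset: DataFrame = data.limit(1000)

println(df_subset.count()) // 1000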