Scala: Is there a way to take the first 1000 rows of a Spark Dataframe?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/34206508/



Tags: scala, apache-spark

Asked by Michael Discenza

I am using the randomSplit function to get a small portion of a dataframe to use for dev purposes, and I end up just taking the first df that is returned by this function.


val df_subset = data.randomSplit(Array(0.00000001, 0.01), seed = 12345)(0) // keep only the first (tiny) split

If I use df.take(1000) then I end up with an array of Rows, not a dataframe, so that won't work for me.

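For illustration, a minimal sketch of the type you get back (assuming the data dataframe from the snippet above):

import org.apache.spark.sql.Row

val rows: Array[Row] = data.take(1000) // take returns Array[Row], not a new DataFrame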

Is there a better, simpler way to take, say, the first 1000 rows of the df and store it as another df?


Answered by Markon

The method you are looking for is .limit.


Returns a new Dataset by taking the first n rows. The difference between this function and head is that head returns an array while limit returns a new Dataset.

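For example, a minimal sketch (the SparkSession setup and the spark.range sample data are hypothetical; data stands in for the question's dataframe):

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("limit-example").getOrCreate()

// hypothetical sample data standing in for the question's dataframe
val data: DataFrame = spark.range(1000000).toDF("id")

// limit returns a new DataFrame, unlike take/head, which return Array[Row]
val df_subset: DataFrame = data.limit(1000)

println(df_subset.count()) // 1000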