
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/35340390/

Date: 2020-10-22 08:00:39  Source: igfitidea

how to select all columns that starts with a common label

scala, apache-spark, spark-dataframe

Asked by user299791

I have a dataframe in Spark 1.6 and want to select just some columns out of it. The column names are like:

colA, colB, colC, colD, colE, colF-0, colF-1, colF-2

I know I can do like this to select specific columns:

df.select("colA", "colB", "colE")

but how do I select, say, "colA", "colB" and all the colF-* columns at once? Is there a way to do this as in Pandas?

Answered by Michael Lloyd Lee mlk

First grab the column names with df.columns, then filter down to just the names you want with .filter(_.startsWith("colF")). This gives you an Array[String], but select takes either select(String, String*) or, for Columns, select(Column*). So convert the Strings into Columns with .map(df(_)), then expand the resulting Array of Columns into varargs with : _*.

df.select(df.columns.filter(_.startsWith("colF")).map(df(_)) : _*).show
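The same pipeline can be traced on plain Scala collections, with no Spark required; here `columns` is a hypothetical stand-in for `df.columns`, and the toy `select` only mimics the varargs signature:

```scala
// Stand-in for df.columns: an ordinary Array[String] with made-up names.
val columns = Array("colA", "colB", "colC", "colF-0", "colF-1", "colF-2")

// Step 1: keep only the names that share the common prefix.
val fNames = columns.filter(_.startsWith("colF"))

// Step 2 (in Spark): map each name to a Column and expand as varargs:
//   df.select(fNames.map(df(_)) : _*)
// Toy stand-in for the varargs signature, to show how `: _*` expands an Array:
def select(cols: String*): Seq[String] = cols
val selected = select(fNames: _*)
```

The `: _*` ascription is what turns the `Array` into the variable-length argument list that `select(Column*)` expects.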

This filter could be made more complex (just as in Pandas). It is, however, a rather ugly solution (IMO):

df.select(df.columns.filter(x => (x.equals("colA") || x.startsWith("colF"))).map(df(_)) : _*).show 

If the list of other columns is fixed you could also merge a fixed array of columns names with filtered array.

df.select((Array("colA", "colB") ++ df.columns.filter(_.startsWith("colF"))).map(df(_)) : _*).show
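The merge of a fixed name list with the filtered one can likewise be checked on plain collections (again `columns` is a hypothetical stand-in for `df.columns`):

```scala
val columns = Array("colA", "colB", "colC", "colF-0", "colF-1")

// Fixed names first, then everything matching the prefix:
val wanted = Array("colA", "colB") ++ columns.filter(_.startsWith("colF"))

// In Spark this would then become: df.select(wanted.map(df(_)) : _*)
```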

Answered by kfkhalili

I wrote a function that does that. Read the comments to see how it works.

  /**
    * Given a sequence of prefixes, select suitable columns from [[DataFrame]]
    * @param columnPrefixes Sequence of prefixes
    * @param dF Incoming [[DataFrame]]
    * @return [[DataFrame]] with prefixed columns selected
    */
  def selectPrefixedColumns(columnPrefixes: Seq[String], dF: DataFrame): DataFrame = {
    // Find out if a given column name matches any of the provided prefixes
    def colNameStartsWith(colName: String): Boolean =
      columnPrefixes.exists(prefix => colName.startsWith(prefix))
    // Filter the columns list by checking against the given prefix sequence
    val columns = dF.columns.filter(colNameStartsWith)
    // Select the filtered columns (note: columns.head throws if nothing matches)
    dF.select(columns.head, columns.tail: _*)
  }
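The multi-prefix predicate at the heart of that function can be exercised without Spark; a minimal sketch with made-up column names:

```scala
val prefixes = Seq("colA", "colF")

// True if colName starts with any of the prefixes -- the same check the
// function's inner helper performs against df.columns.
def matchesAnyPrefix(colName: String): Boolean =
  prefixes.exists(prefix => colName.startsWith(prefix))

val kept = Array("colA", "colB", "colC", "colF-0").filter(matchesAnyPrefix)
```

Against a real DataFrame the call would look like `selectPrefixedColumns(Seq("colA", "colF"), df)`; since `columns.head` throws on an empty result, callers may want to guard against no column matching any prefix.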