Scala: create a substring column in a Spark DataFrame

Question

Asked by J Smith

I want to load a JSON file and map it so that one column is a substring of another. For example, take the left table and produce the right table:

 ------------              ------------------------
|     a      |             |      a     |    b    |
|------------|       ->    |------------|---------|
|hello, world|             |hello, world|  hello  |
 ------------              ------------------------

I can do this with Spark SQL syntax, but how can it be done using the built-in functions?

Answer 1

Answered by pasha701

A statement like this can be used:

import org.apache.spark.sql.functions._

dataFrame.select(col("a"), substring_index(col("a"), ",", 1).as("b"))
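To make the count argument concrete, here is a plain-Scala sketch that models how I understand Spark's substring_index to behave. It is not Spark's implementation, and the object and method names are hypothetical. A positive count keeps the text to the left of the count-th delimiter counted from the left, a negative count keeps the text to the right of the count-th delimiter counted from the right, and if there are fewer delimiters than requested the whole string comes back. Because it needs no SparkSession, the results can be checked directly.

```scala
object SubstringIndexModel {
  // Hypothetical helper: models Spark's substring_index(str, delim, count) on plain Strings.
  def substringIndex(str: String, delim: String, count: Int): String = {
    // An empty delimiter or a zero count yields an empty string
    if (delim.isEmpty || count == 0) ""
    else if (count > 0) {
      // Walk forward to the count-th occurrence of delim
      var idx = -1
      var remaining = count
      while (remaining > 0) {
        idx = str.indexOf(delim, idx + 1)
        if (idx < 0) return str // not enough delimiters: return the whole string
        remaining -= 1
      }
      str.substring(0, idx) // everything left of that occurrence
    } else {
      // Walk backward to the |count|-th occurrence of delim from the right
      var idx = str.length
      var remaining = -count
      while (remaining > 0) {
        idx = str.lastIndexOf(delim, idx - 1)
        if (idx < 0) return str // not enough delimiters: return the whole string
        remaining -= 1
      }
      str.substring(idx + delim.length) // everything right of that occurrence
    }
  }
}
```

For example, substringIndex("hello, world", ",", 1) gives "hello", while substringIndex("hello, world", ",", -1) gives " world" with a leading space, because only the comma is the delimiter and the space after it stays in the result.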

Answer 2

Answered by Balázs Fehér

Suppose you have the following dataframe:

import spark.implicits._
import org.apache.spark.sql.functions._

var df = sc.parallelize(Seq(("foobar", "foo"))).toDF("a", "b")

+------+---+
|     a|  b|
+------+---+
|foobar|foo|
+------+---+

You can derive a new column from a substring of the first column as follows:

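// substring(column, pos, len): pos is 1-based and len is a length, so this takes up to 6 characters starting at the 4th
df = df.select(col("*"), substring(col("a"), 4, 6).as("c"))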

+------+---+---+
|     a|  b|  c|
+------+---+---+
|foobar|foo|bar|
+------+---+---+
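The arguments of substring are easy to misread as start and end positions. As I understand Spark's SQL semantics, pos is 1-based (a negative pos counts back from the end, and 0 behaves like 1) and len is a length that is simply cut off at the end of the string, which is why a length of 6 still yields "bar" here. The sketch below is a hypothetical plain-Scala model for checking that reading without a cluster, not Spark's code; it counts UTF-16 chars, so it only agrees with Spark for text without surrogate pairs.

```scala
object SubstringModel {
  // Hypothetical helper modelling Spark SQL substring(str, pos, len):
  //  - pos > 0 is a 1-based start; pos < 0 counts back from the end; pos 0 behaves like 1
  //  - len is a length, truncated at the end of the string
  // Counts UTF-16 chars, so it matches Spark only for text without surrogate pairs.
  def substringSql(str: String, pos: Int, len: Int): String = {
    val n = str.length
    // 0-based start before clamping (negative when a large negative pos overshoots)
    val rawStart = if (pos > 0) pos - 1 else if (pos < 0) n + pos else 0
    // Exclusive end computed as Long to avoid Int overflow, then capped at the string length
    val end = math.min(rawStart.toLong + len, n.toLong)
    val start = math.max(rawStart, 0)
    if (start.toLong >= end) "" else str.substring(start, end.toInt)
  }
}
```

With this model, substringSql("foobar", 4, 3) and substringSql("foobar", 4, 6) both return "bar"; the larger length only makes a difference when the strings can be longer than six characters.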

Answer 3

Answered by soote

You could use the withColumn function together with a user-defined function (UDF):

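import org.apache.spark.sql.functions.{udf, col}

// Example logic (an assumption, replace with your own): keep the text before the first comma; nulls pass through
def substringFn(str: String): String =
  if (str == null) null
  else { val i = str.indexOf(','); if (i < 0) str else str.substring(0, i) }
val substringUdf = udf(substringFn _)
dataframe.withColumn("b", substringUdf(col("a")))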

Answer 4

Answered by Ignacio Alorre

Just to enrich the existing answers: in case you are interested in the right-hand part of the string column instead, that is:

 ------------              ------------------------
|     a      |             |      a     |    b    |
|------------|       ->    |------------|---------|
|hello, world|             |hello, world|  world  |
 ------------              ------------------------

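Use a negative count: substring_index then returns everything to the right of the count-th delimiter, counting from the right. That result keeps the space after the comma (" world"), so wrap it in trim to get exactly "world":

dataFrame.select(col("a"), trim(substring_index(col("a"), ",", -1)).as("b"))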