Scala: use the length function in substring in Spark
Source: http://stackoverflow.com/questions/46353360/
Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original address and author information, and attribute it to the original authors (not me): StackOverflow
Asked by satish
I am trying to use the length function inside a substring function on a DataFrame, but it gives an error:
val substrDF = testDF.withColumn("newcol", substring($"col", 1, length($"col")-1))
Below is the error:
error: type mismatch;
found : org.apache.spark.sql.Column
required: Int
I am using Spark 2.1.
Answered by pasha701
Function "expr" can be used:
val data = List("first", "second", "third")
val df = sparkContext.parallelize(data).toDF("value")
val result = df.withColumn("cutted", expr("substring(value, 1, length(value)-1)"))
result.show(false)
Output:
+------+------+
|value |cutted|
+------+------+
|first |firs  |
|second|secon |
|third |thir  |
+------+------+
Answered by shabbir hussain
You could also use $"COLUMN".substr:
val substrDF = testDF.withColumn("newcol", $"col".substr(lit(1), length($"col")-1))
Output:
val testDF = sc.parallelize(List("first", "second", "third")).toDF("col")
val result = testDF.withColumn("newcol", $"col".substr(org.apache.spark.sql.functions.lit(1), length($"col")-1))
result.show(false)
+------+------+
|col   |newcol|
+------+------+
|first |firs  |
|second|secon |
|third |thir  |
+------+------+
Answered by elghoto
You get that error because the signature of substring is
def substring(str: Column, pos: Int, len: Int): Column
The len argument that you are passing is a Column, but it should be an Int.
You may want to implement a simple UDF to solve that problem.
// dropRight(1) removes the last character, matching substring(col, 1, length(col)-1)
val strInit = udf((str: String) => str.dropRight(1))
testDF.withColumn("newCol", strInit($"col"))
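A UDF that takes a String will fail on null values unless it checks for them. The following is only a sketch of a null-safe and empty-safe variant, run as a standalone program rather than in the shell. The names DropLastCharUdf and dropLast, the app name, and the local master setting are assumptions made for illustration and do not come from the answers here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// Hypothetical object name, used only for this sketch.
object DropLastCharUdf {
  // Drops the last character; null and empty strings are returned unchanged.
  def dropLast(s: String): String =
    if (s == null || s.isEmpty) s else s.substring(0, s.length - 1)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("drop-last-char")
      .getOrCreate()
    import spark.implicits._

    val testDF = Seq("first", "second", "third").toDF("col")

    // Wrap the plain Scala function as a Spark UDF.
    val dropLastUdf = udf(dropLast _)

    testDF.withColumn("newCol", dropLastUdf($"col")).show(false)
    spark.stop()
  }
}
```

Keeping the string logic in a plain function means it can be checked without a Spark session. The built-in functions shown in the other answers are usually preferable where they suffice, since Spark cannot optimize through a UDF.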
Answered by philantrovert
If all you want is to remove the last character of the string, you can do that without a UDF as well, by using regexp_replace:
testDF.show
+---+----+
| id|name|
+---+----+
|  1|abcd|
|  2|qazx|
+---+----+
testDF.withColumn("newcol", regexp_replace($"name", ".$" , "") ).show
+---+----+------+
| id|name|newcol|
+---+----+------+
|  1|abcd|   abc|
|  2|qazx|   qaz|
+---+----+------+

