SQL: How do I convert (or cast) a String value to an Integer value?
Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/45898806/
How do I convert (or cast) a String value to an Integer value?
Asked by chaotic3quilibrium
Using Spark 2.1 (on Databricks), I have a table which has a column of type String as a result of an import from a .CSV file. In a SELECT query against that table, I am attempting to convert that column's value into an Integer before using the column value in a mathematical operation. I have been unable to find the right Spark SQL "function" to do this.
Below is an example of the SQL. "TO_NUMBER" isn't working on either of the two Strings, Sum_GN_POP or Count1:
SELECT name AS geohashPrefix3, TO_NUMBER(Sum_GN_POP) AS totalPopulation, TO_NUMBER(Count1) AS landMass
FROM wayne_geohash3
WHERE (LENGTH(name) = 3)
It would also be helpful if I could find the documentation for this. I will want to do other kinds of conversions (or casts) with other types, too. Any guidance on either or both of these is greatly appreciated.
Accepted answer by chaotic3quilibrium
Summary:
Apache Spark's SQL has partial compatibility with Apache Hive. So, most SQL that can be written in Hive can be written in Spark SQL.
Detail:
To convert a STRING to a specific numeric type like INT, a cast may be used. The cast consists of wrapping the target in parentheses and preceding the parentheses with the type to which it is to be converted. For example, the cast might look like this:
INT(someStringValue)
So, to make the SQL in the original posted question work, it needs to be changed to look like this (replacing the original function named "TO_NUMBER" with "INT"):
SELECT name AS geohashPrefix3, INT(Sum_GN_POP) AS totalPopulation, INT(Count1) AS landMass
FROM wayne_geohash3
WHERE (LENGTH(name) = 3)
Answer by Haroun Mohammedi
You can read the column as Integer directly from the csv file by using the inferSchema option, like this:
val df = spark.read.option("inferSchema", true).csv("file-location")
That being said, the inferSchema option does make mistakes sometimes and sets the type to String. If so, you can use the cast operator on Column.
DataFrame/Dataset implementation:
val df2 = df.withColumn("landMass", $"Count1" cast "int").withColumn("totalPopulation", $"Sum_GN_POP" cast "int")
SQL implementation:
SELECT name AS geohashPrefix3, CAST(Sum_GN_POP as INT) AS totalPopulation, CAST(Count1 AS INT) AS landMass
FROM wayne_geohash3
WHERE (LENGTH(name) = 3)
Answer by Raphael Roth
I would do it using a UDF, because Spark's cast will not catch integer overflow:
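import org.apache.spark.sql.functions.udf
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell or a notebook

// Yields None (null in the DataFrame) when the string is not a valid 32-bit Int, including on overflow
val parseInt = udf((s: String) => scala.util.Try(s.toInt).toOption)

Seq("100", "10000000000", "1x0")
  .toDF("i")
  .select(
    $"i" cast "int" as "casted_result",
    parseInt($"i") as "udf_result"
  ).show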
+-------------+----------+
|casted_result|udf_result|
+-------------+----------+
|          100|       100|
|   1410065408|      null|
|         null|      null|
+-------------+----------+
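The 1410065408 in the casted_result column is what remains when 10000000000 is truncated to 32 bits. Here is a minimal plain-Scala sketch (no Spark needed) of that wrap-around, next to the same Try-based parsing idea the UDF relies on. The object and method names are illustrative only and are not part of any library:

```scala
import scala.util.Try

object IntOverflowDemo {
  // Same idea as the UDF body: None when the string is not a valid 32-bit Int
  def parseIntOpt(s: String): Option[Int] = Try(s.toInt).toOption

  // Keeps only the low 32 bits of a Long, so large values silently wrap
  def truncateToInt(n: Long): Int = n.toInt

  def main(args: Array[String]): Unit = {
    println(parseIntOpt("100"))          // Some(100)
    println(parseIntOpt("10000000000"))  // None
    println(parseIntOpt("1x0"))          // None
    println(truncateToInt(10000000000L)) // 1410065408
  }
}
```

Returning an Option makes the overflow case explicit instead of producing a plausible-looking but wrong number.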
Answer by Reza
Haroun's answer about casting in SQL works for me. But notice that if the number in the string is too large to fit in an integer, the result will be null. For numbers larger than an integer can hold (long or bigint), the cast should look like this:
CAST(Sum_GN_POP as BIGINT)
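To decide whether a column needs BIGINT rather than INT, it can help to test sample values against both ranges first. A rough plain-Scala sketch follows; `narrowestType` is a hypothetical helper written for this illustration, not a Spark function:

```scala
import scala.util.Try

object CastTargetDemo {
  // Suggests the narrowest SQL integer type that can hold the parsed value,
  // or None when the string is not an integer at all
  def narrowestType(s: String): Option[String] =
    if (Try(s.trim.toInt).isSuccess) Some("INT")
    else if (Try(s.trim.toLong).isSuccess) Some("BIGINT")
    else None

  def main(args: Array[String]): Unit = {
    println(Int.MaxValue)                 // 2147483647
    println(Long.MaxValue)                // 9223372036854775807
    println(narrowestType("100"))         // Some(INT)
    println(narrowestType("10000000000")) // Some(BIGINT)
    println(narrowestType("1x0"))         // None
  }
}
```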