Spark SQL change format of the number (Scala)
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/45007602/
Spark SQL change format of the number
Asked by Cherry
After a show command, Spark prints the following:
+-----------------------+---------------------------+
|NameColumn |NumberColumn |
+-----------------------+---------------------------+
|name |4.3E-5 |
+-----------------------+---------------------------+
Is there a way to change the NumberColumn format to something like 0.000043?
Answered by Ramesh Maharjan
You can use the format_number function as:
import org.apache.spark.sql.functions.format_number
df.withColumn("NumberColumn", format_number($"NumberColumn", 5))
Here 5 is the number of decimal places you want to show.
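One caveat worth noting: rounding to 5 decimal places will actually cut the question's value 4.3E-5 down to 0.00004, so 6 places are needed to display 0.000043. A minimal plain-Python sketch of this decimal-place behaviour (an analogy, not Spark itself):

```python
# Plain-Python illustration of rounding to d decimal places, analogous to
# Spark's format_number(col, d). This is a sketch, not Spark code.
value = 4.3e-5

five_places = format(value, ",.5f")   # analogous to format_number(col, 5)
six_places = format(value, ",.6f")    # analogous to format_number(col, 6)

print(five_places)  # 0.00004  -- the trailing 3 is lost at 5 places
print(six_places)   # 0.000043
```

So for the value in the question, format_number($"NumberColumn", 6) is the smallest setting that preserves all digits.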
As you can see in the documentation, the format_number function returns a string column:
format_number(Column x, int d)
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places, and returns the result as a string column.
If you don't want the comma (,), you can call the regexp_replace function, which is defined as:
regexp_replace(Column e, String pattern, String replacement)
Replace all substrings of the specified string value that match regexp with rep.
and use it as:
import org.apache.spark.sql.functions.regexp_replace
df.withColumn("NumberColumn", regexp_replace(format_number($"NumberColumn", 5), ",", ""))
This removes the commas (,) that format_number inserts as grouping separators in large numbers.
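To see why the regexp_replace step matters, here is a plain-Python sketch (an analogy, not Spark itself) of how grouping commas appear for large values and how a regex substitution strips them:

```python
import re

# Plain-Python analogy for format_number followed by regexp_replace.
# format_number inserts grouping commas for large values; the regex
# substitution mirrors regexp_replace(format_number($"col", 5), ",", "").
formatted = format(1234567.891, ",.5f")
print(formatted)               # 1,234,567.89100

without_commas = re.sub(",", "", formatted)
print(without_commas)          # 1234567.89100
```

For small values like 4.3E-5 the comma never appears, but removing it unconditionally makes the pipeline safe for any magnitude.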
Answered by vdep
You can use a cast operation as below:
val df = sc.parallelize(Seq(0.000043)).toDF("num")
df.createOrReplaceTempView("data")
spark.sql("select CAST(num as DECIMAL(8,6)) from data").show()
Adjust the precision and scale accordingly.
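As a hedged sketch of what DECIMAL(8,6) means, here is an analogy using Python's decimal module (the helper name to_decimal_8_6 is hypothetical, and this is not Spark's implementation): precision 8 is the total number of digits, scale 6 is the number of digits after the point, so the largest representable value is 99.999999.

```python
from decimal import Decimal, ROUND_HALF_UP

def to_decimal_8_6(x):
    """Hypothetical helper mimicking a cast to DECIMAL(8,6):
    8 total digits, 6 of them after the decimal point."""
    d = Decimal(str(x)).quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)
    # Values needing more than 8 total digits overflow the type;
    # Spark would produce null (or an error, depending on settings).
    if d.copy_abs() > Decimal("99.999999"):
        return None
    return d

print(to_decimal_8_6(4.3e-5))   # 0.000043
print(to_decimal_8_6(123.4))    # None -- needs 9 digits, overflows DECIMAL(8,6)
```

This is why the precision and scale must be chosen with the expected range of values in mind: a scale of 6 is the minimum that preserves 0.000043.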
Answered by Jose Alberto Gonzalez
In newer versions of PySpark you can use the round() or bround() functions. These functions return a numeric column, which avoids the "," problem entirely.
It would look like:
from pyspark.sql.functions import bround

df.withColumn("NumberColumn", bround("NumberColumn", 5))
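The difference between the two: Spark's round() uses HALF_UP rounding, while bround() uses HALF_EVEN ("banker's rounding"). A sketch of the distinction using Python's decimal module as an analogy (this is not Spark code):

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

# Midpoint value where the two rounding modes disagree.
x = Decimal("2.5")

half_up = x.quantize(Decimal("1"), rounding=ROUND_HALF_UP)      # like Spark round()
half_even = x.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)  # like Spark bround()

print(half_up)    # 3
print(half_even)  # 2  -- ties go to the nearest even digit
```

For a value like 0.000043 both behave identically at 5 or 6 decimal places; the modes only differ on exact midpoints.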
Answered by Dinesh Kumar
df.createOrReplaceTempView("table")
outDF = sqlContext.sql("select CAST(num as DECIMAL(15,6)) from table")
Six decimal places (scale) in this case.
