Scala: Spark SQL change the format of the number

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/45007602/

Date: 2020-10-22 09:20:57  Source: igfitidea

Spark SQL change format of the number

scala, apache-spark, apache-spark-sql

Asked by Cherry

After the show command, Spark prints the following:


+-----------------------+---------------------------+
|NameColumn             |NumberColumn               |
+-----------------------+---------------------------+
|name                   |4.3E-5                     |
+-----------------------+---------------------------+

Is there a way to change the NumberColumn format to something like 0.000043?

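For reference, a DataFrame like the one above can be reproduced with a minimal sketch such as the following (assuming a spark-shell session; the column names and the value come from the question, the variable name df is illustrative):

import spark.implicits._
// single-row DataFrame whose Double value is printed in scientific notation by show
val df = Seq(("name", 4.3e-5)).toDF("NameColumn", "NumberColumn")
df.show(false)  // NumberColumn is displayed as 4.3E-5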

Answered by Ramesh Maharjan

You can use the format_number function as


import org.apache.spark.sql.functions.format_number
df.withColumn("NumberColumn", format_number($"NumberColumn", 5))

Here 5 is the number of decimal places you want to show.

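To get exactly 0.000043 as asked in the question, 6 decimal places would be needed; a hedged sketch (reusing the df sketched above, not part of the original answer):

import org.apache.spark.sql.functions.format_number
// 6 decimal places -> the string "0.000043"
df.withColumn("NumberColumn", format_number($"NumberColumn", 6)).show(false)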

As you can see in the API documentation, the format_number function returns a string column:


format_number(Column x, int d)
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places, and returns the result as a string column.


If you don't require the , (comma), you can call the regexp_replace function, which is defined as


regexp_replace(Column e, String pattern, String replacement)
Replace all substrings of the specified string value that match regexp with rep.


and use it as


import org.apache.spark.sql.functions.regexp_replace
df.withColumn("NumberColumn", regexp_replace(format_number($"NumberColumn", 5), ",", ""))

This way, the comma (,) is removed for large numbers as well.

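For example, with a larger value (a sketch using an illustrative number and variable name, not from the original answer):

import org.apache.spark.sql.functions.{format_number, regexp_replace}
val big = Seq(1234567.891).toDF("NumberColumn")
// format_number alone would give "1,234,567.89100"; regexp_replace strips the commas
big.withColumn("NumberColumn",
  regexp_replace(format_number($"NumberColumn", 5), ",", "")).show(false)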

Answered by vdep

You can use the cast operation as below:


val df = sc.parallelize(Seq(0.000043)).toDF("num")    

df.createOrReplaceTempView("data")
spark.sql("select CAST (num as DECIMAL(8,6)) from data")

Adjust the precision and scale accordingly.

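The same cast can also be expressed through the DataFrame API instead of SQL (a sketch, not part of the original answer; DecimalType(8, 6) mirrors the DECIMAL(8,6) above):

import org.apache.spark.sql.types.DecimalType
df.withColumn("num", $"num".cast(DecimalType(8, 6))).show(false)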

Answered by Jose Alberto Gonzalez

In newer versions of PySpark you can use the round() or bround() functions. These functions return a numeric column and avoid the problem with the "," separator.


It would look like this:


df.withColumn("NumberColumn", bround("NumberColumn",5))

Answered by Dinesh Kumar

df.createOrReplaceTempView("table")
outDF = sqlContext.sql("select CAST (num as DECIMAL(15,6)) from table")

Six decimal places of precision in this case.
