SQL: How do I filter out non-numeric values in a Teradata text field?
Original URL: http://stackoverflow.com/questions/3559698/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Chris Drappier
I have a Teradata table with about 10 million records that stores a numeric ID field as a VARCHAR. I need to transfer the values in this field to a BIGINT column in another table, but I can't simply write cast(id_field as bigint) because I get an invalid character error. Looking through the values, I find that a non-numeric character could appear at any position in the string. Say the column is varchar(18); I could filter out invalid rows like so:
where substr(id_field,1,1) not in (/*big, ugly array of non-numeric chars*/)
and substr(id_field,2,1) not in (/*big, ugly array of non-numeric chars*/)
/* ...and so on, one condition per position up to substr(id_field,18,1) */
Then the cast would work, but this isn't feasible in the long run: it's slow, and with 18 possible character positions the query becomes unreadable. How can I filter out rows whose value in this field won't cast to a BIGINT, without checking each character individually against a list of non-numeric characters?
Example values include:
123abc464
a2.3v65
a_356087
........
000000000
BOB KNIGHT
1235468099
The values follow no specific pattern; I simply need to filter out the ones that contain ANY non-numeric data. 123456789 is okay, but 123.abc_c3865 is not.
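To pin down the target behavior, here is a small Python model (not Teradata SQL) of the filter the question asks for, applied to the example values above. It assumes "numeric" means one or more ASCII digits with no sign, decimal point, or spaces; `SAMPLES` and `is_all_digits` are illustrative names, not part of any answer.

```python
import re

# Example values taken from the question.
SAMPLES = [
    "123abc464", "a2.3v65", "a_356087", "........", "000000000",
    "BOB KNIGHT", "1235468099", "123456789", "123.abc_c3865",
]

# Assumption: "numeric" means one or more ASCII digits 0-9 and nothing else.
DIGITS_ONLY = re.compile(r"[0-9]+")


def is_all_digits(value: str) -> bool:
    """Return True if value is non-empty and every character is 0-9."""
    return DIGITS_ONLY.fullmatch(value) is not None


for v in SAMPLES:
    print(f"{v!r:>16} -> {'keep' if is_all_digits(v) else 'filter out'}")
```

Only `000000000`, `1235468099`, and `123456789` survive; every answer below can be judged against this baseline.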
Accepted answer by lins314159
The best that I've ever managed is this:
where char2hexint(upper(id_field)) = char2hexint(lower(id_field))
Since uppercase and lowercase letters have different hex values, this ensures that there are no alphabetic characters, but it will still let through underscores, colons, and so forth. If this doesn't meet your requirements, you may need to write a UDF.
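A quick way to see what this check does and doesn't catch is to model it in Python. Comparing the two strings directly stands in for comparing their `char2hexint` encodings (identical strings encode identically, different ones don't); the model assumes plain ASCII data, and `12_34` and `1.5` are made-up inputs, not values from the question.

```python
def passes_case_check(value: str) -> bool:
    """Model of char2hexint(upper(v)) = char2hexint(lower(v)).

    Letters change under upper()/lower(); digits and punctuation do not.
    """
    return value.upper() == value.lower()


for v in ["123abc464", "BOB KNIGHT", "........", "000000000", "12_34", "1.5"]:
    print(f"{v!r:>14} passes={passes_case_check(v)}")
```

Letters are rejected, but `........`, `12_34`, and `1.5` all pass, which is exactly the caveat about underscores and other punctuation.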
Answer by dnoeth
Starting with TD14, Teradata added some new functions, so now there are multiple ways, e.g.:
WHERE RTRIM(col, '0123456789') = ''
But the easiest way is TO_NUMBER, which returns NULL for bad data:
TO_NUMBER(col)
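The `RTRIM(col, '0123456789') = ''` form works because the two-argument RTRIM strips every trailing character that appears in the given set; Python's `str.rstrip(chars)` behaves the same way, so it makes a convenient model. This sketch treats SQL NULL as `None` and does not attempt to model how Teradata compares strings with trailing blanks; `rtrim_digits_is_empty` is a hypothetical name.

```python
from typing import Optional


def rtrim_digits_is_empty(value: Optional[str]) -> Optional[bool]:
    """Model of RTRIM(col, '0123456789') = '' with SQL NULL as None.

    rstrip(chars) removes every trailing character found in chars, so the
    result is empty only when the whole string consists of digits.
    """
    if value is None:
        return None  # NULL = '' is unknown, so the row would not be selected
    return value.rstrip("0123456789") == ""


print([v for v in ["123abc464", "a_356087", "1235468099", ""] if rtrim_digits_is_empty(v)])
```

Note that an empty string also satisfies the predicate, so you may want to exclude it separately.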
Answer by a_sillyguy
Could we also try dividing the values in the field by some integer? If the division works, the value must be a number; if it throws an error, it must contain some non-numeric character. I'd guess this would be a lot faster, since only arithmetic is involved.
Answer by rossinaus
I faced the same issue when trying to exclude alphabetic characters from street-address house numbers. The following works if you don't mind concatenating all the digits together. For each character position, it checks whether the uppercase of the character equals its lowercase; if so, the character is treated as a digit and kept, and if not, it becomes NULL.
/* one CASE per character position; replace the literal '12E' with your column */
select cast(case when upper(substring('12E' from 1 for 1)) = lower(substring('12E' from 1 for 1)) then substring('12E' from 1 for 1) else null end ||
case when upper(substring('12E' from 2 for 1)) = lower(substring('12E' from 2 for 1)) then substring('12E' from 2 for 1) else null end ||
case when upper(substring('12E' from 3 for 1)) = lower(substring('12E' from 3 for 1)) then substring('12E' from 3 for 1) else null end ||
case when upper(substring('12E' from 4 for 1)) = lower(substring('12E' from 4 for 1)) then substring('12E' from 4 for 1) else null end ||
case when upper(substring('12E' from 5 for 1)) = lower(substring('12E' from 5 for 1)) then substring('12E' from 5 for 1) else null end ||
case when upper(substring('12E' from 6 for 1)) = lower(substring('12E' from 6 for 1)) then substring('12E' from 6 for 1) else null end
as integer)
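One consequence worth seeing concretely: in standard SQL, which Teradata follows here, `||` returns NULL when either operand is NULL, so a single letter anywhere nulls the whole concatenation rather than just dropping that character. The Python model below makes that explicit, with `None` standing in for NULL and six positions matching the six CASE expressions; the helper names are hypothetical.

```python
from typing import Optional


def concat(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """SQL-style ||: the result is NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a + b


def keep_if_caseless(ch: str) -> Optional[str]:
    """CASE WHEN upper(ch) = lower(ch) THEN ch ELSE NULL END."""
    return ch if ch.upper() == ch.lower() else None


def house_number(value: str, positions: int = 6) -> Optional[int]:
    """Model of the per-position CASE chain followed by CAST(... AS INTEGER)."""
    result: Optional[str] = ""
    for pos in range(positions):
        # Like SUBSTRING(value FROM pos+1 FOR 1): empty string past the end.
        result = concat(result, keep_if_caseless(value[pos:pos + 1]))
    return None if result is None else int(result)


print(house_number("12E"), house_number("123"))
```

So `'12E'` yields NULL rather than 12, while an all-digit value casts normally.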
Answer by Kaveh
Try using this code snippet:
WHERE id_Field NOT LIKE '%[^0-9]%'
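In SQL Server's LIKE, `[^0-9]` is a character class, so `NOT LIKE '%[^0-9]%'` means "contains no character outside 0-9". As far as I know, Teradata's LIKE documents only the `%` and `_` wildcards, so in Teradata the brackets would likely be matched literally. The intended semantics can be modeled in Python with an equivalent regular-expression search; `not_like_non_digit` is a hypothetical name.

```python
import re

# Same meaning as SQL Server's '%[^0-9]%': one character outside 0-9,
# anywhere in the string.
NON_DIGIT = re.compile(r"[^0-9]")


def not_like_non_digit(value: str) -> bool:
    """Model of value NOT LIKE '%[^0-9]%' under SQL Server semantics."""
    return NON_DIGIT.search(value) is None


print([v for v in ["1235468099", "BOB KNIGHT", "........"] if not_like_non_digit(v)])
```

Like the RTRIM approach, this also accepts an empty string, since it contains no offending character.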
Answer by user3867061
I found lins314159's answer very helpful for a similar issue. It may be an old thread, but for what it's worth, I used:
char2hexint(upper(id_field)) = char2hexint(lower(id_field)) AND substr(id_field,1,1) IN ('1' to '9')
to successfully cast the remaining VARCHAR results to INT
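To see how the extra first-character test changes the accepted answer's behavior, here is a Python model of the combined condition. It reads `IN ('1' to '9')` as "the first character is 1 through 9", which appears to be the intent; `passes_combined_check` is a hypothetical name and `1.5` is a made-up input.

```python
def passes_combined_check(value: str) -> bool:
    """Model of: char2hexint(upper(v)) = char2hexint(lower(v))
    AND substr(v,1,1) IN ('1' to '9')."""
    no_letters = value.upper() == value.lower()
    # value[:1] is '' for an empty string, which is not in the set.
    starts_with_nonzero_digit = value[:1] in set("123456789")
    return no_letters and starts_with_nonzero_digit


for v in ["1235468099", "000000000", "........", "1.5"]:
    print(f"{v!r:>12} passes={passes_combined_check(v)}")
```

Compared with the case check alone, this rejects `........`, but it also rejects `000000000` and still lets punctuation after the first character (as in `1.5`) through.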