SQL: How to filter out non-numeric values in a Teradata text field?

Question

Question by Chris Drappier

I have a Teradata table with about 10 million records that stores a numeric ID field as a VARCHAR. I need to transfer the values in this field to a BIGINT column in another table, but I can't simply write cast(id_field as bigint) because I get an invalid character error. Looking through the values, I find that a non-numeric character could appear at any position in the string. So, assuming the column is varchar(18), I could filter out invalid rows like so:

     where substr(id_field,1,1) not in (/*big,ugly array of non-numeric chars*/)
     and substr(id_field,2,1) not in (/*big,ugly array of non-numeric chars*/)

and so on, for every position.

Then the cast would work, but this is not feasible in the long run. It's slow, and with 18 possible positions it makes the query unreadable. How can I filter out rows whose value in this field will not cast to a BIGINT, without checking each character individually against an array of non-numeric characters?

Example values:

   123abc464
   a2.3v65
   a_356087
   ........
   000000000
   BOB KNIGHT
   1235468099

The values follow no specific pattern; I simply need to filter out the ones that contain ANY non-numeric data. 123456789 is okay, but 123.abc_c3865 is not.
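
To pin down the requirement, here is a small Python reference predicate (a hypothetical helper, `is_bigint_digits`; this is not Teradata code) that accepts a value only when every character is an ASCII digit and the number fits in a signed 64-bit BIGINT. It is only a sketch for checking the answers' filters against the sample values; rejecting the empty string is an assumption.

```python
# Hypothetical reference predicate modelling the question's requirement:
# keep a value only if it is made entirely of ASCII digits and fits in BIGINT.
BIGINT_MAX = 2**63 - 1  # upper bound of a signed 64-bit integer

DIGITS = set("0123456789")


def is_bigint_digits(value: str) -> bool:
    """Return True if `value` is non-empty, all ASCII digits, and <= BIGINT_MAX."""
    if not value or not set(value) <= DIGITS:
        return False
    return int(value) <= BIGINT_MAX


samples = ["123abc464", "a2.3v65", "a_356087", "........",
           "000000000", "BOB KNIGHT", "1235468099"]
kept = [s for s in samples if is_bigint_digits(s)]
print(kept)  # ['000000000', '1235468099']
```

The explicit digit set is used instead of `str.isdigit()` because `isdigit()` also accepts characters such as '²' that `int()` then refuses. Any 18-digit string fits, since 10**18 - 1 is below BIGINT_MAX.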

Answer 1

Accepted answer by lins314159

The best that I've ever managed is this:

where char2hexint(upper(id_field)) = char2hexint(lower(id_field))

Since upper-case characters have different hex values from lower-case ones, this ensures that you have no alphabetic characters, but it will still let through underscores, colons and so forth. If this doesn't meet your requirements, you may need to write a UDF.
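
A Python model of this test (an illustration, not Teradata semantics): a string passes when upper-casing and lower-casing it give the same result, i.e. when it contains no cased letters. Comparing the char2hexint values presumably forces a byte-wise, case-sensitive comparison; Teradata character comparisons can be case-insensitive (NOT CASESPECIFIC), in which case a plain upper(x) = lower(x) could be true for every row. Python's `==` on strings is always case-sensitive, so the model compares the strings directly. The helper name `no_letters` is hypothetical.

```python
def no_letters(value: str) -> bool:
    """Model of char2hexint(upper(x)) = char2hexint(lower(x)): True when the
    value contains no characters whose upper- and lower-case forms differ."""
    return value.upper() == value.lower()


samples = ["123abc464", "a2.3v65", "a_356087", "........",
           "000000000", "BOB KNIGHT", "1235468099", "12_34", "1:2"]
passing = [s for s in samples if no_letters(s)]
print(passing)  # ['........', '000000000', '1235468099', '12_34', '1:2']
```

The passing list shows the limitation the answer mentions: '........', '12_34' and '1:2' contain no letters, so they pass the filter but would still break the cast.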

Answer 2

Answer by dnoeth

Starting with TD14, Teradata added some functions, so there are now multiple ways, e.g.:

WHERE RTRIM(col, '0123456789') = ''
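
Python's `str.rstrip(chars)` removes any trailing characters found in `chars`, which makes it a reasonable model of RTRIM(col, '0123456789'): the result is empty only when every character is a digit. This sketch (hypothetical helper `only_digits_rtrim`) does not model NULLs or any Teradata-specific comparison rules.

```python
def only_digits_rtrim(value: str) -> bool:
    """Model of RTRIM(col, '0123456789') = '': trim trailing digits and
    check whether anything is left over."""
    return value.rstrip("0123456789") == ""


samples = ["123abc464", "a_356087", "000000000", "1235468099", "12 34"]
passing = [s for s in samples if only_digits_rtrim(s)]
print(passing)  # ['000000000', '1235468099']
```

Trimming stops at the first non-digit from the right, so '123abc464' leaves '123abc' behind and is rejected. An empty string also passes, since there is nothing left to trim, so empty values may need a separate check.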

But the easiest way is TO_NUMBER, which returns NULL for bad data:

TO_NUMBER(col)
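
TO_NUMBER's approach is "try the conversion and return NULL on failure". The Python sketch below shows the same pattern with a hypothetical helper, `try_to_bigint`, where `None` plays the role of NULL. A try-and-convert filter accepts whatever the converter accepts: Python's `int()` also takes signs, surrounding whitespace and underscores between digits, so the result is not the same as "digits only". TO_NUMBER converts to NUMBER and likely has its own accepted syntax (for example, a decimal point), which this sketch does not reproduce.

```python
from typing import Optional

BIGINT_MIN, BIGINT_MAX = -(2**63), 2**63 - 1


def try_to_bigint(value: str) -> Optional[int]:
    """Return the value as an int, or None (think SQL NULL) if it cannot be
    converted or falls outside the signed 64-bit BIGINT range."""
    try:
        number = int(value)
    except ValueError:
        return None
    return number if BIGINT_MIN <= number <= BIGINT_MAX else None


for s in ["1235468099", "a_356087", "........", " 42 ", "-7", "3_56087"]:
    print(repr(s), "->", try_to_bigint(s))
```

The last three inputs convert even though they are not pure digit strings, so if "digits only" is a hard rule, the conversion should be combined with an explicit character check.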

Answer 3

Answer by a_sillyguy

Could we also try dividing the values in the field by some integer? If the division succeeds, the value must be a number; if it throws an error, it must contain some non-numeric character. I'd guess this would be a lot faster, since only arithmetic is involved.
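
The catch with using an error as a filter is that SQL evaluates the expression over the whole set: one bad row makes the statement fail instead of flagging only that row, so the error itself cannot select rows. Dividing a VARCHAR by an integer also needs a numeric conversion first, so it is probably no faster than converting directly. The Python sketch below mimics this with a batch computation; `batch_divide` and `rows_that_divide` are hypothetical names.

```python
from typing import List


def batch_divide(values: List[str], divisor: int = 1) -> List[int]:
    """Like a set-based SQL expression: convert and divide every value;
    a single bad value raises and the whole batch produces nothing."""
    return [int(v) // divisor for v in values]


def rows_that_divide(values: List[str], divisor: int = 1) -> List[str]:
    """Per-row error handling, which a plain SQL predicate does not offer."""
    kept = []
    for v in values:
        try:
            int(v) // divisor
        except ValueError:
            continue  # this row failed; skip it and keep going
        kept.append(v)
    return kept


samples = ["000000000", "1235468099", "BOB KNIGHT"]
try:
    batch_divide(samples)
except ValueError as exc:
    print("whole batch failed:", exc)
print(rows_that_divide(samples))  # ['000000000', '1235468099']
```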

Answer 4

Answer by rossinaus

I've faced the same issue when trying to exclude alphabetic characters from street-address house numbers. The following works if you don't mind concatenating all the digits together. For each character, it checks whether the upper-case form equals the lower-case form; if so, the character is kept as a digit, and if not, it becomes NULL (which makes the whole concatenation NULL).

select cast(case when upper(substring('12E' from 1 for 1)) = lower(substring('12E' from 1 for 1)) then substring('12E' from 1 for 1) else null end ||
             case when upper(substring('12E' from 2 for 1)) = lower(substring('12E' from 2 for 1)) then substring('12E' from 2 for 1) else null end ||
             case when upper(substring('12E' from 3 for 1)) = lower(substring('12E' from 3 for 1)) then substring('12E' from 3 for 1) else null end ||
             case when upper(substring('12E' from 4 for 1)) = lower(substring('12E' from 4 for 1)) then substring('12E' from 4 for 1) else null end ||
             case when upper(substring('12E' from 5 for 1)) = lower(substring('12E' from 5 for 1)) then substring('12E' from 5 for 1) else null end ||
             case when upper(substring('12E' from 6 for 1)) = lower(substring('12E' from 6 for 1)) then substring('12E' from 6 for 1) else null end
             as integer)
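
A Python model of the expression above, generalized to any length by looping instead of spelling out one CASE per position (hypothetical helper `concat_cased_digits`). `None` stands in for NULL; because concatenating a NULL operand yields NULL, a single letter makes the whole result NULL. Two caveats show up in the model: characters with no case, such as '.' or '_', survive the check, so the final cast can still fail; and Python's comparison is case-sensitive, whereas in Teradata it may not be unless the comparison is CASESPECIFIC (presumably why the accepted answer compares char2hexint values).

```python
from typing import Optional


def concat_cased_digits(value: str) -> Optional[int]:
    """Model of the per-character CASE ... || ... expression.

    Each character is kept only if upper(c) == lower(c); otherwise it becomes
    NULL, and concatenating NULL makes the whole string NULL (None here).
    """
    pieces = []
    for ch in value:
        if ch.upper() != ch.lower():
            return None  # NULL propagates through ||, so the result is NULL
        pieces.append(ch)
    # The final CAST(... AS INTEGER); raises ValueError for e.g. '1.2',
    # mirroring the conversion error the database would report.
    return int("".join(pieces))


print(concat_cased_digits("12E"))  # None
print(concat_cased_digits("123"))  # 123
```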

Answer 5

Answer by Kaveh

Try using this condition:

WHERE id_Field NOT LIKE '%[^0-9]%'
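
The pattern `[^0-9]` means "any character that is not a digit", so the predicate keeps rows that contain no such character. Bracket character classes inside LIKE are, as far as I know, a SQL Server (T-SQL) extension rather than something Teradata's LIKE supports, so on Teradata the same idea would likely need one of the regular-expression functions added in TD14 (for example REGEXP_SIMILAR). Below, Python's `re` module models the intended check; the helper name `has_no_non_digit` is hypothetical.

```python
import re

NON_DIGIT = re.compile(r"[^0-9]")  # same character class as '%[^0-9]%'


def has_no_non_digit(value: str) -> bool:
    """Model of: value NOT LIKE '%[^0-9]%' (T-SQL semantics)."""
    return NON_DIGIT.search(value) is None


samples = ["123abc464", "a2.3v65", "........", "000000000", "1235468099"]
print([s for s in samples if has_no_non_digit(s)])  # ['000000000', '1235468099']
```

Note that an empty string also passes, because it contains no non-digit character.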

Answer 6

Answer by user3867061

I found lins314159's answer very helpful for a similar issue. This may be an old thread, but for what it's worth, I used:

char2hexint(upper(id_field)) = char2hexint(lower(id_field)) AND substr(id_field,1,1) IN ('1' to '9')

to successfully cast the remaining VARCHAR results to INT
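
A Python model of this combined filter (hypothetical helper `passes_combined_check`, treating IN ('1' to '9') as a range from '1' to '9'). The extra condition rejects values whose first character is not 1 through 9, which removes leading zeros and leading punctuation, but characters such as '_' or '.' after the first position still get through.

```python
def passes_combined_check(value: str) -> bool:
    """Model of: char2hexint(upper(x)) = char2hexint(lower(x))
    AND the first character lies between '1' and '9'."""
    no_letters = value.upper() == value.lower()
    first = value[:1]  # '' for an empty value, which fails the range test
    return no_letters and "1" <= first <= "9"


samples = ["1235468099", "000000000", "........", "a_356087", "1_2"]
print([s for s in samples if passes_combined_check(s)])  # ['1235468099', '1_2']
```

'1_2' is the kind of value that passes and then breaks the cast. Requiring a leading 1 through 9 also drops '000000000', which the question counts as okay.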