Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/7917/

Time: 2020-08-31 11:55:07  Source: igfitidea

Remove Quotes and Commas from a String in MySQL

Tags: mysql, regex, string

Asked by Steve Willard

I'm importing some data from a CSV file, and numbers that are larger than 1000 get turned into "1,100" etc.

What's a good way to remove both the quotes and the comma from this so I can put it into an int field?

Edit:

The data is actually already in a MySQL table, so I need to be able to do this using SQL. Sorry for the mixup.

Accepted answer by Joseph Pecoraro

Here is a good case for regular expressions. You can run a find and replace on the data either before you import (easier) or later on if the SQL import accepted those characters (not nearly as easy). But in either case, you have any number of methods to do a find and replace, be it editors, scripting languages, GUI programs, etc. Remember that you're going to want to find and replace all of the bad characters.

A typical regular expression to find the comma and quotes (assuming just double quotes) is: (Blacklist)

/[,"]/

Or, if you think something might change in the future, this regular expression matches anything except a digit or decimal point. (Whitelist)

/[^0-9\.]/
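As a rough illustration of how the two patterns differ, here is a minimal Ruby sketch. The sample value and the method names `strip_blacklist` and `strip_whitelist` are made up for this example and are not part of the original answer.

```ruby
# Blacklist: remove only the characters we know are bad (commas, double quotes).
BLACKLIST = /[,"]/
# Whitelist: remove everything that is not a digit or a decimal point.
WHITELIST = /[^0-9.]/

def strip_blacklist(value)
  value.gsub(BLACKLIST, '')
end

def strip_whitelist(value)
  value.gsub(WHITELIST, '')
end

sample = '"$1,234.50"'         # hypothetical value with an unexpected "$"
puts strip_blacklist(sample)   # => $1234.50  (the "$" survives)
puts strip_whitelist(sample)   # => 1234.50   (only digits and "." remain)
```

Note that the whitelist as written would also drop a leading minus sign, so it would need adjusting if negative numbers can occur.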

What has been discussed by the people above is that we don't know all of the data in your CSV file. It sounds like you want to remove the commas and quotes from all of the numbers in the CSV file. But because we don't know what else is in the CSV file we want to make sure that we don't corrupt other data. Just blindly doing a find/replace could affect other portions of the file.
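To make the risk concrete, here is a small Ruby sketch on an invented CSV line. The data, the `NUMBER_FIELD` pattern, and the method names are assumptions for illustration only. A blind replace merges the fields, while a pattern that targets only quoted, comma-grouped numbers leaves the rest alone.

```ruby
# Hypothetical CSV line: a quoted name containing a comma, then a quoted number.
CSV_LINE = '"Smith, John","1,100"'

# Blind find/replace: strips every comma and quote, including the field separator.
def blind_clean(line)
  line.gsub(/[,"]/, '')
end

# Targeted: only touch quoted fields that look like 1,100 or 1,000,000.
NUMBER_FIELD = /"(\d{1,3}(?:,\d{3})+)"/

def targeted_clean(line)
  line.gsub(NUMBER_FIELD) { Regexp.last_match(1).delete(',') }
end

puts blind_clean(CSV_LINE)     # => Smith John1100
puts targeted_clean(CSV_LINE)  # => "Smith, John",1100
```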

Answer by Joseph Pecoraro

My guess here is that, because the data was able to import, the field is actually a varchar or some other character field; importing into a numeric field would probably have failed. Here is a test case I ran as a purely MySQL (SQL) solution.

  1. The table is just a single column (alpha) that is a varchar.

    mysql> desc t;
    
    +-------+-------------+------+-----+---------+-------+
    | Field | Type        | Null | Key | Default | Extra |
    +-------+-------------+------+-----+---------+-------+
    | alpha | varchar(15) | YES  |     | NULL    |       | 
    +-------+-------------+------+-----+---------+-------+
    
  2. Add a record

    mysql> insert into t values('"1,000,000"');
    Query OK, 1 row affected (0.00 sec)
    
    mysql> select * from t;
    
    +-------------+
    | alpha       |
    +-------------+
    | "1,000,000" | 
    +-------------+
    
  3. Update statement.

    mysql> update t set alpha = replace( replace(alpha, ',', ''), '"', '' );
    Query OK, 1 row affected (0.00 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
    
    mysql> select * from t;
    
    +---------+
    | alpha   |
    +---------+
    | 1000000 | 
    +---------+
    
So in the end the statement I used was:

UPDATE table
   SET field_name = replace( replace(field_name, ',', ''), '"', '' );
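Since the end goal is an int field, it may help to check that the cleaned values really are integers before changing the column type. The following Ruby sketch mirrors the nested replace() calls on a single value; the method name `to_int_or_nil` is hypothetical, and this is only one way such a check might look.

```ruby
# Mirror of replace(replace(x, ',', ''), '"', ''), followed by a strict integer parse.
def to_int_or_nil(raw)
  cleaned = raw.delete(',"')   # String#delete removes every ',' and '"'
  Integer(cleaned, 10)         # base 10 so "010" is not read as octal
rescue ArgumentError
  nil                          # anything that is not a clean integer is flagged
end

p to_int_or_nil('"1,000,000"')  # => 1000000
p to_int_or_nil('"1,100"')      # => 1100
p to_int_or_nil('n/a')          # => nil
```

Values that come back as nil are the ones that would need a closer look before the column is converted.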

I looked at the MySQL Documentation and it didn't look like I could do a regular-expression find and replace. Although you could, like Eldila, use a regular expression for the find and then an alternative solution for the replace.

Also be careful with s/"(\d+),(\d+)"/$1$2/ because what if the number has more than just a single comma, for instance "1,000,000"? You're going to want to do a global replace (in Perl that is s///g). But even with a global replace, the search resumes where the last match ended (unless Perl is different), and would miss every other comma-separated group. A possible solution would be to make the first (\d+) optional, like so: s/(\d+)?,(\d+)/$1$2/g, and in this case I would need a second find and replace to strip the quotes.

Here are some Ruby examples of the regular expressions acting on just the string "1,000,000". Notice there are NOT double quotes inside the string; this is just a string of the number itself.

>> "1,000,000".sub( /(\d+),(\d+)/, '\1\2' )
# => "1000,000"
>> "1,000,000".gsub( /(\d+),(\d+)/, '\1\2' )
# => "1000,000"
>> "1,000,000".gsub( /(\d+)?,(\d+)/, '\1\2' )
# => "1000000"
>> "1,000,000".gsub( /[,"]/, '' )
# => "1000000"
>> "1,000,000".gsub( /[^0-9]/, '' )
# => "1000000"

Answer by Eldila

The solution to the changed question is basically the same.

You will have to run a select query with a regex WHERE clause.

Something like:

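-- [0-9] is used instead of \d: a MySQL string literal consumes the
-- backslash, and older MySQL versions do not support \d in REGEXP.
Select *
  FROM SOMETABLE
  WHERE SOMEFIELD REGEXP '"[0-9]+,[0-9]+"'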

For each of these rows, you want to do the following regex substitution, s/"(\d+),(\d+)"/$1$2/, and then update the field with the new value.

Please take Joseph Pecoraro seriously and have a backup before doing mass changes to any files or databases. Whenever you use a regex, you can seriously mess up data if there are cases that you have missed.
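The select-then-substitute flow, and the kind of missed case this warning is about, can be sketched in Ruby on an in-memory list standing in for the table rows. The `ROWS` data and the method names are invented for this example. Note that a value with two commas is never selected by this pattern, so it would silently stay unconverted.

```ruby
# Stand-in for the values of SOMETABLE.SOMEFIELD (hypothetical data).
ROWS = ['"1,100"', '"1,000,000"', '42'].freeze

PATTERN = /"(\d+),(\d+)"/

# "SELECT": keep only the rows the pattern matches.
def select_rows(rows)
  rows.select { |value| value.match?(PATTERN) }
end

# "UPDATE": apply s/"(\d+),(\d+)"/$1$2/ to a single value.
def substitute(value)
  value.sub(PATTERN, '\1\2')
end

selected = select_rows(ROWS)
p selected                            # => ["\"1,100\""]
p selected.map { |v| substitute(v) }  # => ["1100"]
```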

Answer by Eldila

My command does remove all ',' and '"'.

In order to convert the string "1,000" more strictly, you will need the following command.

perl -lne 's/"(\d+),(\d+)"/$1$2/; print' file.txt > newfile.txt

Answer by Eldila

Actually nlucaroni, your case isn't quite right. Your example doesn't include double-quotes, so

id,age,name,...
1,23,phil,

won't match my regex. It requires the format "XXX,XXX". I can't think of an example of when it will match incorrectly.

None of the following examples will include the delimiter in the regex match:

"111,111",234
234,"111,111"
"111,111","111,111"

Please let me know if you can think of a counter-example.

Cheers!
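For what it is worth, the examples in this answer can be checked with a short Ruby sketch (the method name `strip_strict` is made up here). It also includes one input that the strict pattern leaves unchanged rather than converting, namely a number with more than one comma, which is the case Joseph Pecoraro points out.

```ruby
# The strict pattern: a quoted value with exactly one comma between digit runs.
STRICT = /"(\d+),(\d+)"/

def strip_strict(line)
  line.gsub(STRICT, '\1\2')
end

[
  '1,23,phil,',            # no quotes, so nothing matches
  '"111,111",234',
  '234,"111,111"',
  '"111,111","111,111"',
  '"1,000,000",5'          # two commas inside the quotes: not matched
].each { |line| puts "#{line}  ->  #{strip_strict(line)}" }
```

In the first and last lines the output equals the input; in the other three the field separators stay in place and only the quoted numbers change.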

Answer by Eldila

You could use this perl command.

perl -lne 's/[,"]//g; print' file.txt > newfile.txt

You may need to play around with it a bit, but it should do the trick.

Answer by Eldila

Here's the PHP way:

$stripped = str_replace(array(',', '"'), '', $value);

Link to W3Schools page

Answer by BlaM

Daniel's and Eldila's answers have one problem: they remove all quotes and commas in the whole file.

What I usually do when I have to do something like this is to first replace all separating quotes and (usually) semicolons by tabs.

  • Search: ";"
  • Replace: \t

Since I know which column my affected values will be in, I then do another search and replace:

  • Search: ^([^\t]+)\t([^\t]+)\t([0-9]+),([0-9]+)\t
  • Replace: \1\t\2\t\3\4\t

... given the value with the comma is in the third column.

You need to start with a "^" to make sure that the match starts at the beginning of a line. Then you repeat ([^\t]+)\t as often as there are columns that you just want to leave as they are.

([0-9]+),([0-9]+) searches for values where there is a number, then a comma and then another number.

In the replace string we use \1 and \2 to just keep the values from the edited line, separating them with \t (tab). Then we put \3\4 (no tab between) to put the two components of the number without the comma right after each other. All values after that will be left alone.
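The two steps can be sketched in Ruby roughly as follows. The input line and the method names are invented for illustration, stripping the outermost quotes is an extra assumption (the answer only mentions the ";" separators), and the leave-alone columns are assumed to be runs of non-tab characters, written as ([^\t]+).

```ruby
# Hypothetical input: semicolon-separated, every field quoted.
SEMI_LINE = '"Smith, John";"Berlin";"1,100";"x,y"'

# Step 1: replace each ";" separator with a tab, then drop the outermost quotes
# (dropping the outer quotes is an assumption made for this sketch).
def to_tabs(line)
  line.gsub('";"', "\t").sub(/\A"/, '').sub(/"\z/, '')
end

# Step 2: keep columns one and two as they are, remove the comma in column three.
THIRD_COLUMN = /^([^\t]+)\t([^\t]+)\t([0-9]+),([0-9]+)\t/

def fix_third_column(tabbed)
  tabbed.sub(THIRD_COLUMN) do
    m = Regexp.last_match
    "#{m[1]}\t#{m[2]}\t#{m[3]}#{m[4]}\t"
  end
end

tabbed = to_tabs(SEMI_LINE)
p fix_third_column(tabbed)  # => "Smith, John\tBerlin\t1100\tx,y"
```

The commas in the first and fourth columns are left untouched because the pattern is anchored to the start of the line and only the third column allows a comma between digits.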

If you need your file to use semicolons to separate the elements, you can then go on and replace the tabs with semicolons. However, if you leave out the quotes, you'll have to make sure that the text values do not contain any semicolons themselves. That's why I prefer to use TAB as the column separator.

I usually do that in an ordinary text editor (EditPlus) that supports RegExp, but the same regexps can be used in any programming language.
