MySQL GROUP_CONCAT escaping
Original source: http://stackoverflow.com/questions/452357/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Bill Zeller
(NOTE: This question is not about escaping queries, it's about escaping results)
I'm using GROUP_CONCAT to combine multiple rows into a comma-delimited list. For example, assume I have the two (example) tables:
CREATE TABLE IF NOT EXISTS `Comment` (
`id` int(11) unsigned NOT NULL auto_increment,
`post_id` int(11) unsigned NOT NULL,
`name` varchar(255) collate utf8_unicode_ci NOT NULL,
`comment` varchar(255) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `post_id` (`post_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=6 ;
INSERT INTO `Comment` (`id`, `post_id`, `name`, `comment`) VALUES
(1, 1, 'bill', 'some comment'),
(2, 1, 'john', 'another comment'),
(3, 2, 'bill', 'blah'),
(4, 3, 'john', 'asdf'),
(5, 4, 'x', 'asdf');
CREATE TABLE IF NOT EXISTS `Post` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(255) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=7 ;
INSERT INTO `Post` (`id`, `title`) VALUES
(1, 'first post'),
(2, 'second post'),
(3, 'third post'),
(4, 'fourth post'),
(5, 'fifth post'),
(6, 'sixth post');
And I want to list all posts along with a list of each username who commented on the post:
SELECT
Post.id as post_id, Post.title as title, GROUP_CONCAT(name)
FROM Post
LEFT JOIN Comment on Comment.post_id = Post.id
GROUP BY Post.id
gives me:
id  title        GROUP_CONCAT(name)
1   first post   bill,john
2   second post  bill
3   third post   john
4   fourth post  x
5   fifth post   NULL
6   sixth post   NULL
This works great, except that if a username contains a comma it will ruin the list of users. Does MySQL have a function that will let me escape these characters? (Please assume usernames can contain any characters, since this is only an example schema)
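To make the failure concrete, here is a minimal Python sketch of what happens when a comma-joined value is split naively. Python only stands in for whatever client language consumes the result, and the username `smith, john` is made up for illustration.

```python
# Hypothetical usernames for one post; the first one contains a comma.
names = ["smith, john", "bill"]

# GROUP_CONCAT(name) joins values with "," when no SEPARATOR is given.
joined = ",".join(names)

# Naive client-side split: the comma inside the first name cannot be
# told apart from the separator, so two names come back as three pieces.
parts = joined.split(",")
print(parts)
```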
Answered by Lemon Juice
Actually, there are ASCII control characters specifically designed for separating database fields and records:
0x1F (31): unit (fields) separator
0x1E (30): record separator
0x1D (29): group separator
You will never have them in usernames, and most probably never in any other non-binary data in your database, so they can be used safely:
GROUP_CONCAT(foo SEPARATOR 0x1D)
Then split by CHAR(0x1D) in whatever client language you want.
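As a rough illustration of the client-side half, the following Python sketch splits such a value on the group separator. The function name `split_group_concat` is hypothetical, and the sketch assumes the driver returns the column as a string (or `None` for a NULL result).

```python
SEP = "\x1d"  # ASCII group separator, matching SEPARATOR 0x1D in the query

def split_group_concat(value):
    """Split a GROUP_CONCAT(... SEPARATOR 0x1D) result into a list of strings.

    `value` is assumed to be None when the group produced NULL
    (for example a post with no comments after a LEFT JOIN).
    """
    if value is None:
        return []
    return value.split(SEP)

print(split_group_concat("smith, john\x1dbill"))
```

This only stays safe as long as the separator byte really cannot appear in the stored values.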
Answered by ???u
If there's some other character that's illegal in usernames, you can specify a different separator character using a little-known syntax:
...GROUP_CONCAT(name SEPARATOR '|')...
... You want to allow pipes? or any character?
Escape the separator character, perhaps with backslash, but before doing that escape backslashes themselves:
group_concat(replace(replace(name, '\\', '\\\\'), '|', '\\|') SEPARATOR '|')
This will:
- escape any backslashes with another backslash
- escape the separator character with a backslash
- concatenate the results with the separator character
To get the unescaped results, do the same thing in the reverse order:
- split the results by the separator character where not preceded by a backslash. Actually, it's a little tricky: you want to split where the separator isn't preceded by an odd number of backslashes. This regex will match that:
(?<!\\)(?:\\\\)*\|
- replace all escaped separator chars with literals, i.e. replace \| with |
- replace all double backslashes with single backslashes, i.e. replace \\ with \
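The escape and unescape steps above can be sketched in Python as follows. This is only an illustration under assumptions: `escape_item` mirrors what the nested REPLACE calls do on the MySQL side, and `split_escaped` replaces the regex with a single left-to-right scan that splits and unescapes in one pass. Both function names are hypothetical.

```python
def escape_item(item, sep="|"):
    """Mirror of the SQL side: escape backslashes first, then the separator."""
    return item.replace("\\", "\\\\").replace(sep, "\\" + sep)

def split_escaped(joined, sep="|"):
    """Split on unescaped separators and undo the backslash escaping."""
    items, current = [], []
    i = 0
    while i < len(joined):
        ch = joined[i]
        if ch == "\\" and i + 1 < len(joined):
            # An escaped character: keep the character after the backslash as a literal.
            current.append(joined[i + 1])
            i += 2
        elif ch == sep:
            # An unescaped separator ends the current item.
            items.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    items.append("".join(current))
    return items

# Hypothetical names, including a pipe, a backslash, and a trailing backslash.
names = ["a|b", "c\\d", "e\\", "plain"]
joined = "|".join(escape_item(n) for n in names)
print(split_escaped(joined))
```

Scanning character by character avoids the question of how many backslashes precede a separator, because every backslash consumes exactly the one character that follows it.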
Answered by Bill Zeller
I'd suggest GROUP_CONCAT(name SEPARATOR '\n'), since \n usually does not occur. This might be a little simpler, since you don't need to escape anything, but could lead to unexpected problems. The encoding/regexp decoding stuff as proposed by nick is of course nice too.
Answered by Bill Karwin
Answered by derobert
If you're going to be doing the decoding in your application, maybe just use hex:
SELECT GROUP_CONCAT(HEX(foo)) ...
or you could also put the length in them:
SELECT GROUP_CONCAT(CONCAT(LENGTH(foo), ':', foo)) ...
Not that I tested either :-D
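For the hex variant, client-side decoding might look like the Python sketch below. It assumes the default comma separator of GROUP_CONCAT and UTF-8 column data; the function name `split_hex_concat` is hypothetical.

```python
def split_hex_concat(value, encoding="utf-8"):
    """Decode a GROUP_CONCAT(HEX(col)) result into a list of strings.

    Hex digits never contain a comma, so splitting on "," is unambiguous.
    `value` is assumed to be None when the group produced NULL.
    """
    if value is None:
        return []
    return [bytes.fromhex(part).decode(encoding) for part in value.split(",")]

# "62696C6C" is HEX('bill'), "6A6F686E" is HEX('john'), "612C62" is HEX('a,b').
print(split_hex_concat("62696C6C,6A6F686E,612C62"))
```

One trade-off to keep in mind: hex doubles the length of every value, and GROUP_CONCAT truncates its result at group_concat_max_len (1024 bytes by default), so long lists may need that setting raised.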
Answered by Bill Zeller
Jason S: This is exactly the issue I'm dealing with. I'm using a PHP MVC framework and was processing the results like you describe (multiple rows per result and code to group the results together). However, I've been working on two functions for my models to implement. One returns a list of all fields needed to recreate the object, and the other is a function that, given a row with the fields from the first function, instantiates a new object. This lets me request a row from the database and easily turn it back into the object without knowing the internals of the data needed by the model. This doesn't work quite as well when multiple rows represent one object, so I was trying to use GROUP_CONCAT to get around that problem.
Answered by bobince
Right now I'm allowing any character. I realize a pipe would be unlikely to show up, but I'd like to allow it.
How about a control character, which you should be stripping out of application input anyway? I doubt you need e.g. a tab or a newline in a name field.
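Stripping control characters on input could be sketched like this in Python. It is an assumption that dropping everything in Unicode category Cc is the right policy for a given application, and the function name `strip_control_chars` is hypothetical.

```python
import unicodedata

def strip_control_chars(text):
    """Remove characters in Unicode category Cc (ASCII and C1 control characters).

    This covers tab, newline, and the 0x1D-0x1F separators, so any of them
    becomes safe to use as a GROUP_CONCAT separator afterwards.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")

print(strip_control_chars("bi\x1dll\t\n"))
```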
Answered by bonger
Just to expand on some of the answers, I implemented @derobert's second suggestion in PHP and it works well. Given MySQL such as:
GROUP_CONCAT(CONCAT(LENGTH(field), ':', field) SEPARATOR '') AS fields
I used the following function to split it:
function concat_split( $str ) {
    // Need to guard against PHP's stupid multibyte string function overloading.
    static $mb_overload_string = null;
    if ( null === $mb_overload_string ) {
        $mb_overload_string = defined( 'MB_OVERLOAD_STRING' )
            && ( ini_get( 'mbstring.func_overload' ) & MB_OVERLOAD_STRING );
    }
    if ( $mb_overload_string ) {
        $mb_internal_encoding = mb_internal_encoding();
        mb_internal_encoding( '8bit' );
    }
    $ret = array();
    // Each item is "<byte length>:<bytes>": read the length, then take that many bytes.
    for ( $offset = 0; $colon = strpos( $str, ':', $offset ); $offset = $colon + 1 + $len ) {
        // substr() takes a length, not an end position, hence $colon - $offset.
        $len = intval( substr( $str, $offset, $colon - $offset ) );
        $ret[] = substr( $str, $colon + 1, $len );
    }
    if ( $mb_overload_string ) {
        mb_internal_encoding( $mb_internal_encoding );
    }
    return $ret;
}
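For readers not using PHP, a rough Python counterpart of the function above might look like this. It assumes the driver hands back the column as raw bytes: MySQL's LENGTH() counts bytes, not characters, so the slicing has to happen before decoding. The function name `split_length_prefixed` is hypothetical.

```python
def split_length_prefixed(data, encoding="utf-8"):
    """Split b"<len>:<bytes><len>:<bytes>..." into a list of decoded strings."""
    items = []
    offset = 0
    while offset < len(data):
        colon = data.index(b":", offset)               # end of the length prefix
        length = int(data[offset:colon].decode("ascii"))
        start = colon + 1
        items.append(data[start:start + length].decode(encoding))
        offset = start + length                        # jump to the next prefix
    return items

# Build sample input the way CONCAT(LENGTH(field), ':', field) would,
# using hypothetical names that contain ':' and ',' and multibyte characters.
names = ["bill", "a:b,c", "żółw"]
data = b"".join(
    str(len(n.encode("utf-8"))).encode("ascii") + b":" + n.encode("utf-8")
    for n in names
)
print(split_length_prefixed(data))
```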
I also initially implemented @???u's suggestion, using one of @Lemon Juice's separators. It worked fine, but apart from being more complicated it was slower. The main problem is that PCRE only allows fixed-length lookbehind, so using the suggested regex to split requires capturing the delimiters; otherwise doubled backslashes at the end of strings will be lost. So given MySQL such as the following (shown as MySQL receives it; inside a PHP string literal each backslash is doubled again, so 4 PHP backslashes => 2 MySQL backslashes => 1 real backslash):
GROUP_CONCAT(REPLACE(REPLACE(field, '\\', '\\\\'),
CHAR(31), CONCAT('\\', CHAR(31))) SEPARATOR 0x1f) AS fields
the split function was:
function concat_split( $str ) {
    $ret = array();
    // 4 PHP backslashes => 2 PCRE backslashes => 1 real backslash.
    $strs = preg_split( '/(?<!\\\\)((?:\\\\\\\\)*+\x1f)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE );
    // Need to add back any captured double backslashes.
    for ( $i = 0, $cnt = count( $strs ); $i < $cnt; $i += 2 ) {
        $ret[] = isset( $strs[ $i + 1 ] ) ? ( $strs[ $i ] . substr( $strs[ $i + 1 ], 0, -1 ) ) : $strs[ $i ];
    }
    // Unescape: backslash + separator => separator, then double backslash => single backslash.
    return str_replace( array( "\\\x1f", "\\\\" ), array( "\x1f", "\\" ), $ret );
}
Answered by benlumley
What nick said really, with an enhancement: the separator can be more than one character too.
I've often used
GROUP_CONCAT(name SEPARATOR '"|"')
Chances of a username containing "|" are fairly low, I'd say.
Answered by Jason S
You're getting into that gray area where it might be better to postprocess this outside the world of SQL.
At least that's what I'd do: I'd just ORDER BY instead of GROUP BY, and loop through the results to handle the grouping as a filter done in the client language:
1. Start by initializing last_id to NULL
2. Fetch the next row of the resultset (if there aren't more rows go to step 6)
3. If the id of the row is different than last_id, start a new output row:
   a. if last_id isn't NULL then output the grouped row
   b. set the new grouped row = the input row, but store the name as a single element array
   c. set last_id to the value of the current ID
4. Otherwise (id is the same as last_id) append the row name onto the existing grouped row.
5. Go back to step 2
6. Otherwise you have finished; if the last_id isn't NULL then output the existing group row.
Then your output ends up including names organized as an array, and you can decide how you want to handle/escape/format them then.
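The loop described above can be sketched in Python as follows. This is only an illustration: it assumes the rows arrive as `(post_id, title, name)` tuples already sorted by `post_id` (the ORDER BY in the query), that `name` is `None` for posts without comments because of the LEFT JOIN, and the function name `group_rows` is hypothetical.

```python
def group_rows(rows):
    """Collapse consecutive rows with the same post_id into one grouped row."""
    grouped = []
    last_id = None
    current = None
    for post_id, title, name in rows:
        if current is None or post_id != last_id:
            # The id changed: emit the finished group and start a new one.
            if current is not None:
                grouped.append(current)
            current = {"post_id": post_id, "title": title, "names": []}
            last_id = post_id
        if name is not None:
            # Names stay separate list elements, so no escaping is needed.
            current["names"].append(name)
    if current is not None:
        grouped.append(current)  # emit the final group
    return grouped

# Hypothetical result of the ungrouped query, ordered by post id.
rows = [
    (1, "first post", "bill"),
    (1, "first post", "smith, john"),
    (2, "second post", "bill"),
    (5, "fifth post", None),
]
for group in group_rows(rows):
    print(group)
```

Because each name is kept as its own list element, a comma (or any other character) inside a username never has to be escaped.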
What language/system are you using? PHP? Perl? Java?