SQL 连接/聚合字符串的最佳方式
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/13639262/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Optimal way to concatenate/aggregate strings
提问by matt
I'm finding a way to aggregate strings from different rows into a single row. I'm looking to do this in many different places, so having a function to facilitate this would be nice. I've tried solutions using COALESCE
and FOR XML
, but they just don't cut it for me.
我正在寻找一种将不同行中的字符串聚合为一行的方法。我希望在许多不同的地方这样做,所以有一个功能来促进这一点会很好。我已经尝试过使用COALESCE
and 的解决方案FOR XML
,但他们只是不适合我。
String aggregation would do something like this:
字符串聚合会做这样的事情:
id | Name Result: id | Names
-- - ---- -- - -----
1 | Matt 1 | Matt, Rocks
1 | Rocks 2 | Stylus
2 | Stylus
I've taken a look at CLR-defined aggregate functionsas a replacement for COALESCE
and FOR XML
, but apparently SQL Azuredoes notsupport CLR-defined stuff, which is a pain for me because I know being able to use it would solve a whole lot of problems for me.
我看过CLR 定义的聚合函数作为COALESCE
and的替代品FOR XML
,但显然SQL Azure不支持 CLR 定义的东西,这对我来说很痛苦,因为我知道能够使用它会解决很多问题我的问题。
Is there any possible workaround, or similarly optimal method (which might not be as optimal as CLR, but heyI'll take what I can get) that I can use to aggregate my stuff?
是否有任何可能的解决方法,或类似的最佳方法(可能不如 CLR 最佳,但嘿,我会尽我所能)我可以用来聚合我的东西?
采纳答案by Serge Belov
SOLUTION
解决方案
The definition of optimalcan vary, but here's how to concatenate strings from different rows using regular Transact SQL, which should work fine in Azure.
最优的定义可能会有所不同,但这里介绍了如何使用常规 Transact SQL 连接来自不同行的字符串,这在 Azure 中应该可以正常工作。
;WITH Partitioned AS
(
SELECT
ID,
Name,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
COUNT(*) OVER (PARTITION BY ID) AS NameCount
FROM dbo.SourceTable
),
Concatenated AS
(
SELECT
ID,
CAST(Name AS nvarchar) AS FullName,
Name,
NameNumber,
NameCount
FROM Partitioned
WHERE NameNumber = 1
UNION ALL
SELECT
P.ID,
CAST(C.FullName + ', ' + P.Name AS nvarchar),
P.Name,
P.NameNumber,
P.NameCount
FROM Partitioned AS P
INNER JOIN Concatenated AS C
ON P.ID = C.ID
AND P.NameNumber = C.NameNumber + 1
)
SELECT
ID,
FullName
FROM Concatenated
WHERE NameNumber = NameCount
EXPLANATION
解释
The approach boils down to three steps:
该方法归结为三个步骤:
Number the rows using
OVER
andPARTITION
grouping and ordering them as needed for the concatenation. The result isPartitioned
CTE. We keep counts of rows in each partition to filter the results later.Using recursive CTE (
Concatenated
) iterate through the row numbers (NameNumber
column) addingName
values toFullName
column.Filter out all results but the ones with the highest
NameNumber
.
根据需要对行进行编号,
OVER
并PARTITION
根据需要对它们进行分组和排序。结果是Partitioned
CTE。我们保留每个分区中的行数,以便稍后过滤结果。使用递归 CTE (
Concatenated
) 遍历行号 (NameNumber
列) 向列添加Name
值FullName
。过滤掉所有结果,但最高的那些
NameNumber
。
Please keep in mind that in order to make this query predictable one has to define both grouping (for example, in your scenario rows with the same ID
are concatenated) and sorting (I assumed that you simply sort the string alphabetically before concatenation).
请记住,为了使此查询可预测,必须同时定义分组(例如,在您的场景中将相同的行ID
串联)和排序(我假设您只是在串联之前按字母顺序对字符串进行排序)。
I've quickly tested the solution on SQL Server 2012 with the following data:
我已经使用以下数据在 SQL Server 2012 上快速测试了该解决方案:
INSERT dbo.SourceTable (ID, Name)
VALUES
(1, 'Matt'),
(1, 'Rocks'),
(2, 'Stylus'),
(3, 'Foo'),
(3, 'Bar'),
(3, 'Baz')
The query result:
查询结果:
ID FullName
----------- ------------------------------
2 Stylus
3 Bar, Baz, Foo
1 Matt, Rocks
回答by slachterman
Are methods using FOR XML PATH like below really that slow? Itzik Ben-Gan writes that this method has good performance in his T-SQL Querying book (Mr. Ben-Gan is a trustworthy source, in my view).
像下面这样使用 FOR XML PATH 的方法真的那么慢吗?Itzik Ben-Gan 在他的 T-SQL Querying 一书中写道,这种方法具有良好的性能(在我看来,Ben-Gan 先生是一个值得信赖的来源)。
create table #t (id int, name varchar(20))
insert into #t
values (1, 'Matt'), (1, 'Rocks'), (2, 'Stylus')
select id
,Names = stuff((select ', ' + name as [text()]
from #t xt
where xt.id = t.id
for xml path('')), 1, 2, '')
from #t t
group by id
回答by Hrobky
For those of us who found this and are not using Azure SQL Database:
对于我们这些发现这个的人 并且未使用 Azure SQL 数据库:
STRING_AGG()
in PostgreSQL, SQL Server 2017 and Azure SQL
https://www.postgresql.org/docs/current/static/functions-aggregate.html
https://docs.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql
STRING_AGG()
在 PostgreSQL、SQL Server 2017 和 Azure SQL 中
https://www.postgresql.org/docs/current/static/functions-aggregate.html
https://docs.microsoft.com/en-us/sql/t-sql/函数/字符串聚合事务SQL
GROUP_CONCAT()
in MySQL
http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat
GROUP_CONCAT()
在 MySQL
http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat
(Thanks to @Brianjorden and @milanio for Azure update)
(感谢 @Brianjorden 和 @milanio 的 Azure 更新)
Example Code:
示例代码:
select Id
, STRING_AGG(Name, ', ') Names
from Demo
group by Id
SQL Fiddle: http://sqlfiddle.com/#!18/89251/1
SQL 小提琴:http://sqlfiddle.com/#! 18/89251/1
回答by QMaster
Although @serge answer is correct but i compared time consumption of his way against xmlpath and i found the xmlpath is so faster. I'll write the compare code and you can check it by yourself. This is @serge way:
虽然@serge 的答案是正确的,但我将他的方式与 xmlpath 的时间消耗进行了比较,我发现 xmlpath 更快。我会写比较代码,你可以自己检查。这是@serge方式:
DECLARE @startTime datetime2;
DECLARE @endTime datetime2;
DECLARE @counter INT;
SET @counter = 1;
set nocount on;
declare @YourTable table (ID int, Name nvarchar(50))
WHILE @counter < 1000
BEGIN
insert into @YourTable VALUES (ROUND(@counter/10,0), CONVERT(NVARCHAR(50), @counter) + 'CC')
SET @counter = @counter + 1;
END
SET @startTime = GETDATE()
;WITH Partitioned AS
(
SELECT
ID,
Name,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
COUNT(*) OVER (PARTITION BY ID) AS NameCount
FROM @YourTable
),
Concatenated AS
(
SELECT ID, CAST(Name AS nvarchar) AS FullName, Name, NameNumber, NameCount FROM Partitioned WHERE NameNumber = 1
UNION ALL
SELECT
P.ID, CAST(C.FullName + ', ' + P.Name AS nvarchar), P.Name, P.NameNumber, P.NameCount
FROM Partitioned AS P
INNER JOIN Concatenated AS C ON P.ID = C.ID AND P.NameNumber = C.NameNumber + 1
)
SELECT
ID,
FullName
FROM Concatenated
WHERE NameNumber = NameCount
SET @endTime = GETDATE();
SELECT DATEDIFF(millisecond,@startTime, @endTime)
--Take about 54 milliseconds
And this is xmlpath way:
这是 xmlpath 方式:
DECLARE @startTime datetime2;
DECLARE @endTime datetime2;
DECLARE @counter INT;
SET @counter = 1;
set nocount on;
declare @YourTable table (RowID int, HeaderValue int, ChildValue varchar(5))
WHILE @counter < 1000
BEGIN
insert into @YourTable VALUES (@counter, ROUND(@counter/10,0), CONVERT(NVARCHAR(50), @counter) + 'CC')
SET @counter = @counter + 1;
END
SET @startTime = GETDATE();
set nocount off
SELECT
t1.HeaderValue
,STUFF(
(SELECT
', ' + t2.ChildValue
FROM @YourTable t2
WHERE t1.HeaderValue=t2.HeaderValue
ORDER BY t2.ChildValue
FOR XML PATH(''), TYPE
).value('.','varchar(max)')
,1,2, ''
) AS ChildValues
FROM @YourTable t1
GROUP BY t1.HeaderValue
SET @endTime = GETDATE();
SELECT DATEDIFF(millisecond,@startTime, @endTime)
--Take about 4 milliseconds
回答by Brian Jorden
Update: Ms SQL Server 2017+, Azure SQL Database
更新:Ms SQL Server 2017+,Azure SQL 数据库
You can use: STRING_AGG
.
您可以使用:STRING_AGG
。
Usage is pretty simple for OP's request:
对于 OP 的请求,用法非常简单:
SELECT id, STRING_AGG(name, ', ') AS names
FROM some_table
GROUP BY id
Well my old non-answer got rightfully deleted (left in-tact below), but if anyone happens to land here in the future, there is good news. They have implimented STRING_AGG() in Azure SQL Database as well. That should provide the exact functionality originally requested in this post with native and built in support. @hrobky mentioned this previously as a SQL Server 2016 feature at the time.
好吧,我的旧未回答被理所当然地删除了(在下面保持原样),但是如果将来有人碰巧降落在这里,那么有个好消息。他们也在 Azure SQL 数据库中实现了 STRING_AGG()。这应该提供本文中最初要求的确切功能以及本机和内置支持。@hrobky 之前提到这是当时 SQL Server 2016 的一项功能。
--- Old Post: Not enough reputation here to reply to @hrobky directly, but STRING_AGG looks great, however it is only available in SQL Server 2016 vNext currently. Hopefully it will follow to Azure SQL Datababse soon as well..
--- 旧帖子:这里没有足够的声望直接回复@hrobky,但是 STRING_AGG 看起来不错,但目前仅在 SQL Server 2016 vNext 中可用。希望它也能很快跟进 Azure SQL Datababse..
回答by jvc
You can use += to concatenate strings, for example:
您可以使用 += 来连接字符串,例如:
declare @test nvarchar(max)
set @test = ''
select @test += name from names
if you select @test, it will give you all names concatenated
如果您选择@test,它将为您提供连接的所有名称
回答by Tom Halladay
I found Serge's answer to be very promising, but I also encountered performance issues with it as-written. However, when I restructured it to use temporary tables and not include double CTE tables, the performance went from 1 minute 40 seconds to sub-second for 1000 combined records. Here it is for anyone who needs to do this without FOR XML on older versions of SQL Server:
我发现 Serge 的回答非常有希望,但我也遇到了它所写的性能问题。但是,当我将其重组为使用临时表而不包括双 CTE 表时,1000 条合并记录的性能从 1 分 40 秒变为亚秒级。这里适用于需要在旧版本 SQL Server 上执行此操作而不使用 FOR XML 的任何人:
DECLARE @STRUCTURED_VALUES TABLE (
ID INT
,VALUE VARCHAR(MAX) NULL
,VALUENUMBER BIGINT
,VALUECOUNT INT
);
INSERT INTO @STRUCTURED_VALUES
SELECT ID
,VALUE
,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY VALUE) AS VALUENUMBER
,COUNT(*) OVER (PARTITION BY ID) AS VALUECOUNT
FROM RAW_VALUES_TABLE;
WITH CTE AS (
SELECT SV.ID
,SV.VALUE
,SV.VALUENUMBER
,SV.VALUECOUNT
FROM @STRUCTURED_VALUES SV
WHERE VALUENUMBER = 1
UNION ALL
SELECT SV.ID
,CTE.VALUE + ' ' + SV.VALUE AS VALUE
,SV.VALUENUMBER
,SV.VALUECOUNT
FROM @STRUCTURED_VALUES SV
JOIN CTE
ON SV.ID = CTE.ID
AND SV.VALUENUMBER = CTE.VALUENUMBER + 1
)
SELECT ID
,VALUE
FROM CTE
WHERE VALUENUMBER = VALUECOUNT
ORDER BY ID
;