SQL 连接/聚合字符串的最佳方式

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/13639262/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 12:22:40  来源:igfitidea点击:

Optimal way to concatenate/aggregate strings

sqlsql-serverazureaggregate-functionsazure-sql-database

提问by matt

I'm finding a way to aggregate strings from different rows into a single row. I'm looking to do this in many different places, so having a function to facilitate this would be nice. I've tried solutions using COALESCEand FOR XML, but they just don't cut it for me.

我正在寻找一种将不同行中的字符串聚合为一行的方法。我希望在许多不同的地方这样做,所以有一个功能来促进这一点会很好。我已经尝试过使用COALESCEand 的解决方案FOR XML,但他们只是不适合我。

String aggregation would do something like this:

字符串聚合会做这样的事情:

id | Name                    Result: id | Names
-- - ----                            -- - -----
1  | Matt                            1  | Matt, Rocks
1  | Rocks                           2  | Stylus
2  | Stylus

I've taken a look at CLR-defined aggregate functionsas a replacement for COALESCEand FOR XML, but apparently SQL Azuredoes notsupport CLR-defined stuff, which is a pain for me because I know being able to use it would solve a whole lot of problems for me.

我看过CLR 定义的聚合函数作为COALESCEand的替代品FOR XML,但显然SQL Azure支持 CLR 定义的东西,这对我来说很痛苦,因为我知道能够使用它会解决很多问题我的问题。

Is there any possible workaround, or similarly optimal method (which might not be as optimal as CLR, but heyI'll take what I can get) that I can use to aggregate my stuff?

是否有任何可能的解决方法,或类似的最佳方法(可能不如 CLR 最佳,但嘿,我会尽我所能)我可以用来聚合我的东西?

采纳答案by Serge Belov

SOLUTION

解决方案

The definition of optimalcan vary, but here's how to concatenate strings from different rows using regular Transact SQL, which should work fine in Azure.

最优的定义可能会有所不同,但这里介绍了如何使用常规 Transact SQL 连接来自不同行的字符串,这在 Azure 中应该可以正常工作。

;WITH Partitioned AS
(
    SELECT 
        ID,
        Name,
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
        COUNT(*) OVER (PARTITION BY ID) AS NameCount
    FROM dbo.SourceTable
),
Concatenated AS
(
    SELECT 
        ID, 
        CAST(Name AS nvarchar) AS FullName, 
        Name, 
        NameNumber, 
        NameCount 
    FROM Partitioned 
    WHERE NameNumber = 1

    UNION ALL

    SELECT 
        P.ID, 
        CAST(C.FullName + ', ' + P.Name AS nvarchar), 
        P.Name, 
        P.NameNumber, 
        P.NameCount
    FROM Partitioned AS P
        INNER JOIN Concatenated AS C 
                ON P.ID = C.ID 
                AND P.NameNumber = C.NameNumber + 1
)
SELECT 
    ID,
    FullName
FROM Concatenated
WHERE NameNumber = NameCount

EXPLANATION

解释

The approach boils down to three steps:

该方法归结为三个步骤:

  1. Number the rows using OVERand PARTITIONgrouping and ordering them as needed for the concatenation. The result is PartitionedCTE. We keep counts of rows in each partition to filter the results later.

  2. Using recursive CTE (Concatenated) iterate through the row numbers (NameNumbercolumn) adding Namevalues to FullNamecolumn.

  3. Filter out all results but the ones with the highest NameNumber.

  1. 根据需要对行进行编号,OVERPARTITION根据需要对它们进行分组和排序。结果是PartitionedCTE。我们保留每个分区中的行数,以便稍后过滤结果。

  2. 使用递归 CTE ( Concatenated) 遍历行号 (NameNumber列) 向列添加NameFullName

  3. 过滤掉所有结果,但最高的那些NameNumber

Please keep in mind that in order to make this query predictable one has to define both grouping (for example, in your scenario rows with the same IDare concatenated) and sorting (I assumed that you simply sort the string alphabetically before concatenation).

请记住,为了使此查询可预测,必须同时定义分组(例如,在您的场景中将相同的行ID串联)和排序(我假设您只是在串联之前按字母顺序对字符串进行排序)。

I've quickly tested the solution on SQL Server 2012 with the following data:

我已经使用以下数据在 SQL Server 2012 上快速测试了该解决方案:

INSERT dbo.SourceTable (ID, Name)
VALUES 
(1, 'Matt'),
(1, 'Rocks'),
(2, 'Stylus'),
(3, 'Foo'),
(3, 'Bar'),
(3, 'Baz')

The query result:

查询结果:

ID          FullName
----------- ------------------------------
2           Stylus
3           Bar, Baz, Foo
1           Matt, Rocks

回答by slachterman

Are methods using FOR XML PATH like below really that slow? Itzik Ben-Gan writes that this method has good performance in his T-SQL Querying book (Mr. Ben-Gan is a trustworthy source, in my view).

像下面这样使用 FOR XML PATH 的方法真的那么慢吗?Itzik Ben-Gan 在他的 T-SQL Querying 一书中写道,这种方法具有良好的性能(在我看来,Ben-Gan 先生是一个值得信赖的来源)。

create table #t (id int, name varchar(20))

insert into #t
values (1, 'Matt'), (1, 'Rocks'), (2, 'Stylus')

select  id
        ,Names = stuff((select ', ' + name as [text()]
        from #t xt
        where xt.id = t.id
        for xml path('')), 1, 2, '')
from #t t
group by id

回答by Hrobky

For those of us who found this and are not using Azure SQL Database:

对于我们这些发现这个的人 并且未使用 Azure SQL 数据库

STRING_AGG()in PostgreSQL, SQL Server 2017 and Azure SQL
https://www.postgresql.org/docs/current/static/functions-aggregate.html
https://docs.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql

STRING_AGG()在 PostgreSQL、SQL Server 2017 和 Azure SQL 中
https://www.postgresql.org/docs/current/static/functions-aggregate.html
https://docs.microsoft.com/en-us/sql/t-sql/函数/字符串聚合事务SQL

GROUP_CONCAT()in MySQL
http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat

GROUP_CONCAT()在 MySQL
http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat

(Thanks to @Brianjorden and @milanio for Azure update)

(感谢 @Brianjorden 和 @milanio 的 Azure 更新)

Example Code:

示例代码:

select Id
, STRING_AGG(Name, ', ') Names 
from Demo
group by Id

SQL Fiddle: http://sqlfiddle.com/#!18/89251/1

SQL 小提琴:http://sqlfiddle.com/#! 18/89251/1

回答by QMaster

Although @serge answer is correct but i compared time consumption of his way against xmlpath and i found the xmlpath is so faster. I'll write the compare code and you can check it by yourself. This is @serge way:

虽然@serge 的答案是正确的,但我将他的方式与 xmlpath 的时间消耗进行了比较,我发现 xmlpath 更快。我会写比较代码,你可以自己检查。这是@serge方式:

DECLARE @startTime datetime2;
DECLARE @endTime datetime2;
DECLARE @counter INT;
SET @counter = 1;

set nocount on;

declare @YourTable table (ID int, Name nvarchar(50))

WHILE @counter < 1000
BEGIN
    insert into @YourTable VALUES (ROUND(@counter/10,0), CONVERT(NVARCHAR(50), @counter) + 'CC')
    SET @counter = @counter + 1;
END

SET @startTime = GETDATE()

;WITH Partitioned AS
(
    SELECT 
        ID,
        Name,
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Name) AS NameNumber,
        COUNT(*) OVER (PARTITION BY ID) AS NameCount
    FROM @YourTable
),
Concatenated AS
(
    SELECT ID, CAST(Name AS nvarchar) AS FullName, Name, NameNumber, NameCount FROM Partitioned WHERE NameNumber = 1

    UNION ALL

    SELECT 
        P.ID, CAST(C.FullName + ', ' + P.Name AS nvarchar), P.Name, P.NameNumber, P.NameCount
    FROM Partitioned AS P
        INNER JOIN Concatenated AS C ON P.ID = C.ID AND P.NameNumber = C.NameNumber + 1
)
SELECT 
    ID,
    FullName
FROM Concatenated
WHERE NameNumber = NameCount

SET @endTime = GETDATE();

SELECT DATEDIFF(millisecond,@startTime, @endTime)
--Take about 54 milliseconds

And this is xmlpath way:

这是 xmlpath 方式:

DECLARE @startTime datetime2;
DECLARE @endTime datetime2;
DECLARE @counter INT;
SET @counter = 1;

set nocount on;

declare @YourTable table (RowID int, HeaderValue int, ChildValue varchar(5))

WHILE @counter < 1000
BEGIN
    insert into @YourTable VALUES (@counter, ROUND(@counter/10,0), CONVERT(NVARCHAR(50), @counter) + 'CC')
    SET @counter = @counter + 1;
END

SET @startTime = GETDATE();

set nocount off
SELECT
    t1.HeaderValue
        ,STUFF(
                   (SELECT
                        ', ' + t2.ChildValue
                        FROM @YourTable t2
                        WHERE t1.HeaderValue=t2.HeaderValue
                        ORDER BY t2.ChildValue
                        FOR XML PATH(''), TYPE
                   ).value('.','varchar(max)')
                   ,1,2, ''
              ) AS ChildValues
    FROM @YourTable t1
    GROUP BY t1.HeaderValue

SET @endTime = GETDATE();

SELECT DATEDIFF(millisecond,@startTime, @endTime)
--Take about 4 milliseconds

回答by Brian Jorden

Update: Ms SQL Server 2017+, Azure SQL Database

更新:Ms SQL Server 2017+,Azure SQL 数据库

You can use: STRING_AGG.

您可以使用:STRING_AGG

Usage is pretty simple for OP's request:

对于 OP 的请求,用法非常简单:

SELECT id, STRING_AGG(name, ', ') AS names
FROM some_table
GROUP BY id

Read More

阅读更多

Well my old non-answer got rightfully deleted (left in-tact below), but if anyone happens to land here in the future, there is good news. They have implimented STRING_AGG() in Azure SQL Database as well. That should provide the exact functionality originally requested in this post with native and built in support. @hrobky mentioned this previously as a SQL Server 2016 feature at the time.

好吧,我的旧未回答被理所当然地删除了(在下面保持原样),但是如果将来有人碰巧降落在这里,那么有个好消息。他们也在 Azure SQL 数据库中实现了 STRING_AGG()。这应该提供本文中最初要求的确切功能以及本机和内置支持。@hrobky 之前提到这是当时 SQL Server 2016 的一项功能。

--- Old Post: Not enough reputation here to reply to @hrobky directly, but STRING_AGG looks great, however it is only available in SQL Server 2016 vNext currently. Hopefully it will follow to Azure SQL Datababse soon as well..

--- 旧帖子:这里没有足够的声望直接回复@hrobky,但是 STRING_AGG 看起来不错,但目前仅在 SQL Server 2016 vNext 中可用。希望它也能很快跟进 Azure SQL Datababse..

回答by jvc

You can use += to concatenate strings, for example:

您可以使用 += 来连接字符串,例如:

declare @test nvarchar(max)
set @test = ''
select @test += name from names

if you select @test, it will give you all names concatenated

如果您选择@test,它将为您提供连接的所有名称

回答by Tom Halladay

I found Serge's answer to be very promising, but I also encountered performance issues with it as-written. However, when I restructured it to use temporary tables and not include double CTE tables, the performance went from 1 minute 40 seconds to sub-second for 1000 combined records. Here it is for anyone who needs to do this without FOR XML on older versions of SQL Server:

我发现 Serge 的回答非常有希望,但我也遇到了它所写的性能问题。但是,当我将其重组为使用临时表而不包括双 CTE 表时,1000 条合并记录的性能从 1 分 40 秒变为亚秒级。这里适用于需要在旧版本 SQL Server 上执行此操作而不使用 FOR XML 的任何人:

DECLARE @STRUCTURED_VALUES TABLE (
     ID                 INT
    ,VALUE              VARCHAR(MAX) NULL
    ,VALUENUMBER        BIGINT
    ,VALUECOUNT         INT
);

INSERT INTO @STRUCTURED_VALUES
SELECT   ID
        ,VALUE
        ,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY VALUE) AS VALUENUMBER
        ,COUNT(*) OVER (PARTITION BY ID)    AS VALUECOUNT
FROM    RAW_VALUES_TABLE;

WITH CTE AS (
    SELECT   SV.ID
            ,SV.VALUE
            ,SV.VALUENUMBER
            ,SV.VALUECOUNT
    FROM    @STRUCTURED_VALUES SV
    WHERE   VALUENUMBER = 1

    UNION ALL

    SELECT   SV.ID
            ,CTE.VALUE + ' ' + SV.VALUE AS VALUE
            ,SV.VALUENUMBER
            ,SV.VALUECOUNT
    FROM    @STRUCTURED_VALUES SV
    JOIN    CTE 
        ON  SV.ID = CTE.ID
        AND SV.VALUENUMBER = CTE.VALUENUMBER + 1

)
SELECT   ID
        ,VALUE
FROM    CTE
WHERE   VALUENUMBER = VALUECOUNT
ORDER BY ID
;