从 sql server 中的字符串中去除 html 标签的最佳方法?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/457701/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Best way to strip html tags from a string in sql server?
提问by Rory
I've got data in SQL Server 2005 that contains html tags and I'd like to strip all that out, leaving just the text between the tags. Ideally also replacing things like <
with <
, etc.
我在 SQL Server 2005 中有包含 html 标签的数据,我想去掉所有这些,只留下标签之间的文本。理想情况下还可以替换诸如<
with 之类的东西<
。
Is there an easy way to do this or has someone already got some sample t-sql code?
有没有一种简单的方法可以做到这一点,或者有人已经得到了一些示例 t-sql 代码?
I don't have the ability to add extended stored procs and the like, so would prefer a pure t-sql approach (preferably one backwards compatible with sql 2000).
我没有能力添加扩展存储过程等,所以更喜欢纯 t-sql 方法(最好是与 sql 2000 向后兼容的方法)。
I just want to retrieve the data with stripped out html, not update it, so ideally it would be written as a user-defined function, to make for easy reuse.
我只想用剥离的 html 检索数据,而不是更新它,所以理想情况下它会被编写为用户定义的函数,以便于重用。
So for example converting this:
所以例如转换这个:
<B>Some useful text</B>
<A onclick="return openInfo(this)"
href="http://there.com/3ce984e88d0531bac5349"
target=globalhelp>
<IMG title="Source Description" height=15 alt="Source Description"
src="/ri/new_info.gif" width=15 align=top border=0>
</A>> <b>more text</b></TD></TR>
to this:
对此:
Some useful text > more text
回答by RedFilter
There is a UDF that will do that described here:
有一个 UDF 可以执行此处描述的操作:
User Defined Function to Strip HTML
CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX) AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
WHILE @Start > 0 AND @End > 0 AND @Length > 0
BEGIN
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
END
RETURN LTRIM(RTRIM(@HTMLText))
END
GO
Edit: note this is for SQL Server 2005, but if you change the keyword MAX to something like 4000, it will work in SQL Server 2000 as well.
编辑:请注意,这是针对 SQL Server 2005,但如果将关键字 MAX 更改为 4000 之类的内容,它也将适用于 SQL Server 2000。
回答by Patrick Honorez
Derived from @Goner Doug answer, with a few things updated:
- using REPLACE where possible
- conversion of predefined entities like é
(I chose the ones I needed :-)
- some conversion of list tags <ul> and <li>
来自@Goner Doug 的回答,更新了一些内容:
- 尽可能使用 REPLACE
- 预定义实体的转换,例如é
(我选择了我需要的实体:-)
- 列表标签的一些转换<ul> and <li>
ALTER FUNCTION [dbo].[udf_StripHTML]
--by Patrick Honorez --- www.idevlop.com
--inspired by http://stackoverflow.com/questions/457701/best-way-to-strip-html-tags-from-a-string-in-sql-server/39253602#39253602
(
@HTMLText varchar(MAX)
)
RETURNS varchar(MAX)
AS
BEGIN
DECLARE @Start int
DECLARE @End int
DECLARE @Length int
set @HTMLText = replace(@htmlText, '<br>',CHAR(13) + CHAR(10))
set @HTMLText = replace(@htmlText, '<br/>',CHAR(13) + CHAR(10))
set @HTMLText = replace(@htmlText, '<br />',CHAR(13) + CHAR(10))
set @HTMLText = replace(@htmlText, '<li>','- ')
set @HTMLText = replace(@htmlText, '</li>',CHAR(13) + CHAR(10))
set @HTMLText = replace(@htmlText, '’' collate Latin1_General_CS_AS, '''' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '"' collate Latin1_General_CS_AS, '"' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '&' collate Latin1_General_CS_AS, '&' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '€' collate Latin1_General_CS_AS, '' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '<' collate Latin1_General_CS_AS, '<' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '>' collate Latin1_General_CS_AS, '>' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'œ' collate Latin1_General_CS_AS, 'oe' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, ' ' collate Latin1_General_CS_AS, ' ' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '©' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '«' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '®' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '±' collate Latin1_General_CS_AS, '±' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '²' collate Latin1_General_CS_AS, '2' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '³' collate Latin1_General_CS_AS, '3' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'µ' collate Latin1_General_CS_AS, 'μ' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '·' collate Latin1_General_CS_AS, '·' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'º' collate Latin1_General_CS_AS, 'o' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '»' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '¼' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '½' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '¾' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '&Aelig' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'Ç' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'È' collate Latin1_General_CS_AS, 'è' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'É' collate Latin1_General_CS_AS, 'é' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'Ê' collate Latin1_General_CS_AS, 'ê' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'Ö' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'à' collate Latin1_General_CS_AS, 'à' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'â' collate Latin1_General_CS_AS, 'a' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'ä' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'æ' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'ç' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'è' collate Latin1_General_CS_AS, 'è' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'é' collate Latin1_General_CS_AS, 'é' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'ê' collate Latin1_General_CS_AS, 'ê' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'ë' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'î' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'ô' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'ö' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '÷' collate Latin1_General_CS_AS, '÷' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'ø' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'ù' collate Latin1_General_CS_AS, 'ù' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'ú' collate Latin1_General_CS_AS, 'ú' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'û' collate Latin1_General_CS_AS, '?' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, 'ü' collate Latin1_General_CS_AS, 'ü' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '"' collate Latin1_General_CS_AS, '"' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '&' collate Latin1_General_CS_AS, '&' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '‹' collate Latin1_General_CS_AS, '<' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '›' collate Latin1_General_CS_AS, '>' collate Latin1_General_CS_AS)
-- Remove anything between <STYLE> tags
SET @Start = CHARINDEX('<STYLE', @HTMLText)
SET @End = CHARINDEX('</STYLE>', @HTMLText, CHARINDEX('<', @HTMLText)) + 7
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '')
SET @Start = CHARINDEX('<STYLE', @HTMLText)
SET @End = CHARINDEX('</STYLE>', @HTMLText, CHARINDEX('</STYLE>', @HTMLText)) + 7
SET @Length = (@End - @Start) + 1
END
-- Remove anything between <whatever> tags
SET @Start = CHARINDEX('<', @HTMLText)
SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText))
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '')
SET @Start = CHARINDEX('<', @HTMLText)
SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText))
SET @Length = (@End - @Start) + 1
END
RETURN LTRIM(RTRIM(@HTMLText))
END
回答by dudeNumber4
If your HTML is well formed, I think this is a better solution:
如果您的 HTML 格式良好,我认为这是一个更好的解决方案:
create function dbo.StripHTML( @text varchar(max) ) returns varchar(max) as
begin
declare @textXML xml
declare @result varchar(max)
set @textXML = REPLACE( @text, '&', '' );
with doc(contents) as
(
select chunks.chunk.query('.') from @textXML.nodes('/') as chunks(chunk)
)
select @result = contents.value('.', 'varchar(max)') from doc
return @result
end
go
select dbo.StripHTML('This <i>is</i> an <b>html</b> test')
回答by Goner Doug
Here's an updated version of this function that incorporates the RedFilter answer (Pinal's original) with the LazyCoders additions and the goodeye typo corrections AND my own addition to handle in-line <STYLE>
tags inside the HTML.
这是此函数的更新版本,它结合了 RedFilter 答案(Pinal 的原件)与 LazyCoders 添加和 goodeye 错字更正以及我自己添加的用于处理<STYLE>
HTML 中的内嵌标记的添加。
ALTER FUNCTION [dbo].[udf_StripHTML]
(
@HTMLText varchar(MAX)
)
RETURNS varchar(MAX)
AS
BEGIN
DECLARE @Start int
DECLARE @End int
DECLARE @Length int
-- Replace the HTML entity & with the '&' character (this needs to be done first, as
-- '&' might be double encoded as '&amp;')
SET @Start = CHARINDEX('&', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '&')
SET @Start = CHARINDEX('&', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
END
-- Replace the HTML entity < with the '<' character
SET @Start = CHARINDEX('<', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '<')
SET @Start = CHARINDEX('<', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
END
-- Replace the HTML entity > with the '>' character
SET @Start = CHARINDEX('>', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '>')
SET @Start = CHARINDEX('>', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
END
-- Replace the HTML entity & with the '&' character
SET @Start = CHARINDEX('&amp;', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '&')
SET @Start = CHARINDEX('&amp;', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
END
-- Replace the HTML entity with the ' ' character
SET @Start = CHARINDEX(' ', @HTMLText)
SET @End = @Start + 5
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, ' ')
SET @Start = CHARINDEX(' ', @HTMLText)
SET @End = @Start + 5
SET @Length = (@End - @Start) + 1
END
-- Replace any <br> tags with a newline
SET @Start = CHARINDEX('<br>', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, CHAR(13) + CHAR(10))
SET @Start = CHARINDEX('<br>', @HTMLText)
SET @End = @Start + 3
SET @Length = (@End - @Start) + 1
END
-- Replace any <br/> tags with a newline
SET @Start = CHARINDEX('<br/>', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, CHAR(13) + CHAR(10))
SET @Start = CHARINDEX('<br/>', @HTMLText)
SET @End = @Start + 4
SET @Length = (@End - @Start) + 1
END
-- Replace any <br /> tags with a newline
SET @Start = CHARINDEX('<br />', @HTMLText)
SET @End = @Start + 5
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, CHAR(13) + CHAR(10))
SET @Start = CHARINDEX('<br />', @HTMLText)
SET @End = @Start + 5
SET @Length = (@End - @Start) + 1
END
-- Remove anything between <STYLE> tags
SET @Start = CHARINDEX('<STYLE', @HTMLText)
SET @End = CHARINDEX('</STYLE>', @HTMLText, CHARINDEX('<', @HTMLText)) + 7
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '')
SET @Start = CHARINDEX('<STYLE', @HTMLText)
SET @End = CHARINDEX('</STYLE>', @HTMLText, CHARINDEX('</STYLE>', @HTMLText)) + 7
SET @Length = (@End - @Start) + 1
END
-- Remove anything between <whatever> tags
SET @Start = CHARINDEX('<', @HTMLText)
SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText))
SET @Length = (@End - @Start) + 1
WHILE (@Start > 0 AND @End > 0 AND @Length > 0) BEGIN
SET @HTMLText = STUFF(@HTMLText, @Start, @Length, '')
SET @Start = CHARINDEX('<', @HTMLText)
SET @End = CHARINDEX('>', @HTMLText, CHARINDEX('<', @HTMLText))
SET @Length = (@End - @Start) + 1
END
RETURN LTRIM(RTRIM(@HTMLText))
END
回答by David
This is not a complete new solution but a correction for afwebservant's solution:
这不是一个全新的解决方案,而是对afwebservant解决方案的修正:
--note comments to see the corrections
CREATE FUNCTION [dbo].[StripHTML] (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
--DECLARE @TempStr varchar(255) (this is not used)
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
WHILE @Start > 0 AND @End > 0 AND @Length > 0
BEGIN
IF (UPPER(SUBSTRING(@HTMLText, @Start, 4)) <> '<BR>') AND (UPPER(SUBSTRING(@HTMLText, @Start, 5)) <> '</BR>')
begin
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
end
-- this ELSE and SET is important
ELSE
SET @Length = 0;
-- minus @Length here below is important
SET @Start = CHARINDEX('<',@HTMLText, @End-@Length)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText, @Start))
-- instead of -1 it should be +1
SET @Length = (@End - @Start) + 1
END
RETURN RTRIM(LTRIM(@HTMLText))
END
回答by codeaf
Try this. It's a modified version of the one posted by RedFilter ... this SQL removes all tags except BR, B, and P with any accompanying attributes:
尝试这个。这是 RedFilter 发布的版本的修改版本……此 SQL 删除了除 BR、B 和 P 之外的所有标签以及任何附带属性:
CREATE FUNCTION [dbo].[StripHtml] (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
DECLARE @TempStr varchar(255)
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
WHILE @Start > 0 AND @End > 0 AND @Length > 0
BEGIN
IF (UPPER(SUBSTRING(@HTMLText, @Start, 3)) <> '<BR') AND (UPPER(SUBSTRING(@HTMLText, @Start, 2)) <> '<P') AND (UPPER(SUBSTRING(@HTMLText, @Start, 2)) <> '<B') AND (UPPER(SUBSTRING(@HTMLText, @Start, 3)) <> '</B')
BEGIN
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
END
SET @Start = CHARINDEX('<',@HTMLText, @End)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText, @Start))
SET @Length = (@End - @Start) - 1
END
RETURN RTRIM(LTRIM(@HTMLText))
END
回答by Arvin Amir
How about using XQuery with a one liner:
将 XQuery 与单行一起使用如何:
DECLARE @MalformedXML xml, @StrippedText varchar(max)
SET @MalformedXML = @xml.query('for $x in //. return ($x)//text()')
SET @StrippedText = CAST(@MalformedXML as varchar(max))
This loops through all elements and returns the text() only.
这将遍历所有元素并仅返回 text()。
To avoid text between elements concatenating without spaces, use:
为避免元素之间没有空格连接的文本,请使用:
DECLARE @MalformedXML xml, @StrippedText varchar(max)
SET @MalformedXML = @xml.query('for $x in //. return concat((($x)//text())[1]," ")')
SET @StrippedText = CAST(@MalformedXML as varchar(max))
And to respond to "How do you use this for a column:
并回应“您如何将其用于列:
SELECT CAST(html_column.query('for $x in //. return concat((($x)//text()) as varchar(max))
FROM table
For the above code, ensure your html_column
is of data type xml
, if not, you need to save a casted version of the html as xml
. I would do this as a separate exercise when you are loading HTML data, as SQL will throw an error if it finds malformed xml, e.g. mismatched start/end tags, invalid characters.
对于上面的代码,请确保您html_column
的数据类型为xml
,如果不是,则需要将 html 的强制转换版本另存为xml
. 当您加载 HTML 数据时,我会将其作为单独的练习来执行,因为如果 SQL 发现格式错误的 xml,例如不匹配的开始/结束标记、无效字符,则会抛出错误。
These are excellent for when you want to build seachh phrases, strip HTML, etc.
当您想要构建搜索短语、剥离 HTML 等时,这些非常有用。
Just note that this returns type xml, so CAST or COVERT to text where appropriate. The xml version of this data type is useless, as it is not a well formed XML.
请注意,这将返回类型 xml,因此在适当的地方将 CAST 或 COVERT 转换为文本。这种数据类型的 xml 版本没有用,因为它不是格式良好的 XML。
回答by Clement
Here is a version that doesn't require an UDF and works even if the HTML contains tags without matching closing tags.
这是一个不需要 UDF 的版本,即使 HTML 包含没有匹配结束标签的标签也能工作。
TRY_CAST(REPLACE(REPLACE(REPLACE([HtmlCol], '>', '/> '), '</', '<'), '--/>', '-->') AS XML).value('.', 'NVARCHAR(MAX)')
回答by Brian Heward
While Arvin Amir's answer comes close to a full one-line solution you can drop in anywhere; he's got a slight bug in his select statement (missing the end of the line), and I wanted to handle the most common character references.
虽然 Arvin Amir 的答案接近于一个完整的单线解决方案,但您可以随处使用;他的 select 语句中有一个小错误(缺少行尾),我想处理最常见的字符引用。
What I ended up doing was this:
我最终做的是这样的:
SELECT replace(replace(replace(CAST(CAST(replace([columnNameHere], '&', '&') as xml).query('for $x in //. return concat((($x)//text())[1]," ")') as varchar(max)), '&', '&'), ' ', ' '), ' ', ' ')
FROM [tableName]
Without the character reference code it can be simplified to this:
如果没有字符参考代码,它可以简化为:
SELECT CAST(CAST([columnNameHere] as xml).query('for $x in //. return concat((($x)//text())[1]," ")') as varchar(max))
FROM [tableName]
回答by Maghi85
Patrick Honorez code needs a slight change.
Patrick Honorez 代码需要稍作改动。
It returns incomplete results for html that contains <
or >
它返回包含<
或的html 的不完整结果>
This is because the code below the section
这是因为该部分下面的代码
-- Remove anything between tags
-- 删除标签之间的任何内容
will in fact replace the < > to nothing. The fix is to the apply the below two lines at the end:
实际上会将 < > 替换为空。解决方法是在最后应用以下两行:
set @HTMLText = replace(@htmlText, '<' collate Latin1_General_CS_AS, '<' collate Latin1_General_CS_AS)
set @HTMLText = replace(@htmlText, '>' collate Latin1_General_CS_AS, '>' collate Latin1_General_CS_AS)