Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same license and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/22699535/

Time: 2020-09-01 01:29:17  Source: igfitidea

Trim trailing spaces with PostgreSQL

Tags: sql, postgresql, whitespace, trim, removing-whitespace

Asked by zgall1

I have a column eventDate which contains trailing spaces. I am trying to remove them with the PostgreSQL function TRIM(). More specifically, I am running:

SELECT TRIM(both ' ' from eventDate) 
FROM EventDates;

However, the trailing spaces don't go away. Furthermore, when I try to trim another character from the date (such as a number), it doesn't trim either. If I'm reading the manual correctly, this should work. Any thoughts?

Answer by Erwin Brandstetter

There are many different invisible characters. Many of them have the property WSpace=Y ("whitespace") in Unicode. But some special characters are not considered "whitespace" and still have no visible representation. The excellent Wikipedia articles about space (punctuation) and whitespace characters should give you an idea.

<rant>Unicode sucks in this regard: introducing lots of exotic characters that mainly serve to confuse people.</rant>

The standard SQL trim() function by default only trims the basic Latin space character (Unicode: U+0020 / ASCII 32). Same with the rtrim() and ltrim() variants. Your call also only targets that particular character.
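
A minimal sketch of that behavior, assuming standard_conforming_strings is on (the default since PostgreSQL 9.1); the sample strings are made up for illustration. A trailing tab survives trim(), while rtrim() with an explicit character list or a regular expression removes it:

```sql
-- E'abc \t' is 'abc' followed by a space and a tab (5 characters).
SELECT length(trim(both ' ' from E'abc \t'))         AS after_trim,    -- 5: the tab stops the trim
       length(rtrim(E'abc \t', E' \t'))              AS after_rtrim,   -- 3: space and tab are both listed
       length(regexp_replace(E'abc \t', '\s+$', '')) AS after_regexp;  -- 3
```

Listing characters explicitly in rtrim() is enough when you know exactly which characters are present; the regular expression is the more general tool.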

Use regular expressions with regexp_replace() instead.

Trailing

To remove all trailing white space (but not white space inside the string):

SELECT regexp_replace(eventdate, '\s+$', '') FROM eventdates;

The regular expression explained:
\s.. regular expression class shorthand for [[:space:]]
    - which is the set of white-space characters - see limitations below
+.. 1 or more consecutive matches
$.. end of string
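
To clean the stored values instead of trimming at query time, the same expression can go into an UPDATE. This is only a sketch: it assumes eventdate is a text or varchar column (a character(n) column is blank-padded again on storage, so trimming it has no lasting effect).

```sql
-- Sketch: strip trailing white space in place.
-- Assumes eventdate is text or varchar in table eventdates.
UPDATE eventdates
SET    eventdate = regexp_replace(eventdate, '\s+$', '')
WHERE  eventdate ~ '\s$';   -- only touch rows that actually end in white space
```

The WHERE clause avoids rewriting rows that are already clean.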

Demo:

SELECT regexp_replace('inner white   ', '\s+$', '') || '|'

Returns:

inner white|

Yes, that's a single backslash (\). Details in this related answer.
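
The single backslash is enough because, with standard_conforming_strings on (the default since PostgreSQL 9.1), a backslash inside an ordinary '...' literal is taken literally. In an escape string E'...' it has to be doubled. A small sketch of both spellings, assuming that default setting:

```sql
SELECT regexp_replace('inner white   ', '\s+$', '')   || '|' AS plain_literal,   -- inner white|
       regexp_replace('inner white   ', E'\\s+$', '') || '|' AS escape_literal;  -- inner white|
```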

Leading

To remove all leading white space (but not white space inside the string):

regexp_replace(eventdate, '^\s+', '')

^.. start of string

Both

To remove both, you can chain the above function calls:

regexp_replace(regexp_replace(eventdate, '^\s+', ''), '\s+$', '')

Or you can combine both in a single call with two branches.
Add 'g' as 4th parameter to replace all matches, not just the first:

regexp_replace(eventdate, '^\s+|\s+$', '', 'g')
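
A quick sketch of why the 'g' flag matters here, using a made-up sample string: without it, only the first match (the leading white space) is replaced.

```sql
SELECT '|' || regexp_replace('  both  ', '^\s+|\s+$', '')      || '|' AS first_match_only,  -- |both  |
       '|' || regexp_replace('  both  ', '^\s+|\s+$', '', 'g') || '|' AS all_matches;       -- |both|
```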

But that should typically be faster with substring():

substring(eventdate, '\S(?:.*\S)*')

\S.. everything but white space
(?:re).. non-capturing set of parentheses
.*.. any string of 0-n characters

Or one of these:

substring(eventdate, '^\s*(.*\S)')
substring(eventdate, '(\S.*\S)')

(re).. Capturing set of parentheses

Effectively takes the first non-whitespace character and everything up to the last non-whitespace character if available.
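
A few edge cases are worth checking before swapping regexp_replace() for substring(). The sketch below uses made-up sample strings: substring() returns NULL when the pattern does not match at all, whereas regexp_replace() returns an empty string for input that consists only of white space.

```sql
SELECT substring('  a b  ', '\S(?:.*\S)*')              AS trimmed,          -- a b
       substring(' x ', '\S(?:.*\S)*')                  AS single_char,      -- x
       substring(' x ', '(\S.*\S)')                     AS needs_two_chars,  -- NULL: pattern needs two non-space characters
       substring('   ', '\S(?:.*\S)*')                  AS only_whitespace,  -- NULL: nothing to match
       regexp_replace('   ', '^\s+|\s+$', '', 'g')      AS replaced;         -- empty string, not NULL
```

If NULL is not acceptable for such rows, wrapping the call in coalesce(..., '') is one way to get an empty string back.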

Whitespace?

There are a few more related characters which are not classified as "whitespace" in Unicode, so they are not contained in the character class [[:space:]].

These print as invisible glyphs in pgAdmin for me: "mongolian vowel", "zero width space", "zero width non-joiner", "zero width joiner":

SELECT E'\u180e', E'\u200B', E'\u200C', E'\u200D';

'?' | '?' | '?' | '?'

Two more, printing as visible glyphs in pgAdmin, but invisible in my browser: "word joiner", "zero width non-breaking space":

SELECT E'\u2060', E'\uFEFF';
'?' | ''

Ultimately, whether characters are rendered invisible or not also depends on the font used for display.

To remove all of these as well, replace '\s' with '[\s\u180e\u200B\u200C\u200D\u2060\uFEFF]', or with a bracket expression that contains the literal characters themselves after \s (note the trailing invisible characters!).
Example, instead of:

regexp_replace(eventdate, '\s+$', '')

use:

regexp_replace(eventdate, '[\s\u180e\u200B\u200C\u200D\u2060\uFEFF]+$', '')

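or, with the literal characters pasted in (each ? below is a placeholder for one of the invisible characters, not a literal question mark):

regexp_replace(eventdate, '[\s?????]+$', '')  -- note invisible characters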
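
A small sketch comparing the two character classes on a made-up string that ends in a zero width space plus a regular space. It assumes a UTF8 database, so that the E'\u200B' literal is valid.

```sql
-- E'abc\u200B ' is 'abc', U+200B (zero width space), then a regular space.
SELECT length(regexp_replace(E'abc\u200B ', '\s+$', '')) AS plain_class,
       -- plain_class depends on whether your setup counts U+200B as white space
       length(regexp_replace(E'abc\u200B ',
              '[\s\u180e\u200B\u200C\u200D\u2060\uFEFF]+$', '')) AS extended_class;  -- 3
```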

Limitations

There is also the POSIX character class [[:graph:]], which is supposed to represent "visible characters". Example:

substring(eventdate, '([[:graph:]].*[[:graph:]])')

It works reliably for ASCII characters in every setup (where it boils down to [\x21-\x7E]), but beyond that you currently (incl. pg 10) depend on information provided by the underlying OS (to define ctype) and possibly locale settings.

Strictly speaking, that's the case for every reference to a character class, but there seems to be more disagreement with the less commonly used ones like graph. But you may have to add more characters to the character class [[:space:]] (shorthand \s) to catch all whitespace characters. Like: \u2007, \u202f and \u00a0 seem to also be missing for @XiCoN JFS.
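
Since the result depends on the OS and locale, it can help to check which of these characters your own installation treats as white space. The query below is a sketch under the assumption of a UTF8 database (chr() then takes a Unicode code point); the list of code points is just the set of characters mentioned in this answer.

```sql
-- For each code point, report whether \s matches it on this installation.
SELECT to_hex(code)      AS code_point_hex,
       chr(code) ~ '\s'  AS matches_space
FROM  (VALUES (160),    -- U+00A0 no-break space
              (6158),   -- U+180E mongolian vowel separator
              (8199),   -- U+2007 figure space
              (8203),   -- U+200B zero width space
              (8239),   -- U+202F narrow no-break space
              (8288),   -- U+2060 word joiner
              (65279)   -- U+FEFF zero width no-break space
      ) AS t(code);
```

Any code point reported as false would have to be added to the bracket expression explicitly, in the same way as the \u escapes shown earlier.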

The manual:

Within a bracket expression, the name of a character class enclosed in [: and :] stands for the list of all characters belonging to that class. Standard character class names are: alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit. These stand for the character classes defined in ctype. A locale can provide others.

Bold emphasis mine.

Also note this limitation that was fixed with Postgres 10:

Fix regular expressions' character class handling for large character codes, particularly Unicode characters above U+7FF (Tom Lane)

Previously, such characters were never recognized as belonging to locale-dependent character classes such as [[:alpha:]].

Answer by ArthurChamz

It should work the way you're handling it, but it's hard to say without knowing the specific string.
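
One way to find out what the string actually contains is to look at its length and raw bytes. A sketch, assuming eventDate is a text or varchar column (add a cast such as eventDate::text otherwise):

```sql
-- Show each value next to its character count and its UTF-8 bytes in hex,
-- so that non-space trailing characters (tab = 09, no-break space = c2a0, ...) become visible.
SELECT eventDate,
       length(eventDate)                             AS char_count,
       encode(convert_to(eventDate, 'UTF8'), 'hex')  AS utf8_bytes_hex
FROM   EventDates
LIMIT  10;
```

A plain trailing space shows up as 20 at the end of the hex string; anything else there is a character that TRIM() with ' ' will not remove.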

If you're only trimming trailing spaces, you might want to use the more concise form:

SELECT RTRIM(eventDate) 
FROM EventDates;

This is a little test to show you that it works. Tell us if it works out!
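
A self-contained version of such a test might look like this (the sample value is made up; the '|' marks where the string ends):

```sql
SELECT rtrim('2014-03-28   ') || '|' AS trimmed;  -- 2014-03-28|
```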

Answer by Cody Caughlan

If your whitespace is more than just the space meta value, then you will need to use regexp_replace:

 SELECT '(' || REGEXP_REPLACE(eventDate, E'[[:space:]]', '', 'g') || ')' 
 FROM EventDates;

In the above example I am bounding the return value in ( and ) just so you can easily see that the regex replace is working in a psql prompt. So you'll want to remove those in your code.
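
Note that this pattern has no anchor and uses the 'g' flag, so it removes every white space character, including those inside the string. If only trailing white space should go, the pattern can be anchored with $. A sketch with a made-up sample string:

```sql
SELECT '(' || regexp_replace('a b  ', '[[:space:]]', '', 'g') || ')' AS all_removed,    -- (ab)
       '(' || regexp_replace('a b  ', '[[:space:]]+$', '')    || ')' AS trailing_only;  -- (a b)
```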

Answer by devosystem sarl

SELECT  replace(('       devo    system      ') ,' ','');

It gives: devosystem
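
Keep in mind that replace() removes every space, including the ones inside the string, and only the plain space character. A sketch contrasting it with rtrim() on a made-up sample string:

```sql
SELECT replace('  devo  system  ', ' ', '')     AS all_spaces_removed,  -- devosystem
       '|' || rtrim('  devo  system  ') || '|'  AS trailing_only;       -- |  devo  system|
```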
