SQL: Sorting By Email Domain Name

Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address, and attribute it to the original authors (not me): StackOverFlow. Original: http://stackoverflow.com/questions/1811531/

Time: 2020-09-01 04:34:51  Source: igfitidea

Tags: sql, sorting, email, sql-order-by, domain-name

Asked by o.k.w

What is the shortest and/or most efficient SQL statement to sort a table with a column of email addresses by its DOMAIN name fragment?

That essentially means ignoring whatever comes before the "@" in the email addresses and sorting case-insensitively. Let's ignore internationalized domain names for this one.

Targets: MySQL, MSSQL, Oracle

Sample data from TABLE1

id   name           email 
------------------------------------------
 1   John Doe       [email protected]
 2   Jane Doe       [email protected]
 3   Ali Baba       [email protected]
 4   Foo Bar        [email protected]
 5   Tarrack Ocama  [email protected]

Order By Email
SELECT * FROM TABLE1 ORDER BY EMAIL ASC

id   name           email 
------------------------------------------
 3   Ali Baba       [email protected]
 4   Foo Bar        [email protected]
 2   Jane Doe       [email protected]
 1   John Doe       [email protected]
 5   Tarrack Ocama  [email protected]

Order By Domain
SELECT * FROM TABLE1 ORDER BY ?????? ASC

id   name           email 
------------------------------------------
 5   Tarrack Ocama  [email protected]
 3   Ali Baba       [email protected]
 1   John Doe       [email protected]
 2   Jane Doe       [email protected]
 4   Foo Bar        [email protected]

EDIT:
I am not asking for a single SQL statement that will work on all 3 or more SQL engines. Any contributions are welcome. :)

Answered by priyanka.sarkar

Try this

Query (for SQL Server):

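select * from mytbl
-- take everything after the '@'; a length of 1 would sort on the first character of the domain only
order by SUBSTRING(email, CHARINDEX('@', email) + 1, LEN(email))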

Query (for Oracle):

select * from mytbl
-- omitting the length argument returns everything after the '@', not just one character
order by substr(email, INSTR(email, '@', 1) + 1)

Query (for MySQL):

pygorex1 already answered

Output:

id   name           email
------------------------------------------
 5   Tarrack Ocama  [email protected]
 3   Ali Baba       [email protected]
 1   John Doe       [email protected]
 2   Jane Doe       [email protected]
 4   Foo Bar        [email protected]

Answered by leepowers

For MySQL:

select email, SUBSTRING_INDEX(email,'@',-1) AS domain from user order by domain desc;

For a case-insensitive sort:

select user_id, username, email, LOWER(SUBSTRING_INDEX(email,'@',-1)) AS domain from user order by domain desc;

Answered by paxdiablo

If you want this solution to scale at all, you should not be trying to extract sub-columns. Per-row functions are notoriously slow as the table gets bigger and bigger.

The right thing to do in this case is to move the cost of extraction from select (where it happens a lot) to insert/update, where it happens less (in most normal databases). By incurring the cost only on insert and update, you greatly increase the overall efficiency of the database, since that's the only point in time where you need to do it (i.e., it's the only time the data changes).

To achieve this, split the email address into two distinct columns in the table, email_name and email_domain. Then you can either split it in your application before insertion/update or use a trigger (or pre-computed columns if your DBMS supports them) in the database to do it automatically.

Then you sort on email_domain and, when you want the full email address, you use email_name || '@' || email_domain.

Alternatively, you can keep the full email column and use a trigger to duplicate just the domain part in email_domain; then you never need to worry about concatenating the columns to get the full email address.

It's perfectly acceptable to revert from 3NF for performance reasons provided you know what you're doing. In this case, the data in the two columns can't get out of sync simply because the triggers won't allow it. It's a good way to trade disk space (relatively cheap) for performance (we always want more of that).

And, if you're the sort that doesn't like reverting from 3NF at all, the email_name/email_domain solution will fix that.

This is also assuming you just want to handle email addresses of the form a@b. There are other valid email addresses, but I can't recall seeing any of them in the wild for years.
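
As a rough sketch of the trigger-maintained email_domain idea described in this answer, the following uses SQLite through Python's sqlite3 module purely as a stand-in. The schema, the trigger and index names, and the addresses are all hypothetical, and trigger syntax differs in MySQL, SQL Server, and Oracle.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    email        TEXT NOT NULL,
    email_domain TEXT            -- filled in by the triggers below
);

-- Extract the domain once per write rather than once per row per read.
CREATE TRIGGER table1_domain_ins AFTER INSERT ON table1
BEGIN
    UPDATE table1
       SET email_domain = lower(substr(NEW.email, instr(NEW.email, '@') + 1))
     WHERE id = NEW.id;
END;

CREATE TRIGGER table1_domain_upd AFTER UPDATE OF email ON table1
BEGIN
    UPDATE table1
       SET email_domain = lower(substr(NEW.email, instr(NEW.email, '@') + 1))
     WHERE id = NEW.id;
END;

-- A plain index on the stored domain column.
CREATE INDEX table1_email_domain ON table1 (email_domain);
""")

conn.executemany(
    "INSERT INTO table1 (id, name, email) VALUES (?, ?, ?)",
    [(1, "Ann", "ann@Zeta.example"),
     (2, "Bob", "bob@alpha.example"),
     (3, "Cy", "cy@Mid.example")],
)
# Changing an address re-derives its domain through the update trigger.
conn.execute("UPDATE table1 SET email = 'ann@beta.example' WHERE id = 1")

by_domain = conn.execute(
    "SELECT id, email FROM table1 ORDER BY email_domain, email"
).fetchall()
print(by_domain)
```

The ORDER BY here touches only a stored column, so no per-row string function runs at query time; the cost is one extra column and two triggers to keep in step with the schema.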

Answered by marc_s

For SQL Server, you could add a computed column to your table which extracts the domain into a separate field. If you persist that column into the table, you can use it like any other field and even put an index on it to speed things up if you query by domain name a lot:

ALTER TABLE Table1
  ADD DomainName AS 
     SUBSTRING(email, CHARINDEX('@', email)+1, 500) PERSISTED

So now your table would have an additional column "DomainName" which contains anything after the "@" sign in your e-mail address.

Answered by P Sharma

This will work with Oracle:

select id,name,email,substr(email,instr(email,'@',1)+1) as domain
from table1
order by domain asc

Answered by lexu

Assuming you really must cater for MySQL, Oracle and MSSQL, the most efficient way might be to store the account name and domain name in two separate fields. Then you can do your ordering:

select id,name,email from table order by name

select id,name,email,account,domain from table order by email

select id,name,email,account,domain from table order by domain,account

As Donnie points out, string manipulation functions are non-standard. That is why you will have to keep the data redundant!

I've added account and domain to the third query, since I seem to recall that not all DBMSs will sort a query on a field that isn't in the selected fields.

Answered by zachaysan

For postgres the query is:

SELECT * FROM table
ORDER BY SUBSTRING(email,(position('@' in email) + 1),252)

The value 252 is the longest allowed domain, since the maximum length of an email address is 254 characters including the local part, the @, and the domain.

See this for more details: What is the maximum length of a valid email address?

Answered by eddievan

The original answer for SQL Server didn't work for me....

Here is a version for SQL Server...

select SUBSTRING(email,(CHARINDEX('@',email)+1),len(email)), count(*) 
from table_name 
group by SUBSTRING(email,(CHARINDEX('@',email)+1),len(email))
order by count(*) desc

Answered by Mr_KeyCode

My suggestion would be (for mysql):

SELECT 
    LOWER(email) AS email,
    SUBSTRING_INDEX(email, '@', + 1) AS account,
 REPLACE(SUBSTRING_INDEX(email, '@', -1), CONCAT('.',SUBSTRING_INDEX(email, '.', -1)),'') -- 2nd part of mail - tld.
  AS domain,
    CONCAT('.',SUBSTRING_INDEX(email, '.', -1)) AS tld
FROM
********
ORDER BY domain, email ASC;
Then just add a WHERE ...
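
To make the account / domain / TLD split in the query above easier to follow, here is a loose Python sketch of the same idea. The helper name split_email and the sample addresses are hypothetical, and the behaviour is not identical to the SQL: this version lowercases everything and strips only the final label, whereas REPLACE removes every occurrence of the TLD string.

```python
def split_email(email):
    """Split a simple local@host address into (account, domain, tld)."""
    # Everything after the last '@' is the host part.
    account, _, host = email.lower().rpartition("@")
    # Everything after the last '.' in the host is treated as the TLD.
    domain, dot, last_label = host.rpartition(".")
    if not dot:  # host has no dot, so there is no TLD to strip
        return account, host, ""
    return account, domain, "." + last_label


# Hypothetical addresses, sorted by domain and then by full address.
emails = ["Foo@Mail.Example.COM", "bar@alpha.example.org", "baz@mail.example.net"]
ordered = sorted(emails, key=lambda e: (split_email(e)[1], e.lower()))
print(split_email("Foo@Mail.Example.COM"))
print(ordered)
```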

Answered by Donnie

MySQL: an intelligent combination of right() and instr()

SQL Server: right() and patindex()

Oracle: instr() and substr()

And, as someone else said, if you have a decent to high record count, wrapping your email field in functions in your WHERE clause means the RDBMS can't use any index you might have on that column. So you may want to consider creating a computed column which holds the domain.
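
One related option, swapped in here for illustration instead of a computed column, is to index the extraction expression itself. The sketch below uses SQLite (through Python's sqlite3) only as a stand-in; the table, the index name, and the addresses are hypothetical, and whether the planner actually uses the index is its own decision.

```python
import sqlite3

# SQLite stands in for the target databases; all names and data are hypothetical.
DOMAIN_EXPR = "lower(substr(email, instr(email, '@') + 1))"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
# Index the extracted domain, so a query written with this exact
# expression can be answered from the index.
conn.execute(f"CREATE INDEX table1_domain_idx ON table1 ({DOMAIN_EXPR})")
conn.executemany(
    "INSERT INTO table1 (id, name, email) VALUES (?, ?, ?)",
    [(1, "Ann", "ann@Zeta.example"),
     (2, "Bob", "bob@alpha.example"),
     (3, "Cy", "cy@ALPHA.example")],
)

query = f"SELECT id FROM table1 WHERE {DOMAIN_EXPR} = 'alpha.example' ORDER BY id"
matches = [row[0] for row in conn.execute(query)]
print(matches)

# The last column of each EXPLAIN QUERY PLAN row is a readable step;
# it shows whether the planner chose table1_domain_idx.
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(row[-1])
```

The expression in the query has to match the indexed expression exactly as written, which is why it is kept in a single constant here.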