Mysql 查询以从 url 中提取域

Question

提问by peppolone

sorry for my english

对不起我的英语不好

i have this query to extract domain from urls

我有这个查询从网址中提取域

SELECT SUBSTRING(LEFT(url, LOCATE('/', url, 8) - 1), 8) AS domain...

it works only when the url is like www.google.com/something

仅当网址类似于 www.google.com/something 时才有效

it doesn't work with urls like

它不适用于像这样的网址

www.google.it (without trailing slash)

www.google.it（不带斜杠）

www.google.it/abc/xzy/ (permalink)

www.google.it/abc/xzy/（永久链接）

do you know how to solve it?

你知道如何解决吗？

Answer 1

回答by LostNomad311

I had to combine some of the previous answers , plus a little more hackery for my data set . This is what works for me , it returns the domain and any sub-domains:

我不得不结合一些以前的答案，再加上我的数据集的一些hackery。这对我有用，它返回域和任何子域：

SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(target_url, '/', 3), '://', -1), '/', 1), '?', 1) AS domain

Explanation ( cause non-trivial SQL rarely makes sense ):

解释（导致非平凡的 SQL 很少有意义）：

SUBSTRING_INDEX(target_url, '/', 3)- strips any path if the url has a protocol
SUBSTRING_INDEX(THAT, '://', -1)- strips any protocol from THAT
SUBSTRING_INDEX(THAT, '/', 1)- strips any path from THAT ( if there was no protocol )
SUBSTRING_INDEX(THAT, '?', 1)- strips the query string from THAT ( if there was no path or trailing / )

SUBSTRING_INDEX(target_url, '/', 3)- 如果 url 有
SUBSTRING_INDEX(THAT, '://', -1)协议，则删除
SUBSTRING_INDEX(THAT, '/', 1)任何路径 -从 THAT 中删除任何协议 - 从 THAT 中删除任何路径（如果没有协议）
SUBSTRING_INDEX(THAT, '?', 1)- 从 THAT 中删除查询字符串（如果没有路径或尾随 / ）

Test Cases:

测试用例：

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(target_url, '/', 3), '://', -1), '/', 1), '?', 1) AS domain
FROM ( 
    SELECT       'http://test.com' as target_url 
    UNION SELECT 'https://test.com' 
    UNION SELECT 'http://test.com/one' 
    UNION SELECT 'http://test.com/?huh' 
    UNION SELECT 'http://test.com?http://ouch.foo' 
    UNION SELECT 'test.com' 
    UNION SELECT 'test.com/one'
    UNION SELECT 'test.com/one/two'
    UNION SELECT 'test.com/one/two/three'
    UNION SELECT 'test.com/one/two/three?u=http://maaaaannn'
    UNION SELECT 'http://one.test.com'
    UNION SELECT 'one.test.com/one'
    UNION SELECT 'two.one.test.com/one' ) AS Test;

Results:

结果：

'test.com'
'test.com'
'test.com'
'test.com'
'test.com'
'test.com'
'test.com'
'test.com'
'test.com'
'test.com'
'one.test.com'
'one.test.com'
'two.one.test.com'

Answer 2

回答by eville84

remove www., anysubdomain and everything after /:

删除 www., anysubdomain 和 / 之后的所有内容：

SUBSTRING_INDEX((SUBSTRING_INDEX((SUBSTRING_INDEX(url, 'http://', -1)), '/', 1)), '.', -2) as domain

Answer 3

回答by Al Zziwa

All the answers do not seem to work for me. For example, a url that starts variables with a ? fails for some answers. This works for me for all kinds of URLs:

所有的答案似乎都不适合我。例如，一个以 ? 开头的变量的 url。某些答案失败。这对我适用于所有类型的 URL：

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(REPLACE(REPLACE(LOWER(url), 'https://', ''), 'http://', ''), '/', 1), '?', 1) AS domain;

Answer 4

回答by bikram kc

If you want to remove www. along with http://, https:// and /(path) from your domain please do this:

如果你想删除 www. 连同来自您域的 http://、https:// 和 /（路径），请执行以下操作：

SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(document_url, '/', 3), '://', -1), '/', 1), '?', 1),'www.',-1)

Answer 5

回答by JYelton

If you want to find the first occurrence of /why are you passing 8 into LOCATEas the starting position? (I assume to skip slashes after the protocol such as http:// but if you also expect url's without a protocol, consider short ones like cnn.com/page)

如果你想找到第一次出现的/为什么你把 8LOCATE作为起始位置？（我假设在 http:// 等协议后跳过斜杠，但如果您还希望 url 没有协议，请考虑使用像 cnn.com/page 这样的短网址）

IF(LOCATE('/', url) > 0, SUBSTRING(url, 1, LOCATE('/', url) - 1), url) AS domain

Alternatively:

或者：

SUBSTRING_INDEX(url, '/', 1) AS domain

The latter seems easier to me.

后者对我来说似乎更容易。

Answer 6

回答by sudhanshu sirohi

select SUBSTRING_INDEX(SUBSTRING_INDEX(URL, '://', -1),'/',1) as DOMAIN

Answer 7

回答by Den

This one works well on my very messy data:

这个对我非常凌乱的数据很有效：

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(REPLACE(url, '//', '.'), '/', 1), '.', -2) AS Domain

Answer 8

回答by user3252809

Best to use it as it will also capture url like "www.google.co.in"

最好使用它，因为它还会捕获像“www.google.co.in”这样的网址

SELECT replace((replace((SUBSTRING_INDEX(SUBSTRING_INDEX(REPLACE(url, '//', ''), '/', 1), '*', -2)), 'http:','')),'https:','') AS Domain

Answer 9

回答by David Sayre

The query needs to account for no trailing '/'

查询需要考虑没有尾随 '/'

IF( LOCATE('/', replace(url,'http://', '') ) > 0 , SUBSTRING(replace(url,'http://', ''), 1, LOCATE('/', replace(url,'http://', '') ) - 1), replace(url,'http://', '')) AS domain

Answer 10

回答by Brian Leishman

If you're not afraid of installing MySQL extensions (UDFs), then there's a UDF you can use that does exactly this while respecting different top-level domains like "google.com" and "google.co.uk"

如果您不害怕安装 MySQL 扩展 (UDF)，那么您可以使用一个 UDF，它可以在尊重“google.com”和“google.co.uk”等不同顶级域的同时做到这一点

https://github.com/StirlingMarketingGroup/mysql-get-etld-p1

select`get_etld_p1`('http://a.very.complex-domain.co.uk:8080/foo/bar');-- 'complex-domain.co.uk'
select`get_etld_p1`('https://www.bbc.co.uk/');-- 'bbc.co.uk'
select`get_etld_p1`('https://github.com/StirlingMarketingGroup/');-- 'github.com'
select`get_etld_p1`('https://localhost:10000/index');-- 'localhost'
select`get_etld_p1`('android-app://com.google.android.gm');-- 'com.google.android.gm'
select`get_etld_p1`('example.test.domain.com');-- 'domain.com'
select`get_etld_p1`('postgres://user:[email protected]:5432/path?k=v#f');-- 'host.com'
select`get_etld_p1`('exzvk.omsk.so-ups.ru');-- 'so-ups.ru'
select`get_etld_p1`('http://10.64.3.5/data_check/index.php?r=index/rawdatacheck');-- '10.64.3.5'
select`get_etld_p1`('not a domain');-- null

Mysql 查询以从 url 中提取域

提问by peppolone

回答by LostNomad311

回答by eville84

回答by Al Zziwa

回答by bikram kc

回答by JYelton

回答by sudhanshu sirohi

回答by Den

回答by user3252809

回答by David Sayre

回答by Brian Leishman

相关推荐

最近更新

标签

Mysql 查询以从 url 中提取域

提问by peppolone

回答by LostNomad311

回答by eville84

回答by Al Zziwa

回答by bikram kc

回答by JYelton

回答by sudhanshu sirohi

回答by Den

回答by user3252809

回答by David Sayre

回答by Brian Leishman

相关推荐

MySQL InnoDB 数据库还原

MySQL 在不更改表结构的情况下选择表中最后 n 行的最有效方法是什么？

MySQL 性能和变量调整

无法连接到“本地主机”上的 MySQL 服务器 (10061)

相关推荐

最近更新

标签