获取 URL 的二级域 (java)

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1923815/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-29 18:32:42  来源:igfitidea点击:

Get the second level domain of an URL (java)

javaurl

提问by Richard H

I am wondering if there is a parser or library in java for extracting the second level domain (SLD) in an URL - or failing that an algo or regex for doing the same. For example:

我想知道 java 中是否有用于提取 URL 中的二级域 (SLD) 的解析器或库 - 或者无法通过算法或正则表达式执行相同的操作。例如:

URI uri = new URI("http://www.mydomain.ltd.uk/blah/some/page.html");

String host = uri.getHost();

System.out.println(host);

which prints:

打印:

mydomain.ltd.uk

Now what I'd like to do is robustly identify the SLD ("ltd.uk") component. Any ideas?

现在我想做的是稳健地识别 SLD (“ltd.uk”) 组件。有任何想法吗?

Edit:I'm ideally looking for a general solution, so I'd match ".uk" in "police.uk", ".co.uk" in "bbc.co.uk" and ".com" in "amazon.com".

编辑:理想情况下,我正在寻找通用解决方案,因此我会匹配“police.uk”中的“.uk”、“bbc.co.uk”中的“.co.uk”和“amazon”中的“.com” .com”。

Thanks

谢谢

采纳答案by ZZ Coder

Don't know your purpose but Second-Level Domain may not mean much to you. You probably need to find public suffixand the domain right below it is what you are looking for.

不知道你的目的,但二级域对你来说可能意义不大。您可能需要找到公共后缀,而它正下方的域正是您要查找的。

Apache Http Component (HttpClient 4) comes with classes to handle this,

Apache Http 组件 (HttpClient 4) 带有处理此问题的类,

org.apache.http.impl.cookie.PublicSuffixFilter
org.apache.http.impl.cookie.PublicSuffixListParser

You need to download the public suffix list from here,

您需要从这里下载公共后缀列表,

http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1

http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1

回答by user85155

After reeading everything here, the correct solution should be (with guava)

在这里阅读所有内容后,正确的解决方案应该是(使用番石榴)

InternetDomainName.from(uriHost).topPrivateDomain().toString();

InternetDomainName.from(uriHost).topPrivateDomain().toString();

errors when using Guava to get the private domain name

使用Guava获取私有域名时出错

回答by simbo1905

After looking at these answers and not being satisfied by them I used the class com.google.common.net.InternetDomainNameto subtract the public parts of a domain name from all the parts:

在查看了这些答案并且对它们不满意后,我使用该类com.google.common.net.InternetDomainName从所有部分中减去了域名的公共部分:

Set<String> nonePublicDomainParts(String uriHost) {
    InternetDomainName fullDomainName = InternetDomainName.from(uriHost);
    InternetDomainName publicDomainName = fullDomainName.publicSuffix();
    Set<String> nonePublicParts = new HashSet<String>(fullDomainName.parts());
    nonePublicParts.removeAll(publicDomainName.parts());
    return nonePublicParts;
}

That class is on maven in the guava library:

该类在 guava 库中的 maven 上:

    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>10.0.1</version>
        <scope>compile</scope>
    </dependency>

Internally this class is using a TldPatterns.class which is package private and has the list of top level domains baked into it.

这个类在内部使用了一个 TldPatterns.class,它是一个包私有的并且包含了顶级域的列表。

Interestingly, if you look at that classes source at the link below it explicitly lists "police.uk" as a private domain name. This is correct as police.uk is a private domain controlled by the police; else criminals.police.uk will be emailing you asking for your credit card details in relation to their ongoing investigations into card fraud ;)

有趣的是,如果您在下面的链接中查看该类源,它会明确将“police.uk”列为私有域名。这是正确的,因为police.uk 是由警方控制的私有域;否则,criminals.police.uk 将通过电子邮件向您发送与他们正在进行的信用卡欺诈调查相关的信用卡详细信息;)

http://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/net/TldPatterns.java?spec=svn8c3cc7e67132f8dcaae4bd214736a8ddf6611769&r=8c3cc7e67132f8dcaae4bd214736a8ddf6611769

http://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/net/TldPatterns.java?spec=svn8c3cc7e67132f8dcaae4bd214736a8ddf6611769&r=8c33cc768df6611769&r=8c33cc768d61cae6a8d61c6a78d61cae6a8d71cae

回答by Iain

The selected answer is the best approach. For those of you that don't want to code it, here is how I have done it.

选择的答案是最好的方法。对于那些不想编码的人,这里是我如何做到的。

Firstly, either I don't understand org.apache.http.impl.cookie.PublicSuffixFilter, or there is a bug in it.

首先,要么我不了解 org.apache.http.impl.cookie.PublicSuffixFilter,要么其中存在错误。

Basically if you pass in google.com it correctly returns false. If you pass in google.com.au it incorrectly returns true. The bug is in the code that applies patterns, e.g. *.au.

基本上,如果您传入 google.com,它会正确返回 false。如果您传入 google.com.au,它会错误地返回 true。错误存在于应用模式的代码中,例如 *.au。

Here is the checker code based on org.apache.http.impl.cookie.PublicSuffixFilter:

下面是基于 org.apache.http.impl.cookie.PublicSuffixFilter 的检查器代码:

public class TopLevelDomainChecker  {
    private Set<String> exceptions;
    private Set<String> suffixes;

    public void setPublicSuffixes(Collection<String> suffixes) {
        this.suffixes = new HashSet<String>(suffixes);
    }
    public void setExceptions(Collection<String> exceptions) {
        this.exceptions = new HashSet<String>(exceptions);
    }

    /**
     * Checks if the domain is a TLD.
     * @param domain
     * @return
     */
    public boolean isTLD(String domain) {
        if (domain.startsWith(".")) 
            domain = domain.substring(1);

        // An exception rule takes priority over any other matching rule.
        // Exceptions are ones that are not a TLD, but would match a pattern rule
        // e.g. bl.uk is not a TLD, but the rule *.uk means it is. Hence there is an exception rule
        // stating that bl.uk is not a TLD. 
        if (this.exceptions != null && this.exceptions.contains(domain)) 
            return false;


        if (this.suffixes == null) 
            return false;

        if (this.suffixes.contains(domain)) 
            return true;

        // Try patterns. ie *.jp means that boo.jp is a TLD
        int nextdot = domain.indexOf('.');
        if (nextdot == -1)
            return false;
        domain = "*" + domain.substring(nextdot);
        if (this.suffixes.contains(domain)) 
            return true;

        return false;
    }


    public String extractSLD(String domain)
    {
        String last = domain;
        boolean anySLD = false;
        do
        {
            if (isTLD(domain))
            {
                if (anySLD)
                    return last;
                else
                    return "";
            }
            anySLD = true;
            last = domain;
            int nextDot = domain.indexOf(".");
            if (nextDot == -1)
                return "";
            domain = domain.substring(nextDot+1);
        } while (domain.length() > 0);
        return "";
    }
}

And the parser. I renamed it.

和解析器。我改名了。

/**
 * Parses the list from <a href="http://publicsuffix.org/">publicsuffix.org
 * Copied from http://svn.apache.org/repos/asf/httpcomponents/httpclient/trunk/httpclient/src/main/java/org/apache/http/impl/cookie/PublicSuffixListParser.java
 */
public class TopLevelDomainParser {
    private static final int MAX_LINE_LEN = 256;
    private final TopLevelDomainChecker filter;

    TopLevelDomainParser(TopLevelDomainChecker filter) {
        this.filter = filter;
    }
    public void parse(Reader list) throws IOException {
        Collection<String> rules = new ArrayList();
        Collection<String> exceptions = new ArrayList();
        BufferedReader r = new BufferedReader(list);
        StringBuilder sb = new StringBuilder(256);
        boolean more = true;
        while (more) {
            more = readLine(r, sb);
            String line = sb.toString();
            if (line.length() == 0) continue;
            if (line.startsWith("//")) continue; //entire lines can also be commented using //
            if (line.startsWith(".")) line = line.substring(1); // A leading dot is optional
            // An exclamation mark (!) at the start of a rule marks an exception to a previous wildcard rule
            boolean isException = line.startsWith("!"); 
            if (isException) line = line.substring(1);

            if (isException) {
                exceptions.add(line);
            } else {
                rules.add(line);
            }
        }

        filter.setPublicSuffixes(rules);
        filter.setExceptions(exceptions);
    }
    private boolean readLine(Reader r, StringBuilder sb) throws IOException {
        sb.setLength(0);
        int b;
        boolean hitWhitespace = false;
        while ((b = r.read()) != -1) {
            char c = (char) b;
            if (c == '\n') break;
            // Each line is only read up to the first whitespace
            if (Character.isWhitespace(c)) hitWhitespace = true;
            if (!hitWhitespace) sb.append(c);
            if (sb.length() > MAX_LINE_LEN) throw new IOException("Line too long"); // prevent excess memory usage
        }
        return (b != -1);
    }
}

And finally, how to use it

最后,如何使用它

    FileReader fr = new FileReader("effective_tld_names.dat.txt");
    TopLevelDomainChecker checker = new TopLevelDomainChecker();
    TopLevelDomainParser parser = new TopLevelDomainParser(checker);
    parser.parse(fr);
    boolean result;
    result = checker.isTLD("com"); // true
    result = checker.isTLD("com.au"); // true
    result = checker.isTLD("ltd.uk"); // true
    result = checker.isTLD("google.com"); // false
    result = checker.isTLD("google.com.au"); // false
    result = checker.isTLD("metro.tokyo.jp"); // false
    String sld;
    sld = checker.extractSLD("com"); // ""
    sld = checker.extractSLD("com.au"); // ""
    sld = checker.extractSLD("google.com"); // "google.com"
    sld = checker.extractSLD("google.com.au"); // "google.com.au"
    sld = checker.extractSLD("www.google.com.au"); // "google.com.au"
    sld = checker.extractSLD("www.google.com"); // "google.com"
    sld = checker.extractSLD("foo.bar.hokkaido.jp"); // "foo.bar.hokkaido.jp"
    sld = checker.extractSLD("moo.foo.bar.hokkaido.jp"); // "foo.bar.hokkaido.jp"

回答by edelwater

  1. the mentioned list + reading the wikipedia updates gives a 98% correct TLD list
  2. going yourself through http://www.iana.org/domains/root/db/and click each nic and see the latest news gives you the other 2% (like .com.aq and .gov.an)
  3. unfortunately large "free webspace" providers are another thing to take into account e.g. the countless *.blogspot.com domains, if you download the alexa top 100.000 (free csv file) you can at least get a good overview of the most used of these that should get you for a certain percentage covered for these domains (e.g. when comparing alexa rating with stumbleupon pageviews with delicious bookmarks) (alexa sometimes only takes the topdomain while delicious really md5's every url, so 1 alexa --> multiple delicious md5 hashes
  4. apart from that sometimes in the case of twitter, that what goes after the / is also of importance if you are looking for uniqueness to rate something.
  1. 提到的列表 + 阅读维基百科更新给出了 98% 正确的 TLD 列表
  2. 自己浏览http://www.iana.org/domains/root/db/并点击每个 nic 并查看最新消息给你另外 2%(如 .com.aq 和 .gov.an)
  3. 不幸的是,大型“免费网络空间”提供商是另一件需要考虑的事情,例如无数 *.blogspot.com 域,如果您下载 alexa top 100.000(免费 csv 文件),您至少可以很好地了解其中最常用的这应该可以让您获得这些域的一定百分比覆盖(例如,在将 alexa 评级与带有美味书签的 stumbleupon 综合浏览量进行比较时)(alexa 有时只使用顶级域,而美味真的是 md5 的每个 url,所以 1 个 alexa --> 多个美味的 md5 哈希
  4. 除了有时在 twitter 的情况下,如果您正在寻找独特性来评价某事,那么 / 之后的内容也很重要。

Here is a list of the Alexa top 40.000 when the real TLD's are filtered out to give you a feeling: (which means Alexa does NOT count the rating together for the domain for the following) :

以下是 Alexa 前 40.000 的列表,当真实 TLD 被过滤掉以给您一种感觉时:(这意味着 Alexa 不会将域的评级一起计入以下内容):

bp.blogspot.com---espn.go.com---files.wordpress.com---abcnews.go.com---disney.go.com---troktiko.blogspot.com---en.wordpress.com---api.ning.com---abc.go.com---220.181.38.82---213.174.154.20---abclocal.go.com---feedproxy.google.com/~r---forums.wordpress.com---googleblog.blogspot.com---1.cnm999.com/user/10008---213.174.143.196---92.42.51.201---googlewebmastercentral.blogspot.com---myespn.go.com---213.174.143.197---61.132.221.146---support.wordpress.com---dashboard.wordpress.com---sethgodin.typepad.com---paygo.17zhifu.com/user/10005---go2.wordpress.com---1.1.1.1---movies.go.com---home.comcast.net---googlesystem.blogspot.com---abcfamily.go.com---home.spaces.live.com---196.1.237.210---kaixin001.com/~record---xhamster.com/user/video---gold-oil-commodity.blogspot.com---journeyplanner.tfl.gov.uk/user/XSLT_TRIP_REQUEST2---206.108.48.238---blog.wordpress.com---67.220.92.21---183.101.80.130---211.94.190.80---youtube-global.blogspot.com---uta-net.com/user/phplib---cinema3satu.blogspot.com---119.147.41.16---sites.google.com/site/sites---kk.iij4u.or.jp/~dyo---220.181.6.19---toontown.go.com---signup.wordpress.com---thesartorialist.blogspot.com---analytics.blogspot.com---ss.iij4u.or.jp/~ceh2---67.220.92.23---gmailblog.blogspot.com---183.99.121.86---vgorode.ru/user/create---61.132.216.243---217.175.53.72---labnol.blogspot.com---adsense.blogspot.com---subscribe.wordpress.com---fimotro.blogspot.com---creators.ning.com---sarkari-naukri.blogspot.com---search.wordpress.com---orange-hiyoko.blogspot.com---cashewmaniakpop.wordpress.com---pixiehollow.go.com---adwords.blogspot.com---202.53.226.102---lorelle.wordpress.com---homestead.com/~site---multiply.com/user/signout---221.231.148.249---183.101.80.77---windowsliveintro.spaces.live.com---124.228.254.234---streaming-web.blogspot.com---id.tianya.cn/user/message---familyfun.go.com---tro-ma-ktiko.blogspot.com---about.ning.com---paygo.17zhifu.com/user/10020---tututina.blogspot.com---toolserver.org/~geohack---superjob.ru/user/resume---ejobs.ro/user/locuri-de-munca---gnula.blogspot.com---alles.or.jp/~uir---chiark.greenend.org.uk/~sgtatham---woork.blogspot.com---88.208.32.218---webstreamingmania.blogspot.com---spaces.live.com---youtube.com/user/RayWilliamJohnson---cloob.com/user/login---asstr.org/~Kristen---getclicky.com/user/login---guesshermuff.blogspot.com---211.98.70.195---222.73.105.196---pp.iij4u.or.jp/~taakii---unsoloclic.blogspot.com---photoshopdisasters.blogspot.com---218.83.161.253---217.16.18.163---217.16.18.207---217.16.28.104---222.73.105.210---youtube.com/user/OldSpice---hubpages.com/user/new---pelisdvdripdd.blogspot.com---95.143.193.60---es.wordpress.com---217.16.18.206---61.147.116.146---damncoolpics.blogspot.com---family.go.com---81.176.235.162---gutteruncensorednewsr.blogspot.com---terselubung.blogspot.com---faisalardhy.blogspot.com---67.220.92.14---goodreads.com/user/show---116.228.55.34---profile.typepad.com---kaixin001.com/~truth---linkbuildersassociated.ning.com---nicotto.jp/user/mypage---ritemail.blogspot.com---hyperboleandahalf.blogspot.com---carscoop.blogspot.com---tubemogul.com/user/dash---press-gr.blogspot.com---81.176.235.164---soapnet.go.com---208.98.30.69---trelokouneli.blogspot.com---help.ning.com---id.tianya.cn/user/register---slovari.yandex.ru/~%D0%BA%D0%BD%D0%B8%D0%B3%D0%B8---printable-coupons.blogspot.com---unic77.blogspot.com---globaleconomicanalysis.blogspot.com---183.101.80.68---221.194.33.60---doujin-games88.blogspot.com---magaseek.com/user/SearchProducts---files.posterous.com---wwwnew.splinder.com---kolom-tutorial.blogspot.com---strobist.blogspot.com---67.21.91.73---needanarticle.com/user/activity---forum.moe.gov.om/~moeoman---milasdaydreams.blogspot.com---88.208.17.189---67.220.92.22---115.238.100.211---nonews-news.blogspot.com---testosterona.blog.br---nn.iij4u.or.jp/~has---cs.tut.fi/~jkorpela---youtube.com/user/oldspice---67.159.53.25---taxalia.blogspot.com---208.98.30.70---filmesporno.blog.br---alles-schallundrauch.blogspot.com---vatera.hu/user/account---78.140.136.182---us.my.alibaba.com/user/join---stores.homestead.com---pes2008editing.blogspot.com---ocn.ne.jp/~matrix---adweek.blogs.com---115.238.55.94---markjaquith.wordpress.com---k3.dion.ne.jp/~dreamlov---38.99.186.222---film.tv.it---android-developers.blogspot.com---217.218.110.147---kadokado.com/user/login---bollyvideolinks4u.blogspot.com---sookyeong.wordpress.com---87.101.230.11---livecodes.blogspot.com---67.220.91.19---homepage2.nifty.com/bustered---pp.iij4u.or.jp/~manga100---110.173.49.202---erogamescape.dyndns.org/~ap2---cs.berkeley.edu/~lorch---cakewrecks.blogspot.com---59.106.117.185---119.75.213.61---id.wordpress.com---de.wordpress.com---telefilmdblink.blogspot.com---61.139.105.138---multiply.com/user/join---programseo.blogspot.com---collectivebias.ning.com---bablorub.blogspot.com---thinkexist.com/user/personalAccount---us.my.alibaba.com/user/sign---66.70.56.90---getsarkari-naukri.blogspot.com---59.106.117.183---productreviewplace.ning.com---support.weebly.com---kaixin001.com/~lucky---football-russia.blogspot.com---magaseek.com/user/ItemDetail---polprav.blogspot.com---atlasshrugs2000.typepad.com---jpn-manga.blogspot.com---88.208.32.219---google-latlong.blogspot.com---59.106.117.188---erogamescape.ddo.jp/~ap2---218.87.32.245---watchhorrormovies.blogspot.com---sarotiko.blogspot.com---googlewebmastercentral-de.blogspot.com---colmeia.blog.br---us.my.alibaba.com/user/webatm---220.170.79.109---darkville.blogspot.com---youtube.com/user/PiMPDailyDose---disneymovierewards.go.com---fukuoka.lg.jp---61.147.115.16---iisc.ernet.in---youtube.com/user/HuskyStarcraft---202.108.212.211---homepage3.nifty.com/otakarando---94.77.215.37---pitchit.ning.com---59.106.117.186---thestar.blogs.com---1.254.254.254---piratesonline.go.com---animedblink.blogspot.com---137.32.44.152---eurus.dti.ne.jp/~yoneyama---state.la.us---lastminute.is.it---bangpai.taobao.com/user/groups---csse.monash.edu.au/~jwb---jquery-howto.blogspot.com---sakura.ne.jp/~moesino---users.skynet.be/mgueury---saitama.lg.jp---portaldasfinancas.gov.pt---bnonline.fi.cr---135.125.60.11---zhuhai.gd.cn---kuna.net.kw---59.175.213.77---58.218.199.7---multiply.com/user/signin---youtube.com/user/HDstarcraft---blinklist.com/user/join---us.my.alibaba.com/user/company---jptwitterhelp.blogspot.com---67.220.92.017---88.208.17.51---youtube.com/user/GoogleWebmasterHelp---208.53.156.229---filmdblink.blogspot.com---blinklist.com/user/signup---3arbtop.blogspot.com---attivissimo.blogspot.com---onlinemovie12.blogspot.com---98.126.189.86---mytvsource.blogspot.com---blinklist.com/user/login---googlejapan.blogspot.com---76.73.65.166---gutteruncensorednewsb.blogspot.com---issuu.com/user/upload---86.51.174.18---88.208.17.120---profile.china.alibaba.com/user/admin---jntuworldportal.blogspot.com---sz.js.cn---disneymovieclub.go.com---a1.com.mk---dd.iij4u.or.jp/~madonna---rr.iij4u.or.jp/~plasma---mlmlaunchformula.ning.com---112.78.7.151---blogdelatele.blogspot.com---googlemobile.blogspot.com---78.109.199.240---wsu.edu/~brians---internapoli-city.blogspot.com---hh.iij4u.or.jp/~dmt---kaixin001.com/~house---61.155.11.14---youtube.com/user/SHAYTARDS---turbobit.net/user/files---qjy168.com/user/do---hubpages.com/user/finished---upload2.dyndns.org---f32.aaa.livedoor.jp/~azusa---naruto-spoilers.blogspot.com---205.209.140.195---193.227.20.21---adsenseforfeeds.blogspot.com---group.ameba.jp/user/groups---

bp.blogspot.com---espn.go.com---files.wordpress.com---abcnews.go.com---disney.go.com---troktiko.blogspot.com---en。 wordpress.com---api.ning.com---abc.go.com---220.181.38.82---213.174.154.20---abclocal.go.com---feedproxy.google.com/~r ---forums.wordpress.com---googleblog.blogspot.com---1.cnm999.com/user/10008---213.174.143.196---92.42.51.201---googlewebmastercentral.blogspot.com-- -myespn.go.com---213.174.143.197---61.132.221.146---support.wordpress.com---dashboard.wordpress.com---sethgodin.typepad.com---paygo.17zhifu.com /user/10005---go2.wordpress.com---1.1.1.1---movies.go.com---home.comcast.net---googlesystem.blogspot.com---abcfamily.go.com ---home.spaces.live.com---196.1.237.210---kaixin001.com/~record---xhamster.com/user/video---gold-oil-commodity.blogspot.com--- travelplanner.tfl.gov.uk/user/XSLT_TRIP_REQUEST2---206.108.48.238---blog.wordpress。com---67.220.92.21---183.101.80.130---211.94.190.80---youtube-global.blogspot.com---uta-net.com/user/phplib---cinema3satu.blogspot.com- --119.147.41.16---sites.google.com/site/sites---kk.iij4u.or.jp/~dyo---220.181.6.19---toontown.go.com---signup.wordpress .com---thesartorialist.blogspot.com---analytics.blogspot.com---ss.iij4u.or.jp/~ceh2---67.220.92.23---gmailblog.blogspot.com---183.99。 121.86---vgorode.ru/user/create---61.132.216.243---217.175.53.72---labnol.blogspot.com---adsense.blogspot.com---subscribe.wordpress.com--- fimotro.blogspot.com---creators.ning.com---sarkari-naukri.blogspot.com---search.wordpress.com---orange-hiyoko.blogspot.com---cashewmaniakpop.wordpress.com- --pixiehollow.go.com---adwords.blogspot.com---202.53.226.102---lorelle.wordpress.com---homestead.com/~site---multiply.com/user/signout-- -221.231.148.249---183.101.80。77---windowsliveintro.spaces.live.com---124.228.254.234---streaming-web.blogspot.com---id.tianya.cn/user/message---familyfun.go.com--- tro-ma-ktiko.blogspot.com---about.ning.com---paygo.17zhifu.com/user/10020---tututina.blogspot.com---toolserver.org/~geohack---superjob .ru/user/resume---ejobs.ro/user/locuri-de-munca---gnula.blogspot.com---alles.or.jp/~uir---chiark.greenend.org.uk/ ~sgtatham---woork.blogspot.com---88.208.32.218---webstreamingmania.blogspot.com---spaces.live.com---youtube.com/user/RayWilliamJohnson---cloob.com/user /login---asstr.org/~Kristen---getclicky.com/user/login---guesshrmuff.blogspot.com---211.98.70.195---222.73.105.196---pp.iij4u.or. jp/~taakii---unsoloclic.blogspot.com---photoshopdisasters.blogspot.com---218.83.161.253---217.16.18.163---217.16.18.207---217.16.28.104---222.721.10 ---youtube.com/user/OldSpice---hubpages。com/user/new---pelisdvdripdd.blogspot.com---95.143.193.60---es.wordpress.com---217.16.18.206---61.147.116.146---damncoolpics.blogspot.com--- family.go.com---81.176.235.162---gutteruncensorednewsr.blogspot.com---terselubung.blogspot.com---faisalardhy.blogspot.com---67.220.92.14---goodreads.com/user/ show---116.228.55.34---profile.typepad.com---kaixin001.com/~truth---linkbuilders associated.ning.com---nicotto.jp/user/mypage---ritemail.blogspot.com ---hyperboleandahalf.blogspot.com---carscoop.blogspot.com---tubemogul.com/user/dash---press-gr.blogspot.com---81.176.235.164---soapnet.go.com ---208.98.30.69---trelokouneli.blogspot.com---help.ning.com---id.tianya.cn/user/register---slovari.yandex.ru/~%D0%BA%D0 %BD%D0%B8%D0%B3%D0%B8---printable-coupons.blogspot.com---unic77.blogspot.com---globaleconomicanalysis.blogspot.com---183.101.80.68---221.194 .33.60---同人游戏88.blogspot.com---magaseek.com/user/SearchProducts---files.posterous.com---wwwnew.splinder.com---kolom-tutorial.blogspot.com--- strobist.blogspot.com---67.21.91.73---needanarticle.com/user/activity---forum.moe.gov.om/~moeoman---milasdaydreams.blogspot.com---88.208.17.189-- -67.220.92.22---115.238.100.211---nonews-news.blogspot.com---testosterona.blog.br---nn.iij4u.or.jp/~has---cs.tut.fi/ ~jkorpela---youtube.com/user/oldspice---67.159.53.25---taxalia.blogspot.com---208.98.30.70---filmesporno.blog.br---alles-schallundrauch.blogspot.com ---vatera.hu/user/account---78.140.136.182---us.my.alibaba.com/user/join---stores.homestead.com---pes2008editing.blogspot.com---ocn .ne.jp/~matrix---adweek.blogs.com---115.238.55.94---markjaquith.wordpress.com---k3.dion.ne.jp/~dreamlov---38.99.186.222-- -film.tv.it---android-developers.blogspot.com---217。218.110.147---kadokado.com/user/login---bollyvideolinks4u.blogspot.com---sookyeong.wordpress.com---87.101.230.11---livecodes.blogspot.com---67.220.91.19- --homepage2.nifty.com/bustered---pp.iij4u.or.jp/~manga100---110.173.49.202---erogamescape.dyndns.org/~ap2---cs.berkeley.edu/~lorch ---cakewrecks.blogspot.com---59.106.117.185---119.75.213.61---id.wordpress.com---de.wordpress.com---telefilmdblink.blogspot.com---61.139.105.138 ---multiply.com/user/join---programseo.blogspot.com---collectivebias.ning.com---bablorub.blogspot.com---thinkexist.com/user/personalAccount---us.my .alibaba.com/user/sign---66.70.56.90---getsarkari-naukri.blogspot.com---59.106.117.183---productreviewplace.ning.com---support.weebly.com---kaixin001 .com/~lucky---football-russia.blogspot.com---magaseek.com/user/ItemDetail---polprav.blogspot.com---atlasshrugs2000.typepad。com---jpn-manga.blogspot.com---88.208.32.219---google-latlong.blogspot.com---59.106.117.188---erogamescape.ddo.jp/~ap2---218.87.32.245 ---watchhorrormovies.blogspot.com---sarotiko.blogspot.com---googlewebmastercentral-de.blogspot.com---colmeia.blog.br---us.my.alibaba.com/user/webatm-- -220.170.79.109---darkville.blogspot.com---youtube.com/user/PiMPDailyDose---disneymovierewards.go.com---fukuoka.lg.jp---61.147.115.16---iisc.ernet .in---youtube.com/user/HuskyStarcraft---202.108.212.211---homepage3.nifty.com/otakarando---94.77.215.37---pitchit.ning.com---59.106.117.186-- -thestar.blogs.com---1.254.254.254---piratesonline.go.com---animedblink.blogspot.com---137.32.44.152---eurus.dti.ne.jp/~yoneyama--- state.la.us---lastminute.is.it---bangpai.taobao.com/user/groups---csse.monash.edu.au/~jwb---jquery-howto.blogspot.com-- -sakura.ne.jp/~moesino---users.skynet。be/mgueury---saitama.lg.jp---portaldasfinancas.gov.pt---bnonline.fi.cr---135.125.60.11---zhuhai.gd.cn---kuna.net.kw- --59.175.213.77---58.218.199.7---multiply.com/user/signin---youtube.com/user/HDstarcraft---blinklist.com/user/join---us.my.alibaba。 com/user/company---jptwitterhelp.blogspot.com---67.220.92.017---88.208.17.51---youtube.com/user/GoogleWebmasterHelp---208.53.156.229---filmdblink.blogspot.com- --blinklist.com/user/signup---3arbtop.blogspot.com---attivissimo.blogspot.com---onlinemovie12.blogspot.com---98.126.189.86---mytvsource.blogspot.com--- flashlist.com/user/login---googlejapan.blogspot.com---76.73.65.166---gutteruncensorednewsb.blogspot.com---issuu.com/user/upload---86.51.174.18---88.208。 17.120---profile.china.alibaba.com/user/admin---jntuworldportal.blogspot.com---sz.js.cn---disneymovieclub.go.com---a1.com.mk--- dd.iij4u. 或jp/~madonna---rr.iij4u.or.jp/~plasma---mlmlaunchformula.ning.com---112.78.7.151---blogdelatele.blogspot.com---googlemobile.blogspot.com--- 78.109.199.240---wsu.edu/~brians---internapoli-city.blogspot.com---hh.iij4u.or.jp/~dmt---kaixin001.com/~house---61.155.11.14 ---youtube.com/user/SHAYTARDS---turbobit.net/user/files---qjy168.com/user/do---hubpages.com/user/finished---upload2.dyndns.org-- -f32.aaa.livedoor.jp/~azusa---naruto-spoilers.blogspot.com---205.209.140.195---193.227.20.21---adsenseforfeeds.blogspot.com---group.ameba.jp/用户/组---com/user/do---hubpages.com/user/finished---upload2.dyndns.org---f32.aaa.livedoor.jp/~azusa---naruto-spoilers.blogspot.com---205.209 .140.195---193.227.20.21---adsenseforfeeds.blogspot.com---group.ameba.jp/user/groups---com/user/do---hubpages.com/user/finished---upload2.dyndns.org---f32.aaa.livedoor.jp/~azusa---naruto-spoilers.blogspot.com---205.209 .140.195---193.227.20.21---adsenseforfeeds.blogspot.com---group.ameba.jp/user/groups---

回答by Avi Flax

I don't have an answer for your specific case — and Jonathan's comment points out that you should probably refactor your question.

对于您的具体情况,我没有答案——乔纳森的评论指出您可能应该重构您的问题。

Still, I suggest taking a look at the Referenceclass of the Restletproject. It has a ton of useful methods. And since Restlet is Open Source, you wouldn't have to use the entire library — you could download the source and add just that one class to your project.

不过,我还是建议看看Restlet项目的Reference类。它有很多有用的方法。由于 Restlet 是开源的,因此您不必使用整个库——您可以下载源代码并将该类添加到您的项目中。

回答by Mr. Nobody

1.

1.

Method nonePublicDomainParts from simbo1905 contribution should be corrected because of TLD that contain ".", for example "com.ac":

来自 simbo1905 贡献的方法 nonePublicDomainParts 应该更正,因为 TLD 包含".",例如"com.ac"

input: "com.abc.com.ac"

input: "com.abc.com.ac"

output: "abc"

output: "abc"

correct output is "com.abc".

正确的输出是"com.abc".

To get SLDyou may cut TLDfrom a given domain using method publicSuffix().

为了让SLD您可以TLD使用方法从给定的域中删除publicSuffix()

2.

2.

A set should not be used because of domains that contain the same parts, for example:

不应使用集合,因为域包含相同的部分,例如:

input: part1.part2.part1.TLD

input: part1.part2.part1.TLD

output: part1, part2

output: part1, part2

correct output is: part1, part2, part1or in the form part1.part2.part1

正确的输出是:part1, part2, part1或在形式part1.part2.part1

So instead of Set<String>use List<String>.

所以,而不是Set<String>使用List<String>.

回答by joyel george

public static String getTopLevelDomain(String uri) {

InternetDomainName fullDomainName = InternetDomainName.from(uri);
InternetDomainName publicDomainName = fullDomainName.topPrivateDomain();
String topDomain = "";

Iterator<String> it = publicDomainName.parts().iterator();
while(it.hasNext()){
    String part = it.next();
    if(!topDomain.isEmpty())topDomain += ".";
    topDomain += part;
}
return topDomain;
}

Just give the domain, and u will get the top level domain. download jar file from http://code.google.com/p/guava-libraries/

只要给出域名,你就会得到顶级域名。从http://code.google.com/p/guava-libraries/下载 jar 文件

回答by sandyp

Dnspyis another more flexible alternative to the publicsuffixlib.

Dnspy是另一个更灵活的publicsuffixlib替代品。

回答by danben

If you want the second-level domain, you can split the string on "." and take the last two parts. Of course, this assumes you always have a second-level domain that is not specific to the site (since it sounds like that's what you want).

如果你想要二级域名,你可以在“.”上拆分字符串。并取最后两部分。当然,这假设您始终拥有一个并非特定于站点的二级域(因为这听起来像是您想要的)。