Java code/library for generating slugs (for use in pretty URLs)

Question

Asked by knorv

Web frameworks such as Rails and Django have built-in support for "slugs", which are used to generate readable and SEO-friendly URLs.

A slug string typically contains only the characters a-z, 0-9 and -, and can hence be written without URL-escaping (think "foo%20bar").

I'm looking for a Java slug function that, given any valid Unicode string, will return a slug representation (a-z, 0-9 and -).

A trivial slug function would be something along the lines of:


return input.toLowerCase().replaceAll("[^a-z0-9-]", "");

However, this implementation would not handle internationalization and accents (for example, é > e). One way around this would be to enumerate all special cases, but that would not be very elegant. I'm looking for something more well thought out and general.
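
To make the limitation concrete, here is a minimal sketch of the trivial function applied to accented input. The class name TrivialSlugDemo and the sample strings are illustrative assumptions, and Locale.ROOT is passed to toLowerCase() so the result does not depend on the machine's default locale.

```java
import java.util.Locale;

final class TrivialSlugDemo {

    // The one-liner from the question, with an explicit locale for lowercasing.
    static String trivialSlug(String input) {
        return input.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9-]", "");
    }

    public static void main(String[] args) {
        // "Crème Brûlée", written with Unicode escapes to avoid source-encoding issues.
        // Accented letters and the space fall outside [a-z0-9-], so they are deleted.
        System.out.println(trivialSlug("Cr\u00e8me Br\u00fbl\u00e9e")); // prints "crmebrle"
        System.out.println(trivialSlug("Hello World"));                  // prints "helloworld"
    }
}
```

The accented letters are removed rather than mapped to their base letters, and words run together because spaces are dropped instead of becoming hyphens.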

My question:


What is the most general/practical way to generate Django/Rails type slugs in Java?


Answer 1

Answered by McDowell

Normalize your string using canonical decomposition:

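  import java.text.Normalizer;
  import java.text.Normalizer.Form;
  import java.util.Locale;
  import java.util.regex.Pattern;

  // Anything that is not an ASCII word character or a hyphen (includes combining marks left by NFD).
  private static final Pattern NONLATIN = Pattern.compile("[^\\w-]");
  private static final Pattern WHITESPACE = Pattern.compile("[\\s]");

  public static String toSlug(String input) {
    String nowhitespace = WHITESPACE.matcher(input).replaceAll("-");
    // NFD splits accented letters into a base letter plus combining marks.
    String normalized = Normalizer.normalize(nowhitespace, Form.NFD);
    String slug = NONLATIN.matcher(normalized).replaceAll("");
    return slug.toLowerCase(Locale.ENGLISH);
  }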

This is still a fairly naive process, though. It isn't going to do anything useful for s-sharp (ß, used in German) or for any non-Latin-based alphabet (Greek, Cyrillic, CJK, etc.): those characters have no decomposition into a-z, so they are simply dropped.
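
One possible way to cover letters such as ß is to apply a small replacement table before normalizing. The sketch below is only an illustration under that assumption: the class name TransliteratingSlug and the three table entries are hypothetical and hand-picked, not part of the answer above, and non-Latin scripts would still need a real transliteration library.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class TransliteratingSlug {

    // Hypothetical, hand-picked replacements for letters that NFD does not decompose.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("\u00df", "ss"); // ß
        REPLACEMENTS.put("\u00e6", "ae"); // æ
        REPLACEMENTS.put("\u00f8", "o");  // ø
    }

    static String toSlug(String input) {
        String text = input.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            text = text.replace(entry.getKey(), entry.getValue());
        }
        // Split remaining accented letters into base letter plus combining marks.
        String normalized = Normalizer.normalize(text, Normalizer.Form.NFD);
        return normalized
                .replaceAll("\\s+", "-")        // each whitespace run becomes one hyphen
                .replaceAll("[^a-z0-9-]", "");  // drop combining marks and anything else
    }

    public static void main(String[] args) {
        // "Straße nach Köln", written with Unicode escapes.
        System.out.println(toSlug("Stra\u00dfe nach K\u00f6ln")); // prints "strasse-nach-koln"
    }
}
```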

Be careful when changing the case of a string. Upper- and lower-case forms depend on the alphabet. In Turkish, the upper-case form of U+0069 (i) is U+0130 (İ), not U+0049 (I), and the lower-case form of U+0049 (I) is U+0131 (ı), so you risk introducing a non-Latin-1 character back into your string if you use String.toLowerCase() under a Turkish locale.
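
The locale effect can be reproduced with the standard library alone. This is a small sketch; the class name TurkishCaseDemo and the sample word are illustrative assumptions.

```java
import java.util.Locale;

final class TurkishCaseDemo {

    static final Locale TURKISH = Locale.forLanguageTag("tr");

    public static void main(String[] args) {
        // Under Turkish rules, I lowercases to dotless ı (U+0131), which is not in a-z.
        String turkish = "TITLE".toLowerCase(TURKISH);
        System.out.println(turkish.equals("t\u0131tle")); // prints "true"

        // A fixed, non-Turkish locale gives the ASCII result on every machine.
        String fixed = "TITLE".toLowerCase(Locale.ENGLISH);
        System.out.println(fixed.equals("title")); // prints "true"

        // The reverse direction: i uppercases to dotted İ (U+0130) in Turkish.
        System.out.println("title".toUpperCase(TURKISH).equals("T\u0130TLE")); // prints "true"
    }
}
```

This is why the slug code passes an explicit locale such as Locale.ENGLISH instead of relying on the JVM default.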

Answer 2

Answered by dtrunk

http://search.maven.org/#search|ga|1|slugify

And here's the GitHub repository to take a look at the code and its usage:


https://github.com/slugify/slugify

Answer 3

Answered by Mariano Ruiz

McDowell's proposal almost works, but for input such as "Hello World !!" it returns "hello-world-" (note the trailing hyphen; two spaces before the "!!" would give "hello-world--") instead of "hello-world".

A fixed version could be:


import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

private static final Pattern NONLATIN = Pattern.compile("[^\\w-]");
private static final Pattern WHITESPACE = Pattern.compile("[\\s]");
// One or more hyphens at either end of the string.
private static final Pattern EDGESDHASHES = Pattern.compile("(^-+|-+$)");

public static String toSlug(String input) {
    String nowhitespace = WHITESPACE.matcher(input).replaceAll("-");
    String normalized = Normalizer.normalize(nowhitespace, Normalizer.Form.NFD);
    String slug = NONLATIN.matcher(normalized).replaceAll("");
    // Strip leading and trailing hyphens left behind by removed characters.
    slug = EDGESDHASHES.matcher(slug).replaceAll("");
    return slug.toLowerCase(Locale.ENGLISH);
}

Answer 4

Answered by Mike Godin

I've extended the answer by @McDowell to replace punctuation with hyphens and to remove duplicate and leading/trailing hyphens.

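  import java.text.Normalizer;
  import java.text.Normalizer.Form;
  import java.util.Locale;
  import java.util.regex.Pattern;

  private static final Pattern NONLATIN = Pattern.compile("[^\\w_-]");
  // Whitespace and ASCII punctuation, except the hyphen itself.
  private static final Pattern SEPARATORS = Pattern.compile("[\\s\\p{Punct}&&[^-]]");

  public static String makeSlug(String input) {
    String noseparators = SEPARATORS.matcher(input).replaceAll("-");
    String normalized = Normalizer.normalize(noseparators, Form.NFD);
    String slug = NONLATIN.matcher(normalized).replaceAll("");
    // Collapse runs of hyphens, then trim one from each end.
    return slug.toLowerCase(Locale.ENGLISH).replaceAll("-{2,}","-").replaceAll("^-|-$","");
  }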
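
To see how this version behaves on a few inputs, the following self-contained sketch wraps the same two patterns in a class with a main method. The class name MakeSlugDemo and the sample inputs are assumptions made for illustration.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

final class MakeSlugDemo {

    private static final Pattern NONLATIN = Pattern.compile("[^\\w_-]");
    private static final Pattern SEPARATORS = Pattern.compile("[\\s\\p{Punct}&&[^-]]");

    static String makeSlug(String input) {
        String noseparators = SEPARATORS.matcher(input).replaceAll("-");
        String normalized = Normalizer.normalize(noseparators, Normalizer.Form.NFD);
        String slug = NONLATIN.matcher(normalized).replaceAll("");
        return slug.toLowerCase(Locale.ENGLISH)
                .replaceAll("-{2,}", "-")   // collapse runs of hyphens
                .replaceAll("^-|-$", "");   // trim one hyphen from each end
    }

    public static void main(String[] args) {
        // Punctuation and spaces become hyphens, which are then collapsed and trimmed.
        System.out.println(makeSlug("Hello World !!"));       // prints "hello-world"
        // "Café au lait": the accent is removed after NFD decomposition.
        System.out.println(makeSlug("Caf\u00e9 au lait"));    // prints "cafe-au-lait"
        // Cyrillic input has no a-z equivalent here, so nothing survives.
        System.out.println(makeSlug("\u0416\u0443\u043a").isEmpty()); // prints "true"
    }
}
```

The last case shows that a caller may need a fallback (for example an ID-based slug) when the input contains no Latin letters or digits.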

Answer 5

Answered by Rafael Sanches

Reference implementations for other languages: http://www.codecodex.com/wiki/Generate_a_url_slug
