Unicode characters in HTML URLs

Warning: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2742852/

Time: 2020-08-29 02:54:51  Source: igfitidea

Unicode characters in URLs

html url unicode utf-8

Asked by Pekka

In 2010, would you serve URLs containing UTF-8 characters in a large web portal?

Unicode characters are forbidden as per the RFC on URLs (see here). They would have to be percent encoded to be standards compliant.

My main point, though, is serving the unencoded characters for the sole purpose of having nice-looking URLs, so percent encoding is out.

All major browsers seem to be parsing those URLs okay no matter what the RFC says. My general impression, though, is that it gets very shaky when leaving the domain of web browsers:

  • URLs getting copy+pasted into text files, E-Mails, even Web sites with a different encoding
  • HTTP Client libraries
  • Exotic browsers, RSS readers

Is my impression correct that trouble is to be expected here, and thus it's not a practical solution (yet) if you're serving a non-technical audience and it's important that all your links work properly even if quoted and passed on?

Is there some magic way of serving nice-looking URLs in HTML

http://www.example.com/düsseldorf?neighbourhood=Lörick

that can be copy+pasted with the special characters intact, but work correctly when re-used in older clients?

Accepted answer by Tgr

Use percent encoding. Modern browsers will take care of display and paste issues and make it human-readable, e.g. http://ko.wikipedia.org/wiki/위키백과:대문

Edit: when you copy such a URL in Firefox, the clipboard will hold the percent-encoded form (which is usually a good thing), but if you copy only a part of it, it will remain unencoded.

Answer by bobince

What Tgr said. Background:

http://www.example.com/düsseldorf?neighbourhood=Lörick

That's not a URI. But it is an IRI.

You can't include an IRI in an HTML4 document; the type of attributes like href is defined as URI and not IRI. Some browsers will handle an IRI here anyway, but it's not really a good idea.

To encode an IRI into a URI, take the path and query parts, UTF-8-encode them then percent-encode the non-ASCII bytes:

http://www.example.com/d%C3%BCsseldorf?neighbourhood=L%C3%B6rick
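
The path-and-query step can be sketched in Python with the standard library. This is only an illustration under stated assumptions: the helper name `encode_path_and_query` and the choice of `safe` characters are mine, not part of the answer, and the hostname is left untouched here.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_path_and_query(iri: str) -> str:
    """Percent-encode the non-ASCII characters in an IRI's path and query.

    Hypothetical helper for illustration; the hostname is not converted.
    """
    parts = urlsplit(iri)
    # quote() UTF-8-encodes the text by default, then percent-encodes every
    # byte that is not listed as safe. "%" is kept safe so that escapes that
    # are already present are not encoded a second time.
    path = quote(parts.path, safe="/%:@!$&'()*+,;=-._~")
    query = quote(parts.query, safe="/%:@!$&'()*+,;=?-._~")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))

uri = encode_path_and_query("http://www.example.com/düsseldorf?neighbourhood=Lörick")
print(uri)  # http://www.example.com/d%C3%BCsseldorf?neighbourhood=L%C3%B6rick
```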

If there are non-ASCII characters in the hostname part of the IRI, e.g. http://例え.テスト/, they have to be encoded using Punycode instead.
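
For the hostname, a minimal Python sketch could use the built-in `idna` codec. The function name `encode_hostname` is an assumption for illustration, and the codec follows the older IDNA 2003 rules, so whether it matches what a given browser does is also an assumption.

```python
def encode_hostname(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII ("xn--") Punycode form.

    Hypothetical helper; relies on Python's built-in "idna" codec,
    which implements IDNA 2003.
    """
    # Each dot-separated label is converted separately by the codec.
    return hostname.encode("idna").decode("ascii")

ascii_host = encode_hostname("例え.テスト")
print(ascii_host)
```

Decoding the result with the same codec gives back the original Unicode hostname, which is roughly what a browser does when it displays the pretty form.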

Now you have a URI. It's an ugly URI. But most browsers will hide that for you: copy and paste it into the address bar or follow it in a link and you'll see it displayed with the original Unicode characters. Wikipedia have been using this for years, eg.:

http://en.wikipedia.org/wiki/?

The one browser whose behaviour is unpredictable and doesn't always display the pretty IRI version is...

...well, you know.

Answer by Dean Harding

Depending on your URL scheme, you can make the UTF-8 encoded part "not important". For example, if you look at Stack Overflow URLs, they're of the following form:

http://stackoverflow.com/questions/2742852/unicode-characters-in-urls

However, the server doesn't actually care if you get the part after the identifier wrong, so this also works:

http://stackoverflow.com/questions/2742852/これは、これを日本語のテキストです

So if you had a layout like this, then you could potentially use UTF-8 in the part after the identifier and it wouldn't really matter if it got garbled. Of course this probably only works in somewhat specialised circumstances...
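
A rough Python sketch of such a layout, where only the numeric identifier is used for lookup and the trailing text is ignored. The route pattern and the `question_id` helper are hypothetical and say nothing about how Stack Overflow actually implements its routing.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical route pattern: /questions/<numeric id>/<optional slug>
QUESTION_PATH = re.compile(r"^/questions/(\d+)(?:/.*)?$")

def question_id(url: str) -> Optional[int]:
    """Return the numeric identifier, ignoring whatever follows it."""
    match = QUESTION_PATH.match(urlsplit(url).path)
    return int(match.group(1)) if match else None

# All three resolve to the same identifier, even with a garbled slug.
a = question_id("http://stackoverflow.com/questions/2742852/unicode-characters-in-urls")
b = question_id("http://stackoverflow.com/questions/2742852/これは、これを日本語のテキストです")
c = question_id("http://stackoverflow.com/questions/2742852/%3F%3F%3F")
print(a, b, c)
```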

Answer by Nasser Hadjloo

While all of these comments are true, you should note that since ICANN approved Arabic (Persian) and Chinese characters for registration in domain names, all of the browser-making companies (Microsoft, Mozilla, Apple, etc.) have to support Unicode in URLs without any encoding, and those URLs should be searchable by Google, etc.

So this issue will be resolved ASAP.

Answer by EKons

Use the percent-encoded form. Some (mainly old) computers, running Windows XP for example, do not support Unicode but rather ISO encodings. That is the reason percent-encoded URLs were invented. Also, if you give a user a URL printed on paper that contains characters that cannot be easily typed, that user may have a hard time typing it (or may just ignore it). The percent-encoded form can even be used on many of the oldest machines that ever existed (although they don't support the internet, of course).

There is a downside, though: percent-encoded characters are longer than the original ones, possibly resulting in really long URLs. But just try to ignore it, or use a URL shortener (I would recommend goo.gl in this case, which makes a 13-character-long URL). Also, if you don't want to register for a Google account, try bit.ly (bit.ly makes slightly longer URLs, with the length being 14 characters).

Answer by Peter Manoukian

For me this is the correct way. This just worked:

    <?php $linker = rawurldecode($link); // decoded form, used only as the visible link text ?>
    <a href="<?php echo htmlspecialchars($link); ?>" target="_blank"><?php echo htmlspecialchars($linker); ?></a>

This worked, and now links are displayed properly:

http://newspaper.annahar.com/article/121638-????--????-???-??-??????-?????-????-??????-??????-????-??????-?????-????????

http://newspaper.annahar.com/article/121638-????--????-???-??-??????-?????-????- ??????-??????-????-??????-???????-??????????

Link found on:

http://www.galeriejaninerubeiz.com/newsite/news
