Valid content-type for XML, HTML and XHTML documents

Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/2965587/

Tags: html, xml, http, xhtml, web-standards

Asked by astropanic

What are the correct content-types for XML, HTML and XHTML documents?

I need to write a simple crawler that only fetches these kinds of files.

Nowadays http://example.net/index.html can serve, for example, a JPEG file due to mod_rewrite, so I need to check the content-type in the response header and compare it with a list of allowed content-types.

Where can I get such a list from?

Answered by bobince

HTML: text/html, full-stop.

XHTML: application/xhtml+xml, or text/html only if the document follows the HTML compatibility guidelines. See the W3C Media Types Note.

XML: text/xml, application/xml (RFC 2376).

There are also many other media types based around XML, for example application/rss+xml or image/svg+xml. It's a safe bet that any unrecognised but registered type ending in +xml is XML-based. See the IANA list for registered media types ending in +xml.

(For unregistered x- types, all bets are off, but you'd hope +xml would be respected.)
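For the crawler in the question, the check described in this answer could be sketched roughly as follows. This is only a minimal illustration, not part of the original answer: the names ALLOWED_TYPES, media_type and is_markup are hypothetical, and it assumes the raw Content-Type header value is already available as a string from whatever HTTP client is used.

```python
# Exact media types named in the answer (hypothetical constant name).
ALLOWED_TYPES = {
    "text/html",
    "application/xhtml+xml",
    "text/xml",
    "application/xml",
}


def media_type(content_type_header):
    """Return the bare, lowercased media type from a Content-Type value."""
    # Drop parameters such as "; charset=utf-8" and surrounding whitespace.
    return content_type_header.split(";", 1)[0].strip().lower()


def is_markup(content_type_header):
    """True for HTML, XHTML, XML, or any media type ending in +xml."""
    if not content_type_header:
        # Missing or empty header: nothing to match against.
        return False
    mt = media_type(content_type_header)
    # The +xml suffix rule follows the answer's "safe bet" heuristic.
    return mt in ALLOWED_TYPES or mt.endswith("+xml")
```

Note that the +xml suffix rule also accepts types such as image/svg+xml. If the crawler should skip those, drop the endswith check and list the wanted types explicitly.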