Why is <br> an HTML element rather than an HTML entity?

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/3488198/

Time: 2020-08-29 04:08:07  Source: igfitidea

Why is <br> an HTML element rather than an HTML entity?

html

Asked by Wabbitseason

Why indeed? Wouldn't something like &br; be more appropriate?

Accepted answer by Jon Hanna

An HTML entity reference is, depending on the HTML version, either an SGML entity or an XML entity (HTML inherits entities from the underlying technology). Entities are a way of inserting chunks of content defined elsewhere into the document.

All HTML entities are single-character entities, and are hence basically the same as character references (technically they are different to character references, but as there are no multi-character entities defined, the distinction has no impact on HTML).

When an HTML processor sees, for example, &mdash; it replaces that entity reference with the appropriate entity, based on the section in the DTD that says:

<!ENTITY mdash   CDATA "&#8212;" -- em dash, U+2014 ISOpub -->

So it replaces the entity reference with the entity &#8212; which is in turn a character reference that gets replaced by the character (U+2014). In reality, unless you are doing this with a general-purpose XML or SGML processor that doesn't understand HTML directly, this will be done in one step.
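
As a rough illustration of that resolution step, the sketch below uses Python's standard html module (a choice made for this example; the answer itself names no tool) to show that the entity reference and the character reference end up as the same single character.

```python
import html

# The entity reference and the numeric character reference
# both resolve to the single character U+2014 (em dash).
from_entity = html.unescape("&mdash;")
from_charref = html.unescape("&#8212;")

print(from_entity == from_charref == "\u2014")
print(len(from_entity))
```

Note that html.unescape does the whole thing in one step, much like the HTML-aware processors described above.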

Now, what would we replace your hypothetical &br; with to cause a line-break to happen? We can't do so with a newline character, or even the lesser-known U+2028 LINE SEPARATOR (which semantically in plain text has the same meaning as <br/> in HTML), because they are whitespace characters, which are not significant in most HTML code. You should be grateful for that, as writing HTML would be much harder if we couldn't format for readability within the source code.

What we need is not an entity, but a way to indicate semantically that the rendered content contains a line-break at this point. We also need to not indicate anything else (we can already indicate a line-break by beginning or ending a block element, but that's not what we want). The only reasonable way to do so is to have an element that means exactly that, and so we have the <br/> element, with its related tag being put into the source code.
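
To make the distinction concrete, here is a minimal sketch using Python's standard html.parser module. The EventLogger class and its event labels are hypothetical names made up for this illustration. It records what the parser reports for an entity reference versus a <br> tag: the former arrives as ordinary text, the latter as an element.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Records what the parser reports for each piece of the input."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called for tags such as <br>: this is document structure.
        self.events.append(("element", tag))

    def handle_data(self, data):
        # Called for text; entity references are already resolved here.
        self.events.append(("text", data))


logger = EventLogger()
logger.feed("one&mdash;two<br>three")
logger.close()
print(logger.events)
```

The em dash shows up inside a text event, while br shows up as its own element event, which is the role the answer argues a line break needs.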

Answer by Oded

A tag and a character entity reference exist for different reasons - character entities are stand-ins for certain characters (sometimes required as escape sequences - for example &amp; for an ampersand &), tags are there for structure.

The reason the <br> tag exists is that HTML collapses whitespace. There needs to be a way to specify a hard line break - a place that has to have a line break. This is the function of the <br> tag.

There is no single character that has this meaning, though U+2028 LINE SEPARATOR has a similar meaning, and even if it were to be used it would not help, as it is considered to be whitespace and HTML would collapse it.

See the answers from @John Kugelman and @Jon Hanna for more detail on this aspect.

Not entirely related, there is another reason why a &br; character entity reference does not exist: a line break is defined in such a way that it could have more than one character, see the HTML 4 spec:

A line break is defined to be a carriage return (&#x000D;), a line feed (&#x000A;), or a carriage return/line feed pair.

Character entities are single-character escapes, so they cannot represent this. Again, from the HTML 4 spec:

A character entity reference is an SGML construct that references a character of the document character set.

You will see that all the defined character entities map to a single character. A line break/new line cannot be cleanly mapped this way, thus an entity is required instead of a character entity reference.
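
One way to check the single-character claim, sketched here with Python's html.entities module (chosen for this example, not something the answer refers to): its name2codepoint table holds the HTML 4 entity definitions, and each name maps to exactly one code point.

```python
from html.entities import name2codepoint

# name2codepoint maps each HTML 4 entity name to one integer code point.
all_single = all(isinstance(cp, int) for cp in name2codepoint.values())

# There is no entity named "br" in that table.
has_br = "br" in name2codepoint

print(all_single, has_br)
print(chr(name2codepoint["amp"]))
```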

This is why a line break cannot be represented by a character entity reference.

Regardless, it is not needed, as simply using the Enter key inserts a line break.

Answer by John Kugelman

Entities are stand-ins for other characters or bits of text. In HTML they are used to represent characters that are hard to type (e.g. &mdash; for "—") or for characters that need to be escaped (&amp; for "&"). What would a hypothetical &br; entity stand for?

It couldn't be \r or \n or \r\n as these are already easy enough to type (just press Enter). The issue you're trying to work around is that HTML collapses whitespace in most contexts and treats newlines as spaces. That is, \n is not a line break character, it is just whitespace like tabs and spaces.

An entity &br; would have to be replaced by some other text. What character do you use to represent the concept of "hard line break"? The standard line break character \n is exactly the right character, but unfortunately it's unsuitable since it's thrown in the generic "whitespace" bucket. You'd have to either overload some other control character to represent "hard line break", or use some extended Unicode character. When HTML was designed, Unicode was only a nascent, still-developing standard, so that wasn't an option.

A <br> element was the simple, straightforward way to add the concept of "hard line break" to a document since no character could represent that concept.

Answer by Gumbo

In HTML all line breaks are treated as white space:

A line break is defined to be a carriage return (&#x000D;), a line feed (&#x000A;), or a carriage return/line feed pair. All line breaks constitute white space.

And white space only separates words, and sequences of white space are collapsed:

For all HTML elements except PRE, sequences of white space separate "words" (we use the term "word" here to mean "sequences of non-white space characters"). […]

[…]

Note that a sequence of white spaces between words in the source document may result in an entirely different rendered inter-word spacing (except in the case of the PRE element). In particular, user agents should collapse input white space sequences when producing output inter-word space. […]
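
The collapsing behaviour quoted above can be approximated with a short sketch. This is a deliberate simplification and not how a browser works: the function names and the regular-expression matcher for <br> are hypothetical, and only ASCII white space is handled.

```python
import re

# Runs of ASCII white space, including CR and LF line breaks.
_WHITE_SPACE = re.compile(r"[ \t\f\r\n]+")
# Hypothetical, simplified matcher for a <br> tag (not a real HTML parser).
_BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def collapse_white_space(text):
    """Replace each run of white space with a single space."""
    return _WHITE_SPACE.sub(" ", text).strip()


def rendered_lines(source):
    """Split on <br>, then collapse white space within each line."""
    return [collapse_white_space(part) for part in _BR_TAG.split(source)]


print(rendered_lines("Hello\n\n   big\tworld!"))
print(rendered_lines("Hello<BR>\n   world!"))
```

In the first call the newlines in the source disappear into a single inter-word space; only the <BR> in the second call produces a second line.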

This means that line breaks cannot be expressed by plain characters. And although there are certain special characters in Unicode to unambiguously separate lines and paragraphs, they are not specified to do this in HTML too:

Note that although &#x2028; and &#x2029; are defined in [ISO10646] to unambiguously separate lines and paragraphs, respectively, these do not constitute line breaks in HTML […]

That means there is no plain character or sequence of plain characters that marks a line break in HTML. And that's why there is the BR element.

Now if you want to use &br; instead of <br>, you just need to declare the entity br to represent the value <br>:

<!ENTITY br "<br>">

Having this additional entity named br declared, a general-purpose XML or SGML processor will replace every occurrence of the entity reference &br; with the value it represents (<br>). An example document:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd" [
   <!ENTITY br "<br>">
]>
<HTML>
   <HEAD>
      <TITLE>My first HTML document</TITLE>
   </HEAD>
   <BODY>
      <P>Hello &br;world!
   </BODY>
</HTML>
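
A way to try this out with a general-purpose XML processor is sketched below using Python's xml.etree.ElementTree. This is an assumption-laden variant rather than the SGML document above: the entity value is written as <br/> because XML requires well-formed replacement text, and the document is reduced to a single hypothetical p element with an internal DTD subset.

```python
import xml.etree.ElementTree as ET

# XML variant of the example; the entity is declared in the internal subset.
DOCUMENT = """<!DOCTYPE p [
   <!ENTITY br "<br/>">
]>
<p>Hello &br;world!</p>"""

root = ET.fromstring(DOCUMENT)

# After parsing, the entity reference has been replaced by a real br element.
children = [child.tag for child in root]
print(children, repr(root.text), repr(root[0].tail))
```

The parser knows nothing about HTML; it simply substitutes the declared replacement text, and the result is an ordinary br element between the two pieces of text.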

Answer by Dan Diplo

HTML is a mark-up language - it represents the structure of a document, not how that document should appear visually. Take the <EM> tag as an example - it tells user-agents that they should give emphasis to any text that is placed between the opening and closing <EM> tags. However, it does not state how that emphasis should be represented. Yes, most visual web-browsers will place the text in italics, but this is only convention. Other browsers, such as monochrome text-only browsers, may display the text in inverse. A screen reader might read the text in a louder voice, or change the pronunciation. A search-engine spider might decide the text is more important than other elements.

The same goes for the <BR> tag - it isn't just another character entity, it actually represents a break in the document structure. A <BR> is not just a replacement for a newline character, but is a "semantic" part of the document and how it is structured. This is similar to the way an <H1> is not just a way of making text bigger and bolder, but is an integral part of the way the document is structured.

Answer by Nicolas78

Entities are content, tags are structure or layout (very roughly speaking). It seems whoever made <br> a tag decided that breaking a line has more to do with structure and layout than with content. Not being able to actually "see" a <br>, I'd tend to agree. Oh, and I'm making this up as I go, so feel free to disagree ;)

Answer by Gregory Baker

br elements can be styled, though. How would you style an HTML entity? Being elements makes them more flexible.

Answer by Borealid

Yes. An HTML entity would be more appropriate, as a break tag cannot contain text and behaves much like a newline.

That's just not the way things are, though. Too late. I can't tell you the number of non-XML-compatible HTML documents I've had to deal with because of unclosed break tags...
