What does HTML parsing mean?

Question

Asked by LightningBolt?

I have heard of HTML parser libraries like Simple HTML DOM and HTML Parser. I have also seen questions that mention HTML parsing. What does it mean to parse HTML?

Answer 1

Answer by Anshu Dwibhashi

Unlike what Spudley said, parsing basically means resolving (a sentence) into its component parts and describing their syntactic roles.

According to Wikipedia, parsing, or syntactic analysis, is the process of analysing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. The term parsing comes from the Latin pars (orationis), meaning part (of speech).

In your case, HTML parsing basically means taking in HTML code and extracting relevant information from it, such as the page title, paragraphs, headings, links, bold text, etc.
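As a rough illustration of that kind of extraction, here is a minimal sketch using Python 3's standard-library html.parser module. The class name TitleAndLinkParser and the sample document are made up for this example, and the sketch only collects the <title> text and the href of each <a> tag; it assumes reasonably well-formed input.

```python
from html.parser import HTMLParser


class TitleAndLinkParser(HTMLParser):
    """Hypothetical example parser: collects the page title and every <a href>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False  # True while we are between <title> and </title>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears inside <title>
        if self._in_title:
            self.title += data


html_doc = """<html><head><title>My page</title></head>
<body><h1>Hello</h1>
<p>See <a href="https://example.com/a">A</a> and <a href="/b">B</a>.</p>
</body></html>"""

parser = TitleAndLinkParser()
parser.feed(html_doc)
parser.close()
print(parser.title)  # My page
print(parser.links)  # ['https://example.com/a', '/b']
```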

Parsers:

A computer program that parses content is called a parser. In general, there are two kinds of parsers:

Top-down parsing: Top-down parsing can be viewed as an attempt to find left-most derivations of an input stream by searching for parse trees using a top-down expansion of the given formal grammar rules. Tokens are consumed from left to right. Inclusive choice is used to accommodate ambiguity by expanding all alternative right-hand sides of grammar rules.
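To make top-down parsing concrete, here is a minimal recursive descent parser, one common kind of top-down parser, for a hypothetical toy arithmetic grammar (not HTML). Each grammar rule becomes a method that consumes tokens from left to right and builds a parse tree of nested tuples. Note that this sketch is deterministic: it chooses one alternative by peeking at the next token instead of expanding all alternatives, so it does not show the inclusive-choice handling of ambiguity mentioned above.

```python
import re

# Hypothetical toy grammar, for illustration only:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator tokens, left to right."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the next token without consuming it (None at end of input)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        """Consume the next token, optionally checking that it is `expected`."""
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.eat()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.eat()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.eat("(")
            node = self.expr()
            self.eat(")")
            return node
        if tok is not None and tok.isdigit():
            return int(self.eat())
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    p = Parser(tokenize(text))
    tree = p.expr()  # start from the grammar's start symbol
    if p.peek() is not None:
        raise SyntaxError(f"trailing token {p.peek()!r}")
    return tree


print(parse("1 + 2 * (3 - 4)"))  # ('+', 1, ('*', 2, ('-', 3, 4)))
```

The grammar uses repetition loops rather than left-recursive rules such as expr := expr '+' term, because a naive recursive descent parser would recurse forever on left recursion.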

Bottom-up parsing: A parser can start with the input and attempt to rewrite it to the start symbol. Intuitively, the parser attempts to locate the most basic elements, then the elements containing these, and so on. LR parsers are examples of bottom-up parsers. Another term used for this type of parser is shift-reduce parsing.
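The shift-reduce idea can be sketched for a hypothetical toy grammar of sums and parentheses. This is a simplification: a real LR parser decides between shifting and reducing with a table generated from the grammar, whereas this sketch simply reduces greedily, trying rules with longer right-hand sides first, which happens to work for this particular grammar only. The returned trace shows the input being rewritten step by step up to the start symbol E.

```python
# Hypothetical toy grammar, used only to illustrate shift-reduce parsing.
# Rules are listed with longer right-hand sides first, because the greedy
# reduce step below tries them in this order.
RULES = [
    ("E", ("E", "+", "T")),  # E -> E + T
    ("T", ("(", "E", ")")),  # T -> ( E )
    ("E", ("T",)),           # E -> T
    ("T", ("n",)),           # T -> n   (n stands for any number token)
]


def shift_reduce_parse(tokens):
    """Parse tokens bottom-up and return the trace of shift/reduce actions."""
    stack, actions = [], []
    remaining = list(tokens)
    while True:
        # Reduce while the top of the stack matches some rule's right-hand side.
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in RULES:
                if tuple(stack[-len(rhs):]) == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
                    reduced = True
                    break
        if not remaining:
            break
        # Shift: move the next input token onto the stack.
        tok = remaining.pop(0)
        stack.append(tok)
        actions.append(f"shift {tok}")
    # Success means the whole input was rewritten to the start symbol E.
    if stack != ["E"]:
        raise SyntaxError(f"could not reduce input to E; stack = {stack}")
    return actions


for step in shift_reduce_parse(["n", "+", "(", "n", "+", "n", ")"]):
    print(step)
```

For this input the trace ends with "reduce T -> ( E )" followed by "reduce E -> E + T", leaving only E on the stack.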

A few example parsers:

Top-down parsers:

Bottom-up parsers:

Precedence parser
- Operator-precedence parser
- Simple precedence parser
BC (bounded context) parsing
LR parser (Left-to-right, Rightmost derivation)
- Simple LR (SLR) parser
- LALR parser
- Canonical LR (LR(1)) parser
- GLR parser
CYK parser
Recursive ascent parser
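For the operator-precedence entry in the list above, Dijkstra's shunting-yard algorithm is a common way to implement an operator-precedence parser. The sketch below converts infix tokens to Reverse Polish Notation; the operator table (PRECEDENCE, RIGHT_ASSOC) is an assumption made for this example, and the input is expected to be already split into tokens.

```python
# Operator table for a hypothetical toy expression language.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOC = {"^"}


def to_rpn(tokens):
    """Convert infix tokens to Reverse Polish Notation (shunting-yard)."""
    output, ops = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators that bind more tightly, or equally tightly when
            # the incoming operator is left-associative.
            while ops and ops[-1] in PRECEDENCE:
                top = ops[-1]
                if PRECEDENCE[top] > PRECEDENCE[tok] or (
                    PRECEDENCE[top] == PRECEDENCE[tok] and tok not in RIGHT_ASSOC
                ):
                    output.append(ops.pop())
                else:
                    break
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            # Pop until the matching "(".
            while ops and ops[-1] != "(":
                output.append(ops.pop())
            if not ops:
                raise SyntaxError("mismatched ')'")
            ops.pop()  # discard the "("
        else:
            output.append(tok)  # operand
    while ops:
        op = ops.pop()
        if op == "(":
            raise SyntaxError("mismatched '('")
        output.append(op)
    return output


print(" ".join(to_rpn("3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3".split())))
# 3 4 2 * 1 5 - 2 3 ^ ^ / +
```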

Example parser:

Here's an example HTML parser in Python:

from html.parser import HTMLParser  # Python 3 (in Python 2 this module was named HTMLParser)

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    # called for each opening tag; attrs is a list of (name, value) pairs
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)
    # called for each closing tag
    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)
    # called for the text between tags
    def handle_data(self, data):
        print("Encountered some data  :", data)

# instantiate the parser and feed it some HTML
parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')

Here's the output:

Encountered a start tag: html
Encountered a start tag: head
Encountered a start tag: title
Encountered some data  : Test
Encountered an end tag : title
Encountered an end tag : head
Encountered a start tag: body
Encountered a start tag: h1
Encountered some data  : Parse me!
Encountered an end tag : h1
Encountered an end tag : body
Encountered an end tag : html

References

Answer 2

Answer by Spudley

Parsing in general applies to any computer language, and is the process of taking the code as text and producing a structure in memory that the computer can understand and work with.

For HTML specifically, parsing is the process of taking raw HTML code, reading it, and generating a DOM tree object structure from it.
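To illustrate the text-in, tree-out idea, here is a sketch that uses Python's standard html.parser to build a simplified DOM-like tree. Node, TreeBuilder, VOID_ELEMENTS and dump are hypothetical names for this example. It is not the HTML5 tree-construction algorithm that browsers implement (which also repairs misnested and implied tags); it only knows a few void elements and ignores stray end tags.

```python
from html.parser import HTMLParser

# Elements that never have an end tag in HTML (a subset, for illustration).
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []  # child Nodes or text strings


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM-like tree from HTML text."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # open elements; the last one is the current parent

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)  # later content goes inside this element

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore end tags with no match.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text
            self.stack[-1].children.append(data)


def dump(node, depth=0):
    """Print the tree with two spaces of indentation per level."""
    for child in node.children:
        if isinstance(child, str):
            print("  " * depth + repr(child))
        else:
            print("  " * depth + child.tag)
            dump(child, depth + 1)


builder = TreeBuilder()
builder.feed("<html><head><title>Test</title></head>"
             "<body><h1>Parse me!</h1><p>Line one<br>Line two</p></body></html>")
builder.close()
dump(builder.root)
```

In the resulting tree the p element has three children: the text 'Line one', an empty br node, and the text 'Line two'.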
