WebKit Inside: Building the DOM Tree


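When the main process of a client app creates a WKWebView object, two more child processes are created: the rendering process and the network process. When the WKWebView in the main process issues a request, the request is first forwarded to the rendering process, which forwards it to the network process, which in turn contacts the server. If the request is for a web page, the network process hands the server's response, the HTML file as a byte stream, to the rendering process. The rendering process first parses that stream into a DOM tree and then renders on top of the DOM tree, that is, performs layout and painting. Finally the rendering process passes the rendering data back to the WKWebView in the main process, which creates the corresponding views to display the page. The whole flow is shown in the figure below: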

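What Is the DOM Tree

Once the rendering process has the HTML byte stream, it converts it into a DOM tree. In the figure below, the left side is an HTML file and the right side is the DOM tree built from it.

The root of the DOM tree is HTMLDocument, which represents the whole document. The child nodes under the root correspond one-to-one to the tags in the HTML file; for example, the <head> tag in the HTML corresponds to the head node in the DOM tree. Text in the HTML file also becomes a node in the DOM tree: the text 'Hello, World!' becomes a child of the div node.

Every node in the DOM tree is an object with methods and properties, created from a corresponding class. The HTMLDocument node, for example, is an instance of class HTMLDocument. Part of the HTMLDocument source: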

class HTMLDocument : public Document { // inherits from Document
    ...
    WEBCORE_EXPORT int width();
    WEBCORE_EXPORT int height();
    ...
};

As the source shows, HTMLDocument inherits from class Document. Part of the Document class:

class Document
    : public ContainerNode  // Document inherits from ContainerNode, which inherits from Node
    , public TreeScope
    , public ScriptExecutionContext
    , public FontSelectorClient
    , public FrameDestructionObserver
    , public Supplementable<Document>
    , public Logger::Observer
    , public CanvasObserver {
    WEBCORE_EXPORT ExceptionOr<Ref<Element>> createElementForBindings(const AtomString& tagName);  // creates an Element
    WEBCORE_EXPORT Ref<Text> createTextNode(const String& data); // creates a text node
    WEBCORE_EXPORT Ref<Comment> createComment(const String& data); // creates a comment node
    WEBCORE_EXPORT Ref<Element> createElement(const QualifiedName&, bool createdByParser); // creates an Element
    ...
};

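The source above shows that Document inherits from Node (through ContainerNode), and it exposes methods that front-end developers know well, such as createElement and createTextNode. JavaScript calls to these methods end up as calls to the corresponding C++ methods.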

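Document has these methods for a reason: they are required by a standard published by the W3C, the DOM (Document Object Model). The DOM defines the interfaces and properties that every node in the DOM tree must implement. Below is part of the IDL (Interface Definition Language, a language for describing interfaces independently of any platform or programming language) for HTMLDocument, Document and HTMLDivElement; the complete IDL is available from the W3C.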

interface HTMLDocument : Document {   // HTMLDocument
    getter (WindowProxy or Element or HTMLCollection) (DOMString name);
};

interface Document : Node { // Document
    [NewObject, ImplementedAs=createElementForBindings] Element createElement(DOMString localName); // createElement
    [NewObject] Text createTextNode(DOMString data); // createTextNode
    ...
};

interface HTMLDivElement : HTMLElement { // HTMLDivElement
    [CEReactions=NotNeeded, Reflect] attribute DOMString align;
};

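Every node in the DOM tree inherits from class Node, and Node has a subclass Element. Some nodes derive from Node without going through Element, such as text nodes, while others derive from Element, such as the div node. For the DOM tree in the figure above, the following two JavaScript statements therefore return different results: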

document.childNodes; // the child Nodes: returns the DocumentType and HTML nodes, both of which derive from Node
document.children;   // the child Elements: returns only the HTML node, because DocumentType does not derive from Element
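
The difference between the two collections can be sketched in C++. This is only an illustration under simplifying assumptions, not WebKit's implementation: the types `Node`, `Element` and `DocumentType` below are hypothetical stand-ins for the real classes, and `childNodes`, `children` and `makeSampleDocument` are helper names invented for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the DOM classes discussed above.
struct Node {
    explicit Node(std::string name) : name(std::move(name)) {}
    virtual ~Node() = default;
    virtual bool isElement() const { return false; }
    std::string name;
    std::vector<std::unique_ptr<Node>> kids;
};

struct DocumentType : Node { // derives from Node, not from Element
    DocumentType() : Node("html") {}
};

struct Element : Node {
    using Node::Node;
    bool isElement() const override { return true; }
};

// Analogue of document.childNodes: every child, whatever its type.
std::vector<const Node*> childNodes(const Node& parent) {
    std::vector<const Node*> result;
    for (const auto& kid : parent.kids)
        result.push_back(kid.get());
    return result;
}

// Analogue of document.children: only the children that are Elements.
std::vector<const Element*> children(const Node& parent) {
    std::vector<const Element*> result;
    for (const auto& kid : parent.kids)
        if (kid->isElement())
            result.push_back(static_cast<const Element*>(kid.get()));
    return result;
}

// Builds a document-like root with a DocumentType child and an HTML element child.
std::unique_ptr<Node> makeSampleDocument() {
    auto document = std::make_unique<Node>("#document");
    document->kids.push_back(std::make_unique<DocumentType>());
    document->kids.push_back(std::make_unique<Element>("HTML"));
    return document;
}
```

For the sample document, `childNodes` yields two entries and `children` yields one, mirroring the two JavaScript statements.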

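The figure below shows the inheritance relationships of some node classes:

Building the DOM Tree

Building the DOM tree can be divided into four steps: decoding, tokenization, node creation, and node insertion.

1 Decoding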

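What the rendering process receives from the network process is an HTML byte stream, while the next step, tokenization, works on characters. Because of the various encodings in use, such as ISO-8859-1 and UTF-8, one character may be encoded as one or several bytes. The purpose of decoding is to turn the HTML byte stream into an HTML character stream, in other words to turn the raw HTML bytes into a string.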
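
To make the byte-to-character step concrete, here is a small C++ sketch of two decoders. It is not WebKit code: `decodeLatin1` and `decodeUtf8` are hypothetical names, the UTF-8 decoder is deliberately simplified (malformed input becomes U+FFFD, and overlong forms and surrogates are not rejected), and the result is returned as a sequence of Unicode code points.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Latin-1: every byte maps directly to the code point with the same value.
std::u32string decodeLatin1(const std::vector<std::uint8_t>& bytes) {
    std::u32string out;
    for (std::uint8_t b : bytes)
        out.push_back(static_cast<char32_t>(b));
    return out;
}

// UTF-8: one code point occupies 1 to 4 bytes.
std::u32string decodeUtf8(const std::vector<std::uint8_t>& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        std::uint8_t lead = bytes[i++];
        int extra = 0;     // number of continuation bytes expected
        char32_t cp = 0;   // code point being assembled
        if (lead < 0x80) { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else { out.push_back(char32_t(0xFFFD)); continue; }

        bool ok = true;
        for (int k = 0; k < extra; ++k) {
            // Each continuation byte must look like 10xxxxxx.
            if (i >= bytes.size() || (bytes[i] & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (bytes[i++] & 0x3F);
        }
        out.push_back(ok ? cp : char32_t(0xFFFD));
    }
    return out;
}
```

The bytes `E4 B8 AD` decode to the single character U+4E2D under UTF-8 but to three unrelated characters under Latin-1, which is why choosing the right decoder matters.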

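2 Decoding Class Diagram

In the class diagram, class HTMLDocumentParser sits at the center of decoding: it invokes the decoder to turn the HTML byte stream into a character stream, which is stored in class HTMLInputStream.

3 Decoding Flow

The key point of the whole decoding flow is finding the right encoding. Only when the right encoding is known can the matching decoder be used. Decoding happens in the source below; this method is called from the third stack frame in the figure above: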

 1 // HTMLDocumentParser is a subclass of DecodedDataDocumentParser
 2 void DecodedDataDocumentParser::appendBytes(DocumentWriter& writer, const uint8_t* data, size_t length)
 3 {
 4     if (!length)
 5         return;
 6 
 7     String decoded = writer.decoder().decode(data, length); // the actual decoding happens here
 8     if (decoded.isEmpty())
 9         return;
10 
11     writer.reportDataReceived();
12     append(decoded.releaseImpl());
13 }

On line 7 of appendBytes above, writer.decoder() returns a TextResourceDecoder object, and the decoding is done by TextResourceDecoder::decode. Let's walk through the source of TextResourceDecoder::decode:

 1 // Only the most important parts are kept
 2 String TextResourceDecoder::decode(const char* data, size_t length)
 3 {
 4     ...
 5 
 6     // For an HTML file, look for the charset in the head tag
 7     if ((m_contentType == HTML || m_contentType == XML) && !m_checkedForHeadCharset) // HTML and XML
 8         if (!checkForHeadCharset(data, length, movedDataToBuffer))
 9             return emptyString();
10 
11     ...
12 
13     // m_encoding holds the encoding name found in the HTML file
14     if (!m_codec)
15         m_codec = newTextCodec(m_encoding);  // create the concrete codec
16 
17     ...
18 
19     // Decode and return
20     String result = m_codec->decode(m_buffer.data() + lengthOfBOM, m_buffer.size() - lengthOfBOM, false, m_contentType == XML && !m_useLenientXMLDecoding, m_sawError);
21     m_buffer.clear(); // clear the stored raw, undecoded HTML bytes
22     return result;
23 }

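The source shows that TextResourceDecoder first looks for the encoding in the HTML <head> tag, because <head> can contain <meta> tags, and a <meta> tag can declare the character set of the HTML file: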

<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>DOM Tree</title>
    <script>window.name = 'Lucy';</script>
</head>

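If a charset is found, TextResourceDecoder stores it in the member variable m_encoding, creates the actual decoder for that encoding in the member variable m_codec, and finally uses m_codec to decode the byte stream and return the decoded string. If no <meta> tag with a charset is found, m_encoding keeps its default value, windows-1252 (which browsers treat as equivalent to ISO-8859-1).

Next, let's look at how TextResourceDecoder searches the <meta> tags for a charset, that is, the call to checkForHeadCharset on line 8 of TextResourceDecoder::decode above: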

 1 // Only the key code is kept
 2 bool TextResourceDecoder::checkForHeadCharset(const char* data, size_t len, bool& movedDataToBuffer)
 3 {
 4     ...
 5 
 6     // This is not completely efficient, since the function might go
 7     // through the HTML head several times.
 8 
 9     size_t oldSize = m_buffer.size();
10     m_buffer.grow(oldSize + len);
11     memcpy(m_buffer.data() + oldSize, data, len); // copy the bytes into the decoder's own buffer, m_buffer
12 
13     movedDataToBuffer = true;
14 
15     // Continue with checking for an HTML meta tag if we were already doing so.
16     if (m_charsetParser)
17         return checkForMetaCharset(data, len);  // a meta tag parser already exists, so start parsing right away
18 
19     ....
20 
21     m_charsetParser = makeUnique<HTMLMetaCharsetParser>(); // create the meta tag parser
22     return checkForMetaCharset(data, len);
23 }

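On line 11 of checkForHeadCharset above, TextResourceDecoder copies the HTML bytes that still need decoding into its own buffer. This step is important and is revisited later. First look at lines 17, 21 and 22: these three lines use the <meta> tag parser to find the charset, creating the parser lazily. Here is the implementation of checkForMetaCharset: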

 1 bool TextResourceDecoder::checkForMetaCharset(const char* data, size_t length)
 2 {
 3     if (!m_charsetParser->checkForMetaCharset(data, length))  // look for a charset in the meta tags
 4         return false;
 5 
 6     setEncoding(m_charsetParser->encoding(), EncodingFromMetaTag); // once found, record the encoding name
 7     m_charsetParser = nullptr;
 8     m_checkedForHeadCharset = true;
 9     return true;
10 }

Line 3 of the source above shows that all of the <meta> tag parsing is done in HTMLMetaCharsetParser::checkForMetaCharset.

<pre> 1 // Only the key code is kept
 2 bool HTMLMetaCharsetParser::checkForMetaCharset(const char* data, size_t length)
 3 {
 4     if (m_doneChecking) // flag that avoids parsing again
 5         return true;
 6 
 7 
 8     // We still don't have an encoding, and are in the head.
 9     // The following tags are allowed in <head>:
10     // SCRIPT|STYLE|META|LINK|OBJECT|TITLE|BASE
11     //
12     // We stop scanning when a tag that is not permitted in <head>
13     // is seen, rather when </head> is seen, because that more closely
14     // matches behavior in other browsers; more details in
15     // <http://bugs.webkit.org/show_bug.cgi?id=3590>.
16     //
17     // Additionally, we ignore things that looks like tags in <title>, <script>
18     // and <noscript>; see <http://bugs.webkit.org/show_bug.cgi?id=4560>,
19     // <http://bugs.webkit.org/show_bug.cgi?id=12165> and
20     // <http://bugs.webkit.org/show_bug.cgi?id=12389>.
21     //
22     // Since many sites have charset declarations after <body> or other tags
23     // that are disallowed in <head>, we don't bail out until we've checked at
24     // least bytesToCheckUnconditionally bytes of input.
25 
26     constexpr int bytesToCheckUnconditionally = 1024;  // if no <meta> tag with a charset is found within 1024 characters, checking also counts as finished; no charset was detected, so the default encoding windows-1252 is used
27 
28     bool ignoredSawErrorFlag;
29     m_input.append(m_codec->decode(data, length, false, false, ignoredSawErrorFlag)); // decode the bytes
30 
31     while (auto token = m_tokenizer.nextToken(m_input)) { // m_tokenizer tokenizes the input; finding the meta tag also requires tokenization, which is covered later
32         bool isEnd = token->type() == HTMLToken::EndTag;
33         if (isEnd || token->type() == HTMLToken::StartTag) {
34             AtomString tagName(token->name());
35             if (!isEnd) {
36                 m_tokenizer.updateStateFor(tagName);
37                 if (tagName == metaTag && processMeta(*token)) { // a meta tag was found: process it
38                     m_doneChecking = true;
39                     return true; // a meta tag with an encoding was found, so return immediately
40                 }
41             }
42 
43             if (tagName != scriptTag && tagName != noscriptTag
44                 && tagName != styleTag && tagName != linkTag
45                 && tagName != metaTag && tagName != objectTag
46                 && tagName != titleTag && tagName != baseTag
47                 && (isEnd || tagName != htmlTag)
48                 && (isEnd || tagName != headTag)) {
49                 m_inHeadSection = false;
50             }
51         }
52 
53         if (!m_inHeadSection && m_input.numberOfCharactersConsumed() >= bytesToCheckUnconditionally) { // once tokenization has left the head section and at least 1024 characters have been consumed, checking is also considered done
54             m_doneChecking = true;
55             return true;
56         }
57     }
58 
59     return false;
60 }</pre>

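<p><span style="font-family: "courier new", courier; font-size: 16px">On line 29 of HTMLMetaCharsetParser::checkForMetaCharset, note that HTMLMetaCharsetParser has its own decoder, m_codec. It is created when the HTMLMetaCharsetParser object is constructed, and its concrete type is TextCodecLatin1 (Latin-1 is ISO-8859-1, treated here as equivalent to windows-1252). TextCodecLatin1 can be used directly because a correctly written <meta> tag consists only of ASCII characters, which TextCodecLatin1 decodes correctly. This avoids a chicken-and-egg problem: finding the <meta> tag requires decoding the byte stream, but decoding requires having found the <meta> tag.</span></p>
<p><span style="font-family: "courier new", courier; font-size: 16px">Line 37 calls processMeta on the <meta> tag that was found. The function is simple: it collects the attributes of the <meta> tag and checks whether any of them specifies a charset.</span></p>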

<pre>bool HTMLMetaCharsetParser::processMeta(HTMLToken& token)
{
    AttributeList attributes;
    for (auto& attribute : token.attributes()) { // collect the attributes of the meta tag
        String attributeName = StringImpl::create8BitIfPossible(attribute.name);
        String attributeValue = StringImpl::create8BitIfPossible(attribute.value);
        attributes.append(std::make_pair(attributeName, attributeValue));
    }

    m_encoding = encodingFromMetaAttributes(attributes); // look for the charset declaration among the attributes
    return m_encoding.isValid();
}</pre>
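
<p>The body of encodingFromMetaAttributes is not shown in this article. As a rough idea of what such a lookup involves, the following C++ sketch handles the two common forms, <meta charset="..."> and <meta http-equiv="Content-Type" content="...; charset=...">. It is an assumption-laden illustration, not WebKit's code: charsetFromMetaAttributes, charsetFromContent and toLower are hypothetical names, std::string replaces WebKit's String, and quoting and whitespace around '=' are not handled.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using AttributeList = std::vector<std::pair<std::string, std::string>>;

// ASCII-lowercases a copy of the string.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Extracts the value after "charset=" from a content attribute such as
// "text/html; charset=utf-8". Returns an empty string if there is none.
std::string charsetFromContent(const std::string& content) {
    const std::string lowered = toLower(content);
    const std::string key = "charset=";
    std::size_t pos = lowered.find(key);
    if (pos == std::string::npos)
        return "";
    pos += key.size();
    std::size_t end = pos;
    while (end < lowered.size() && lowered[end] != ';' && lowered[end] != ' ')
        ++end;
    return lowered.substr(pos, end - pos);
}

// Hypothetical counterpart of encodingFromMetaAttributes: returns the declared
// charset name in lower case, or an empty string if the tag declares none.
std::string charsetFromMetaAttributes(const AttributeList& attributes) {
    bool isContentTypePragma = false;
    std::string fromContent;
    for (const auto& attribute : attributes) {
        const std::string name = toLower(attribute.first);
        if (name == "charset")
            return toLower(attribute.second); // <meta charset="...">
        if (name == "http-equiv" && toLower(attribute.second) == "content-type")
            isContentTypePragma = true;
        else if (name == "content")
            fromContent = charsetFromContent(attribute.second);
    }
    // A charset inside "content" only counts for a Content-Type pragma.
    return isContentTypePragma ? fromContent : "";
}
```

<p>A <meta name="viewport" content="..."> tag therefore yields no charset, while the Content-Type pragma from the earlier <head> example yields utf-8.</p>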

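<p><span style="font-family: "courier new", courier; font-size: 16px">The earlier discussion of TextResourceDecoder::checkForHeadCharset noted that line 11, where TextResourceDecoder stores the HTML bytes, is important. The reason is that the HTML byte stream may contain no <meta> tag that sets a charset at all. In that case TextResourceDecoder::checkForHeadCharset returns false, which makes TextResourceDecoder::decode return an empty string, so nothing is decoded. Is the data then lost? No. While the HTML byte stream is being received and no <meta> tag with a charset attribute has been found, nothing is decoded, but all of the received bytes are kept in TextResourceDecoder's member variable m_buffer. When the whole HTML byte stream has been received, the following call stack occurs:</span></p>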
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202201/489427-20220116211139965-1330322670.png" width="1000" height="500" loading="lazy" style="display: block; margin-left: auto; margin-right: auto"></p>
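<p> <span style="font-family: "courier new", courier; font-size: 16px">The call stack shows that once the HTML byte stream has been fully received, TextResourceDecoder::flush is eventually called. This method decodes the HTML bytes stored in TextResourceDecoder's m_buffer. Because no encoding was found while the stream was being received, m_buffer holds all of the bytes still waiting to be decoded, and they are decoded here with the default encoding, windows-1252. This is why a page whose HTML contains Chinese characters but declares no charset ends up garbled. After decoding, the resulting character stream is stored in HTMLDocumentParser.</span></p>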

<pre>void DecodedDataDocumentParser::flush(DocumentWriter& writer)
{
    String remainingData = writer.decoder().flush();
    if (remainingData.isEmpty())
        return;

    writer.reportDataReceived();
    append(remainingData.releaseImpl()); // store the decoded characters in HTMLDocumentParser
}</pre>


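<p><span style="font-family: "courier new", courier; font-size: 16px">4 Decoding Summary</span></p>
<p><span style="font-family: "courier new", courier; font-size: 16px">Decoding falls into two cases. In the first, a <meta> tag with a charset attribute can be parsed from the HTML byte stream, so the encoding is known; each chunk of HTML bytes can then be decoded with that encoding as it arrives, and the decoded characters are appended to HTMLInputStream. In the second, no <meta> tag with a charset attribute can be parsed, so each chunk of HTML bytes is cached in TextResourceDecoder's m_buffer as it arrives, and once the whole HTML byte stream has been received it is decoded with the default encoding, windows-1252.</span></p>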
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202201/489427-20220116224145428-1432872655.jpg" width="1000" height="500" style="display: block; margin-left: auto; margin-right: auto"></p>
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202201/489427-20220116224240920-2012130020.jpg" width="1000" height="500" style="display: block; margin-left: auto; margin-right: auto"></p>
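
<p>The two cases can be modelled with a small C++ class. This is a toy sketch under stated assumptions, not WebKit's TextResourceDecoder: BufferingDecoder and its members are hypothetical, "decoding" is modelled simply as prefixing the buffered bytes with the name of the encoding used, and charset detection is reduced to finding charset=NAME followed by a double quote.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, heavily simplified model of the two decoding cases.
class BufferingDecoder {
public:
    // Called for every chunk of bytes received from the network.
    // Returns the decoded text, or an empty string while no charset is known.
    std::string decode(const std::string& chunk) {
        m_buffer += chunk;
        if (m_encoding.empty())
            m_encoding = findMetaCharset(m_buffer);
        if (m_encoding.empty())
            return "";          // case 2: keep buffering
        return decodeBuffer();  // case 1: decode everything buffered so far
    }

    // Called once when the whole byte stream has been received.
    std::string flush() {
        if (m_buffer.empty())
            return "";
        if (m_encoding.empty())
            m_encoding = "windows-1252"; // default when no charset was found
        return decodeBuffer();
    }

private:
    // Toy charset detection: charset=NAME terminated by a double quote.
    static std::string findMetaCharset(const std::string& bytes) {
        const std::string key = "charset=";
        std::size_t pos = bytes.find(key);
        if (pos == std::string::npos)
            return "";
        pos += key.size();
        std::size_t end = bytes.find('"', pos);
        if (end == std::string::npos)
            return ""; // declaration not complete yet
        return bytes.substr(pos, end - pos);
    }

    // Stand-in for real decoding: label the bytes with the encoding used.
    std::string decodeBuffer() {
        std::string decoded = "[" + m_encoding + "]" + m_buffer;
        m_buffer.clear();
        return decoded;
    }

    std::string m_buffer;   // raw bytes not decoded yet
    std::string m_encoding; // empty until a charset is known
};
```

<p>With a charset declaration, output starts flowing as soon as the declaration has been seen; without one, decode keeps returning an empty string and everything is emitted by flush.</p>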

<p><strong><span style="font-family: "courier new", courier; font-size: 16px">Tokenization</span></strong></p>
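<p><span style="font-family: "courier new", courier; font-size: 16px">After decoding, the received HTML byte stream becomes a character stream stored in HTMLInputStream. Tokenization takes characters from HTMLInputStream one at a time and checks whether each one is a special HTML character such as '<', '/', '>' or '='. Splitting on these special characters yields the HTML tag names and attribute lists. Class HTMLToken stores the result of tokenization.</span></p>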
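
<p>The idea of switching on special characters can be shown with a very small state machine in C++. This is a sketch under heavy simplification, not WebKit's HTMLTokenizer: the Token struct and the tokenize function are hypothetical, and attributes, comments, DOCTYPE and character references are all left out.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified token: the real HTMLToken also stores attributes.
struct Token {
    enum Type { StartTag, EndTag, Character };
    Type type;
    std::string data; // tag name, or the text for Character tokens
};

// Walks the input one character at a time and switches state on '<', '/', '>'.
std::vector<Token> tokenize(const std::string& input) {
    enum State { Data, TagOpen, TagName };
    State state = Data;
    std::vector<Token> tokens;
    Token current{Token::Character, ""};

    for (char c : input) {
        switch (state) {
        case Data:
            if (c == '<') {
                if (!current.data.empty())
                    tokens.push_back(current); // emit the pending text
                current = Token{Token::StartTag, ""};
                state = TagOpen;
            } else
                current.data += c;
            break;
        case TagOpen:
            state = TagName;
            if (c == '/')
                current.type = Token::EndTag; // "</" starts an end tag
            else
                current.data += c;            // first character of the tag name
            break;
        case TagName:
            if (c == '>') {
                tokens.push_back(current);    // the tag is complete
                current = Token{Token::Character, ""};
                state = Data;
            } else
                current.data += c;
            break;
        }
    }
    if (state == Data && !current.data.empty())
        tokens.push_back(current); // trailing text
    return tokens;
}
```

<p>For the input <div>Hello, World!</div> this produces three tokens: a StartTag named div, a Character token holding the text, and an EndTag named div.</p>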
<p><span style="font-family: "courier new", courier; font-size: 16px">1 Tokenization Class Diagram</span></p>
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202201/489427-20220116232436919-1626702446.jpg" width="1000" height="500" style="display: block; margin-left: auto; margin-right: auto"></p>
<p><span style="font-family: "courier new", courier; font-size: 16px">The class diagram shows that the most important classes for tokenization are HTMLTokenizer and HTMLToken. The main parts of class HTMLToken are:</span></p>

<pre>// Only the main parts are kept
class HTMLToken {
public:
    enum Type { // token types
        Uninitialized, // type of a freshly initialized token
        DOCTYPE, // the token is a DOCTYPE
        StartTag, // the token is a start tag
        EndTag, // the token is an end tag
        Comment, // the token is a comment
        Character, // the token is text
        EndOfFile, // the token marks the end of the file
    };

    struct Attribute { // data structure that stores an attribute
        Vector<UChar, 32> name; // attribute name
        Vector<UChar, 64> value; // attribute value

        // Used by HTMLSourceTracker.
        unsigned startOffset;
        unsigned endOffset;
    };

    typedef Vector<Attribute, 10> AttributeList; // attribute list
    typedef Vector<UChar, 256> DataVector; // stores the token name

    ...

private:
    Type m_type;
    DataVector m_data;
    // For StartTag and EndTag
    bool m_selfClosing; // whether the token is a self-closing tag such as <img>
    AttributeList m_attributes;
    Attribute* m_currentAttribute; // the attribute currently being parsed
};</pre>


<p><span style="font-family: "courier new", courier; font-size: 16px">2 Tokenization Flow</span></p>
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202201/489427-20220116233602833-1723316583.png" width="1000" height="500" loading="lazy" style="display: block; margin-left: auto; margin-right: auto"></p>

<p><span style="font-family: "courier new", courier; font-size: 16px">In the tokenization flow above, HTMLDocumentParser::pumpTokenizerLoop is the most important method. As its name suggests, it contains a loop:</span></p>

<pre> 1 // Only the key code is kept
 2 bool HTMLDocumentParser::pumpTokenizerLoop(SynchronousMode mode, bool parsingFragment, PumpSession& session)
 3 {
 4     do { // start of the tokenization loop body
 5         ...
 6 
 7         if (UNLIKELY(mode == AllowYield && m_parserScheduler->shouldYieldBeforeToken(session))) // leave the loop temporarily under certain conditions, to avoid staying in the tokenization loop for too long
 8             return true;
 9 
10         if (!parsingFragment)
11             m_sourceTracker.startToken(m_input.current(), m_tokenizer);
12 
13         auto token = m_tokenizer.nextToken(m_input.current()); // tokenize and take out one token
14         if (!token)
15             return false; // no token was produced, so leave the loop
16 
17         if (!parsingFragment)
18             m_sourceTracker.endToken(m_input.current(), m_tokenizer);
19 
20         constructTreeFromHTMLToken(token); // build the DOM tree from the token
21     } while (!isStopped());
22 
23     return false;
24 }</pre>

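<p><span style="font-family: "courier new", courier; font-size: 16px">Line 7 of the code above is a yield point: it keeps the parser from staying in the tokenization loop for too long and monopolizing the main thread. When the yield condition is true, the function returns from the loop with the value true. The check is implemented as follows:</span></p>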

<pre><span style="color: rgba(0, 128, 128, 1)"> 1</span> <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Key code only</span>
<span style="color: rgba(0, 128, 128, 1)"> 2</span> <span style="color: rgba(0, 0, 255, 1)">bool</span> HTMLParserScheduler::shouldYieldBeforeToken(PumpSession&<span style="color: rgba(0, 0, 0, 1)"> session)
</span><span style="color: rgba(0, 128, 128, 1)"> 3</span> <span style="color: rgba(0, 0, 0, 1)">    {
</span><span style="color: rgba(0, 128, 128, 1)"> 4</span> <span style="color: rgba(0, 0, 0, 1)">        ...
</span><span style="color: rgba(0, 128, 128, 1)"> 5</span> 
<span style="color: rgba(0, 128, 128, 1)"> 6</span>         <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> numberOfTokensBeforeCheckingForYield is a static constant defined as 4096
</span><span style="color: rgba(0, 128, 128, 1)"> 7</span>         <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> session.processedTokensOnLastCheck is the number of tokens already processed as of the last yield check
</span><span style="color: rgba(0, 128, 128, 1)"> 8</span>         <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> session.didSeeScript records whether a script tag was seen during tokenization</span>
<span style="color: rgba(0, 128, 128, 1)"> 9</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (UNLIKELY(session.processedTokens > session.processedTokensOnLastCheck + numberOfTokensBeforeCheckingForYield ||<span style="color: rgba(0, 0, 0, 1)"> session.didSeeScript))
</span><span style="color: rgba(0, 128, 128, 1)">10</span>             <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> checkForYield(session);
</span><span style="color: rgba(0, 128, 128, 1)">11</span> 
<span style="color: rgba(0, 128, 128, 1)">12</span>         ++<span style="color: rgba(0, 0, 0, 1)">session.processedTokens;
</span><span style="color: rgba(0, 128, 128, 1)">13</span>         <span style="color: rgba(0, 0, 255, 1)">return</span> <span style="color: rgba(0, 0, 255, 1)">false</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">14</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)">15</span> 
<span style="color: rgba(0, 128, 128, 1)">16</span> 
<span style="color: rgba(0, 128, 128, 1)">17</span>     <span style="color: rgba(0, 0, 255, 1)">bool</span> HTMLParserScheduler::checkForYield(PumpSession&<span style="color: rgba(0, 0, 0, 1)"> session)
</span><span style="color: rgba(0, 128, 128, 1)">18</span> <span style="color: rgba(0, 0, 0, 1)">    {
</span><span style="color: rgba(0, 128, 128, 1)">19</span>         session.processedTokensOnLastCheck =<span style="color: rgba(0, 0, 0, 1)"> session.processedTokens;
</span><span style="color: rgba(0, 128, 128, 1)">20</span>         session.didSeeScript = <span style="color: rgba(0, 0, 255, 1)">false</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">21</span> 
<span style="color: rgba(0, 128, 128, 1)">22</span>         Seconds elapsedTime = MonotonicTime::now() -<span style="color: rgba(0, 0, 0, 1)"> session.startTime;
</span><span style="color: rgba(0, 128, 128, 1)">23</span>         <span style="color: rgba(0, 0, 255, 1)">return</span> elapsedTime > m_parserTimeLimit; <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> m_parserTimeLimit defaults to 500 ms; once more than 500 ms have elapsed since tokenization started, the parser yields</span>
<span style="color: rgba(0, 128, 128, 1)">24</span>     }</pre>
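
<p><span style="font-family: "courier new", courier; font-size: 16px">The two-level check above (a cheap token counter first, the clock only occasionally) can be tried out in isolation. The following is a minimal sketch, not WebKit code: Session and YieldPolicy are hypothetical stand-ins for PumpSession and HTMLParserScheduler, and the current time is passed in as a parameter so the behaviour is deterministic.</span></p>

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for PumpSession; not WebKit's real type.
struct Session {
    std::chrono::steady_clock::time_point startTime;
    unsigned processedTokens = 0;
    unsigned processedTokensOnLastCheck = 0;
    bool didSeeScript = false;
};

// Hypothetical stand-in for the yield logic in HTMLParserScheduler.
class YieldPolicy {
public:
    YieldPolicy(unsigned tokensBetweenChecks, std::chrono::milliseconds timeLimit)
        : m_tokensBetweenChecks(tokensBetweenChecks)
        , m_timeLimit(timeLimit)
    {
        assert(tokensBetweenChecks > 0);
    }

    // Cheap path: the clock is consulted only after enough tokens have been
    // processed since the last check, or right after a script was seen.
    bool shouldYieldBeforeToken(Session& session, std::chrono::steady_clock::time_point now) const
    {
        if (session.processedTokens > session.processedTokensOnLastCheck + m_tokensBetweenChecks
            || session.didSeeScript)
            return checkForYield(session, now);

        ++session.processedTokens;
        return false;
    }

private:
    // Expensive path: reset the bookkeeping and compare elapsed time with the budget.
    bool checkForYield(Session& session, std::chrono::steady_clock::time_point now) const
    {
        session.processedTokensOnLastCheck = session.processedTokens;
        session.didSeeScript = false;
        return now - session.startTime > m_timeLimit;
    }

    unsigned m_tokensBetweenChecks;
    std::chrono::milliseconds m_timeLimit;
};
```

<p><span style="font-family: "courier new", courier; font-size: 16px">With a check interval of 4 tokens and a 500 ms limit, the first five calls only increment the counter. The sixth call consults the clock and reports a yield if more than 500 ms have passed since startTime.</span></p>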

<p><span style="font-family: "courier new", courier; font-size: 16px">If the yield condition above is hit, when does tokenization start again? The code below shows how it is resumed:</span></p>

<pre><span style="color: rgba(0, 128, 128, 1)"> 1</span> <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Key code only</span>
<span style="color: rgba(0, 128, 128, 1)"> 2</span> <span style="color: rgba(0, 0, 255, 1)">void</span><span style="color: rgba(0, 0, 0, 1)"> HTMLDocumentParser::pumpTokenizer(SynchronousMode mode)
</span><span style="color: rgba(0, 128, 128, 1)"> 3</span> <span style="color: rgba(0, 0, 0, 1)">{
</span><span style="color: rgba(0, 128, 128, 1)"> 4</span> <span style="color: rgba(0, 0, 0, 1)">    ...
</span><span style="color: rgba(0, 128, 128, 1)"> 5</span> 
<span style="color: rgba(0, 128, 128, 1)"> 6</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (shouldResume) <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> pumpTokenizerLoop returns true when it exits through a yield</span>
<span style="color: rgba(0, 128, 128, 1)"> 7</span>         m_parserScheduler-><span style="color: rgba(0, 0, 0, 1)">scheduleForResume(); 
</span><span style="color: rgba(0, 128, 128, 1)"> 8</span> 
<span style="color: rgba(0, 128, 128, 1)"> 9</span> <span style="color: rgba(0, 0, 0, 1)">}
</span><span style="color: rgba(0, 128, 128, 1)">10</span> 
<span style="color: rgba(0, 128, 128, 1)">11</span> 
<span style="color: rgba(0, 128, 128, 1)">12</span> 
<span style="color: rgba(0, 128, 128, 1)">13</span> <span style="color: rgba(0, 0, 255, 1)">void</span><span style="color: rgba(0, 0, 0, 1)"> HTMLParserScheduler::scheduleForResume()
</span><span style="color: rgba(0, 128, 128, 1)">14</span> <span style="color: rgba(0, 0, 0, 1)">{
</span><span style="color: rgba(0, 128, 128, 1)">15</span>     ASSERT(!<span style="color: rgba(0, 0, 0, 1)">m_suspended);
</span><span style="color: rgba(0, 128, 128, 1)">16</span>     m_continueNextChunkTimer.startOneShot(0_s); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Start a one-shot timer with a 0 s delay; its callback is HTMLParserScheduler::continueNextChunkTimerFired</span>
<span style="color: rgba(0, 128, 128, 1)">17</span> <span style="color: rgba(0, 0, 0, 1)">}
</span><span style="color: rgba(0, 128, 128, 1)">18</span> 
<span style="color: rgba(0, 128, 128, 1)">19</span> 
<span style="color: rgba(0, 128, 128, 1)">20</span> <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Key code only</span>
<span style="color: rgba(0, 128, 128, 1)">21</span> <span style="color: rgba(0, 0, 255, 1)">void</span><span style="color: rgba(0, 0, 0, 1)"> HTMLParserScheduler::continueNextChunkTimerFired()
</span><span style="color: rgba(0, 128, 128, 1)">22</span> <span style="color: rgba(0, 0, 0, 1)">{
</span><span style="color: rgba(0, 128, 128, 1)">23</span> <span style="color: rgba(0, 0, 0, 1)">    ...
</span><span style="color: rgba(0, 128, 128, 1)">24</span> 
<span style="color: rgba(0, 128, 128, 1)">25</span>     m_parser.resumeParsingAfterYield(); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Resume tokenization</span>
<span style="color: rgba(0, 128, 128, 1)">26</span> <span style="color: rgba(0, 0, 0, 1)">}
</span><span style="color: rgba(0, 128, 128, 1)">27</span> 
<span style="color: rgba(0, 128, 128, 1)">28</span> 
<span style="color: rgba(0, 128, 128, 1)">29</span> <span style="color: rgba(0, 0, 255, 1)">void</span><span style="color: rgba(0, 0, 0, 1)"> HTMLDocumentParser::resumeParsingAfterYield()
</span><span style="color: rgba(0, 128, 128, 1)">30</span> <span style="color: rgba(0, 0, 0, 1)">{
</span><span style="color: rgba(0, 128, 128, 1)">31</span>     <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> pumpTokenizer can cause this parser to be detached from the Document,
</span><span style="color: rgba(0, 128, 128, 1)">32</span>     <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> but we need to ensure it isn't deleted yet.</span>
<span style="color: rgba(0, 128, 128, 1)">33</span>     Ref&lt;HTMLDocumentParser&gt; protectedThis(*<span style="color: rgba(0, 0, 255, 1)">this</span><span style="color: rgba(0, 0, 0, 1)">);
</span><span style="color: rgba(0, 128, 128, 1)">34</span> 
<span style="color: rgba(0, 128, 128, 1)">35</span>     <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> We should never be here unless we can pump immediately.
</span><span style="color: rgba(0, 128, 128, 1)">36</span>     <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Call pumpTokenizer() directly so that ASSERTS will fire if we're wrong.</span>
<span style="color: rgba(0, 128, 128, 1)">37</span>     pumpTokenizer(AllowYield); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Re-enter tokenization; this function calls pumpTokenizerLoop</span>
<span style="color: rgba(0, 128, 128, 1)">38</span> <span style="color: rgba(0, 0, 0, 1)">    endIfDelayed();
</span><span style="color: rgba(0, 128, 128, 1)">39</span> }</pre>

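<p><span style="font-family: "courier new", courier; font-size: 16px">As the code above shows, tokenization is resumed by a timer. Although the timer fires after 0 s, its callback does not necessarily run immediately: any other main-thread task that became due earlier gets a chance to run first.</span></p>
<p><span style="font-family: "courier new", courier; font-size: 16px">Back in HTMLDocumentParser::pumpTokenizerLoop, line 13 performs the tokenization, extracting one token from the decoded character stream. The tokenization logic itself lives in HTMLTokenizer::processToken:</span></p>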
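
<p><span style="font-family: "courier new", courier; font-size: 16px">Why a 0 s timer still lets other work run first is easiest to see with a toy run loop. The sketch below is an illustration under simplifying assumptions, not WebKit's RunLoop or Timer: RunLoop and ChunkedParser are hypothetical classes, tasks simply run in the order they were queued, and re-posting a task plays the role of startOneShot(0_s).</span></p>

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded run loop: tasks run strictly in queue order.
class RunLoop {
public:
    void post(std::function<void()> task) { m_tasks.push_back(std::move(task)); }

    void run()
    {
        while (!m_tasks.empty()) {
            auto task = std::move(m_tasks.front());
            m_tasks.pop_front();
            task();
        }
    }

private:
    std::deque<std::function<void()>> m_tasks;
};

// Hypothetical parser that handles at most `budget` tokens per turn,
// then reschedules itself instead of blocking the loop.
class ChunkedParser {
public:
    ChunkedParser(RunLoop& loop, int totalTokens, int budget, std::vector<std::string>& log)
        : m_loop(loop)
        , m_remaining(totalTokens)
        , m_budget(budget)
        , m_log(log)
    {
        assert(budget > 0);
    }

    void pump()
    {
        int handled = 0;
        while (m_remaining > 0 && handled < m_budget) {
            --m_remaining;
            ++handled;
        }
        m_log.push_back("parse " + std::to_string(handled));

        // Yield: queue the continuation behind whatever is already waiting.
        if (m_remaining > 0)
            m_loop.post([this] { pump(); });
    }

private:
    RunLoop& m_loop;
    int m_remaining;
    int m_budget;
    std::vector<std::string>& m_log;
};
```

<p><span style="font-family: "courier new", courier; font-size: 16px">If a parser with 5 tokens and a budget of 2 is posted first and an unrelated task second, the unrelated task runs between the first and second parsing chunks, because the continuation is queued behind it.</span></p>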

<pre><span style="color: rgba(0, 128, 128, 1)"> 1</span> <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Key code only</span>
<span style="color: rgba(0, 128, 128, 1)"> 2</span> <span style="color: rgba(0, 0, 255, 1)">bool</span> HTMLTokenizer::processToken(SegmentedString&<span style="color: rgba(0, 0, 0, 1)"> source)
</span><span style="color: rgba(0, 128, 128, 1)"> 3</span> <span style="color: rgba(0, 0, 0, 1)">{
</span><span style="color: rgba(0, 128, 128, 1)"> 4</span>    
<span style="color: rgba(0, 128, 128, 1)"> 5</span> <span style="color: rgba(0, 0, 0, 1)">    ...
</span><span style="color: rgba(0, 128, 128, 1)"> 6</span> 
<span style="color: rgba(0, 128, 128, 1)"> 7</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (!m_preprocessor.peek(source, isNullCharacterSkippingState(m_state))) <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Read the character that source currently points at into m_nextInputCharacter</span>
<span style="color: rgba(0, 128, 128, 1)"> 8</span>         <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> haveBufferedCharacterToken();
</span><span style="color: rgba(0, 128, 128, 1)"> 9</span>     UChar character = m_preprocessor.nextInputCharacter(); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Fetch the character
</span><span style="color: rgba(0, 128, 128, 1)">10</span> 
<span style="color: rgba(0, 128, 128, 1)">11</span>     <span style="color: rgba(0, 128, 0, 1)">//</span> <span style="color: rgba(0, 128, 0, 1); text-decoration: underline">https://html.spec.whatwg.org/</span><span style="color: rgba(0, 128, 0, 1)">#tokenization</span>
<span style="color: rgba(0, 128, 128, 1)">12</span>     <span style="color: rgba(0, 0, 255, 1)">switch</span> (m_state) { <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Run the state machine; m_state starts as DataState</span>
<span style="color: rgba(0, 128, 128, 1)">13</span> <span style="color: rgba(0, 0, 0, 1)">    ...
</span><span style="color: rgba(0, 128, 128, 1)">14</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)">15</span> 
<span style="color: rgba(0, 128, 128, 1)">16</span>     <span style="color: rgba(0, 0, 255, 1)">return</span> <span style="color: rgba(0, 0, 255, 1)">false</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">17</span> }</pre>

<p><span style="font-family: "courier new", courier; font-size: 16px">Because it implements so many state transitions, this method runs to more than 1,200 lines. Four examples later on illustrate how the transitions work.</span></p>
<p><span style="font-family: "courier new", courier; font-size: 16px">First, the InputStreamPreprocessor::peek method:</span></p>

<pre><span style="color: rgba(0, 128, 128, 1)"> 1</span>  <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Returns whether we succeeded in peeking at the next character.
</span><span style="color: rgba(0, 128, 128, 1)"> 2</span>  <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> The only way we can fail to peek is if there are no more
</span><span style="color: rgba(0, 128, 128, 1)"> 3</span>  <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> characters in |source| (after collapsing \r\n, etc).</span>
<span style="color: rgba(0, 128, 128, 1)"> 4</span>  ALWAYS_INLINE <span style="color: rgba(0, 0, 255, 1)">bool</span> InputStreamPreprocessor::peek(SegmentedString& source, <span style="color: rgba(0, 0, 255, 1)">bool</span> skipNullCharacters = <span style="color: rgba(0, 0, 255, 1)">false</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 5</span> <span style="color: rgba(0, 0, 0, 1)"> {
</span><span style="color: rgba(0, 128, 128, 1)"> 6</span>      <span style="color: rgba(0, 0, 255, 1)">if</span><span style="color: rgba(0, 0, 0, 1)"> (UNLIKELY(source.isEmpty()))
</span><span style="color: rgba(0, 128, 128, 1)"> 7</span>          <span style="color: rgba(0, 0, 255, 1)">return</span> <span style="color: rgba(0, 0, 255, 1)">false</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)"> 8</span>  
<span style="color: rgba(0, 128, 128, 1)"> 9</span>      m_nextInputCharacter = source.currentCharacter(); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Get the current character that the stream source points at
</span><span style="color: rgba(0, 128, 128, 1)">10</span>  
<span style="color: rgba(0, 128, 128, 1)">11</span>      <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Every branch in this function is expensive, so we have a
</span><span style="color: rgba(0, 128, 128, 1)">12</span>      <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> fast-reject branch for characters that don't require special
</span><span style="color: rgba(0, 128, 128, 1)">13</span>      <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> handling. Please run the parser benchmark whenever you touch
</span><span style="color: rgba(0, 128, 128, 1)">14</span>      <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> this function. It's very hot.</span>
<span style="color: rgba(0, 128, 128, 1)">15</span>      constexpr UChar specialCharacterMask = <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\n</span><span style="color: rgba(128, 0, 0, 1)">'</span> | <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\r</span><span style="color: rgba(128, 0, 0, 1)">'</span> | <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\0</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">16</span>      <span style="color: rgba(0, 0, 255, 1)">if</span> (LIKELY(m_nextInputCharacter & ~<span style="color: rgba(0, 0, 0, 1)">specialCharacterMask)) {
</span><span style="color: rgba(0, 128, 128, 1)">17</span>          m_skipNextNewLine = <span style="color: rgba(0, 0, 255, 1)">false</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">18</span>          <span style="color: rgba(0, 0, 255, 1)">return</span> <span style="color: rgba(0, 0, 255, 1)">true</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">19</span> <span style="color: rgba(0, 0, 0, 1)">     }
</span><span style="color: rgba(0, 128, 128, 1)">20</span>  
<span style="color: rgba(0, 128, 128, 1)">21</span>      <span style="color: rgba(0, 0, 255, 1)">return</span> processNextInputCharacter(source, skipNullCharacters); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Skip null characters and collapse \r\n line breaks into \n</span>
<span style="color: rgba(0, 128, 128, 1)">22</span> <span style="color: rgba(0, 0, 0, 1)"> }
</span><span style="color: rgba(0, 128, 128, 1)">23</span> 
<span style="color: rgba(0, 128, 128, 1)">24</span> 
<span style="color: rgba(0, 128, 128, 1)">25</span> <span style="color: rgba(0, 0, 255, 1)">bool</span> InputStreamPreprocessor::processNextInputCharacter(SegmentedString& source, <span style="color: rgba(0, 0, 255, 1)">bool</span><span style="color: rgba(0, 0, 0, 1)"> skipNullCharacters)
</span><span style="color: rgba(0, 128, 128, 1)">26</span> <span style="color: rgba(0, 0, 0, 1)">    {
</span><span style="color: rgba(0, 128, 128, 1)">27</span> <span style="color: rgba(0, 0, 0, 1)">    ProcessAgain:
</span><span style="color: rgba(0, 128, 128, 1)">28</span>         ASSERT(m_nextInputCharacter ==<span style="color: rgba(0, 0, 0, 1)"> source.currentCharacter());
</span><span style="color: rgba(0, 128, 128, 1)">29</span> 
<span style="color: rgba(0, 128, 128, 1)">30</span>         <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> For a \r\n pair, the \r was already handled by the branch further down, which set m_skipNextNewLine = true; this if statement then skips the \n that follows</span>
<span style="color: rgba(0, 128, 128, 1)">31</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (m_nextInputCharacter == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\n</span><span style="color: rgba(128, 0, 0, 1)">'</span> &&<span style="color: rgba(0, 0, 0, 1)"> m_skipNextNewLine) {
</span><span style="color: rgba(0, 128, 128, 1)">32</span>             m_skipNextNewLine = <span style="color: rgba(0, 0, 255, 1)">false</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">33</span>             source.advancePastNewline(); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Advance past this character</span>
<span style="color: rgba(0, 128, 128, 1)">34</span>             <span style="color: rgba(0, 0, 255, 1)">if</span><span style="color: rgba(0, 0, 0, 1)"> (source.isEmpty())
</span><span style="color: rgba(0, 128, 128, 1)">35</span>                 <span style="color: rgba(0, 0, 255, 1)">return</span> <span style="color: rgba(0, 0, 255, 1)">false</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">36</span>             m_nextInputCharacter =<span style="color: rgba(0, 0, 0, 1)"> source.currentCharacter();
</span><span style="color: rgba(0, 128, 128, 1)">37</span> <span style="color: rgba(0, 0, 0, 1)">        }
</span><span style="color: rgba(0, 128, 128, 1)">38</span> 
<span style="color: rgba(0, 128, 128, 1)">39</span>         <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> In a \r\n pair the \r is seen first: replace it with \n and set m_skipNextNewLine = true</span>
<span style="color: rgba(0, 128, 128, 1)">40</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (m_nextInputCharacter == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\r</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">) { 
</span><span style="color: rgba(0, 128, 128, 1)">41</span>             m_nextInputCharacter = <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\n</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">42</span>             m_skipNextNewLine = <span style="color: rgba(0, 0, 255, 1)">true</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">43</span>             <span style="color: rgba(0, 0, 255, 1)">return</span> <span style="color: rgba(0, 0, 255, 1)">true</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">44</span> <span style="color: rgba(0, 0, 0, 1)">        }
</span><span style="color: rgba(0, 128, 128, 1)">45</span>         m_skipNextNewLine = <span style="color: rgba(0, 0, 255, 1)">false</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">46</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (m_nextInputCharacter ||<span style="color: rgba(0, 0, 0, 1)"> isAtEndOfFile(source))
</span><span style="color: rgba(0, 128, 128, 1)">47</span>             <span style="color: rgba(0, 0, 255, 1)">return</span> <span style="color: rgba(0, 0, 255, 1)">true</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">48</span> 
<span style="color: rgba(0, 128, 128, 1)">49</span>         <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Skip null characters</span>
<span style="color: rgba(0, 128, 128, 1)">50</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (skipNullCharacters && !<span style="color: rgba(0, 0, 0, 1)">m_tokenizer.neverSkipNullCharacters()) {
</span><span style="color: rgba(0, 128, 128, 1)">51</span> <span style="color: rgba(0, 0, 0, 1)">            source.advancePastNonNewline();
</span><span style="color: rgba(0, 128, 128, 1)">52</span>             <span style="color: rgba(0, 0, 255, 1)">if</span><span style="color: rgba(0, 0, 0, 1)"> (source.isEmpty())
</span><span style="color: rgba(0, 128, 128, 1)">53</span>                 <span style="color: rgba(0, 0, 255, 1)">return</span> <span style="color: rgba(0, 0, 255, 1)">false</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">54</span>             m_nextInputCharacter =<span style="color: rgba(0, 0, 0, 1)"> source.currentCharacter();
</span><span style="color: rgba(0, 128, 128, 1)">55</span>             <span style="color: rgba(0, 0, 255, 1)">goto</span> ProcessAgain; <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Jump back to the start</span>
<span style="color: rgba(0, 128, 128, 1)">56</span> <span style="color: rgba(0, 0, 0, 1)">        }
</span><span style="color: rgba(0, 128, 128, 1)">57</span>         m_nextInputCharacter =<span style="color: rgba(0, 0, 0, 1)"> replacementCharacter;
</span><span style="color: rgba(0, 128, 128, 1)">58</span>         <span style="color: rgba(0, 0, 255, 1)">return</span> <span style="color: rgba(0, 0, 255, 1)">true</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">59</span>     }</pre>

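<p><span style="font-family: "courier new", courier; font-size: 16px">Because peek skips null characters (in the states where skipping is enabled; otherwise they are replaced with replacementCharacter) and collapses \r\n into \n, a character stream source that contains null characters or \r\n line breaks is effectively processed as shown below:</span></p>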
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202202/489427-20220221003955148-1542889028.jpg" width="1000" height="500" style="display: block; margin-left: auto; margin-right: auto"></p>
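
<p><span style="font-family: "courier new", courier; font-size: 16px">The same normalization can be sketched as a whole-string function. This is a simplified illustration, not WebKit's implementation: preprocess is a hypothetical helper, it works on a complete std::u16string instead of one character at a time, and it ignores the end-of-file and neverSkipNullCharacters special cases visible in the code above.</span></p>

```cpp
#include <cassert>
#include <string>

// Hypothetical helper that mirrors the per-character rules shown above:
// \r and \r\n become \n, and a null character is either dropped or
// replaced with U+FFFD depending on skipNullCharacters.
std::u16string preprocess(const std::u16string& input, bool skipNullCharacters)
{
    constexpr char16_t replacementCharacter = u'\uFFFD';
    std::u16string output;
    bool skipNextNewLine = false;

    for (char16_t character : input) {
        // The \n of a \r\n pair: the \r already produced a \n, so drop this one.
        if (character == u'\n' && skipNextNewLine) {
            skipNextNewLine = false;
            continue;
        }
        skipNextNewLine = false;

        if (character == u'\r') {
            output.push_back(u'\n');
            skipNextNewLine = true;
            continue;
        }
        if (character == u'\0') {
            if (!skipNullCharacters)
                output.push_back(replacementCharacter);
            continue;
        }
        output.push_back(character);
    }

    // Normalization never makes the stream longer.
    assert(output.size() <= input.size());
    return output;
}
```

<p><span style="font-family: "courier new", courier; font-size: 16px">For example, the input a\r\nb\rc\nd is normalized to a\nb\nc\nd: the \r\n pair and the lone \r each become a single \n, and the lone \n is kept.</span></p>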

<p><span style="font-family: "courier new", courier; font-size: 16px">HTMLTokenizer::processToken implements a state machine internally. The four cases below walk through it.</span></p>
<p><span style="font-family: "courier new", courier; font-size: 16px">Case 1: the &lt;!DOCTYPE&gt; tag</span></p>
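
<p><span style="font-family: "courier new", courier; font-size: 16px">As a warm-up, the general shape of such a tokenizer can be shown with a toy state machine. The sketch below is an assumption-laden illustration, not WebKit code: it has only four hypothetical states, takes the whole input as a std::string, and supports nothing beyond plain start tags, end tags and text (no attributes, comments, DOCTYPE or character references).</span></p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily reduced set of tokenizer states.
enum class State { Data, TagOpen, EndTagOpen, TagName };

struct Token {
    enum class Type { Character, StartTag, EndTag } type;
    std::string data;
};

// Consume one character at a time and let the current state decide what it means.
std::vector<Token> tokenize(const std::string& input)
{
    std::vector<Token> tokens;
    State state = State::Data;
    std::string buffer;
    bool isEndTag = false;

    // Emit any buffered text as a single character token.
    auto flushText = [&] {
        if (!buffer.empty()) {
            tokens.push_back({ Token::Type::Character, buffer });
            buffer.clear();
        }
    };

    for (char character : input) {
        switch (state) {
        case State::Data:
            if (character == '<') {
                flushText();
                state = State::TagOpen;
            } else
                buffer.push_back(character);
            break;
        case State::TagOpen:
            if (character == '/') {
                state = State::EndTagOpen;
                break;
            }
            isEndTag = false;
            buffer.push_back(character);
            state = State::TagName;
            break;
        case State::EndTagOpen:
            isEndTag = true;
            buffer.push_back(character);
            state = State::TagName;
            break;
        case State::TagName:
            if (character == '>') {
                tokens.push_back({ isEndTag ? Token::Type::EndTag : Token::Type::StartTag, buffer });
                buffer.clear();
                state = State::Data;
            } else
                buffer.push_back(character);
            break;
        }
    }

    assert(state == State::Data); // the toy input must not end inside a tag
    flushText();
    return tokens;
}
```

<p><span style="font-family: "courier new", courier; font-size: 16px">Feeding it the string &lt;div&gt;Hi&lt;/div&gt; yields three tokens: a start tag div, the text Hi, and an end tag div. The real tokenizer follows the same pattern, with many more states and with macros and goto in place of the plain switch.</span></p>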

<pre><span style="color: rgba(0, 128, 128, 1)">  1</span> BEGIN_STATE(DataState) <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Parsing starts in DataState</span>
<span style="color: rgba(0, 128, 128, 1)">  2</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">&</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)">  3</span> <span style="color: rgba(0, 0, 0, 1)">            ADVANCE_PAST_NON_NEWLINE_TO(CharacterReferenceInDataState);
</span><span style="color: rgba(0, 128, 128, 1)">  4</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">'</span>) {<span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> The stream starts with '<', which marks the start of a tag</span>
<span style="color: rgba(0, 128, 128, 1)">  5</span>             <span style="color: rgba(0, 0, 255, 1)">if</span><span style="color: rgba(0, 0, 0, 1)"> (haveBufferedCharacterToken())
</span><span style="color: rgba(0, 128, 128, 1)">  6</span>                 RETURN_IN_CURRENT_STATE(<span style="color: rgba(0, 0, 255, 1)">true</span><span style="color: rgba(0, 0, 0, 1)">);
</span><span style="color: rgba(0, 128, 128, 1)">  7</span>             ADVANCE_PAST_NON_NEWLINE_TO(TagOpenState); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Jump to TagOpenState and fetch the next character, '!'</span>
<span style="color: rgba(0, 128, 128, 1)">  8</span> <span style="color: rgba(0, 0, 0, 1)">        }
</span><span style="color: rgba(0, 128, 128, 1)">  9</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (character ==<span style="color: rgba(0, 0, 0, 1)"> kEndOfFileMarker)
</span><span style="color: rgba(0, 128, 128, 1)"> 10</span>             <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> emitEndOfFile(source);
</span><span style="color: rgba(0, 128, 128, 1)"> 11</span> <span style="color: rgba(0, 0, 0, 1)">        bufferCharacter(character);
</span><span style="color: rgba(0, 128, 128, 1)"> 12</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_TO(DataState);
</span><span style="color: rgba(0, 128, 128, 1)"> 13</span> <span style="color: rgba(0, 0, 0, 1)">END_STATE()
</span><span style="color: rgba(0, 128, 128, 1)"> 14</span> 
<span style="color: rgba(0, 128, 128, 1)"> 15</span> <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Definition of ADVANCE_PAST_NON_NEWLINE_TO</span>
<span style="color: rgba(0, 128, 128, 1)"> 16</span> <span style="color: rgba(0, 0, 255, 1)">#define</span> ADVANCE_PAST_NON_NEWLINE_TO(newState)                   \
<span style="color: rgba(0, 128, 128, 1)"> 17</span>     <span style="color: rgba(0, 0, 255, 1)">do</span><span style="color: rgba(0, 0, 0, 1)"> {                                                        \
</span><span style="color: rgba(0, 128, 128, 1)"> 18</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (!m_preprocessor.advancePastNonNewline(source, isNullCharacterSkippingState(newState))) { <span style="color: rgba(0, 128, 0, 1)">/* No next character is available */</span> \
<span style="color: rgba(0, 128, 128, 1)"> 19</span>             m_state = newState; <span style="color: rgba(0, 128, 0, 1)">/* Save the state */</span> \
<span style="color: rgba(0, 128, 128, 1)"> 20</span>             <span style="color: rgba(0, 0, 255, 1)">return</span> haveBufferedCharacterToken(); <span style="color: rgba(0, 128, 0, 1)">/* Return to the caller */</span> \
<span style="color: rgba(0, 128, 128, 1)"> 21</span> <span style="color: rgba(0, 0, 0, 1)">        }                                                       \
</span><span style="color: rgba(0, 128, 128, 1)"> 22</span>         character = m_preprocessor.nextInputCharacter(); <span style="color: rgba(0, 128, 0, 1)">/* Fetch the next character first */</span> \
<span style="color: rgba(0, 128, 128, 1)"> 23</span>         <span style="color: rgba(0, 0, 255, 1)">goto</span> newState; <span style="color: rgba(0, 128, 0, 1)">/* Jump to the given state */</span> \
<span style="color: rgba(0, 128, 128, 1)"> 24</span>     } <span style="color: rgba(0, 0, 255, 1)">while</span> (<span style="color: rgba(0, 0, 255, 1)">false</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 25</span> 
<span style="color: rgba(0, 128, 128, 1)"> 26</span> 
<span style="color: rgba(0, 128, 128, 1)"> 27</span> <span style="color: rgba(0, 0, 0, 1)">BEGIN_STATE(TagOpenState)
</span><span style="color: rgba(0, 128, 128, 1)"> 28</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">!</span><span style="color: rgba(128, 0, 0, 1)">'</span>) <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> 满足此条件</span>
<span style="color: rgba(0, 128, 128, 1)"> 29</span>             ADVANCE_PAST_NON_NEWLINE_TO(MarkupDeclarationOpenState); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> 同理,跳转到MarkupDeclarationOpenState状态,并且取出下一个字符'D'</span>
<span style="color: rgba(0, 128, 128, 1)"> 30</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">/</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 31</span> <span style="color: rgba(0, 0, 0, 1)">            ADVANCE_PAST_NON_NEWLINE_TO(EndTagOpenState);
</span><span style="color: rgba(0, 128, 128, 1)"> 32</span>         <span style="color: rgba(0, 0, 255, 1)">if</span><span style="color: rgba(0, 0, 0, 1)"> (isASCIIAlpha(character)) {
</span><span style="color: rgba(0, 128, 128, 1)"> 33</span> <span style="color: rgba(0, 0, 0, 1)">            m_token.beginStartTag(convertASCIIAlphaToLower(character));
</span><span style="color: rgba(0, 128, 128, 1)"> 34</span> <span style="color: rgba(0, 0, 0, 1)">            ADVANCE_PAST_NON_NEWLINE_TO(TagNameState);
</span><span style="color: rgba(0, 128, 128, 1)"> 35</span> <span style="color: rgba(0, 0, 0, 1)">        }
</span><span style="color: rgba(0, 128, 128, 1)"> 36</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">?</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">) {
</span><span style="color: rgba(0, 128, 128, 1)"> 37</span> <span style="color: rgba(0, 0, 0, 1)">            parseError();
</span><span style="color: rgba(0, 128, 128, 1)"> 38</span>             <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> The spec consumes the current character before switching
</span><span style="color: rgba(0, 128, 128, 1)"> 39</span>             <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> to the bogus comment state, but it's easier to implement
</span><span style="color: rgba(0, 128, 128, 1)"> 40</span>             <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> if we reconsume the current character.</span>
<span style="color: rgba(0, 128, 128, 1)"> 41</span> <span style="color: rgba(0, 0, 0, 1)">            RECONSUME_IN(BogusCommentState);
</span><span style="color: rgba(0, 128, 128, 1)"> 42</span> <span style="color: rgba(0, 0, 0, 1)">        }
</span><span style="color: rgba(0, 128, 128, 1)"> 43</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)"> 44</span>         bufferASCIICharacter(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">);
</span><span style="color: rgba(0, 128, 128, 1)"> 45</span> <span style="color: rgba(0, 0, 0, 1)">        RECONSUME_IN(DataState);
</span><span style="color: rgba(0, 128, 128, 1)"> 46</span> <span style="color: rgba(0, 0, 0, 1)">END_STATE()
</span><span style="color: rgba(0, 128, 128, 1)"> 47</span> 
<span style="color: rgba(0, 128, 128, 1)"> 48</span> <span style="color: rgba(0, 0, 0, 1)">BEGIN_STATE(MarkupDeclarationOpenState)
</span><span style="color: rgba(0, 128, 128, 1)"> 49</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">-</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">) {
</span><span style="color: rgba(0, 128, 128, 1)"> 50</span>             auto result = source.advancePast(<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">--</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">);
</span><span style="color: rgba(0, 128, 128, 1)"> 51</span>             <span style="color: rgba(0, 0, 255, 1)">if</span> (result ==<span style="color: rgba(0, 0, 0, 1)"> SegmentedString::DidMatch) {
</span><span style="color: rgba(0, 128, 128, 1)"> 52</span> <span style="color: rgba(0, 0, 0, 1)">                m_token.beginComment();
</span><span style="color: rgba(0, 128, 128, 1)"> 53</span> <span style="color: rgba(0, 0, 0, 1)">                SWITCH_TO(CommentStartState);
</span><span style="color: rgba(0, 128, 128, 1)"> 54</span> <span style="color: rgba(0, 0, 0, 1)">            }
</span><span style="color: rgba(0, 128, 128, 1)"> 55</span>             <span style="color: rgba(0, 0, 255, 1)">if</span> (result ==<span style="color: rgba(0, 0, 0, 1)"> SegmentedString::NotEnoughCharacters)
</span><span style="color: rgba(0, 128, 128, 1)"> 56</span> <span style="color: rgba(0, 0, 0, 1)">                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
</span><span style="color: rgba(0, 128, 128, 1)"> 57</span>         } <span style="color: rgba(0, 0, 255, 1)">else</span> <span style="color: rgba(0, 0, 255, 1)">if</span> (isASCIIAlphaCaselessEqual(character, <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">d</span><span style="color: rgba(128, 0, 0, 1)">'</span>)) { <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> character == 'D' here, so this branch is taken</span>
<span style="color: rgba(0, 128, 128, 1)"> 58</span>             auto result = source.advancePastLettersIgnoringASCIICase(<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">doctype</span><span style="color: rgba(128, 0, 0, 1)">"</span>); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> check whether the decoded stream contains a complete "doctype" (ASCII case-insensitive)</span>
<span style="color: rgba(0, 128, 128, 1)"> 59</span>             <span style="color: rgba(0, 0, 255, 1)">if</span> (result ==<span style="color: rgba(0, 0, 0, 1)"> SegmentedString::DidMatch)
</span><span style="color: rgba(0, 128, 128, 1)"> 60</span>                 SWITCH_TO(DOCTYPEState); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> on a match, switch to DOCTYPEState and fetch the character source now points at; source has already advanced past "doctype", so that character is '>'</span>
<span style="color: rgba(0, 128, 128, 1)"> 61</span>             <span style="color: rgba(0, 0, 255, 1)">if</span> (result == SegmentedString::NotEnoughCharacters) <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> the stream ended before "doctype" could be fully matched</span>
<span style="color: rgba(0, 128, 128, 1)"> 62</span>                 RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken()); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> save the current state and return</span>
<span style="color: rgba(0, 128, 128, 1)"> 63</span>         } <span style="color: rgba(0, 0, 255, 1)">else</span> <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">[</span><span style="color: rgba(128, 0, 0, 1)">'</span> &&<span style="color: rgba(0, 0, 0, 1)"> shouldAllowCDATA()) {
</span><span style="color: rgba(0, 128, 128, 1)"> 64</span>             auto result = source.advancePast(<span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">[CDATA[</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(0, 0, 0, 1)">);
</span><span style="color: rgba(0, 128, 128, 1)"> 65</span>             <span style="color: rgba(0, 0, 255, 1)">if</span> (result ==<span style="color: rgba(0, 0, 0, 1)"> SegmentedString::DidMatch)
</span><span style="color: rgba(0, 128, 128, 1)"> 66</span> <span style="color: rgba(0, 0, 0, 1)">                SWITCH_TO(CDATASectionState);
</span><span style="color: rgba(0, 128, 128, 1)"> 67</span>             <span style="color: rgba(0, 0, 255, 1)">if</span> (result ==<span style="color: rgba(0, 0, 0, 1)"> SegmentedString::NotEnoughCharacters)
</span><span style="color: rgba(0, 128, 128, 1)"> 68</span> <span style="color: rgba(0, 0, 0, 1)">                RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
</span><span style="color: rgba(0, 128, 128, 1)"> 69</span> <span style="color: rgba(0, 0, 0, 1)">        }
</span><span style="color: rgba(0, 128, 128, 1)"> 70</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)"> 71</span> <span style="color: rgba(0, 0, 0, 1)">        RECONSUME_IN(BogusCommentState);
</span><span style="color: rgba(0, 128, 128, 1)"> 72</span> <span style="color: rgba(0, 0, 0, 1)">    END_STATE()
</span><span style="color: rgba(0, 128, 128, 1)"> 73</span> 
<span style="color: rgba(0, 128, 128, 1)"> 74</span> 
<span style="color: rgba(0, 128, 128, 1)"> 75</span> <span style="color: rgba(0, 0, 255, 1)">#define</span> SWITCH_TO(newState)                                     \
<span style="color: rgba(0, 128, 128, 1)"> 76</span>     <span style="color: rgba(0, 0, 255, 1)">do</span><span style="color: rgba(0, 0, 0, 1)"> {                                                        \
</span><span style="color: rgba(0, 128, 128, 1)"> 77</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (!<span style="color: rgba(0, 0, 0, 1)">m_preprocessor.peek(source, isNullCharacterSkippingState(newState))) { \
</span><span style="color: rgba(0, 128, 128, 1)"> 78</span>             m_state =<span style="color: rgba(0, 0, 0, 1)"> newState;                                 \
</span><span style="color: rgba(0, 128, 128, 1)"> 79</span>             <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> haveBufferedCharacterToken();                \
</span><span style="color: rgba(0, 128, 128, 1)"> 80</span> <span style="color: rgba(0, 0, 0, 1)">        }                                                       \
</span><span style="color: rgba(0, 128, 128, 1)"> 81</span>         character = m_preprocessor.nextInputCharacter(); <span style="color: rgba(0, 128, 0, 1)">/* fetch the character source currently points at */</span> \
<span style="color: rgba(0, 128, 128, 1)"> 82</span>         <span style="color: rgba(0, 0, 255, 1)">goto</span> newState; <span style="color: rgba(0, 128, 0, 1)">/* jump to the given state */</span> \
<span style="color: rgba(0, 128, 128, 1)"> 83</span>     } <span style="color: rgba(0, 0, 255, 1)">while</span> (<span style="color: rgba(0, 0, 255, 1)">false</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 84</span> 
<span style="color: rgba(0, 128, 128, 1)"> 85</span> 
<span style="color: rgba(0, 128, 128, 1)"> 86</span> <span style="color: rgba(0, 0, 255, 1)">#define</span> RETURN_IN_CURRENT_STATE(expression)                     \
<span style="color: rgba(0, 128, 128, 1)"> 87</span>     <span style="color: rgba(0, 0, 255, 1)">do</span><span style="color: rgba(0, 0, 0, 1)"> {                                                        \
</span><span style="color: rgba(0, 128, 128, 1)"> 88</span>         m_state = currentState; <span style="color: rgba(0, 128, 0, 1)">/* save the current state */</span> \
<span style="color: rgba(0, 128, 128, 1)"> 89</span>         <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> expression;                                      \
</span><span style="color: rgba(0, 128, 128, 1)"> 90</span>     } <span style="color: rgba(0, 0, 255, 1)">while</span> (<span style="color: rgba(0, 0, 255, 1)">false</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 91</span> 
<span style="color: rgba(0, 128, 128, 1)"> 92</span> 
<span style="color: rgba(0, 128, 128, 1)"> 93</span> <span style="color: rgba(0, 0, 0, 1)">BEGIN_STATE(DOCTYPEState)
</span><span style="color: rgba(0, 128, 128, 1)"> 94</span>     <span style="color: rgba(0, 0, 255, 1)">if</span><span style="color: rgba(0, 0, 0, 1)"> (isTokenizerWhitespace(character))
</span><span style="color: rgba(0, 128, 128, 1)"> 95</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_TO(BeforeDOCTYPENameState);
</span><span style="color: rgba(0, 128, 128, 1)"> 96</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character ==<span style="color: rgba(0, 0, 0, 1)"> kEndOfFileMarker) {
</span><span style="color: rgba(0, 128, 128, 1)"> 97</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)"> 98</span> <span style="color: rgba(0, 0, 0, 1)">        m_token.beginDOCTYPE();
</span><span style="color: rgba(0, 128, 128, 1)"> 99</span> <span style="color: rgba(0, 0, 0, 1)">        m_token.setForceQuirks();
</span><span style="color: rgba(0, 128, 128, 1)">100</span>         <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> emitAndReconsumeInDataState();
</span><span style="color: rgba(0, 128, 128, 1)">101</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)">102</span> <span style="color: rgba(0, 0, 0, 1)">    parseError();
</span><span style="color: rgba(0, 128, 128, 1)">103</span> <span style="color: rgba(0, 0, 0, 1)">    RECONSUME_IN(BeforeDOCTYPENameState);
</span><span style="color: rgba(0, 128, 128, 1)">104</span> <span style="color: rgba(0, 0, 0, 1)">END_STATE()
</span><span style="color: rgba(0, 128, 128, 1)">105</span> 
<span style="color: rgba(0, 128, 128, 1)">106</span> 
<span style="color: rgba(0, 128, 128, 1)">107</span> <span style="color: rgba(0, 0, 255, 1)">#define</span> RECONSUME_IN(newState)                                  \
<span style="color: rgba(0, 128, 128, 1)">108</span>     <span style="color: rgba(0, 0, 255, 1)">do</span> { <span style="color: rgba(0, 128, 0, 1)">/* jump straight to the given state, keeping the current character */</span> \
<span style="color: rgba(0, 128, 128, 1)">109</span>         <span style="color: rgba(0, 0, 255, 1)">goto</span><span style="color: rgba(0, 0, 0, 1)"> newState;                                          \
</span><span style="color: rgba(0, 128, 128, 1)">110</span>     } <span style="color: rgba(0, 0, 255, 1)">while</span> (<span style="color: rgba(0, 0, 255, 1)">false</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)">111</span> 
<span style="color: rgba(0, 128, 128, 1)">112</span> 
<span style="color: rgba(0, 128, 128, 1)">113</span> <span style="color: rgba(0, 0, 0, 1)"> BEGIN_STATE(BeforeDOCTYPENameState)
</span><span style="color: rgba(0, 128, 128, 1)">114</span>         <span style="color: rgba(0, 0, 255, 1)">if</span><span style="color: rgba(0, 0, 0, 1)"> (isTokenizerWhitespace(character))
</span><span style="color: rgba(0, 128, 128, 1)">115</span> <span style="color: rgba(0, 0, 0, 1)">            ADVANCE_TO(BeforeDOCTYPENameState);
</span><span style="color: rgba(0, 128, 128, 1)">116</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">></span><span style="color: rgba(128, 0, 0, 1)">'</span>) { <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> character == '>' matches here; the DOCTYPE tag is now fully matched</span>
<span style="color: rgba(0, 128, 128, 1)">117</span> <span style="color: rgba(0, 0, 0, 1)">            parseError();
</span><span style="color: rgba(0, 128, 128, 1)">118</span> <span style="color: rgba(0, 0, 0, 1)">            m_token.beginDOCTYPE();
</span><span style="color: rgba(0, 128, 128, 1)">119</span> <span style="color: rgba(0, 0, 0, 1)">            m_token.setForceQuirks();
</span><span style="color: rgba(0, 128, 128, 1)">120</span>             <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> emitAndResumeInDataState(source);
</span><span style="color: rgba(0, 128, 128, 1)">121</span> <span style="color: rgba(0, 0, 0, 1)">        }
</span><span style="color: rgba(0, 128, 128, 1)">122</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (character ==<span style="color: rgba(0, 0, 0, 1)"> kEndOfFileMarker) {
</span><span style="color: rgba(0, 128, 128, 1)">123</span> <span style="color: rgba(0, 0, 0, 1)">            parseError();
</span><span style="color: rgba(0, 128, 128, 1)">124</span> <span style="color: rgba(0, 0, 0, 1)">            m_token.beginDOCTYPE();
</span><span style="color: rgba(0, 128, 128, 1)">125</span> <span style="color: rgba(0, 0, 0, 1)">            m_token.setForceQuirks();
</span><span style="color: rgba(0, 128, 128, 1)">126</span>             <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> emitAndReconsumeInDataState();
</span><span style="color: rgba(0, 128, 128, 1)">127</span> <span style="color: rgba(0, 0, 0, 1)">        }
</span><span style="color: rgba(0, 128, 128, 1)">128</span> <span style="color: rgba(0, 0, 0, 1)">        m_token.beginDOCTYPE(toASCIILower(character));
</span><span style="color: rgba(0, 128, 128, 1)">129</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_PAST_NON_NEWLINE_TO(DOCTYPENameState);
</span><span style="color: rgba(0, 128, 128, 1)">130</span> <span style="color: rgba(0, 0, 0, 1)">    END_STATE()
</span><span style="color: rgba(0, 128, 128, 1)">131</span> 
<span style="color: rgba(0, 128, 128, 1)">132</span> 
<span style="color: rgba(0, 128, 128, 1)">133</span> 
<span style="color: rgba(0, 128, 128, 1)">134</span> 
<span style="color: rgba(0, 128, 128, 1)">135</span> inline <span style="color: rgba(0, 0, 255, 1)">bool</span> HTMLTokenizer::emitAndResumeInDataState(SegmentedString&<span style="color: rgba(0, 0, 0, 1)"> source)
</span><span style="color: rgba(0, 128, 128, 1)">136</span> <span style="color: rgba(0, 0, 0, 1)">{
</span><span style="color: rgba(0, 128, 128, 1)">137</span> <span style="color: rgba(0, 0, 0, 1)">    saveEndTagNameIfNeeded();
</span><span style="color: rgba(0, 128, 128, 1)">138</span>     m_state = DataState; <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> reset to the initial state, DataState</span>
<span style="color: rgba(0, 128, 128, 1)">139</span>     source.advancePastNonNewline(); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> advance to the next character</span>
<span style="color: rgba(0, 128, 128, 1)">140</span>     <span style="color: rgba(0, 0, 255, 1)">return</span> <span style="color: rgba(0, 0, 255, 1)">true</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">141</span> }</pre>

<p><span style="font-family: "courier new", courier; font-size: 16px">The DOCTYPE token passes through six states before it is finally emitted. The whole process is shown in the figure below:</span></p>
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202202/489427-20220221010543397-1011636632.jpg" width="1000" height="500" style="display: block; margin-left: auto; margin-right: auto"></p>
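<p><span style="font-family: "courier new", courier; font-size: 16px">Once the token has been parsed, the tokenizer state is reset to DataState. Note that at this point the internal cursor of the character stream source points at the next character, '<'.</span></p>
<p><span style="font-family: "courier new", courier; font-size: 16px">Line 61 of the tokenizer listing above (the NotEnoughCharacters branch in MarkupDeclarationOpenState) covers the case where source cannot yet be matched against the full string "doctype". Why can this happen? Because DOM tree construction does not wait for decoding to finish and only then tokenize the complete character stream. As described in the decoding section earlier, bytes may be decoded while they are still arriving, and tokenization works the same way: as soon as a chunk of characters has been decoded, it is tokenized immediately. The resulting flow is shown in the figure below:</span></p>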
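<p><span style="font-family: "courier new", courier; font-size: 16px">The six-step walk can be reproduced with a small stand-alone sketch. It assumes the input starts with a bare DOCTYPE declaration that has no name, which is what the annotated comments in the listing imply. MiniState, continuesWithIgnoringCase and traceDoctype are hypothetical names for illustration, not WebKit API, and every branch unrelated to DOCTYPE is left out.</span></p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the tokenizer states named in the listing.
enum class MiniState { Data, TagOpen, MarkupDeclarationOpen, Doctype, BeforeDoctypeName };

// ASCII case-insensitive check that `text` continues with `lowercaseWord` at `pos`.
static bool continuesWithIgnoringCase(const std::string& text, std::size_t pos, const std::string& lowercaseWord)
{
    if (pos > text.size() || text.size() - pos < lowercaseWord.size())
        return false;
    for (std::size_t i = 0; i < lowercaseWord.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(text[pos + i])) != lowercaseWord[i])
            return false;
    }
    return true;
}

// Records every state visited while tokenizing one nameless DOCTYPE.
// On return, `cursor` is the index of the first character after the token.
std::vector<MiniState> traceDoctype(const std::string& input, std::size_t& cursor)
{
    std::vector<MiniState> visited;
    MiniState state = MiniState::Data;
    visited.push_back(state);
    cursor = 0;
    while (cursor < input.size()) {
        const char c = input[cursor];
        switch (state) {
        case MiniState::Data:
            if (c == '<') {
                state = MiniState::TagOpen;
                visited.push_back(state);
            }
            ++cursor; // consume the character in either case
            break;
        case MiniState::TagOpen:
            if (c != '!')
                return visited; // ordinary tags are out of scope here
            state = MiniState::MarkupDeclarationOpen;
            visited.push_back(state);
            ++cursor;
            break;
        case MiniState::MarkupDeclarationOpen:
            if (!continuesWithIgnoringCase(input, cursor, "doctype"))
                return visited; // comments and CDATA are out of scope here
            cursor += 7; // skip the seven letters of "doctype"
            state = MiniState::Doctype; // like SWITCH_TO: no extra advance
            visited.push_back(state);
            break;
        case MiniState::Doctype:
            if (std::isspace(static_cast<unsigned char>(c)))
                ++cursor; // like ADVANCE_TO
            state = MiniState::BeforeDoctypeName; // otherwise like RECONSUME_IN
            visited.push_back(state);
            break;
        case MiniState::BeforeDoctypeName:
            if (std::isspace(static_cast<unsigned char>(c))) {
                ++cursor;
                break;
            }
            if (c != '>')
                return visited; // a DOCTYPE name follows; out of scope here
            ++cursor; // move past '>'
            state = MiniState::Data; // the token is emitted and the state is reset
            visited.push_back(state);
            return visited;
        }
    }
    return visited;
}
```

<p><span style="font-family: "courier new", courier; font-size: 16px">Calling traceDoctype on a string that begins with the nameless DOCTYPE declaration followed by another tag yields six recorded states, starting and ending in Data, and leaves cursor on the '<' of the following tag.</span></p>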
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202202/489427-20220221011645495-1606865185.jpg" width="1000" height="500" style="display: block; margin-left: auto; margin-right: auto"></p>
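<p><span style="font-family: "courier new", courier; font-size: 16px">For this reason, the character stream handed to the tokenizer may be incomplete. The figure below shows how DOCTYPE is tokenized when the stream is incomplete:</span></p>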
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202202/489427-20220221013251814-1477152779.jpg" width="1000" height="500" style="display: block; margin-left: auto; margin-right: auto"></p>
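<p><span style="font-family: "courier new", courier; font-size: 16px">The above covered handling the DOCTYPE tag by alternating decode, tokenize, decode, tokenize. Logically, this is equivalent to decoding everything first and then tokenizing. The rest of this article therefore only discusses the decode-fully-then-tokenize case. The interleaved case is not hard to analyze as long as you track how the internal cursor of the source character stream moves.</span></p>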
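<p><span style="font-family: "courier new", courier; font-size: 16px">To make the incomplete-stream case concrete, here is a minimal sketch of matching a keyword against a buffer that is filled chunk by chunk. MatchResult, ChunkedSource and advancePastIgnoringCase are hypothetical names chosen for illustration. The sketch assumes that a match is only attempted once enough characters are buffered, and it is not a copy of how SegmentedString is implemented.</span></p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type mirroring the three outcomes discussed in the text.
enum class MatchResult { DidMatch, DidNotMatch, NotEnoughCharacters };

// Hypothetical stand-in for a character stream that grows as chunks are decoded.
struct ChunkedSource {
    std::string buffered; // decoded characters that have not been consumed yet
    void append(const std::string& chunk) { buffered += chunk; }
};

// Tries to consume `lowercaseWord` (ASCII case-insensitive) from the front of the source.
// Nothing is consumed unless the whole word matches.
MatchResult advancePastIgnoringCase(ChunkedSource& source, const std::string& lowercaseWord)
{
    if (source.buffered.size() < lowercaseWord.size())
        return MatchResult::NotEnoughCharacters; // caller saves its state and waits for more input
    for (std::size_t i = 0; i < lowercaseWord.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(source.buffered[i])) != lowercaseWord[i])
            return MatchResult::DidNotMatch;
    }
    source.buffered.erase(0, lowercaseWord.size());
    return MatchResult::DidMatch;
}
```

<p><span style="font-family: "courier new", courier; font-size: 16px">If the first decoded chunk ends in the middle of the keyword, the function reports NotEnoughCharacters and leaves the buffer untouched. Once the next chunk has been appended, the same call succeeds and leaves the cursor on the character after the keyword, which is why the interleaved flow ends in the same place as the fully decoded one.</span></p>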

<p><span style="font-family: "courier new", courier; font-size: 16px">Case 2: the html tag</span></p>
<p><span style="font-family: "courier new", courier; font-size: 16px">Tokenizing the html tag is similar to tokenizing DOCTYPE. The relevant code is as follows:</span></p>

<pre><span style="color: rgba(0, 128, 128, 1)"> 1</span> <span style="color: rgba(0, 0, 0, 1)">BEGIN_STATE(TagOpenState)
</span><span style="color: rgba(0, 128, 128, 1)"> 2</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">!</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 3</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_PAST_NON_NEWLINE_TO(MarkupDeclarationOpenState);
</span><span style="color: rgba(0, 128, 128, 1)"> 4</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">/</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 5</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_PAST_NON_NEWLINE_TO(EndTagOpenState);
</span><span style="color: rgba(0, 128, 128, 1)"> 6</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (isASCIIAlpha(character)) { <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> in the tag-open state the current character is 'h'</span>
<span style="color: rgba(0, 128, 128, 1)"> 7</span>         m_token.beginStartTag(convertASCIIAlphaToLower(character)); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> begin a start-tag token whose name starts with 'h'</span>
<span style="color: rgba(0, 128, 128, 1)"> 8</span>         ADVANCE_PAST_NON_NEWLINE_TO(TagNameState); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> switch to TagNameState and advance to the next character, 't'</span>
<span style="color: rgba(0, 128, 128, 1)"> 9</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)">10</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">?</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">) {
</span><span style="color: rgba(0, 128, 128, 1)">11</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)">12</span>         <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> The spec consumes the current character before switching
</span><span style="color: rgba(0, 128, 128, 1)">13</span>         <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> to the bogus comment state, but it's easier to implement
</span><span style="color: rgba(0, 128, 128, 1)">14</span>         <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> if we reconsume the current character.</span>
<span style="color: rgba(0, 128, 128, 1)">15</span> <span style="color: rgba(0, 0, 0, 1)">        RECONSUME_IN(BogusCommentState);
</span><span style="color: rgba(0, 128, 128, 1)">16</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)">17</span> <span style="color: rgba(0, 0, 0, 1)">    parseError();
</span><span style="color: rgba(0, 128, 128, 1)">18</span>     bufferASCIICharacter(<span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">);
</span><span style="color: rgba(0, 128, 128, 1)">19</span> <span style="color: rgba(0, 0, 0, 1)">    RECONSUME_IN(DataState);
</span><span style="color: rgba(0, 128, 128, 1)">20</span> <span style="color: rgba(0, 0, 0, 1)">END_STATE()
</span><span style="color: rgba(0, 128, 128, 1)">21</span> 
<span style="color: rgba(0, 128, 128, 1)">22</span> 
<span style="color: rgba(0, 128, 128, 1)">23</span> <span style="color: rgba(0, 0, 0, 1)">BEGIN_STATE(TagNameState)
</span><span style="color: rgba(0, 128, 128, 1)">24</span>     <span style="color: rgba(0, 0, 255, 1)">if</span><span style="color: rgba(0, 0, 0, 1)"> (isTokenizerWhitespace(character))
</span><span style="color: rgba(0, 128, 128, 1)">25</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_TO(BeforeAttributeNameState);
</span><span style="color: rgba(0, 128, 128, 1)">26</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">/</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)">27</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_PAST_NON_NEWLINE_TO(SelfClosingStartTagState);
</span><span style="color: rgba(0, 128, 128, 1)">28</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">></span><span style="color: rgba(128, 0, 0, 1)">'</span>) <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> the character that ends the start tag was seen in this state</span>
<span style="color: rgba(0, 128, 128, 1)">29</span>         <span style="color: rgba(0, 0, 255, 1)">return</span> emitAndResumeInDataState(source); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> this token is complete; reset the tokenizer state to DataState</span>
<span style="color: rgba(0, 128, 128, 1)">30</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (m_options.usePreHTML5ParserQuirks && character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)">31</span>         <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> emitAndReconsumeInDataState();
</span><span style="color: rgba(0, 128, 128, 1)">32</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character ==<span style="color: rgba(0, 0, 0, 1)"> kEndOfFileMarker) {
</span><span style="color: rgba(0, 128, 128, 1)">33</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)">34</span> <span style="color: rgba(0, 0, 0, 1)">        RECONSUME_IN(DataState);
</span><span style="color: rgba(0, 128, 128, 1)">35</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)">36</span>     m_token.appendToName(toASCIILower(character)); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> append the current character to the token name</span>
<span style="color: rgba(0, 128, 128, 1)">37</span>     ADVANCE_PAST_NON_NEWLINE_TO(TagNameState); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> stay in the current state and advance to the next character</span>
<span style="color: rgba(0, 128, 128, 1)">38</span> END_STATE()</pre>


<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202202/489427-20220221020631833-437265361.jpg" width="1000" height="500" style="display: block; margin-left: auto; margin-right: auto"></p>
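<p><span style="font-family: "courier new", courier; font-size: 16px">The TagOpenState and TagNameState steps above can be condensed into a short stand-alone sketch. readSimpleStartTag is a hypothetical helper, not WebKit API. It assumes a start tag with no attributes and simply gives up on whitespace, '/' or a truncated input, all of which the real tokenizer handles in further states.</span></p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Reads a start tag without attributes beginning at input[cursor].
// On success, stores the lowercased tag name in `name`, moves `cursor`
// past the closing '>' and returns true. On failure nothing is modified.
bool readSimpleStartTag(const std::string& input, std::size_t& cursor, std::string& name)
{
    std::size_t pos = cursor;
    if (pos >= input.size() || input[pos] != '<')
        return false;
    ++pos; // '<' moves the tokenizer from the data state to the tag-open state

    // Tag-open state: an ASCII letter begins the start-tag token.
    if (pos >= input.size() || !std::isalpha(static_cast<unsigned char>(input[pos])))
        return false;
    std::string collected(1, static_cast<char>(std::tolower(static_cast<unsigned char>(input[pos]))));
    ++pos;

    // Tag-name state: keep appending lowercased characters until '>' ends the tag.
    while (pos < input.size()) {
        const char c = input[pos];
        if (c == '>') {
            name = collected;
            cursor = pos + 1; // the next token starts right after '>'
            return true;
        }
        if (c == '/' || std::isspace(static_cast<unsigned char>(c)))
            return false; // attributes and self-closing tags are out of scope here
        collected.push_back(static_cast<char>(std::tolower(static_cast<unsigned char>(c))));
        ++pos;
    }
    return false; // input ended before '>' was seen
}
```

<p><span style="font-family: "courier new", courier; font-size: 16px">The name is lowercased character by character as it is collected, mirroring the convertASCIIAlphaToLower and toASCIILower calls in the listing, so an upper-case tag and a lower-case tag produce the same token name.</span></p>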

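<p><span style="font-family: "courier new", courier; font-size: 16px">Case 3: a div tag with attributes</span></p>
<p><span style="font-family: "courier new", courier; font-size: 16px">HTML tags can carry attributes. An attribute consists of a name and a value, and attributes are separated from each other and from the tag name by whitespace:</span></p>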

<pre><span style="color: rgba(0, 128, 128, 1)">1</span>  <span style="color: rgba(0, 128, 0, 1)"><!--</span><span style="color: rgba(0, 128, 0, 1)"> The div tag has two attributes, class and align; both values are quoted </span><span style="color: rgba(0, 128, 0, 1)">--></span>
<span style="color: rgba(0, 128, 128, 1)">2</span>  <span style="color: rgba(0, 0, 255, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">div </span><span style="color: rgba(255, 0, 0, 1)">class</span><span style="color: rgba(0, 0, 255, 1)">="news"</span><span style="color: rgba(255, 0, 0, 1)"> align</span><span style="color: rgba(0, 0, 255, 1)">="center"</span><span style="color: rgba(0, 0, 255, 1)">></span>Hello,World!<span style="color: rgba(0, 0, 255, 1)"></</span><span style="color: rgba(128, 0, 0, 1)">div</span><span style="color: rgba(0, 0, 255, 1)">></span>
<span style="color: rgba(0, 128, 128, 1)">3</span>  
<span style="color: rgba(0, 128, 128, 1)">4</span>  
<span style="color: rgba(0, 128, 128, 1)">5</span>  <span style="color: rgba(0, 128, 0, 1)"><!--</span><span style="color: rgba(0, 128, 0, 1)"> Attribute values may also be unquoted </span><span style="color: rgba(0, 128, 0, 1)">--></span>
<span style="color: rgba(0, 128, 128, 1)">6</span>  <span style="color: rgba(0, 0, 255, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">div </span><span style="color: rgba(255, 0, 0, 1)">class</span><span style="color: rgba(0, 0, 255, 1)">=news </span><span style="color: rgba(255, 0, 0, 1)">align</span><span style="color: rgba(0, 0, 255, 1)">=center</span><span style="color: rgba(0, 0, 255, 1)">></span>Hello,World!<span style="color: rgba(0, 0, 255, 1)"></</span><span style="color: rgba(128, 0, 0, 1)">div</span><span style="color: rgba(0, 0, 255, 1)">></span></pre>
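<p><span style="font-family: "courier new", courier; font-size: 16px">Both forms describe the same name/value pairs. The following sketch splits the attribute portion of a tag (the text between the tag name and the closing '>') into pairs. parseAttributes is a hypothetical helper written for this illustration. It is far simpler than the attribute states of the real tokenizer and ignores character references, duplicate attributes and error handling.</span></p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Splits text such as `class="news" align=center` into (name, value) pairs.
// Names are lowercased; values keep their case. Quotes around values are optional.
std::vector<std::pair<std::string, std::string>> parseAttributes(const std::string& text)
{
    std::vector<std::pair<std::string, std::string>> attributes;
    std::size_t i = 0;
    const std::size_t n = text.size();
    while (i < n) {
        // Whitespace separates attributes; skip any run of it.
        while (i < n && std::isspace(static_cast<unsigned char>(text[i])))
            ++i;
        if (i >= n)
            break;

        // The attribute name runs until '=' or whitespace.
        std::string name;
        while (i < n && text[i] != '=' && !std::isspace(static_cast<unsigned char>(text[i])))
            name.push_back(static_cast<char>(std::tolower(static_cast<unsigned char>(text[i++]))));

        // The value is optional and may be quoted or unquoted.
        std::string value;
        if (i < n && text[i] == '=') {
            ++i;
            if (i < n && (text[i] == '"' || text[i] == '\'')) {
                const char quote = text[i++];
                while (i < n && text[i] != quote)
                    value.push_back(text[i++]);
                if (i < n)
                    ++i; // skip the closing quote
            } else {
                while (i < n && !std::isspace(static_cast<unsigned char>(text[i])))
                    value.push_back(text[i++]);
            }
        }
        attributes.emplace_back(name, value);
    }
    return attributes;
}
```

<p><span style="font-family: "courier new", courier; font-size: 16px">Feeding the attribute text of either div example to parseAttributes produces the same two pairs: class with the value news and align with the value center.</span></p>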

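<p><span style="font-family: "courier new", courier; font-size: 16px">Within the div tag, the tag name div is parsed exactly like the html tag above. When a whitespace character is encountered while the tag name is being parsed, attribute parsing is about to begin. The relevant code is shown below:</span></p>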

<pre><span style="color: rgba(0, 128, 128, 1)">  1</span> <span style="color: rgba(0, 0, 0, 1)">BEGIN_STATE(TagNameState)
</span><span style="color: rgba(0, 128, 128, 1)">  2</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (isTokenizerWhitespace(character)) <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> whitespace while parsing the tag name marks the start of the attributes</span>
<span style="color: rgba(0, 128, 128, 1)">  3</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_TO(BeforeAttributeNameState);
</span><span style="color: rgba(0, 128, 128, 1)">  4</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">/</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)">  5</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_PAST_NON_NEWLINE_TO(SelfClosingStartTagState);
</span><span style="color: rgba(0, 128, 128, 1)">  6</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">></span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)">  7</span>         <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> emitAndResumeInDataState(source);
</span><span style="color: rgba(0, 128, 128, 1)">  8</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (m_options.usePreHTML5ParserQuirks && character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)">  9</span>         <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> emitAndReconsumeInDataState();
</span><span style="color: rgba(0, 128, 128, 1)"> 10</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character ==<span style="color: rgba(0, 0, 0, 1)"> kEndOfFileMarker) {
</span><span style="color: rgba(0, 128, 128, 1)"> 11</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)"> 12</span> <span style="color: rgba(0, 0, 0, 1)">        RECONSUME_IN(DataState);
</span><span style="color: rgba(0, 128, 128, 1)"> 13</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)"> 14</span> <span style="color: rgba(0, 0, 0, 1)">    m_token.appendToName(toASCIILower(character));
</span><span style="color: rgba(0, 128, 128, 1)"> 15</span> <span style="color: rgba(0, 0, 0, 1)">    ADVANCE_PAST_NON_NEWLINE_TO(TagNameState);
</span><span style="color: rgba(0, 128, 128, 1)"> 16</span> <span style="color: rgba(0, 0, 0, 1)">END_STATE()
</span><span style="color: rgba(0, 128, 128, 1)"> 17</span> 
<span style="color: rgba(0, 128, 128, 1)"> 18</span> <span style="color: rgba(0, 0, 255, 1)">#define</span> ADVANCE_TO(newState)                                    \
<span style="color: rgba(0, 128, 128, 1)"> 19</span>     <span style="color: rgba(0, 0, 255, 1)">do</span><span style="color: rgba(0, 0, 0, 1)"> {                                                        \
</span><span style="color: rgba(0, 128, 128, 1)"> 20</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (!m_preprocessor.advance(source, isNullCharacterSkippingState(newState))) { <span style="color: rgba(0, 128, 0, 1)">/* advance to the next character */</span> \
<span style="color: rgba(0, 128, 128, 1)"> 21</span>             m_state =<span style="color: rgba(0, 0, 0, 1)"> newState;                                 \
</span><span style="color: rgba(0, 128, 128, 1)"> 22</span>             <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> haveBufferedCharacterToken();                \
</span><span style="color: rgba(0, 128, 128, 1)"> 23</span> <span style="color: rgba(0, 0, 0, 1)">        }                                                       \
</span><span style="color: rgba(0, 128, 128, 1)"> 24</span>         character =<span style="color: rgba(0, 0, 0, 1)"> m_preprocessor.nextInputCharacter();        \
</span><span style="color: rgba(0, 128, 128, 1)"> 25</span>         <span style="color: rgba(0, 0, 255, 1)">goto</span> newState; <span style="color: rgba(0, 128, 0, 1)">/* jump to the given state */</span> \
<span style="color: rgba(0, 128, 128, 1)"> 26</span>     } <span style="color: rgba(0, 0, 255, 1)">while</span> (<span style="color: rgba(0, 0, 255, 1)">false</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 27</span> 
<span style="color: rgba(0, 128, 128, 1)"> 28</span> 
<span style="color: rgba(0, 128, 128, 1)"> 29</span> <span style="color: rgba(0, 0, 0, 1)">BEGIN_STATE(BeforeAttributeNameState)
</span><span style="color: rgba(0, 128, 128, 1)"> 30</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (isTokenizerWhitespace(character)) <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Consecutive whitespace after the tag name is skipped by looping in the current state</span>
<span style="color: rgba(0, 128, 128, 1)"> 31</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_TO(BeforeAttributeNameState);
</span><span style="color: rgba(0, 128, 128, 1)"> 32</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">/</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 33</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_PAST_NON_NEWLINE_TO(SelfClosingStartTagState);
</span><span style="color: rgba(0, 128, 128, 1)"> 34</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">></span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 35</span>         <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> emitAndResumeInDataState(source);
</span><span style="color: rgba(0, 128, 128, 1)"> 36</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (m_options.usePreHTML5ParserQuirks && character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 37</span>         <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> emitAndReconsumeInDataState();
</span><span style="color: rgba(0, 128, 128, 1)"> 38</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character ==<span style="color: rgba(0, 0, 0, 1)"> kEndOfFileMarker) {
</span><span style="color: rgba(0, 128, 128, 1)"> 39</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)"> 40</span> <span style="color: rgba(0, 0, 0, 1)">        RECONSUME_IN(DataState);
</span><span style="color: rgba(0, 128, 128, 1)"> 41</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)"> 42</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">'</span> || character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\'</span><span style="color: rgba(128, 0, 0, 1)">'</span> || character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">'</span> || character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">=</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 43</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)"> 44</span>     m_token.beginAttribute(source.numberOfCharactersConsumed()); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Add an entry to the Token's attribute list to hold the new attribute name and value</span>
<span style="color: rgba(0, 128, 128, 1)"> 45</span>     m_token.appendToAttributeName(toASCIILower(character)); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Append the character to the attribute name</span>
<span style="color: rgba(0, 128, 128, 1)"> 46</span>     ADVANCE_PAST_NON_NEWLINE_TO(AttributeNameState); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Move to the next character and jump to AttributeNameState</span>
<span style="color: rgba(0, 128, 128, 1)"> 47</span> <span style="color: rgba(0, 0, 0, 1)">END_STATE()
</span><span style="color: rgba(0, 128, 128, 1)"> 48</span> 
<span style="color: rgba(0, 128, 128, 1)"> 49</span> 
<span style="color: rgba(0, 128, 128, 1)"> 50</span> <span style="color: rgba(0, 0, 0, 1)">BEGIN_STATE(AttributeNameState)
</span><span style="color: rgba(0, 128, 128, 1)"> 51</span>     <span style="color: rgba(0, 0, 255, 1)">if</span><span style="color: rgba(0, 0, 0, 1)"> (isTokenizerWhitespace(character))
</span><span style="color: rgba(0, 128, 128, 1)"> 52</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_TO(AfterAttributeNameState);
</span><span style="color: rgba(0, 128, 128, 1)"> 53</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">/</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 54</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_PAST_NON_NEWLINE_TO(SelfClosingStartTagState);
</span><span style="color: rgba(0, 128, 128, 1)"> 55</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">=</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 56</span>         ADVANCE_PAST_NON_NEWLINE_TO(BeforeAttributeValueState); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> An '=' met while parsing the attribute name means the name has ended and the value is about to begin</span>
<span style="color: rgba(0, 128, 128, 1)"> 57</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">></span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 58</span>         <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> emitAndResumeInDataState(source);
</span><span style="color: rgba(0, 128, 128, 1)"> 59</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (m_options.usePreHTML5ParserQuirks && character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 60</span>         <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> emitAndReconsumeInDataState();
</span><span style="color: rgba(0, 128, 128, 1)"> 61</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character ==<span style="color: rgba(0, 0, 0, 1)"> kEndOfFileMarker) {
</span><span style="color: rgba(0, 128, 128, 1)"> 62</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)"> 63</span> <span style="color: rgba(0, 0, 0, 1)">        RECONSUME_IN(DataState);
</span><span style="color: rgba(0, 128, 128, 1)"> 64</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)"> 65</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">'</span> || character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\'</span><span style="color: rgba(128, 0, 0, 1)">'</span> || character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">'</span> || character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">=</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 66</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)"> 67</span> <span style="color: rgba(0, 0, 0, 1)">    m_token.appendToAttributeName(toASCIILower(character));
</span><span style="color: rgba(0, 128, 128, 1)"> 68</span> <span style="color: rgba(0, 0, 0, 1)">    ADVANCE_PAST_NON_NEWLINE_TO(AttributeNameState);
</span><span style="color: rgba(0, 128, 128, 1)"> 69</span> <span style="color: rgba(0, 0, 0, 1)">END_STATE()
</span><span style="color: rgba(0, 128, 128, 1)"> 70</span> 
<span style="color: rgba(0, 128, 128, 1)"> 71</span> 
<span style="color: rgba(0, 128, 128, 1)"> 72</span> <span style="color: rgba(0, 0, 0, 1)">BEGIN_STATE(BeforeAttributeValueState)
</span><span style="color: rgba(0, 128, 128, 1)"> 73</span>     <span style="color: rgba(0, 0, 255, 1)">if</span><span style="color: rgba(0, 0, 0, 1)"> (isTokenizerWhitespace(character))
</span><span style="color: rgba(0, 128, 128, 1)"> 74</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_TO(BeforeAttributeValueState);
</span><span style="color: rgba(0, 128, 128, 1)"> 75</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 76</span>         ADVANCE_PAST_NON_NEWLINE_TO(AttributeValueDoubleQuotedState); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Some attribute values are wrapped in quotes; move to the next character and jump to AttributeValueDoubleQuotedState</span>
<span style="color: rgba(0, 128, 128, 1)"> 77</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">&</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 78</span> <span style="color: rgba(0, 0, 0, 1)">        RECONSUME_IN(AttributeValueUnquotedState);
</span><span style="color: rgba(0, 128, 128, 1)"> 79</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\'</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 80</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_PAST_NON_NEWLINE_TO(AttributeValueSingleQuotedState);
</span><span style="color: rgba(0, 128, 128, 1)"> 81</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">></span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">) {
</span><span style="color: rgba(0, 128, 128, 1)"> 82</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)"> 83</span>         <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> emitAndResumeInDataState(source);
</span><span style="color: rgba(0, 128, 128, 1)"> 84</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)"> 85</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character ==<span style="color: rgba(0, 0, 0, 1)"> kEndOfFileMarker) {
</span><span style="color: rgba(0, 128, 128, 1)"> 86</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)"> 87</span> <span style="color: rgba(0, 0, 0, 1)">        RECONSUME_IN(DataState);
</span><span style="color: rgba(0, 128, 128, 1)"> 88</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)"> 89</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">'</span> || character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">=</span><span style="color: rgba(128, 0, 0, 1)">'</span> || character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">`</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)"> 90</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)"> 91</span>     m_token.appendToAttributeValue(character); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Some attribute values are not quoted; append the value character to the Token</span>
<span style="color: rgba(0, 128, 128, 1)"> 92</span>     ADVANCE_PAST_NON_NEWLINE_TO(AttributeValueUnquotedState); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Move to the next character and jump to AttributeValueUnquotedState</span>
<span style="color: rgba(0, 128, 128, 1)"> 93</span> <span style="color: rgba(0, 0, 0, 1)">END_STATE()
</span><span style="color: rgba(0, 128, 128, 1)"> 94</span> 
<span style="color: rgba(0, 128, 128, 1)"> 95</span> <span style="color: rgba(0, 0, 0, 1)">BEGIN_STATE(AttributeValueDoubleQuotedState)
</span><span style="color: rgba(0, 128, 128, 1)"> 96</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">'</span>) { <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> A double quote met in this state marks the end of the attribute value</span>
<span style="color: rgba(0, 128, 128, 1)"> 97</span>         m_token.endAttribute(source.numberOfCharactersConsumed()); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Finish parsing this attribute</span>
<span style="color: rgba(0, 128, 128, 1)"> 98</span>         ADVANCE_PAST_NON_NEWLINE_TO(AfterAttributeValueQuotedState); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Move to the next character and jump to AfterAttributeValueQuotedState</span>
<span style="color: rgba(0, 128, 128, 1)"> 99</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)">100</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">&</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">) {
</span><span style="color: rgba(0, 128, 128, 1)">101</span>         m_additionalAllowedCharacter = <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">102</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_PAST_NON_NEWLINE_TO(CharacterReferenceInAttributeValueState);
</span><span style="color: rgba(0, 128, 128, 1)">103</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)">104</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character ==<span style="color: rgba(0, 0, 0, 1)"> kEndOfFileMarker) {
</span><span style="color: rgba(0, 128, 128, 1)">105</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)">106</span> <span style="color: rgba(0, 0, 0, 1)">        m_token.endAttribute(source.numberOfCharactersConsumed());
</span><span style="color: rgba(0, 128, 128, 1)">107</span> <span style="color: rgba(0, 0, 0, 1)">        RECONSUME_IN(DataState);
</span><span style="color: rgba(0, 128, 128, 1)">108</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)">109</span>     m_token.appendToAttributeValue(character); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Append the attribute value character to the Token</span>
<span style="color: rgba(0, 128, 128, 1)">110</span>     ADVANCE_TO(AttributeValueDoubleQuotedState); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Move to the next character and loop back into the current state</span>
<span style="color: rgba(0, 128, 128, 1)">111</span> <span style="color: rgba(0, 0, 0, 1)">END_STATE()
</span><span style="color: rgba(0, 128, 128, 1)">112</span> 
<span style="color: rgba(0, 128, 128, 1)">113</span> 
<span style="color: rgba(0, 128, 128, 1)">114</span> <span style="color: rgba(0, 0, 0, 1)">BEGIN_STATE(AfterAttributeValueQuotedState)
</span><span style="color: rgba(0, 128, 128, 1)">115</span>     <span style="color: rgba(0, 0, 255, 1)">if</span><span style="color: rgba(0, 0, 0, 1)"> (isTokenizerWhitespace(character))
</span><span style="color: rgba(0, 128, 128, 1)">116</span>         ADVANCE_TO(BeforeAttributeNameState); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> The attribute value is complete; whitespace after it means more attributes may follow, so jump back to BeforeAttributeNameState</span>
<span style="color: rgba(0, 128, 128, 1)">117</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">/</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)">118</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_PAST_NON_NEWLINE_TO(SelfClosingStartTagState);
</span><span style="color: rgba(0, 128, 128, 1)">119</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">></span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)">120</span>         <span style="color: rgba(0, 0, 255, 1)">return</span> emitAndResumeInDataState(source); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> The attribute value is complete; a '>' means the whole tag is finished, so emit the current tag, reset the tokenizer state to DataState, and move to the next character</span>
<span style="color: rgba(0, 128, 128, 1)">121</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (m_options.usePreHTML5ParserQuirks && character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)">122</span>         <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> emitAndReconsumeInDataState();
</span><span style="color: rgba(0, 128, 128, 1)">123</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character ==<span style="color: rgba(0, 0, 0, 1)"> kEndOfFileMarker) {
</span><span style="color: rgba(0, 128, 128, 1)">124</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)">125</span> <span style="color: rgba(0, 0, 0, 1)">        RECONSUME_IN(DataState);
</span><span style="color: rgba(0, 128, 128, 1)">126</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)">127</span> <span style="color: rgba(0, 0, 0, 1)">    parseError();
</span><span style="color: rgba(0, 128, 128, 1)">128</span> <span style="color: rgba(0, 0, 0, 1)">    RECONSUME_IN(BeforeAttributeNameState);
</span><span style="color: rgba(0, 128, 128, 1)">129</span> <span style="color: rgba(0, 0, 0, 1)">END_STATE()
</span><span style="color: rgba(0, 128, 128, 1)">130</span> 
<span style="color: rgba(0, 128, 128, 1)">131</span> <span style="color: rgba(0, 0, 0, 1)">BEGIN_STATE(AttributeValueUnquotedState)
</span><span style="color: rgba(0, 128, 128, 1)">132</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (isTokenizerWhitespace(character)) { <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Whitespace met while parsing an unquoted value ends the current attribute (unlike a quoted value, which may contain whitespace); more attributes may follow, so move to the next character and jump to BeforeAttributeNameState</span>
<span style="color: rgba(0, 128, 128, 1)">133</span> <span style="color: rgba(0, 0, 0, 1)">        m_token.endAttribute(source.numberOfCharactersConsumed());
</span><span style="color: rgba(0, 128, 128, 1)">134</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_TO(BeforeAttributeNameState);
</span><span style="color: rgba(0, 128, 128, 1)">135</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)">136</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">&</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">) {
</span><span style="color: rgba(0, 128, 128, 1)">137</span>         m_additionalAllowedCharacter = <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">></span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">138</span> <span style="color: rgba(0, 0, 0, 1)">        ADVANCE_PAST_NON_NEWLINE_TO(CharacterReferenceInAttributeValueState);
</span><span style="color: rgba(0, 128, 128, 1)">139</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)">140</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">></span><span style="color: rgba(128, 0, 0, 1)">'</span>) { <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> A '>' met during parsing means the whole tag is finished, so emit the current tag, reset the tokenizer state to DataState, and move to the next character</span>
<span style="color: rgba(0, 128, 128, 1)">141</span> <span style="color: rgba(0, 0, 0, 1)">        m_token.endAttribute(source.numberOfCharactersConsumed());
</span><span style="color: rgba(0, 128, 128, 1)">142</span>         <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> emitAndResumeInDataState(source);
</span><span style="color: rgba(0, 128, 128, 1)">143</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)">144</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character ==<span style="color: rgba(0, 0, 0, 1)"> kEndOfFileMarker) {
</span><span style="color: rgba(0, 128, 128, 1)">145</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)">146</span> <span style="color: rgba(0, 0, 0, 1)">        m_token.endAttribute(source.numberOfCharactersConsumed());
</span><span style="color: rgba(0, 128, 128, 1)">147</span> <span style="color: rgba(0, 0, 0, 1)">        RECONSUME_IN(DataState);
</span><span style="color: rgba(0, 128, 128, 1)">148</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)">149</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">"</span><span style="color: rgba(128, 0, 0, 1)">'</span> || character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">\'</span><span style="color: rgba(128, 0, 0, 1)">'</span> || character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">'</span> || character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">=</span><span style="color: rgba(128, 0, 0, 1)">'</span> || character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">`</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 128, 128, 1)">150</span> <span style="color: rgba(0, 0, 0, 1)">        parseError();
</span><span style="color: rgba(0, 128, 128, 1)">151</span>     m_token.appendToAttributeValue(character); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Append the attribute value character to the Token</span>
<span style="color: rgba(0, 128, 128, 1)">152</span>     ADVANCE_PAST_NON_NEWLINE_TO(AttributeValueUnquotedState); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Move to the next character and loop back into the current state</span>
<span style="color: rgba(0, 128, 128, 1)">153</span> END_STATE()</pre>

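<p><span style="font-family: "courier new", courier; font-size: 16px">As the code shows, quoted and unquoted attribute values are parsed differently. A quoted attribute value may contain whitespace. An unquoted value ends at the first whitespace character, and the tokenizer then moves on to parse the next attribute.</span></p>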
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202202/489427-20220221152019235-1378834961.jpg" width="1000" height="500" style="display: block; margin-left: auto; margin-right: auto"></p>
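
<p><span style="font-family: "courier new", courier; font-size: 16px">The difference between quoted and unquoted values can be sketched with a much smaller state machine. This is a minimal illustration, not WebKit code: the State enum, the Attribute alias and the parseAttributes function are hypothetical names, and single quotes, character references, '/' handling and parse errors are left out on purpose.</span></p>

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified states; they only loosely mirror the WebKit states above.
enum class State { BeforeName, Name, BeforeValue, DoubleQuoted, Unquoted };

using Attribute = std::pair<std::string, std::string>; // (name, value)

// Parses the text between a tag name and the closing '>' into attributes.
std::vector<Attribute> parseAttributes(const std::string& input)
{
    std::vector<Attribute> attributes;
    State state = State::BeforeName;
    for (char c : input) {
        const unsigned char uc = static_cast<unsigned char>(c);
        const bool isSpace = std::isspace(uc) != 0;
        switch (state) {
        case State::BeforeName:
            if (isSpace)
                break; // skip whitespace between attributes
            attributes.emplace_back(); // start a new attribute entry
            attributes.back().first += static_cast<char>(std::tolower(uc));
            state = State::Name;
            break;
        case State::Name:
            assert(!attributes.empty());
            if (c == '=')
                state = State::BeforeValue; // the name has ended
            else if (isSpace)
                state = State::BeforeName; // simplification: attribute without a value
            else
                attributes.back().first += static_cast<char>(std::tolower(uc));
            break;
        case State::BeforeValue:
            if (isSpace)
                break;
            if (c == '"') {
                state = State::DoubleQuoted;
            } else {
                attributes.back().second += c; // first character of an unquoted value
                state = State::Unquoted;
            }
            break;
        case State::DoubleQuoted:
            if (c == '"')
                state = State::BeforeName; // the closing quote ends the value
            else
                attributes.back().second += c; // whitespace is kept inside quotes
            break;
        case State::Unquoted:
            if (isSpace)
                state = State::BeforeName; // whitespace ends an unquoted value
            else
                attributes.back().second += c;
            break;
        }
    }
    return attributes;
}
```

<p><span style="font-family: "courier new", courier; font-size: 16px">With this sketch, title="Hello World" yields one attribute whose value contains a space, while title=Hello World yields the attribute title with value Hello followed by a second, valueless attribute named world.</span></p>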

<p><span style="font-family: "courier new", courier; font-size: 16px">Case 4: plain-text parsing</span></p>
<p><span style="font-family: "courier new", courier; font-size: 16px">Plain text here means any text between a start tag and an end tag, including script text, CSS text, and so on, as shown below:</span></p>

<pre><span style="color: rgba(0, 128, 0, 1)"><!--</span><span style="color: rgba(0, 128, 0, 1)"> Plain text inside the div tag: Hello,World! </span><span style="color: rgba(0, 128, 0, 1)">--></span>
<span style="color: rgba(0, 0, 255, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">div </span><span style="color: rgba(255, 0, 0, 1)">class</span><span style="color: rgba(0, 0, 255, 1)">=news </span><span style="color: rgba(255, 0, 0, 1)">align</span><span style="color: rgba(0, 0, 255, 1)">=center</span><span style="color: rgba(0, 0, 255, 1)">></span>Hello,World!<span style="color: rgba(0, 0, 255, 1)"></</span><span style="color: rgba(128, 0, 0, 1)">div</span><span style="color: rgba(0, 0, 255, 1)">></span>


<span style="color: rgba(0, 128, 0, 1)"><!--</span><span style="color: rgba(0, 128, 0, 1)"> Plain text inside the script tag: window.name = 'Lucy'; </span><span style="color: rgba(0, 128, 0, 1)">--></span>
<span style="color: rgba(0, 0, 255, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">script</span><span style="color: rgba(0, 0, 255, 1)">></span><span style="background-color: rgba(245, 245, 245, 1); color: rgba(0, 0, 0, 1)">window.name </span><span style="background-color: rgba(245, 245, 245, 1); color: rgba(0, 0, 0, 1)">=</span> <span style="background-color: rgba(245, 245, 245, 1); color: rgba(0, 0, 0, 1)">'</span><span style="background-color: rgba(245, 245, 245, 1); color: rgba(0, 0, 0, 1)">Lucy</span><span style="background-color: rgba(245, 245, 245, 1); color: rgba(0, 0, 0, 1)">'</span><span style="background-color: rgba(245, 245, 245, 1); color: rgba(0, 0, 0, 1)">;</span><span style="color: rgba(0, 0, 255, 1)"></</span><span style="color: rgba(128, 0, 0, 1)">script</span><span style="color: rgba(0, 0, 255, 1)">></span></pre>

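<p><span style="font-family: "courier new", courier; font-size: 16px">Parsing plain text is fairly simple: the tokenizer keeps looping in DataState and buffers each character it meets until it encounters a '<' character, such as the one that begins an end tag. The relevant code is as follows:</span></p>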

<pre><span style="color: rgba(0, 0, 0, 1)">BEGIN_STATE(DataState)
</span>    <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)">&</span><span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(0, 0, 0, 1)">)
</span><span style="color: rgba(0, 0, 0, 1)">        ADVANCE_PAST_NON_NEWLINE_TO(CharacterReferenceInDataState);
</span>    <span style="color: rgba(0, 0, 255, 1)">if</span> (character == <span style="color: rgba(128, 0, 0, 1)">'</span><span style="color: rgba(128, 0, 0, 1)"><</span><span style="color: rgba(128, 0, 0, 1)">'</span>) { <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> A '<' met while parsing text is handled in one of two ways</span>
        <span style="color: rgba(0, 0, 255, 1)">if</span> (haveBufferedCharacterToken()) <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Case 1: text characters have been buffered, so return in the current DataState without advancing; the next tokenization pass will read the same '<' again</span>
            RETURN_IN_CURRENT_STATE(<span style="color: rgba(0, 0, 255, 1)">true</span><span style="color: rgba(0, 0, 0, 1)">);
</span>        ADVANCE_PAST_NON_NEWLINE_TO(TagOpenState); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Case 2: no text characters have been buffered, so move to the next character and enter TagOpenState to start parsing the tag</span>
<span style="color: rgba(0, 0, 0, 1)">    }
</span>    <span style="color: rgba(0, 0, 255, 1)">if</span> (character ==<span style="color: rgba(0, 0, 0, 1)"> kEndOfFileMarker)
</span>        <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)"> emitEndOfFile(source);
</span>    bufferCharacter(character); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Buffer the character just read</span>
    ADVANCE_TO(DataState); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Move to the next character and loop back into DataState</span>
END_STATE()</pre>

<p><span style="font-family: "courier new", courier; font-size: 16px">Because the flow is simple, only the result of parsing the plain text inside the div tag is shown below:</span></p>
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202202/489427-20220221153754382-1283643926.jpg" width="500" height="200" style="display: block; margin-left: auto; margin-right: auto"></p>
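
<p><span style="font-family: "courier new", courier; font-size: 16px">The buffering behaviour of DataState can be sketched as follows. This is a simplified illustration under stated assumptions, not the WebKit implementation: Token and nextToken are hypothetical names, a tag is treated as everything up to the next '>', and character references are ignored. The point it shows is that when text has already been buffered, the '<' is not consumed, so the next call starts at that same '<'.</span></p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical token type for this sketch.
struct Token {
    enum class Type { Character, Tag, EndOfFile };
    Type type;
    std::string data;
};

// Returns the next token starting at `pos` and advances `pos` past what was consumed.
Token nextToken(const std::string& input, std::size_t& pos)
{
    assert(pos <= input.size());
    std::string buffer; // plays the role of the buffered character token
    while (pos < input.size()) {
        const char c = input[pos];
        if (c == '<') {
            // Case 1: text was buffered, so return it and leave '<' unconsumed.
            if (!buffer.empty())
                return { Token::Type::Character, buffer };
            // Case 2: nothing buffered, so consume the tag (simplified: up to '>').
            const std::size_t close = input.find('>', pos);
            const std::size_t end = (close == std::string::npos) ? input.size() : close + 1;
            std::string tag = input.substr(pos, end - pos);
            pos = end;
            return { Token::Type::Tag, tag };
        }
        buffer += c; // buffer the character and move on
        ++pos;
    }
    if (!buffer.empty())
        return { Token::Type::Character, buffer };
    return { Token::Type::EndOfFile, std::string() };
}
```

<p><span style="font-family: "courier new", courier; font-size: 16px">Calling nextToken repeatedly on the div example produces a tag token, then one character token holding the whole text, then the end tag token, then end of file.</span></p>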

<p><span style="font-family: "courier new", courier; font-size: 16px"><strong>Creating and Adding Nodes</strong></span></p>
<p><span style="font-family: "courier new", courier; font-size: 16px">1 Related class diagram</span></p>
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202202/489427-20220221182513301-182191441.jpg" width="1000" height="500" style="display: block; margin-left: auto; margin-right: auto"></p>

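<p><span style="font-family: "courier new", courier; font-size: 16px">2 Creation and insertion flow</span></p>
<p><span style="font-family: "courier new", courier; font-size: 16px">In the tokenization loop above, each time a Token is produced, a corresponding Node is created from it and then added to the DOM tree (the HTMLDocumentParser::pumpTokenizerLoop method is introduced in the tokenization part above).</span></p>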
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202202/489427-20220221180254345-900914304.png" width="1000" height="500" loading="lazy" style="display: block; margin-left: auto; margin-right: auto"></p>
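
<p><span style="font-family: "courier new", courier; font-size: 16px">The idea of turning each Token into a Node and attaching it to the tree can be sketched as below. This is a deliberately small model, not how WebKit does it: Token, Node and buildTree are hypothetical names, and the sketch assumes well-formed input and tracks the current parent with a simple stack of open elements.</span></p>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token and node types for this sketch.
struct Token {
    enum class Type { StartTag, EndTag, Character };
    Type type;
    std::string data; // tag name, or the text of a character token
};

struct Node {
    std::string name; // "#document", "#text", or a tag name
    std::string text; // only used by text nodes
    std::vector<std::unique_ptr<Node>> children;
};

// Builds a tree by creating one node per start tag or character token.
std::unique_ptr<Node> buildTree(const std::vector<Token>& tokens)
{
    std::unique_ptr<Node> document(new Node());
    document->name = "#document";
    std::vector<Node*> openElements { document.get() }; // the last entry is the current parent

    for (const Token& token : tokens) {
        assert(!openElements.empty());
        Node* parent = openElements.back();
        switch (token.type) {
        case Token::Type::StartTag: {
            std::unique_ptr<Node> element(new Node());
            element->name = token.data;
            Node* raw = element.get();
            parent->children.push_back(std::move(element)); // add the node to the tree
            openElements.push_back(raw); // later tokens become its children
            break;
        }
        case Token::Type::Character: {
            std::unique_ptr<Node> textNode(new Node());
            textNode->name = "#text";
            textNode->text = token.data;
            parent->children.push_back(std::move(textNode));
            break;
        }
        case Token::Type::EndTag:
            if (openElements.size() > 1)
                openElements.pop_back(); // close the current element, never the document
            break;
        }
    }
    return document;
}
```

<p><span style="font-family: "courier new", courier; font-size: 16px">For the div example, the three tokens (start tag, text, end tag) produce a document node with one div child, which in turn has one text child.</span></p>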


<p><span style="font-family: "courier new", courier; font-size: 16px">Among the methods in the flow shown above, look first at HTMLTreeBuilder::constructTree. Its code is as follows:</span></p>

<pre><span style="color: rgba(0, 128, 128, 1)"> 1</span> <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Only the key code is kept</span>
<span style="color: rgba(0, 128, 128, 1)"> 2</span> <span style="color: rgba(0, 0, 255, 1)">void</span> HTMLTreeBuilder::constructTree(AtomHTMLToken&&<span style="color: rgba(0, 0, 0, 1)"> token)
</span><span style="color: rgba(0, 128, 128, 1)"> 3</span> <span style="color: rgba(0, 0, 0, 1)">{
</span><span style="color: rgba(0, 128, 128, 1)"> 4</span> <span style="color: rgba(0, 0, 0, 1)">    ...
</span><span style="color: rgba(0, 128, 128, 1)"> 5</span> 
<span style="color: rgba(0, 128, 128, 1)"> 6</span>     <span style="color: rgba(0, 0, 255, 1)">if</span><span style="color: rgba(0, 0, 0, 1)"> (shouldProcessTokenInForeignContent(token))
</span><span style="color: rgba(0, 128, 128, 1)"> 7</span> <span style="color: rgba(0, 0, 0, 1)">        processTokenInForeignContent(WTFMove(token));
</span><span style="color: rgba(0, 128, 128, 1)"> 8</span>     <span style="color: rgba(0, 0, 255, 1)">else</span>
<span style="color: rgba(0, 128, 128, 1)"> 9</span>         processToken(WTFMove(token)); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> HTMLToken在这里被处理</span>
<span style="color: rgba(0, 128, 128, 1)">10</span> 
<span style="color: rgba(0, 128, 128, 1)">11</span> <span style="color: rgba(0, 0, 0, 1)">    ...
</span><span style="color: rgba(0, 128, 128, 1)">12</span> 
<span style="color: rgba(0, 128, 128, 1)">13</span>     m_tree.executeQueuedTasks(); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> HTMLContructionSiteTask在这里被执行,有时候也直接在创建的过程中直接执行,然后这个方法发现队列为空就会直接返回
</span><span style="color: rgba(0, 128, 128, 1)">14</span>     <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> The tree builder might have been destroyed as an indirect result of executing the queued tasks.</span>
<span style="color: rgba(0, 128, 128, 1)">15</span> <span style="color: rgba(0, 0, 0, 1)">}
</span><span style="color: rgba(0, 128, 128, 1)">16</span> 
<span style="color: rgba(0, 128, 128, 1)">17</span> 
<span style="color: rgba(0, 128, 128, 1)">18</span> <span style="color: rgba(0, 0, 255, 1)">void</span><span style="color: rgba(0, 0, 0, 1)"> HTMLConstructionSite::executeQueuedTasks()
</span><span style="color: rgba(0, 128, 128, 1)">19</span> <span style="color: rgba(0, 0, 0, 1)">{
</span><span style="color: rgba(0, 128, 128, 1)">20</span>     <span style="color: rgba(0, 0, 255, 1)">if</span> (m_taskQueue.isEmpty()) <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> 队列为空,就直接返回</span>
<span style="color: rgba(0, 128, 128, 1)">21</span>         <span style="color: rgba(0, 0, 255, 1)">return</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">22</span> 
<span style="color: rgba(0, 128, 128, 1)">23</span>     <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Copy the task queue into a local variable in case executeTask
</span><span style="color: rgba(0, 128, 128, 1)">24</span>     <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> re-enters the parser.</span>
<span style="color: rgba(0, 128, 128, 1)">25</span>     TaskQueue queue =<span style="color: rgba(0, 0, 0, 1)"> WTFMove(m_taskQueue);
</span><span style="color: rgba(0, 128, 128, 1)">26</span> 
<span style="color: rgba(0, 128, 128, 1)">27</span>     <span style="color: rgba(0, 0, 255, 1)">for</span> (auto& task : queue) <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> 这里的task就是HTMLContructionSiteTask</span>
<span style="color: rgba(0, 128, 128, 1)">28</span>         executeTask(task); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> 执行task
</span><span style="color: rgba(0, 128, 128, 1)">29</span> 
<span style="color: rgba(0, 128, 128, 1)">30</span>     <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> We might be detached now.</span>
<span style="color: rgba(0, 128, 128, 1)">31</span> }</pre>
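<p>The "move the queue into a local, then iterate" pattern in executeQueuedTasks can be illustrated in isolation. The following is only a minimal sketch in standard C++, not WebKit code: the class DeferredQueue and its members are hypothetical names, and std::function stands in for HTMLConstructionSiteTask.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the task queue owned by HTMLConstructionSite.
class DeferredQueue {
public:
    using Task = std::function<void()>;

    void enqueue(Task task)
    {
        assert(task && "task must be callable");
        m_tasks.push_back(std::move(task));
    }

    // Runs every task that was pending when the call started and
    // returns how many tasks this call executed.
    std::size_t executeQueuedTasks()
    {
        if (m_tasks.empty())
            return 0;

        // Take the pending tasks into a local variable, so a task that
        // enqueues more work does not disturb the loop below.
        std::vector<Task> queue = std::move(m_tasks);
        m_tasks.clear(); // make the moved-from member explicitly empty

        for (auto& task : queue)
            task();
        return queue.size();
    }

    std::size_t pending() const { return m_tasks.size(); }

private:
    std::vector<Task> m_tasks;
};
```

<p>In this sketch, a task that enqueues further work during execution leaves that work in the member queue for the next call, which is one plausible reading of why the original moves m_taskQueue into a local before looping.</p>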

<p><span style="font-family: "courier new", courier; font-size: 16px">In the code above, HTMLTreeBuilder::processToken is where a Token is processed to produce the corresponding Node:</span></p>

<pre>void HTMLTreeBuilder::processToken(AtomHTMLToken&amp;&amp; token)
{
    switch (token.type()) {
    case HTMLToken::Uninitialized:
        ASSERT_NOT_REACHED();
        break;
    case HTMLToken::DOCTYPE: // The DOCTYPE declaration in the HTML
        m_shouldSkipLeadingNewline = false;
        processDoctypeToken(WTFMove(token));
        break;
    case HTMLToken::StartTag: // An HTML start tag
        m_shouldSkipLeadingNewline = false;
        processStartTag(WTFMove(token));
        break;
    case HTMLToken::EndTag: // An HTML end tag
        m_shouldSkipLeadingNewline = false;
        processEndTag(WTFMove(token));
        break;
    case HTMLToken::Comment: // A comment in the HTML
        m_shouldSkipLeadingNewline = false;
        processComment(WTFMove(token));
        return;
    case HTMLToken::Character: // Plain text in the HTML
        processCharacter(WTFMove(token));
        break;
    case HTMLToken::EndOfFile: // End-of-file marker
        m_shouldSkipLeadingNewline = false;
        processEndOfFile(WTFMove(token));
        break;
    }
}</pre>

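<p><span style="font-family: "courier new", courier; font-size: 16px">The code above switches over 7 Token types (Uninitialized is only asserted to be unreachable). Because they are all handled in a similar way, only three cases are walked through here: the DOCTYPE tag, the html tag, and the text of the title tag. The remaining steps are shown as diagrams.</span></p>
<p><span style="font-family: "courier new", courier; font-size: 16px">2.1 DOCTYPE tag</span></p>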

<pre>// Key code only
void HTMLTreeBuilder::processDoctypeToken(AtomHTMLToken&amp;&amp; token)
{
    ASSERT(token.type() == HTMLToken::DOCTYPE);
    if (m_insertionMode == InsertionMode::Initial) { // m_insertionMode starts out as InsertionMode::Initial
        m_tree.insertDoctype(WTFMove(token)); // Insert the DOCTYPE node
        // After the DOCTYPE is inserted, the mode becomes InsertionMode::BeforeHTML,
        // meaning that the html tag is expected next.
        m_insertionMode = InsertionMode::BeforeHTML;
        return;
    }

    ...
}

// Key code only
void HTMLConstructionSite::insertDoctype(AtomHTMLToken&amp;&amp; token)
{
    ...

    // m_attachmentRoot is the Document object, the root node of the document.
    // DocumentType::create creates the DOCTYPE node.
    // attachLater creates an HTMLConstructionSiteTask internally.
    attachLater(m_attachmentRoot, DocumentType::create(m_document, token.name(), publicId, systemId));

    ...
}

// Key code only
void HTMLConstructionSite::attachLater(ContainerNode&amp; parent, Ref&lt;Node&gt;&amp;&amp; child, bool selfClosing)
{
    ...

    HTMLConstructionSiteTask task(HTMLConstructionSiteTask::Insert); // Create the HTMLConstructionSiteTask
    task.parent = &amp;parent; // The task holds the parent of the node being inserted
    task.child = WTFMove(child); // The task holds the node to operate on
    task.selfClosing = selfClosing; // Whether the node is self-closing

    // Add as a sibling of the parent if we have reached the maximum depth allowed.
    // m_openElements is the HTMLElementStack. Its role is not visible yet and is explained later.
    // The number of entries on this stack is limited to at most 512, so an HTML tag
    // with too many levels of nested children triggers the branch below.
    if (m_openElements.stackDepth() &gt; m_maximumDOMTreeDepth &amp;&amp; task.parent-&gt;parentNode())
        task.parent = task.parent-&gt;parentNode(); // Attach the node to its grandparent instead of its parent

    ASSERT(task.parent);
    m_taskQueue.append(WTFMove(task)); // Append the task to the queue
}</pre>
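<p>The depth check at the end of attachLater can be tried out on a toy tree. The sketch below is an illustration under stated assumptions, not WebKit's implementation: SketchNode, chooseAttachmentParent and attach are hypothetical names, openDepth plays the role of m_openElements.stackDepth(), and maxDepth plays the role of m_maximumDOMTreeDepth. Unlike the original, it attaches the child immediately instead of queuing a task.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy node; the real ContainerNode is far richer.
struct SketchNode {
    SketchNode* parent = nullptr;
    std::vector<SketchNode*> children;
};

// Picks the node that a new child is attached to. Once the open-element
// depth exceeds the limit, the parent's own parent is used, so the new
// node becomes a sibling of the intended parent.
inline SketchNode* chooseAttachmentParent(SketchNode& parent, std::size_t openDepth, std::size_t maxDepth)
{
    if (openDepth > maxDepth && parent.parent)
        return parent.parent;
    return &parent;
}

// Attaches the child right away (the original queues a task instead).
inline void attach(SketchNode& parent, SketchNode& child, std::size_t openDepth, std::size_t maxDepth)
{
    SketchNode* target = chooseAttachmentParent(parent, openDepth, maxDepth);
    assert(target);
    child.parent = target;
    target->children.push_back(&child);
}
```

<p>With a limit of 2, a child requested under a node at depth 3 ends up under that node's parent, which caps how deep the tree can grow.</p>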

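<p><span style="font-family: "courier new", courier; font-size: 16px">As the code shows, at this point the DOCTYPE node has only been created; it has not yet been added to the tree. The actual insertion happens when HTMLConstructionSite::executeQueuedTasks runs, a method that was listed earlier. The following code shows how each task is executed.</span></p>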

<pre>// Defined in HTMLConstructionSite.cpp
static inline void executeTask(HTMLConstructionSiteTask&amp; task)
{
    switch (task.operation) { // The task stores its own operation; building the DOM tree usually means Insert
    case HTMLConstructionSiteTask::Insert:
        executeInsertTask(task); // The insert operation is executed here
        return;
    // All the cases below this point are only used by the adoption agency.
    case HTMLConstructionSiteTask::InsertAlreadyParsedChild:
        executeInsertAlreadyParsedChildTask(task);
        return;
    case HTMLConstructionSiteTask::Reparent:
        executeReparentTask(task);
        return;
    case HTMLConstructionSiteTask::TakeAllChildrenAndReparent:
        executeTakeAllChildrenAndReparentTask(task);
        return;
    }
    ASSERT_NOT_REACHED();
}

// Key code only; defined in HTMLConstructionSite.cpp
static inline void executeInsertTask(HTMLConstructionSiteTask&amp; task)
{
    ASSERT(task.operation == HTMLConstructionSiteTask::Insert);

    insert(task); // Delegate to the insert function

    ...
}

// Key code only; defined in HTMLConstructionSite.cpp
static inline void insert(HTMLConstructionSiteTask&amp; task)
{
    ...

    ASSERT(!task.child-&gt;parentNode());
    if (task.nextChild)
        task.parent-&gt;parserInsertBefore(*task.child, *task.nextChild);
    else
        task.parent-&gt;parserAppendChild(*task.child); // Let the parent node continue the insertion
}

// Key code only
void ContainerNode::parserAppendChild(Node&amp; newChild)
{
    ...

    executeNodeInsertionWithScriptAssertion(*this, newChild, ChildChange::Source::Parser, ReplacedAllChildren::No, [&amp;] {
        if (&amp;document() != &amp;newChild.document())
            document().adoptNode(newChild);

        appendChildCommon(newChild); // Called inside the lambda callback to continue the insertion

        ...
    });
}

// The insertion is finally performed by this method
void ContainerNode::appendChildCommon(Node&amp; child)
{
    ScriptDisallowedScope::InMainThread scriptDisallowedScope;

    child.setParentNode(this);

    if (m_lastChild) { // The parent already has children: this branch runs
        child.setPreviousSibling(m_lastChild);
        m_lastChild-&gt;setNextSibling(&amp;child);
    } else
        m_firstChild = &amp;child; // First child inserted into this parent: this branch runs

    m_lastChild = &amp;child; // Update m_lastChild
}</pre>
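<p>appendChildCommon boils down to appending to a doubly linked list of siblings that the parent tracks through first-child and last-child pointers. The sketch below reproduces only that pointer bookkeeping in standard C++. ListNode and its members are hypothetical names chosen for the illustration, and ownership, script guards and document adoption are deliberately left out.</p>

```cpp
#include <cassert>

// Hypothetical minimal node: children form a doubly linked sibling list.
struct ListNode {
    ListNode* parent = nullptr;
    ListNode* previousSibling = nullptr;
    ListNode* nextSibling = nullptr;
    ListNode* firstChild = nullptr;
    ListNode* lastChild = nullptr;

    // Appends child as the last child of this node in constant time.
    void appendChild(ListNode& child)
    {
        assert(!child.parent); // the child must not be attached yet
        child.parent = this;

        if (lastChild) {
            // Existing children: link the new node after the current tail.
            child.previousSibling = lastChild;
            lastChild->nextSibling = &child;
        } else {
            // First child of this parent.
            firstChild = &child;
        }
        lastChild = &child;
    }
};
```

<p>Appending a DOCTYPE node and then an html node to a document node in this sketch leaves the document with firstChild pointing at the DOCTYPE and lastChild pointing at html, with the two linked as siblings.</p>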

<p><span style="font-family: "courier new", courier; font-size: 16px">After the methods above have run, the DOM tree, which originally had only a root node, looks like this:</span></p>
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202202/489427-20220221192501512-1741771365.jpg" width="500" height="200" style="display: block; margin-left: auto; margin-right: auto"></p>
<p><span style="font-family: "courier new", courier; font-size: 16px">2.2 html tag</span></p>

<pre>// processStartTag handles many states internally; only the key code is kept here
void HTMLTreeBuilder::processStartTag(AtomHTMLToken&amp;&amp; token)
{
    ASSERT(token.type() == HTMLToken::StartTag);
    switch (m_insertionMode) {
    case InsertionMode::Initial:
        defaultForInitial();
        ASSERT(m_insertionMode == InsertionMode::BeforeHTML);
        FALLTHROUGH;
    case InsertionMode::BeforeHTML:
        if (token.name() == htmlTag) { // The html tag is handled here
            m_tree.insertHTMLHtmlStartTagBeforeHTML(WTFMove(token));
            // Once the html tag is inserted, the mode becomes InsertionMode::BeforeHead,
            // meaning that the head tag is handled next.
            m_insertionMode = InsertionMode::BeforeHead;
            return;
        }

    ...
    }
}


// Key code only
void HTMLConstructionSite::insertHTMLHtmlStartTagBeforeHTML(AtomHTMLToken&amp;&amp; token)
{
    auto element = HTMLHtmlElement::create(m_document); // Create the html node
    setAttributes(element, token, m_parserContentPolicy);
    attachLater(m_attachmentRoot, element.copyRef()); // attachLater is called here too, just as for DOCTYPE
    // Note: the html element being inserted is pushed onto the HTMLElementStack here.
    m_openElements.pushHTMLHtmlElement(HTMLStackItem::create(element.copyRef(), WTFMove(token)));

    // The task is executed directly during insertion, so the later executeQueuedTasks call
    // in HTMLTreeBuilder::constructTree finds an empty queue and returns immediately.
    executeQueuedTasks();

    ...
}</pre>

<p><span style="font-family: "courier new", courier; font-size: 16px">After the code above runs, the DOM tree looks like this:</span></p>
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202202/489427-20220221193334918-285998398.jpg" width="1000" height="500" style="display: block; margin-left: auto; margin-right: auto"></p>
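<p>The deferred-attachment pattern behind attachLater and executeQueuedTasks can be sketched as follows. This is a simplified illustration under stated assumptions, not WebKit's code: SketchNode, SketchTask and SketchConstructionSite are hypothetical names, and the real tasks carry more state than a parent/child pair.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; not WebKit's real classes.
struct SketchNode {
    std::string name;
    std::vector<SketchNode*> children;
};

struct SketchTask {
    SketchNode* parent;
    SketchNode* child;
};

class SketchConstructionSite {
public:
    // Queue the insertion instead of touching the tree right away.
    void attachLater(SketchNode& parent, SketchNode& child)
    {
        m_taskQueue.push_back({ &parent, &child });
    }

    // Run every queued insertion and leave the queue empty.
    // Returns the number of tasks executed, so a second call made
    // right after the first one has nothing to do and returns 0.
    std::size_t executeQueuedTasks()
    {
        std::vector<SketchTask> queue = std::move(m_taskQueue);
        m_taskQueue.clear();
        for (const SketchTask& task : queue) {
            assert(task.parent && task.child);
            task.parent->children.push_back(task.child);
        }
        return queue.size();
    }

private:
    std::vector<SketchTask> m_taskQueue;
};
```

<p>In this sketch, calling executeQueuedTasks inside the insertion routine drains the queue, which is why a later call from the outer loop finds the queue empty and does no work.</p>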
<p><span style="font-family: "courier new", courier; font-size: 16px">After the title start tag is inserted, the DOM tree and the HTMLElementStack m_openElements look like this:</span></p>
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202202/489427-20220221194946953-1489851473.jpg" width="1000" height="500" style="display: block; margin-left: auto; margin-right: auto"></p>

<p><span style="font-family: "courier new", courier; font-size: 16px">3.3 Text of the title tag</span></p>
<p><span style="font-family: "courier new", courier; font-size: 16px">The text of the title tag is inserted as a text node. The code that creates the text node is as follows:</span></p>

<pre><span style="color: rgba(0, 128, 128, 1)"> 1</span> <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Only the key code is kept</span>
<span style="color: rgba(0, 128, 128, 1)"> 2</span> <span style="color: rgba(0, 0, 255, 1)">void</span> HTMLConstructionSite::insertTextNode(<span style="color: rgba(0, 0, 255, 1)">const</span> String&<span style="color: rgba(0, 0, 0, 1)"> characters, WhitespaceMode whitespaceMode)
</span><span style="color: rgba(0, 128, 128, 1)"> 3</span> <span style="color: rgba(0, 0, 0, 1)">{
</span><span style="color: rgba(0, 128, 128, 1)"> 4</span> <span style="color: rgba(0, 0, 0, 1)">    HTMLConstructionSiteTask task(HTMLConstructionSiteTask::Insert);
</span><span style="color: rgba(0, 128, 128, 1)"> 5</span>     task.parent = &amp;currentNode(); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Take the node at the top of HTMLElementStack m_openElements directly; at this point it is title</span>
<span style="color: rgba(0, 128, 128, 1)"> 6</span> 
<span style="color: rgba(0, 128, 128, 1)"> 7</span> <span style="color: rgba(0, 0, 0, 1)">    ...
</span><span style="color: rgba(0, 128, 128, 1)"> 8</span> 
<span style="color: rgba(0, 128, 128, 1)"> 9</span>     unsigned currentPosition = <span style="color: rgba(128, 0, 128, 1)">0</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">10</span>     unsigned lengthLimit = shouldUseLengthLimit(*task.parent) ? Text::defaultLengthLimit : std::numeric_limits&lt;unsigned&gt;::max(); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Cap the number of characters in a text node at 65536</span>
<span style="color: rgba(0, 128, 128, 1)">11</span> 
<span style="color: rgba(0, 128, 128, 1)">12</span> <span style="color: rgba(0, 0, 0, 1)">    ...
</span><span style="color: rgba(0, 128, 128, 1)">13</span> 
<span style="color: rgba(0, 128, 128, 1)">14</span> 
<span style="color: rgba(0, 128, 128, 1)">15</span>     <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> If the text is too long, it is split into multiple text nodes</span>
<span style="color: rgba(0, 128, 128, 1)">16</span>     <span style="color: rgba(0, 0, 255, 1)">while</span> (currentPosition <<span style="color: rgba(0, 0, 0, 1)"> characters.length()) {
</span><span style="color: rgba(0, 128, 128, 1)">17</span>         AtomString charactersAtom =<span style="color: rgba(0, 0, 0, 1)"> m_whitespaceCache.lookup(characters, whitespaceMode);
</span><span style="color: rgba(0, 128, 128, 1)">18</span>         auto textNode = Text::createWithLengthLimit(task.parent->document(), charactersAtom.isNull() ? characters : charactersAtom.<span style="color: rgba(0, 0, 255, 1)">string</span><span style="color: rgba(0, 0, 0, 1)">(), currentPosition, lengthLimit);
</span><span style="color: rgba(0, 128, 128, 1)">19</span>         <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> If we have a whole string of unbreakable characters the above could lead to an infinite loop. Exceeding the length limit is the lesser evil.</span>
<span style="color: rgba(0, 128, 128, 1)">20</span>         <span style="color: rgba(0, 0, 255, 1)">if</span> (!textNode-><span style="color: rgba(0, 0, 0, 1)">length()) {
</span><span style="color: rgba(0, 128, 128, 1)">21</span>             String substring =<span style="color: rgba(0, 0, 0, 1)"> characters.substring(currentPosition);
</span><span style="color: rgba(0, 128, 128, 1)">22</span>             AtomString substringAtom =<span style="color: rgba(0, 0, 0, 1)"> m_whitespaceCache.lookup(substring, whitespaceMode);
</span><span style="color: rgba(0, 128, 128, 1)">23</span>             textNode = Text::create(task.parent->document(), substringAtom.isNull() ? substring : substringAtom.<span style="color: rgba(0, 0, 255, 1)">string</span>()); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Create the text node</span>
<span style="color: rgba(0, 128, 128, 1)">24</span> <span style="color: rgba(0, 0, 0, 1)">        }
</span><span style="color: rgba(0, 128, 128, 1)">25</span> 
<span style="color: rgba(0, 128, 128, 1)">26</span>         currentPosition += textNode->length(); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Start offset of the characters for the next text node</span>
<span style="color: rgba(0, 128, 128, 1)">27</span>         ASSERT(currentPosition <=<span style="color: rgba(0, 0, 0, 1)"> characters.length());
</span><span style="color: rgba(0, 128, 128, 1)">28</span>         task.child =<span style="color: rgba(0, 0, 0, 1)"> WTFMove(textNode);
</span><span style="color: rgba(0, 128, 128, 1)">29</span> 
<span style="color: rgba(0, 128, 128, 1)">30</span>         executeTask(task); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Execute the insertion task immediately</span>
<span style="color: rgba(0, 128, 128, 1)">31</span> <span style="color: rgba(0, 0, 0, 1)">    }
</span><span style="color: rgba(0, 128, 128, 1)">32</span> }</pre>

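<p><span style="font-family: "courier new", courier; font-size: 16px">As the code shows, if the text that follows a node contains too many characters, it is split into multiple text nodes on insertion. In the example below, the text inside the title node is 85248 characters long, and inspecting the page in Safari confirms that 2 text nodes were created:</span></p>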
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202202/489427-20220221195801625-35375221.png" width="1000" height="500" loading="lazy" style="display: block; margin-left: auto; margin-right: auto"></p>
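<p>The splitting loop can be illustrated with a minimal sketch that cuts a string into chunks of at most 65536 characters. The function name splitIntoTextChunks and the constant kLengthLimit are assumptions made for this example. The original code's comment about unbreakable characters suggests that the real implementation also considers where a break is allowed; this sketch cuts purely by length.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed limit, matching the 65536 mentioned in the code comment above.
constexpr std::size_t kLengthLimit = 65536;

// Split text into consecutive chunks of at most lengthLimit characters.
// Simplification: break positions are chosen by length only.
std::vector<std::string> splitIntoTextChunks(const std::string& characters,
                                             std::size_t lengthLimit = kLengthLimit)
{
    assert(lengthLimit > 0);
    std::vector<std::string> chunks;
    std::size_t currentPosition = 0;
    while (currentPosition < characters.size()) {
        std::size_t length = std::min(lengthLimit, characters.size() - currentPosition);
        chunks.push_back(characters.substr(currentPosition, length));
        currentPosition += length; // start of the next chunk
    }
    return chunks;
}
```

<p>With an input of 85248 characters, this sketch yields two chunks of 65536 and 19712 characters, which is consistent with the 2 text nodes seen in Safari.</p>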

<p><span style="font-family: "courier new", courier; font-size: 16px">When the title end tag is encountered, it is handled as follows:</span></p>

<pre><span style="color: rgba(0, 128, 128, 1)"> 1</span> <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> The real code handles many states; only the key code is kept here</span>
<span style="color: rgba(0, 128, 128, 1)"> 2</span> <span style="color: rgba(0, 0, 255, 1)">void</span> HTMLTreeBuilder::processEndTag(AtomHTMLToken&&<span style="color: rgba(0, 0, 0, 1)"> token)
</span><span style="color: rgba(0, 128, 128, 1)"> 3</span> <span style="color: rgba(0, 0, 0, 1)">{
</span><span style="color: rgba(0, 128, 128, 1)"> 4</span>     ASSERT(token.type() ==<span style="color: rgba(0, 0, 0, 1)"> HTMLToken::EndTag);
</span><span style="color: rgba(0, 128, 128, 1)"> 5</span>     <span style="color: rgba(0, 0, 255, 1)">switch</span><span style="color: rgba(0, 0, 0, 1)"> (m_insertionMode) {
</span><span style="color: rgba(0, 128, 128, 1)"> 6</span> <span style="color: rgba(0, 0, 0, 1)">    ...
</span><span style="color: rgba(0, 128, 128, 1)"> 7</span> 
<span style="color: rgba(0, 128, 128, 1)"> 8</span>         <span style="color: rgba(0, 0, 255, 1)">case</span> InsertionMode::Text: <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> The parser switched to InsertionMode::Text to read the title's text content, so that is the current mode when the title end tag arrives</span>
<span style="color: rgba(0, 128, 128, 1)"> 9</span>         
<span style="color: rgba(0, 128, 128, 1)">10</span>         m_tree.openElements().pop(); <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> The title end tag means the whole element has been processed, so pop the top element, title, off the HTMLElementStack</span>
<span style="color: rgba(0, 128, 128, 1)">11</span>         m_insertionMode = m_originalInsertionMode; <span style="color: rgba(0, 128, 0, 1)">//</span><span style="color: rgba(0, 128, 0, 1)"> Restore the previous insertion mode</span>
<span style="color: rgba(0, 128, 128, 1)">12</span>         <span style="color: rgba(0, 0, 255, 1)">break</span><span style="color: rgba(0, 0, 0, 1)">;
</span><span style="color: rgba(0, 128, 128, 1)">13</span>     
<span style="color: rgba(0, 128, 128, 1)">14</span> <span style="color: rgba(0, 0, 0, 1)">    ...
</span><span style="color: rgba(0, 128, 128, 1)">15</span> }</pre>

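<p><span style="font-family: "courier new", courier; font-size: 16px">In the same way, when the end tag of an element is encountered, the element at the top of HTMLElementStack m_openElements is generally popped. After the code above runs, the DOM tree and the HTMLElementStack look like this:</span></p>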
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202202/489427-20220221200657918-1637493847.jpg" width="1000" height="500" style="display: block; margin-left: auto; margin-right: auto"></p>
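<p>The push-on-start-tag, pop-on-end-tag behaviour can be sketched with a small stack. SketchOpenElements and popFor are hypothetical names for this illustration; the real HTMLElementStack stores stack items that reference DOM elements rather than plain tag names.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature of the stack of open elements.
class SketchOpenElements {
public:
    // A start tag pushes its element onto the stack.
    void push(const std::string& tagName) { m_stack.push_back(tagName); }

    // An end tag pops the top element; the tag name is passed in so the
    // sketch can check that it matches the element being closed.
    void popFor(const std::string& endTagName)
    {
        assert(!m_stack.empty() && m_stack.back() == endTagName);
        m_stack.pop_back();
    }

    // The current node is always the top of the stack.
    const std::string& top() const
    {
        assert(!m_stack.empty());
        return m_stack.back();
    }

    std::size_t size() const { return m_stack.size(); }

private:
    std::vector<std::string> m_stack;
};
```

<p>After pushing html, head and title and then popping for the title end tag, the top of this sketch's stack is head again, so the next inserted node would become a child of head.</p>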

<p><span style="font-family: "courier new", courier; font-size: 16px">Once the whole DOM tree has been built, the DOM tree and the HTMLElementStack m_openElements look like this:</span></p>
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202202/489427-20220221202711080-744439465.jpg" width="1000" height="500" style="display: block; margin-left: auto; margin-right: auto"></p>
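<p><span style="font-family: "courier new", courier; font-size: 16px">As the figure above shows, once the DOM has been built, the HTMLElementStack m_openElements is not completely emptied. It still holds 2 nodes: the html node and the body node. This can be seen in the Xcode console output:</span></p>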
<p><img src="https://img.yipin100.com/p.php?img=//img.1024sou.com/blog/489427/202202/489427-20220221203249329-1121384860.png" width="1000" height="500" loading="lazy" style="display: block; margin-left: auto; margin-right: auto"></p>
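<p><span style="font-family: "courier new", courier; font-size: 16px">It is also apparent that the in-memory DOM tree is structured differently from the logical DOM tree drawn at the beginning of the article. In the logical tree, a parent with N children holds N pointers to them. In the in-memory tree, a parent always holds just 2 child pointers, m_firstChild and m_lastChild, no matter how many children it has, and sibling nodes also point to each other, which the logical tree does not show. This keeps the parent small and of fixed size while still allowing the whole tree to be traversed. For example, if a parent has 100 children and a pointer takes 8 bytes, the logical layout needs 100 child pointers in the parent, 800 bytes in total, whereas the in-memory layout needs only 2 pointers, 16 bytes, in the parent (each child carries its own sibling pointers instead). Although the two representations are implemented differently, they are equivalent, and both represent the HTML document correctly.</span></p>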
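<p>The first-child/last-child layout with sibling links can be sketched as below. SketchTreeNode, appendChild and collectNames are hypothetical names for this illustration, and the members are named after the pointers described above without the m_ prefix; this is not WebKit's Node class.</p>

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node: a fixed set of pointers regardless of child count.
struct SketchTreeNode {
    explicit SketchTreeNode(std::string nodeName) : name(std::move(nodeName)) { }

    std::string name;
    SketchTreeNode* parent = nullptr;
    SketchTreeNode* firstChild = nullptr;
    SketchTreeNode* lastChild = nullptr;
    SketchTreeNode* previousSibling = nullptr;
    SketchTreeNode* nextSibling = nullptr;
};

// Append child as the last child of parent in constant time.
void appendChild(SketchTreeNode& parent, SketchTreeNode& child)
{
    assert(!child.parent);
    child.parent = &parent;
    child.previousSibling = parent.lastChild;
    if (parent.lastChild)
        parent.lastChild->nextSibling = &child;
    else
        parent.firstChild = &child;
    parent.lastChild = &child;
}

// Collect node names in document (pre-order) order by following
// firstChild and nextSibling only.
void collectNames(const SketchTreeNode& node, std::vector<std::string>& out)
{
    out.push_back(node.name);
    for (const SketchTreeNode* child = node.firstChild; child; child = child->nextSibling)
        collectNames(*child, out);
}
```

<p>Every child can still be reached from its parent by starting at firstChild and following nextSibling, which is why this layout can represent the same tree as the logical one.</p>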
						  
					  </div>
						<!--conend-->
							<div class="p-2"></div>

						
						<div class="p-2"></div>

						

						
					</div>
					<div class="p-2"></div>
				</div>


			</div>
		
		
		
		</div>	

</main>
						<div class="p-2"></div>
		
<script src="/skin/bootstrap.bundle.js"></script>

</body>
</html>