How to traverse HTML

Keywords: HTML, traversal      Updated: 2023-09-26

I have a variable that holds HTML code:

// A template literal keeps the multi-line markup (and the apostrophe) in one valid string.
let htmlDocument = `<div id="buildings-wrapper">
    <div id="building-info">
    <h2><span class="field-content">Britney Spears' House</span></h2>
    <div class="building-field">
    <div class="field-content">9999 Hollywood Blvd</div>
    </div>
    <div class="building-field">
    <div class="field-content">Building Hours: Mon. 07:00-23:00 Tue.-Fri. 06:30-22:00, Sat. 07:30-18:00, Sun. 12:00-18:00 Holidays - Closed</div>
    </div>
    <div class="building-field">
    <div class="field-content"><a href="http://www.britneyspears.com">Locate on the stars map</a></div>
    </div>
    </div>
    <div id="building-image">
    <div class="field-content"><img src="../../../../ssc.adm.britneyspears.com/classroomservices/image/viewimage?userEvent=ShowBuildingImage&amp;buildingID=britneyspears" alt="Image of BritneySpears"></div>
        </div>
        </div>`;

I want to traverse the variable and store this part of the HTML in a separate variable:

<div class="field-content">9999 Hollywood Blvd</div>

This is what I have written so far:

public traverseHTML(htmlDocument: any): any {
    let htmlBlock: any;
    let divs: any = htmlDocument.getElementsByTagName('div');
    for (var i = 0; i < divs.length; i++) {
        if (divs[i].getAttribute("id") == "field-content") {
            htmlBlock = divs[i];
        }
    }
    return htmlBlock;
}

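I'm sure my function has all kinds of problems, but I can't get to them because I can't even get past the second line. I get an error saying htmlDocument.getElementsByTagName is not a function. How can I iterate over the divs in the HTML?

Please note that, due to project specifications, I cannot use jQuery.

Edit:

I get "document is not defined" when I try document.createElement('div'), and "DOMParser is not defined" when I try to create a DOMParser. Is my class set up incorrectly? Here is the code for the whole class: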

import parse5 = require('parse5');
import {ASTNode} from 'parse5';

export default class DSController {
    //private parser: DOMParser;
    constructor() {
        //this.parser = new DOMParser();
    }

    public traverseHTML(htmlDocument: any): any {
        let parser = new DOMParser();
        let parsed: any = parser.parseFromString(htmlDocument, "text/html");
        let selectParsed: any = parsed.querySelectorAll('field-content')[1];
        console.log(selectParsed);
        return selectParsed;
        /* let element = document.createElement("div");
        element.innerHTML = htmlDocument;
        console.log(element.querySelectorAll(".field-content")[1]); // <div class="field-content">9999 Hollywood Blvd</div>
        */
    }

    public parseHTML(): any {
        //let document: parse5.ASTNode;
        return;
    }
}

You can create an element and insert the string into it as HTML.
You can then query that element for what you need:

// A template literal keeps the multi-line markup in one valid string.
let htmlDocument = `<div id="buildings-wrapper">
    <div id="building-info">
    <h2><span class="field-content">Britney Spears House</span></h2>
    <div class="building-field">
    <div class="field-content">9999 Hollywood Blvd</div>
    </div>
    <div class="building-field">
    <div class="field-content">Building Hours: Mon. 07:00-23:00 Tue.-Fri. 06:30-22:00, Sat. 07:30-18:00, Sun. 12:00-18:00 Holidays - Closed</div>
    </div>
    <div class="building-field">
    <div class="field-content"><a href="http://www.britneyspears.com">Locate on the stars map</a></div>
    </div>
    </div>
    <div id="building-image">
    <div class="field-content"><img src="../../../../ssc.adm.britneyspears.com/classroomservices/image/viewimage?userEvent=ShowBuildingImage&amp;buildingID=britneyspears" alt="Image of BritneySpears"></div>
        </div>
        </div>`;
let element = document.createElement("div");
element.innerHTML = htmlDocument;
console.log(element.querySelectorAll(".field-content")[1]); // <div class="field-content">9999 Hollywood Blvd</div>

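One caveat with indexing the result of querySelectorAll directly: an out-of-range index gives undefined rather than an error. As a minimal sketch (the Queryable interface and the nthMatch helper are hypothetical names, not part of the DOM API), a small wrapper could return null in that case. It only assumes the root exposes querySelectorAll, which Element and Document do in a browser:

```typescript
// Hypothetical structural type: anything that exposes querySelectorAll.
// In a browser, Element and Document are expected to fit this shape.
interface Queryable<T> {
    querySelectorAll(selector: string): ArrayLike<T>;
}

// Returns the match at the given zero-based index, or null when there is none.
function nthMatch<T>(root: Queryable<T>, selector: string, index: number): T | null {
    const match = root.querySelectorAll(selector)[index];
    return match === undefined ? null : match;
}
```

With the container element built above, nthMatch(element, ".field-content", 1) should return the address div, assuming the markup keeps that order.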

You can also use DOMParser:

new DOMParser().parseFromString(htmlDocument, "text/html")
  .querySelectorAll('.field-content')[1]
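The errors in the question's edit ("document is not defined", "DOMParser is not defined") suggest the code runs outside a browser, for example in Node.js, where neither approach above is available. Since the class already imports parse5, one option may be to parse the string with it and walk the resulting tree by hand. The sketch below is only an illustration under assumptions: HtmlNode is a hypothetical local interface describing the fields that parse5's default tree adapter is expected to put on its nodes (nodeName, attrs, childNodes), and hasClass and findByClass are made-up helper names. It does not import parse5 itself:

```typescript
// Assumed shape of a parse5 node (default tree adapter); only the fields used here.
interface HtmlNode {
    nodeName: string;
    attrs?: { name: string; value: string }[];
    childNodes?: HtmlNode[];
}

// True when the node's class attribute contains className as a whole token.
function hasClass(node: HtmlNode, className: string): boolean {
    return (node.attrs || []).some(attr =>
        attr.name === "class" && attr.value.split(/\s+/).indexOf(className) !== -1);
}

// Depth-first walk in document order, collecting every node with the class.
function findByClass(root: HtmlNode, className: string): HtmlNode[] {
    const found: HtmlNode[] = [];
    const visit = (node: HtmlNode): void => {
        if (hasClass(node, className)) {
            found.push(node);
        }
        for (const child of node.childNodes || []) {
            visit(child);
        }
    };
    visit(root);
    return found;
}

// Hypothetical usage with parse5 (not imported here; a cast may be needed):
//   const fragment = parse5.parseFragment(htmlDocument);
//   const second = findByClass(fragment, "field-content")[1];
```

Because the walk visits nodes in document order, index 1 should again correspond to the address div, after the span inside the h2.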