Can I parse an html file for the value of a certain tag using excel, based on the value of a cell?

Updated: 2023-09-26

Apologies if the question isn't very clear. Here is a more complete version:

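I have a spreadsheet with two columns: file path and article title. The file path column holds the path to each article (an HTML file), and I have been manually copying each title out of its HTML file and pasting it into the other column. I need to do this several hundred times, so I'm wondering whether there is a way to automate it. The article title is inside the first <span> of the second <h2> on each HTML page.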

Example:

Cell A1: F:\2003\030714.html

Cell B1: The Art of Basket Weaving

Cell A2: F:\2003\030718.html

Cell B2: Cooking for Cats

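Is there some magic that can make this happen? If I could do a VLOOKUP it would be a piece of cake, but unfortunately, as a junior web developer and an intermediate Excel user, I'm stumped.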

Thanks in advance!

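Select the range of cells containing the file paths whose article titles you want to update, then run this procedure. For each path it checks whether the file exists and, if so, opens it as a text stream and reads it line by line. It takes the article title to be the text between the second pair of <h2> tags that follows the first <span> tag, and writes it into the cell immediately to the right of the path (that cell is cleared if the file isn't found). The returned text is everything between <h2> and </h2>, so any tags nested inside the heading are included. It does not check whether the end of the first <span> element has been reached. Hope this helps.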

Sub UpdateArticleTitle()
Dim rngPath As Range
Dim fsoObj As Object, tsFile As Object
Dim strLine As String
Dim bytSpanCount As Byte, bytH2Count As Byte
Dim strArticleTitle As String
Dim lngStart As Long, lngEnd As Long
    ' Create the FileSystemObject once and reuse it for every file
    Set fsoObj = CreateObject("Scripting.FileSystemObject")
    ' Go through the range of selected cells
    For Each rngPath In ActiveWindow.RangeSelection
        ' Continue only if the cell holds the path of an existing file
        ' (Dir("") returns the first file in the current folder, so a blank
        ' cell would slip through a Dir check; FileExists("") returns False)
        If fsoObj.FileExists(CStr(rngPath.Value)) Then
            ' Initialize the variables
            bytSpanCount = 0
            bytH2Count = 0
            strArticleTitle = ""
            ' Open the HTML file for reading
            Set tsFile = fsoObj.OpenTextFile(CStr(rngPath.Value))
            Do Until tsFile.AtEndOfStream
                ' Read the next line of the file
                strLine = tsFile.ReadLine
                ' Search for the first occurrence of <span>
                If bytSpanCount = 0 Then
                    If InStr(1, LCase(strLine), "<span>") > 0 Then bytSpanCount = 1
                ' If <span> has been found, then search for <h2>
                ElseIf bytSpanCount = 1 Then
                    If InStr(1, LCase(strLine), "<h2>") > 0 Then
                        If bytH2Count = 0 Then
                            bytH2Count = 1
                        ' The second occurrence of <h2> has been reached, so extract the article title
                        Else
                            ' Append lines until the closing </h2> tag is found,
                            ' stopping at the end of the file to avoid a ReadLine error
                            Do Until InStr(1, LCase(strLine), "</h2>") > 0 Or tsFile.AtEndOfStream
                                strLine = strLine & tsFile.ReadLine
                            Loop
                            ' Set the article title to the text between <h2> and </h2>
                            lngStart = InStr(1, LCase(strLine), "<h2>") + Len("<h2>")
                            lngEnd = InStr(lngStart, LCase(strLine), "</h2>")
                            If lngEnd > 0 Then strArticleTitle = Mid(strLine, lngStart, lngEnd - lngStart)
                            ' Exit the outer loop
                            Exit Do
                        End If
                    End If
                End If
            Loop
            ' Close the file
            tsFile.Close
            ' Update the article title in the sheet
            rngPath.Offset(0, 1).Value = strArticleTitle
        Else
            ' Clear the article title if the file isn't found
            rngPath.Offset(0, 1).ClearContents
        End If
    Next rngPath
    Set tsFile = Nothing
    Set fsoObj = Nothing
End Sub
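The procedure above looks for the second <h2> after the first <span>, while the question describes the title as the first <span> inside the second <h2>, and asks for something that works per cell, like VLOOKUP. A minimal sketch of that variant is a user-defined function placed in a standard module (Insert > Module in the VBA editor). The names ArticleTitle and ExtractTitleFromHtml are hypothetical. The sketch assumes the page structure described in the question, uses plain string searching rather than a real HTML parser, and reads files through FileSystemObject, which treats them as ANSI text, so non-ASCII characters in UTF-8 pages may come out garbled.

```vba
' Hypothetical helper: returns the text of the first <span> inside the
' second <h2> of an HTML string, or "" if that structure isn't found.
' Plain string search, not a real HTML parser: assumes <h2> elements
' are not nested; tag names may differ only in letter case.
Public Function ExtractTitleFromHtml(ByVal strHtml As String) As String
    Dim strLower As String
    Dim lngPos As Long, lngH2Open As Long, lngH2Close As Long
    Dim lngSpanOpen As Long, lngTextStart As Long, lngSpanClose As Long
    Dim intH2 As Integer

    ' Search a lowercase copy so <H2>, <SPAN> etc. also match
    strLower = LCase$(strHtml)

    ' Skip to the opening tag of the second <h2> (attributes allowed)
    lngPos = 1
    For intH2 = 1 To 2
        lngH2Open = InStr(lngPos, strLower, "<h2")
        If lngH2Open = 0 Then Exit Function
        lngPos = lngH2Open + Len("<h2")
    Next intH2

    ' The second <h2> ends at the next </h2>
    lngH2Close = InStr(lngPos, strLower, "</h2>")
    If lngH2Close = 0 Then Exit Function

    ' The first <span> must start before that </h2>
    lngSpanOpen = InStr(lngPos, strLower, "<span")
    If lngSpanOpen = 0 Or lngSpanOpen > lngH2Close Then Exit Function

    ' The title text starts after the ">" that closes the <span ...> tag
    lngTextStart = InStr(lngSpanOpen, strLower, ">")
    If lngTextStart = 0 Then Exit Function
    lngTextStart = lngTextStart + 1
    lngSpanClose = InStr(lngTextStart, strLower, "</span>")
    If lngSpanClose = 0 Or lngSpanClose > lngH2Close Then Exit Function

    ' Take the text from the original (not lowercased) string and
    ' turn line breaks into spaces
    ExtractTitleFromHtml = Trim$(Replace(Replace( _
        Mid$(strHtml, lngTextStart, lngSpanClose - lngTextStart), vbCr, ""), vbLf, " "))
End Function

' Hypothetical worksheet function, used like =ArticleTitle(A1)
Public Function ArticleTitle(ByVal strPath As String) As String
    Dim fso As Object, ts As Object
    ' Return "" instead of an error for a blank or missing path
    If Len(strPath) = 0 Then Exit Function
    Set fso = CreateObject("Scripting.FileSystemObject")
    If Not fso.FileExists(strPath) Then Exit Function
    ' Read the whole file so tags split across lines are still found;
    ' 1 = ForReading, and the file is read as ANSI text by default
    Set ts = fso.OpenTextFile(strPath, 1)
    If Not ts.AtEndOfStream Then ArticleTitle = ExtractTitleFromHtml(ts.ReadAll)
    ts.Close
End Function
```

With this in place, entering =ArticleTitle(A1) in B1 and filling down gives one title per row, much like a lookup. Because the formula only recalculates when the path in column A changes, later edits to the HTML files won't show until a full recalculation is forced with Ctrl+Alt+F9.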