Extracting Data From Webpage That Isn't In the Source Code

Updated: 2023-09-26

I want to write a macro in Excel that extracts data from the following webpage:

http://www.richmond.com/data-center/salaries-virginia-state-employees-2013/?appSession=673718284851033&RecordID=101177&PageID=3&PrevPageID=2&cpipage=1&CPIsortType=&CPIorderBy=&cbCurrentRecordPosition=1

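The problem is that the employee information is not in the page's source code. When I use the following code (with NextPage set to the URL above), responseText does not contain the data I'm looking for.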

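' htm is an HTML document object used to parse the downloaded markup
Set htm = CreateObject("htmlfile")
With CreateObject("msxml2.xmlhttp")
    .Open "GET", NextPage, False   ' False = synchronous request
    .Send
    ' responseText is the raw HTML sent by the server, before any JavaScript runs
    htm.body.innerHtml = .responseText
End With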

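I may well be wrong, but I believe the data is contained in the page's DOM. Can someone help me understand how to use VBScript to download the content of this page as it is displayed (i.e. after the JavaScript modifications have been applied)?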

Using the InternetExplorer.Application COM object should give you access to the actual DOM tree:

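url = "http://www.richmond.com/..."
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.Navigate url
' Poll until the page has finished loading (ReadyState 4 = READYSTATE_COMPLETE).
' WScript.Sleep only exists under Windows Script Host (.vbs files);
' in an Excel VBA macro use DoEvents or Application.Wait instead.
Do
  WScript.Sleep 100
Loop Until ie.ReadyState = 4 And Not ie.Busy
' ie.Document is the live DOM, including elements added by scripts that have run so far
Set elem = ie.Document.getElementById("...")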

If that doesn't work, you may have to resort to something like PhantomJS.