R - How can I run a JavaScript button on a website to display all values for scraping

Keywords: display, scraping, button, javascript, website, run. Updated: 2023-09-26

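I am trying to scrape some data from a website that shows only products 1-30 unless I press the "List All" button. The button is driven by JavaScript and does not change the URL when I click it. I am currently using the rvest package in R to do this.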

  library(rvest)
  url <- "https://shop.supervalu.ie/shopping/shopping/shop.aspx?catid=150200350"
  page <- read_html(url)

I have seen some other posts that mention using the RSelenium package, but I would prefer another way if there is one.

Edit - thanks to Jack's help I now have the code below, but I have run into two problems.

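1) Even after pressing the "List All" button, some pages do not show all products. They show the first 200, and then you have to page through the next 200, and so on, for example on this page https://shop.supervalu.ie/shopping/shopping/shop.aspx?catid=150200275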

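2) In my loop, if the code cannot find the "ListAll" element (i.e. when there are fewer than 30 products), it throws an error. Does anyone know how to avoid this inside a loop? Pseudocode: if the ListAll element does not exist, skip ListAll and keep running.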

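library(RSelenium)
library(XML)
checkForServer()
startServer()
mybrowser <- remoteDriver()
mybrowser$open()
# Counter and accumulator must exist before the loop (initial values assumed)
i <- 1
final <- data.frame()
while(i < 67){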
  # Navigate to page
  mybrowser$navigate("https://shop.supervalu.ie/shopping/shopping/shop.aspx?catid=150200275")
  # Show all products
  ListAll <- mybrowser$findElement("class", "listAllText")
  ListAll$clickElement()
  # Navigate to next page (only reaches the second page; when run again it goes
  # back to the first page, because that is the first "unselected" class it detects)
  ListAll <- mybrowser$findElement("class", "unselected")
  ListAll$clickElement()

  # Take it slow
  Sys.sleep(7)
  outhtml <- mybrowser$findElement(using = 'xpath', "//*")
  out <- outhtml$getElementAttribute("outerHTML")[[1]]
  # Parse with the XML package (htmlParse comes from XML, not RCurl)
  doc <- htmlParse(out, encoding = "UTF-8")
  # Scrape product info
  productRaw <- getNodeSet(doc, "//*[@class = 'productTitle']")
  products <- sapply(productRaw, xmlValue)
  priceRaw <- getNodeSet(doc, "//*[@class = 'divProductPrice BodyText Style3']")
  price <- sapply(priceRaw, xmlValue)
  pricePerUnitRaw <- getNodeSet(doc, "//*[@class = 'divProductPricePerUnit BodyText Style2']")
  pricePerUnit <- sapply(pricePerUnitRaw, xmlValue)
  barcodeRaw <- getNodeSet(doc, "//*[@class = 'productImage']//a[@href]//img[@src]")
  # The barcode is read from the image's src attribute (an <img> has no text value)
  barcode <- sapply(barcodeRaw, function(x) xmlAttrs(x)["src"])
  final <- rbind(final, data.frame(Products = products, 
                                   Price = price, UnitPrice = pricePerUnit, Barcode = barcode))
  i <- i + 1
}

I know you would prefer another way, but I wanted to put the RSelenium solution out there so you can see it.

library(RSelenium)
library(XML)
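# Start Selenium server
# (checkForServer() and startServer() are defunct in recent RSelenium
# releases; rsDriver() is the replacement there)
checkForServer()
startServer()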
remDr <- remoteDriver()
remDr$open()
# Navigate to page
remDr$navigate("https://shop.supervalu.ie/shopping/shopping/shop.aspx?catid=150200350")
# Click the "List All" button
ListAll <- remDr$findElement("class", "listAllText")
ListAll$clickElement()
# Take it slow
Sys.sleep(0.5)
# Snag the html
outhtml <- remDr$findElement(using = 'xpath', "//*")
out <- outhtml$getElementAttribute("outerHTML")[[1]]
# Parse with the XML package (htmlParse comes from XML, not RCurl)
doc <- htmlParse(out, encoding = "UTF-8")
# just scraping a bit for example
gg <- getNodeSet(doc, "//*[@class = 'productTitle']")
sapply(gg, xmlValue)
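
The parsing step can be tried without a browser session. This is a minimal sketch, assuming a hand-written HTML fragment in place of the real page: the product names and prices are made up for illustration, and only the class names come from the code above.

```r
library(XML)

# Hypothetical stand-in for the outerHTML string returned by the browser
out <- paste0(
  "<html><body>",
  "<div class='productTitle'>Example Product A</div>",
  "<div class='divProductPrice BodyText Style3'>1.50</div>",
  "<div class='productTitle'>Example Product B</div>",
  "<div class='divProductPrice BodyText Style3'>2.00</div>",
  "</body></html>"
)

# asText = TRUE tells htmlParse the argument is markup, not a file name
doc <- htmlParse(out, asText = TRUE, encoding = "UTF-8")

# xpathSApply() combines getNodeSet() and sapply() in one call
products <- xpathSApply(doc, "//*[@class = 'productTitle']", xmlValue)
price <- xpathSApply(doc, "//*[@class = 'divProductPrice BodyText Style3']", xmlValue)

result <- data.frame(Products = products, Price = price, stringsAsFactors = FALSE)
```

With a live session, `out` would instead be the string obtained from getElementAttribute("outerHTML").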

hrbrmstr might have some Ajax magic you could use. See his answer to a different question here
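
For the second problem in the edit (an error when the "List All" element is absent), one option is to wrap the lookup in tryCatch so that a missing element is skipped instead of stopping the loop. This is only a sketch: click_if_present is a hypothetical helper, not part of RSelenium, and fake_browser is a stand-in object so the example runs without a Selenium server. With a real session you would pass mybrowser instead.

```r
# Hypothetical helper: click an element if it exists, otherwise carry on.
# `browser` is anything with a findElement(using, value) method,
# e.g. an RSelenium remoteDriver.
click_if_present <- function(browser, using, value) {
  element <- tryCatch(
    browser$findElement(using, value),
    error = function(e) NULL  # lookup failed: treat as "not on this page"
  )
  if (is.null(element)) {
    return(FALSE)
  }
  element$clickElement()
  TRUE
}

# Stand-in browser (assumption, for illustration only): it "contains" only
# an element with class listAllText and raises an error for anything else.
fake_browser <- list(
  findElement = function(using, value) {
    if (value != "listAllText") stop("NoSuchElement")
    list(clickElement = function() invisible(NULL))
  }
)

clicked_present <- click_if_present(fake_browser, "class", "listAllText")
clicked_missing <- click_if_present(fake_browser, "class", "notThere")
```

Inside the loop, the two ListAll lines would become click_if_present(mybrowser, "class", "listAllText"). RSelenium also has findElements (plural), which, as far as I know, returns an empty list when nothing matches, so checking its length is another way to avoid the error.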