Retrieve a zip file from a website in an automated script

Keywords: website, retrieve, zip file, script. Updated: 2023-09-26

For reference: http://wogcc.state.wy.us/urecordsMenu.cfm?Skip=%27Y%27&oops=ID14447

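I am trying to retrieve a zip file from a website where the file has no dedicated URL. I've been doing fine with Python Mechanize and Beautiful Soup, but I've hit a problem near the end of the process.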

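After selecting the options I want in the form (via mechanize/bs4), I try to have my browser "submit" the form and retrieve my zip file. However, the "submit" button is just a gif image with an

onclick="javascript:submit()"

call. When you click that button manually in a browser, it redirects you to a generic ".....Testdwn.cfm?RequestTimeout=2000" page, no matter which option you selected before clicking the gif image (and it also downloads your zip file). So my problem is that there is no dedicated zip URL.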

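From what I've read online over the past few days, Python/Mechanize can't execute JavaScript in any capacity, so it seems I'm SOL on this path. If mechanize could somehow click that button, everything would be fine.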
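A side note on the JavaScript obstacle: if the onclick handler does nothing beyond submitting the enclosing form (an assumption, since the page's script is not shown here), then no JavaScript engine is needed, because the same request can be sent as a plain HTTP POST. Below is a minimal sketch of that idea using only the standard library. FORM_ACTION, FORM_FIELDS, and the function names are hypothetical placeholders; the real action URL and field names would have to be read from the page's form element, and the server may also expect cookies or hidden fields that this sketch does not handle.

```python
import io
import urllib.parse
import urllib.request
import zipfile

# Hypothetical values: the real action URL and field names must be read
# from the page's <form> element; they are not given in the post.
FORM_ACTION = "http://example.com/download.cfm"
FORM_FIELDS = {"dataset": "37"}


def build_form_request(action_url, fields):
    """Build the POST request a browser would send on form submission."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def looks_like_zip(payload):
    """Return True if the bytes form a readable zip archive."""
    return zipfile.is_zipfile(io.BytesIO(payload))


def fetch_zip(action_url, fields, out_path):
    """Submit the form and save the response if it is a zip archive."""
    request = build_form_request(action_url, fields)
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = response.read()
    if not looks_like_zip(payload):
        raise ValueError("response was not a zip archive")
    with open(out_path, "wb") as handle:
        handle.write(payload)
    return out_path
```

Calling fetch_zip(FORM_ACTION, FORM_FIELDS, "records.zip") would attempt the download. If the response turns out to be an HTML page instead of a zip, the form probably relies on more than a bare submit, and a real browser driven by Selenium is the safer route.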

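What approach should I take to extract this data? I've read about Selenium, but I'd like to know which option is the simplest and best for this: something JavaScript-based, Python with Selenium, or something else? Python is preferred if it can be managed.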

Thanks in advance!

OK, I found an answer using Selenium:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

# Selenium 4 style: the driver path goes through Service, and lookups use find_element(By.XPATH, ...).
driver = webdriver.Chrome(service=Service(r"C:\Users\xx\xx\xx\xx\xx\xx\chromedriver.exe"))
driver.get("http://wogcc.state.wy.us/urecordsMenu.cfm?Skip=%27Y%27&oops=ID14447")
assert "Download Menu" in driver.title

# The option to pick in the form's drop-down, and the gif image that acts as the submit button.
option = driver.find_element(By.XPATH, "/html/body/table[2]/tbody/tr[7]/td/form/table[1]/tbody/tr[3]/td[2]/select/option[37]")
submit = driver.find_element(By.XPATH, "/html/body/table[2]/tbody/tr[7]/td/form/table[1]/tbody/tr[3]/td[1]/font/img")

# Click the option, then the image, so the page's own JavaScript submits the form.
ActionChains(driver).move_to_element(option).click(option).perform()
ActionChains(driver).move_to_element(submit).click(submit).perform()

I navigated to the page and used Selenium's XPath element lookup and its ActionChains to select and click everything I wanted.
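One thing the snippet above leaves open is where the zip file ends up and how a script knows the download has finished. The following is a rough sketch of one way to handle that, not part of the original answer: Chrome is started with a fixed download directory, and a small polling helper waits for a completed .zip to show up there. The function names make_chrome_driver and wait_for_zip are made up for illustration, the directory is assumed to start out empty, and the reliance on Chrome's .crdownload suffix for partial files is an assumption about the browser's behavior.

```python
import os
import time


def make_chrome_driver(download_dir):
    """Start Chrome with downloads saved to download_dir without a prompt."""
    # Imported here so the polling helper below works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", {
        "download.default_directory": os.path.abspath(download_dir),
        "download.prompt_for_download": False,
    })
    return webdriver.Chrome(options=options)


def wait_for_zip(download_dir, timeout=60.0, poll_interval=0.5):
    """Poll download_dir until a finished .zip appears; return its path."""
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_dir)
        # Chrome writes partial downloads with a .crdownload suffix.
        in_progress = any(n.endswith(".crdownload") for n in names)
        zips = sorted(n for n in names if n.lower().endswith(".zip"))
        if zips and not in_progress:
            return os.path.join(download_dir, zips[0])
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no zip file appeared in {download_dir}")
        time.sleep(poll_interval)
```

With this in place, the driver would be created through make_chrome_driver, and wait_for_zip would be called on the same directory right after the second ActionChains click to get the path of the downloaded file.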