Screen-scraping JavaScript in PHP
Keywords: JavaScript, PHP | Updated: 2023-09-26
I can scrape all the items on this page with the following script:
$html = file_get_contents($list_url);
$doc = new DOMDocument();
libxml_use_internal_errors(TRUE);

// $list_url, $shop_name and the *_location / *_root variables are set elsewhere
if (!empty($html))
{
    $doc->loadHTML($html);
    libxml_clear_errors(); // remove errors for yucky html
    $xpath = new DOMXPath($doc);

    /* FIND LINK TO PRODUCT PAGE */
    $product_urls = array();
    $row = $xpath->query($product_location);
    /* Create an array containing products */
    if ($row->length > 0)
    {
        foreach ($row as $location)
        {
            $product_urls[] = $product_url_root . $location->getAttribute('href');
        }
    }
    else { echo "product location is wrong<br>"; }

    $photo_url = array();
    $imgs = $xpath->query($photo_location);
    /* Create an array containing the image links */
    if ($imgs->length > 0)
    {
        foreach ($imgs as $img)
        {
            $photo_url[] = $photo_url_root . $img->getAttribute('src');
        }
    }
    else { echo "photo location is wrong<br>"; }

    $was_price = array();
    $was = $xpath->query($was_price_location);
    /* Create an array containing the was price */
    if ($was->length > 0)
    {
        foreach ($was as $price)
        {
            $stripped = preg_replace("/[^0-9,.]/", "", $price->nodeValue);
            $was_price[] = "£".$stripped;
        }
    }
    else { echo "was price location is wrong<br>"; }

    $now_price = array();
    $now = $xpath->query($now_price_location);
    /* Create an array containing the sale price */
    if ($now->length > 0)
    {
        foreach ($now as $price)
        {
            // drop commas too: (float)"1,299.00" stops at the comma and gives 1
            $stripped = preg_replace("/[^0-9.]/", "", $price->nodeValue);
            $stripped = number_format((float)$stripped, 2, '.', '');
            $now_price[] = "£".$stripped;
        }
    }
    else { echo "now price location is wrong<br>"; }

    $result = array();
    /* Create an associative array containing all the above values */
    foreach ($product_urls as $i => $product_url)
    {
        // ?? null avoids undefined-index warnings if the lists differ in length
        $result[] = array(
            'product_url' => $product_url,
            'shop_name'   => $shop_name,
            'photo_url'   => $photo_url[$i] ?? null,
            'was_price'   => $was_price[$i] ?? null,
            'now_price'   => $now_price[$i] ?? null
        );
    }
}
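Both price loops repeat the same clean-up, so it may be worth pulling it into one helper. Below is a minimal sketch; the name `format_gbp_price` is hypothetical, and it assumes prices use `.` as the decimal separator and `,` only as a thousands separator, as on UK listings.

```php
<?php
// Sketch: normalise a scraped price string such as "Now £1,299.5" to "£1299.50".
// format_gbp_price() is a hypothetical helper, not part of the original script.

function format_gbp_price(string $raw): ?string
{
    // Keep only digits and the decimal point. Commas are dropped on purpose:
    // (float) "1,299.00" stops parsing at the comma and yields 1.0.
    $digits = preg_replace('/[^0-9.]/', '', $raw);
    if ($digits === null || $digits === '') {
        return null; // no number found in the node text
    }
    return '£' . number_format((float) $digits, 2, '.', '');
}
```

Inside either loop, `$was_price[] = format_gbp_price($price->nodeValue);` would then replace the two or three lines of inline clean-up, and a `null` entry marks a node that contained no price.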
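However, there is a problem if I want the second page, or if I view 100 items per page: file_get_contents($list_url)
always returns the first page, with 24 items.
I think the page change is handled by an AJAX request (although I can't find any evidence of it in the source). Is there a way to scrape what I actually see on screen?
I've seen PhantomJS discussed in earlier answers, but since I'm using PHP I'm not sure it's suitable here.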
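This is because the link contains a hash tag generated by a JavaScript script. Turn off JavaScript for the site and check the link it generates instead.
For example, the second page is http://www.hm.com/gb/subdepartment/sale?page=1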
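A minimal sketch of walking the pages with plain `file_get_contents()`, assuming the site keeps accepting a `page` query parameter that appears to be zero-based (since `?page=1` is the second page). The function names `page_url` and `fetch_pages` are hypothetical, and any extra parameters passed through `$extra` are site-specific assumptions.

```php
<?php
// Sketch: build paginated listing URLs and download each page in turn.
// Assumes a zero-based `page` query parameter; page_url() and fetch_pages()
// are hypothetical helpers, and any $extra parameters are site-specific guesses.

function page_url(string $base, int $page, array $extra = []): string
{
    // Keep any existing query string on $base and append our parameters.
    $params = ['page' => $page] + $extra;
    $separator = (strpos($base, '?') === false) ? '?' : '&';
    return $base . $separator . http_build_query($params);
}

function fetch_pages(string $base, int $max_pages): array
{
    $pages = [];
    for ($page = 0; $page < $max_pages; $page++) {
        $html = file_get_contents(page_url($base, $page));
        if ($html === false || $html === '') {
            break; // stop on a failed request or an empty body
        }
        $pages[$page] = $html;
    }
    return $pages;
}
```

For example, `fetch_pages('http://www.hm.com/gb/subdepartment/sale', 5)` would request `?page=0` through `?page=4`, and each HTML string can go through the same DOMXPath code as in the question. A page past the last one may still return normal HTML, so the caller should also stop once a page yields no products. Whether the site exposes a page-size option as another query parameter is not stated here; check the links it produces with JavaScript off before passing one via `$extra`.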
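// file_get_html() comes from the PHP Simple HTML DOM Parser library
include('simple_html_dom.php');

// Create DOM from URL or file
$file = file_get_html('http://stackoverflow.com/');

// Find your links
foreach ($file->find('a') as $yourElement) {
    echo $yourElement->href . '<br>';
}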
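The same link extraction can also be done without the third-party library, using PHP's bundled DOM extension (the one the question's script already uses). A minimal sketch; `extract_links` is a hypothetical name.

```php
<?php
// Sketch: list the href of every <a> element using PHP's built-in DOM
// extension instead of the third-party Simple HTML DOM library.
// extract_links() is a hypothetical helper name.

function extract_links(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings for messy markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) { // skip anchors without a link
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

// Usage (performs a network request):
// foreach (extract_links(file_get_contents('http://stackoverflow.com/')) as $href) {
//     echo $href . '<br>';
// }
```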