Getting blank scrape data
When I scrape data from the web page I get blank values and cannot get the actual data. Here is the code:
require 'nokogiri'
require 'open-uri'

url = "http://www.jabong.com/109F/"
# URI.open is needed on Ruby 3.0+, where open-uri no longer patches Kernel#open
doc = Nokogiri::HTML(URI.open(url))
puts doc.at_css("title").text

# Each product tile is expected to carry the "narrow" class
products = doc.css('.narrow')
products.each.with_index(1) do |item, number|
  # &. leaves the value nil when the node is missing instead of raising
  product_name  = item.at_css('.itm-title')&.text
  product_price = item.at_css('.itm-priceBox')&.text
  puts product_name
  puts product_price
  puts number
end
puts "it is the end of code"
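I think you should use XPath instead of CSS. XPath is very powerful and makes it easier to navigate the DOM to the data you want.
For example, to get all the clothing names from the URL you provided as an array: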
doc.xpath("//ul[@id='productsCatalog']//li//a//span[@class='itm-title']").children.map { |x| x.text.gsub("\r\n ", "").strip }
=> ["Ruffle Sleeves Self Pattern Green Top", "Cap Sleeve Solid Red Top", "3/4Th Sleeve Embroidered Blue Tunic", "Ruffle Sleeves Embroidered Beige Tunic", "3/4Th Sleeve Stripe Blue Tunic", "Ruffle Sleeves Printed Beige/Pink Tunic", "Short Sleeve Embroidered Black Tunic", "Sleeve Less Solid Black Top", "Short Sleeve Solid Black Top", "Puffed Sleeve Embroidered Black Top", "Sleeve Less Printed Cream Top", "Sleeve Less Solid Yellow Top", "3/4Th Sleeve Embroidered Black Top", "Sleeve Less Solid Off White Top", "Sleeve Less Solid Fuschia Top", "Mega Sleeves Embroidered Green T-Shirt", "Sleeve Less Solid Orange Tunic", "Puffed Sleeve Embroidered Black Top", "Mega Sleeves Printed Beige Top", "Mega Sleeves Printed Beige T-Shirt", "Puffed Sleeve Stripe Red Dress", "Puffed Sleeve Stripe Blue Top", "Puffed Sleeve Printed Yellow Top", "Cap Sleeves Printed Multi Dress", "Short Sleeve Solid Mustard Yellow Tunic", "Mega Sleeves Embroidered Red Tunic", "Short Sleeve Stripe Red Dress", "Puffed Sleeve Embroidered Beige Top", "Short Sleeve Solid Black Top", "Butterfly Sleeve Check Beige Top", "Short Sleeve Solid Black tunic", "Sleeve Less Embroidered Green Top", "Mega Sleeves Printed Cream Tunic", "Mega Sleeves Embroidered Black Tunic", "Ruffle Sleeves Printed Orange Tunic", "Mega Sleeves Solid Pink Top", "Ruffle Sleeves Solid Rust Dress", "Sleeve Less Solid Navy Blue Dress", "Sleeve Less Self Pattern White Top", "Puffed Sleeve Stripe Red Top", "Mega Sleeves Solid Pink Tunic", "Roll Up Sleeve Solid Off White Tunic", "Sleeve Less Pintucks White Top", "Puffed Sleeve Embroidered Off White Top", "Puffed Sleeve Printed Pink Top", "Raglan Sleeve Solid Black Top", "Mega Sleeves Printed Beige Tunic", "Sleeve Less Solid Yellow Top", "Puffed Sleeve Self Pattern Beige Top", "Cap Sleeve Printed Yellow Dress", "Sleeve Less Solid Pink Dress"]