How to parse text that a website displays only after a button is clicked, when that text is not in the base HTML

Updated: 2023-09-26

I'm trying to scrape all of the blog links on this page: http://hypem.com/track/26ed4/Skizzy+Mars+-+Way+I+Live

Clicking "More" reveals the links, but only one link is visible in the HTML source. I'm using BeautifulSoup. How can I get the other links?

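You can stick with the requests + BeautifulSoup approach. You just need to simulate the underlying requests that the browser sends to the server when you click the "More blogs" button and scroll down the page. You can find these requests in the Network tab of your browser's developer tools.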
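The core of this trick is a loop that requests numbered pages until the server answers with an empty body. Here is a minimal sketch of that pattern with the HTTP call passed in as a function, so it can be tried offline; `iter_pages`, `fake_fetch` and the fake page contents are hypothetical names and data for illustration only.

```python
from typing import Callable, Iterator


def iter_pages(fetch: Callable[[int], str], start: int = 1, max_pages: int = 1000) -> Iterator[str]:
    """Yield fetch(start), fetch(start + 1), ... until a page body is empty."""
    for page in range(start, start + max_pages):
        body = fetch(page)
        # An empty (or whitespace-only) body is treated as "no more results".
        if not body.strip():
            return
        yield body


# Hypothetical stand-in for a real HTTP call, so the loop runs without network access.
FAKE_PAGES = {1: "<p>a</p>", 2: "<p>b</p>", 3: "   "}


def fake_fetch(page: int) -> str:
    return FAKE_PAGES.get(page, "")


print(list(iter_pages(fake_fetch)))
# prints ['<p>a</p>', '<p>b</p>']
```

With `requests`, `fetch` could be something like `lambda page: requests.get(url.format(page=page), timeout=10).text`, where `url` is a paging URL template with a `{page}` placeholder, and `start` would be 2 if the first page has already been read separately. The `max_pages` cap is an extra safeguard, not part of the original approach, so a server that never returns an empty body cannot cause an endless loop.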

The following code prints the titles of all blogs listed at http://hypem.com/blogs:

from bs4 import BeautifulSoup
import requests


def extract_blogs(content):
    """Print the title of every blog image in an HTML page or fragment."""
    soup = BeautifulSoup(content, 'html.parser')
    for img in soup.select('div.directory-blog img'):
        print(img.get('title'))


# extract blogs from the main page
response = requests.get('http://hypem.com/blogs')
extract_blogs(response.content)

# paginate over the remaining results (the requests sent when you click
# "More blogs" or scroll down) until the server returns an empty response
url = 'http://hypem.com/inc/serve_sites.php?featured=true&page={page}'
page = 2
while True:
    response = requests.get(url.format(page=page))
    if not response.content.strip():
        break
    extract_blogs(response.content)
    page += 1

Output:

Heart and Soul
Avant-Avant
Different Kitchen
Ladywood 
Orange Peel
Phonographe Corp
...
Stadiums & Shrines
Caipirinha Lounge
Gorilla Vs. Bear
ISO50 Blog
Fluxblog
Music ( for robots)
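The selector `div.directory-blog img` matches every `<img>` anywhere inside a `<div>` whose class list includes `directory-blog`, and the code prints each image's `title`. To make that concrete without the live site, here is a sketch of the same extraction using only the standard library's `html.parser`. This is a swapped-in technique, not how BeautifulSoup works internally; `BlogTitleParser` is a hypothetical name, and the sample markup is invented, assuming only the structure the selector implies.

```python
from html.parser import HTMLParser


class BlogTitleParser(HTMLParser):
    """Collect the title of every <img> nested inside a <div class="directory-blog">."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._div_stack = []  # one bool per open <div>: does it have class "directory-blog"?

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # attribute values can be None (e.g. <div class>), hence the `or ""`
            classes = (attrs.get("class") or "").split()
            self._div_stack.append("directory-blog" in classes)
        elif tag == "img" and any(self._div_stack):
            title = attrs.get("title")
            if title is not None:  # unlike img.get('title'), skip images without a title
                self.titles.append(title)

    def handle_endtag(self, tag):
        if tag == "div" and self._div_stack:
            self._div_stack.pop()


# Invented sample markup for illustration only.
SAMPLE = """
<div class="directory-blog"><a href="#"><img src="a.png" title="Blog A"></a></div>
<div class="other"><img src="b.png" title="Blog B"></div>
<div class="directory-blog featured"><img src="c.png" title="Blog C"/></div>
"""

parser = BlogTitleParser()
parser.feed(SAMPLE)
parser.close()
print(parser.titles)
# prints ['Blog A', 'Blog C']
```

Tracking a stack of open `<div>` elements is what gives the "anywhere inside" (descendant) behavior of the CSS selector; BeautifulSoup remains the more robust choice for real-world, messy HTML.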

Hopefully this at least gives you a basic idea of how to scrape page content in situations like this.