How to extract JavaScript-generated content while scraping a webpage with BeautifulSoup

Keywords: javascript, extract, webpage, scrape, BeautifulSoup      Updated: 2023-09-26

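I want to use BeautifulSoup to extract the CustomerRatings value from "img class=BVImgOrSprite", but I can't get it. I read somewhere that BS only parses the HTML content, not the parts produced by JS. How should I do this? To find it quickly, look for ModuleId 372309, which is the part I want to scrape. Thanks.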

<!DOCTYPE html>
<html lang="en-US">
    <head></head>
    <body id="WalmartBodyId" class="WalmartMainBody DynamicMode wmItemPage" onload="handleLocationHash();" style="">
        <iframe style="visibility:hidden;width:1px;height:1px;position:absolute;left:-999px;top:-999px;" src="http://walmart.ugc.bazaarvoice.com/1336/crossdomain.htm?format=embedded#origin=http%3A%2F%2Fwww.walmart.com"></iframe>
        <script type="text/javascript"></script>
        <script type="text/javascript" language="JavaScript"></script>
        <div class="PageContainer">
            <img class="WalmartLogo scrHid" width="145" height="62" border="0" style="float:none;margin-bottom:1px" src="http://i2.walmartimages.com/i/catalog/modules/G0040/wmlogo.gif"></img>
            <div class="prtHid"></div>
            <!--
             end header 
            -->
            <div class="MidContainer">
                <div class="ItemPage clearfix" role="main">
                    <!--

                    ModuleId 372264
                    FileName @itemPageSingleRowContai…
                    -->
                    <!--
                     Start multiRowsContainer 
                    -->
                    <div class="multiRow clearfix"></div>
                    <div class="multiRow clearfix"></div>
                    <div class="multiRow clearfix"></div>
                    <div class="multiRow clearfix" itemtype="http://schema.org/Product" itemscope="">
                        <!--

                        ModuleId 372268
                        FileName @mainInfoTwoColsContaine…
                        -->
                        <!--
                         Start: mainInfoTwoColsContainer 
                        -->
                        <script type="text/javascript"></script>
                        <div class="columnOne"></div>
                        <div class="columnTwo">
                            <!--
                             Main Additional Information 
                            -->
                            <!--
                             Start mainInfoTwoColsContainer 
                            -->
                            <!--
                             This DIV is used as the parent container of fly-o…
                            -->
                            <div id="Zoomer-big"></div>
                            <div>
                                <!--

                                ModuleId 372278
                                FileName @multiContainers
                                -->
                                <!--
                                 Start multiRowsContainer MP
                                -->
                                <div class="multiRow">
                                    <!--

                                    ModuleId 372279
                                    FileName @swMultiRowsContainer
                                    -->
                                    <form onsubmit="return ItemPage.validateSubmit(this, true);" action="/catalog/select_product.do" method="GET" name="SelectProductForm">
                                        <input type="hidden" value="34083867" name="product_id"></input>
                                        <input type="hidden" value="0" name="seller_id"></input>
                                        <!--
                                         Start multiRowsContainer MP
                                        -->
                                        <div class="multiRow clearfix"></div>
                                        <div class="multiRow clearfix">
                                            <!--

                                            ModuleId 372283
                                            FileName @swSingleRowContainer1
                                            -->
                                            <!--
                                             Start singleRowsContainer MP 
                                            -->
                                            <style type="text/css"></style>
                                            <!--

                                            ModuleId 372309
                                            FileName @CustomerRatingsLeftTop
                                            -->
                                            <script type="text/javascript"></script>
                                            <div class="CustomerRatings">
                                                <div id="BVCustomerRatings" class="BVBrowserFF">
                                                    <div class="BVRRRootElement">
                                                        <div class="BVRRRatingSummary BVRRPrimaryRatingSummary">
                                                            <div class="BVRRRatingSummaryStyle2">
                                                                <div class="BVRRRatingSummaryHeader"></div>
                                                                <div class="BVRROverallRatingContainer">
                                                                    <div class="BVRRRatingContainerStar">
                                                                        <div class="BVRRRatingEntry BVRROdd">
                                                                            <div id="BVRRRatingOverall_" class="BVRRRating BVRRRatingNormal BVRRRatingOverall">
                                                                                <div class="BVRRLabel BVRRRatingNormalLabel"></div>
                                                                                <div class="BVRRRatingNormalImage">
                                                                                    <img class="BVImgOrSprite" width="75" height="15" title="3.4 out of 5" alt="3.4 out of 5" src="http://walmart.ugc.bazaarvoice.com/1336/3_4/5/rating.png"></img>
                                                                                </div>
                                                                                <div class="BVRRRatingNormalOutOf"></div>
                                                                            </div>
                                                                        </div>
                                                                    </div>
                                                                </div>
                                                                <div id="BVRRRatingsHistogramButton_pyl3wq4v0hkzvqlgmib3ufvcl_ID" class="BVRRRatingsHistogramButton"></div>
                                                                <span class="BVRRCustomRatingSummaryCountContainer"></span>
                                                                <div class="BVRRSecondaryRatingsContainer"></div>
                                                                <div class="BVRRBuyAgainContainer"></div>
                                                                <div class="BVRRSecondaryRatingsContainer"></div>
                                                                <div class="BVRRRatingSummaryLinks"></div>
                                                            </div>
                                                        </div>
                                                        <a id="BVSubmissionLink" href="javascript://" data-bvjsref="http://walmart.ugc.bazaarvoice.com/1336/34083867/writereview…url=http%3A%2F%2Fwww.walmart.com%2Fcservice%2FwriteReview.do" data-bvcfg="574213729" style="display: none;"></a>
                                                    </div>
                                                </div>
                                            </div>
                                            <!--
                                             End: Customer Ratings Left Top 
                                            -->
                                            <!--

                                            ModuleId 372312
                                            FileName @mpProductDetailsSummary…
                                            -->
                                            <div class="prtHid"></div>
                                            <!--

                                            ModuleId 372313
                                            FileName @mpSecondaryButtons3
                                            -->
                                            <div class="prtHid"></div>
                                            <!--
                                             End singleRowsContainer 
                                            -->
                                        </div>
                                        <div class="multiRow clearfix"></div>
                                        <div class="multiRow clearfix"></div>
                                        <div class="multiRow clearfix"></div>
                                        <div class="multiRow clearfix"></div>
                                        <div class="multiRow clearfix"></div>
                                    </form>
                                    <!--
                                     End multiRowsContainer 
                                    -->
                                </div>
                                <div class="multiRow"></div>
                                <!--
                                 End multiRowsContainer 
                                -->
                            </div>
                            <!--
                             End mainInfoTwoColsContainer 
                            -->
                        </div>
                        <!--
                         End: mainInfoTwoColsContainer 
                        -->
                    </div>
                    <!--
                     End multiRowsContainer 
                    -->
                </div>
                <div id="BottomAds" class="BottomAds" style="position: relative;left:200px"></div>
                <!--
                 Start: R13.5 OSO - Sticky add to cart panel 
                -->
                <div class="executeJS" style="display: hidden;"></div>
                <!--
                 End: R13.5 OSO - Sticky add to cart panel 
                -->
                <div id="emailMeOverlay" class="wm-widget-overlay-template" style="overflow: hidden" title="Notify me when it's back in stock"></div>
            </div>
            <script language="javascript"></script>
            <div id="ROLLOVER" zindex="100000" style="display:none; text-align:left;" _pointermargin="-9px 0px 0px 10px" bubblemargin="5px 0 0 0" applyto="#ROLLOVER" pointer="true" bubbleposition="top" closebubbleonevent="mouseout" openbubbleonevent="mouseover" bubbleclassname="wm-widget-bubble-blue1px"></div>
            <script type="text/javascript"></script>
            <link type="text/css" rel="stylesheet" href="http://i2.walmartimages.com/css/quicklook_btn.css"></link>
            <script type="text/javascript"></script>
            <script type="text/javascript"></script>
            <script type="text/javascript"></script>
            <script type="text/javascript"></script>
            <script type="text/javascript" src="http://www.walmart.com/c/midas/loader.js"></script>
            <script type="text/javascript" src="//www.walmart.com/c/midas/hl.js"></script>
            <style type="text/css"></style>
            <script type="text/javascript" src="http://www.google.com/adsense/search/ads.js"></script>
            <script type="text/javascript" src="//www.google.com/ads/search/module/ads/3.0/beb93033d95ef74abd29c04a5d16f4dbee1ccd0a/n/search.js"></script>
            <script type="text/javascript" src="//www.walmart.com/c/midas/m_ip.js"></script>
            <style type="text/css"></style>
            <div id="ipAdsenseContainer"></div>
            <script type="text/javascript"></script>
            <!--
             start footer 
            -->
            <div class="prtHid"></div>
        </div>
        <div class="wm-widget-bubble-blue1px wm-widget-bubble-pt-show-bottom" style="margin: 0px; position: absolute; display: none;"></div>
        <div class="prtHid"></div>
        <!--
         Start Pinterest call 
        -->
        <script type="text/javascript"></script>
        <!--
         End Pinterest call 
        -->
        <!--

        -->
        <script type="text/javascript"></script>
        <script type="text/javascript"></script>
        <script src="/__ssobj/static/ss-walmart.min.31246.js"></script>
        <script></script>
        <script></script>
        <script></script>
        <!--
         MERGED 
        -->
        <script id="ss-descriptors" type="text/javascript"></script>
        <!--
        SSSV
        -->
        <script></script>
        <script></script>
        <script></script>
        <script></script>
        <ul class="ui-autocomplete ui-menu ui-widget ui-corner-all ui-widget-autocomplete" role="listbox" aria-activedescendant="ui-active-menuitem" style="z-index: 12; top: 0px; left: 0px; display: none;"></ul>
        <div class="wm-widget-bubble-blue1px wm-widget-bubble-pt-show-bottom" style="margin: 5px 0px 0px; position: absolute; display: none;"></div>
        <div class="wm-widget-bubble-blue1px wm-widget-bubble-pt-show-bottom" style="margin: 5px 0px 0px; position: absolute; display: none;"></div>
        <div class="wm-widget-bubble-blue1px wm-widget-bubble-pt-show-bottom" style="margin: 5px 0px 0px; position: absolute; display: none;"></div>
        <div class="wm-widget-bubble-blue1px wm-widget-bubble-pt-show-bottom" style="margin: 0px; position: absolute; display: none;"></div>
        <div class="wm-widget-bubble-blue1px wm-widget-bubble-pt-show-bottom" style="margin: 0px; position: absolute; display: none;"></div>
        <div class="wm-widget-bubble-blue1px wm-widget-bubble-pt-show-bottom" style="margin: 5px 0px 0px; position: absolute; display: none;"></div>
        <div id="stickyAddtoCart"></div>
        <img src="http://beacon.walmart.com:80/p13n/site/irs/beacon.gif?visito…ince_response=5831&time_since_init=0&timestamp=1397377059098" style="position: absolute; width: 0px; height: 0px; top: -1234px; left: -1234px;"></img>
        <img src="http://beacon.walmart.com:80/p13n/site/irs/beacon.gif?visito…ce_init=219&time_since_response=6050&timestamp=1397377059317" style="position: absolute; width: 0px; height: 0px; top: -1234px; left: -1234px;"></img>
        <img src="http://beacon.walmart.com:80/p13n/site/irs/beacon.gif?visito…ce_init=333&time_since_response=6164&timestamp=1397377059431" style="position: absolute; width: 0px; height: 0px; top: -1234px; left: -1234px;"></img>
        <div class="ui-dialog ui-widget ui-widget-content ui-corner-all" style="display: none; z-index: 50300; outline: 0px none;" tabindex="-1" role="dialog" aria-labelledby="ui-dialog-title-rmvideoPanel"></div>
        <iframe id="google_osd_static_frame_9801270172171" name="google_osd_static_frame" style="display: none; width: 0px; height: 0px;"></iframe>
    </body>
</html>
<!--
 end footer 
-->

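You should probably look at Webkit or Spynner (taken from this SO question). I have had good experiences using PhantomJS to scrape JS-generated content in the past, but it is not exactly Python.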
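Once one of those tools has executed the page's JavaScript and handed back the rendered HTML, pulling out the rating is ordinary parsing. Below is a minimal sketch of that last step only, assuming the rendered markup matches the dump in the question. The names `RENDERED_HTML`, `RatingParser`, and `extract_rating` are made up for illustration, and the standard library's `html.parser` is used as a stand-in so the snippet runs without extra installs. With BeautifulSoup the lookup would be roughly `soup.find("img", class_="BVImgOrSprite")["title"]`.

```python
import re
from html.parser import HTMLParser

# Fragment copied from the rendered DOM in the question (assumption: this is
# what the JS-capable renderer returns after the page's scripts have run).
RENDERED_HTML = """
<div class="BVRRRatingNormalImage">
    <img class="BVImgOrSprite" width="75" height="15" title="3.4 out of 5" alt="3.4 out of 5" src="http://walmart.ugc.bazaarvoice.com/1336/3_4/5/rating.png"></img>
</div>
"""


class RatingParser(HTMLParser):
    """Collects the title attribute of every <img> with class BVImgOrSprite."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "img" and "BVImgOrSprite" in classes:
            self.titles.append(attr_map.get("title"))


def extract_rating(html):
    """Return (rating, scale) parsed from the first matching title, or None."""
    parser = RatingParser()
    parser.feed(html)
    parser.close()
    for title in parser.titles:
        match = re.match(r"([\d.]+) out of ([\d.]+)", title or "")
        if match:
            return float(match.group(1)), float(match.group(2))
    return None


print(extract_rating(RENDERED_HTML))  # (3.4, 5.0)
```

If the same code is fed the raw HTML fetched without running JavaScript, `extract_rating` will most likely return `None`, because the `img` tag is only inserted by the Bazaarvoice script at load time.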