jQuery: Parsing HTML

Keywords: HTML, jQuery      Updated: 2023-09-26

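I need help parsing the href attributes of links. At the moment everything is parsed as plain text, but I need to extract the links themselves so that I can later send them to a PHP page via AJAX.

My HTML looks like this: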

<div id="word_content">
<br>Testing Time: 2015-10-29 17:57:11<br>
    Total Age: 19<br>
    Total Friemd: 9<br>
    Total Family: 10<br>
    <br>
Here are the suggestions  - Him_530037_: <a href="www.mytarget.com="_blank">93358546</a>
<h3>Overview</h3><br>
<ul>
    <li>(The overlap provided is not good)</li>
</ul>
<h3>Structure</h3><br>
<h4>Target:</h4><br>
<ul>
    <li>Audience.</li>
    <li>Lookalike</li>
    <li>Overlap of Audience</li> 
    <a href="https://www.myPage.com/lolPagess/?id=06" target="_blank">06<font name="names" hidden="" style="display: inline;"> - Page Likes</font></a>           
</ul>
</div>

The jQuery code looks like this:

var headTags = $("div#word_content").find("*").filter(function () {
    return /^h/i.test(this.nodeName);
});
var output = {};
$(headTags).each(function () {
    var currentHead = $(this);
    var nextNextElem = currentHead.next().next();
    var innerText = [];
    if (nextNextElem.prop("tagName") == "UL") {
        nextNextElem.find("li").each(function () {
            innerText.push($(this).text());
        });
    }
    output[currentHead.text()] = innerText;
});

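Currently, jQuery retrieves the data, but it captures only the text, not the links. I also need to extract each link's href so that I can use it on a later page. Can anyone help?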

Use this:

nextNextElem.find("a").each(function () {
    innerText.push($(this).text() + " & href is:" + $(this).attr("href"));
});
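
If the link is needed later as data rather than as display text, it may be easier to keep the text and the href as separate fields instead of joining them into one string. The following is a minimal sketch under that assumption; `extractLinks` is a hypothetical helper name, not part of jQuery, and it only relies on the standard `textContent` property and `getAttribute()` method of DOM elements.

```javascript
// Hypothetical helper: turn anchor elements into plain { text, href } records.
// It accepts any array-like of objects exposing textContent and getAttribute(),
// so in the browser it can be fed real DOM nodes, for example:
//   var links = extractLinks($("#word_content a").get());
function extractLinks(anchors) {
  return Array.from(anchors, function (a) {
    return {
      text: (a.textContent || "").trim(),
      href: a.getAttribute("href")
    };
  });
}
```

Note that `textContent` also includes text inside hidden child elements, such as the `<font hidden>` element in the markup above.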

var headTags = $("div#word_content").find("*").filter(function () {
    return /^h/i.test(this.nodeName);
});
var output = {};
$(headTags).each(function () {
    var currentHead = $(this);
    var nextNextElem = currentHead.next().next();
    var innerText1 = [];
    if (nextNextElem.prop("tagName") == "UL") {
        nextNextElem.find("li").each(function () {
            // Leading text of the <li>, before its first child element
            innerText1.push(this.firstChild.data);
            $(this).children().each(function () {
                // Rebuild each child (the links in this markup) as an HTML string so the href is kept
                innerText1.push("<a href='" + $(this).attr("href") + "'>" + $(this)[0].innerText + "</a>");
                // Text node that follows the link, if there is one
                if ($(this).prop('nextSibling')) {
                    innerText1.push($(this).prop('nextSibling').nodeValue);
                }
            });
        });
    }
    output[currentHead.text()] = innerText1;
});
console.log(output);
$("#data").html(JSON.stringify(output));
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
   <div id="word_content">
<br>Testing Time: 2015-10-29 17:57:11<br>
Total Age: 19<br>
Total Friemd: 9<br>
Total Family: 10<br>
<br>
Here are the suggestions  - Him_530037_: <a href="www.mytarget.com="_blank">93358546</a>
<h3>Overview</h3><br>
<ul> 
<li>Multiple Countries 
<a href="https://www.myTarget.com/ads/?id=603" target="_blank">603<font name="names" hidden="" style="display: none;"> - Post: "သင့္ရဲ့ Data အသံုးျပဳ မွုကို အေၾကာင္းၾကားေပးေသာ..."</font></a> (MM, SG), 
<a href="https://www.myTarget.com/ads/?id=602" target="_blank">602<font name="names" hidden="" style="display: none;"> - Post: "Mynamar pics."</font></a></li> 
</ul>
</div>
<span>OUTPUT AREA:</span>
<div id="data"></div>
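
A caveat about the filter used in the snippets above: `/^h/i` matches every tag name that starts with "h", which covers HR, HEAD, HEADER and HTML as well as H1 to H6. With the sample markup this makes no difference, but a tighter test could look like the sketch below. The helper name `isHeadingTag` is made up for illustration.

```javascript
// Hypothetical helper: true only for the heading tags H1 through H6.
function isHeadingTag(nodeName) {
  return /^h[1-6]$/i.test(nodeName);
}

// In the jQuery code it would replace the original test, for example:
//   .filter(function () { return isHeadingTag(this.nodeName); })
```

An equivalent approach is to select the headings directly with `$("#word_content").find("h1, h2, h3, h4, h5, h6")`.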

You can use something like this to extract the links on the page:

$("a").each(function(i, o) {
    console.log("Link: " + (i + 1));
    console.log("  Text is: " + $(o).text());
    console.log("  Link is: " + $(o).attr('href'));
})

Result:

www.mytarget.com=
https://www.myPage.com/lolPagess/?id=06

See the JsFiddle.
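
Since the goal in the question is to send the links to a PHP page, the collected values eventually have to be serialized. The sketch below shows one possible way, assuming the links have already been gathered as `{ text, href }` objects. `buildLinkPayload` and the endpoint `save_links.php` are placeholders invented for this example; `URLSearchParams` is the standard form-encoding API.

```javascript
// Hypothetical helper: encode link records as a form body that PHP can read
// as a nested array ($_POST["links"][0]["href"], and so on).
function buildLinkPayload(links) {
  var params = new URLSearchParams();
  links.forEach(function (link, i) {
    params.append("links[" + i + "][text]", link.text);
    params.append("links[" + i + "][href]", link.href);
  });
  return params.toString();
}

// Sample record taken from the markup in the question.
var body = buildLinkPayload([
  { text: "06", href: "https://www.myPage.com/lolPagess/?id=06" }
]);

// In the browser the string could then be posted with jQuery, for example:
//   $.post("save_links.php", body);   // "save_links.php" is a placeholder
```

Passing the array as an object, as in `$.post(url, { links: links })`, should give a similar body, because jQuery serializes nested data with `$.param()`.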

Check the href of every a element:

$("a").each(function () {
    isUrlValid($(this).attr("href"));
});

Borrowed from "Validate URL with jQuery without the validation plugin?":

function isUrlValid(url) {
    return /^(https?|s?ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?$/i.test(url);
}

This regular expression tests whether a URL is valid.
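
As a shorter alternative to the long pattern, the standard `URL` constructor can do the parsing. This is a different technique from the regular expression above and is generally more permissive about hosts and paths, so treat it as a sketch rather than a drop-in replacement. The function name `isUrlValidSketch` is made up, and the allowed schemes are chosen to mirror the `https?|s?ftp` prefix of the pattern.

```javascript
// Hypothetical alternative: let the built-in URL parser decide, then
// restrict the scheme to the ones the regular expression accepts.
function isUrlValidSketch(value) {
  var allowed = ["http:", "https:", "ftp:", "sftp:"];
  try {
    var parsed = new URL(value);   // throws if value is not an absolute URL
    return allowed.indexOf(parsed.protocol) !== -1;
  } catch (e) {
    return false;
  }
}
```

With the two hrefs from the question, `https://www.myPage.com/lolPagess/?id=06` passes, while `www.mytarget.com=` is rejected because it has no scheme.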