PHP: extracting a string from HTML with embedded JavaScript

Keywords: HTML, extract, string, JavaScript, embedded, PHP      Updated: 2023-09-26

I am trying to extract this data (MARK PATER) from a web page, and I want it as a plain string, not a hyperlink. My code is shown further down.

When I echo the result, this is what I see in the browser: MARK PATER. But I cannot get this value as a string; it is a hyperlink. When I view the page source, I get this:

<a class="filter_list" href="" onclick="return fillFilterForm(document.formFilter1, 'nation_party_name', 'MARK PATGHL');"><font face="Verdana" size="1" color="BLACK">MARK PATERÂ Â </font></a>string(0) ""

Here is part of the source produced by echo $html:

<tr >
<td align="justify" width="5%" nowrap><font face="Verdana" size="1">&nbsp;&nbsp;&nbsp;
*
<a class="list_2" href="details.asp?doc_id=2&index=0&file_num=07">View</a>&nbsp;&nbsp;</font>
</td>
<td width="20%" align="justify" ><a class="filter_list" href="" onClick="return fillFilterForm(document.formFilter1, 'party_name', 'NEW YORK GORDI');"><font face="Verdana" size="1" color="BLACK">NEW YORK GORDI&nbsp;&nbsp;</font></td>
<td width="15%" align="justify" nowrap><a class="filter_list" href="" onClick="return fillFilterForm(document.formFilter1, 'Name', 'MARK PATER');"><font face="Verdana" size="1" color="BLACK">MARK PATER&nbsp;&nbsp;</font></td>

Code:

$html = file_get_html($link);
// echo htmlspecialchars($html);

// Create a new DOM object
$dom = new DOMDocument;
// Load the HTML into the object
$dom->loadHTML($html);
$tables = $dom->getElementsByTagName('td');
echo get_inner_html($tables->item(26));

// Serialize every child of $node back to markup (tags included)
function get_inner_html($node)
{
    $innerHTML = '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML($child);
    }
    return $innerHTML;
}

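Try using a regular expression

Try building a regular expression to extract the string from the HTML.

Looping through HTML with SimpleXML/DOM can sometimes be a real headache.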
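If you do stay with the DOM, note that much of the trouble in the question comes from serializing child nodes with saveXML(), which keeps the <a> and <font> tags in the result. Reading a cell's textContent returns only its text. Below is a minimal sketch, assuming a made-up fragment shaped like the rows in the question; the $html value and the $names variable are illustrative and not taken from the original page.

```php
<?php
// Hypothetical fragment modeled on the rows shown in the question.
$html = '<table><tr>'
    . '<td><a class="filter_list" href=""><font size="1">NEW YORK GORDI&nbsp;&nbsp;</font></a></td>'
    . '<td><a class="filter_list" href=""><font size="1">MARK PATER&nbsp;&nbsp;</font></a></td>'
    . '</tr></table>';

// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$names = [];
foreach ($dom->getElementsByTagName('td') as $cell) {
    // textContent is the concatenated text of all descendants, with no tags.
    // &nbsp; is decoded to U+00A0, which trim() does not strip by default,
    // so remove whitespace and U+00A0 from both ends explicitly.
    $names[] = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $cell->textContent);
}

print_r($names);
```

On a real page you would probably still need to pick the right cell, for example by index as the question does with item(26), or by an XPath query on the filter_list class.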

A regex sample for your case:

// Sample markup from the question, with double quotes escaped for a PHP string
$html = "<tr >
<td align=\"justify\" width=\"5%\" nowrap><font face=\"Verdana\" size=\"1\">&nbsp;&nbsp;&nbsp;
*
<a class=\"list_2\" href=\"details.asp?doc_id=2&index=0&file_num=07\">View</a>&nbsp;&nbsp;</font>
</td>
<td width=\"20%\" align=\"justify\" ><a class=\"filter_list\" href=\"\" onClick=\"return fillFilterForm(document.formFilter1, 'party_name', 'NEW YORK GORDI');\"><font face=\"Verdana\" size=\"1\" color=\"BLACK\">NEW YORK GORDI&nbsp;&nbsp;</font></td>
<td width=\"15%\" align=\"justify\" nowrap><a class=\"filter_list\" href=\"\" onClick=\"return fillFilterForm(document.formFilter1, 'Name', 'MARK PATER');\"><font face=\"Verdana\" size=\"1\" color=\"BLACK\">MARK PATER&nbsp;&nbsp;</font></td>";
// Capture the text between <font ...> and the trailing &nbsp; entities in each <td><a><font> cell
preg_match_all('/(?:<td.+><a.+><font.+>)([\w\s]+)(?:(&nbsp;)+<\/font><\/td>)/', $html, $filtered);
print_r( $filtered[1] );
//Output: Array ( [0] => NEW YORK GORDI [1] => MARK PATER )
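
Since the same name also appears as an argument in the embedded JavaScript, another option is to read it from the onClick attribute itself. This is only a sketch under assumptions: the $onclick value is a hypothetical string copied in shape from the question's markup, and the pattern assumes the name is the last single-quoted argument of fillFilterForm and contains no apostrophe or closing parenthesis.

```php
<?php
// Hypothetical attribute value, shaped like the onClick handlers in the question.
$onclick = "return fillFilterForm(document.formFilter1, 'Name', 'MARK PATER');";

// Capture the last single-quoted argument before the closing parenthesis.
if (preg_match("/fillFilterForm\([^)]*'([^']*)'\s*\)/", $onclick, $m)) {
    $name = $m[1];
} else {
    $name = null; // pattern not found; the markup differs from the assumed shape
}

var_dump($name);
```

This avoids depending on the <font> wrapper and the trailing &nbsp; entities, but it breaks if a name contains an apostrophe, so treat it as a starting point rather than a general parser.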