Jsoup returns an empty string instead of the second-column data when parsing a table with multiple columns

Updated: 2023-09-26

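With jsoup I can retrieve the "Registration Date" text, but I cannot get the second-column value, 31/12/2009; an empty string is returned instead. I have tried every approach I could think of.

All table rows are extracted correctly.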

<tr>
<td style="width: 30%; font-weight: bold; background-color: #d7e8ff; ">
  <span style="font-size: 10pt"> Registration Date</span></td>
<td style="margin-bottom: 1px; padding-bottom: 1px; background-color: lemonchiffon;">
 <span id="ContentPlaceHolder1_636042629082042500">الخميس 31/12/2009</span>
 <span id="ContentPlaceHolder1_iInstalldate"></span></td>
</tr>

This is the Java code I am using:

Element table = doc.select("TABLE").get(2);
// Narrow the selection to the particular table
Elements table1 = table.select("table[border=1]");
Elements rows = table1.select("tr");
for (int i = 0; i < rows.size(); i++) {
    Element row = rows.get(i);
    Elements cols = row.select("td");
    for (Element col : cols) {
        // Log only cells that contain text
        if (!col.text().isEmpty()) {
            Log.e("test", col.text() + cols.size());
        }
    }
}

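This is the output; it contains only the first-column values, not the second-column ones:

Registration Date, Account Type, Current Account Status, Total Account Credit, Used Credit, Valid Credit, Credit Expiry Date

Here is a sample of the page's table source, containing the following rows: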

<tr>
  <td style="width: 30%; font-weight: bold; background-color: #d7e8ff; ">
  <span style="font-size: 10pt">Registration Date</span></td>
 <td style="margin-bottom: 1px; padding-bottom: 1px; background-color:  
     lemonchiffon;">
  <span id="ContentPlaceHolder1_636045303384071212">الخميس 31/12/2009</span>
  <span id="ContentPlaceHolder1_iInstalldate"></span></td>
 </tr>
 <tr>
 <td style="width: 30%; font-weight: bold; background-color: #d7e8ff; 
     ">Account Type</td>
 <td style="margin-bottom: 1px; padding-bottom: 1px; background-color: 
       lemonchiffon;">
<span id="ContentPlaceHolder1_636045303384071212">1 Mbps---فضي</span>
<span id="ContentPlaceHolder1_iAcctType"></span></td>
</tr>

This is the code I use to access the web page:

loginForm=Jsoup.connect("http://adsl.yemen.net.ye/en/user_main.aspx")
.data("ctl00$ContentPlaceHolder1$loginframe$Password", "MAMAM")
.data("ctl00$ContentPlaceHolder1$loginframe$LoginButton", "Sign In")
.data("__LASTFOCUS", "")
.data("__EVENTTARGET", "")
.data("__EVENTARGUMENT","")
.userAgent("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36")
.cookies(loginForm.cookies())
.followRedirects(false)
.method(Connection.Method.POST)
.execute();

You can do it like this:

package com.github.davidepastore.stackoverflow38415236;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
/**
 * Stackoverflow 38415236 answer.
 *
 */
public class App {
    public static void main(String[] args) {
        String html = "<table><tr>\r\n"
            + "  <td style=\"width: 30%; font-weight: bold; background-color: #d7e8ff; \">\r\n"
            + "  <span style=\"font-size: 10pt\">Registration Date</span></td>\r\n"
            + "\r\n"
            + " <td style=\"margin-bottom: 1px; padding-bottom: 1px; background-color:  \r\n"
            + "     lemonchiffon;\">\r\n"
            + "\r\n"
            + "  <span id=\"ContentPlaceHolder1_636045303384071212\">الخميس 31/12/2009</span>\r\n"
            + "  <span id=\"ContentPlaceHolder1_iInstalldate\"></span></td>\r\n"
            + " </tr>\r\n"
            + " <tr>\r\n"
            + " <td style=\"width: 30%; font-weight: bold; background-color: #d7e8ff; \r\n"
            + "     \">Account Type</td>\r\n"
            + " <td style=\"margin-bottom: 1px; padding-bottom: 1px; background-color: \r\n"
            + "       lemonchiffon;\">\r\n"
            + "\r\n"
            + "<span id=\"ContentPlaceHolder1_636045303384071212\">1 Mbps---فضي</span>\r\n"
            + "<span id=\"ContentPlaceHolder1_iAcctType\"></span></td>\r\n"
            + "</tr></table>";
        Document doc = Jsoup.parse(html);
        Element table = doc.select("table").first();
        Elements trs = table.select("tr");
        for (Element tr : trs) {
            Elements td = tr.select("td");
            Element firstTd = td.first();
            Element secondTd = td.get(1);
            System.out.println(firstTd.text() + " --- " + secondTd.text());
        }
    }
}

The output is:

Registration Date --- الخميس 31/12/2009
Account Type --- 1 Mbps---فضي
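
The loop above calls td.get(1) on every row, which throws an IndexOutOfBoundsException for any row with fewer than two cells (a header row, for instance). The following is a minimal sketch, not part of the original answer, that skips such rows and collects the pairs into a map. The class name TableRows, the method toMap, and the shortened sample HTML are hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TableRows {
    /**
     * Maps first-column text to second-column text.
     * Rows with fewer than two td cells are skipped.
     * (Hypothetical helper, not taken from the original answer.)
     */
    public static Map<String, String> toMap(String html) {
        Document doc = Jsoup.parse(html);
        // LinkedHashMap keeps the rows in document order
        Map<String, String> result = new LinkedHashMap<>();
        for (Element tr : doc.select("tr")) {
            Elements tds = tr.select("td");
            if (tds.size() < 2) {
                continue; // e.g. a header row made of th cells
            }
            result.put(tds.get(0).text(), tds.get(1).text());
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened, made-up sample modeled on the rows in the question
        String html = "<table>"
            + "<tr><th>Field</th></tr>"
            + "<tr><td><span>Registration Date</span></td>"
            + "<td><span>31/12/2009</span>"
            + "<span id=\"ContentPlaceHolder1_iInstalldate\"></span></td></tr>"
            + "<tr><td>Account Type</td><td><span>1 Mbps</span><span></span></td></tr>"
            + "</table>";
        toMap(html).forEach((k, v) -> System.out.println(k + " --- " + v));
    }
}
```

A map keyed by the label makes later lookups such as get("Registration Date") straightforward, at the cost of assuming the labels are unique within the table.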

Use a selector to target the element directly, then extract its text.

    Document doc = Jsoup.parse(htmlContent);
    Elements rows = doc.select("table[border=1] tr");
    for (Element row : rows) {
        String key = row.select("td:first-child").text();
        String value = row.select("td:nth-child(2) span:first-child").text();
        System.out.println("key=" + key + " value=" + value);
    }

Output:

key=Registration Date value=31/12/2009
key=Account Type value=1 Mbps---
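
One detail in the question's markup is worth noting: the id of the span that holds the value differs between the two snippets (ContentPlaceHolder1_636042629082042500 in one, ContentPlaceHolder1_636045303384071212 in the other), so it presumably changes between requests and is not a reliable selector. The empty span next to it has an id that looks stable, such as ContentPlaceHolder1_iInstalldate. The sketch below is a hedged alternative, assuming the value span always immediately precedes that empty span. The class SiblingLookup and the method valueBefore are hypothetical names, not part of either answer.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SiblingLookup {
    /**
     * Returns the text of the element immediately before the element
     * with the given id, or an empty string if either one is missing.
     * (Hypothetical helper; assumes the value precedes the marker span.)
     */
    public static String valueBefore(Document doc, String stableId) {
        Element marker = doc.getElementById(stableId);
        if (marker == null) {
            return "";
        }
        Element previous = marker.previousElementSibling();
        return previous == null ? "" : previous.text();
    }

    public static void main(String[] args) {
        // Shortened, made-up sample modeled on the first row in the question
        String html = "<table><tr><td>Registration Date</td><td>"
            + "<span id=\"ContentPlaceHolder1_636045303384071212\">31/12/2009</span>"
            + "<span id=\"ContentPlaceHolder1_iInstalldate\"></span></td></tr></table>";
        Document doc = Jsoup.parse(html);
        System.out.println(valueBefore(doc, "ContentPlaceHolder1_iInstalldate"));
    }
}
```

This avoids depending on column positions, but it breaks if the page ever reorders the two spans, so the positional selectors shown earlier remain the simpler choice when the table layout is fixed.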