Fetch a web page's source after the page has run its JavaScript - Java

Keywords: web page, JavaScript, Java, run, source code, fetch. Updated: 2023-09-26

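I am trying to fetch a web page's source and have run into a problem. I want to extract the URLs from the source, but when I download the page, each URL has been replaced by a JavaScript call.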

What the browser's source viewer shows:

<a class="title" href="/hkstp_web/en/Directory/Acquest%20Stem%20Cell%20Research%20Company%20Limited/">aaa Company Limited</a>

But when I download the page myself, the link looks like this:

<a href="javascript:void(0)"><span>...</span></a>

Here is my code:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class DownloadPage {
    public static void main(String[] args) {
        URL url;
        try {
            // get URL content (this is the raw HTML; no JavaScript is executed)
            url = new URL("https://www.hkstp.org/hkstp_web/en/directory/");
            URLConnection conn = url.openConnection();
            // open the stream and put it into BufferedReader
            BufferedReader br = new BufferedReader(
                               new InputStreamReader(conn.getInputStream()));
            String inputLine;
            // save to this filename
            String fileName = "C:\\Users\\USER\\Documents\\server\\test.txt";
            File file = new File(fileName);
            if (!file.exists()) {
                file.createNewFile();
            }
            // use FileWriter to write file
            FileWriter fw = new FileWriter(file.getAbsoluteFile());
            BufferedWriter bw = new BufferedWriter(fw);
            while ((inputLine = br.readLine()) != null) {
                bw.write(inputLine + "\n");
            }
            bw.close();
            br.close();
            System.out.println("Done");
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

How can I get the correct link? Thanks.

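Well, as you (hopefully) know, web pages have evolved a lot since Sir Timothy (Berners-Lee) invented the web. What you see (and can interact with) is no longer just the HTML (and CSS) that came from the server; it is usually the result of heavy post-processing that the browser performs with JavaScript.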
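One quick way to confirm that this is what happened to a downloaded file is to scan the raw HTML and see which anchors carry a `javascript:` pseudo-link instead of a real URL. The following is a minimal sketch using only the JDK; the class and method names (`RawLinkCheck`, `extractHrefs`, `isScriptPlaceholder`) are made up for illustration, and the regular expression is a rough heuristic rather than a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawLinkCheck {
    // Heuristic: captures the quoted href value inside an <a ...> tag.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns every href value found in the anchor tags of the given HTML. */
    public static List<String> extractHrefs(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            hrefs.add(m.group(2));
        }
        return hrefs;
    }

    /** True if the href is a javascript: pseudo-link rather than a real URL. */
    public static boolean isScriptPlaceholder(String href) {
        return href.trim().toLowerCase(Locale.ROOT).startsWith("javascript:");
    }

    public static void main(String[] args) {
        // The two anchors quoted in the question: downloaded vs. seen in the browser.
        String downloaded = "<a href=\"javascript:void(0)\"><span>...</span></a>";
        String inBrowser = "<a class=\"title\" href=\"/hkstp_web/en/Directory/"
                + "Acquest%20Stem%20Cell%20Research%20Company%20Limited/\">aaa Company Limited</a>";

        for (String html : new String[] {downloaded, inBrowser}) {
            for (String href : extractHrefs(html)) {
                System.out.println(href + " -> placeholder? " + isScriptPlaceholder(href));
            }
        }
    }
}
```

If most of the links in the saved file turn out to be placeholders, the real URLs are being filled in by script at load time, and a plain `URLConnection` download will not contain them.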

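So if you want to get that link, you have to perform the same post-processing yourself, for example with the HtmlUnit framework or, if you do not insist on Java, with PhantomJS.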
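A minimal HtmlUnit sketch of that post-processing step might look like the following. It assumes HtmlUnit 2.x is on the classpath (in 3.x the packages moved from `com.gargoylesoftware.htmlunit` to `org.htmlunit`). The class name `RenderedLinks` and the 10-second wait are arbitrary choices for illustration, and the thread does not confirm whether this particular site renders fully under HtmlUnit's JavaScript engine, so treat it as a starting point rather than a verified solution.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class RenderedLinks {
    public static void main(String[] args) throws Exception {
        // WebClient is a headless browser; closing it releases its resources.
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            // Real-world pages often contain scripts HtmlUnit cannot run; keep going anyway.
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            // Styling is irrelevant when we only want the links.
            webClient.getOptions().setCssEnabled(false);

            // Load the page and let its scripts execute.
            HtmlPage page = webClient.getPage("https://www.hkstp.org/hkstp_web/en/directory/");
            // Assumption: 10 seconds is enough for background scripts (e.g. AJAX) to finish.
            webClient.waitForBackgroundJavaScript(10_000);

            // Walk the anchors of the DOM as it exists after script execution.
            for (HtmlAnchor anchor : page.getAnchors()) {
                String href = anchor.getHrefAttribute();
                if (!href.startsWith("javascript:")) {
                    System.out.println(href + " : " + anchor.getTextContent().trim());
                }
            }
        }
    }
}
```

The difference from the `URLConnection` version is that the anchors are read from the DOM after the scripts have run, not from the bytes the server sent. If the links still show up as `javascript:void(0)`, the page probably needs a longer wait or a full browser driven from Java.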