Get non-HTML content from a page

Keywords: html, content retrieval. Last updated: 2023-09-26

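Is it possible to get the non-HTML content from a page? By non-HTML I mean the words and sentences on the page that are not HTML tags.

I can get the page source with: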

Dim sourceString As String = New System.Net.WebClient().DownloadString("SomeWebPage.com")

But how can I get only the non-HTML content from the web page in a similar way?

If the HTML is well formed, this should work:

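' Download the page source (www.test.com is a placeholder URL).
Dim myhtml As String = New System.Net.WebClient().DownloadString("http://www.test.com")
' Remove anything that looks like a tag; Singleline lets a tag span several lines.
Dim plaintext As String = System.Text.RegularExpressions.Regex.Replace(myhtml, "<.*?>", "", System.Text.RegularExpressions.RegexOptions.Singleline)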
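
A plain tag-stripping regex leaves behind the contents of script and style elements and keeps entities such as &amp; undecoded. The following is a minimal sketch of one way to tighten that up, still using only regular expressions and System.Net.WebUtility.HtmlDecode. The module name HtmlTextSketch, the helper name StripHtml, and the sample markup are hypothetical and chosen for illustration. A regex is not an HTML parser, so this is likely to misbehave on malformed or unusual markup.

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module HtmlTextSketch
    ' Hypothetical helper: reduces an HTML string to its visible text.
    Function StripHtml(html As String) As String
        Dim opts As RegexOptions = RegexOptions.Singleline Or RegexOptions.IgnoreCase

        ' Drop <script> and <style> elements together with their contents.
        Dim text As String = Regex.Replace(html, "<(script|style)\b.*?</\1\s*>", " ", opts)
        ' Drop HTML comments.
        text = Regex.Replace(text, "<!--.*?-->", " ", opts)
        ' Replace every remaining tag with a space so adjacent words do not merge.
        text = Regex.Replace(text, "<[^>]+>", " ", opts)
        ' Turn entities such as &amp; back into plain characters.
        text = WebUtility.HtmlDecode(text)
        ' Collapse runs of whitespace and trim the ends.
        Return Regex.Replace(text, "\s+", " ").Trim()
    End Function

    Sub Main()
        ' Sample markup invented for this example.
        Dim sample As String =
            "<html><head><style>p { color: red; }</style></head>" &
            "<body><p>Hello &amp; welcome</p><script>var x = 1;</script></body></html>"
        Console.WriteLine(StripHtml(sample)) ' prints: Hello & welcome
    End Sub
End Module
```

To use it on a downloaded page, pass the string returned by DownloadString to StripHtml instead of the sample. Replacing tags with a space rather than an empty string is a deliberate choice here: it keeps text from neighbouring elements from running together, at the cost of an extra whitespace-collapsing step.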