JavaScript: extract URLs from a string (including the query string) and return an array

Last updated: 2023-09-26

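I know this has been asked a thousand times before (apologies), but after searching SO, Google and elsewhere I have yet to find a conclusive answer.

Basically, I need a JS function that, given a string, identifies and extracts all URLs using a regex and returns an array of every URL found: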

function findUrls(searchText) {
    var regex = ???; // the pattern this question is asking for
    var result = searchText.match(regex); // declared with var to avoid an implicit global
    return result ? result : false;
}

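The function should be able to detect and return any potential URL. I am aware of the inherent difficulties with this (closing parentheses and so on), so I have a feeling the process needs to be:

Split the string (searchText) into distinct chunks, each delimited on either side by nothing, a space or a carriage return, e.g. by doing a split.

For each chunk that results from the split, check whether it fits the logic for a URL of any construction, namely whether it contains a period immediately followed by text (the one constant rule for qualifying a potential URL).

The regex should check that the period is immediately followed by text of the kind allowed in a TLD, directory structure and query string, and preceded by text of the kind allowed in a URL.

I am aware that false positives may result, but every returned value will then be checked by calling the URL itself, so this can be ignored. The other functions I have found often do not return the URL's query string when one is present.

From a block of text, the function should therefore be able to return any type of URL, even if that means identifying will.i.am as a valid one!

For example, http://www.google.com, google.com, www.google.com, http://google.com, ftp.google.com, https:// etc., and any derivation of these with a query string, should be returned.

Many thanks, and apologies again if this exists elsewhere on SO; my searches have not turned it up.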
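The split-then-test process described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the name `findUrlCandidates`, the set of punctuation stripped from each chunk, and the "period followed by a letter or digit" test are all hypothetical choices, not something the question specifies.

```javascript
// Hypothetical helper: split on whitespace, then keep every chunk that
// contains a period immediately followed by a letter or digit.
function findUrlCandidates(searchText) {
    var chunks = searchText.split(/\s+/);
    var candidates = [];
    for (var i = 0; i < chunks.length; i++) {
        // Assumed cleanup: drop wrapping brackets/quotes and trailing punctuation.
        var chunk = chunks[i].replace(/^[("'<\[]+|[)"'>\],.;:!?]+$/g, "");
        // A non-period character, a period, then a letter or digit.
        if (/[^\s.]\.[a-z0-9]/i.test(chunk)) {
            candidates.push(chunk);
        }
    }
    return candidates;
}

var candidates = findUrlCandidates(
    "Visit google.com, (http://example.org/a?b=c) or will.i.am today."
);
// candidates: ["google.com", "http://example.org/a?b=c", "will.i.am"]
```

As the question anticipates, this is deliberately permissive: will.i.am is returned, and the results would still need to be verified by requesting each URL.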

I just use URI.js, which makes this easy.

var source = "Hello www.example.com,\n"
    + "http://google.com is a search engine, like http://www.bing.com\n"
    + "http://exämple.org/foo.html?baz=la#bumm is an IDN URL,\n"
    + "http://123.123.123.123/foo.html is IPv4 and "
    + "http://fe80:0000:0000:0000:0204:61ff:fe9d:f156/foobar.html is IPv6.\n"
    + "links can also be in parens (http://example.org) "
    + "or quotes »http://example.org«.";
var result = URI.withinString(source, function(url) {
    return "<a>" + url + "</a>";
});
/* result is:
Hello <a>www.example.com</a>,
<a>http://google.com</a> is a search engine, like <a>http://www.bing.com</a>
<a>http://exämple.org/foo.html?baz=la#bumm</a> is an IDN URL,
<a>http://123.123.123.123/foo.html</a> is IPv4 and <a>http://fe80:0000:0000:0000:0204:61ff:fe9d:f156/foobar.html</a> is IPv6.
links can also be in parens (<a>http://example.org</a>) or quotes »<a>http://example.org</a>«.
*/

https://github.com/medialize/URI.js
http://medialize.github.io/URI.js/

You can also use the regular expression from URI.js:

// gruber revised expression - http://rodneyrehm.de/t/url-regex.html
var uri_pattern = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/ig;
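To tie this back to the function in the question, here is a minimal sketch that plugs the pattern into `findUrls`. The sample input is made up for illustration, and the function keeps the question's convention of returning false when nothing is found.

```javascript
// Gruber's revised expression, as quoted above.
var uri_pattern = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/ig;

function findUrls(searchText) {
    // With the g flag, String#match returns all matches, or null if there are none.
    var result = searchText.match(uri_pattern);
    return result ? result : false;
}

// Made-up sample text; the trailing period is not part of the second URL.
var found = findUrls("See www.example.com and http://google.com/search?q=a+b.");
// found: ["www.example.com", "http://google.com/search?q=a+b"]
```

Note that this pattern only picks up a bare domain such as google.com when it is followed by a slash, so it is stricter than the "anything with a period" rule in the question.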

String#match and/or String#replace may help...
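As a rough illustration of the two methods, the sketch below uses a deliberately simple pattern that only recognises http and https URLs up to the next whitespace. The pattern, the variable names and the sample text are all assumptions made for this example.

```javascript
// Assumed, deliberately simple pattern: http(s) URLs up to the next whitespace.
var simpleUrlPattern = /\bhttps?:\/\/\S+/gi;
var text = "Docs at https://example.org/a?b=1 and http://example.com/x";

// String#match with a global regex returns an array of all matches, or null.
var urls = text.match(simpleUrlPattern) || [];

// String#replace with a callback rewrites each match in place.
var linked = text.replace(simpleUrlPattern, function (url) {
    return '<a href="' + url + '">' + url + "</a>";
});
```

Here `urls` holds the two URLs as an array, while `linked` is the original text with each URL wrapped in an anchor tag.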

The following regex extracts URLs from a string (including the query string) and returns them as an array:

var strings = "asdasdla hakjsdh aaskjdh https://www.google.com/search?q=add+a+element+to+dom+tree&oq=add+a+element+to+dom+tree&aqs=chrome..69i57.7462j1j1&sourceid=chrome&ie=UTF-8 askndajk nakjsdn aksjdnakjsdnkjsn";
var matches = strings.match(/\bhttps?::\/\/\S+/gi) || strings.match(/\bhttps?:\/\/\S+/gi);

Output:

["https://www.google.com/search?q=add+a+element+to+dom+tree&oq=add+a+element+to+dom+tree&aqs=chrome..69i57.7462j1j1&sourceid=chrome&ie=UTF-8"]

Note: this handles both http:// with a single colon and http::// with a double colon in the string, so you can use it safely. :)
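One thing to be aware of: because the two patterns are joined with `||`, the second is only tried when the first finds nothing, so a string containing both forms yields only the double-colon URLs. A possible alternative, offered here as an assumption rather than as part of the answer above, is a single pattern with an optional second colon:

```javascript
// Assumed variant: "::?" accepts one or two colons after the scheme.
var eitherColonPattern = /\bhttps?::?\/\/\S+/gi;

// Made-up sample containing both forms.
var mixed = "a http://x.com/?q=1 b https:://y.org/p c";
var bothForms = mixed.match(eitherColonPattern) || [];
// bothForms: ["http://x.com/?q=1", "https:://y.org/p"]
```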

Try this:

var expression = /[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi;
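A quick sketch of how this expression behaves when passed to String#match. The wrapper name `findUrlsLoose` and the sample text are made up for illustration.

```javascript
var expression = /[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi;

// Hypothetical wrapper: always returns an array, empty when nothing matches.
function findUrlsLoose(searchText) {
    return searchText.match(expression) || [];
}

var loose = findUrlsLoose(
    "try google.com, will.i.am or http://www.bing.com/search?q=url+regex now"
);
// loose: ["google.com", "will.i.am", "http://www.bing.com/search?q=url+regex"]
```

This one is permissive in the way the question asks for: bare domains and will.i.am are both returned, and the query string is kept.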

You can use this site to test regular expressions: http://gskinner.com/RegExr/

UiPath Studio defines the following built-in regex rule:

/(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-a-zA-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-a-zA-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-a-zA-Z0-9+&@#\/%=~_|$?!:,.]*\)|[a-zA-Z0-9+&@#\/%=~_|$])/
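As quoted, the rule has no flags, so in JavaScript String#match would return only the first match. The sketch below adds a `g` flag to collect every match into an array. The flag, the names `uiPathStylePattern` and `findUrlsUiPathStyle`, and the sample text are assumptions for this example, not part of the quoted rule.

```javascript
// The quoted rule with a "g" flag added (an assumption) so all matches are returned.
var uiPathStylePattern = /(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-a-zA-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-a-zA-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-a-zA-Z0-9+&@#\/%=~_|$?!:,.]*\)|[a-zA-Z0-9+&@#\/%=~_|$])/g;

// Hypothetical wrapper: always returns an array, empty when nothing matches.
function findUrlsUiPathStyle(searchText) {
    return searchText.match(uiPathStylePattern) || [];
}

var uiPathResult = findUrlsUiPathStyle(
    "Go to www.example.com/path?x=1&y=2, or https://example.org."
);
// uiPathResult: ["www.example.com/path?x=1&y=2", "https://example.org"]
```

Because the rule requires a scheme or a www. or ftp. prefix, a bare domain such as google.com is not matched, which makes it stricter than what the question asks for.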