Issue with URL detection for patterns like b.tech, m.tech, etc. using a regular expression in JavaScript

Keywords: regular expression, JavaScript, URL, detection, issue, pattern      Updated: 2023-09-26

I have a regular expression that detects URLs in a string.

The regular expression is:

var urlRegex = /(https?\:\/\/|\s)[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})(\/+[a-z0-9_.\:\;-]*)*(\?[\&\%\|\+a-z0-9_=,\.\:\;-]*)?([\&\%\|\+&a-z0-9_=,\:\;\.-]*)([\!\#\/\&\%\|\+a-z0-9_=,\:\;\.-]*)}*/i;
if (urlRegex.test(text)) {
   textCrawler(text);
}

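This works fine, but the problem is that it also detects strings such as b.tech, m.tech, etc.

I call a text crawler function that previews the URL found in the string. The problem is that the text crawler also gets called when the string merely contains a qualification such as b.tech.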
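A likely cause of the false positive is the first group of the pattern, which accepts a single whitespace character as an alternative to the protocol. The sketch below uses a shortened stand-in for the expression in the question (only the host part is kept), and `strictRegex` is a hypothetical variant introduced here for comparison, not part of the original code.

```javascript
// Shortened stand-in for the question's pattern: the first group accepts
// either a protocol or a single whitespace character.
const looseRegex = /(https?:\/\/|\s)[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/i;

// Hypothetical stricter variant (an assumption, not the asker's code):
// the protocol is mandatory.
const strictRegex = /https?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/i;

const qualification = "I hold a b.tech degree";
const realLink = "See http://example.com for details";

const results = {
  looseOnQualification: looseRegex.test(qualification),   // true: " b.tech" matches
  strictOnQualification: strictRegex.test(qualification), // false: no protocol present
  looseOnLink: looseRegex.test(realLink),                 // true
  strictOnLink: strictRegex.test(realLink),               // true
};

console.log(results);
```

Requiring the protocol removes this class of false positive, at the cost of no longer detecting bare hosts such as `www.example.com` typed without `http://`.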

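I have searched through several links, but there does not seem to be a perfect regular expression for detecting URLs in a string.

Take a look at this: In search of the perfect URL validation regex.

This one appears to be the most accurate so far: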

var re_weburl = new RegExp(
  "^" +
    // protocol identifier
    "(?:(?:https?|ftp)://)" +
    // user:pass authentication
    "(?:\\S+(?::\\S*)?@)?" +
    "(?:" +
      // IP address exclusion
      // private & local networks
      "(?!(?:10|127)(?:\\.\\d{1,3}){3})" +
      "(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})" +
      "(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})" +
      // IP address dotted notation octets
      // excludes loopback network 0.0.0.0
      // excludes reserved space >= 224.0.0.0
      // excludes network & broadcast addresses
      // (first & last IP address of each class)
      "(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])" +
      "(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}" +
      "(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))" +
    "|" +
      // host name
      "(?:(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)" +
      // domain name
      "(?:\\.(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)*" +
      // TLD identifier
      "(?:\\.(?:[a-z\\u00a1-\\uffff]{2,}))" +
    ")" +
    // port number
    "(?::\\d{2,5})?" +
    // resource path
    "(?:/[^\\s]*)?" +
  "$", "i"
);

Source: dperini's gist
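Because the pattern above is anchored with `^` and `$`, it validates a whole string instead of searching inside a longer text. One way to apply an anchored validator to free text is to split the text on whitespace and test each token. The following is a minimal sketch under that assumption: `findUrls` is a hypothetical helper, and `simpleUrl` is a much shorter stand-in for `re_weburl` that only requires a protocol, a dotted host name, and a TLD of at least two letters.

```javascript
// Simplified, anchored stand-in for re_weburl (an assumption for brevity):
// protocol, dotted host name, optional port, optional path.
const simpleUrl =
  /^(?:https?|ftp):\/\/[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}(?::\d{2,5})?(?:\/\S*)?$/i;

// Hypothetical helper: split the text on whitespace and keep the tokens
// that the anchored validator accepts as a complete URL.
function findUrls(text, urlPattern) {
  return text.split(/\s+/).filter((token) => urlPattern.test(token));
}

const sample =
  "I did my b.tech last year, see http://example.com/path and ftp://files.example.org";

const found = findUrls(sample, simpleUrl);
console.log(found);
// → [ 'http://example.com/path', 'ftp://files.example.org' ]
```

With a helper like this, the crawler from the question would only be invoked when `findUrls(text, pattern).length > 0`, so a bare `b.tech` no longer triggers it. Note that token-based splitting keeps trailing punctuation attached to a token, so a URL followed directly by a comma or full stop may need trimming first.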