JS extracting specific strings from array of strings

Keywords: string, extraction, array, JS. Updated: 2023-09-26

I'm trying to understand this code:

function extractLinks(input) {
    var html = input.join('\n');
    var regex = /<a\s+([^>]+\s+)?href\s*=\s*('([^']*)'|"([^"]*)|([^\s>]+))[^>]*>/g;
    var match;
    while (match = regex.exec(html)) {
        var hrefValue = match[3];
        if (hrefValue == undefined) {
            var hrefValue = match[4];
        }
        if (hrefValue == undefined) {
            var hrefValue = match[5];
        }
        console.log(hrefValue);
    }
}

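It is a simple function that extracts all href values, but only the real ones: it skips things like an href defined as class="href" or text that sits outside an A tag. The strange part is that the regex I wrote for this task was (<a[\s\S]*?>). When I could not get to a solution and looked at the original one, I found this very long regex instead. I tried the solution with my own regex and it does not work.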

Could someone please explain how to read this long regex? Then, match returns an array, fine. Let me check whether I understand the idea of the while loop:

while (match = the regex is found in the string) { something = match[3] /* why 3??? */ then, if something is undefined, something = match[4]; if it is still undefined, something = match[5]; }

I am really struggling to understand the mechanics behind all this, and the logic in the regex.

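The input is generated by a system that will pass in 10 different arrays of strings, but let's pick the one I use for testing. The markup below is split into an array of strings, one element per line, so the array length equals the number of lines. That array is the input parameter of the function.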

<!DOCTYPE html>
<html>
<head>
  <title>Hyperlinks</title>
  <link href="theme.css" rel="stylesheet" />
</head>
<body>
<ul><li><a   href="/"  id="home">Home</a></li><li><a
 class="selected" href=/courses>Courses</a>
</li><li><a href = 
'/forum' >Forum</a></li><li><a class="href"
onclick="go()" href= "#">Forum</a></li>
<li><a id="js" href =
"javascript:alert('hi yo')" class="new">click</a></li>
<li><a id='nakov' href =
http://www.nakov.com class='new'>nak</a></li></ul>
<a href="#empty"></a>
<a id="href">href='fake'<img src='http://abv.bg/i.gif' 
alt='abv'/></a><a href="#">&lt;a href='hello'&gt;</a>
<!-- This code is commented:
  <a href="#commented">commentex hyperlink</a> -->
</body>

To see what this regular expression is doing, I put inline comments on it on this page, which you can check. I'm also copying it here:

<a\s+            # Look for '<a' followed by whitespace
([^>]+\s+)?      # Optionally skip other attributes before 'href',
                 # such as 'class=' or 'id=' (group 1)
href\s*=\s*      # Locate 'href=' with optional whitespace around the '=' character
(                # Group 2: the whole value, quotes included
  '([^']*)'      # Look for '...' (group 3)
|                # ...or...
  "([^"]*)       # Look for "... (group 4; the closing quote is consumed by the final [^>]*)
|                # ...or...
  ([^\s>]+)      # Look for anything that is NOT whitespace or '>' (group 5)
)
[^>]*>           # Match anything else up to the closing '>'

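This is just to break it down so that you can see what each of these parts is doing. As for your question about match, I don't fully understand what you are asking.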
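If the question about match is why the code reads match[3], match[4] and match[5], the sketch below may help. It assumes the regex exactly as shown in the question; collectHrefs, sample and wholeTags are hypothetical names introduced only for this illustration. Each call to exec() returns an array whose index 0 is the full match and whose later indices are the capture groups, numbered by the order of their opening parentheses. Only one of groups 3, 4 and 5 takes part in any given match, so the others come back as undefined.

```javascript
// Capture groups, numbered by their opening parenthesis:
//   1: ([^>]+\s+)?   attributes before href (undefined when absent)
//   2: ( ... )       the whole href value, quotes included
//   3: '([^']*)'     value inside single quotes
//   4: "([^"]*)      value after a double quote
//   5: ([^\s>]+)     unquoted value

// Hypothetical helper: returns the href values instead of logging them.
function collectHrefs(lines) {
    const html = lines.join('\n');
    const regex = /<a\s+([^>]+\s+)?href\s*=\s*('([^']*)'|"([^"]*)|([^\s>]+))[^>]*>/g;
    const hrefs = [];
    let match;
    // With the g flag, exec() resumes after the previous match
    // and returns null when nothing is left, which ends the loop.
    while ((match = regex.exec(html)) !== null) {
        // Take whichever of the three alternatives actually matched.
        const value = match[3] !== undefined ? match[3]
                    : match[4] !== undefined ? match[4]
                    : match[5];
        hrefs.push(value);
    }
    return hrefs;
}

// A shortened version of the test input, one array element per line.
const sample = [
    '<a   href="/"  id="home">Home</a>',
    '<a class="selected" href=/courses>Courses</a>',
    "<a href = ",
    "'/forum' >Forum</a>",
    '<a id="href">href=\'fake\'</a>',
];

console.log(collectHrefs(sample)); // [ '/', '/courses', '/forum' ]

// For contrast, the shorter pattern from the question captures
// whole opening tags, not the href values inside them.
const wholeTags = sample.join('\n').match(/(<a[\s\S]*?>)/g);
console.log(wholeTags[0]); // <a   href="/"  id="home">
```

The last tag in the sample has id="href" and the text href='fake' outside the tag, and neither is reported, because the pattern requires href to appear as an attribute name before the closing '>'. The shorter pattern still matches that tag, which is one reason it cannot simply be swapped into the loop.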