Regex pattern matching with various end points

Keywords: end point, pattern, regular expression      Updated: 2023-09-26

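Using JavaScript, I want to extract a substring with a specific pattern from each string in the list below.

However, I am having trouble setting up the regular expression pattern.

Input string list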

  1. search?w=tot&DA=YZR&t__nil_searchbox=btn&sug=&o=&q=%EB%B9%84%EC%BD%98

  2. search?q=%EB%B9%84%EC%BD%98&go=%EC%A0...4%EB%B9%84%EC%BD%98&sc=8-2&sp=-1&sk=&cvid=f05407c5bcb9496990d2874135aee8e9

  3. where=nexearch&query=%EB%B9%84%EC%BD%98&sm=top_hty&fbm=0&ie=utf8

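Expected pattern-matching result

%EB%B9%84%EC%BD%98 in each of the cases above.

Regular expression

/(query|q)=.* + additional regex here + /

The match should end at either the end of the string ($) or the first & that appears.

Question

What should I write for the additional regex?

You can test it here. Thanks.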

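Turn the first capturing group into a non-capturing group, then use a negated character class instead of .* so that the match stops at the first & (or at the end of the string if there is none):

\b(?:query|q)=([^&\n]*)

Demo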

> var s = "where=nexearch& query=%EB%B9%84%EC%BD%98&sm=top_hty&fbm=0&ie=utf8"
undefined
> var pat = /\b(?:query|q)=([^&\n]*)/;
> pat.exec(s)[1]
'%EB%B9%84%EC%BD%98'
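
To check the pattern against all three inputs from the question, a minimal sketch along the following lines could be used. The helper name `extractQuery` is an assumption introduced here for illustration and is not part of the original answer; it returns null when neither parameter is present.

```javascript
// Hypothetical helper (illustrative only): returns the raw value of the
// first q= or query= parameter, or null if neither is present.
function extractQuery(input) {
  var match = /\b(?:query|q)=([^&\n]*)/.exec(input);
  return match ? match[1] : null;
}

// The three input strings from the question (the "..." in the second one
// is the elision from the original post, kept as-is).
var inputs = [
  'search?w=tot&DA=YZR&t__nil_searchbox=btn&sug=&o=&q=%EB%B9%84%EC%BD%98',
  'search?q=%EB%B9%84%EC%BD%98&go=%EC%A0...4%EB%B9%84%EC%BD%98&sc=8-2&sp=-1&sk=&cvid=f05407c5bcb9496990d2874135aee8e9',
  'where=nexearch&query=%EB%B9%84%EC%BD%98&sm=top_hty&fbm=0&ie=utf8'
];

inputs.forEach(function (input) {
  console.log(extractQuery(input)); // logs %EB%B9%84%EC%BD%98 for each input
});
```

Because \b only requires a word boundary, this pattern would also accept a parameter such as x-q=...; if that matters, anchoring on (?:^|[?&]) instead of \b is one possible tightening.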

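Personally, I'd suggest a different approach: using a more procedural function to match the required parameter value, rather than a "simple" regular expression. While it may look more complicated at first, it allows for easy extension if you later need to find different, or additional, parameter values.

That said: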

/* haystack:
     String, the string in which you're looking for the
     parameter-values,
   needles:
     Array, the parameters whose values you're looking for
*/
function queryGrab(haystack, needles) {
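  // creating a regular expression from the array of needles,
  // given an array of ['q','query'], this will result in:
  // /^(?:q|query)=/i
  // (no 'g' flag: a global regex keeps state in lastIndex between
  // test() calls, which can make later matches fail):
  var reg = new RegExp('^(?:' + needles.join('|') + ')=', 'i'),
    // finding the index of the '?' character in the haystack
    // (-1 if there is none, so the whole string is used below):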
    queryIndex = haystack.indexOf('?'),
    // getting the substring from the haystack, starting
    // after the '?' character:
    keyValues = haystack.substring(queryIndex + 1)
      // splitting that string on the '&' characters,
      // to form an array:
      .split('&')
      // filtering that array (with Array.prototype.filter()),
      // the 'keyValue' argument is the current array-element
      // from the array over which we're iterating:
      .filter(function(keyValue) {
        // if RegExp.prototype.test() returns true,
        // meaning the supplied string ('keyValue')
        // is matched by the created regular expression,
        // the current element is retained in the filtered
        // array:
        return reg.test(keyValue);
    // converting that filtered-array to a string
    // on the naive assumption each searched-string
    // should return only one match:
    }).toString();
  // returning a substring of the keyValue, from after
  // the position of the '=' character:
  return keyValues.substring(keyValues.indexOf('=') + 1);
}
// essentially irrelevant, just for the purposes of
// providing a demonstration; here we get all the
// elements of class="haystack":
var haystacks = document.querySelectorAll('.haystack'),
  // the parameters we're looking for:
  needles = ['q', 'query'],
  // an 'empty' variable for later use:
  retrieved;
// using Array.prototype.forEach() to iterate over, and
// perform a function on, each of the .haystack elements
// (using Function.prototype.call() to use the array-like
// NodeList instead of an array):
Array.prototype.forEach.call(haystacks, function(stack) {
  // like filter(), the variable is the current array-element
  // retrieved caches the found parameter-value (using
  // a variable because we're using it twice):
  retrieved = queryGrab(stack.textContent, needles);
  // setting the next-sibling's text:
  stack.nextSibling.nodeValue = '(found: ' + retrieved + ')';
  // updating the HTML of the current node, to allow for
  // highlighting:
  stack.innerHTML = stack.textContent.replace(retrieved, '<span class="found">$&</span>');
});

ul {
  margin: 0;
  padding: 0;
}
li {
  margin: 0 0 0.5em 0;
  padding-bottom: 0.5em;
  border-bottom: 1px solid #ccc;
  list-style-type: none;
  width: 100%;
}
.haystack {
  display: block;
  color: #999;
}
.found {
  color: #f90;
}
<ul>
  <li><span class="haystack">search?w=tot&amp;DA=YZR&amp;t__nil_searchbox=btn&amp;sug=&amp;o=&amp;q=%EB%B9%84%EC%BD%98</span>
  </li>
  <li><span class="haystack">search?q=%EB%B9%84%EC%BD%98&amp;go=%EC%A0…4%EB%B9%84%EC%BD%98&amp;sc=8-2&amp;sp=-1&amp;sk=&amp;cvid=f05407c5bcb9496990d2874135aee8e9</span>
  </li>
  <li><span class="haystack">where=nexearch&amp;query=%EB%B9%84%EC%BD%98&amp;sm=top_hty&amp;fbm=0&amp;ie=utf8</span>
  </li>
</ul>

JS Fiddle (for easy off-site experimentation).
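
Outside the browser, the same split-and-filter idea can be tried without any DOM code. The following is only a sketch and not the answer's original function: `grabParams` is a hypothetical name, and it returns an object with one entry per requested parameter, so looking for additional parameters only requires a longer `needles` array.

```javascript
// Hypothetical DOM-free variant of the approach above (illustrative only).
// Returns an object mapping each found needle to its raw (still encoded) value.
function grabParams(haystack, needles) {
  // indexOf() gives -1 when there is no '?', so substring(0) keeps everything
  var queryIndex = haystack.indexOf('?');
  var found = {};
  haystack.substring(queryIndex + 1).split('&').forEach(function (keyValue) {
    var eq = keyValue.indexOf('=');
    if (eq === -1) {
      return; // skip parts that are not key=value pairs
    }
    var key = keyValue.substring(0, eq);
    var alreadyFound = Object.prototype.hasOwnProperty.call(found, key);
    // keep only the first occurrence of each requested parameter
    if (needles.indexOf(key) !== -1 && !alreadyFound) {
      found[key] = keyValue.substring(eq + 1);
    }
  });
  return found;
}

var result = grabParams(
  'search?q=%EB%B9%84%EC%BD%98&sc=8-2&sp=-1',
  ['q', 'query', 'sc']
);
console.log(result.q);  // %EB%B9%84%EC%BD%98
console.log(result.sc); // 8-2
```

Unlike the regular-expression test in the original function, this sketch compares keys exactly and case-sensitively; that is a design choice made here for simplicity.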

References:

  • Array.prototype.filter().
  • Array.prototype.forEach().
  • Array.prototype.toString().
  • document.querySelectorAll().
  • Function.prototype.call().
  • RegExp() constructor.
  • RegExp.prototype.test().
  • Regular Expressions guide.
  • String.prototype.indexOf().
  • String.prototype.split().
  • String.prototype.substring().

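Regular expressions are not the best way to parse these query strings. There are libraries and tools for that, but if you want to do it yourself: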

function parseQueryString(url) {
    var query = url.indexOf('?') === -1  // no '?': treat the whole string as the query
        ? url
        : url.split('?')[1];             // otherwise take the part after the ?
    return _.object(query                // build an object from pairs
        .split('&')                      // split it by &
        .map(function(str) {             // turn parts into 2-element arrays
            return str.split('=');       // broken at =
        })
    );
}

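This uses Underscore's _.object, which creates an object from an array of key/value pair arrays, but if you'd rather not use it, you can write your own equivalent in a few lines.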
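
If Underscore is not available, the hand-written equivalent mentioned above might look like the sketch below. `pairsToObject` is a hypothetical name introduced here, not an Underscore function; in this sketch a later duplicate key overwrites an earlier one.

```javascript
// Hypothetical stand-in for _.object when given an array of [key, value] pairs.
function pairsToObject(pairs) {
  var result = {};
  pairs.forEach(function (pair) {
    result[pair[0]] = pair[1];
  });
  return result;
}

// Example with the third input string from the question (no '?' present).
var parsed = pairsToObject(
  'where=nexearch&query=%EB%B9%84%EC%BD%98&sm=top_hty&fbm=0&ie=utf8'
    .split('&')
    .map(function (str) {
      return str.split('=');
    })
);
console.log(parsed.query); // %EB%B9%84%EC%BD%98
```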

Now, the value you are looking for is just

var params = parseQueryString(url);
return params.q || params.query;
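
As one example of the existing tools mentioned above, modern browsers and Node.js ship the standard URLSearchParams class. The sketch below is an alternative to the hand-written parser, not part of the original answer, and `getSearchTerm` is a hypothetical name. Note that URLSearchParams percent-decodes the values it returns, so the result is the decoded text rather than the raw %EB%B9%84%EC%BD%98 string.

```javascript
// Hypothetical helper (illustrative only) built on the standard
// URLSearchParams class; returns the decoded q or query value, or null.
function getSearchTerm(input) {
  // indexOf() gives -1 when there is no '?', so substring(0) keeps everything
  var queryIndex = input.indexOf('?');
  var params = new URLSearchParams(input.substring(queryIndex + 1));
  var value = params.get('q'); // get() returns null for a missing parameter
  return value !== null ? value : params.get('query');
}

var term = getSearchTerm(
  'where=nexearch&query=%EB%B9%84%EC%BD%98&sm=top_hty&fbm=0&ie=utf8'
);
// The value comes back already decoded:
console.log(term === decodeURIComponent('%EB%B9%84%EC%BD%98')); // true
```

If the raw, still-encoded value is what you need, the regular-expression or split-based approaches above are a better fit, or the result can be re-encoded with encodeURIComponent.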