Explain a JavaScript regex by breaking it into smaller pieces

Keywords: explain, javascript, regular expression, breakdown      Updated: 2023-09-26

Here is a function that extracts the video ID from a YouTube URL.

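    function youtubeLinkParser(url) {
        // Group 1: one of the known URL prefixes; group 2: the candidate video ID.
        var regExp = /^.*(youtu.be\/|v\/|u\/\w\/|embed\/|watch\?v=|\&v=)([^#\&\?]*).*/;
        var match = url.match(regExp);
        // Accept the capture only if it is exactly 11 characters long.
        if (match && match[2].length === 11) {
            return match[2];
        } else {
            return null;
        }
    }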

I'm new to regex, so would someone mind breaking the regex into smaller pieces and explaining how it works?

YAPE::Regex::Explain

The regular expression:
(?-imsx:^.*(youtu.be/|v/|u/\w/|embed/|watch\?v=|\&v=)([^#\&\?]*).*)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    youtu                    'youtu'
----------------------------------------------------------------------
    .                        any character except \n
----------------------------------------------------------------------
    be/                      'be/'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    v/                       'v/'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    u/                       'u/'
----------------------------------------------------------------------
    \w                       word characters (a-z, A-Z, 0-9, _)
----------------------------------------------------------------------
    /                        '/'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    embed/                   'embed/'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    watch                    'watch'
----------------------------------------------------------------------
    \?                       '?'
----------------------------------------------------------------------
    v=                       'v='
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    \&                       '&'
----------------------------------------------------------------------
    v=                       'v='
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    [^#\&\?]*                any character except: '#', '\&', '\?' (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
^.*

At the start, anything may appear.

Then comes one of the following:

youtu.be/    (that's the intention, but actually the dot can be any char)
v/
u/<one word character>/    (\w matches a single letter, digit, or underscore)
embed/
watch?v=
&v=

Whichever of these matched becomes match[1].

Then come zero or more characters that are not #, & or ?; these become match[2].
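
The caveat about the unescaped dot can be seen directly. This is a minimal sketch that keeps only the `youtu.be/` alternative; the names `loose` and `strict` and both sample URLs are assumptions made for illustration.

```javascript
// Unescaped dot, as in the question: "." matches any single character.
const loose = /^.*(youtu.be\/)([^#\&\?]*).*/;
// Escaped dot: only a literal "." is accepted.
const strict = /^.*(youtu\.be\/)([^#\&\?]*).*/;

// Hypothetical inputs, used only for illustration.
const real = "https://youtu.be/dQw4w9WgXcQ";
const fake = "https://youtuXbe/dQw4w9WgXcQ";

console.log(loose.test(real), strict.test(real)); // true true
console.log(loose.test(fake), strict.test(fake)); // true false
```

If accepting look-alike hosts matters for your use case, escaping the dot is the smaller change; the rest of the pattern can stay as it is.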

/^.* starts with any characters

(youtu.be\/|v\/|u\/\w\/|embed\/|watch\?v=|\&v=)

matches any one of the following:

  • youtu.be/ <-- the dot should probably be escaped: youtu\.be
  • v/
  • u/\w/
  • embed/
  • watch?v=
  • &v=

    ([^#\&\?]*)

followed by any characters other than the #, & and ? symbols

.*/

followed by any remaining characters
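
One way to make these pieces easier to read in code is to keep each one as its own small regex and join their sources. This is only a sketch: the `parts` object and its key names are hypothetical, and the sample URL is made up.

```javascript
// Each piece of the question's pattern, kept as a separate labelled regex.
const parts = {
  anything: /^.*/,                                             // any leading characters
  prefix: /(youtu.be\/|v\/|u\/\w\/|embed\/|watch\?v=|\&v=)/,   // becomes match[1]
  id: /([^#\&\?]*)/,                                           // becomes match[2]
  rest: /.*/,                                                  // any trailing characters
};

// Join the pieces back into a single pattern.
const rebuilt = new RegExp(
  parts.anything.source + parts.prefix.source + parts.id.source + parts.rest.source
);

// Hypothetical URL where the ID is not the first query parameter.
const m = "https://www.youtube.com/watch?feature=share&v=dQw4w9WgXcQ".match(rebuilt);
console.log(m[1], m[2]); // prints: &v= dQw4w9WgXcQ
```

Here `watch\?v=` does not apply because `watch?` is followed by `feature`, so the `\&v=` alternative is the one that matches.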