Regex needed to split a string by "."

Keywords: split, string, Regex      Updated: 2023-09-26

I need a regular expression in Javascript. I have a string:

'*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5'

I want to split this string by periods so that I get an array:

[
    '*window',
    'some1',
    'some\.2',   //ignore the . because it's escaped
    '(a.b + ")" ? cc\.c : d.n [a.b, cc\.c])',  //ignore everything inside ()
    'some\.3',
    '(this.o.p ? ".mike." [ff\.])',
    'some5'
]

What regex would do this?

var string = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
var pattern = /(?:\((?:(['"])\)\1|[^)]+?)+\)+|\\\.|[^.]+?)+/g;
var result = string.match(pattern);
result = result || []; // with the g flag, match() already returns a plain Array, or null if nothing matched

Fiddle: http://jsfiddle.net/66Zfh/3/
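For checking the answer outside the fiddle, here is a self-contained version of the snippet above that runs in Node.js or a browser console. It keeps the pattern unchanged and only wraps it in a helper; the name `splitOnUnprotectedDots` is hypothetical, chosen for this sketch. Backslashes are doubled in the string literal so the string really contains `\.`.

```javascript
// The pattern from the answer above, unchanged.
const pattern = /(?:\((?:(['"])\)\1|[^)]+?)+\)+|\\\.|[^.]+?)+/g;

// Hypothetical helper name, used only for this sketch.
function splitOnUnprotectedDots(text) {
  // With the g flag, match() returns every match, or null if there is none.
  return text.match(pattern) || [];
}

// "\\" inside a JavaScript string literal yields a single backslash,
// so this string really contains "some\.2", "cc\.c", and so on.
const input = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';

splitOnUnprotectedDots(input).forEach((part) => console.log(part));
// *window
// some1
// some\.2
// (a.b + ")" ? cc\.c : d.n [a.b, cc\.c])
// some\.3
// (this.o.p ? ".mike." [ff\.])
// some5
```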
Explanation of the RegExp. It matches a contiguous run of characters that satisfies:

/             Start of RegExp literal
(?:            Create a group without reference (example: say, group A)
   \(          `(` character
   (?:         Create a group without reference (example: say, group B)
      (['"])     ONE `'` OR `"`, group 1, referable through `\1` (inside RE)
      \)         `)` character
      \1         The character as matched at group 1, either `'` or `"`
     |          OR
      [^)]+?     Any non-`)` character, at least once (see below)
   )+          End of group (B). Let this group occur at least once
   \)+         One or more `)` characters
  |           OR
   \\\.        `\.` (escaped backslash and dot, because they're special chars)
  |           OR
   [^.]+?      Any non-`.` character, at least once (see below)
)+            End of group (A). Let this group occur at least once
/g           End of RegExp, global flag
        /*Summary: Match everything which is not satisfying the split-by-dot
                 condition as specified by the OP*/

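There is a difference between `+` and `+?`. A single `+` tries to match as many characters as possible, while `+?` matches only as many characters as are necessary to obtain a match. Example: on `123`, `\d+?` matches `1` and `\d+` matches `123`.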
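The `123` example from the paragraph above, as a runnable snippet:

```javascript
const digits = '123';

// Lazy: +? stops as soon as one digit is enough for a match.
const lazy = digits.match(/\d+?/)[0];
// Greedy: + takes as many digits as it can.
const greedy = digits.match(/\d+/)[0];

console.log(lazy);   // 1
console.log(greedy); // 123
```

Inside the answer's pattern, the lazy `[^)]+?` and `[^.]+?` still end up covering whole runs of characters, because the groups that contain them are repeated with a greedy `+`.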

The String.match method performs a global match here, because /g is the global flag. With the g flag, match returns an array containing all matched substrings.

When the g flag is omitted, only the first match is selected, and the array consists of the following elements:

Index 0: <Whole match>
Index 1: <Group 1>
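A small illustration of the difference; the string `a.b` and the pattern `/(\w)\.?/` are made up for this example:

```javascript
const text = 'a.b';

// Without g: only the first match, laid out as [whole match, group 1, ...],
// plus an index property holding the match position.
const first = text.match(/(\w)\.?/);
console.log(first[0], first[1], first.index); // a. a 0

// With g: all whole matches; capture groups are not included.
const all = text.match(/(\w)\.?/g);
console.log(all.join(' | ')); // a. | b
```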

The following regex:

result = subject.match(/(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g);

can be used to get the desired result. Group 1 holds each result, because you want to omit the `.`.

Usage:

var myregexp = /(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g;
var parts = [];
var match = myregexp.exec(subject);
while (match != null) {
    // match[0] is the whole match (including the trailing "."), match[1] is group 1
    parts.push(match[1]);
    match = myregexp.exec(subject);
}
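On engines that support `String.prototype.matchAll` (ES2020), the `exec` loop can be shortened. This sketch reuses the question's string with doubled backslashes, collects group 1 only, and contrasts it with what `match()` returns for the same global regex:

```javascript
// The regex from this answer; group 1 holds each part without its trailing ".".
const re = /(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g;

// Backslashes are doubled so the string really contains "\." sequences.
const subject = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';

// matchAll requires the g flag and yields exec()-style results, so m[1] is group 1.
const parts = Array.from(subject.matchAll(re), (m) => m[1]);

// For comparison: match() with g returns whole matches, separators included.
const whole = subject.match(re);

console.log(parts[0], whole[0]); // *window *window.
```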

Explanation:

// (?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))
// 
// Match the regular expression below «(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))»
//    Match the regular expression below and capture its match into backreference number 1 «(\(.*?[^'"]\)|.*?[^\\])»
//       Match either the regular expression below (attempting the next alternative only if this one fails) «\(.*?[^'"]\)»
//          Match the character “(” literally «\(»
//          Match any single character that is not a line break character «.*?»
//             Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
//          Match a single character NOT present in the list “'"” «[^'"]»
//          Match the character “)” literally «\)»
//       Or match regular expression number 2 below (the entire group fails if this one fails to match) «.*?[^\\]»
//          Match any single character that is not a line break character «.*?»
//             Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
//          Match any character that is NOT a “A \ character” «[^\\]»
//    Match the regular expression below «(?:\.|$)»
//       Match either the regular expression below (attempting the next alternative only if this one fails) «\.»
//          Match the character “.” literally «\.»
//       Or match regular expression number 2 below (the entire group fails if this one fails to match) «$»
//          Assert position at the end of the string (or before the line break at the end of the string, if any) «$»

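Matching balanced parentheses with regex is notoriously difficult, especially in Javascript.

You'd be better off writing your own parser. Here's a clever way to do it that plays to the strengths of regex:

  • Create a regex that matches and captures any "pattern of interest": /(?:(\\.)|([\(\[\{])|([\)\]\}])|(\.))/g
  • Use string.replace(pattern, function (...)), and inside the function keep a count of opening and closing braces.
  • Add the matched text to a buffer.
  • If the split character is found and the opening and closing braces are balanced, add the buffer to the results array.

This solution will take some work and requires knowledge of closures, and you should probably read the documentation for string.replace, but I think it's a great way to solve your problem!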

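Edit:
After noticing how many questions are related to this one, I decided to take on the challenge above myself. Here is live code that splits a string using a regex.
This code has the following features: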

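  • Uses a regex pattern to find the splits
  • Only splits when parentheses are balanced
  • Only splits when quotes are balanced
  • Allows escaping of parentheses, quotes, and separators using `\`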
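A rough sketch of the counting approach from the bullet points, covering the features listed (balanced brackets, balanced quotes, `\` escapes). It is not the author's live code: the helper name `splitOnTopLevelDots` is hypothetical, and the token regex is a variant of the "pattern of interest" suggested above, extended with quotes and a catch-all for ordinary text so that every character reaches the `replace` callback (the string that `replace` returns is simply discarded).

```javascript
// Hypothetical helper: split on "." only outside brackets and quotes.
function splitOnTopLevelDots(input) {
  // Every character falls into exactly one token:
  //   a backslash plus the next character (an escape), a single special
  //   character, or a run of ordinary characters.
  const token = /\\[\s\S]?|["'()[\]{}.]|[^\\"'()[\]{}.]+/g;
  const parts = [];
  let buffer = '';
  let depth = 0;     // number of (, [ or { not yet closed
  let quote = null;  // the quote character currently open, if any

  input.replace(token, (t) => {
    if (quote !== null) {
      if (t === quote) quote = null;            // closing quote
    } else if (t === '"' || t === "'") {
      quote = t;                                // opening quote
    } else if (t === '(' || t === '[' || t === '{') {
      depth++;
    } else if (t === ')' || t === ']' || t === '}') {
      depth--;
    } else if (t === '.' && depth === 0) {
      parts.push(buffer);                       // top-level separator: flush the buffer
      buffer = '';
      return t;                                 // the separator itself is not kept
    }
    buffer += t;                                // escapes and ordinary text are kept as-is
    return t;
  });

  parts.push(buffer);
  return parts;
}

const input = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
splitOnTopLevelDots(input).forEach((part) => console.log(part));
// *window
// some1
// some\.2
// (a.b + ")" ? cc\.c : d.n [a.b, cc\.c])
// some\.3
// (this.o.p ? ".mike." [ff\.])
// some5
```

Quote tracking is what makes the question's string work: without it, the `)` inside `")"` would close the parenthesis early. Unbalanced input, such as a stray `)`, just leaves `depth` non-zero so no further splits happen; a stricter version might throw instead.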

No regex needed.

var s = '*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5';
console.log(s.match(/(?:\([^\)]+\)|.*?\.)/g));
Output:

  ["*window.", "some1.", "some.", "2.", "(a.b + ")", "" ? cc.", "c : d.", "n [a.", "b, cc.", "c]).", "some.", "3.", "(this.o.p ? ".mike." [ff.])", "."]
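The backslashes are missing from this output because of the string literal rather than the regex: in a JavaScript string, `\.` is not a recognized escape sequence, so it evaluates to a plain `.`, and `some\.2` reaches the regex as `some.2`. Keeping a literal backslash requires writing `\\`, or building the string with `String.raw`:

```javascript
const single = 'some\.2';         // "\." is not a recognized escape, so the backslash is dropped
const doubled = 'some\\.2';       // "\\" is an escaped backslash
const raw = String.raw`some\.2`;  // String.raw keeps backslashes exactly as written

console.log(single);          // some.2
console.log(doubled);         // some\.2
console.log(raw === doubled); // true
```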

So, I was working on this, and now I see that @FailedDev isn't a failure after all, because that's pretty good. :)

Anyway, here is my solution. I'll only post the regex.
((\(.*?((?<!")\)(?!")))|((\\\.)|([^.]))+)

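Unfortunately, this won't work in your case, because I use a negative lookbehind, which I don't think the JavaScript regex engine supports. It should work as expected in other engines, as you can confirm here: http://gskinner.com/RegExr/.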
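Lookbehind assertions such as `(?<!")` were added to JavaScript in ES2018, so in a current engine (for example a recent Node.js) the regex above can be tried directly. A quick check against the question's string, with backslashes doubled in the literal:

```javascript
// Requires lookbehind support (ES2018 or later).
// Alternative 1: "(" ... ")" where the closing ")" is neither preceded nor followed by '"'.
// Alternative 2: a run of escaped dots "\." or non-dot characters.
const re = /((\(.*?((?<!")\)(?!")))|((\\\.)|([^.]))+)/g;

const s = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';

const lbParts = s.match(re) || [];
lbParts.forEach((p) => console.log(p));
// *window
// some1
// some\.2
// (a.b + ")" ? cc\.c : d.n [a.b, cc\.c])
// some\.3
// (this.o.p ? ".mike." [ff\.])
// some5
```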