Extract semi-structured information from a string in javascript

Keywords: structured, extract, information, string, javascript. Updated: 2023-09-26

I have sentences like this:

"[Paris:location] and [Lyon:location] are in France"

I need to extract all the tagged parts from it ("Paris:location" and "Lyon:location").

I've tried this code using a regular expression (RegExp):

var regexEntity = new RegExp('\[.+:.+\]', 'g');
var text = '[Paris:location] and [Lyon:location] are in France';
while ((match = regexEntity.exec(text))) {
    console.log(match);
}

But this is the output I get; it seems to be matching only the colons:

[ ':',
  index: 6,
  input: '[Paris:location] and [Lyon:location] are in France' ]
[ ':',
  index: 26,
  input: '[Paris:location] and [Lyon:location] are in France' ]

Is something wrong with my regex? Or would you use another approach to get this information?

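.+ is greedy (it runs on to the last ']' in the string), so you need its lazy version, .+?. Separately, in a string passed to new RegExp, '\[' is just '[' because the string literal consumes the backslash; that is why your pattern became the character class [.+:.+] and matched only the colons. Either double the backslash ('\\[') or use a regex literal.

Then it's as simple as: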

var text = '[Paris:location] and [Lyon:location] are in France';
console.log(text.match(/\[.+?:.+?\]/g));
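As an illustration (not taken from the original answers), the three versions of the pattern can be compared side by side on the question's sentence: the string whose backslashes are eaten by the string literal, the correctly escaped but greedy pattern, and the lazy regex literal. The variable names are arbitrary.

```javascript
// Illustrative comparison only; not part of the original answers.
const text = '[Paris:location] and [Lyon:location] are in France';

// In a string literal, '\[' is just '[', so this pattern becomes the
// character class [.+:.+], which matches a single '.', '+' or ':'.
const collapsed = new RegExp('\[.+:.+\]', 'g');
console.log(collapsed.source);      // "[.+:.+]"
console.log(text.match(collapsed)); // [':', ':']

// Doubling the backslash keeps it, but the greedy .+ runs to the LAST ']'.
const greedy = new RegExp('\\[.+:.+\\]', 'g');
console.log(text.match(greedy));    // ['[Paris:location] and [Lyon:location]']

// The lazy .+? stops at the first ':' and then at the first ']' after it.
const lazy = /\[.+?:.+?\]/g;
console.log(text.match(lazy));      // ['[Paris:location]', '[Lyon:location]']
```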

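You can use a regex with a non-greedy search and a positive lookahead. This captures just the name (e.g. Paris) of each tag whose type is location: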

var regex = /\[(.*?)(?=:location)/gi,
    string = '"[Paris:location] and [Lyon:location] are in France"',
    match;
 
while ((match = regex.exec(string)) !== null) {
    console.log(match[1]);
}
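The lookahead above only works when the type is literally location. For arbitrary types, a sketch using one capture group per part and String.prototype.matchAll (ES2020, Node 12+) might look like this. extractTags is a hypothetical helper name, and the sketch assumes tags never nest and that neither the value nor the type contains ':' or ']'.

```javascript
// Hypothetical helper (not from the original answers): returns every
// [value:type] tag as an object. Assumes no nesting and no ':' or ']'
// inside the value or the type.
function extractTags(text) {
  // [^\]:]+ matches a run of characters that are neither ']' nor ':',
  // so a match can never run past the ']' that closes its own tag.
  const tagPattern = /\[([^\]:]+):([^\]:]+)\]/g;
  const tags = [];
  for (const match of text.matchAll(tagPattern)) {
    tags.push({ value: match[1], type: match[2], index: match.index });
  }
  return tags;
}

const tags = extractTags('[Paris:location] and [Lyon:location] are in France');
console.log(tags);
// logs [{ value: 'Paris', type: 'location', index: 0 },
//       { value: 'Lyon', type: 'location', index: 21 }]
```

Using a negated character class instead of a lazy .+? also means an untyped bracket such as [foo] earlier in the string is skipped rather than swallowed: on '[foo] [a:b]' the lazy pattern matches the whole string, while this one returns only the a/b tag.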