Use regex to separate any string into an array of whole words, punctuation & html tags

Keywords: regex, string, split, words, punctuation, HTML tags. Updated: 2023-09-26

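What I currently have working matches on whitespace only. I would like to be able to match arbitrary HTML tags and punctuation marks as well.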

var text = "<div>The Quick brown fox ran through it's forest darkly!</div>";
// This one splits on whitespace only, so "darkly!</div>" comes back as a single element
console.log(text.match(/\S+/g));
// outputs: ["<div>The", "Quick", "brown", "fox", "ran", "through", "it's", "forest", "darkly!</div>"]

I would like a match expression that outputs:

["<div>", "The", "Quick", "brown", "fox", "ran", "through", "it's", "forest", "darkly", "!", "</div>"]

Here is a fiddle: https://jsfiddle.net/scottpatrickwright/og0bd0xj/2/

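Ultimately I will store all the matches in an array, do some processing (wrapping each word in a span tag with some conditional data attributes), and re-output the original string in altered form. I mention this because a solution that doesn't leave the string more or less intact won't work.

I have found many near-miss solutions online, but my regex skills aren't good enough to adapt their work.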
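To make the round trip described above concrete, here is a minimal sketch. It assumes a tokenizer that also keeps whitespace runs as tokens, so that joining the tokens reproduces the input exactly. The function name `wrapWords`, the pattern, and the `data-word` attribute are hypothetical and not part of the original question.

```javascript
// Hypothetical helper: wraps each word in a <span> and leaves tags,
// punctuation and whitespace untouched so the string can be rebuilt.
function wrapWords(html) {
  // Assumed pattern: a whole tag, a word (apostrophes allowed),
  // a whitespace run, or any other single character.
  const tokens = html.match(/<[^>]+>|[\w']+|\s+|[^\w\s]/g) || [];
  return tokens
    .map((tok) =>
      /^[\w']+$/.test(tok)
        ? `<span data-word="${tok.toLowerCase()}">${tok}</span>` // illustrative attribute
        : tok
    )
    .join("");
}

console.log(wrapWords("<div>The fox!</div>"));
```

Because every character falls into one of the four alternatives, nothing is dropped during matching, which is what keeps the original string intact apart from the added spans.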

How about this?

/(<\/?)?[\w']+>?|[!\.,;\?]/g
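As a quick check, the expression above can be run against the sample string from the question. The wrapper name `tokenizeSimple` is only for illustration; inside the character class the escapes before `.` and `?` are optional, so they are omitted here.

```javascript
// Illustrative wrapper around the suggested pattern.
function tokenizeSimple(text) {
  // Either an optional "<" or "</", a run of word characters or apostrophes,
  // and an optional ">", or a single punctuation mark.
  return text.match(/(<\/?)?[\w']+>?|[!.,;?]/g) || [];
}

const sampleSimple = "<div>The Quick brown fox ran through it's forest darkly!</div>";
console.log(tokenizeSimple(sampleSimple));
// ["<div>", "The", "Quick", "brown", "fox", "ran", "through", "it's", "forest", "darkly", "!", "</div>"]
```

Note that this pattern only recognizes bare tags such as `<div>` or `</div>`; a tag carrying attributes would not come back as a single token.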

You can add a space before and after each HTML tag, like this:

var text = "<div>The Quick brown fox ran through it's forest darkly!</div>";
text = text.replace(/<(.*?)>/g, ' <$1> ');
console.log(text.match(/\w+|\S+/g)); // ## Credit to George Lee ##
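One thing worth checking with the padding approach is how it treats apostrophes: `\w+` stops at the `'`, so "it's" is split into "it" and "'s". The sketch below wraps the two steps in a hypothetical function, `tokenizeWithPadding`, and shows one possible adjustment (allowing apostrophes inside words), which is an assumption rather than part of the answer above.

```javascript
// Hypothetical wrapper: pad tags with spaces, then match tokens.
function tokenizeWithPadding(text, tokenPattern = /\w+|\S+/g) {
  const padded = text.replace(/<(.*?)>/g, " <$1> ");
  return padded.match(tokenPattern) || [];
}

const samplePadded = "<div>The Quick brown fox ran through it's forest darkly!</div>";

console.log(tokenizeWithPadding(samplePadded));
// ["<div>", "The", "Quick", "brown", "fox", "ran", "through", "it", "'s", "forest", "darkly", "!", "</div>"]

// Assumed variant: treat apostrophes as part of a word.
console.log(tokenizeWithPadding(samplePadded, /[\w']+|\S+/g));
// ["<div>", "The", "Quick", "brown", "fox", "ran", "through", "it's", "forest", "darkly", "!", "</div>"]
```

In both forms, punctuation that comes directly before a word (an opening quote, for example) stays attached to that word, because `\S+` consumes everything up to the next space.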

My suggestion is:

console.log(text.match(/(<.+?>|[^\s<>]+)/g));

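In our regex, (<.+?>|[^\s<>]+), we specify two alternatives to capture:

<.+?> returns all <text> strings (whole tags)
[^\s<>]+ returns all strings that don't contain whitespace, <, or >

In the second alternative you can add any further characters you want excluded from the matched strings.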
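As written, the suggested pattern leaves "darkly!" as one token. A minimal sketch of the adjustment described above: add the punctuation marks to the excluded set, and, if they should still appear as tokens of their own, add a third alternative that matches them. The function names and the particular punctuation set `!.,;?` are assumptions chosen for this example.

```javascript
// The pattern from the answer, unchanged in behavior.
function tokenizeOriginal(text) {
  return text.match(/<.+?>|[^\s<>]+/g) || [];
}

// Assumed extension: exclude ! . , ; ? from words and match each one separately.
function tokenizeByExclusion(text) {
  return text.match(/<.+?>|[^\s<>!.,;?]+|[!.,;?]/g) || [];
}

const sampleExcl = "<div>The Quick brown fox ran through it's forest darkly!</div>";

console.log(tokenizeOriginal(sampleExcl));
// ["<div>", "The", "Quick", "brown", "fox", "ran", "through", "it's", "forest", "darkly!", "</div>"]

console.log(tokenizeByExclusion(sampleExcl));
// ["<div>", "The", "Quick", "brown", "fox", "ran", "through", "it's", "forest", "darkly", "!", "</div>"]
```

Dropping the third alternative would discard the punctuation entirely instead of returning it, which may or may not be what is wanted when the string has to be rebuilt afterwards.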