Javascript: Using dictionary to filter out words from a string?
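I need to filter out a few hundred "stop" words from a string. Since there are so many stop words, I don't think doing something like this is a good idea: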
sentence.replace(/\b(?:the|it is|we all|an?|by|to|you|[mh]e|she|they|we...)\b/ig, '');
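How can I create something like a hash map to store the stop words? In this map, the key would be the stop word itself and the value wouldn't matter. Filtering would then come down to checking that a word does not exist in the stop-word map. What data structure should I use to build such a map?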
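For illustration, the key-only map described above could be sketched with a JavaScript `Set`. This is a minimal sketch under assumptions, not a complete solution: the short `stopWords` list and the `removeStopWords` name are hypothetical, and tokens are assumed to be separated by whitespace.

```javascript
// Hypothetical stop-word list; a real one would hold a few hundred entries.
const stopWords = new Set(["the", "a", "an", "by", "to", "you", "me", "he", "she", "they", "we"]);

// Split on whitespace, drop every token whose lowercase form is in the set,
// then join the remaining tokens with single spaces.
function removeStopWords(sentence) {
  return sentence
    .split(/\s+/)
    .filter(function (word) {
      return word.length > 0 && !stopWords.has(word.toLowerCase());
    })
    .join(" ");
}

console.log(removeStopWords("The cat sat by the door")); // prints "cat sat door"
```

Note that this sketch does not handle multi-word entries such as "it is", and a token with punctuation attached (for example "the,") would not match its entry in the set.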
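Nothing beats regular expressions for this kind of job. However, they have two problems: they are hard to maintain (as you pointed out in your post), and very large ones have performance issues. I don't know how many alternatives a single regexp can handle, but I would guess that up to 20-30 is fine in any case.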
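So you need some code that builds the regular expressions dynamically from some data structure, which can be an array or just a string. I personally prefer strings, since they are the easiest to maintain.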
// taken from http://www.ranks.nl/resources/stopwords.html
stops = ""
+"a about above after again against all am an and any are aren't as "
+"at be because been before being below between both but by can't "
+"cannot could couldn't did didn't do does doesn't doing don't down "
+"during each few for from further had hadn't has hasn't have "
+"haven't having he he'd he'll he's her here here's hers herself "
+"him himself his how how's i i'd i'll i'm i've if in into is isn't "
+"it it's its itself let's me more most mustn't my myself no nor "
+"not of off on once only or other ought our ours ourselves out "
+"over own same shan't she she'd she'll she's should shouldn't so "
+"some such than that that's the their theirs them themselves then "
+"there there's these they they'd they'll they're they've this "
+"those through to too under until up very was wasn't we we'd we'll "
+"we're we've were weren't what what's when when's where where's "
+"which while who who's whom why why's with won't would wouldn't "
+"you you'd you'll you're you've your yours yourself yourselves "
// how many to replace at a time
reSize = 20
// build regexps
regexes = []
// longest words first, so that e.g. "he's" is removed before "he"
stops = stops.match(/\S+/g).sort(function(a, b) { return b.length - a.length })
for (var n = 0; n < stops.length; n += reSize)
    regexes.push(new RegExp("\\b(" + stops.slice(n, n + reSize).join("|") + ")\\b", "gi"));
Once you have that, the rest is obvious:
regexes.forEach(function(r) {
text = text.replace(r, '')
})
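You'll need to experiment with the reSize value to find the best balance between the length of each regexp and the total number of regexps. If performance is critical, you can also run the generation part once and then cache the results (i.e. the generated regexps) somewhere.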
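As a rough sketch of the caching idea, the generation step could be wrapped in a function that builds the regexps once per stop list and chunk size and reuses them afterwards. The names `regexCache`, `getStopRegexes` and `stripStops` are hypothetical, and the sketch assumes a non-empty, space-separated stop list that contains no regexp metacharacters.

```javascript
// Hypothetical cache: one entry per (chunk size, stop list) pair.
const regexCache = new Map();

function getStopRegexes(stopList, chunkSize) {
  const key = chunkSize + ":" + stopList;
  if (!regexCache.has(key)) {
    // Longest words first, so that "he's" is removed before "he".
    const words = stopList.match(/\S+/g).sort(function (a, b) {
      return b.length - a.length;
    });
    const built = [];
    for (let n = 0; n < words.length; n += chunkSize) {
      built.push(new RegExp("\\b(" + words.slice(n, n + chunkSize).join("|") + ")\\b", "gi"));
    }
    regexCache.set(key, built);
  }
  return regexCache.get(key);
}

// Apply every cached regexp in turn; removed words leave their spaces behind.
function stripStops(text, stopList, chunkSize) {
  return getStopRegexes(stopList, chunkSize).reduce(function (acc, re) {
    return acc.replace(re, "");
  }, text);
}
```

To compare chunk sizes, one option is to time `stripStops` on a representative text with a few different `chunkSize` values and keep the fastest. The removed words leave extra whitespace in the result, which may need collapsing afterwards.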