Match regex to array of characters

Keywords: arrays, characters, regular expressions · Updated: 2023-09-26

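On a page I'm working on, I need a string ring buffer of a specific size. I use an array for this and just push and unshift; each index holds a single character. I don't want to use a string, because every time I put a character into the buffer, a copy would occur. Now I need to run regular expressions on this buffer. The problem is that every time I want to match, I first have to call array.join(), which is fairly expensive...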

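Now I'm wondering whether it's possible to use a regular expression directly on an array of characters, without converting it to a string first.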

I guess if there were a mutable string type, I would never have this problem...
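
For comparison, a fixed-size ring buffer does not have to move elements at all: array methods such as shift and unshift can move every element on each call, while a head index plus modular arithmetic lets each write overwrite the oldest slot in place. The sketch below assumes a fixed capacity and one character per slot; CharRingBuffer, push and toString are illustrative names, not an existing API. It does not make a regex run on the array itself (RegExp methods only accept strings), but it confines the O(n) copy to the moment you actually match.

```javascript
// Hypothetical fixed-capacity character ring buffer (illustrative names).
// push() overwrites the oldest slot in O(1); no elements are moved.
function CharRingBuffer(capacity) {
  this.slots = new Array(capacity); // preallocated storage
  this.capacity = capacity;
  this.head = 0; // index of the oldest character
  this.size = 0; // number of characters currently stored
}

// Append one character, dropping the oldest one once the buffer is full.
CharRingBuffer.prototype.push = function (ch) {
  var tail = (this.head + this.size) % this.capacity;
  this.slots[tail] = ch;
  if (this.size < this.capacity) {
    this.size++;
  } else {
    this.head = (this.head + 1) % this.capacity; // the oldest char was overwritten
  }
};

// Produce the contents oldest-to-newest. RegExp methods only accept strings,
// so this O(n) copy still happens, but only when you actually want to match.
CharRingBuffer.prototype.toString = function () {
  var out = "";
  for (var i = 0; i < this.size; i++) {
    out += this.slots[(this.head + i) % this.capacity];
  }
  return out;
};

// Usage: keep the last 5 characters, then test a regex against them.
var buf = new CharRingBuffer(5);
"abcdefg".split("").forEach(function (ch) { buf.push(ch); });
console.log(String(buf));             // cdefg
console.log(/d.f/.test(String(buf))); // true
```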

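Agreed! JS strings are immutable, but your worry that joining them is expensive is unfounded. I can confirm that in Chrome, Firefox, Opera, IE11 and Safari, Array.prototype.join() on an array of 1,000,000 random characters takes about 10 to 40 ms, depending on the engine. Yes, really (SpiderMonkey especially). Let's see: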

```javascript
var arr = [],
    match = [],
    longStr = "",
    count = 1000000,
    start = 0,
    end = 0;
// Fill the array with 1,000,000 pseudo-random single characters
for (var i = 0; i < count; i++) {
  arr.push((+new Date() * Math.random()).toString(36)[0]);
}
start = performance.now();
longStr = arr.join("");
end = performance.now();
console.log("Concatenation took " + (end - start) + " msecs");
// Concatenation took 10.875 msecs.. Wow Firefox..!
start = performance.now();
// String.prototype.match returns null when nothing matches, so fall back to []
match = longStr.match(/7{5,}/g) || [];
end = performance.now();
console.log("Regex match took " + (end - start) + " msecs and found " + match.length + " matches as: ", match);
// Regex match took 6.550000000046566 msecs and found 1 matches as: ["77777"]
```
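
Because the buffer in the question holds one character per array slot, a match position in the joined string is also a position in the array, so a single join is enough to answer "where in the array does this pattern occur?". The sketch below assumes every element is exactly one UTF-16 code unit (a character outside the Basic Multilingual Plane stored as one element would break the 1:1 mapping); findInCharArray is a hypothetical helper name.

```javascript
// Hypothetical helper (not from the answer): run a global regex over a
// character array and report where each match starts in the array.
// Assumes each element is exactly one UTF-16 code unit, so offsets in the
// joined string are the same as array indices.
function findInCharArray(chars, re) {
  if (!re.global) {
    throw new TypeError("findInCharArray expects a regex with the g flag");
  }
  var str = chars.join(""); // the single O(n) copy
  var results = [];
  var m;
  re.lastIndex = 0;
  while ((m = re.exec(str)) !== null) {
    results.push({ index: m.index, text: m[0] });
    if (m[0] === "") {
      re.lastIndex++; // step past empty matches to avoid an infinite loop
    }
  }
  return results;
}

var chars = ["x", "7", "7", "7", "7", "7", "a", "7", "7"];
var found = findInCharArray(chars, /7{5,}/g);
console.log(found.length, found[0].index, found[0].text); // 1 1 77777
```

Requiring the g flag keeps exec advancing through lastIndex; a non-global regex would match the same position forever.
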
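
So after running these tests, I decided to try to answer your question. We need to create our own mutable string object, and it is actually quite simple: an array-like object with a special feature that also has access to the Array.prototype functions. Like a String object, it has an extra property, here called primitiveValue, and every time the length property is updated we run this.primitiveValue = this.join("");, so every Array.prototype function that writes to length (push, shift and so on) automatically refreshes primitiveValue.

With a write-heavy workload this costs performance, because each write re-joins the whole buffer. The good news is that we have full control over how primitiveValue is updated. If we like, we can skip the update on every write to length and instead refresh it manually just before applying a regex to the string contents. Or we could even add regex functions to RingBuffer.prototype and hook the primitiveValue job into them. There are many possibilities here.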
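
```javascript
function RingBuffer() {
  this.primitiveValue = "";
  this.__len = 0; // backing store for the length accessor
  Object.defineProperty(this, "length", {
    enumerable: true,
    configurable: true,
    get: this.getLength,
    set: this.setLength
  });
}
// Inherit from Array.prototype instead of assigning it directly:
// `RingBuffer.prototype = Array.prototype` would make the lines below
// overwrite Array.prototype.constructor and add these methods to every array.
RingBuffer.prototype = Object.create(Array.prototype);
RingBuffer.prototype.constructor = RingBuffer;
RingBuffer.prototype.getLength = function () {
  return this.__len;
};
// Every generic Array method that writes `length` (push, shift, ...) ends up
// here, so primitiveValue is rebuilt after each write.
RingBuffer.prototype.setLength = function (val) {
  this.__len = val;
  this.primitiveValue = this.join("");
};
var ringu = new RingBuffer();
```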
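
The lazy option mentioned in the answer (skip the join on every write to length and do it only before a regex runs, with the regex functions living on the prototype) could look roughly like this. It is a sketch under the same one-character-per-slot assumption; LazyRingBuffer, refresh, match, test and __dirty are illustrative names, not part of the answer's code.

```javascript
// Hypothetical lazy variant of the answer's RingBuffer (names are illustrative).
// Writes to length only mark the cached string as stale; the join runs inside
// the regex helpers, at most once per batch of writes.
function LazyRingBuffer() {
  this.__len = 0;
  this.__dirty = false; // true when primitiveValue may be out of date
  this.primitiveValue = "";
  Object.defineProperty(this, "length", {
    configurable: true,
    get: function () { return this.__len; },
    set: function (val) {
      this.__len = val;
      this.__dirty = true; // no join here
    }
  });
}
// Inherit the generic Array methods without modifying Array.prototype.
LazyRingBuffer.prototype = Object.create(Array.prototype);
LazyRingBuffer.prototype.constructor = LazyRingBuffer;

// Join only if something was written since the last refresh.
LazyRingBuffer.prototype.refresh = function () {
  if (this.__dirty) {
    this.primitiveValue = this.join("");
    this.__dirty = false;
  }
  return this.primitiveValue;
};

// Regex entry points that bring primitiveValue up to date first.
LazyRingBuffer.prototype.match = function (re) {
  return this.refresh().match(re);
};
LazyRingBuffer.prototype.test = function (re) {
  return re.test(this.refresh());
};

// Usage: many writes, one join.
var lazy = new LazyRingBuffer();
"xx77777yy".split("").forEach(function (ch) { lazy.push(ch); }); // no joins yet
lazy.shift();                                                     // still no join
console.log(lazy.match(/7{5,}/g)); // joins once, then matches "77777"
```

The trade-off is that primitiveValue may be stale between writes and the next refresh(), so any code reading it directly has to call refresh() first.
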

So I filled ringu with 100,000 random characters. Here is the benchmark in Chrome 49:

```javascript
var longStr = "",
    count = 100000,
    start = performance.now(),
    end = 0;
for (var i = 1; i <= count; i++) {
  ringu.push((+new Date() * Math.random()).toString(36)[0]);
  if (!(i % 10000)) {
    end = performance.now();
    // start is reset below, so each logged value is the time for this batch of 10,000 pushes
    console.log(i / 10000 + ". 10000 pushes done at :" + (end - start) + " msecs");
    start = end;
  }
}
console.log("ringu is filled with " + count + " random characters");
start = performance.now();
longStr = ringu.join("");
end = performance.now();
console.log("Last concatenation took " + (end - start) + " msecs");
```

```
1. 10000 pushes done at :1680.6399999996647 msecs
2. 10000 pushes done at :4873.2599999997765 msecs
3. 10000 pushes done at :8044.155000000261 msecs
4. 10000 pushes done at :11585.525000000373 msecs
5. 10000 pushes done at :14642.490000000224 msecs
6. 10000 pushes done at :17998.389999999665 msecs
7. 10000 pushes done at :20814.979999999516 msecs
8. 10000 pushes done at :24024.445000000298 msecs
9. 10000 pushes done at :27146.375 msecs
10. 10000 pushes done at :30347.794999999925 msecs
ringu is filled with 100000 random characters
Last concatenation took 3.510000000707805 msecs
```
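
The steadily growing Chrome 49 timings are what the eager setter predicts. Each logged value is the time for one batch of 10,000 pushes (start is reset after every report), and, assuming join costs time roughly proportional to the buffer length, push number $i$ joins $i$ characters. A rough count:

$$
\sum_{i=1}^{n} i = \frac{n(n+1)}{2}, \qquad n = 100\,000 \;\Rightarrow\; 5\,000\,050\,000 \text{ characters joined in total}
$$

$$
\text{batch } k \;(k = 1,\dots,10):\quad \sum_{i=10^4(k-1)+1}^{10^4 k} i \;=\; 10^8\left(k-\tfrac{1}{2}\right) + 5\,000 \;\approx\; (2k-1)\cdot 5\times10^{7}
$$

So batch $k$ should take roughly $2k-1$ times as long as batch 1 (1, 3, 5, ..., 19). In the log above the first batch took about 1,681 ms and the tenth about 30,348 ms, a ratio of about 18, close to the predicted 19; all ten batches together took roughly 161 seconds, while the single final join took about 3.5 ms.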

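So, depending on how often you write, or how often primitiveValue must be up to date before you apply a regex, you can decide where to put the this.join(""); call. The average join time for a 500K-item RingBuffer is under 30 ms.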
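
If writes arrive in chunks, the eager version can already be made much cheaper: Array.prototype.push writes the length property once per call, however many arguments it receives, so pushing a whole chunk in one call triggers a single join in the setter. The sketch below is a minimal copy of the answer's RingBuffer with one hypothetical addition, a joins counter, used only to make the number of joins visible.

```javascript
// Minimal copy of the answer's eager RingBuffer. The `joins` counter is a
// hypothetical addition (not in the original) that counts setter-triggered joins.
function RingBuffer() {
  this.primitiveValue = "";
  this.__len = 0;
  this.joins = 0;
  Object.defineProperty(this, "length", {
    configurable: true,
    get: function () { return this.__len; },
    set: function (val) {
      this.__len = val;
      this.primitiveValue = this.join("");
      this.joins++;
    }
  });
}
RingBuffer.prototype = Object.create(Array.prototype);
RingBuffer.prototype.constructor = RingBuffer;

var rb = new RingBuffer();
var chunk = "hello world".split("");

// One push call with many arguments writes length once, so join runs once.
Array.prototype.push.apply(rb, chunk);
console.log(rb.joins, rb.primitiveValue); // 1 hello world

// Pushing the same characters one by one runs join once per character.
chunk.forEach(function (ch) { rb.push(ch); });
console.log(rb.joins); // 12
```

Passing very large arrays through apply may exceed the engine's argument limit, so very big batches are better split into moderately sized chunks.
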

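Well... here are the results of the same experiment in SpiderMonkey (Firefox). So if you are going to run similar code on Node.js, it may be wiser to try JXcore configured with the SpiderMonkey or ChakraCore engine rather than V8-based Node.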

```
1. 10000 pushes done at :710.310000000005 msecs
2. 10000 pushes done at :1831.4599999999991 msecs
3. 10000 pushes done at :3018.199999999997 msecs
4. 10000 pushes done at :4113.779999999999 msecs
5. 10000 pushes done at :5144.470000000008 msecs
6. 10000 pushes done at :6588.179999999993 msecs
7. 10000 pushes done at :7860.005000000005 msecs
8. 10000 pushes done at :8727.050000000003 msecs
9. 10000 pushes done at :9795.709999999992 msecs
10. 10000 pushes done at :10866.055000000008 msecs
ringu is filled with 100000 random characters
Last concatenation took 1.0999999999912689 msecs
```