Exporting large file in node with database lookup - Avoid multiple DB calls?

Keywords: database, file, call, lookup, node. Updated: 2023-09-26

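I am fairly new to Node. I think it is a great fit for service-type applications, but I run into problems when using it for applications that simply run from start to finish, such as a data export application, where accessing a database or a similar resource requires callbacks.

Here is my current setup.

I have a script that exports data from MongoDB to an XML file for use in a separate process. The export script is very simple: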

db.getData(function(err, data) {
    data.forEach(function(entry) {
        // write the data to the file
        writeData(entry);
    });
});

The problem arises when I need to make an asynchronous call during the export, for example:

db.getData(function(err, data) {
    data.forEach(function(entry) {
        var cacheValue = cache.get(entry.someOtherId);
        if (cacheValue) {
            // write the value from the cache
            writeData(entry, cacheValue);
        }
        else {
            // THIS IS CALLED 1000's OF TIMES EVEN THOUGH THE FIRST FEW CALLS
            // SHOULD POPULATE THE CACHE
            db.getLookup(entry.someOtherId, function(err, value) {
                // store it in the cache to avoid db calls
                cache.store(entry.someOtherId, value);
                // write the data to the file after getting the lookup
                writeData(entry, value);
            });
        }
    });
});

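Because of Node's non-blocking nature, the main forEach loop carries on while getLookup is executing, and because the entry.someOtherId field is a lookup key, it often holds the same value as another record.

So for a large file with relatively few distinct lookup values, thousands of database calls are sent to getLookup before the first call has had a chance to return and store its value in the cache.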
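The effect can be reproduced without a real database. The following is a minimal sketch, assuming hypothetical in-memory stand-ins for the cache and for getLookup (the names `dbCalls` and `entries`, and the use of `setImmediate` to simulate the round trip, are illustrative and not part of the original script):

```javascript
// Hypothetical in-memory stand-ins for the real cache and database.
const cache = new Map();
let dbCalls = 0;

function getLookup(id, callback) {
    dbCalls += 1;
    // Simulate a database round trip that completes on a later turn of the event loop.
    setImmediate(() => callback(null, 'value-for-' + id));
}

// 1000 entries that share only three distinct lookup ids.
const entries = [];
for (let i = 0; i < 1000; i++) {
    entries.push({ someOtherId: i % 3 });
}

entries.forEach((entry) => {
    if (cache.has(entry.someOtherId)) {
        return; // cache hit: never reached while this loop is running
    }
    getLookup(entry.someOtherId, (err, value) => {
        cache.set(entry.someOtherId, value);
    });
});

// The loop has finished synchronously, so no lookup callback has run yet.
console.log(dbCalls);    // 1000
console.log(cache.size); // 0
```

Every iteration misses the cache because none of the callbacks can run until the synchronous loop has returned control to the event loop.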

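Preloading is not what I want

I know I could simply preload the cache, since this lookup table is fairly small, but how should the problem be solved for large lookups where caching every value up front is impractical?

Pausing the main loop

In a synchronous environment this would be simple: the main loop would block until the database value came back, so the value would already be in the cache the next time it was needed.

I know there are various libraries that try to halt execution until a callback returns, but that seems to go against what Node is about.

Can someone tell me the generally accepted pattern for handling this kind of situation in Node?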

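I suggest using a promise library and a memoizing function to handle multiple asynchronous operations that run in parallel.

For the following example I am using Bluebird. Your entire loop, including the result caching, can be reduced to this fairly clear snippet: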

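var Promise = require('bluebird');
// promisifyAll() adds an ...Async() variant of every method to db itself
Promise.promisifyAll(db);
// cache one promise per lookup id
var lookup = memoize(db.getLookupAsync, db);
entries.forEach(function (entry) {
    lookup(entry.someOtherId).then(function (value) {
        writeData(entry, value);
    });
});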

where memoize is a generic helper function that caches a function's results:

function memoize(func, thisArg) {
    // maps each id to the result of the first call made with that id
    var cache = {};
    return function memoized(id) {
        if (!cache.hasOwnProperty(id)) {
            cache[id] = func.apply(thisArg || this, arguments);
        }
        return cache[id];
    };
}

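So lookup() is a function that calls the promisified version of db.getLookup() (Bluebird's .promisifyAll() creates ...Async() versions of all functions in an object) and remembers the corresponding result.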

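A promisified function returns a promise, which resolves (that is, calls its .then() callbacks) as soon as the data is available, or right away if it has already resolved. In other words, we can cache a promise and call .then() on it as often as we like.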
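To illustrate why caching the promise itself removes the duplicate calls, here is a minimal sketch that uses the native Promise rather than Bluebird. The function `getLookupAsync` is a hypothetical stand-in for a promisified db.getLookup(), and `dbCalls`, `written` and `allWritten` exist only to make the behaviour observable:

```javascript
let dbCalls = 0;

// Hypothetical stand-in for a promisified db.getLookup().
function getLookupAsync(id) {
    dbCalls += 1;
    return new Promise((resolve) => {
        setImmediate(() => resolve('value-for-' + id));
    });
}

// Cache the promise itself, so callers that arrive while the request
// is still in flight share it instead of starting a new request.
function memoize(func) {
    const cache = new Map();
    return function (id) {
        if (!cache.has(id)) {
            cache.set(id, func(id));
        }
        return cache.get(id);
    };
}

const lookup = memoize(getLookupAsync);

// 1000 entries that share only three distinct lookup ids.
const entries = [];
for (let i = 0; i < 1000; i++) {
    entries.push({ someOtherId: i % 3 });
}

const written = [];
const allWritten = Promise.all(entries.map((entry) =>
    lookup(entry.someOtherId).then((value) => {
        written.push(value); // stands in for writeData(entry, value)
    })
));

console.log(dbCalls); // 3: one call per distinct id

allWritten.then(() => {
    console.log(written.length); // 1000
});
```

The cache is filled synchronously with a pending promise on the first request for each id, so later iterations of the same loop already find an entry even though no value has arrived yet.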

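With this setup we have everything we need to handle asynchronous function calls while caching their results to keep the process as fast as possible. On top of that, it is pleasant and straightforward to read, unlike "callback hell".

Take a look at http://jsfiddle.net/Tomalak/91bdb5ns/, where you can see it working.

Note that there is no error handling in my code. You should read the Bluebird documentation and add it yourself.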
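One possible shape for that error handling is sketched below, using the native Promise rather than Bluebird. This is an assumption about what might be wanted, not part of the original answer: `memoizeWithRetry` and `flakyLookup` are hypothetical names, and the idea is simply that a rejected promise should not stay in the cache forever.

```javascript
// Variant of memoize that forgets failed lookups so they can be retried.
function memoizeWithRetry(func) {
    const cache = new Map();
    return function (id) {
        if (!cache.has(id)) {
            const promise = func(id);
            cache.set(id, promise);
            // Evict on rejection; callers still see the rejection on the returned promise.
            promise.catch(() => cache.delete(id));
        }
        return cache.get(id);
    };
}

// Hypothetical lookup that fails on its first call and succeeds afterwards.
let attempts = 0;
function flakyLookup(id) {
    attempts += 1;
    return attempts === 1
        ? Promise.reject(new Error('connection lost'))
        : Promise.resolve('value-for-' + id);
}

const lookup = memoizeWithRetry(flakyLookup);

const result = lookup(7)
    .catch((err) => {
        console.error('lookup failed:', err.message);
        return lookup(7); // the failed promise was evicted, so this retries
    })
    .then((value) => {
        console.log(value); // value-for-7
        return value;
    });
```

Without the eviction step, every later request for the same id would receive the same rejected promise.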

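I think I now truly understand the expression "callback hell".

It turns out (not surprisingly, really) that this all has to be done with callbacks and a recursive function, so that the next entry does not start until the previous one has finished:

Using the approach described here: Using a recursive pattern for loops in node.js

To process an array of values, the array is passed into a function together with an index. Once the value at that index has been processed, the function calls itself with index+1: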

function processEntry(entries, index, next) {
    // no more entries to run
    if (index >= entries.length) {
        next();
        return;
    }
    var entry = entries[index];
    var cacheValue = cache.get(entry.someOtherId);
    if (cacheValue) {
        // write the value from the cache
        writeData(entry, cacheValue);
        // process the next entry
        process.nextTick(function() {
            processEntry(entries, index+1, next);
        });
    }
    else {
        db.getLookup(entry.someOtherId, function(err, value) {
            // store it in the cache to avoid db calls
            cache.store(entry.someOtherId, value);
            // write the data to the file after getting the lookup
            writeData(entry, value);
            // process the next entry
            processEntry(entries, index+1, next);
        });
    }
}

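Avoiding a stack overflow

The problem with this setup is that once the cache is populated, we would be calling processEntry directly from inside processEntry rather than from the stack of some other callback, so before long we would hit a stack overflow.

To avoid this, we need to tell Node to start a fresh stack by using process.nextTick(): http://nodejs.org/api/process.html#process_process_nexttick_callback

"On the next loop around the event loop call this callback. This is not a simple alias to setTimeout(fn, 0), it's much more efficient. It typically runs before any other I/O events fire, but there are some exceptions. See process.maxTickDepth below."

According to the documentation, this call is quite efficient.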
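For reference, the pattern above can be run end to end against in-memory stand-ins. This is only a sketch under stated assumptions: `getLookup`, `writeData`, `dbCalls`, `written` and the `done` promise are hypothetical helpers added so that the result can be observed, and the error branch is an addition that the original snippet does not have.

```javascript
// Hypothetical in-memory stand-ins for cache, db.getLookup and writeData.
const cache = new Map();
let dbCalls = 0;
const written = [];

function getLookup(id, callback) {
    dbCalls += 1;
    setImmediate(() => callback(null, 'value-for-' + id));
}

function writeData(entry, value) {
    written.push(entry.id + ':' + value);
}

function processEntry(entries, index, next) {
    if (index >= entries.length) {
        next();
        return;
    }
    const entry = entries[index];
    if (cache.has(entry.someOtherId)) {
        writeData(entry, cache.get(entry.someOtherId));
        // Defer the recursive call so the current call stack unwinds first.
        process.nextTick(() => processEntry(entries, index + 1, next));
    } else {
        getLookup(entry.someOtherId, (err, value) => {
            if (err) { next(err); return; }
            cache.set(entry.someOtherId, value);
            writeData(entry, value);
            // Already on a fresh stack: this callback was invoked by the event loop.
            processEntry(entries, index + 1, next);
        });
    }
}

// 100000 entries that share only three distinct lookup ids.
const entries = [];
for (let i = 0; i < 100000; i++) {
    entries.push({ id: i, someOtherId: i % 3 });
}

// Wrapped in a promise only so that completion can be observed.
const done = new Promise((resolve, reject) => {
    processEntry(entries, 0, (err) => (err ? reject(err) : resolve()));
});

done.then(() => {
    console.log(dbCalls);        // 3
    console.log(written.length); // 100000
});
```

Because entries are handled strictly one after another, each distinct id reaches the database exactly once, at the cost of giving up any parallelism between lookups.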