Manipulating the V8 AST

Keywords: engine, V8, manipulation. Updated: 2023-09-26

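I intend to implement JS code coverage directly in the V8 code. My initial goal is to add a simple print for every statement in the abstract syntax tree. I saw that there is an AstVisitor class, which lets you traverse the AST. So my question is: how can I add a statement to the AST after the statement the visitor is currently visiting?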

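OK, let me summarize my experiments. First, what I write here applies to V8 as it was used in Chromium revision r157275, so things may no longer work this way, but I will still link to the corresponding places in the current version.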

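As mentioned, you need your own AST visitor, say MyAstVisitor, which inherits from AstVisitor and has to implement a bunch of VisitXYZ methods from there. The only one needed to instrument/inspect the executed code is VisitFunctionLiteral. The executed code is either a function or a set of loose statements in a source (file), which V8 wraps in a function that is then executed.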

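Then, right before the parsed AST is converted into code, here (compiling the function from loose statements) and there (compiling at runtime, when a predefined function is executed for the first time), you pass your visitor to the function literal, which will call VisitFunctionLiteral on the visitor: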

MyAstVisitor myAV(info);
info->function()->Accept(&myAV);
// next line is the V8 compile call
if (!MakeCode(info)) {
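The Accept call above relies on double dispatch: the node forwards to the visitor method that matches its own type. The following is only a minimal sketch of that mechanism with toy classes; ToyNode, ToyVisitor, ToyFunctionLiteral, ToyReturnStatement and ToyInstrumenter are made-up names for illustration and are not V8's classes.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT V8's classes.
struct ToyFunctionLiteral;
struct ToyReturnStatement;

struct ToyVisitor {
    virtual ~ToyVisitor() = default;
    virtual void VisitFunctionLiteral(ToyFunctionLiteral* node) = 0;
    virtual void VisitReturnStatement(ToyReturnStatement* node) = 0;
};

struct ToyNode {
    virtual ~ToyNode() = default;
    // Each node forwards to the visitor method that matches its own type.
    virtual void Accept(ToyVisitor* v) = 0;
};

struct ToyFunctionLiteral : ToyNode {
    std::string name;
    explicit ToyFunctionLiteral(std::string n) : name(std::move(n)) {}
    void Accept(ToyVisitor* v) override { v->VisitFunctionLiteral(this); }
};

struct ToyReturnStatement : ToyNode {
    void Accept(ToyVisitor* v) override { v->VisitReturnStatement(this); }
};

// Only the function-literal hook does any work; the other hook stays empty.
struct ToyInstrumenter : ToyVisitor {
    std::vector<std::string> seen;  // names of the function literals visited
    void VisitFunctionLiteral(ToyFunctionLiteral* node) override {
        assert(node != nullptr);
        seen.push_back(node->name);
    }
    void VisitReturnStatement(ToyReturnStatement*) override {}
};
```

Calling Accept on a ToyFunctionLiteral with a ToyInstrumenter records the function's name, while calling it on a ToyReturnStatement does nothing. This matches the idea that only one hook needs a real implementation.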

I pass the CompilationInfo pointer info to the custom visitor because it is needed to modify the AST. The constructor looks like this:

MyAstVisitor(CompilationInfo* compInfo) :
    _ci(compInfo), _nf(compInfo->isolate(), compInfo->zone()), _z(compInfo->zone()) {}

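_ci and _z are pointers to the CompilationInfo and the Zone. _nf is the AstNodeFactory<AstNullVisitor>; judging by the code, it is held by value, since it is constructed from the isolate and the zone, used with `.`, and passed on as &_nf.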

Now, in VisitFunctionLiteral, you can iterate over the function body and insert statements if you like.

void MyAstVisitor::VisitFunctionLiteral(FunctionLiteral* funLit){
    // fetch the function body
    ZoneList<Statement*>* body = funLit->body();
    // create a statement list used to collect the instrumented statements
    ZoneList<Statement*>* _stmts = new (_z) ZoneList<Statement*>(body->length(), _z);
    // iterate over the function body and rewrite each statement
    for (int i = 0; i < body->length(); i++) {
       // the rewritten statements are put into the collector
       rewriteStatement(body->at(i), _stmts);
    }
    // replace the original function body with the instrumented one
    body->Clear();
    body->AddAll(_stmts->ToVector(), _z);
}

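In the rewriteStatement method you can now inspect the statement. The list passed in as collector (the _stmts list from VisitFunctionLiteral) holds the statements that will eventually replace the original function body. So, to add a print statement after each statement, you first add the original statement and then your own print statement: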

void MyAstVisitor::rewriteStatement(Statement* stmt, ZoneList<Statement*>* collector){
    // add original statement
    collector->Add(stmt, _z);
    // create and add print statement, assuming you define print somewhere in JS:
    // 1) create handle (VariableProxy) for print function
    Vector<const char> fName("print", 5);
    Handle<String> fNameStr = Isolate::Current()->factory()->NewStringFromAscii(fName, TENURED);
    fNameStr = Isolate::Current()->factory()->SymbolFromString(fNameStr);
    // create the proxy - (it is vital to use _ci->function()->scope(), _ci->scope() crashes)
    VariableProxy* _printVP = _ci->function()->scope()->NewUnresolved(&_nf, fNameStr, Interface::NewUnknown(_z), 0);
    // 2) create message
    Vector<const char> tmp("Hello World!", 12);
    Handle<String> v8String = Isolate::Current()->factory()->NewStringFromAscii(tmp, TENURED);
    Literal* msg = _nf.NewLiteral(v8String);
    // 3) create argument list, call expression, expression statement and add the latter to the collector
    ZoneList<Expression*>* args = new (_z) ZoneList<Expression*>(1, _z);
    args->Add(msg, _z);
    Call* printCall = _nf.NewCall(_printVP, args, 0);
    ExpressionStatement* printStmt = _nf.NewExpressionStatement(printCall);
    collector->Add(printStmt, _z);   
}
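To make the collect-then-replace pattern easier to see without the V8 types, here is a minimal sketch on a toy model where a statement is just its source text. ToyBody, RewriteToyStatement and InstrumentToyBody are hypothetical names for illustration, not part of V8.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy model: a "statement" is just its source text. Not V8's Statement class.
using ToyBody = std::vector<std::string>;

// Mirrors rewriteStatement: keep the original, then append the print probe.
void RewriteToyStatement(const std::string& stmt, ToyBody* collector) {
    assert(collector != nullptr);
    collector->push_back(stmt);
    collector->push_back("print(\"Hello World!\");");
}

// Mirrors VisitFunctionLiteral: fill a separate list, then swap it in.
void InstrumentToyBody(ToyBody* body) {
    assert(body != nullptr);
    ToyBody collected;
    collected.reserve(body->size() * 2);
    for (const std::string& stmt : *body) {
        RewriteToyStatement(stmt, &collected);
    }
    // Replace the original body only after the iteration has finished.
    *body = std::move(collected);
}
```

Collecting into a second list avoids inserting into the body while iterating over it, which would shift the indices of the statements that are still to be processed.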

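The last parameter of NewCall and NewUnresolved is a number that gives the position in the script. I assume it is used for debugging/error messages, to tell where an error occurred. At least I have not run into problems setting it to 0 (there is also a constant kNoPosition).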

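One last point: this will not actually add a print statement after every statement, because Blocks (e.g. loop bodies) are statements that represent a list of statements, and loops are statements that have a condition expression and a body block. So you need to check what kind of statement is currently being processed and look into it recursively. Rewriting a block is very similar to rewriting a function body.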
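A minimal sketch of that recursion, again on a toy AST rather than on V8's node classes. ToyStmt, its three kinds, and the helper functions are hypothetical; the assumption here is that a loop stores its body block as its only child.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy AST for illustration; the node kinds are hypothetical, not V8's.
struct ToyStmt {
    enum Kind { kSimple, kBlock, kLoop };
    Kind kind = kSimple;
    std::string text;                                // source text of a simple statement
    std::vector<std::unique_ptr<ToyStmt>> children;  // kBlock: statements, kLoop: body block
};

using ToyList = std::vector<std::unique_ptr<ToyStmt>>;

std::unique_ptr<ToyStmt> MakeSimple(const std::string& text) {
    auto s = std::make_unique<ToyStmt>();
    s->text = text;
    return s;
}

void RewriteList(ToyList* list);

// Descend into compound statements so their inner lists get probes too.
void RewriteNested(ToyStmt* stmt) {
    assert(stmt != nullptr);
    if (stmt->kind == ToyStmt::kBlock) {
        RewriteList(&stmt->children);
    } else if (stmt->kind == ToyStmt::kLoop) {
        for (auto& body : stmt->children) RewriteNested(body.get());
    }
}

// Same collect-then-replace pattern as for a function body.
void RewriteList(ToyList* list) {
    assert(list != nullptr);
    ToyList collected;
    for (auto& stmt : *list) {
        RewriteNested(stmt.get());
        collected.push_back(std::move(stmt));
        collected.push_back(MakeSimple("print"));
    }
    *list = std::move(collected);
}

// Counts the simple statements whose text is "print" anywhere in the tree.
int CountProbes(const ToyList& list) {
    int n = 0;
    for (const auto& s : list) {
        if (s->kind == ToyStmt::kSimple && s->text == "print") ++n;
        n += CountProbes(s->children);
    }
    return n;
}
```

In this sketch the body block of a loop is rewritten on the inside, but no probe is placed directly after the block itself, since the block is the loop's single body and not an element of a statement list.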

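However, you will run into problems once you start replacing or modifying existing statements, because the AST also carries information about branching. So if you replace the jump target of some condition, you break the code. I guess this could be covered by adding rewriting capabilities directly to the individual expression and statement types, instead of creating new ones to replace them.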
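The difference between the two approaches can be sketched on a toy model. This is only an illustration of the general idea under an assumption: that other nodes hold plain pointers to their jump target. ToyLoop, ToyBreak and both functions are hypothetical, and V8's real bookkeeping is more involved.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy nodes; V8's real jump-target bookkeeping is more involved.
struct ToyLoop {
    std::vector<std::string> body;
};

struct ToyBreak {
    ToyLoop* target;  // non-owning link to the loop this break leaves
};

// In-place rewrite: the node keeps its address, so links to it stay valid.
void InstrumentInPlace(ToyLoop* loop) {
    assert(loop != nullptr);
    std::vector<std::string> collected;
    for (const std::string& stmt : loop->body) {
        collected.push_back(stmt);
        collected.push_back("print");
    }
    loop->body = std::move(collected);
}

// Replacement rewrite: returns a fresh node. Every link to the old node
// would have to be found and patched by the caller.
std::unique_ptr<ToyLoop> InstrumentByReplacing(const ToyLoop& loop) {
    auto fresh = std::make_unique<ToyLoop>();
    for (const std::string& stmt : loop.body) {
        fresh->body.push_back(stmt);
        fresh->body.push_back("print");
    }
    return fresh;
}
```

After InstrumentByReplacing, a ToyBreak created earlier still points at the old, uninstrumented loop. After InstrumentInPlace, the same pointer sees the instrumented body without any patching.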

That is it so far; I hope it helps.