LOOKAHEADs for the JavaScript/ECMAScript array literal production

Keywords: LOOKAHEADs, literal, ECMAScript, array, JavaScript. Updated: 2023-09-26

I'm currently implementing a JavaScript/ECMAScript 5.1 parser with JavaCC and have run into a problem with the ArrayLiteral production.

ArrayLiteral :
    [ Elision_opt ]
    [ ElementList ]
    [ ElementList , Elision_opt ]
ElementList :
    Elision_opt AssignmentExpression
    ElementList , Elision_opt AssignmentExpression
Elision :
    ,
    Elision ,

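I have three questions, and I will ask them one at a time.

This is the second one.

I have simplified this production to the following form: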

ArrayLiteral:
    "[" ("," | AssignmentExpression ",") * AssignmentExpression ? "]"

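Please see the first question for whether this simplification is correct:

How to simplify the JavaScript/ECMAScript array literal production?

Now I'm trying to implement it in JavaCC as follows: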

void ArrayLiteral() :
{
}
{
    "["
    (
        ","
    |   AssignmentExpression()
        ","
    ) *
    (
        AssignmentExpression()
    ) ?
    "]"
}

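JavaCC complains about an ambiguous "," or AssignmentExpression (whatever it contains). Clearly, a LOOKAHEAD specification is needed. I have spent a lot of time trying to figure out the LOOKAHEADs and tried different things, such as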

  • LOOKAHEAD (AssignmentExpression() ",") (...)*
  • LOOKAHEAD (AssignmentExpression() "]") (...)?

and a few other variations, but I could not get rid of the JavaCC warnings.

I don't understand why this does not work:

void ArrayLiteral() :
{
}
{
    "["
    (
        LOOKAHEAD ("," | AssignmentExpression() ",")
        ","
    |   AssignmentExpression()
        ","
    ) *
    (
        LOOKAHEAD (AssignmentExpression() "]")
        AssignmentExpression()
    ) ?
    "]"
}

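OK, AssignmentExpression() is ambiguous by itself, but the trailing "," or "]" in the LOOKAHEADs should make it clear which of the choices should be taken - or am I mistaken here?

What would a correct LOOKAHEAD specification for this production look like?

Update

Unfortunately, this did not work either: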

void ArrayLiteral() :
{
}
{
    "["
    (
        ","
    |
        LOOKAHEAD (AssignmentExpression() ",")
        AssignmentExpression()
        ","
    ) *
    (
        AssignmentExpression()
    ) ?
    "]"
}

The warning:

Warning: Choice conflict in (...)* construct at line 6, column 5.
         Expansion nested within construct and expansion following construct
         have common prefixes, one of which is: "function"
         Consider using a lookahead of 2 or more for nested expansion.

Line 6 is the "(" before the first LOOKAHEAD. The common prefix "function" is just one of the possible starts of AssignmentExpression.
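The warning can be read as an overlap between two token sets: the tokens that can start one iteration of the loop and the tokens that can start whatever follows the loop. The following is a minimal Java sketch of that reading, not JavaCC's actual algorithm. The class name and the token kind "e" (standing for any token that can start an AssignmentExpression, such as "function") are assumptions made for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, heavily simplified model of the choice point at the (...)* loop.
// "e" stands for any token that can start an AssignmentExpression.
final class ChoiceConflictSketch {
    // Tokens that can start one iteration of ( "," | AssignmentExpression "," )
    static final Set<String> FIRST_OF_LOOP_BODY = Set.of(",", "e");

    // Tokens that can start what follows the loop: AssignmentExpression? "]"
    static final Set<String> FIRST_AFTER_LOOP = Set.of("e", "]");

    // With a single token of lookahead, the parser cannot decide between
    // "iterate again" and "leave the loop" for any token found in both sets.
    static Set<String> conflictingTokens() {
        Set<String> common = new TreeSet<>(FIRST_OF_LOOP_BODY);
        common.retainAll(FIRST_AFTER_LOOP);
        return common;
    }

    public static void main(String[] args) {
        System.out.println(conflictingTokens());
    }
}
```

In this model the only conflicting token kind is "e", which matches the warning naming "function" as one common prefix.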

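JavaCC produces top-down parsers. I'll say straight out that I'm not a fan of top-down parser generators, so I'm not a JavaCC expert and I don't have one handy to test with.

EDIT: I thought a different approach would work, but then I realized that I don't understand how JavaCC performs lookahead for the actual choices. In the case of ( A | B )* C, there are really three possible choices: A, B and C. I thought it would consider all three, but it is possible that it considers them two at a time. So the following is another guess.

With that said, I think the following will work, but it involves parsing almost every AssignmentExpression() twice.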

{
    "["
    (
        ","
    |
        AssignmentExpression()
        ","
    ) *
    (
        LOOKAHEAD (AssignmentExpression() "]")
        AssignmentExpression()
    ) ?
    "]"
}

As I indicated in the linked question, a better solution is to rewrite the production differently:

"[" AssignmentExpression ? ("," AssignmentExpression ?) * "]"

This results in a grammar that needs only one token of lookahead, so you don't need any LOOKAHEAD declarations to handle it.
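To illustrate the one-token claim, here is a hand-written Java sketch of a recognizer for the rewritten production. It is not generated by JavaCC, and it assumes a toy token stream in which every token is a single character and 'e' stands for a complete AssignmentExpression. Each decision inspects only the character at the current position. The sketch also brute-forces all short inputs to compare the recognizer with the simplified form from the question, written here as a regular expression.

```java
import java.util.regex.Pattern;

// Hypothetical recognizer for:  "[" e? ( "," e? )* "]"
// where 'e' is a stand-in for one whole AssignmentExpression.
final class ArrayLiteralRecognizer {
    // The simplified form from the question: "[" ( "," | e "," )* e? "]"
    private static final Pattern SIMPLIFIED = Pattern.compile("\\[(,|e,)*e?\\]");

    static boolean matches(String s) {
        int pos = 0;
        if (pos >= s.length() || s.charAt(pos) != '[') return false;
        pos++;
        // Optional leading element: decided by looking at one token.
        if (pos < s.length() && s.charAt(pos) == 'e') pos++;
        // Each further item starts with a comma: again a one-token decision.
        while (pos < s.length() && s.charAt(pos) == ',') {
            pos++;
            if (pos < s.length() && s.charAt(pos) == 'e') pos++;
        }
        // The closing bracket must be the last token.
        return pos == s.length() - 1 && s.charAt(pos) == ']';
    }

    // Compares both forms on every string over "[],e" up to maxLen characters.
    static boolean agreesWithSimplifiedForm(int maxLen) {
        return check("", maxLen);
    }

    private static boolean check(String prefix, int remaining) {
        if (matches(prefix) != SIMPLIFIED.matcher(prefix).matches()) return false;
        if (remaining == 0) return true;
        for (char c : "[],e".toCharArray()) {
            if (!check(prefix + c, remaining - 1)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("[e,,e,]"));
        System.out.println(agreesWithSimplifiedForm(6));
    }
}
```

The brute-force comparison is only a sanity check on short inputs, not a proof that the two productions are equivalent.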

Here is another approach. It has the advantage of identifying which commas denote undefined elements without using any semantic actions.

void ArrayLiteral() : {} { "[" MoreArrayLiteral() }
void MoreArrayLiteral() : {} {
    "]"
|    "," /* undefined item */ MoreArrayLiteral()
|    AssignmentExpression() ( "]" |  "," MoreArrayLiteral() )
}
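For readers who want to see which commas end up as undefined items, the following is a hand-written Java sketch that mirrors the three alternatives of MoreArrayLiteral. It is an illustration under assumptions, not JavaCC output: the class name is made up, tokens are plain strings, any token other than "[", "]" and "," is treated as one complete AssignmentExpression, and a hole is recorded as null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand translation of the MoreArrayLiteral production.
final class MoreArrayLiteralSketch {
    private static final String EOF = "<EOF>";
    private final List<String> tokens;
    private final List<String> elements = new ArrayList<>();
    private int pos = 0;

    private MoreArrayLiteralSketch(List<String> tokens) { this.tokens = tokens; }

    // Returns the elements of the literal; null marks an undefined item.
    static List<String> parse(List<String> tokens) {
        MoreArrayLiteralSketch p = new MoreArrayLiteralSketch(tokens);
        p.expect("[");
        p.moreArrayLiteral();
        if (p.pos != tokens.size()) throw new IllegalArgumentException("trailing tokens");
        return p.elements;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : EOF; }

    private void expect(String t) {
        if (!peek().equals(t)) {
            throw new IllegalArgumentException("expected " + t + " but found " + peek());
        }
        pos++;
    }

    private static boolean isExpression(String t) {
        return !t.equals("[") && !t.equals("]") && !t.equals(",") && !t.equals(EOF);
    }

    private void moreArrayLiteral() {
        String next = peek();
        if (next.equals("]")) {            // alternative 1: "]"
            pos++;
        } else if (next.equals(",")) {     // alternative 2: "," MoreArrayLiteral
            pos++;
            elements.add(null);            // undefined item
            moreArrayLiteral();
        } else if (isExpression(next)) {   // alternative 3: expression, then "]" or ","
            elements.add(next);
            pos++;
            if (peek().equals("]")) {
                pos++;
            } else {
                expect(",");
                moreArrayLiteral();
            }
        } else {
            throw new IllegalArgumentException("unexpected " + next);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("[", "a", ",", ",", "b", "]")));
    }
}
```

A comma that directly follows an expression is consumed inside the third alternative, so only commas reached through the second alternative produce a hole.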
This is how I solved it (thanks to @rici's answer):

JSArrayLiteral ArrayLiteral() : 
{
    boolean lastElementWasAssignmentExpression = false;
}
{
    "["
    (
        (
            AssignmentExpression()
            {
                // Do something with expression
                lastElementWasAssignmentExpression = true;
            }
        ) ?
        (
            ","
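            {
                if (!lastElementWasAssignmentExpression)
                {
                    // Do something with elision
                }
                // Reset the flag so that a directly following comma counts as an elision
                lastElementWasAssignmentExpression = false;
            }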
            (
                AssignmentExpression()
                {
                    // Do something with expression
                    lastElementWasAssignmentExpression = true;
                }
            ) ?
        ) *
    )
    "]"
    {
        // Do something with results
    }
}