@hsqStephenZhang
problem

I was testing my C compiler on a corner case of the sizeof expression and found that tree-sitter's parse differs from clang's.

The test case:

int foo(void) {
    return 0;
}

int test(void) {
    int a = sizeof(foo)();
    return a;
}

clang's output

`-FunctionDecl 0x124950c00 <line:5:1, line:8:1> line:5:5 test 'int (void)'
  `-CompoundStmt 0x124950e78 <col:16, line:8:1>
    |-DeclStmt 0x124950e18 <line:6:5, col:26>
    | `-VarDecl 0x124950cc0 <col:5, col:25> col:9 used a 'int' cinit
    |   `-ImplicitCastExpr 0x124950e00 <col:13, col:25> 'int' <IntegralCast>
    |     `-UnaryExprOrTypeTraitExpr 0x124950de0 <col:13, col:25> 'unsigned long' sizeof
    |       `-CallExpr 0x124950db8 <col:19, col:25> 'int'
    |         `-ImplicitCastExpr 0x124950da0 <col:19, col:23> 'int (*)(void)' <FunctionToPointerDecay>
    |           `-ParenExpr 0x124950d48 <col:19, col:23> 'int (void)'
    |             `-DeclRefExpr 0x124950d28 <col:20> 'int (void)' Function 0x124935428 'foo' 'int (void)' non_odr_use_unevaluated

tree-sitter's output (excerpt)

body: compound_statement [4, 15] - [7, 1]
  declaration [5, 4] - [5, 26]
    type: primitive_type [5, 4] - [5, 7]
    declarator: init_declarator [5, 8] - [5, 25]
      declarator: identifier [5, 8] - [5, 9]
      value: call_expression [5, 12] - [5, 25]
        function: sizeof_expression [5, 12] - [5, 23]
          type: type_descriptor [5, 19] - [5, 22]
            type: type_identifier [5, 19] - [5, 22]
        arguments: argument_list [5, 23] - [5, 25]

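The difference arises because clang parses all the tokens after sizeof as a single postfix-expression, whereas tree-sitter uses a flatter grammar and relies on precedence to resolve the ambiguity. As a result, tree-sitter reads (foo) as a parenthesized type and then treats the trailing () as a call on the result of sizeof.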
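To make clang's reading concrete, here is a minimal C11 sketch (not part of the PR) that compiles only if sizeof applies to the whole call expression. The helper name sizeof_call_result is hypothetical and exists only for illustration; foo is the function from the test case above.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* size_t */

int foo(void) {
    return 0;
}

/* sizeof binds to the whole postfix-expression `(foo)()`, whose type is int.
   The operand is unevaluated, so foo is never actually called here. */
static_assert(sizeof(foo)() == sizeof(int),
              "sizeof(foo)() is the size of foo's return type");

/* Hypothetical helper (an assumption, not from the PR): returns the value
   that `a` is initialized with in the test case. */
size_t sizeof_call_result(void) {
    return sizeof(foo)();
}
```

Under the other reading, (sizeof(foo))(), the code would be rejected, since the result of sizeof is an integer and cannot be called.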

solution

After analyzing the output of tree-sitter parse --debug <<< 'int main() { sizeof(foo)(); }', I found that the parser does consider both interpretations (two parse versions), and that tree-sitter picks the wrong one because of the relative precedence of call and sizeof.

So I use prec.dynamic(1, $.expression) instead of $.expression, and the problem is solved.

test and verification

For the case mentioned above, tree-sitter parse --debug <<< 'int main() { sizeof(foo)(); }', here is the comparison:

logs before this fix

reduce sym:type_specifier, child_count:1
reduce sym:expression, child_count:1
shift state:664
process version:1, version_count:2, state:1077, row:0, col:23
reduce sym:type_descriptor, child_count:1
shift state:396
process version:0, version_count:2, state:664, row:0, col:24
lex_internal state:50, row:0, column:24
  consume character:'('
lexed_lookahead sym:(, size:1
reduce sym:parenthesized_expression, child_count:3
reduce sym:expression, child_count:1
shift state:451
process version:1, version_count:2, state:396, row:0, col:24
reduce sym:sizeof_expression, child_count:4
reduce sym:expression, child_count:1
shift state:451

...

(translation_unit [0, 0] - [1, 0]
  (function_definition [0, 0] - [0, 29]
    type: (primitive_type [0, 0] - [0, 3])
    declarator: (function_declarator [0, 4] - [0, 10]
      declarator: (identifier [0, 4] - [0, 8])
      parameters: (parameter_list [0, 8] - [0, 10]))
    body: (compound_statement [0, 11] - [0, 29]
      (expression_statement [0, 13] - [0, 27]
        (call_expression [0, 13] - [0, 26]
          function: (sizeof_expression [0, 13] - [0, 24]
            type: (type_descriptor [0, 20] - [0, 23]
              type: (type_identifier [0, 20] - [0, 23])))
          arguments: (argument_list [0, 24] - [0, 26]))))))

logs after this fix

process version:0, version_count:1, state:527, row:0, col:23
lex_internal state:49, row:0, column:23
  consume character:')'
lexed_lookahead sym:), size:1
reduce sym:type_specifier, child_count:1
reduce sym:expression, child_count:1
shift state:664
process version:1, version_count:2, state:1077, row:0, col:23
reduce sym:type_descriptor, child_count:1
shift state:396
process version:0, version_count:2, state:664, row:0, col:24
lex_internal state:50, row:0, column:24

(translation_unit [0, 0] - [1, 0]
  (function_definition [0, 0] - [0, 29]
    type: (primitive_type [0, 0] - [0, 3])
    declarator: (function_declarator [0, 4] - [0, 10]
      declarator: (identifier [0, 4] - [0, 8])
      parameters: (parameter_list [0, 8] - [0, 10]))
    body: (compound_statement [0, 11] - [0, 29]
      (expression_statement [0, 13] - [0, 27]
        (sizeof_expression [0, 13] - [0, 26]
          value: (call_expression [0, 19] - [0, 26]
            function: (parenthesized_expression [0, 19] - [0, 24]
              (identifier [0, 20] - [0, 23]))
            arguments: (argument_list [0, 24] - [0, 26])))))))

I also added this case to the test corpus; the other test cases remain unaffected.

Ensures `sizeof(foo)()` parses as sizeof of call expression, not
calling the result of sizeof.