bug: Inconsistency when parsing comments below function definition respective to docstring usage #322

@nnako

Did you check existing issues?

  • I have read all the tree-sitter docs if it relates to using the parser
  • I have searched the existing issues of tree-sitter-python

Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)

tree-sitter 0.25.0

Describe the bug

When parsing Python code, the position of comments placed directly below a function definition in the syntax tree depends on whether the function has a docstring.

Here is an example where parsing seems to work properly.

The function func_2 takes no arguments and is described by a three-line docstring. The function body starts with a three-line comment, followed by a global statement:

def func_2():
    '''
    a function with no specific return type and no parameters
    '''

    #
    # assign a global variable
    #

    global glbStrVar1

As expected, the three comment nodes appear after the "expression_statement" that holds the docstring string. The comment lines are marked with ">". They are children of the block node:

  function_definition [Point(row=136, column=0) - Point(row=182, column=20)]
    def [Point(row=136, column=0) - Point(row=136, column=3)]
    identifier [Point(row=136, column=4) - Point(row=136, column=10)]
    parameters [Point(row=136, column=10) - Point(row=136, column=12)]
      ( [Point(row=136, column=10) - Point(row=136, column=11)]
      ) [Point(row=136, column=11) - Point(row=136, column=12)]
    : [Point(row=136, column=12) - Point(row=136, column=13)]
    block [Point(row=137, column=4) - Point(row=182, column=20)]
      expression_statement [Point(row=137, column=4) - Point(row=139, column=7)]
        string [Point(row=137, column=4) - Point(row=139, column=7)]
          string_start [Point(row=137, column=4) - Point(row=137, column=7)]
          string_content [Point(row=137, column=7) - Point(row=139, column=4)]
          string_end [Point(row=139, column=4) - Point(row=139, column=7)]
>     comment [Point(row=141, column=4) - Point(row=141, column=5)]
>     comment [Point(row=142, column=4) - Point(row=142, column=30)]
>     comment [Point(row=143, column=4) - Point(row=143, column=5)]
      global_statement [Point(row=145, column=4) - Point(row=145, column=21)]
        global [Point(row=145, column=4) - Point(row=145, column=10)]
        identifier [Point(row=145, column=11) - Point(row=145, column=21)]

Now the strange part: the function func_3 also takes no arguments but has NO docstring. The function body starts with a three-line comment, followed by an assignment:

def func_3():
    
    #
    # call a function
    #

    typedDefaultParameter = func_1(typedDefaultParameter)

Parsing this code yields a concrete syntax tree whose node order I did not expect. The comment lines are again marked with ">". They now appear as direct children of function_definition, before the block node that holds the function body:

  function_definition [Point(row=184, column=0) - Point(row=193, column=17)]
    def [Point(row=184, column=0) - Point(row=184, column=3)]
    identifier [Point(row=184, column=4) - Point(row=184, column=10)]
    parameters [Point(row=184, column=10) - Point(row=184, column=12)]
      ( [Point(row=184, column=10) - Point(row=184, column=11)]
      ) [Point(row=184, column=11) - Point(row=184, column=12)]
    : [Point(row=184, column=12) - Point(row=184, column=13)]
>   comment [Point(row=186, column=4) - Point(row=186, column=5)]
>   comment [Point(row=187, column=4) - Point(row=187, column=21)]
>   comment [Point(row=188, column=4) - Point(row=188, column=5)]
    block [Point(row=190, column=4) - Point(row=193, column=17)]
      expression_statement [Point(row=190, column=4) - Point(row=190, column=57)]
        assignment [Point(row=190, column=4) - Point(row=190, column=57)]
          identifier [Point(row=190, column=4) - Point(row=190, column=25)]
          = [Point(row=190, column=26) - Point(row=190, column=27)]
          call [Point(row=190, column=28) - Point(row=190, column=57)]
            identifier [Point(row=190, column=28) - Point(row=190, column=34)]
            argument_list [Point(row=190, column=34) - Point(row=190, column=57)]
              ( [Point(row=190, column=34) - Point(row=190, column=35)]
              identifier [Point(row=190, column=35) - Point(row=190, column=56)]
              ) [Point(row=190, column=56) - Point(row=190, column=57)]

Steps To Reproduce/Bad Parse Tree

  1. take the function definition snippets for func_2 and func_3
  2. parse the code into a tree representation
  3. observe the location of the representations of the three comment lines
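
The steps above can be sketched as a small script. This is only a sketch under assumptions: the Point(row=..., column=...) output above suggests the Python bindings, so it assumes a recent release of the tree_sitter and tree_sitter_python packages. The names SOURCE, dump and parse_python are made up for illustration and are not part of any library.

```python
# Both snippets from this report, combined into one source string.
SOURCE = """\
def func_2():
    '''
    a function with no specific return type and no parameters
    '''

    #
    # assign a global variable
    #

    global glbStrVar1

def func_3():

    #
    # call a function
    #

    typedDefaultParameter = func_1(typedDefaultParameter)
"""


def dump(node, depth=0):
    """Return one line per node, indented by depth, in the format used above."""
    lines = [f"{'  ' * depth}{node.type} [{node.start_point} - {node.end_point}]"]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


def parse_python(source):
    """Parse source and return the root node.

    Assumption: recent py-tree-sitter, where Language() wraps the grammar
    pointer and Parser() accepts the language directly.
    """
    import tree_sitter_python as tspython
    from tree_sitter import Language, Parser

    parser = Parser(Language(tspython.language()))
    return parser.parse(source.encode("utf8")).root_node
```

With both packages installed, print("\n".join(dump(parse_python(SOURCE)))) prints the whole tree. The row numbers will differ from the listings in this report.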

Expected Behavior/Parse Tree

The expected behaviour is that the comments (which describe the first statement in the block) appear as children of the block node, regardless of whether a leading docstring exists, as shown here:

  function_definition [Point(row=184, column=0) - Point(row=193, column=17)]
    def [Point(row=184, column=0) - Point(row=184, column=3)]
    identifier [Point(row=184, column=4) - Point(row=184, column=10)]
    parameters [Point(row=184, column=10) - Point(row=184, column=12)]
      ( [Point(row=184, column=10) - Point(row=184, column=11)]
      ) [Point(row=184, column=11) - Point(row=184, column=12)]
    : [Point(row=184, column=12) - Point(row=184, column=13)]
    block [Point(row=190, column=4) - Point(row=193, column=17)]
>     comment [Point(row=186, column=4) - Point(row=186, column=5)]
>     comment [Point(row=187, column=4) - Point(row=187, column=21)]
>     comment [Point(row=188, column=4) - Point(row=188, column=5)]
      expression_statement [Point(row=190, column=4) - Point(row=190, column=57)]
        assignment [Point(row=190, column=4) - Point(row=190, column=57)]
          identifier [Point(row=190, column=4) - Point(row=190, column=25)]
          = [Point(row=190, column=26) - Point(row=190, column=27)]
          call [Point(row=190, column=28) - Point(row=190, column=57)]
            identifier [Point(row=190, column=28) - Point(row=190, column=34)]
            argument_list [Point(row=190, column=34) - Point(row=190, column=57)]
              ( [Point(row=190, column=34) - Point(row=190, column=35)]
              identifier [Point(row=190, column=35) - Point(row=190, column=56)]
              ) [Point(row=190, column=56) - Point(row=190, column=57)]
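
Until the tree shape is consistent, a consumer can collect the leading comments from both positions. The helper below is a hypothetical workaround sketch, not something tree-sitter provides. It only assumes that nodes expose .type and .children, as the Python bindings do, and it treats a lone string expression_statement at the start of the block as the docstring.

```python
def _is_docstring(stmt):
    """True for an expression_statement whose only child is a string node."""
    return (
        stmt.type == "expression_statement"
        and len(stmt.children) == 1
        and stmt.children[0].type == "string"
    )


def leading_body_comments(func_node):
    """Collect the comments that precede the first real statement of a function.

    Handles both shapes from this report: comments that are direct children
    of function_definition (before the block) and comments inside the block,
    optionally after a leading docstring.
    """
    comments = []
    for child in func_node.children:
        if child.type == "comment":
            # Shape seen for func_3: comment is a sibling of the block.
            comments.append(child)
        elif child.type == "block":
            docstring_allowed = True
            for stmt in child.children:
                if stmt.type == "comment":
                    # Shape seen for func_2: comment is a child of the block.
                    comments.append(stmt)
                elif docstring_allowed and _is_docstring(stmt):
                    docstring_allowed = False  # skip one leading docstring
                else:
                    break  # first real statement reached
            break  # ignore anything after the block
    return comments
```

Called on the function_definition nodes for func_2 and func_3, this returns the same three comment nodes in both cases, so downstream code does not need to know where the parser attached them.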

Repro

# See the "Describe the bug" section for the code examples. Row numbers in the tree output may differ when reproducing, because the snippets were cut from a larger real-world file.
