@AmiZya AmiZya commented Oct 15, 2024

This PR adds support for Robot Framework test suites: https://robotframework.org/

<rule pattern="#.*$">
<token type="Comment"/>
</rule>
<rule pattern="\$\{[^\}]+\}|\@{[^\}]+\}|\&\{[^\}]+\}">
Owner

I noticed that this ampersand and the one below were preventing the lexer from loading, since a bare & is not valid inside an XML attribute value. It seems to work when written as <rule pattern="\$\{[^\}]+\}|\@{[^\}]+\}|&amp;\{[^\}]+\}">
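
As a rough illustration of why the unescaped ampersand stops the file from loading, here is a minimal Python sketch. It uses Python's `xml.etree.ElementTree` as a stand-in for whatever XML parser the lexer loader actually uses (an assumption), the helper name `try_parse` is hypothetical, and both rule snippets are trimmed to the `&{...}` alternative only.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the rule from the diff: one with a bare '&', one with '&amp;'.
RAW = r'<rule pattern="\&\{[^\}]+\}"><token type="NameVariable"/></rule>'
ESCAPED = r'<rule pattern="&amp;\{[^\}]+\}"><token type="NameVariable"/></rule>'


def try_parse(snippet):
    """Return the decoded pattern attribute, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(snippet).get("pattern")
    except ET.ParseError:
        return None


print(try_parse(RAW))      # the bare '&' makes the document ill-formed
print(try_parse(ESCAPED))  # the parser decodes '&amp;' back to '&' for the regex
```

The regex engine never sees `&amp;`; the XML parser decodes it first, so the pattern itself is unchanged by the escaping.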

Author

You're right, I fixed it.

<rule pattern="\$\{[^\}]+\}|\@{[^\}]+\}|\&\{[^\}]+\}">
<token type="NameVariable"/>
</rule>
<rule pattern="(True|False|None|null|on|off)\b">
Owner

One thing I noticed here is that this pattern will match portions of words. For example, when I tried it with a file that contained the word Documentation, it highlighted the 'on' at the end differently from the rest of the word.
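
The partial-word match can be reproduced outside the lexer with a small Python sketch. It assumes Python's `re` module behaves like the lexer's regex engine for `\b` (an assumption), and `find_constant` is a hypothetical helper. It only shows what happens when the regex is searched against the whole string, not how the lexer feeds text to its rules.

```python
import re

# Pattern as shown in the diff at this point: only a trailing word boundary.
TRAILING_ONLY = re.compile(r"(True|False|None|null|on|off)\b")
# Variant with a word boundary on both sides.
BOTH_SIDES = re.compile(r"\b(True|False|None|null|on|off)\b")


def find_constant(pattern, text):
    """Return the first constant the pattern finds in text, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


print(find_constant(TRAILING_ONLY, "Documentation"))  # finds the trailing 'on'
print(find_constant(BOTH_SIDES, "Documentation"))     # no match inside the word
print(find_constant(BOTH_SIDES, "turn on"))           # standalone 'on' still matches
```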

Author

Good catch, I wrapped it in \b so that it matches whole words rather than parts of them.

<rule pattern="\$\{[^\}]+\}|\@{[^\}]+\}|&amp;\{[^\}]+\}">
<token type="NameVariable"/>
</rule>
<rule pattern="\b(True|False|None|null|on|off)\b">
Owner

I think we're getting closer.

The change to use word boundaries has fixed the issue when the keyword appears at the beginning or in the middle of a word, but there still seems to be an issue when the keyword appears at the end of a word. By 'word' I mean a space-separated word.

For example, the following document still has the 'on' at the end of the word Documentation highlighted:

*** Settings ***
Documentation A test suite

I think the problem is that there is an ambiguity in the lexer grammar between the 'keyword' rule and the '.' rule in the root state that the initial word-boundary doesn't solve. I temporarily added a debug log to print the tokens and for that document I see:

syntax: token: token: Type: Keyword Value: '*** Settings ***' Start: 0 End: 16
syntax: token: token: Type: TextWhitespace Value: '
' Start: 16 End: 17
syntax: token: token: Type: Text Value: 'Documentati' Start: 17 End: 28
syntax: token: token: Type: KeywordConstant Value: 'on' Start: 28 End: 30
syntax: token: token: Type: TextWhitespace Value: ' ' Start: 30 End: 31
syntax: token: token: Type: Text Value: 'A' Start: 31 End: 32
syntax: token: token: Type: TextWhitespace Value: ' ' Start: 32 End: 33
syntax: token: token: Type: Text Value: 'test' Start: 33 End: 37
syntax: token: token: Type: TextWhitespace Value: ' ' Start: 37 End: 38
syntax: token: token: Type: Text Value: 'suite' Start: 38 End: 43
syntax: token: token: Type: EOFType Value: '' Start: 0 End: 0

It looks like the '.' rule in the grammar consumes each letter of 'Documentati' as text (the single-character tokens are then combined internally), but when the lexer reaches 'on' the keyword rule matches. I think the leading word boundary is satisfied because the regexp is matched against only the remainder of the unmatched text, which is 'on A test suite', so 'on' sits at the very start of the input and the first \b matches there.
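
To make that hypothesis concrete, here is a toy lexer sketch in Python. It is not the real lexer: the rule list, the `tokenize` function, and the `slice_remainder` flag are all hypothetical, and Python's `re` stands in for the actual regex engine. With `slice_remainder=True` each rule only sees the unmatched remainder, which reproduces the 'Documentati' + 'on' split from the log above. With `slice_remainder=False` the rule is matched at an offset into the full text, so `\b` can see the preceding 'i' and the keyword rule does not fire.

```python
import re

# Toy rules in priority order; names mirror the token types in the debug log.
RULES = [
    ("TextWhitespace", re.compile(r"\s+")),
    ("KeywordConstant", re.compile(r"\b(True|False|None|null|on|off)\b")),
    ("Text", re.compile(r".")),
]


def tokenize(text, slice_remainder):
    """Tokenize text, merging adjacent Text tokens as the log suggests happens."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in RULES:
            if slice_remainder:
                # Rule sees only the unmatched remainder: \b has no left context.
                match = pattern.match(text[pos:])
            else:
                # Rule sees the full text at an offset: \b checks text[pos - 1].
                match = pattern.match(text, pos)
            if match:
                value = match.group(0)
                break
        else:
            raise ValueError(f"no rule matched at offset {pos}")
        if tokens and name == "Text" and tokens[-1][0] == "Text":
            tokens[-1] = ("Text", tokens[-1][1] + value)
        else:
            tokens.append((name, value))
        pos += len(value)
    return tokens


line = "Documentation A test suite"
print(tokenize(line, slice_remainder=True)[:2])
print(tokenize(line, slice_remainder=False)[:1])
```

If the real lexer does pass the sliced remainder to each rule, a regex-only fix cannot rely on a leading `\b`, which is consistent with restricting when the keyword rule is reachable.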

Perhaps we should only enter the 'keyword' state under more specific conditions? I don't know the syntax of robot files, but I am assuming that the keywords can only appear in test case sections, only in a specific place in a line, and maybe only after some other syntax element that introduces them. If so, we may only need to enter the keyword state under those conditions.

If you want I can give you the source-code diff to print those logs from Anvil, or give you a custom compiled binary that prints the logs so you can troubleshoot it.

2 participants