Added Robot Framework lexer #2
lexers/embedded/robot.xml
Outdated
<rule pattern="#.*$">
  <token type="Comment"/>
</rule>
<rule pattern="\$\{[^\}]+\}|\@{[^\}]+\}|\&\{[^\}]+\}">
I noticed this ampersand and the other one below were causing the lexer not to be loaded. It seems to work as <rule pattern="${[^}]+}|@{[^}]+}|&{[^}]+}">
You're right, I fixed it.
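For readers following along, here is a minimal Python sketch of why an unescaped ampersand could stop the lexer file from loading. It assumes the failure was an XML well-formedness error, which the thread does not state outright. The two one-rule fragments and the `load_pattern` helper are hypothetical and are not taken from robot.xml or from the editor's loader.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-rule fragments; not the actual contents of robot.xml.
RAW = r'<rule pattern="\&\{[^\}]+\}"><token type="NameVariable"/></rule>'
ESCAPED = r'<rule pattern="&amp;\{[^\}]+\}"><token type="NameVariable"/></rule>'


def load_pattern(xml_text):
    """Return the pattern attribute, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml_text).get("pattern")
    except ET.ParseError:
        return None


print(load_pattern(RAW))      # None: a bare '&' is not well-formed XML
print(load_pattern(ESCAPED))  # &\{[^\}]+\}
```

In an XML attribute the ampersand has to be written as `&amp;`; the parser hands the regexp engine a plain `&`, which needs no backslash.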
lexers/embedded/robot.xml
Outdated
<rule pattern="\$\{[^\}]+\}|\@{[^\}]+\}|\&\{[^\}]+\}">
  <token type="NameVariable"/>
</rule>
<rule pattern="(True|False|None|null|on|off)\b">
One thing I noticed is that this pattern matches parts of words. For example, when I tried it with a file containing the word Documentation, the 'on' at the end was highlighted differently from the rest of the word.
Good catch. I wrapped it in \b so that it matches whole words rather than parts of them.
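As a quick check of the two patterns outside the lexer, the following Python sketch searches the whole word at once. Python's `re` module is only a stand-in here; the regexp engine used by the editor may differ in details.

```python
import re

WORDS = r"(True|False|None|null|on|off)"
trailing_only = re.compile(WORDS + r"\b")        # pattern from the first revision
both_sides = re.compile(r"\b" + WORDS + r"\b")   # pattern after the fix

m = trailing_only.search("Documentation")
print(m.group(0), m.span())                    # on (11, 13)
print(both_sides.search("Documentation"))      # None
print(both_sides.search("turn on").group(0))   # on
```

With a boundary on both sides, the pattern no longer matches inside Documentation when the regexp is given the complete word.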
<rule pattern="\$\{[^\}]+\}|\@{[^\}]+\}|&\{[^\}]+\}">
  <token type="NameVariable"/>
</rule>
<rule pattern="\b(True|False|None|null|on|off)\b">
I think we're getting closer.
The change to use word boundaries has fixed the issue when the keyword appears at the beginning or in the middle of a word, but there still seems to be an issue when the keyword appears at the end of a word. Here I mean 'word' as a 'space separated' word.
For example, the following document still has the 'on' at the end of the word Documentation highlighted:
*** Settings ***
Documentation A test suite
I think the problem is that there is an ambiguity in the lexer grammar between the 'keyword' rule and the '.' rule in the root state that the initial word-boundary doesn't solve. I temporarily added a debug log to print the tokens and for that document I see:
syntax: token: token: Type: Keyword Value: '*** Settings ***' Start: 0 End: 16
syntax: token: token: Type: TextWhitespace Value: '
' Start: 16 End: 17
syntax: token: token: Type: Text Value: 'Documentati' Start: 17 End: 28
syntax: token: token: Type: KeywordConstant Value: 'on' Start: 28 End: 30
syntax: token: token: Type: TextWhitespace Value: ' ' Start: 30 End: 31
syntax: token: token: Type: Text Value: 'A' Start: 31 End: 32
syntax: token: token: Type: TextWhitespace Value: ' ' Start: 32 End: 33
syntax: token: token: Type: Text Value: 'test' Start: 33 End: 37
syntax: token: token: Type: TextWhitespace Value: ' ' Start: 37 End: 38
syntax: token: token: Type: Text Value: 'suite' Start: 38 End: 43
syntax: token: token: Type: EOFType Value: '' Start: 0 End: 0
It looks like the '.' rule in the grammar matches each letter of 'Documentati' as text (the single-character tokens are then merged internally), but once the lexer reaches 'on', the keyword rule matches. I think the leading word boundary is satisfied because the regexp is matched against only the remaining unmatched text, 'on A test suite', so 'on' sits at the very start of the input and counts as the start of a word.
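To make that hypothesis concrete, here is a toy model in Python. It is not the editor's or the lexer library's code: the `lex` function, its `slice_remainder` flag, and the token names are made up for illustration, and Python's `re` stands in for the real regexp engine.

```python
import re

KEYWORD = re.compile(r"\b(True|False|None|null|on|off)\b")


def lex(text, slice_remainder):
    """Tiny first-match lexer: a keyword rule, then a one-character fallback."""
    tokens, pos = [], 0
    while pos < len(text):
        if slice_remainder:
            # The rule only sees the unmatched tail, so \b holds at its start.
            m = KEYWORD.match(text[pos:])
        else:
            # The rule sees the whole text, so \b can look at text[pos - 1].
            m = KEYWORD.match(text, pos)
        if m:
            kind, value = "KeywordConstant", m.group(0)
        else:
            kind, value = "Text", text[pos]
        # Merge adjacent Text tokens, as the log above appears to do.
        if tokens and kind == "Text" and tokens[-1][0] == "Text":
            tokens[-1] = ("Text", tokens[-1][1] + value)
        else:
            tokens.append((kind, value))
        pos += len(value)
    return tokens


print(lex("Documentation", slice_remainder=True))
# [('Text', 'Documentati'), ('KeywordConstant', 'on')]
print(lex("Documentation", slice_remainder=False))
# [('Text', 'Documentation')]
```

When each rule is tried against only the remaining text, the toy lexer reproduces the 'Documentati' / 'on' split from the log. When the regexp can see the preceding character, the split disappears.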
Perhaps we should only enter the 'keyword' state under more specific conditions? I don't know the syntax of robot files, but I am assuming that the keywords can only appear in test case sections, only at a specific place in a line, and maybe only after some other syntax element that introduces them. If so, we may only need to enter the keyword state under those conditions.
If you want I can give you the source-code diff to print those logs from Anvil, or give you a custom compiled binary that prints the logs so you can troubleshoot it.
This PR adds support for the Robot Framework test suite https://robotframework.org/