Merged
6 changes: 6 additions & 0 deletions CHANGELOG.rst
@@ -2,6 +2,12 @@
 Changelog
 ================

+Version 1.0.0
+-------------
+
+- Remove formatting from ``ParseString.__str__``, as this is not needed in
+  scancode-toolkit and is a performance issue
+

 Version 0.9.0
 -------------
15 changes: 2 additions & 13 deletions src/pygmars/parse.py
@@ -319,18 +319,7 @@ def __repr__(self):
         return f"<ParseString: {self._parse_string!r}>"

     def __str__(self):
-        """
-        Return a formatted representation of this ``ParseString``. This
-        representation includes extra spaces to ensure that labels will line up
-        with the representation of other ``ParseString`` for the same text,
-        regardless of the grouping.
-        """
-        # Add spaces to make everything line up.
-        s = re.sub(r">(?!\})", r"> ", self._parse_string)
-        s = re.sub(r"([^\{])<", r"\1 <", s)
-        if s[0] == "<":
-            s = " " + s
-        return s.rstrip()
+        return self._parse_string.rstrip()


 # used to split a ParseString on labels and braces delimiters
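
Review note: to see what this hunk changes in practice, here is a minimal standalone sketch. The functions `old_str` and `new_str` are hypothetical names for illustration only (they are not part of pygmars); they copy the removed and the new method bodies and take the raw parse string as an argument instead of reading `self._parse_string`.

```python
import re


def old_str(parse_string):
    # Hypothetical copy of the removed ``__str__`` body:
    # two regex passes pad the labels with spaces.
    s = re.sub(r">(?!\})", r"> ", parse_string)
    s = re.sub(r"([^\{])<", r"\1 <", s)
    if s[0] == "<":
        s = " " + s
    return s.rstrip()


def new_str(parse_string):
    # Hypothetical copy of the new ``__str__`` body: no regex work at all.
    return parse_string.rstrip()


raw = "<T0><T1>{<T2>}<T3>"
print(repr(old_str(raw)))  # padded form
print(repr(new_str(raw)))  # raw form, unchanged
```

The old form runs two `re.sub` passes over the whole string on every `str()` call, which is the cost the changelog entry refers to; the new form only strips trailing whitespace.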
@@ -525,7 +514,7 @@ def parse(self, tree, trace=0):
         if after_parse != before_parse:
             # only update the tree and the trace if there have been changes from this parse
             if trace:
-                updated = re.sub(r"\{[^\{]+\}", f" <{self.label}> ", after_parse)
+                updated = re.sub(r"\{[^\{]+\}", f"<{self.label}>", after_parse)
                 trace_elements.append("-------------------------------------")
                 trace_elements.append(f"Rule.parse: applied rule: {self!r}")
                 trace_elements.append(f"  Rule regex: {self._regexp}")
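
Review note: the substitution in this hunk produces the `new :` line of the trace by collapsing each braced group into a single label. A minimal sketch of that step, using the same regex as the added line; the helper name `collapse_groups` is an assumption made for this illustration and does not exist in pygmars.

```python
import re


def collapse_groups(after_parse, label):
    # Replace every ``{...}`` group with ``<label>``, with no padding,
    # mirroring the added line in ``Rule.parse``.
    return re.sub(r"\{[^\{]+\}", f"<{label}>", after_parse)


after_parse = "{<DT><NN>}<VBD><IN>{<DT><NN>}{<DT><NN>}<VBD>"
print(collapse_groups(after_parse, "NP"))
```

With the unpadded replacement, the result stays in the same compact form as the `before` and `after` lines now printed by `ParseString.__str__`.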
42 changes: 15 additions & 27 deletions tests/test_parse_doctest.py
@@ -82,18 +82,6 @@
 ...
 AttributeError: 'str' object has no attribute 'label'

-The `str()` for a parse string adds spaces to it, which makes it line
-up with `str()` output for other parse strings over the same
-underlying input.
-
->>> cs = ParseString(t1)
->>> print(cs)
-<T0> <T1> <T2> <T3> <T4> <T5> <T6> <T7> <T8> <T9>
->>> cs.apply_transform(partial(re.compile('<T3>').sub, '{<T3>}'))
-'<T0><T1><T2>{<T3>}<T4><T5><T6><T7><T8><T9>'
->>> print(cs)
-<T0> <T1> <T2> {<T3>} <T4> <T5> <T6> <T7> <T8> <T9>
-
 The `validate()` method makes sure that the parsing does not corrupt
 the parse string. By setting validate=True, `validate()` will be
 called at the end of every call to `apply_transform`.
@@ -203,41 +191,41 @@
 Rule.parse: applied rule: <Rule: <DT>? <JJ>* <NN>* / NP # NP>
 Rule regex: (?P<group>(?:<(?:DT)>)?(?:<(?:JJ)>)*(?:<(?:NN)>)*)
 Input parsed to label: NP
-before : <DT> <NN> <VBD> <IN> <DT> <NN> <DT> <NN> <VBD>
-after : {<DT> <NN>} <VBD> <IN> {<DT> <NN>}{<DT> <NN>} <VBD>
-new : <NP> <VBD> <IN> <NP> <NP> <VBD>
+before : <DT><NN><VBD><IN><DT><NN><DT><NN><VBD>
+after : {<DT><NN>}<VBD><IN>{<DT><NN>}{<DT><NN>}<VBD>
+new : <NP><VBD><IN><NP><NP><VBD>
 length : 9,6
 -------------------------------------
 Rule.parse: applied rule: <Rule: <IN> / P # Preposition>
 Rule regex: (?P<group>(?:<(?:IN)>))
 Input parsed to label: P
-before : <NP> <VBD> <IN> <NP> <NP> <VBD>
-after : <NP> <VBD> {<IN>} <NP> <NP> <VBD>
-new : <NP> <VBD> <P> <NP> <NP> <VBD>
+before : <NP><VBD><IN><NP><NP><VBD>
+after : <NP><VBD>{<IN>}<NP><NP><VBD>
+new : <NP><VBD><P><NP><NP><VBD>
 length : 6,6
 -------------------------------------
 Rule.parse: applied rule: <Rule: <V.*> / V # Verb>
 Rule regex: (?P<group>(?:<(?:V[^\{\}<>]*)>))
 Input parsed to label: V
-before : <NP> <VBD> <P> <NP> <NP> <VBD>
-after : <NP> {<VBD>} <P> <NP> <NP> {<VBD>}
-new : <NP> <V> <P> <NP> <NP> <V>
+before : <NP><VBD><P><NP><NP><VBD>
+after : <NP>{<VBD>}<P><NP><NP>{<VBD>}
+new : <NP><V><P><NP><NP><V>
 length : 6,6
 -------------------------------------
 Rule.parse: applied rule: <Rule: <P> <NP> / PP # PP -> P NP>
 Rule regex: (?P<group>(?:<(?:P)>)(?:<(?:NP)>))
 Input parsed to label: PP
-before : <NP> <V> <P> <NP> <NP> <V>
-after : <NP> <V> {<P> <NP>} <NP> <V>
-new : <NP> <V> <PP> <NP> <V>
+before : <NP><V><P><NP><NP><V>
+after : <NP><V>{<P><NP>}<NP><V>
+new : <NP><V><PP><NP><V>
 length : 6,5
 -------------------------------------
 Rule.parse: applied rule: <Rule: <V> <NP|PP>* / VP # VP -> V (NP|PP)*>
 Rule regex: (?P<group>(?:<(?:V)>)(?:<(?:NP|PP)>)*)
 Input parsed to label: VP
-before : <NP> <V> <PP> <NP> <V>
-after : <NP> {<V> <PP> <NP>}{<V>}
-new : <NP> <VP> <VP>
+before : <NP><V><PP><NP><V>
+after : <NP>{<V><PP><NP>}{<V>}
+new : <NP><VP><VP>
 length : 5,3
 parse tree: (label='ROOT', children=(
 (label='NP', children=(