Commit 0454f13
Allow custom character classes to begin with `:`
ICU and Oniguruma allow custom character classes to
begin with `:`, and only lex a POSIX character
property if they detect a closing `:]`. However,
their behavior for this differs:
- ICU will consider *any* `:]` in the regex as a
closing delimiter, even the trailing one in e.g.
`[[:a]][:]`.
- Oniguruma will stop if it hits a `]`, so
`[[:a]][:]` is treated as a custom character class.
However, it only scans ahead 20 chars max, and doesn't
stop for e.g. a nested character class opening `[`.
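A rough Swift model of the two lookahead strategies summarized above may make the difference easier to see. This is a simplified sketch based only on that summary, not code taken from ICU or Oniguruma; the function names are hypothetical, and treating the 20-char limit as "examine at most 20 characters after `[:`" is an assumption.

```swift
// Hypothetical models of the lookahead each engine performs after seeing `[:`.
// `rest` is the pattern text immediately following that `[:`.

/// ICU-like: any later `:]` anywhere in the rest of the pattern counts as
/// the closing delimiter.
func icuLikeLexesProperty(_ rest: String) -> Bool {
    let chars = Array(rest)
    // Look at each adjacent pair of characters for a `:]`.
    return zip(chars, chars.dropFirst()).contains { $0.0 == ":" && $0.1 == "]" }
}

/// Oniguruma-like: fail on a bare `]`, and give up after `limit` characters.
/// A nested `[` does not stop the scan.
func onigurumaLikeLexesProperty(_ rest: String, limit: Int = 20) -> Bool {
    let chars = Array(rest)
    var i = 0
    while i < chars.count && i < limit {
        if chars[i] == ":" && i + 1 < chars.count && chars[i + 1] == "]" {
            return true
        }
        if chars[i] == "]" { return false }
        i += 1
    }
    return false
}

// For `[[:a]][:]`, the text after the inner `[:` is "a]][:]".
print(icuLikeLexesProperty("a]][:]"))        // true
print(onigurumaLikeLexesProperty("a]][:]"))  // false
```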
Our detection behavior for this is as follows:
- When `[:` is encountered inside a custom character
class, scan ahead to the closing `:]`.
- While scanning, bail if we see any characters
that are obviously invalid property names. Currently
this includes `[`, `]`, `}`, as well as a second
occurrence of `=`.
- Otherwise, if we end on `:]`, consider that a
POSIX character property.
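The detection rule above can be sketched as a single forward scan. This is an illustrative sketch, not the parser's actual code: `scanPOSIXProperty` is a hypothetical name, and it assumes the caller has already consumed the `[:` and passes the remaining pattern text.

```swift
/// Hypothetical helper sketching the detection rule described above.
/// `rest` is everything after a `[:` found inside a custom character class.
/// Returns the text between `[:` and `:]` if the rule would lex a POSIX
/// character property (valid or not), or nil to fall back to lexing a
/// custom character class that begins with `:`.
func scanPOSIXProperty(_ rest: String) -> String? {
    let chars = Array(rest)
    var sawEquals = false
    for i in chars.indices {
        let c = chars[i]
        // Ending on `:]` means we lex a POSIX character property.
        if c == ":" && i + 1 < chars.count && chars[i + 1] == "]" {
            return String(chars[..<i])
        }
        switch c {
        case "[", "]", "}":
            // Obviously not part of a property name: bail.
            return nil
        case "=":
            // A single `=` is allowed (e.g. `script=Greek`); a second one bails.
            if sawEquals { return nil }
            sawEquals = true
        default:
            break
        }
    }
    // Never found a closing `:]`.
    return nil
}

print(scanPOSIXProperty("alpha:]]") ?? "nil")  // alpha
print(scanPOSIXProperty("a]][:]") ?? "nil")    // nil
```

Note that `{` is deliberately not in the bail set here, so `a{b:]` still lexes as an (invalid) property named `a{b`.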
We could include more metacharacters to bail on,
e.g. `{`, `(`, `)`, but for now I'm erring on the
side of lexing an invalid POSIX character property.
We can always relax this in the future (as we'd be
turning invalid code into valid code). Users can
always escape the initial `:` in `[:` if they want
a custom character class. In fact, we may want to
suggest this via a warning, as this behavior can
be pretty subtle.
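To make the trade-off concrete, here is a hypothetical variant of the scan with the bail set as a parameter. `currentBailSet` mirrors the characters listed above, and `widerBailSet` adds the `{`, `(`, `)` mentioned as candidates; both names are illustrative only. With the current set, `[[:a(b:]]` lexes as an invalid POSIX property, while the wider set would turn the same input into a valid custom character class, which is the kind of relaxation described above. Writing `[[\:a(b:]]` sidesteps the question, since the escaped `:` never forms a `[:`.

```swift
// Illustrative bail sets: the current one from the commit message, and a
// wider one including the metacharacters suggested as possible additions.
let currentBailSet: Set<Character> = ["[", "]", "}"]
let widerBailSet: Set<Character> = ["[", "]", "}", "{", "(", ")"]

/// Hypothetical: returns true if the text after `[:` would be lexed as a
/// POSIX character property under the given bail set.
func lexesAsPOSIXProperty(_ rest: String, bailSet: Set<Character>) -> Bool {
    let chars = Array(rest)
    var equalsCount = 0
    for i in chars.indices {
        let c = chars[i]
        if c == ":" && i + 1 < chars.count && chars[i + 1] == "]" { return true }
        if bailSet.contains(c) { return false }
        // A second `=` always bails, regardless of the bail set.
        if c == "=" {
            equalsCount += 1
            if equalsCount > 1 { return false }
        }
    }
    return false
}

// `[[:a(b:]]`: the text after the inner `[:` is "a(b:]]".
print(lexesAsPOSIXProperty("a(b:]]", bailSet: currentBailSet))  // true
print(lexesAsPOSIXProperty("a(b:]]", bailSet: widerBailSet))    // false
```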
Parent: fed4c53
3 files changed (+129, -35) in:
- Sources/_RegexParser/Regex/Parse
- Tests/RegexTests