On 2023-08-02 08:45, Ihor Radchenko wrote:
> `rx' would be great.
> But even adding comments like in your example would be an improvement.

Since the future of this code snippet seems to be uncertain, I went for
comments only.

I thought I was pretty much done when I noticed at least one major issue
in the existing code, so I decided to send a pre-release first, together
with some notes and questions.  So there will be a follow-up to the
attached patch; I leave it to you whether you review this one already or
wait for that.

But I would like your opinion on the following notes, the first few of
which should be uncontroversial:

- I used "\(?NUM: ... \)" constructs to explicitly number the subres.
  I hope this is OK w.r.t. style and backward compatibility.

- I fixed the operator-matching subre to also include `==', `!=', `/='
  but to exclude `<<' and the like, which currently result in
  void-function errors.

- I did not change some "a[^b]*b"-style subres to non-greedy variants,
  since strictly speaking these are not identical ("[^b]" matches a
  newline, "." does not), even though newline characters shouldn't
  play a big role here.

- Likewise, I did not fix the number-matching subre, which accepts
  numbers like "1.2.3", to keep things short at least there.
  `string-to-number' silently takes care of these, even if an exponent
  gets lost that way.

But from here on it gets more interesting:

- The code uses the subre "\\\\-" in property names to (supposedly)
  allow minus characters in property names, which could (probably)
  otherwise be confused with term negation.

- It also unquotes these minus characters for {tag regexps}:

    (tag (save-match-data
           (replace-regexp-in-string
            "\\\\-" "-" (match-string 2 term))))

  But it never unquotes them in property names.  That missing
  unquoting could easily be amended, but:

- The other issue is: why do we need "\\\\-" for both property names
  and {tag regexps}?  This forces us to write queries like

    {[a\\-z]}|foo\\-bar="baz"

  where in my opinion

    {[a\-z]}|foo\-bar="baz"

  should be sufficient.
- Even more, IMO one could do away completely with the minus quoting
  and unquoting, since the overall regexp should allow minus characters
  to be matched unambiguously both

  + in {tag regexps} (because "{[^}]+}" gobbles them) and
  + in property names (because a property name must always be followed
    by some operator)

  *without* them getting confused with term negation.  Or am I missing
  something here?  A cursory test with something like

    +foo-bar="xxx"-patchday=202302

  seems to work fine.

- However, removing the unquoting of {tag regexps} would be a breaking
  change, even though I doubt anybody has ever used it, especially
  since it is not mentioned in the documentation.

> I had this in mind for a wile, but I am still hoping that we can
> eventually (when it is added to Emacs) rely upon peg.el for parsing.

Given that we have to discuss issues like those above, I heartily agree.

> https://yhetil.org/emacs-devel/875yvtbbn3.fsf@ericabrahamsen.net/

Arthouse thread: interesting plot, surprising side plots, not everything
comprehensible, (unfortunately) open ending.
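To make the point about dropping the minus quoting a bit more concrete,
here is a rough sketch (not the code in the patch) of a term regexp that
leaves "-" unquoted in property names and {tag regexps}.  The names
`my/org-match-term-re' and `my/org-split-match-terms' are made up for
illustration, the grammar is simplified (only "&"-joined terms, no "|",
no LEVEL), and the operator and value subres are my own approximations
rather than a copy of the existing ones:

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical sketch, not the code in the patch: split a tags/property
;; match string into terms, with "-" left unquoted in property names and
;; {tag regexps}.  Only "&"-joined terms are handled (no "|", no LEVEL).

(defconst my/org-match-term-re
  (concat
   "&?\\(?1:[-+]\\)?"                        ; optional "&" and sign
   "\\(?:\\(?2:{[^}]+}\\)"                   ; {tag regexp}; [^}] gobbles "-"
   "\\|\\(?3:[[:alnum:]_-]+\\)"              ; property name, bare "-" allowed
   "\\(?4:[<>]=?\\|[=!/]?=\\|<>\\)"          ; <, <=, >, >=, =, ==, !=, /=, <>
   "\\(?5:\"[^\"]*\"\\|{[^}]+}\\|-?[.0-9]+\\)" ; string, {regexp} or number
   "\\|\\(?6:[[:alnum:]_@#%]+\\)\\)")        ; plain tag, no "-" allowed
  "Hypothetical regexp matching a single term of a match string.")

(defun my/org-split-match-terms (match)
  "Split MATCH into terms using `my/org-match-term-re'.
Return a list of (SIGN TAG-REGEXP PROPERTY OPERATOR VALUE TAG),
with nil for the parts a term does not have."
  (with-temp-buffer
    (insert match)
    (goto-char (point-min))
    (let (terms)
      (while (not (eobp))
        ;; Each of the three alternatives consumes at least one
        ;; character, so the loop either advances or signals an error.
        (unless (looking-at my/org-match-term-re)
          (error "Cannot parse match string at: %s"
                 (buffer-substring (point) (point-max))))
        (push (mapcar #'match-string-no-properties '(1 2 3 4 5 6))
              terms)
        (goto-char (match-end 0)))
      (nreverse terms))))

;; Example:
;; (my/org-split-match-terms "+foo-bar=\"xxx\"-patchday=202302")
;; => (("+" nil "foo-bar" "=" "\"xxx\"" nil)
;;     ("-" nil "patchday" "=" "202302" nil))
```

With this sketch, `{[a-z]}-foo' splits into the tag regexp `{[a-z]}'
and the negated tag `foo', and `foo<<3' signals a parse error instead
of a void-function error later on.  One case it does not settle,
though: `foo-bar=3' is read as a single property term, whereas with
the current quoting it means tag `foo' combined with a negated
`bar=3'.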