* Re: Radio links work only in small numbers
2022-12-13 23:10 Radio links work only in small numbers Rudolf Adamkovič
@ 2022-12-14 8:30 ` Ihor Radchenko
2022-12-14 15:29 ` Rudolf Adamkovič
2022-12-14 15:47 ` Max Nikulin
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Ihor Radchenko @ 2022-12-14 8:30 UTC (permalink / raw)
To: Rudolf Adamkovič; +Cc: emacs-orgmode
[-- Attachment #1: Type: text/plain, Size: 588 bytes --]
Rudolf Adamkovič <salutis@me.com> writes:
> Greetings smart people!
>
> All [[link]]s in my notes perfectly match LEVEL-1 headings, so I figured
> that I may as well ask Org to make links for me. So, I replaced all the
> ~4000 headings in my notes with radio <<<targets>>>. However, Org now
> errors out with "Regular expression too big".
>
> Does anyone know how to overcome this limitation? Or, perhaps someone
> has a patch in the works that fixes it? If so, please let me know!
This is a C-level error emitted by Emacs' regexp engine.
Can you try the attached patch?
[-- Attachment #2: 0001-org-update-radio-target-regexp-Try-to-get-a-shorter-.patch --]
[-- Type: text/x-patch, Size: 1631 bytes --]
From 53e6c067e4897c5b6c1ac8201e2dbc49baa9e6e8 Mon Sep 17 00:00:00 2001
Message-Id: <53e6c067e4897c5b6c1ac8201e2dbc49baa9e6e8.1671006575.git.yantar92@posteo.net>
From: Ihor Radchenko <yantar92@posteo.net>
Date: Wed, 14 Dec 2022 11:28:03 +0300
Subject: [PATCH] org-update-radio-target-regexp: Try to get a shorter regexp
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
* lisp/ol.el (org-update-radio-target-regexp): Use `regexp-opt' to
compact the resulting regexp as much as possible. Otherwise, some
users hit regexp length limits when the number of radio targets
approaches thousands.
Reported-by: Rudolf Adamkovič <salutis@me.com>
Link: https://orgmode.org/list/m2lenax5m6.fsf@me.com
---
lisp/ol.el | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/lisp/ol.el b/lisp/ol.el
index 3ae8f8875..51ee028e4 100644
--- a/lisp/ol.el
+++ b/lisp/ol.el
@@ -2065,13 +2065,11 @@ (defun org-update-radio-target-regexp ()
(setq org-target-link-regexp
(and targets
(concat before-re
- (mapconcat
- (lambda (x)
- (replace-regexp-in-string
- " +" "\\s-+" (regexp-quote x) t t))
- targets
- "\\|")
- after-re)))
+ (replace-regexp-in-string
+ " +" "\\s-+"
+ (regexp-opt targets)
+ nil t)
+ after-re)))
(unless (equal old-regexp org-target-link-regexp)
;; Clean-up cache.
(let ((regexp (cond ((not old-regexp) org-target-link-regexp)
--
2.38.1
[-- Attachment #3: Type: text/plain, Size: 224 bytes --]
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: Radio links work only in small numbers
2022-12-14 8:30 ` Ihor Radchenko
@ 2022-12-14 15:29 ` Rudolf Adamkovič
0 siblings, 0 replies; 17+ messages in thread
From: Rudolf Adamkovič @ 2022-12-14 15:29 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: emacs-orgmode
Ihor Radchenko <yantar92@posteo.net> writes:
> Can you try the attached patch?
No luck:
"Regular expression too big"
Still too large:
(length org-target-link-regexp) => 33219
:-(
Rudy
--
"Programming reliably -- must be an activity of an undeniably
mathematical nature […] You see, mathematics is about thinking, and
doing mathematics is always trying to think as well as possible."
-- Edsger W. Dijkstra, 1981
Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Radio links work only in small numbers
2022-12-13 23:10 Radio links work only in small numbers Rudolf Adamkovič
2022-12-14 8:30 ` Ihor Radchenko
@ 2022-12-14 15:47 ` Max Nikulin
2022-12-14 16:02 ` Ihor Radchenko
2022-12-20 1:00 ` Rudolf Adamkovič
2022-12-28 9:46 ` Daniel Clemente
2024-02-19 9:26 ` Ihor Radchenko
3 siblings, 2 replies; 17+ messages in thread
From: Max Nikulin @ 2022-12-14 15:47 UTC (permalink / raw)
To: emacs-orgmode
On 14/12/2022 06:10, Rudolf Adamkovič wrote:
>
> All [[link]]s in my notes perfectly match LEVEL-1 headings, so I figured
> that I may as well ask Org to make links for me. So, I replaced all the
> ~4000 headings in my notes with radio <<<targets>>>. However, Org now
> errors out with "Regular expression too big".
I think you are abusing the feature. Too light markup requires too
heavy processing. What about <<target>> and explicit [[target]]?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Radio links work only in small numbers
2022-12-14 15:47 ` Max Nikulin
@ 2022-12-14 16:02 ` Ihor Radchenko
2022-12-20 3:21 ` Timothy
2022-12-20 1:00 ` Rudolf Adamkovič
1 sibling, 1 reply; 17+ messages in thread
From: Ihor Radchenko @ 2022-12-14 16:02 UTC (permalink / raw)
To: Max Nikulin; +Cc: emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
> On 14/12/2022 06:10, Rudolf Adamkovič wrote:
>>
>> All [[link]]s in my notes perfectly match LEVEL-1 headings, so I figured
>> that I may as well ask Org to make links for me. So, I replaced all the
>> ~4000 headings in my notes with radio <<<targets>>>. However, Org now
>> errors out with "Regular expression too big".
>
> I think you are abusing the feature. Too light markup requires too
> heavy processing. What about <<target>> and explicit [[target]]?
But why not?
It's not like we cannot fix this by splitting the regexp manually into
a prefix tree.
The question is performance though.
Maybe Emacs people can offer some more ideas.
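For illustration, a minimal sketch of what "prefix tree" means for the
regexp, assuming nothing beyond stock Emacs. The two helper names are
hypothetical and not part of Org. `regexp-opt' already factors shared
prefixes, which is why the earlier patch used it; the sketch only shows
that the flat alternation and the factored form match the same text.
Factoring alone is not guaranteed to bring a very large target list
under the limit.

```elisp
;; Hypothetical helpers, for illustration only (not Org functions).

(defun my/naive-regexp (targets)
  "Join TARGETS into one flat alternation: a\\|b\\|c."
  (mapconcat #'regexp-quote targets "\\|"))

(defun my/factored-regexp (targets)
  "Let `regexp-opt' factor common prefixes of TARGETS."
  (regexp-opt targets))

;; Both regexps match the same strings; only their shape differs.
(let ((targets '("cell wall" "cell membrane" "cell nucleus")))
  (list (my/naive-regexp targets)
        (my/factored-regexp targets)))
```

Evaluating the last form returns both regexps so they can be compared
by eye.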
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Radio links work only in small numbers
2022-12-14 16:02 ` Ihor Radchenko
@ 2022-12-20 3:21 ` Timothy
0 siblings, 0 replies; 17+ messages in thread
From: Timothy @ 2022-12-20 3:21 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: Max Nikulin, emacs-orgmode
[-- Attachment #1: Type: text/plain, Size: 869 bytes --]
Hi All,
> It’s not like we cannot fix this by splitting the regexp manually into
> a prefix tree.
>
> The question is performance though.
For reference, I ran into this issue with Org Glossary, and did exactly this:
see <https://github.com/tecosaur/org-glossary/blob/master/org-glossary.el#L804>.
I sort all the strings being matched with `string<' and then create clumps of ~800
terms and then search with the constructed regex for each clump in turn.
Obviously this has a performance impact, but it seems to work well enough, and
working slower is clearly better than not working at all.
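The clumping idea can be sketched roughly as follows. This is not the
Org Glossary code, only a minimal illustration under stated
assumptions: the function names are hypothetical, the clump size is a
parameter rather than the ~800 mentioned above, and "search each clump
in turn" is taken to mean "report the earliest match among all clumps".

```elisp
(require 'seq)

;; Hypothetical helpers, for illustration only.

(defun my/clump-regexps (terms clump-size)
  "Return a list of regexps, one per clump of CLUMP-SIZE sorted TERMS."
  (mapcar #'regexp-opt
          (seq-partition (sort (copy-sequence terms) #'string<)
                         clump-size)))

(defun my/search-clumps (regexps)
  "Return the start of the earliest match of any of REGEXPS after point.
Return nil when none of them matches.  Point is not moved."
  (let (best)
    (dolist (re regexps best)
      (save-excursion
        (when (re-search-forward re nil t)
          (let ((beg (match-beginning 0)))
            (when (or (null best) (< beg best))
              (setq best beg))))))))

;; Usage: three terms split into clumps of two.
(with-temp-buffer
  (insert "The mitochondrion and the ribosome.")
  (goto-char (point-min))
  (my/search-clumps
   (my/clump-regexps '("ribosome" "mitochondrion" "nucleus") 2)))
```

Each search costs one pass per clump, which is the performance
trade-off mentioned above.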
All the best,
Timothy
--
Timothy (‘tecosaur’/‘TEC’), Org mode contributor.
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/tec>.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Radio links work only in small numbers
2022-12-14 15:47 ` Max Nikulin
2022-12-14 16:02 ` Ihor Radchenko
@ 2022-12-20 1:00 ` Rudolf Adamkovič
1 sibling, 0 replies; 17+ messages in thread
From: Rudolf Adamkovič @ 2022-12-20 1:00 UTC (permalink / raw)
To: Max Nikulin, emacs-orgmode
Max Nikulin <manikulin@gmail.com> writes:
> On 14/12/2022 06:10, Rudolf Adamkovič wrote:
>>
>> All [[link]]s in my notes perfectly match LEVEL-1 headings, so I figured
>> that I may as well ask Org to make links for me. So, I replaced all the
>> ~4000 headings in my notes with radio <<<targets>>>. However, Org now
>> errors out with "Regular expression too big".
>
> I think you are abusing the feature. Too light markup requires too
> heavy processing. What about <<target>> and explicit [[target]]?
For context, I started using radio links back when I started studying
biology, and they helped me tremendously. However, as I learned more,
Org "gave up". Personally, I would not call taking some more notes
"abusing the feature", especially given the fact that the manual [1] says
nothing about the limitation, and it stopped working pretty quickly.
That said, two days ago, I split my notebook into smaller files, so
radio links will not work for me anymore!
[1]: https://orgmode.org/manual/Radio-Targets.html
--
"Thinking is a momentary dismissal of irrelevancies."
-- Richard Buckminster Fuller, 1969
Rudolf Adamkovič <salutis@me.com> [he/him]
Studenohorská 25
84103 Bratislava
Slovakia
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Radio links work only in small numbers
2022-12-13 23:10 Radio links work only in small numbers Rudolf Adamkovič
2022-12-14 8:30 ` Ihor Radchenko
2022-12-14 15:47 ` Max Nikulin
@ 2022-12-28 9:46 ` Daniel Clemente
2024-02-19 9:26 ` Ihor Radchenko
3 siblings, 0 replies; 17+ messages in thread
From: Daniel Clemente @ 2022-12-28 9:46 UTC (permalink / raw)
To: Rudolf Adamkovič; +Cc: emacs-orgmode
[-- Attachment #1: Type: text/plain, Size: 3005 bytes --]
Hi,
I also found this limitation, and my solution was to disable radio links
and to replace them by a manual approach:
1. I disabled the call to (org-update-radio-target-regexp) in org.el. Well,
I added a boolean org-inhibit-startup-radio-refresh, that works in a
similar way to org-inhibit-startup-visibility-stuff. After this change, I
still type <<<this>>> to define a title, but org doesn't handle it anymore.
2. I created a helm <https://github.com/emacs-helm/helm> menu that offers
me all the radio links. I compute the list of radio links myself, through
grep, by looking for <<<. It's easier than it seems, and very fast. The
code (no explanations) is in my configuration
<https://www.danielclemente.com/emacs/confi.html>, in these functions:
anythingyhelm-fuente-etiquetas-radio-org,
precarga-etiquetas-radio-de-wiki-para-helm
3. Often, when I want to refer to a title, I write it in italics, /like
this/. That's my way of telling myself „that's a link, you can manually
search for it through helm“. The target will be tagged <<<like this>>>. I
could automate this link-following but I don't mind typing for 1 or 2
seconds to go to a header. I still use normal links (C-c C-l, :ID: etc.) if I
want something that is easier to follow.
My system doesn't provide the same features as org's radio links, but I get
an interactive menu with pattern matching and very fast access to all
headers. I would still prefer the real radio links, with no limitations.
However, I understand that org-mode's approach is computationally harder.
Whereas I'm looking for ONE target link through all my 100 files, org-mode
is doing a regexp that looks for ALL target links in the current file. And
I have around 20k radio links! Grepping for 20k things everywhere is
harder than grepping for 1 thing everywhere. Even GNU grep is slow (>25
seconds) if I use a long regular expression
<https://gist.github.com/clemente/4dbe4fd4d4e9a8cc88e11f4a731d88f8> with
20k things inside.
In other words: since radio links don't scale well, I have replaced the
real radio links (search for everything everywhere) with a directed
approach (search for 1 thing everywhere).
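The directed approach can be sketched in a few lines. This is not the
helm/grep code from the configuration linked above, only a minimal
illustration: the function name and its arguments are hypothetical,
and it scans files in Emacs rather than calling grep.

```elisp
;; Hypothetical helper, for illustration only.

(defun my/find-radio-target (term files)
  "Return (FILE . POSITION) of the first `<<<TERM>>>' found in FILES.
Return nil when no file defines the target."
  (let ((re (concat "<<<" (regexp-quote term) ">>>")))
    (catch 'found
      (dolist (file files)
        (with-temp-buffer
          (insert-file-contents file)
          ;; Point is at the start of the inserted text here.
          (when (re-search-forward re nil t)
            (throw 'found (cons file (match-beginning 0))))))
      nil)))
```

Because only one short regexp is built per lookup, its size does not
grow with the number of targets.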
On Tue, 13 Dec 2022 at 23:11, Rudolf Adamkovič <salutis@me.com> wrote:
> Greetings smart people!
>
> All [[link]]s in my notes perfectly match LEVEL-1 headings, so I figured
> that I may as well ask Org to make links for me. So, I replaced all the
> ~4000 headings in my notes with radio <<<targets>>>. However, Org now
> errors out with "Regular expression too big".
>
> Does anyone know how to overcome this limitation? Or, perhaps someone
> has a patch in the works that fixes it? If so, please let me know!
>
> Thank you.
>
> Rudy
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Radio links work only in small numbers
2022-12-13 23:10 Radio links work only in small numbers Rudolf Adamkovič
` (2 preceding siblings ...)
2022-12-28 9:46 ` Daniel Clemente
@ 2024-02-19 9:26 ` Ihor Radchenko
2024-02-28 16:33 ` nobiot
3 siblings, 1 reply; 17+ messages in thread
From: Ihor Radchenko @ 2024-02-19 9:26 UTC (permalink / raw)
To: Rudolf Adamkovič; +Cc: emacs-orgmode
[-- Attachment #1: Type: text/plain, Size: 500 bytes --]
Rudolf Adamkovič <salutis@me.com> writes:
> All [[link]]s in my notes perfectly match LEVEL-1 headings, so I figured
> that I may as well ask Org to make links for me. So, I replaced all the
> ~4000 headings in my notes with radio <<<targets>>>. However, Org now
> errors out with "Regular expression too big".
>
> Does anyone know how to overcome this limitation? Or, perhaps someone
> has a patch in the works that fixes it? If so, please let me know!
Could you try the attached patch?
[-- Attachment #2: 0001-Work-around-regexp-size-limitation-for-large-number-.patch --]
[-- Type: text/x-patch, Size: 10623 bytes --]
From e01f5a56fff7b3a4c9ad016802a4da6a01053e22 Mon Sep 17 00:00:00 2001
Message-ID: <e01f5a56fff7b3a4c9ad016802a4da6a01053e22.1708334729.git.yantar92@posteo.net>
From: Ihor Radchenko <yantar92@posteo.net>
Date: Mon, 19 Feb 2024 12:19:34 +0300
Subject: [PATCH] Work around regexp size limitation for large number of link
targets
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
* lisp/ol.el (org-target-link-regexp-limit): New constant defining
maximum regexp limit where `org-target-link-regexp' is still safe to
use without triggering "Regexp too long" error.
(org-target-link-regexps): New variable holding a series of shorter
regexps to be used instead of too long single
`org-target-link-regexp'.
(org--re-list-search-forward): New function like `re-search-forward',
but accepting a list of regexps.
(org--re-list-looking-at): New function like `looking-at', but
accepting a list of regexps.
(org-update-radio-target-regexp): When `org-target-link-regexp' is too
long, set `org-target-link-regexps', partitioning the link target list
into smaller regexps.
* lisp/org-element.el (org-element-link-parser):
(org-element--object-lex):
* lisp/org.el (org-activate-target-links): Use
`org--re-list-search-forward' and `org--re-list-looking-at' when
`org-target-link-regexps' is non-nil.
* testing/lisp/test-org-element.el (test-org-element/link-parser): Add
tests.
Reported-by: Rudolf Adamkovič <salutis@me.com>
Link: https://list.orgmode.org/orgmode/m2lenax5m6.fsf@me.com/
---
lisp/ol.el | 68 ++++++++++++++++++++++++++++++--
lisp/org-element.el | 16 ++++++--
lisp/org.el | 4 +-
testing/lisp/test-org-element.el | 17 ++++++++
4 files changed, 97 insertions(+), 8 deletions(-)
diff --git a/lisp/ol.el b/lisp/ol.el
index a680c43f3..6842fb8fe 100644
--- a/lisp/ol.el
+++ b/lisp/ol.el
@@ -52,6 +52,7 @@ (declare-function org-before-first-heading-p "org" ())
(declare-function org-do-occur "org" (regexp &optional cleanup))
(declare-function org-element-at-point "org-element" (&optional pom cached-only))
(declare-function org-element-cache-refresh "org-element" (pos))
+(declare-function org-element-cache-reset "org-element" (&optional all no-persistence))
(declare-function org-element-context "org-element" (&optional element))
(declare-function org-element-lineage "org-element-ast" (datum &optional types with-self))
(declare-function org-element-link-parser "org-element" ())
@@ -532,6 +533,13 @@ (defconst org-radio-target-regexp (format "<%s>" org-target-regexp)
(defvar-local org-target-link-regexp nil
"Regular expression matching radio targets in plain text.")
+(defconst org-target-link-regexp-limit (ash 2 15)
+ "Maximum allowed length of regexp.
+See MAX_BUF_SIZE in src/regex-emacs.c")
+(defvar-local org-target-link-regexps nil
+ "List of regular expressions matching radio targets in plain text.
+This list is non-nil, when a single regexp would be too long to match
+all the possible targets, exceeding Emacs' regexp length limit.")
(defvar org-link-types-re nil
"Matches a link that has a url-like prefix like \"http:\".")
@@ -2078,6 +2086,34 @@ (defun org-insert-link-global ()
(org-load-modules-maybe)
(org-run-like-in-org-mode 'org-insert-link))
+(defun org--re-list-search-forward (regexp-list &optional bound noerror count)
+ "Like `re-search-forward', but REGEXP-LIST is a list of regexps.
+BOUND, NOERROR, and COUNT are passed to `re-search-forward'."
+ (let (result (min-found most-positive-fixnum)
+ (pos-found nil)
+ (min-found-data nil)
+ (tail regexp-list))
+ (while tail
+ (setq result (save-excursion (re-search-forward (pop tail) bound t count)))
+ (when (and result (< result min-found))
+ (setq min-found result
+ pos-found (match-end 0)
+ min-found-data (match-data))))
+ (if (= most-positive-fixnum min-found)
+ (pcase noerror
+ (`t nil)
+ (_ (re-search-forward (car regexp-list) bound noerror count)))
+ (set-match-data min-found-data)
+ (goto-char pos-found))))
+
+(defun org--re-list-looking-at (regexp-list &optional inhibit-modify)
+ "Like `looking-at', but REGEXP-LIST is a list of regexps.
+INHIBIT-MODIFY is passed to `looking-at'."
+ (catch :found
+ (while regexp-list
+ (when (looking-at (pop regexp-list) inhibit-modify)
+ (throw :found t)))))
+
;;;###autoload
(defun org-update-radio-target-regexp ()
"Find all radio targets in this file and update the regular expression.
@@ -2115,6 +2151,30 @@ (defun org-update-radio-target-regexp ()
targets
"\\|")
after-re)))
+ (setq org-target-link-regexps nil)
+ (let (current-length sub-targets)
+ (when (<= org-target-link-regexp-limit (length org-target-link-regexp))
+ (while (or targets sub-targets)
+ (when (and sub-targets
+ (or (not targets)
+ (>= (+ current-length (length (car targets)))
+ org-target-link-regexp-limit)))
+ (push (concat before-re
+ (mapconcat
+ (lambda (x)
+ (replace-regexp-in-string
+ " +" "\\s-+" (regexp-quote x) t t))
+ (nreverse sub-targets)
+ "\\|")
+ after-re)
+ org-target-link-regexps)
+ (setq current-length nil
+ sub-targets nil))
+ (unless current-length
+ (setq current-length (+ (length before-re) (length after-re))))
+ (when targets (push (pop targets) sub-targets))
+ (cl-incf current-length (length (car sub-targets))))
+ (setq org-target-link-regexps (nreverse org-target-link-regexps))))
(unless (equal old-regexp org-target-link-regexp)
;; Clean-up cache.
(let ((regexp (cond ((not old-regexp) org-target-link-regexp)
@@ -2130,9 +2190,11 @@ (defun org-update-radio-target-regexp ()
after-re)))))
(when (and (featurep 'org-element)
(not (bound-and-true-p org-mode-loading)))
- (org-with-point-at 1
- (while (re-search-forward regexp nil t)
- (org-element-cache-refresh (match-beginning 1))))))
+ (if org-target-link-regexps
+ (org-element-cache-reset)
+ (org-with-point-at 1
+ (while (re-search-forward regexp nil t)
+ (org-element-cache-refresh (match-beginning 1)))))))
;; Re fontify buffer.
(when (memq 'radio org-highlight-links)
(org-restart-font-lock)))))
diff --git a/lisp/org-element.el b/lisp/org-element.el
index 091059c6c..4ad7fe8c3 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -3842,7 +3842,9 @@ (defun org-element-link-parser ()
;; Type 1: Text targeted from a radio target.
((and org-target-link-regexp
(save-excursion (or (bolp) (backward-char))
- (looking-at org-target-link-regexp)))
+ (if org-target-link-regexps
+ (org--re-list-looking-at org-target-link-regexps)
+ (looking-at org-target-link-regexp))))
(setq type "radio")
(setq format 'plain)
(setq link-end (match-end 1))
@@ -5215,7 +5217,10 @@ (defun org-element--object-lex (restriction)
((not (memq 'link restriction)) nil)
((progn
(unless (bolp) (forward-char -1))
- (not (re-search-forward org-target-link-regexp nil t)))
+ (not
+ (if org-target-link-regexps
+ (org--re-list-search-forward org-target-link-regexps nil t)
+ (re-search-forward org-target-link-regexp nil t))))
nil)
;; Since we moved backward, we do not want to
;; match again an hypothetical 1-character long
@@ -5224,8 +5229,11 @@ (defun org-element--object-lex (restriction)
;; beginning of line, we prevent this here.
((and (= start (1+ (line-beginning-position)))
(= start (match-end 1)))
- (and (re-search-forward org-target-link-regexp nil t)
- (1+ (match-beginning 1))))
+ (and
+ (if org-target-link-regexps
+ (org--re-list-search-forward org-target-link-regexps nil t)
+ (re-search-forward org-target-link-regexp nil t))
+ (1+ (match-beginning 1))))
(t (1+ (match-beginning 1))))))
found)
(save-excursion
diff --git a/lisp/org.el b/lisp/org.el
index ad2394ae3..0c0376a9e 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -5705,7 +5705,9 @@ (defun org-activate-target-links (limit)
;; `org-target-link-regexp' matches one character before the
;; actual target.
(unless (bolp) (forward-char -1))
- (when (re-search-forward org-target-link-regexp limit t)
+ (when (if org-target-link-regexps
+ (org--re-list-search-forward org-target-link-regexps limit t)
+ (re-search-forward org-target-link-regexp limit t))
(org-remove-flyspell-overlays-in (match-beginning 1) (match-end 1))
(add-text-properties (match-beginning 1) (match-end 1)
(list 'mouse-face 'highlight
diff --git a/testing/lisp/test-org-element.el b/testing/lisp/test-org-element.el
index 93754bae5..61cd70693 100644
--- a/testing/lisp/test-org-element.el
+++ b/testing/lisp/test-org-element.el
@@ -2378,6 +2378,23 @@ (ert-deftest test-org-element/link-parser ()
(org-element-property
:type
(org-element-map (org-element-parse-buffer) 'link #'identity nil t)))))
+ (should
+ (equal
+ "radio"
+ (org-test-with-temp-text "<<<radio>>><<<radio2>>><<<foo>>>A radio link"
+ (org-update-radio-target-regexp)
+ (org-element-property
+ :type
+ (org-element-map (org-element-parse-buffer) 'link #'identity nil t)))))
+ (should
+ (equal
+ "radio"
+ (let ((org-target-link-regexp-limit 9))
+ (org-test-with-temp-text "<<<radio>>><<<radio2>>><<<foo>>>A radio link"
+ (org-update-radio-target-regexp)
+ (org-element-property
+ :type
+ (org-element-map (org-element-parse-buffer) 'link #'identity nil t))))))
;; Pathological case: radio target of length 1 at beginning of line
;; not followed by spaces.
(should
--
2.43.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: Radio links work only in small numbers
2024-02-19 9:26 ` Ihor Radchenko
@ 2024-02-28 16:33 ` nobiot
2024-02-28 17:14 ` Ihor Radchenko
0 siblings, 1 reply; 17+ messages in thread
From: nobiot @ 2024-02-28 16:33 UTC (permalink / raw)
To: yantar92; +Cc: emacs-orgmode, salutis
Hi Ihor, Rudy, and everyone,
> Could you try the attached patch?
I would love to help this patch move forward and would be happy to try
the patch, if this is not going to waste anyone's time:
(1) I took the liberty of creating two test Org files we can use on
sr.ht: https://git.sr.ht/~nobiot/org-radio-links-patch-20240228/tree
The two files, `500-terms.org` and `5000-terms.org`, contain 500
and 5000 radio targets respectively.
Both files have two H1 headlines "Definitions" and "Body text". Once
you open the file, call `M-x org-update-radio-target-regexp`. For
500 entries, radio targets work beautifully; for the 5000 entries, I
get 'org-element-context: Invalid regexp: "Regular expression too
big"' error.
(2) I am struggling to apply the patch cleanly to the current HEAD of
Org-mode source. I git-cloned the source from
https://git.sr.ht/~bzg/org-mode (commit 755fef38f Merge branch
'bugfix').
Against which commit can I test the patch? I'd appreciate some guidance
here.
I use Magit and use "W" -> "a" -> "a" to apply the
plain patch file. I get "error: patch failed:
testing/lisp/test-org-element.el:2378". If I delete the hunk in the
patch for test-org-element.el, I then get "error: patch failed:
lisp/org.el:5705" on this latest commit.
Kind regards,
nobiot
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Radio links work only in small numbers
2024-02-28 16:33 ` nobiot
@ 2024-02-28 17:14 ` Ihor Radchenko
2024-02-28 22:18 ` Noboru Ota
0 siblings, 1 reply; 17+ messages in thread
From: Ihor Radchenko @ 2024-02-28 17:14 UTC (permalink / raw)
To: nobiot; +Cc: emacs-orgmode, salutis
[-- Attachment #1: Type: text/plain, Size: 1299 bytes --]
nobiot <me@nobiot.com> writes:
>> Could you try the attached patch?
>
> I would love to help this patch move forward and would be happy to try
> the patch, if this is not going to waste anyone's time:
>
> (1) I took the liberty of creating two test Org files we can use on
> sr.ht: https://git.sr.ht/~nobiot/org-radio-links-patch-20240228/tree
>
> The two files, `500-terms.org` and `5000-terms.org`, contain 500
> and 5000 radio targets respectively.
>
> Both files have two H1 headlines "Definitions" and "Body text". Once
> you open the file, call `M-x org-update-radio-target-regexp`. For
> 500 entries, radio targets work beautifully; for the 5000 entries, I
> get 'org-element-context: Invalid regexp: "Regular expression too
> big"' error.
The attached version of the patch should work for 5000-terms.org.
(I had to scale down the maximum allowed regexp size; apparently,
the number from C sources was not small enough).
> (2) I am struggling to apply the patch cleanly to the current HEAD of
> Org-mode source. I git-cloned the source from
> https://git.sr.ht/~bzg/org-mode (commit 755fef38f Merge branch
> 'bugfix').
>
> I use Magit and use "W" -> "a" -> "a" to apply the
> plain patch file. I get "error: patch failed:
You need Www.
[-- Attachment #2: 0001-Work-around-regexp-size-limitation-for-large-number-.patch --]
[-- Type: text/x-patch, Size: 10623 bytes --]
From f565a9f3187ae99680ec92969bb3f6c29b542b04 Mon Sep 17 00:00:00 2001
Message-ID: <f565a9f3187ae99680ec92969bb3f6c29b542b04.1709140338.git.yantar92@posteo.net>
From: Ihor Radchenko <yantar92@posteo.net>
Date: Mon, 19 Feb 2024 12:19:34 +0300
Subject: [PATCH] Work around regexp size limitation for large number of link
targets
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
* lisp/ol.el (org-target-link-regexp-limit): New constant defining
maximum regexp limit where `org-target-link-regexp' is still safe to
use without triggering "Regexp too long" error.
(org-target-link-regexps): New variable holding a series of shorter
regexps to be used instead of too long single
`org-target-link-regexp'.
(org--re-list-search-forward): New function like `re-search-forward',
but accepting a list of regexps.
(org--re-list-looking-at): New function like `looking-at', but
accepting a list of regexps.
(org-update-radio-target-regexp): When `org-target-link-regexp' is too
long, set `org-target-link-regexps', partitioning the link target list
into smaller regexps.
* lisp/org-element.el (org-element-link-parser):
(org-element--object-lex):
* lisp/org.el (org-activate-target-links): Use
`org--re-list-search-forward' and `org--re-list-looking-at' when
`org-target-link-regexps' is non-nil.
* testing/lisp/test-org-element.el (test-org-element/link-parser): Add
tests.
Reported-by: Rudolf Adamkovič <salutis@me.com>
Link: https://list.orgmode.org/orgmode/m2lenax5m6.fsf@me.com/
---
lisp/ol.el | 68 ++++++++++++++++++++++++++++++--
lisp/org-element.el | 16 ++++++--
lisp/org.el | 4 +-
testing/lisp/test-org-element.el | 17 ++++++++
4 files changed, 97 insertions(+), 8 deletions(-)
diff --git a/lisp/ol.el b/lisp/ol.el
index 22782578c..97423738a 100644
--- a/lisp/ol.el
+++ b/lisp/ol.el
@@ -52,6 +52,7 @@ (declare-function org-before-first-heading-p "org" ())
(declare-function org-do-occur "org" (regexp &optional cleanup))
(declare-function org-element-at-point "org-element" (&optional pom cached-only))
(declare-function org-element-cache-refresh "org-element" (pos))
+(declare-function org-element-cache-reset "org-element" (&optional all no-persistence))
(declare-function org-element-context "org-element" (&optional element))
(declare-function org-element-lineage "org-element-ast" (datum &optional types with-self))
(declare-function org-element-link-parser "org-element" ())
@@ -532,6 +533,13 @@ (defconst org-radio-target-regexp (format "<%s>" org-target-regexp)
(defvar-local org-target-link-regexp nil
"Regular expression matching radio targets in plain text.")
+(defconst org-target-link-regexp-limit (ash 2 10)
+ "Maximum allowed length of regexp.
+See MAX_BUF_SIZE in src/regex-emacs.c")
+(defvar-local org-target-link-regexps nil
+ "List of regular expressions matching radio targets in plain text.
+This list is non-nil, when a single regexp would be too long to match
+all the possible targets, exceeding Emacs' regexp length limit.")
(defvar org-link-types-re nil
"Matches a link that has a url-like prefix like \"http:\".")
@@ -2170,6 +2178,34 @@ (defun org-insert-link-global ()
(org-load-modules-maybe)
(org-run-like-in-org-mode 'org-insert-link))
+(defun org--re-list-search-forward (regexp-list &optional bound noerror count)
+ "Like `re-search-forward', but REGEXP-LIST is a list of regexps.
+BOUND, NOERROR, and COUNT are passed to `re-search-forward'."
+ (let (result (min-found most-positive-fixnum)
+ (pos-found nil)
+ (min-found-data nil)
+ (tail regexp-list))
+ (while tail
+ (setq result (save-excursion (re-search-forward (pop tail) bound t count)))
+ (when (and result (< result min-found))
+ (setq min-found result
+ pos-found (match-end 0)
+ min-found-data (match-data))))
+ (if (= most-positive-fixnum min-found)
+ (pcase noerror
+ (`t nil)
+ (_ (re-search-forward (car regexp-list) bound noerror count)))
+ (set-match-data min-found-data)
+ (goto-char pos-found))))
+
+(defun org--re-list-looking-at (regexp-list &optional inhibit-modify)
+ "Like `looking-at', but REGEXP-LIST is a list of regexps.
+INHIBIT-MODIFY is passed to `looking-at'."
+ (catch :found
+ (while regexp-list
+ (when (looking-at (pop regexp-list) inhibit-modify)
+ (throw :found t)))))
+
;;;###autoload
(defun org-update-radio-target-regexp ()
"Find all radio targets in this file and update the regular expression.
@@ -2207,6 +2243,30 @@ (defun org-update-radio-target-regexp ()
targets
"\\|")
after-re)))
+ (setq org-target-link-regexps nil)
+ (let (current-length sub-targets)
+ (when (<= org-target-link-regexp-limit (length org-target-link-regexp))
+ (while (or targets sub-targets)
+ (when (and sub-targets
+ (or (not targets)
+ (>= (+ current-length (length (car targets)))
+ org-target-link-regexp-limit)))
+ (push (concat before-re
+ (mapconcat
+ (lambda (x)
+ (replace-regexp-in-string
+ " +" "\\s-+" (regexp-quote x) t t))
+ (nreverse sub-targets)
+ "\\|")
+ after-re)
+ org-target-link-regexps)
+ (setq current-length nil
+ sub-targets nil))
+ (unless current-length
+ (setq current-length (+ (length before-re) (length after-re))))
+ (when targets (push (pop targets) sub-targets))
+ (cl-incf current-length (length (car sub-targets))))
+ (setq org-target-link-regexps (nreverse org-target-link-regexps))))
(unless (equal old-regexp org-target-link-regexp)
;; Clean-up cache.
(let ((regexp (cond ((not old-regexp) org-target-link-regexp)
@@ -2222,9 +2282,11 @@ (defun org-update-radio-target-regexp ()
after-re)))))
(when (and (featurep 'org-element)
(not (bound-and-true-p org-mode-loading)))
- (org-with-point-at 1
- (while (re-search-forward regexp nil t)
- (org-element-cache-refresh (match-beginning 1))))))
+ (if org-target-link-regexps
+ (org-element-cache-reset)
+ (org-with-point-at 1
+ (while (re-search-forward regexp nil t)
+ (org-element-cache-refresh (match-beginning 1)))))))
;; Re fontify buffer.
(when (memq 'radio org-highlight-links)
(org-restart-font-lock)))))
diff --git a/lisp/org-element.el b/lisp/org-element.el
index 573de0106..c490dcd29 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -3848,7 +3848,9 @@ (defun org-element-link-parser ()
;; Type 1: Text targeted from a radio target.
((and org-target-link-regexp
(save-excursion (or (bolp) (backward-char))
- (looking-at org-target-link-regexp)))
+ (if org-target-link-regexps
+ (org--re-list-looking-at org-target-link-regexps)
+ (looking-at org-target-link-regexp))))
(setq type "radio")
(setq format 'plain)
(setq link-end (match-end 1))
@@ -5221,7 +5223,10 @@ (defun org-element--object-lex (restriction)
((not (memq 'link restriction)) nil)
((progn
(unless (bolp) (forward-char -1))
- (not (re-search-forward org-target-link-regexp nil t)))
+ (not
+ (if org-target-link-regexps
+ (org--re-list-search-forward org-target-link-regexps nil t)
+ (re-search-forward org-target-link-regexp nil t))))
nil)
;; Since we moved backward, we do not want to
;; match again an hypothetical 1-character long
@@ -5230,8 +5235,11 @@ (defun org-element--object-lex (restriction)
;; beginning of line, we prevent this here.
((and (= start (1+ (line-beginning-position)))
(= start (match-end 1)))
- (and (re-search-forward org-target-link-regexp nil t)
- (1+ (match-beginning 1))))
+ (and
+ (if org-target-link-regexps
+ (org--re-list-search-forward org-target-link-regexps nil t)
+ (re-search-forward org-target-link-regexp nil t))
+ (1+ (match-beginning 1))))
(t (1+ (match-beginning 1))))))
found)
(save-excursion
diff --git a/lisp/org.el b/lisp/org.el
index 28f43cb67..89cc8bdd9 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -5710,7 +5710,9 @@ (defun org-activate-target-links (limit)
;; `org-target-link-regexp' matches one character before the
;; actual target.
(unless (bolp) (forward-char -1))
- (when (re-search-forward org-target-link-regexp limit t)
+ (when (if org-target-link-regexps
+ (org--re-list-search-forward org-target-link-regexps limit t)
+ (re-search-forward org-target-link-regexp limit t))
(org-remove-flyspell-overlays-in (match-beginning 1) (match-end 1))
(add-text-properties (match-beginning 1) (match-end 1)
(list 'mouse-face 'highlight
diff --git a/testing/lisp/test-org-element.el b/testing/lisp/test-org-element.el
index 93754bae5..61cd70693 100644
--- a/testing/lisp/test-org-element.el
+++ b/testing/lisp/test-org-element.el
@@ -2378,6 +2378,23 @@ (ert-deftest test-org-element/link-parser ()
(org-element-property
:type
(org-element-map (org-element-parse-buffer) 'link #'identity nil t)))))
+ (should
+ (equal
+ "radio"
+ (org-test-with-temp-text "<<<radio>>><<<radio2>>><<<foo>>>A radio link"
+ (org-update-radio-target-regexp)
+ (org-element-property
+ :type
+ (org-element-map (org-element-parse-buffer) 'link #'identity nil t)))))
+ (should
+ (equal
+ "radio"
+ (let ((org-target-link-regexp-limit 9))
+ (org-test-with-temp-text "<<<radio>>><<<radio2>>><<<foo>>>A radio link"
+ (org-update-radio-target-regexp)
+ (org-element-property
+ :type
+ (org-element-map (org-element-parse-buffer) 'link #'identity nil t))))))
;; Pathological case: radio target of length 1 at beginning of line
;; not followed by spaces.
(should
--
2.43.0
[-- Attachment #3: Type: text/plain, Size: 224 bytes --]
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: Radio links work only in small numbers
2024-02-28 17:14 ` Ihor Radchenko
@ 2024-02-28 22:18 ` Noboru Ota
2024-02-29 11:11 ` Ihor Radchenko
0 siblings, 1 reply; 17+ messages in thread
From: Noboru Ota @ 2024-02-28 22:18 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: emacs-orgmode, salutis
Ihor Radchenko <yantar92@posteo.net> writes:
> You need Www.
> The attached version of the patch should work for 5000-terms.org.
Thank you, Ihor. With your advice, I have managed to apply the patch. It
works for 5000 terms! Thank you.
Some observations:
- Running `org-update-radio-target-regexp` takes about 15 seconds for the
file[1].
- Writing experience is very good. When I type one of the terms, a
radio target link gets added without noticeable delay on my end.
> (I had to scale down the maximum allowed regexp size; apparently,
> the number from C sources was not small enough).
Do you have any indication of what the maximum would be?
Thank you.
---
[1]: You can also see the benchmark results in the file. https://git.sr.ht/~nobiot/org-radio-links-patch-20240228/tree/main/item/5000-terms.org
– nobiot
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Radio links work only in small numbers
2024-02-28 22:18 ` Noboru Ota
@ 2024-02-29 11:11 ` Ihor Radchenko
2024-02-29 16:47 ` Noboru Ota
2024-04-04 20:20 ` Rudolf Adamkovič
0 siblings, 2 replies; 17+ messages in thread
From: Ihor Radchenko @ 2024-02-29 11:11 UTC (permalink / raw)
To: Noboru Ota; +Cc: emacs-orgmode, salutis
Noboru Ota <me@nobiot.com> writes:
>> You need Www.
>> The attached version of the patch should work for 5000-terms.org.
>
> Thank you, Ihor. With your advice, I have managed to apply the patch. It
> works for 5000 terms! Thank you.
Thanks for testing!
Applied, onto main.
https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=341a01a07
Fixed.
> Some observations:
>
> - Running `org-update-radio-target-regexp` takes about 15 seconds for the
> file[1].
This is expected. By splitting the radio target regexp into multiple
regexps, we increase the number of regexp search invocations, and
regexp search is relatively costly in Emacs.
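To illustrate where the extra cost comes from, here is a minimal sketch
of searching with a list of regexps instead of a single one. The name
`my/re-list-search-forward' is hypothetical, and this is not the code
added by the patch; it only shows that every call now runs one full
search per regexp in the list and then keeps the earliest match.

```elisp
;; Hypothetical helper for illustration; not the function from the patch.
(defun my/re-list-search-forward (regexp-list &optional bound)
  "Search forward for the earliest match of any regexp in REGEXP-LIST.
BOUND limits the search as in `re-search-forward'.  On success, set
the match data, move point to the end of the match and return point.
Return nil, without moving point, when nothing matches."
  (let ((start (point))
        best-beg best-data)
    ;; One full search per regexp: this loop is the extra cost.
    (dolist (re regexp-list)
      (goto-char start)
      (when (re-search-forward re bound t)
        (when (or (null best-beg) (< (match-beginning 0) best-beg))
          (setq best-beg (match-beginning 0)
                best-data (match-data t)))))
    ;; Restore point, then jump to the end of the best match, if any.
    (goto-char start)
    (when best-data
      (set-match-data best-data)
      (goto-char (match-end 0)))))
```

With N regexps in the list, each call performs N searches, so the time
spent grows roughly with the number of chunks.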
>> (I had to scale down the maximum allowed regexp size; apparently,
>> the number from C sources was not small enough).
>
> Do you have any indication of what the maximum would be?
See the docstring of `org-target-link-regexp-limit'.
The hard limit is MAX_BUF_SIZE in src/regex-emacs.c, but that limit
applies to the compiled regexp (AFAIU). So, the real limit on the
regexp string is smaller and may depend on regexp complexity.
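The splitting idea can be sketched as follows. This is a simplified
illustration, not the patch itself: the function name
`my/chunk-alternatives' and its arguments are hypothetical, and it
leaves out the before/after context and the whitespace handling that
the real code adds around each group of targets.

```elisp
;; Hypothetical sketch of greedy chunking; not the code from the patch.
(defun my/chunk-alternatives (strings limit)
  "Group STRINGS into regexps of at most LIMIT characters each.
Return a list of regexps, each matching any string of its group.
A single string longer than LIMIT still gets a regexp of its own."
  (let (regexps group (size 0))
    (dolist (s strings)
      (let ((quoted (regexp-quote s)))
        ;; +2 accounts for the two-character \| separator.
        (when (and group (> (+ size 2 (length quoted)) limit))
          (push (mapconcat #'identity (nreverse group) "\\|") regexps)
          (setq group nil size 0))
        (setq size (+ size (length quoted) (if group 2 0)))
        (push quoted group)))
    ;; Flush the last, possibly partial, group.
    (when group
      (push (mapconcat #'identity (nreverse group) "\\|") regexps))
    (nreverse regexps)))
```

For example, (my/chunk-alternatives '("aa" "bb" "cc") 6) returns two
regexps, "aa\\|bb" and "cc", because adding "cc" to the first group
would exceed six characters.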
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Radio links work only in small numbers
2024-02-29 11:11 ` Ihor Radchenko
@ 2024-02-29 16:47 ` Noboru Ota
2024-02-29 17:21 ` Ihor Radchenko
2024-04-04 20:20 ` Rudolf Adamkovič
1 sibling, 1 reply; 17+ messages in thread
From: Noboru Ota @ 2024-02-29 16:47 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: emacs-orgmode, salutis
Ihor Radchenko <yantar92@posteo.net> writes:
> Thanks for testing!
> Applied, onto main.
> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=341a01a07
> Fixed.
You're very welcome. This is amazing. Thank you so much. I have rebased
the source to the latest main and also tested with 10,000 dummy entries.
The target links still work!
> See the docstring of `org-target-link-regexp-limit'. The hard limit is
> MAX_BUF_SIZE in src/regex-emacs.c, but that's after the regexp is
> compiled (AFAIU). So, the real limit is smaller and may depend on
> regexp complexity.
Thank you for the detail.
I do not need to understand all the technical detail here, but for
future reference and if you have the time, let me ask this -- please try
not to waste your time answering.
- MAX_BUF_SIZE in src/regex-emacs.c is, if I understand the comment
around it correctly, 2^15 bytes = 32768 bytes.
- `org-target-link-regexp-limit` is 2^11 bytes = 2048 bytes (?).
- For my 5000 terms, (length org-target-link-regexp) => 103603 (what's
the unit of measure?)
- (string-bytes org-target-link-regexp) => 103603, the same number.
- If I evaluate the following to try to get a compiled length of the
regexp, I get 88785:
  (let ((regexp org-target-link-regexp))
    (with-temp-file "org-target-link-regexp"
      (insert regexp)))
  (byte-compile-file "org-target-link-regexp")
  (file-attribute-size (file-attributes "org-target-link-regexp.elc"))
- For 10000 dummy terms, (length org-target-link-regexp) => 228848.
The numbers I get seem to be much greater than 2048 (bytes?) and 32768
bytes....
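On the unit of measure: `length' counts characters, while
`string-bytes' counts the bytes of Emacs's internal representation.
The two agree for pure ASCII text, which would explain the identical
numbers above. A small sketch, using a hypothetical helper name:

```elisp
;; Hypothetical helper: compare character count and byte count.
(defun my/chars-and-bytes (string)
  "Return a list (CHARS BYTES) for STRING."
  (list (length string) (string-bytes string)))

(my/chars-and-bytes "radio target")  ; => (12 12), ASCII only
(my/chars-and-bytes "caf\u00e9")     ; => (4 5), one non-ASCII character
```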
My test with 10,000 terms is more than sufficient to cover my needs, so
please do not feel obliged to answer this technical part -- someone else
might want to jump in to help :)
– nobiot
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Radio links work only in small numbers
2024-02-29 16:47 ` Noboru Ota
@ 2024-02-29 17:21 ` Ihor Radchenko
2024-02-29 17:44 ` Noboru Ota
0 siblings, 1 reply; 17+ messages in thread
From: Ihor Radchenko @ 2024-02-29 17:21 UTC (permalink / raw)
To: Noboru Ota; +Cc: emacs-orgmode, salutis
Noboru Ota <me@nobiot.com> writes:
> - MAX_BUF_SIZE in src/regex-emacs.c is, if I understand the comment
> around it correctly, 2^15 bytes = 32768 bytes.
>
> - `org-target-link-regexp-limit` is 2^11 bytes = 2048 bytes (?).
>
> - For my 5000 terms, (length org-target-link-regexp) => 103603 (what's
> the unit of measure?)
>
> - (string-bytes org-target-link-regexp) => 103603, the same number.
org-target-link-regexp does not matter in this case. For such long
regexps, I introduced a new variable org-target-link-regexps that
contains a list of shorter regexps. This list is used instead of
org-target-link-regexp.
> - If I evaluate the following to try to get a compiled length of the
> regexp, 88785.
By "compiled", I did not mean byte-compiled. Regexp compilation is a
completely different beast. It is performed by Emacs C internals, which
convert the regexp string into the representation used by the regexp
matching code. This compiled representation is not exposed to Elisp.
If you want to learn more about regexp matching theory and
implementation, you may check
https://blog.burntsushi.net/regex-internals/
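Since the compiled form is not visible from Elisp, about the only thing
Lisp code can observe is whether compilation succeeds when the regexp
is first used. A minimal sketch, with a hypothetical function name; it
assumes that compilation failures, including the "Regular expression
too big" error from this thread, are signaled as `invalid-regexp',
which is not verified here for that particular error.

```elisp
;; Hypothetical probe: does Emacs accept REGEXP when asked to use it?
(defun my/regexp-usable-p (regexp)
  "Return t if REGEXP compiles, nil if it signals `invalid-regexp'."
  (condition-case nil
      ;; Matching against the empty string forces regexp compilation.
      (progn (string-match-p regexp "") t)
    (invalid-regexp nil)))

(my/regexp-usable-p "foo\\|bar")  ; => t
(my/regexp-usable-p "\\(")        ; => nil, unmatched group opener
```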
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Radio links work only in small numbers
2024-02-29 17:21 ` Ihor Radchenko
@ 2024-02-29 17:44 ` Noboru Ota
0 siblings, 0 replies; 17+ messages in thread
From: Noboru Ota @ 2024-02-29 17:44 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: emacs-orgmode, salutis
Ihor Radchenko <yantar92@posteo.net> writes:
> org-target-link-regexp does not matter in this case. For such long
> regexps, I introduced a new variable org-target-link-regexps that
> contains a list of shorter regexps. This list is used instead of
> org-target-link-regexp.
Ah! Thank you.
If it is valid to compare (length org-target-link-regexps) and 2048, at
least for a rough indication, I have these:
- 33 for 5000 terms
- 76 for 10000 terms
It seems we should not worry about hitting the limit in normal personal
use.
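The number of regexps in the list and the length of each regexp measure
different things. If the limit applies to each regexp separately, as
the chunking loop in the patch suggests, then the per-regexp lengths
are the figures to compare against it. A small sketch for inspecting
both, using a hypothetical helper name:

```elisp
;; Hypothetical helper: summarize a list of regexps against a limit.
(defun my/regexp-list-stats (regexps limit)
  "Return a plist summarizing REGEXPS against LIMIT.
:count is the number of regexps, :longest the length in characters of
the longest one, and :over-limit how many are longer than LIMIT."
  (let ((longest 0) (over 0))
    (dolist (re regexps)
      (let ((n (length re)))
        (setq longest (max longest n))
        (when (> n limit) (setq over (1+ over)))))
    (list :count (length regexps) :longest longest :over-limit over)))
```

In an Org buffer with the patch applied, one could evaluate
(my/regexp-list-stats org-target-link-regexps org-target-link-regexp-limit),
where both variables are the ones introduced by the patch.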
– nobiot
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Radio links work only in small numbers
2024-02-29 11:11 ` Ihor Radchenko
2024-02-29 16:47 ` Noboru Ota
@ 2024-04-04 20:20 ` Rudolf Adamkovič
1 sibling, 0 replies; 17+ messages in thread
From: Rudolf Adamkovič @ 2024-04-04 20:20 UTC (permalink / raw)
To: Ihor Radchenko, Noboru Ota; +Cc: emacs-orgmode, salutis
Ihor Radchenko <yantar92@posteo.net> writes:
> Thanks for testing!
> Applied, onto main.
> https://git.savannah.gnu.org/cgit/emacs/org-mode.git/commit/?id=341a01a07
> Fixed.
Thanks everyone for chiming in and fixing the problem. I have been
unexpectedly busy lately, hence the radio silence [pun intended].
Great work!
Rudy
--
"The whole science is nothing more than a refinement of everyday
thinking." --- Albert Einstein, 1879-1955
Rudolf Adamkovič <rudolf@adamkovic.org> [he/him]
Studenohorská 25, 84103 Bratislava, Slovakia, European Union
^ permalink raw reply [flat|nested] 17+ messages in thread