emacs-orgmode@gnu.org archives
From: Ihor Radchenko <yantar92@posteo.net>
To: nobiot <me@nobiot.com>
Cc: emacs-orgmode@gnu.org, salutis@me.com
Subject: Re: Radio links work only in small numbers
Date: Wed, 28 Feb 2024 17:14:02 +0000
Message-ID: <87a5nk3411.fsf@localhost>
In-Reply-To: <86il281rc6.fsf@nobiot.com>

[-- Attachment #1: Type: text/plain, Size: 1299 bytes --]

nobiot <me@nobiot.com> writes:

>> May you try the attached patch?
>
> I would love to help this patch move forward and would be happy to try
> the patch, if this is not going to waste anyone's time:
>
> (1) I took the liberty of creating two test Org files we can use on
>     sr.ht: https://git.sr.ht/~nobiot/org-radio-links-patch-20240228/tree
>
>     The two files, `500-terms.org`  and `5000-terms.org`, contain 500
>     and 5000 radio targets respectively.
>
>     Both files have two H1 headlines "Definitions" and "Body text". Once
>     you open the file, call `M-x org-update-radio-target-regexp`. For
>     500 entries, radio targets work beautifully; for the 5000 entries, I
>     get 'org-element-context: Invalid regexp: "Regular expression too
>     big"' error.

The attached version of the patch should work for 5000-terms.org.
(I had to scale down the maximum allowed regexp size; apparently,
the limit taken from the C sources was still too large).
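
A minimal sketch of the partitioning idea, for readers who do not want
to read the whole diff. It is not the code from the patch: the helper
names `my/partition-targets` and `my/targets-to-regexps` are
hypothetical, and the sketch leaves out the prefix/suffix parts and the
whitespace handling that the real `org-update-radio-target-regexp`
adds around each target.

```emacs-lisp
(require 'cl-lib)

(defun my/partition-targets (targets limit)
  "Split TARGETS (a list of strings) into sub-lists.
The alternation regexp built from each sub-list stays below LIMIT
characters, unless a single target alone already reaches LIMIT.
Hypothetical helper for illustration; not part of Org."
  (let ((groups nil) (current nil) (len 0))
    (dolist (target targets)
      ;; Cost of adding TARGET: its quoted length, plus two characters
      ;; for the "\\|" separator when the group is non-empty.
      (let ((cost (+ (length (regexp-quote target))
                     (if current 2 0))))
        ;; Close the current group when TARGET would not fit.
        (when (and current (>= (+ len cost) limit))
          (push (nreverse current) groups)
          (setq current nil
                len 0
                cost (length (regexp-quote target))))
        (push target current)
        (cl-incf len cost)))
    (when current (push (nreverse current) groups))
    (nreverse groups)))

(defun my/targets-to-regexps (targets limit)
  "Return a list of regexps that together match any of TARGETS.
Hypothetical helper for illustration; not part of Org."
  (mapcar (lambda (group)
            (mapconcat #'regexp-quote group "\\|"))
          (my/partition-targets targets limit)))
```

With a generous limit all targets end up in one regexp; with a small
limit (the test in the patch binds the limit to 9) each target gets a
regexp of its own, and the callers then have to try every regexp in
the list.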

> (2) I am struggling to apply the patch cleanly to the current HEAD of
>     Org-mode source. I git-cloned the source from
>     https://git.sr.ht/~bzg/org-mode (commit 755fef38f Merge branch
>     'bugfix').
>
>     I use Magit and use "W" -> "a" -> "a" to apply the
>     plain patch file. I "get error: patch failed:

You need "W" -> "w" -> "w".


[-- Attachment #2: 0001-Work-around-regexp-size-limitation-for-large-number-.patch --]
[-- Type: text/x-patch, Size: 10623 bytes --]

From f565a9f3187ae99680ec92969bb3f6c29b542b04 Mon Sep 17 00:00:00 2001
Message-ID: <f565a9f3187ae99680ec92969bb3f6c29b542b04.1709140338.git.yantar92@posteo.net>
From: Ihor Radchenko <yantar92@posteo.net>
Date: Mon, 19 Feb 2024 12:19:34 +0300
Subject: [PATCH] Work around regexp size limitation for large number of link
 targets
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* lisp/ol.el (org-target-link-regexp-limit): New constant defining
maximum regexp limit where `org-target-link-regexp' is still safe to
use without triggering "Regexp too long" error.
(org-target-link-regexps): New variable holding a series of shorter
regexps to be used instead of too long single
`org-target-link-regexp'.
(org--re-list-search-forward): New function like `re-search-forward',
but accepting a list of regexps.
(org--re-list-looking-at): New function like `looking-at', but
accepting a list of regexps.
(org-update-radio-target-regexp): When `org-target-link-regexp' is too
long, set `org-target-link-regexps', partitioning the link target list
into smaller regexps.
* lisp/org-element.el (org-element-link-parser):
(org-element--object-lex):
* lisp/org.el (org-activate-target-links): Use
`org--re-list-search-forward' and `org--re-list-looking-at' when
`org-target-link-regexps' is non-nil.
* testing/lisp/test-org-element.el (test-org-element/link-parser): Add
tests.

Reported-by: Rudolf Adamkovič <salutis@me.com>
Link: https://list.orgmode.org/orgmode/m2lenax5m6.fsf@me.com/
---
 lisp/ol.el                       | 68 ++++++++++++++++++++++++++++++--
 lisp/org-element.el              | 16 ++++++--
 lisp/org.el                      |  4 +-
 testing/lisp/test-org-element.el | 17 ++++++++
 4 files changed, 97 insertions(+), 8 deletions(-)

diff --git a/lisp/ol.el b/lisp/ol.el
index 22782578c..97423738a 100644
--- a/lisp/ol.el
+++ b/lisp/ol.el
@@ -52,6 +52,7 @@ (declare-function org-before-first-heading-p "org" ())
 (declare-function org-do-occur "org" (regexp &optional cleanup))
 (declare-function org-element-at-point "org-element" (&optional pom cached-only))
 (declare-function org-element-cache-refresh "org-element" (pos))
+(declare-function org-element-cache-reset "org-element" (&optional all no-persistence))
 (declare-function org-element-context "org-element" (&optional element))
 (declare-function org-element-lineage "org-element-ast" (datum &optional types with-self))
 (declare-function org-element-link-parser "org-element" ())
@@ -532,6 +533,13 @@ (defconst org-radio-target-regexp (format "<%s>" org-target-regexp)
 
 (defvar-local org-target-link-regexp nil
   "Regular expression matching radio targets in plain text.")
+(defconst org-target-link-regexp-limit (ash 2 10)
+  "Maximum allowed length of regexp.
+See MAX_BUF_SIZE in src/regex-emacs.c")
+(defvar-local org-target-link-regexps nil
+  "List of regular expressions matching radio targets in plain text.
+This list is non-nil, when a single regexp would be too long to match
+all the possible targets, exceeding Emacs' regexp length limit.")
 
 (defvar org-link-types-re nil
   "Matches a link that has a url-like prefix like \"http:\".")
@@ -2170,6 +2178,34 @@ (defun org-insert-link-global ()
   (org-load-modules-maybe)
   (org-run-like-in-org-mode 'org-insert-link))
 
+(defun org--re-list-search-forward (regexp-list &optional bound noerror count)
+  "Like `re-search-forward', but REGEXP-LIST is a list of regexps.
+BOUND, NOERROR, and COUNT are passed to `re-search-forward'."
+  (let (result (min-found most-positive-fixnum)
+               (pos-found nil)
+               (min-found-data nil)
+               (tail regexp-list))
+    (while tail
+      (setq result (save-excursion (re-search-forward (pop tail) bound t count)))
+      (when (and result (< result min-found))
+        (setq min-found result
+              pos-found (match-end 0)
+              min-found-data (match-data))))
+    (if (= most-positive-fixnum min-found)
+        (pcase noerror
+          (`t nil)
+          (_ (re-search-forward (car regexp-list) bound noerror count)))
+      (set-match-data min-found-data)
+      (goto-char pos-found))))
+
+(defun org--re-list-looking-at (regexp-list &optional inhibit-modify)
+  "Like `looking-at', but REGEXP-LIST is a list of regexps.
+INHIBIT-MODIFY is passed to `looking-at'."
+  (catch :found
+    (while regexp-list
+      (when (looking-at (pop regexp-list) inhibit-modify)
+        (throw :found t)))))
+
 ;;;###autoload
 (defun org-update-radio-target-regexp ()
   "Find all radio targets in this file and update the regular expression.
@@ -2207,6 +2243,30 @@ (defun org-update-radio-target-regexp ()
 			targets
 			"\\|")
 		       after-re)))
+    (setq org-target-link-regexps nil)
+    (let (current-length sub-targets)
+      (when (<= org-target-link-regexp-limit (length org-target-link-regexp))
+        (while (or targets sub-targets)
+          (when (and sub-targets
+                     (or (not targets)
+                         (>= (+ current-length (length (car targets)))
+                            org-target-link-regexp-limit)))
+            (push (concat before-re
+                          (mapconcat
+			   (lambda (x)
+			     (replace-regexp-in-string
+			      " +" "\\s-+" (regexp-quote x) t t))
+			   (nreverse sub-targets)
+			   "\\|")
+		          after-re)
+                  org-target-link-regexps)
+            (setq current-length nil
+                  sub-targets nil))
+          (unless current-length
+            (setq current-length (+ (length before-re) (length after-re))))
+          (when targets (push (pop targets) sub-targets))
+          (cl-incf current-length (length (car sub-targets))))
+        (setq org-target-link-regexps (nreverse org-target-link-regexps))))
     (unless (equal old-regexp org-target-link-regexp)
       ;; Clean-up cache.
       (let ((regexp (cond ((not old-regexp) org-target-link-regexp)
@@ -2222,9 +2282,11 @@ (defun org-update-radio-target-regexp ()
 				   after-re)))))
 	(when (and (featurep 'org-element)
                    (not (bound-and-true-p org-mode-loading)))
-	  (org-with-point-at 1
-	    (while (re-search-forward regexp nil t)
-	      (org-element-cache-refresh (match-beginning 1))))))
+          (if org-target-link-regexps
+              (org-element-cache-reset)
+	    (org-with-point-at 1
+	      (while (re-search-forward regexp nil t)
+	        (org-element-cache-refresh (match-beginning 1)))))))
       ;; Re fontify buffer.
       (when (memq 'radio org-highlight-links)
 	(org-restart-font-lock)))))
diff --git a/lisp/org-element.el b/lisp/org-element.el
index 573de0106..c490dcd29 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -3848,7 +3848,9 @@ (defun org-element-link-parser ()
        ;; Type 1: Text targeted from a radio target.
        ((and org-target-link-regexp
 	     (save-excursion (or (bolp) (backward-char))
-			     (looking-at org-target-link-regexp)))
+                             (if org-target-link-regexps
+                                 (org--re-list-looking-at org-target-link-regexps)
+                               (looking-at org-target-link-regexp))))
 	(setq type "radio")
 	(setq format 'plain)
 	(setq link-end (match-end 1))
@@ -5221,7 +5223,10 @@ (defun org-element--object-lex (restriction)
 		       ((not (memq 'link restriction)) nil)
 		       ((progn
 		          (unless (bolp) (forward-char -1))
-		          (not (re-search-forward org-target-link-regexp nil t)))
+		          (not
+                           (if org-target-link-regexps
+                               (org--re-list-search-forward org-target-link-regexps nil t)
+                             (re-search-forward org-target-link-regexp nil t))))
 		        nil)
 		       ;; Since we moved backward, we do not want to
 		       ;; match again an hypothetical 1-character long
@@ -5230,8 +5235,11 @@ (defun org-element--object-lex (restriction)
 		       ;; beginning of line, we prevent this here.
 		       ((and (= start (1+ (line-beginning-position)))
 			     (= start (match-end 1)))
-		        (and (re-search-forward org-target-link-regexp nil t)
-			     (1+ (match-beginning 1))))
+		        (and
+                         (if org-target-link-regexps
+                             (org--re-list-search-forward org-target-link-regexps nil t)
+                           (re-search-forward org-target-link-regexp nil t))
+			 (1+ (match-beginning 1))))
 		       (t (1+ (match-beginning 1))))))
 	      found)
          (save-excursion
diff --git a/lisp/org.el b/lisp/org.el
index 28f43cb67..89cc8bdd9 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -5710,7 +5710,9 @@ (defun org-activate-target-links (limit)
       ;; `org-target-link-regexp' matches one character before the
       ;; actual target.
       (unless (bolp) (forward-char -1))
-      (when (re-search-forward org-target-link-regexp limit t)
+      (when (if org-target-link-regexps
+                (org--re-list-search-forward org-target-link-regexps limit t)
+              (re-search-forward org-target-link-regexp limit t))
 	(org-remove-flyspell-overlays-in (match-beginning 1) (match-end 1))
 	(add-text-properties (match-beginning 1) (match-end 1)
 			     (list 'mouse-face 'highlight
diff --git a/testing/lisp/test-org-element.el b/testing/lisp/test-org-element.el
index 93754bae5..61cd70693 100644
--- a/testing/lisp/test-org-element.el
+++ b/testing/lisp/test-org-element.el
@@ -2378,6 +2378,23 @@ (ert-deftest test-org-element/link-parser ()
       (org-element-property
        :type
        (org-element-map (org-element-parse-buffer) 'link #'identity nil t)))))
+  (should
+   (equal
+    "radio"
+    (org-test-with-temp-text "<<<radio>>><<<radio2>>><<<foo>>>A radio link"
+      (org-update-radio-target-regexp)
+      (org-element-property
+       :type
+       (org-element-map (org-element-parse-buffer) 'link #'identity nil t)))))
+  (should
+   (equal
+    "radio"
+    (let ((org-target-link-regexp-limit 9))
+      (org-test-with-temp-text "<<<radio>>><<<radio2>>><<<foo>>>A radio link"
+        (org-update-radio-target-regexp)
+        (org-element-property
+         :type
+         (org-element-map (org-element-parse-buffer) 'link #'identity nil t))))))
   ;; Pathological case: radio target of length 1 at beginning of line
   ;; not followed by spaces.
   (should
-- 
2.43.0
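
The list-based search that the patch introduces can be illustrated
with a stripped-down sketch. It assumes the same selection rule as
`org--re-list-search-forward` (keep the match whose end position is
smallest), but drops the NOERROR and COUNT arguments; the name
`my/search-forward-any` is hypothetical and not part of Org.

```emacs-lisp
(defun my/search-forward-any (regexps &optional bound)
  "Search forward for the match of any of REGEXPS that ends first.
Move point to the end of that match and return the new position.
Return nil and leave point alone when none of REGEXPS matches.
BOUND limits the search, as in `re-search-forward'.
Hypothetical helper for illustration; not part of Org."
  (let ((best nil) (best-data nil))
    (dolist (re regexps)
      ;; Try each regexp from the same starting point.
      (let ((end (save-excursion (re-search-forward re bound t))))
        (when (and end (or (not best) (< end best)))
          ;; Remember the match data of the best candidate so far.
          (setq best end
                best-data (match-data)))))
    (when best
      (set-match-data best-data)
      (goto-char best))))
```

Because every regexp in the list is tried from the same position, the
cost of one search grows with the number of sub-regexps, which is the
price paid for staying below the regexp size limit.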


[-- Attachment #3: Type: text/plain, Size: 224 bytes --]


-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

Thread overview: 17+ messages
2022-12-13 23:10 Radio links work only in small numbers Rudolf Adamkovič
2022-12-14  8:30 ` Ihor Radchenko
2022-12-14 15:29   ` Rudolf Adamkovič
2022-12-14 15:47 ` Max Nikulin
2022-12-14 16:02   ` Ihor Radchenko
2022-12-20  3:21     ` Timothy
2022-12-20  1:00   ` Rudolf Adamkovič
2022-12-28  9:46 ` Daniel Clemente
2024-02-19  9:26 ` Ihor Radchenko
2024-02-28 16:33   ` nobiot
2024-02-28 17:14     ` Ihor Radchenko [this message]
2024-02-28 22:18       ` Noboru Ota
2024-02-29 11:11         ` Ihor Radchenko
2024-02-29 16:47           ` Noboru Ota
2024-02-29 17:21             ` Ihor Radchenko
2024-02-29 17:44               ` Noboru Ota
2024-04-04 20:20           ` Rudolf Adamkovič
