From: Ihor Radchenko
To: nobiot
Cc: emacs-orgmode@gnu.org, salutis@me.com
Subject: Re: Radio links work only in small numbers
In-Reply-To: <86il281rc6.fsf@nobiot.com>
References: <86il281rc6.fsf@nobiot.com>
Date: Wed, 28 Feb 2024 17:14:02 +0000
Message-ID: <87a5nk3411.fsf@localhost>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

nobiot writes:

>> May you try the attached patch?
>
> I would love to help this patch move forward and would be happy to try
> the patch, if this is not going to waste anyone's time:
>
> (1) I took the liberty of creating two test Org files we can use on
> sr.ht: https://git.sr.ht/~nobiot/org-radio-links-patch-20240228/tree
>
> The two files, `500-terms.org` and `5000-terms.org`, contain 500
> and 5000 radio targets respectively.
>
> Both files have two H1 headlines "Definitions" and "Body text". Once
> you open the file, call `M-x org-update-radio-target-regexp`. For
> 500 entries, radio targets work beautifully; for the 5000 entries, I
> get 'org-element-context: Invalid regexp: "Regular expression too
> big"' error.

The attached version of the patch should work for 5000-terms.org.
(I had to scale down the maximum allowed regexp size; apparently, the
value taken from the C sources was not small enough.)

> (2) I am struggling to apply the patch cleanly to the current HEAD of
> Org-mode source. I git-cloned the source from
> https://git.sr.ht/~bzg/org-mode (commit 755fef38f Merge branch
> 'bugfix').
>
> I use Magit and use "W" -> "a" -> "a" to apply the
> plain patch file. I get "error: patch failed: ..."

--=-=-=
Content-Type: text/x-patch; charset=utf-8
Content-Disposition: inline; filename=0001-Work-around-regexp-size-limitation-for-large-number-.patch
Content-Transfer-Encoding: 8bit

>From f565a9f3187ae99680ec92969bb3f6c29b542b04 Mon Sep 17 00:00:00 2001
Message-ID:
From: Ihor Radchenko
Date: Mon, 19 Feb 2024 12:19:34 +0300
Subject: [PATCH] Work around regexp size limitation for large number of link targets
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* lisp/ol.el (org-target-link-regexp-limit): New constant defining
maximum regexp limit where `org-target-link-regexp' is still safe to
use without triggering "Regular expression too big" error.
(org-target-link-regexps): New variable holding a series of shorter
regexps to be used instead of a single, too long
`org-target-link-regexp'.
(org--re-list-search-forward): New function like `re-search-forward',
but accepting a list of regexps.
(org--re-list-looking-at): New function like `looking-at', but
accepting a list of regexps.
(org-update-radio-target-regexp): When `org-target-link-regexp' is too
long, set `org-target-link-regexps', partitioning the link target list
into smaller regexps.
* lisp/org-element.el (org-element-link-parser):
(org-element--object-lex):
* lisp/org.el (org-activate-target-links): Use
`org--re-list-search-forward' and `org--re-list-looking-at' when
`org-target-link-regexps' is non-nil.
* testing/lisp/test-org-element.el (test-org-element/link-parser): Add
tests.

Reported-by: Rudolf Adamkovič
Link: https://list.orgmode.org/orgmode/m2lenax5m6.fsf@me.com/
---
 lisp/ol.el                       | 68 ++++++++++++++++++++++++++++++--
 lisp/org-element.el              | 16 ++++++--
 lisp/org.el                      |  4 +-
 testing/lisp/test-org-element.el | 17 ++++++++
 4 files changed, 97 insertions(+), 8 deletions(-)

diff --git a/lisp/ol.el b/lisp/ol.el
index 22782578c..97423738a 100644
--- a/lisp/ol.el
+++ b/lisp/ol.el
@@ -52,6 +52,7 @@ (declare-function org-before-first-heading-p "org" ())
 (declare-function org-do-occur "org" (regexp &optional cleanup))
 (declare-function org-element-at-point "org-element" (&optional pom cached-only))
 (declare-function org-element-cache-refresh "org-element" (pos))
+(declare-function org-element-cache-reset "org-element" (&optional all no-persistence))
 (declare-function org-element-context "org-element" (&optional element))
 (declare-function org-element-lineage "org-element-ast" (datum &optional types with-self))
 (declare-function org-element-link-parser "org-element" ())
@@ -532,6 +533,13 @@ (defconst org-radio-target-regexp (format "<%s>" org-target-regexp)
 
 (defvar-local org-target-link-regexp nil
   "Regular expression matching radio targets in plain text.")
+(defconst org-target-link-regexp-limit (ash 2 10)
+  "Maximum allowed length of regexp.
+See MAX_BUF_SIZE in src/regex-emacs.c")
+(defvar-local org-target-link-regexps nil
+  "List of regular expressions matching radio targets in plain text.
+This list is non-nil, when a single regexp would be too long to match
+all the possible targets, exceeding Emacs' regexp length limit.")
 
 (defvar org-link-types-re nil
   "Matches a link that has a url-like prefix like \"http:\".")
@@ -2170,6 +2178,34 @@ (defun org-insert-link-global ()
   (org-load-modules-maybe)
   (org-run-like-in-org-mode 'org-insert-link))
 
+(defun org--re-list-search-forward (regexp-list &optional bound noerror count)
+  "Like `re-search-forward', but REGEXP-LIST is a list of regexps.
+BOUND, NOERROR, and COUNT are passed to `re-search-forward'."
+  (let (result (min-found most-positive-fixnum)
+               (pos-found nil)
+               (min-found-data nil)
+               (tail regexp-list))
+    (while tail
+      (setq result (save-excursion (re-search-forward (pop tail) bound t count)))
+      (when (and result (< result min-found))
+        (setq min-found result
+              pos-found (match-end 0)
+              min-found-data (match-data))))
+    (if (= most-positive-fixnum min-found)
+        (pcase noerror
+          (`t nil)
+          (_ (re-search-forward (car regexp-list) bound noerror count)))
+      (set-match-data min-found-data)
+      (goto-char pos-found))))
+
+(defun org--re-list-looking-at (regexp-list &optional inhibit-modify)
+  "Like `looking-at', but REGEXP-LIST is a list of regexps.
+INHIBIT-MODIFY is passed to `looking-at'."
+  (catch :found
+    (while regexp-list
+      (when (looking-at (pop regexp-list) inhibit-modify)
+        (throw :found t)))))
+
 ;;;###autoload
 (defun org-update-radio-target-regexp ()
   "Find all radio targets in this file and update the regular expression.
@@ -2207,6 +2243,30 @@ (defun org-update-radio-target-regexp ()
                         targets
                         "\\|")
                        after-re)))
+    (setq org-target-link-regexps nil)
+    (let (current-length sub-targets)
+      (when (<= org-target-link-regexp-limit (length org-target-link-regexp))
+        (while (or targets sub-targets)
+          (when (and sub-targets
+                     (or (not targets)
+                         (>= (+ current-length (length (car targets)))
+                             org-target-link-regexp-limit)))
+            (push (concat before-re
+                          (mapconcat
+                           (lambda (x)
+                             (replace-regexp-in-string
+                              " +" "\\s-+" (regexp-quote x) t t))
+                           (nreverse sub-targets)
+                           "\\|")
+                          after-re)
+                  org-target-link-regexps)
+            (setq current-length nil
+                  sub-targets nil))
+          (unless current-length
+            (setq current-length (+ (length before-re) (length after-re))))
+          (when targets (push (pop targets) sub-targets))
+          (cl-incf current-length (length (car sub-targets))))
+        (setq org-target-link-regexps (nreverse org-target-link-regexps))))
     (unless (equal old-regexp org-target-link-regexp)
       ;; Clean-up cache.
       (let ((regexp (cond ((not old-regexp) org-target-link-regexp)
@@ -2222,9 +2282,11 @@ (defun org-update-radio-target-regexp ()
                                    after-re)))))
         (when (and (featurep 'org-element)
                    (not (bound-and-true-p org-mode-loading)))
-          (org-with-point-at 1
-            (while (re-search-forward regexp nil t)
-              (org-element-cache-refresh (match-beginning 1))))))
+          (if org-target-link-regexps
+              (org-element-cache-reset)
+            (org-with-point-at 1
+              (while (re-search-forward regexp nil t)
+                (org-element-cache-refresh (match-beginning 1)))))))
       ;; Re fontify buffer.
       (when (memq 'radio org-highlight-links)
         (org-restart-font-lock)))))
diff --git a/lisp/org-element.el b/lisp/org-element.el
index 573de0106..c490dcd29 100644
--- a/lisp/org-element.el
+++ b/lisp/org-element.el
@@ -3848,7 +3848,9 @@ (defun org-element-link-parser ()
      ;; Type 1: Text targeted from a radio target.
      ((and org-target-link-regexp
            (save-excursion (or (bolp) (backward-char))
-                           (looking-at org-target-link-regexp)))
+                           (if org-target-link-regexps
+                               (org--re-list-looking-at org-target-link-regexps)
+                             (looking-at org-target-link-regexp))))
       (setq type "radio")
       (setq format 'plain)
       (setq link-end (match-end 1))
@@ -5221,7 +5223,10 @@ (defun org-element--object-lex (restriction)
                      ((not (memq 'link restriction)) nil)
                      ((progn
                         (unless (bolp) (forward-char -1))
-                        (not (re-search-forward org-target-link-regexp nil t)))
+                        (not
+                         (if org-target-link-regexps
+                             (org--re-list-search-forward org-target-link-regexps nil t)
+                           (re-search-forward org-target-link-regexp nil t))))
                       nil)
                      ;; Since we moved backward, we do not want to
                      ;; match again an hypothetical 1-character long
@@ -5230,8 +5235,11 @@ (defun org-element--object-lex (restriction)
                      ;; beginning of line, we prevent this here.
                      ((and (= start (1+ (line-beginning-position)))
                            (= start (match-end 1)))
-                      (and (re-search-forward org-target-link-regexp nil t)
-                           (1+ (match-beginning 1))))
+                      (and
+                       (if org-target-link-regexps
+                           (org--re-list-search-forward org-target-link-regexps nil t)
+                         (re-search-forward org-target-link-regexp nil t))
+                       (1+ (match-beginning 1))))
                      (t (1+ (match-beginning 1))))))
             found)
         (save-excursion
diff --git a/lisp/org.el b/lisp/org.el
index 28f43cb67..89cc8bdd9 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -5710,7 +5710,9 @@ (defun org-activate-target-links (limit)
       ;; `org-target-link-regexp' matches one character before the
       ;; actual target.
       (unless (bolp) (forward-char -1))
-      (when (re-search-forward org-target-link-regexp limit t)
+      (when (if org-target-link-regexps
+                (org--re-list-search-forward org-target-link-regexps limit t)
+              (re-search-forward org-target-link-regexp limit t))
         (org-remove-flyspell-overlays-in (match-beginning 1) (match-end 1))
         (add-text-properties (match-beginning 1) (match-end 1)
                              (list 'mouse-face 'highlight
diff --git a/testing/lisp/test-org-element.el b/testing/lisp/test-org-element.el
index 93754bae5..61cd70693 100644
--- a/testing/lisp/test-org-element.el
+++ b/testing/lisp/test-org-element.el
@@ -2378,6 +2378,23 @@ (ert-deftest test-org-element/link-parser ()
       (org-element-property
        :type
        (org-element-map (org-element-parse-buffer) 'link #'identity nil t)))))
+  (should
+   (equal
+    "radio"
+    (org-test-with-temp-text "<<>><<>><<>>A radio link"
+      (org-update-radio-target-regexp)
+      (org-element-property
+       :type
+       (org-element-map (org-element-parse-buffer) 'link #'identity nil t)))))
+  (should
+   (equal
+    "radio"
+    (let ((org-target-link-regexp-limit 9))
+      (org-test-with-temp-text "<<>><<>><<>>A radio link"
+        (org-update-radio-target-regexp)
+        (org-element-property
+         :type
+         (org-element-map (org-element-parse-buffer) 'link #'identity nil t))))))
   ;; Pathological case: radio target of length 1 at beginning of line
   ;; not followed by spaces.
   (should
-- 
2.43.0


--=-=-=
Content-Type: text/plain

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at .
Support Org development at ,
or support my work at

--=-=-=--
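The greedy partitioning that `org-update-radio-target-regexp' performs once `org-target-link-regexp' reaches `org-target-link-regexp-limit' can be illustrated in isolation. What follows is a minimal, self-contained Emacs Lisp sketch under stated assumptions, not code from the patch: `demo-partition-radio-regexps' is a hypothetical name. Unlike the patch, it measures each target after `regexp-quote' and also counts the two-character alternation separator. It also leaves out the patch's replacement of space runs with "\\s-+".

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; Hypothetical helper for illustration only; it is not part of Org.
(defun demo-partition-radio-regexps (targets before-re after-re limit)
  "Return a list of regexps that together match every string in TARGETS.
Each regexp is BEFORE-RE, an alternation of quoted TARGETS, and
AFTER-RE.  Targets are packed greedily, so every regexp holding more
than one target stays shorter than LIMIT characters.  A single target
that is already too long still gets a regexp of its own."
  (let ((overhead (+ (length before-re) (length after-re)))
        (chunks nil)          ; finished chunks, most recent first
        (current nil)         ; quoted targets of the chunk being filled
        (current-length 0))   ; their lengths plus 2 per separator slot
    (dolist (target targets)
      (let ((quoted (regexp-quote target)))
        ;; Close the current chunk if adding QUOTED would reach LIMIT.
        (when (and current
                   (>= (+ overhead current-length (length quoted)) limit))
          (push (nreverse current) chunks)
          (setq current nil
                current-length 0))
        (push quoted current)
        ;; Reserve room for the two-character alternation separator.
        (cl-incf current-length (+ (length quoted) 2))))
    (when current
      (push (nreverse current) chunks))
    ;; Turn each chunk of quoted targets into one alternation regexp.
    (mapcar (lambda (chunk)
              (concat before-re (mapconcat #'identity chunk "\\|") after-re))
            (nreverse chunks))))
```

For example, `(demo-partition-radio-regexps '("alpha" "beta" "gamma") "\\(" "\\)" 16)' returns `("\\(alpha\\|beta\\)" "\\(gamma\\)")': the first regexp is 15 characters long, and appending "gamma" would have made it 22. A caller then has to try every regexp in the list, which is what `org--re-list-search-forward' does in the patch, keeping the match that ends earliest.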