There is an edge case triggering infinite loop in oc-basic. It is caused by bibtex-map-entries (used in org-cite-basic--parse-bibtex) when ran on a malformed bibtex buffer [1]. The proposed patch validates the bibtex buffer before processing and throws an error if issues are found. `bibtex-validate` also conveniently displays a list of errors with clickable links to problematic lines. I believe that it is useful for the users to see such issues instead of, say, failing silently on malformed bibliographies. WDYT? Best, Ihor [1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=55036 * lisp/oc-basic.el (org-cite-basic--parse-bibtex): Validate the bibliography before parsing. Display list of issues if any (via `bibtex-validate`). (org-cite-basic--parse-bibliography): Set buffer file name needed by `bibtex-validate`. Empty the cache in case of error. --- lisp/oc-basic.el | 38 ++++++++++++++++++++++++-------------- 1 file changed, 24 insertions(+), 14 deletions(-) diff --git a/lisp/oc-basic.el b/lisp/oc-basic.el index 873986d07..79f7a4844 100644 --- a/lisp/oc-basic.el +++ b/lisp/oc-basic.el @@ -214,6 +214,10 @@ (defun org-cite-basic--parse-bibtex (dialect) (let ((entries (make-hash-table :test #'equal)) (bibtex-sort-ignore-string-entries t)) (bibtex-set-dialect dialect t) + ;; Throw an error if bibliography is malformed. + (unless (bibtex-validate) + (user-error "Malformed bibliography at %S" + (or (buffer-file-name) (current-buffer)))) (bibtex-map-entries (lambda (key &rest _) ;; Normalize entries: field names are turned into symbols @@ -258,21 +262,27 @@ (defun org-cite-basic--parse-bibliography (&optional info) (when (or (org-file-has-changed-p file) (not (gethash file org-cite-basic--file-id-cache))) (insert-file-contents file) + (setf (buffer-file-name) file) (puthash file (org-buffer-hash) org-cite-basic--file-id-cache)) - (let* ((file-id (cons file (gethash file org-cite-basic--file-id-cache))) - (entries - (or (cdr (assoc file-id org-cite-basic--bibliography-cache)) - (let ((table - (pcase (file-name-extension file) - ("json" (org-cite-basic--parse-json)) - ("bib" (org-cite-basic--parse-bibtex 'biblatex)) - ("bibtex" (org-cite-basic--parse-bibtex 'BibTeX)) - (ext - (user-error "Unknown bibliography extension: %S" - ext))))) - (push (cons file-id table) org-cite-basic--bibliography-cache) - table)))) - (push (cons file entries) results))))) + (unwind-protect + (condition-case nil + (unwind-protect + (let* ((file-id (cons file (gethash file org-cite-basic--file-id-cache))) + (entries + (or (cdr (assoc file-id org-cite-basic--bibliography-cache)) + (let ((table + (pcase (file-name-extension file) + ("json" (org-cite-basic--parse-json)) + ("bib" (org-cite-basic--parse-bibtex 'biblatex)) + ("bibtex" (org-cite-basic--parse-bibtex 'BibTeX)) + (ext + (user-error "Unknown bibliography extension: %S" + ext))))) + (push (cons file-id table) org-cite-basic--bibliography-cache) + table)))) + (push (cons file entries) results)) + (setf (buffer-file-name) nil)) + (error (setq org-cite-basic--file-id-cache nil))))))) (when info (plist-put info :cite-basic/bibliography results)) results))) -- 2.35.1 -- Ihor Radchenko, PhD, Center for Advancing Materials Performance from the Nanoscale (CAMP-nano) State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg
On Wed, Apr 20, 2022 at 8:28 AM Ihor Radchenko <yantar92@gmail.com> wrote: > > There is an edge case triggering infinite loop in oc-basic. > > It is caused by bibtex-map-entries (used in > org-cite-basic--parse-bibtex) when ran on a malformed bibtex buffer [1]. > > The proposed patch validates the bibtex buffer before processing and > throws an error if issues are found. `bibtex-validate` also > conveniently displays a list of errors with clickable links to > problematic lines. > > I believe that it is useful for the users to see such issues instead of, > say, failing silently on malformed bibliographies. > > WDYT? I haven't tested it, but this is an excellent idea! Bruce > Best, > Ihor > > [1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=55036 > > * lisp/oc-basic.el (org-cite-basic--parse-bibtex): Validate the > bibliography before parsing. Display list of issues if any (via > `bibtex-validate`). > (org-cite-basic--parse-bibliography): Set buffer file name needed by > `bibtex-validate`. Empty the cache in case of error. > --- > lisp/oc-basic.el | 38 ++++++++++++++++++++++++-------------- > 1 file changed, 24 insertions(+), 14 deletions(-) > > diff --git a/lisp/oc-basic.el b/lisp/oc-basic.el > index 873986d07..79f7a4844 100644 > --- a/lisp/oc-basic.el > +++ b/lisp/oc-basic.el > @@ -214,6 +214,10 @@ (defun org-cite-basic--parse-bibtex (dialect) > (let ((entries (make-hash-table :test #'equal)) > (bibtex-sort-ignore-string-entries t)) > (bibtex-set-dialect dialect t) > + ;; Throw an error if bibliography is malformed. > + (unless (bibtex-validate) > + (user-error "Malformed bibliography at %S" > + (or (buffer-file-name) (current-buffer)))) > (bibtex-map-entries > (lambda (key &rest _) > ;; Normalize entries: field names are turned into symbols > @@ -258,21 +262,27 @@ (defun org-cite-basic--parse-bibliography (&optional info) > (when (or (org-file-has-changed-p file) > (not (gethash file org-cite-basic--file-id-cache))) > (insert-file-contents file) > + (setf (buffer-file-name) file) > (puthash file (org-buffer-hash) org-cite-basic--file-id-cache)) > - (let* ((file-id (cons file (gethash file org-cite-basic--file-id-cache))) > - (entries > - (or (cdr (assoc file-id org-cite-basic--bibliography-cache)) > - (let ((table > - (pcase (file-name-extension file) > - ("json" (org-cite-basic--parse-json)) > - ("bib" (org-cite-basic--parse-bibtex 'biblatex)) > - ("bibtex" (org-cite-basic--parse-bibtex 'BibTeX)) > - (ext > - (user-error "Unknown bibliography extension: %S" > - ext))))) > - (push (cons file-id table) org-cite-basic--bibliography-cache) > - table)))) > - (push (cons file entries) results))))) > + (unwind-protect > + (condition-case nil > + (unwind-protect > + (let* ((file-id (cons file (gethash file org-cite-basic--file-id-cache))) > + (entries > + (or (cdr (assoc file-id org-cite-basic--bibliography-cache)) > + (let ((table > + (pcase (file-name-extension file) > + ("json" (org-cite-basic--parse-json)) > + ("bib" (org-cite-basic--parse-bibtex 'biblatex)) > + ("bibtex" (org-cite-basic--parse-bibtex 'BibTeX)) > + (ext > + (user-error "Unknown bibliography extension: %S" > + ext))))) > + (push (cons file-id table) org-cite-basic--bibliography-cache) > + table)))) > + (push (cons file entries) results)) > + (setf (buffer-file-name) nil)) > + (error (setq org-cite-basic--file-id-cache nil))))))) > (when info (plist-put info :cite-basic/bibliography results)) > results))) > > -- > 2.35.1 > > > > -- > Ihor Radchenko, > PhD, > Center for Advancing Materials Performance from the Nanoscale (CAMP-nano) > State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China > Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg >
[-- Attachment #1: Type: text/plain, Size: 5562 bytes --] I would see if you can cache the result and not do it more than needed; it can add a performance issue on large files. John ----------------------------------- Professor John Kitchin (he/him/his) Doherty Hall A207F Department of Chemical Engineering Carnegie Mellon University Pittsburgh, PA 15213 412-268-7803 @johnkitchin http://kitchingroup.cheme.cmu.edu On Wed, Apr 20, 2022 at 8:39 AM Bruce D'Arcus <bdarcus@gmail.com> wrote: > On Wed, Apr 20, 2022 at 8:28 AM Ihor Radchenko <yantar92@gmail.com> wrote: > > > > There is an edge case triggering infinite loop in oc-basic. > > > > It is caused by bibtex-map-entries (used in > > org-cite-basic--parse-bibtex) when ran on a malformed bibtex buffer [1]. > > > > The proposed patch validates the bibtex buffer before processing and > > throws an error if issues are found. `bibtex-validate` also > > conveniently displays a list of errors with clickable links to > > problematic lines. > > > > I believe that it is useful for the users to see such issues instead of, > > say, failing silently on malformed bibliographies. > > > > WDYT? > > I haven't tested it, but this is an excellent idea! > > Bruce > > > Best, > > Ihor > > > > [1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=55036 > > > > * lisp/oc-basic.el (org-cite-basic--parse-bibtex): Validate the > > bibliography before parsing. Display list of issues if any (via > > `bibtex-validate`). > > (org-cite-basic--parse-bibliography): Set buffer file name needed by > > `bibtex-validate`. Empty the cache in case of error. > > --- > > lisp/oc-basic.el | 38 ++++++++++++++++++++++++-------------- > > 1 file changed, 24 insertions(+), 14 deletions(-) > > > > diff --git a/lisp/oc-basic.el b/lisp/oc-basic.el > > index 873986d07..79f7a4844 100644 > > --- a/lisp/oc-basic.el > > +++ b/lisp/oc-basic.el > > @@ -214,6 +214,10 @@ (defun org-cite-basic--parse-bibtex (dialect) > > (let ((entries (make-hash-table :test #'equal)) > > (bibtex-sort-ignore-string-entries t)) > > (bibtex-set-dialect dialect t) > > + ;; Throw an error if bibliography is malformed. > > + (unless (bibtex-validate) > > + (user-error "Malformed bibliography at %S" > > + (or (buffer-file-name) (current-buffer)))) > > (bibtex-map-entries > > (lambda (key &rest _) > > ;; Normalize entries: field names are turned into symbols > > @@ -258,21 +262,27 @@ (defun org-cite-basic--parse-bibliography > (&optional info) > > (when (or (org-file-has-changed-p file) > > (not (gethash file > org-cite-basic--file-id-cache))) > > (insert-file-contents file) > > + (setf (buffer-file-name) file) > > (puthash file (org-buffer-hash) > org-cite-basic--file-id-cache)) > > - (let* ((file-id (cons file (gethash file > org-cite-basic--file-id-cache))) > > - (entries > > - (or (cdr (assoc file-id > org-cite-basic--bibliography-cache)) > > - (let ((table > > - (pcase (file-name-extension file) > > - ("json" (org-cite-basic--parse-json)) > > - ("bib" (org-cite-basic--parse-bibtex > 'biblatex)) > > - ("bibtex" > (org-cite-basic--parse-bibtex 'BibTeX)) > > - (ext > > - (user-error "Unknown bibliography > extension: %S" > > - ext))))) > > - (push (cons file-id table) > org-cite-basic--bibliography-cache) > > - table)))) > > - (push (cons file entries) results))))) > > + (unwind-protect > > + (condition-case nil > > + (unwind-protect > > + (let* ((file-id (cons file (gethash file > org-cite-basic--file-id-cache))) > > + (entries > > + (or (cdr (assoc file-id > org-cite-basic--bibliography-cache)) > > + (let ((table > > + (pcase (file-name-extension > file) > > + ("json" > (org-cite-basic--parse-json)) > > + ("bib" > (org-cite-basic--parse-bibtex 'biblatex)) > > + ("bibtex" > (org-cite-basic--parse-bibtex 'BibTeX)) > > + (ext > > + (user-error "Unknown > bibliography extension: %S" > > + ext))))) > > + (push (cons file-id table) > org-cite-basic--bibliography-cache) > > + table)))) > > + (push (cons file entries) results)) > > + (setf (buffer-file-name) nil)) > > + (error (setq org-cite-basic--file-id-cache nil))))))) > > (when info (plist-put info :cite-basic/bibliography results)) > > results))) > > > > -- > > 2.35.1 > > > > > > > > -- > > Ihor Radchenko, > > PhD, > > Center for Advancing Materials Performance from the Nanoscale (CAMP-nano) > > State Key Laboratory for Mechanical Behavior of Materials, Xi'an > Jiaotong University, Xi'an, China > > Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg > > > > [-- Attachment #2: Type: text/html, Size: 7735 bytes --]
John Kitchin <jkitchin@andrew.cmu.edu> writes:
> I would see if you can cache the result and not do it more than needed; it
> can add a performance issue on large files.
The results of parsing are already cached. See 7ddc5b57c.
With this patch, I'd expect 2x performance degradation on (1) first time
Org opens the bibliography in current Emacs session; (2) after
bibliography is changed.
I believe that it is anyway worth it. Having errors in bibliography can
be easily overlooked and can potentially cause annoying issues.
Best,
Ihor
Ihor Radchenko <yantar92@gmail.com> writes:
> I believe that it is useful for the users to see such issues instead of,
> say, failing silently on malformed bibliographies.
Applied onto main via 5b45ad083.
Best,
Ihor