emacs-orgmode@gnu.org archives
 help / color / mirror / code / Atom feed
* [PATCH] oc-basic: Detect malformed bibtex bibliographies
@ 2022-04-20 12:28 Ihor Radchenko
  2022-04-20 12:38 ` Bruce D'Arcus
  0 siblings, 1 reply; 4+ messages in thread
From: Ihor Radchenko @ 2022-04-20 12:28 UTC (permalink / raw)
  To: emacs-orgmode, Nicolas Goaziou

There is an edge case triggering infinite loop in oc-basic.

It is caused by bibtex-map-entries (used in
org-cite-basic--parse-bibtex) when ran on a malformed bibtex buffer [1].

The proposed patch validates the bibtex buffer before processing and
throws an error if issues are found.  `bibtex-validate` also
conveniently displays a list of errors with clickable links to
problematic lines.

I believe that it is useful for the users to see such issues instead of,
say, failing silently on malformed bibliographies.

WDYT?

Best,
Ihor

[1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=55036

* lisp/oc-basic.el (org-cite-basic--parse-bibtex): Validate the
bibliography before parsing.  Display list of issues if any (via
`bibtex-validate`).
(org-cite-basic--parse-bibliography): Set buffer file name needed by
`bibtex-validate`.  Empty the cache in case of error.
---
 lisp/oc-basic.el | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/lisp/oc-basic.el b/lisp/oc-basic.el
index 873986d07..79f7a4844 100644
--- a/lisp/oc-basic.el
+++ b/lisp/oc-basic.el
@@ -214,6 +214,10 @@ (defun org-cite-basic--parse-bibtex (dialect)
   (let ((entries (make-hash-table :test #'equal))
         (bibtex-sort-ignore-string-entries t))
     (bibtex-set-dialect dialect t)
+    ;; Throw an error if bibliography is malformed.
+    (unless (bibtex-validate)
+      (user-error "Malformed bibliography at %S"
+                  (or (buffer-file-name) (current-buffer))))
     (bibtex-map-entries
      (lambda (key &rest _)
        ;; Normalize entries: field names are turned into symbols
@@ -258,21 +262,27 @@ (defun org-cite-basic--parse-bibliography (&optional info)
             (when (or (org-file-has-changed-p file)
                       (not (gethash file org-cite-basic--file-id-cache)))
               (insert-file-contents file)
+              (setf (buffer-file-name) file)
               (puthash file (org-buffer-hash) org-cite-basic--file-id-cache))
-	    (let* ((file-id (cons file (gethash file org-cite-basic--file-id-cache)))
-                   (entries
-                    (or (cdr (assoc file-id org-cite-basic--bibliography-cache))
-                        (let ((table
-                               (pcase (file-name-extension file)
-                                 ("json" (org-cite-basic--parse-json))
-                                 ("bib" (org-cite-basic--parse-bibtex 'biblatex))
-                                 ("bibtex" (org-cite-basic--parse-bibtex 'BibTeX))
-                                 (ext
-                                  (user-error "Unknown bibliography extension: %S"
-                                              ext)))))
-                          (push (cons file-id table) org-cite-basic--bibliography-cache)
-                          table))))
-              (push (cons file entries) results)))))
+            (unwind-protect
+                (condition-case nil
+                    (unwind-protect
+	                (let* ((file-id (cons file (gethash file org-cite-basic--file-id-cache)))
+                               (entries
+                                (or (cdr (assoc file-id org-cite-basic--bibliography-cache))
+                                    (let ((table
+                                           (pcase (file-name-extension file)
+                                             ("json" (org-cite-basic--parse-json))
+                                             ("bib" (org-cite-basic--parse-bibtex 'biblatex))
+                                             ("bibtex" (org-cite-basic--parse-bibtex 'BibTeX))
+                                             (ext
+                                              (user-error "Unknown bibliography extension: %S"
+                                                          ext)))))
+                                      (push (cons file-id table) org-cite-basic--bibliography-cache)
+                                      table))))
+                          (push (cons file entries) results))
+                      (setf (buffer-file-name) nil))
+                  (error (setq org-cite-basic--file-id-cache nil)))))))
       (when info (plist-put info :cite-basic/bibliography results))
       results)))
 
-- 
2.35.1



-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] oc-basic: Detect malformed bibtex bibliographies
  2022-04-20 12:28 [PATCH] oc-basic: Detect malformed bibtex bibliographies Ihor Radchenko
@ 2022-04-20 12:38 ` Bruce D'Arcus
  2022-04-20 12:44   ` John Kitchin
  0 siblings, 1 reply; 4+ messages in thread
From: Bruce D'Arcus @ 2022-04-20 12:38 UTC (permalink / raw)
  To: org-mode-email

On Wed, Apr 20, 2022 at 8:28 AM Ihor Radchenko <yantar92@gmail.com> wrote:
>
> There is an edge case triggering infinite loop in oc-basic.
>
> It is caused by bibtex-map-entries (used in
> org-cite-basic--parse-bibtex) when ran on a malformed bibtex buffer [1].
>
> The proposed patch validates the bibtex buffer before processing and
> throws an error if issues are found.  `bibtex-validate` also
> conveniently displays a list of errors with clickable links to
> problematic lines.
>
> I believe that it is useful for the users to see such issues instead of,
> say, failing silently on malformed bibliographies.
>
> WDYT?

I haven't tested it, but this is an excellent idea!

Bruce

> Best,
> Ihor
>
> [1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=55036
>
> * lisp/oc-basic.el (org-cite-basic--parse-bibtex): Validate the
> bibliography before parsing.  Display list of issues if any (via
> `bibtex-validate`).
> (org-cite-basic--parse-bibliography): Set buffer file name needed by
> `bibtex-validate`.  Empty the cache in case of error.
> ---
>  lisp/oc-basic.el | 38 ++++++++++++++++++++++++--------------
>  1 file changed, 24 insertions(+), 14 deletions(-)
>
> diff --git a/lisp/oc-basic.el b/lisp/oc-basic.el
> index 873986d07..79f7a4844 100644
> --- a/lisp/oc-basic.el
> +++ b/lisp/oc-basic.el
> @@ -214,6 +214,10 @@ (defun org-cite-basic--parse-bibtex (dialect)
>    (let ((entries (make-hash-table :test #'equal))
>          (bibtex-sort-ignore-string-entries t))
>      (bibtex-set-dialect dialect t)
> +    ;; Throw an error if bibliography is malformed.
> +    (unless (bibtex-validate)
> +      (user-error "Malformed bibliography at %S"
> +                  (or (buffer-file-name) (current-buffer))))
>      (bibtex-map-entries
>       (lambda (key &rest _)
>         ;; Normalize entries: field names are turned into symbols
> @@ -258,21 +262,27 @@ (defun org-cite-basic--parse-bibliography (&optional info)
>              (when (or (org-file-has-changed-p file)
>                        (not (gethash file org-cite-basic--file-id-cache)))
>                (insert-file-contents file)
> +              (setf (buffer-file-name) file)
>                (puthash file (org-buffer-hash) org-cite-basic--file-id-cache))
> -           (let* ((file-id (cons file (gethash file org-cite-basic--file-id-cache)))
> -                   (entries
> -                    (or (cdr (assoc file-id org-cite-basic--bibliography-cache))
> -                        (let ((table
> -                               (pcase (file-name-extension file)
> -                                 ("json" (org-cite-basic--parse-json))
> -                                 ("bib" (org-cite-basic--parse-bibtex 'biblatex))
> -                                 ("bibtex" (org-cite-basic--parse-bibtex 'BibTeX))
> -                                 (ext
> -                                  (user-error "Unknown bibliography extension: %S"
> -                                              ext)))))
> -                          (push (cons file-id table) org-cite-basic--bibliography-cache)
> -                          table))))
> -              (push (cons file entries) results)))))
> +            (unwind-protect
> +                (condition-case nil
> +                    (unwind-protect
> +                       (let* ((file-id (cons file (gethash file org-cite-basic--file-id-cache)))
> +                               (entries
> +                                (or (cdr (assoc file-id org-cite-basic--bibliography-cache))
> +                                    (let ((table
> +                                           (pcase (file-name-extension file)
> +                                             ("json" (org-cite-basic--parse-json))
> +                                             ("bib" (org-cite-basic--parse-bibtex 'biblatex))
> +                                             ("bibtex" (org-cite-basic--parse-bibtex 'BibTeX))
> +                                             (ext
> +                                              (user-error "Unknown bibliography extension: %S"
> +                                                          ext)))))
> +                                      (push (cons file-id table) org-cite-basic--bibliography-cache)
> +                                      table))))
> +                          (push (cons file entries) results))
> +                      (setf (buffer-file-name) nil))
> +                  (error (setq org-cite-basic--file-id-cache nil)))))))
>        (when info (plist-put info :cite-basic/bibliography results))
>        results)))
>
> --
> 2.35.1
>
>
>
> --
> Ihor Radchenko,
> PhD,
> Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
> State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong University, Xi'an, China
> Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg
>


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] oc-basic: Detect malformed bibtex bibliographies
  2022-04-20 12:38 ` Bruce D'Arcus
@ 2022-04-20 12:44   ` John Kitchin
  2022-04-20 13:02     ` Ihor Radchenko
  0 siblings, 1 reply; 4+ messages in thread
From: John Kitchin @ 2022-04-20 12:44 UTC (permalink / raw)
  To: Bruce D'Arcus; +Cc: org-mode-email

[-- Attachment #1: Type: text/plain, Size: 5562 bytes --]

I would see if you can cache the result and not do it more than needed; it
can add a performance issue on large files.

John

-----------------------------------
Professor John Kitchin (he/him/his)
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu



On Wed, Apr 20, 2022 at 8:39 AM Bruce D'Arcus <bdarcus@gmail.com> wrote:

> On Wed, Apr 20, 2022 at 8:28 AM Ihor Radchenko <yantar92@gmail.com> wrote:
> >
> > There is an edge case triggering infinite loop in oc-basic.
> >
> > It is caused by bibtex-map-entries (used in
> > org-cite-basic--parse-bibtex) when ran on a malformed bibtex buffer [1].
> >
> > The proposed patch validates the bibtex buffer before processing and
> > throws an error if issues are found.  `bibtex-validate` also
> > conveniently displays a list of errors with clickable links to
> > problematic lines.
> >
> > I believe that it is useful for the users to see such issues instead of,
> > say, failing silently on malformed bibliographies.
> >
> > WDYT?
>
> I haven't tested it, but this is an excellent idea!
>
> Bruce
>
> > Best,
> > Ihor
> >
> > [1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=55036
> >
> > * lisp/oc-basic.el (org-cite-basic--parse-bibtex): Validate the
> > bibliography before parsing.  Display list of issues if any (via
> > `bibtex-validate`).
> > (org-cite-basic--parse-bibliography): Set buffer file name needed by
> > `bibtex-validate`.  Empty the cache in case of error.
> > ---
> >  lisp/oc-basic.el | 38 ++++++++++++++++++++++++--------------
> >  1 file changed, 24 insertions(+), 14 deletions(-)
> >
> > diff --git a/lisp/oc-basic.el b/lisp/oc-basic.el
> > index 873986d07..79f7a4844 100644
> > --- a/lisp/oc-basic.el
> > +++ b/lisp/oc-basic.el
> > @@ -214,6 +214,10 @@ (defun org-cite-basic--parse-bibtex (dialect)
> >    (let ((entries (make-hash-table :test #'equal))
> >          (bibtex-sort-ignore-string-entries t))
> >      (bibtex-set-dialect dialect t)
> > +    ;; Throw an error if bibliography is malformed.
> > +    (unless (bibtex-validate)
> > +      (user-error "Malformed bibliography at %S"
> > +                  (or (buffer-file-name) (current-buffer))))
> >      (bibtex-map-entries
> >       (lambda (key &rest _)
> >         ;; Normalize entries: field names are turned into symbols
> > @@ -258,21 +262,27 @@ (defun org-cite-basic--parse-bibliography
> (&optional info)
> >              (when (or (org-file-has-changed-p file)
> >                        (not (gethash file
> org-cite-basic--file-id-cache)))
> >                (insert-file-contents file)
> > +              (setf (buffer-file-name) file)
> >                (puthash file (org-buffer-hash)
> org-cite-basic--file-id-cache))
> > -           (let* ((file-id (cons file (gethash file
> org-cite-basic--file-id-cache)))
> > -                   (entries
> > -                    (or (cdr (assoc file-id
> org-cite-basic--bibliography-cache))
> > -                        (let ((table
> > -                               (pcase (file-name-extension file)
> > -                                 ("json" (org-cite-basic--parse-json))
> > -                                 ("bib" (org-cite-basic--parse-bibtex
> 'biblatex))
> > -                                 ("bibtex"
> (org-cite-basic--parse-bibtex 'BibTeX))
> > -                                 (ext
> > -                                  (user-error "Unknown bibliography
> extension: %S"
> > -                                              ext)))))
> > -                          (push (cons file-id table)
> org-cite-basic--bibliography-cache)
> > -                          table))))
> > -              (push (cons file entries) results)))))
> > +            (unwind-protect
> > +                (condition-case nil
> > +                    (unwind-protect
> > +                       (let* ((file-id (cons file (gethash file
> org-cite-basic--file-id-cache)))
> > +                               (entries
> > +                                (or (cdr (assoc file-id
> org-cite-basic--bibliography-cache))
> > +                                    (let ((table
> > +                                           (pcase (file-name-extension
> file)
> > +                                             ("json"
> (org-cite-basic--parse-json))
> > +                                             ("bib"
> (org-cite-basic--parse-bibtex 'biblatex))
> > +                                             ("bibtex"
> (org-cite-basic--parse-bibtex 'BibTeX))
> > +                                             (ext
> > +                                              (user-error "Unknown
> bibliography extension: %S"
> > +                                                          ext)))))
> > +                                      (push (cons file-id table)
> org-cite-basic--bibliography-cache)
> > +                                      table))))
> > +                          (push (cons file entries) results))
> > +                      (setf (buffer-file-name) nil))
> > +                  (error (setq org-cite-basic--file-id-cache nil)))))))
> >        (when info (plist-put info :cite-basic/bibliography results))
> >        results)))
> >
> > --
> > 2.35.1
> >
> >
> >
> > --
> > Ihor Radchenko,
> > PhD,
> > Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
> > State Key Laboratory for Mechanical Behavior of Materials, Xi'an
> Jiaotong University, Xi'an, China
> > Email: yantar92@gmail.com, ihor_radchenko@alumni.sutd.edu.sg
> >
>
>

[-- Attachment #2: Type: text/html, Size: 7735 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] oc-basic: Detect malformed bibtex bibliographies
  2022-04-20 12:44   ` John Kitchin
@ 2022-04-20 13:02     ` Ihor Radchenko
  0 siblings, 0 replies; 4+ messages in thread
From: Ihor Radchenko @ 2022-04-20 13:02 UTC (permalink / raw)
  To: John Kitchin; +Cc: org-mode-email, Bruce D'Arcus

John Kitchin <jkitchin@andrew.cmu.edu> writes:

> I would see if you can cache the result and not do it more than needed; it
> can add a performance issue on large files.

The results of parsing are already cached. See 7ddc5b57c.

With this patch, I'd expect 2x performance degradation on (1) first time
Org opens the bibliography in current Emacs session; (2) after
bibliography is changed.

I believe that it is anyway worth it. Having errors in bibliography can
be easily overlooked and can potentially cause annoying issues.

Best,
Ihor


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-04-20 13:33 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-20 12:28 [PATCH] oc-basic: Detect malformed bibtex bibliographies Ihor Radchenko
2022-04-20 12:38 ` Bruce D'Arcus
2022-04-20 12:44   ` John Kitchin
2022-04-20 13:02     ` Ihor Radchenko

Code repositories for project(s) associated with this inbox:

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).