emacs-orgmode@gnu.org archives
From: "David Lukeš" <dafydd.lukes@gmail.com>
To: emacs-orgmode@gnu.org
Subject: More robust CSL-JSON date parsing in oc-basic
Date: Tue, 22 Feb 2022 00:48:27 +0100
Message-ID: <CAEPTPExcZKGAm3v-brzezfCwMM4h3hQtOq+89Qg+5ULJq1K4Yw@mail.gmail.com>

Hi all,

I think I've found an opportunity to make oc-basic a bit more resilient
w.r.t. different kinds of CSL-JSON publication date input. I encountered
the following error:

  Error (org-mode-hook): Error running hook "org-fancy-priorities-mode"
  because: (error Unknown CSL-JSON date format: nil)

(Judging by the debugger backtrace, it was triggered by citar loading
oc-basic for fontification purposes.)

After a small tweak to oc-basic (see below), the error message got a bit
more informative:

  Error (org-mode-hook): Error running hook "org-fancy-priorities-mode"
  because: (error Unknown CSL-JSON date format: ((literal . "2009 / 09 / 01 /")))

Based on this, I ended up making the following changes to oc-basic:

--8<---------------------------------------------------------------->8--

diff --git a/lisp/oc-basic.el b/lisp/oc-basic.el
index fc71d22fc..de9be40a1 100644
--- a/lisp/oc-basic.el
+++ b/lisp/oc-basic.el
@@ -178,21 +178,24 @@ Return a hash table with citation references as keys and fields alist as values.
                                      " and ")))
                              ('issued
                               ;; Date are expressed as an array
-                              ;; (`date-parts') or a "string (`raw').
-                              ;; In both cases, extract the year and
-                              ;; associate it to `year' field, for
-                              ;; compatibility with BibTeX format.
+                              ;; (`date-parts') or a string (`raw'
+                              ;; or `literal'). In both cases,
+                              ;; extract the year and associate it
+                              ;; to `year' field, for compatibility
+                              ;; with BibTeX format.
                               (let ((date (or (alist-get 'date-parts value)
-                                              (alist-get 'raw value))))
+                                              (alist-get 'raw value)
+                                              (alist-get 'literal value))))
                                 (cons 'year
                                       (cond
                                        ((consp date)
                                         (caar date))
                                        ((stringp date)
-                                        (car (split-string date "-")))
+                                        (replace-regexp-in-string ".*?\\([0-9]\\{4\\}\\).*" "\\1" date))
+
                                        (t
                                         (error "Unknown CSL-JSON date format: %S"
-                                               date))))))
+                                               value))))))
                              (_
                               (cons field value))))
                          item)

--8<---------------------------------------------------------------->8--
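For anyone who wants to try the patched logic without touching
oc-basic, here is a minimal standalone sketch of the new `issued'
branch. It assumes the `issued' value has already been read into an
alist with symbol keys and that date-parts is a list of lists, as the
diff's use of alist-get and caar implies. The function name
my-csl-issued-year is hypothetical and not part of oc-basic.

```elisp
;; Hypothetical standalone version of the patched `issued' branch.
;; Assumes VALUE is an alist with symbol keys and that `date-parts'
;; is a list of lists, e.g. ((date-parts (2009 9 1))).
(defun my-csl-issued-year (value)
  "Return the year found in the CSL-JSON `issued' alist VALUE."
  (let ((date (or (alist-get 'date-parts value)
                  (alist-get 'raw value)
                  (alist-get 'literal value))))
    (cond
     ;; Structured date: first component of the first date part.
     ((consp date)
      (caar date))
     ;; String date: keep the first run of four digits; a string
     ;; without one is returned unchanged.
     ((stringp date)
      (replace-regexp-in-string ".*?\\([0-9]\\{4\\}\\).*" "\\1" date))
     ;; Anything else: report the whole VALUE, not just DATE.
     (t
      (error "Unknown CSL-JSON date format: %S" value)))))
```

Evaluated in *scratch* or IELM, (my-csl-issued-year '((literal . "2009 /
09 / 01 /"))) returns "2009", while an alist with none of the three
keys, such as ((season . 1)), signals an error that shows the whole
alist rather than nil.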

A few comments, starting from the top:

1. In practice, it looks like string-valued dates can be tagged either
   with 'raw or with 'literal. I'm not sure what the difference is, or
   whether one of them is older or deprecated. I just know that my
   CSL-JSON (produced by Better BibTeX from Zotero) is full of the
   latter, and oc-basic didn't account for this possibility, so I added
   it.
2. I'm pretty sure the weird date format, "2009 / 09 / 01 /", is not
   something I entered manually. My preferred format would be
   "2009-09-01", which is what oc-basic currently expects. But with
   Zotero, I tend to rely on metadata being filled in automagically, and
   it looks like it's the Wild West out there. So I figured a more
   robust way to extract the year portion might be to match the first
   sequence of 4 digits, instead of splitting the string on "-" and
   taking the car of the resulting list.

   Of course, both are just heuristics, both can fail (each in different
   ways), and both can return the entire string as the year: splitting
   does so when the string contains no "-", the regexp when it contains
   no run of 4 digits. My suggested change is based purely on the fact
   that the 4-digit heuristic does the right thing more often in my own
   collection of bibliography entries.
3. The last change uses value instead of date in the error message,
   which makes it more informative, as shown above -- when encountering
   a problem, the user gets to see the offending entry, instead of just
   nil.
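To make the trade-off in point 2 concrete, here is a small comparison
of the two heuristics on a few sample strings. The helper names
my-year-by-split and my-year-by-digits are hypothetical; they simply
wrap the old and new expressions from the diff. Apart from "2009 / 09 /
01 /", the sample dates are made up for illustration.

```elisp
;; Hypothetical helpers wrapping the old and new year heuristics.
(defun my-year-by-split (date)
  "Old heuristic: the first \"-\"-separated field of DATE."
  (car (split-string date "-")))

(defun my-year-by-digits (date)
  "New heuristic: the first run of four digits in DATE, else DATE."
  (replace-regexp-in-string ".*?\\([0-9]\\{4\\}\\).*" "\\1" date))

;; Compare both on a few sample date strings.
(mapcar (lambda (date)
          (list date (my-year-by-split date) (my-year-by-digits date)))
        '("2009-09-01" "2009 / 09 / 01 /" "01-09-2009" "n.d."))
;; => (("2009-09-01" "2009" "2009")
;;     ("2009 / 09 / 01 /" "2009 / 09 / 01 /" "2009")
;;     ("01-09-2009" "01" "2009")
;;     ("n.d." "n.d." "n.d."))
```

The day-first string shows where splitting on "-" picks the wrong
component, and a string without any four-digit run comes back unchanged
from both. Conversely, a string whose first run of four digits is not
the year would fool the digit heuristic.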

Let me know which of these changes -- if any -- seem acceptable, and
whether they qualify as tinychange, and I'll happily submit a proper
patch :)

Best,

David


Thread overview: 3+ messages
2022-02-21 23:48 David Lukeš [this message]
2022-02-22 19:01 ` More robust CSL-JSON date parsing in oc-basic Nicolas Goaziou
2022-02-25 13:20   ` David Lukeš

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git