From: David Lukeš
Date: Tue, 22 Feb 2022 00:48:27 +0100
Subject: More robust CSL-JSON date parsing in oc-basic
To: emacs-orgmode@gnu.org

Hi all,

I think I've found an opportunity to make oc-basic a bit more resilient
w.r.t. different kinds of CSL-JSON publication date input. I encountered
the following error:

    Error (org-mode-hook): Error running hook "org-fancy-priorities-mode"
    because: (error Unknown CSL-JSON date format: nil)

(Based on the debugger traceback, what triggered it was citar loading
oc-basic for fontification purposes.)

After a small tweak to oc-basic (see below), the error message got a bit
more informative:

    Error (org-mode-hook): Error running hook "org-fancy-priorities-mode"
    because: (error Unknown CSL-JSON date format: ((literal . "2009 / 09 / 01 /")))

Based on this, I ended up making the following changes to oc-basic:

--8<---------------------------------------------------------------->8--
diff --git a/lisp/oc-basic.el b/lisp/oc-basic.el
index fc71d22fc..de9be40a1 100644
--- a/lisp/oc-basic.el
+++ b/lisp/oc-basic.el
@@ -178,21 +178,24 @@ Return a hash table with citation references as keys and fields alist as values.
                                   " and ")))
                      ('issued
                       ;; Date are expressed as an array
-                      ;; (`date-parts') or a "string (`raw').
-                      ;; In both cases, extract the year and
-                      ;; associate it to `year' field, for
-                      ;; compatibility with BibTeX format.
+                      ;; (`date-parts') or a "string (`raw'
+                      ;; or `literal'). In both cases,
+                      ;; extract the year and associate it
+                      ;; to `year' field, for compatibility
+                      ;; with BibTeX format.
                       (let ((date (or (alist-get 'date-parts value)
-                                      (alist-get 'raw value))))
+                                      (alist-get 'raw value)
+                                      (alist-get 'literal value))))
                         (cons 'year
                               (cond
                                ((consp date)
                                 (caar date))
                                ((stringp date)
-                                (car (split-string date "-")))
+                                (replace-regexp-in-string ".*?\\([0-9]\\{4\\}\\).*" "\\1" date))
+
                                (t
                                 (error "Unknown CSL-JSON date format: %S"
-                                       date))))))
+                                       value))))))
                      (_
                       (cons field value))))
                   item)
--8<---------------------------------------------------------------->8--

A few comments, starting from the top:

1. In practice, it looks like string-valued dates can either be tagged
   with 'raw or with 'literal. Not sure what the difference is, whether
   one is older or deprecated. I just know that my CSL-JSON (produced by
   Better BibTeX from Zotero) is full of the latter, and oc-basic didn't
   account for this possibility, so I added it.

2. I'm pretty sure the weird date format, "2009 / 09 / 01 /", is not
   something I entered manually. My preferred format would be
   "2009-09-01", which is what oc-basic currently expects. But with
   Zotero, I tend to rely on metadata being filled in automagically, and
   it looks like it's the Wild West out there. So I figured a more
   robust way to extract the year portion might be to match the first
   sequence of 4 digits, instead of splitting the string on "-" and
   taking the car of the resulting list. Of course, both are just
   heuristics, both can fail (each in different ways), and both can
   result in the entire string being used as the year. My suggested
   change is based purely on the fact that the 4-digit heuristic more
   often does the right thing in my own collection of bibliography
   entries.

3. The last change uses value instead of date in the error message,
   which makes it more informative, as shown above -- when encountering
   a problem, the user gets to see the offending entry, instead of just
   nil.

Let me know which of these changes -- if any -- seem acceptable, and
whether they qualify as tinychange, and I'll happily submit a proper
patch :)

Best,

David
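
For illustration, the year-extraction logic discussed in points 1 to 3
can be tried out in isolation with a small standalone sketch. The
function name my/csl-issued-year is hypothetical and not part of
oc-basic, and the sketch assumes the JSON has been parsed with arrays as
lists and keys as symbols. It uses string-match rather than
replace-regexp-in-string, which should give the same result for the
first run of four digits:

```emacs-lisp
;; Hypothetical helper (not part of oc-basic): extract the year from the
;; alist VALUE of a CSL-JSON `issued' field.  Assumes JSON arrays were
;; read as lists and object keys as symbols.
(defun my/csl-issued-year (value)
  "Return the year found in VALUE, the alist of a CSL-JSON `issued' field.
Array dates yield a number, string dates yield a string."
  (let ((date (or (alist-get 'date-parts value)
                  (alist-get 'raw value)
                  (alist-get 'literal value))))
    (cond
     ;; `date-parts' looks like ((YEAR MONTH DAY)): take the first number.
     ((consp date)
      (caar date))
     ;; String dates: take the first run of four digits; if there is
     ;; none, fall back to the whole string.
     ((stringp date)
      (if (string-match "[0-9]\\{4\\}" date)
          (match-string 0 date)
        date))
     ;; Report the whole entry, not just the (nil) date, on failure.
     (t
      (error "Unknown CSL-JSON date format: %S" value)))))

;; Example calls with the inputs mentioned above:
(my/csl-issued-year '((literal . "2009 / 09 / 01 /")))
(my/csl-issued-year '((raw . "2009-09-01")))
(my/csl-issued-year '((date-parts (2009 9 1))))
```

The four-digit heuristic still has failure modes of its own: a string
such as "12345678" would yield "1234", and years below 1000 are not
matched at all, in which case the whole string is returned.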