From: Juan Manuel Macías
To: Ypo
Cc: Org-mode
Subject: Re: Help with a (query) replacement
Date: Sat, 12 Nov 2022 16:29:52 +0000

Juan Manuel Macías writes:

> In the case of PDFs, I would use pdftotext. It converts the PDF to plain
> text and (in theory) removes hyphens from the PDF after conversion. The
> resulting plain text is somewhat ugly (page numbers and other elements
> are preserved), but if you just want to copy/paste text, I think it's
> enough.

And if you don't want to mess with the command line, you can also use
calibre here to convert from PDF to plain text or even Epub (the latter
is better because Epub is a tagged format and then you can have more
control over how to process that, for example by converting it to Org or
Markdown with pandoc). Calibre will do its best to preserve the
structure of the PDF, removing hyphens and other unnecessary elements.
But keep in mind that this process is largely heuristic, and the
conversion is not 100% perfect. However, it works acceptably well.

https://calibre-ebook.com/about
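
For anyone who would rather script the two routes than run them by hand,
here is a minimal sketch in Python. It only assembles the command lines
for pdftotext, for calibre's command-line converter (ebook-convert) and
for pandoc, and runs them if the tools are on the PATH. The function
names, the file name "book.pdf" and the choice of output file names are
assumptions made for illustration, not something from the thread.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_commands(pdf, target="org"):
    """Build the command lines for both routes without running anything.

    `target` is a pandoc output format name such as "org" or "markdown";
    it is also reused as the output file extension (an illustrative choice).
    """
    pdf = Path(pdf)
    txt = pdf.with_suffix(".txt")
    epub = pdf.with_suffix(".epub")
    out = pdf.with_suffix("." + target)
    return {
        # Route 1: PDF straight to plain text.
        "pdftotext": [["pdftotext", str(pdf), str(txt)]],
        # Route 2: PDF to Epub with calibre, then Epub to Org with pandoc.
        "calibre": [
            ["ebook-convert", str(pdf), str(epub)],
            ["pandoc", "-f", "epub", "-t", target, "-o", str(out), str(epub)],
        ],
    }


def run_route(pdf, route="pdftotext", target="org"):
    """Run every command of the chosen route, stopping at the first failure."""
    for cmd in conversion_commands(pdf, target)[route]:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands for a hypothetical book.pdf instead of running them.
    for route, cmds in conversion_commands("book.pdf").items():
        for cmd in cmds:
            print(route, ":", " ".join(cmd))
```

Calling run_route("book.pdf", "calibre") would leave book.epub and
book.org next to the PDF. Since the conversion is heuristic in both
routes, the result probably still needs a manual pass.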