From: Juan Manuel Macías
To: Ypo
Cc: Org-mode
Subject: Re: Help with a (query) replacement
Date: Sat, 12 Nov 2022 16:04:01 +0000
Message-ID: <878rkgxiq6.fsf@posteo.net>
References: <4b3f7f60-9983-8a46-7a42-b14644f18347@gmail.com> <87k040jjio.fsf@posteo.net>
List-Id: "General discussions about Org-mode."

Ypo writes:

> Thanks, Juan Manuel.
>
> I normally study using PDF books. Their typography is like
> "hardcoded", so a post-processing using Orgmode is needed, I think.

If it's a PDF, then forget what I told you about pandoc, because pandoc
is of no use here. I thought you were referring to files in epub
format, sorry.

In the case of PDFs, I would use pdftotext. It converts the PDF to plain
text and (in theory) removes end-of-line hyphenation during the
conversion. The resulting plain text is somewhat ugly (page numbers and
other page elements are preserved), but if you just want to copy/paste
text, I think it's enough. The command:

    pdftotext my-file.pdf

https://man.archlinux.org/man/pdftotext.1.en

https://en.wikipedia.org/wiki/Pdftotext
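
If the leftover page numbers or stray hyphens get in the way, a small
script can tidy the output before pasting it into Org. What follows is
only a rough sketch in Python, not something pdftotext provides: the
function names pdf_to_text and clean_text are made up for illustration,
and the cleanup assumes that page numbers sit alone on their own line
and that a hyphen at the end of a line always marks a split word.

```python
import re
import subprocess


def pdf_to_text(pdf_path: str) -> str:
    """Run pdftotext and return its output ("-" sends the text to stdout)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def clean_text(text: str) -> str:
    """Heuristic cleanup; assumes page numbers are alone on a line."""
    # pdftotext separates pages with form feed characters.
    text = text.replace("\f", "\n")
    # Drop lines that contain nothing but digits (probably page numbers).
    lines = [ln for ln in text.split("\n")
             if not re.fullmatch(r"\s*\d+\s*", ln)]
    text = "\n".join(lines)
    # Rejoin words split by a hyphen at a line end: "typo-\ngraphy".
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)
```

Usage would be something like clean_text(pdf_to_text("my-file.pdf")).
Note that the last step also merges genuine compounds that happen to
break at a line end ("well-" / "known" becomes "wellknown"), so the
result still needs a quick read-through.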