emacs-orgmode@gnu.org archives
* Help with a (query) replacement
@ 2022-11-12 14:42 Ypo
  2022-11-12 15:10 ` Juan Manuel Macías
  2022-11-12 15:31 ` Alain.Cochard
  0 siblings, 2 replies; 10+ messages in thread
From: Ypo @ 2022-11-12 14:42 UTC (permalink / raw)
  To: Org-mode


Hi

I am copy-pasting e-books into org-mode to read and study them.

Usually, words come hyphenated, like "ato- mized", which I want to
transform into "atomized".

I am trying query-replace, but I am starting to think that it is
not the right tool for this job.
I tried "query-replace [a-z]-", but I don't know how to keep the
letter before the "-" from being replaced.

What would you advise?


Best regards


* Re: Help with a (query) replacement
  2022-11-12 14:42 Help with a (query) replacement Ypo
@ 2022-11-12 15:10 ` Juan Manuel Macías
  2022-11-12 15:12   ` Ypo
  2022-11-12 15:23   ` Juan Manuel Macías
  2022-11-12 15:31 ` Alain.Cochard
  1 sibling, 2 replies; 10+ messages in thread
From: Juan Manuel Macías @ 2022-11-12 15:10 UTC (permalink / raw)
  To: Ypo; +Cc: Org-mode

Ypo writes:

> Hi
>
> I am copy-pasting e-books into org-mode to read and study them. 
>
> Usually, words come hyphenated, like "ato- mized", which I want to
> transform into "atomized".
>
> I am trying query-replace, but I am starting to think that it is
> not the right tool for this job.
> I tried "query-replace [a-z]-", but I don't know how to keep the
> letter before the "-" from being replaced.
>
> What would you advise?

I think it will be more practical for you to use pandoc:

This command converts an EPUB file to Org:

pandoc my-epub.epub -o my.epub.org

(https://pandoc.org)

You can also install calibre and convert your epubs to plain text from
there.

Best regards,

Juan Manuel 



* Re: Help with a (query) replacement
  2022-11-12 15:10 ` Juan Manuel Macías
@ 2022-11-12 15:12   ` Ypo
  2022-11-12 16:04     ` Juan Manuel Macías
  2022-11-12 15:23   ` Juan Manuel Macías
  1 sibling, 1 reply; 10+ messages in thread
From: Ypo @ 2022-11-12 15:12 UTC (permalink / raw)
  To: Juan Manuel Macías; +Cc: Org-mode


Thanks, Juan Manuel.

I normally study from PDF books. Their typography is effectively
"hardcoded", so I think some post-processing in Org mode is needed.



* Re: Help with a (query) replacement
  2022-11-12 15:10 ` Juan Manuel Macías
  2022-11-12 15:12   ` Ypo
@ 2022-11-12 15:23   ` Juan Manuel Macías
  2022-11-12 15:25     ` Ypo
  1 sibling, 1 reply; 10+ messages in thread
From: Juan Manuel Macías @ 2022-11-12 15:23 UTC (permalink / raw)
  To: Ypo; +Cc: Org-mode

Juan Manuel Macías writes:

> I think it will be more practical for you to use pandoc:
>
> This command converts an EPUB file to Org:
>
> pandoc my-epub.epub -o my.epub.org
>
> (https://pandoc.org)
>
> You can also install calibre and convert your epubs to plain text from
> there.

PS: And you can also open epubs in Emacs with the nov.el package
(https://github.com/wasamasa/nov.el) and copy whatever text you want
from there.



* Re: Help with a (query) replacement
  2022-11-12 15:23   ` Juan Manuel Macías
@ 2022-11-12 15:25     ` Ypo
  0 siblings, 0 replies; 10+ messages in thread
From: Ypo @ 2022-11-12 15:25 UTC (permalink / raw)
  To: Juan Manuel Macías; +Cc: Org-mode


Hmm, so if I copy-paste from nov.el into an Org buffer, the
formatting would be preserved... interesting.

Thanks



* Re: Help with a (query) replacement
  2022-11-12 14:42 Help with a (query) replacement Ypo
  2022-11-12 15:10 ` Juan Manuel Macías
@ 2022-11-12 15:31 ` Alain.Cochard
  2022-11-16 19:20   ` Ypo
  1 sibling, 1 reply; 10+ messages in thread
From: Alain.Cochard @ 2022-11-12 15:31 UTC (permalink / raw)
  To: Ypo; +Cc: Org-mode

Ypo writes on Sat 12 Nov 2022 15:42:
 > Hi
 > 
 > I am copy-pasting e-books into org-mode to read and study them.
 > 
 > Usually, words come hyphenated, like "ato- mized", which I want to
 > transform into "atomized".
 > 
 > I am trying query-replace, but I am starting to think that it is
 > not the right tool for this job.
 > I tried "query-replace [a-z]-", but I don't know how to keep the
 > letter before the "-" from being replaced.

I don't really understand the thing with e-books but as for
transforming "ato- mized" (or "ato-mized") into "atomized", how about
something like

query-replace-regexp

\([a-z]\)-[[:blank:]]*\([a-z]\)

\1\2
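
To apply the same substitution to a whole file outside Emacs, the pattern above can be sketched with sed. This is only an illustration under stated assumptions: the function name `dehyphenate` is made up for this example, and sed's extended syntax writes groups as `( )` where the Emacs regexp uses `\( \)`.

```shell
# Join words split by a hyphen plus optional blanks,
# e.g. "ato- mized" or "ato-mized" -> "atomized".
# Reads stdin, writes stdout. "dehyphenate" is an illustrative name.
dehyphenate() {
  sed -E 's/([a-z])-[[:blank:]]*([a-z])/\1\2/g'
}

# Example: both spellings from the thread are joined.
printf '%s\n' 'ato- mized and ato-mized' | dehyphenate
```

Unlike query-replace-regexp, this replaces every match without asking, so genuine compounds such as "well-known" would be joined too. The interactive command lets you skip those one by one.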


-- 
EOST (École et Observatoire des Sciences de la Terre) 
ITE (Institut Terre & Environnement) | alain.cochard@unistra.fr
5 rue René Descartes   [bureau 110]  | Phone: +33 (0)3 68 85 50 44 
F-67084 Strasbourg Cedex, France     | [ slot available for rent ]




* Re: Help with a (query) replacement
  2022-11-12 15:12   ` Ypo
@ 2022-11-12 16:04     ` Juan Manuel Macías
  2022-11-12 16:29       ` Juan Manuel Macías
  0 siblings, 1 reply; 10+ messages in thread
From: Juan Manuel Macías @ 2022-11-12 16:04 UTC (permalink / raw)
  To: Ypo; +Cc: Org-mode

Ypo writes:

> Thanks, Juan Manuel.
>
> I normally study from PDF books. Their typography is effectively
> "hardcoded", so I think some post-processing in Org mode is needed.

If it's a PDF, then forget what I told you about pandoc, because
pandoc would be of no use here. I thought you were referring to files
in EPUB format, sorry.

In the case of PDFs, I would use pdftotext. It converts the PDF to plain
text and (in theory) removes hyphens from the PDF after conversion. The
resulting plain text is somewhat ugly (page numbers and other elements
are preserved), but if you just want to copy/paste text, I think it's
enough.

The command:

pdftotext my-file.pdf

https://man.archlinux.org/man/pdftotext.1.en

https://en.wikipedia.org/wiki/Pdftotext
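
If some words still come out split across a line break after conversion, a small filter can join them. The following is a minimal sketch, assuming pdftotext is installed and that a line ending in a letter followed by "-" is always a broken word (which is not true for every text). The helper names `join_line_end_hyphens` and `pdf_to_clean_text` are made up for this example.

```shell
# Merge a line that ends in "letter-" with the line that follows it.
# Reads stdin, writes stdout.
join_line_end_hyphens() {
  awk '{
    if ($0 ~ /[A-Za-z]-$/) {
      sub(/-$/, "")      # drop the trailing hyphen
      printf "%s", $0    # no newline, so the next line continues the word
    } else {
      print
    }
  }'
}

# Convert a PDF to text on stdout ("-" as the output file),
# then join any leftover line-end hyphens.
pdf_to_clean_text() {
  pdftotext "$1" - | join_line_end_hyphens
}

# Usage: pdf_to_clean_text my-file.pdf > my-file.txt
```

The filter is kept separate from the conversion so it can also be run on text pasted from elsewhere.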



* Re: Help with a (query) replacement
  2022-11-12 16:04     ` Juan Manuel Macías
@ 2022-11-12 16:29       ` Juan Manuel Macías
  0 siblings, 0 replies; 10+ messages in thread
From: Juan Manuel Macías @ 2022-11-12 16:29 UTC (permalink / raw)
  To: Ypo; +Cc: Org-mode

Juan Manuel Macías writes:

> In the case of PDFs, I would use pdftotext. It converts the PDF to plain
> text and (in theory) removes hyphens from the PDF after conversion. The
> resulting plain text is somewhat ugly (page numbers and other elements
> are preserved), but if you just want to copy/paste text, I think it's
> enough.

And if you don't want to mess with the command line, you can also use
calibre here to convert from PDF to plain text or even Epub (the latter
is better because Epub is a tagged format and then you can have more
control over how to process that, for example by converting it to Org or
Markdown with pandoc). Calibre will do its best to preserve the
structure of the PDF, removing hyphens and other unnecessary elements.
But keep in mind that this process is largely heuristic, and the
conversion is not 100% perfect. However, it works acceptably well.

https://calibre-ebook.com/about
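
For those who do not mind the command line after all, the two-step route described above (PDF to EPUB with calibre, then EPUB to Org with pandoc) can be sketched as follows. It assumes calibre's `ebook-convert` tool and pandoc are both on the PATH. The function name `pdf_to_org` is made up for this example, and the output files are simply written next to the input.

```shell
# PDF -> EPUB with calibre's command-line converter,
# then EPUB -> Org with pandoc. "pdf_to_org" is an illustrative name.
pdf_to_org() {
  ebook-convert "$1" "${1%.pdf}.epub" &&
    pandoc "${1%.pdf}.epub" -o "${1%.pdf}.org"
}

# Usage: pdf_to_org my-file.pdf   (writes my-file.epub and my-file.org)
```

The pandoc step only runs if the calibre conversion succeeds. As noted above, the conversion is heuristic, so the resulting Org file may still need manual cleanup.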



* Re: Help with a (query) replacement
  2022-11-12 15:31 ` Alain.Cochard
@ 2022-11-16 19:20   ` Ypo
  2022-11-16 22:16     ` Alain.Cochard
  0 siblings, 1 reply; 10+ messages in thread
From: Ypo @ 2022-11-16 19:20 UTC (permalink / raw)
  To: alain.cochard; +Cc: Org-mode


Thanks, working!!

I can't see those options in the manual (15.10.4 Query Replace) and the 
help is a little bit obscure for me, a new rabbit hole to study ;D

Thanks!




* Re: Help with a (query) replacement
  2022-11-16 19:20   ` Ypo
@ 2022-11-16 22:16     ` Alain.Cochard
  0 siblings, 0 replies; 10+ messages in thread
From: Alain.Cochard @ 2022-11-16 22:16 UTC (permalink / raw)
  To: Ypo; +Cc: alain.cochard, Org-mode

Ypo writes on Wed 16 Nov 2022 20:20:
 > Thanks, working!!
 > 
 > I can't see those options in the manual (15.10.4 Query Replace)

How about 15.6 (Syntax of Regular Expressions)?





Thread overview: 10+ messages
2022-11-12 14:42 Help with a (query) replacement Ypo
2022-11-12 15:10 ` Juan Manuel Macías
2022-11-12 15:12   ` Ypo
2022-11-12 16:04     ` Juan Manuel Macías
2022-11-12 16:29       ` Juan Manuel Macías
2022-11-12 15:23   ` Juan Manuel Macías
2022-11-12 15:25     ` Ypo
2022-11-12 15:31 ` Alain.Cochard
2022-11-16 19:20   ` Ypo
2022-11-16 22:16     ` Alain.Cochard
