From: briangpowell
Date: Sat, 9 Sep 2023 17:20:28 -0400
Subject: Re: Suggestions for Text-To-Speech (TTS) from Org sources?
To: emacs-orgmode@gnu.org
In-Reply-To: <87msxv1atn.fsf@wi.uni-muenster.de>
List-Id: "General discussions about Org-mode."

I've turned OrgMode files into audio desktops

It was pretty simple

Just find the code that reveals what an icon is when you hover over it & pipe it to some text-to-speech engine & then on to usual routes
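
As a rough sketch of the "pipe it to some text-to-speech engine" step, the Python snippet below sends a string to an external TTS command on standard input. The helper name `speak` and the default command are assumptions for illustration only, not the setup described above; `espeak-ng --stdin` is one engine that reads its text from stdin, and any other command that does the same can be swapped in.

```python
import subprocess
from typing import Sequence

# Assumption: espeak-ng is installed. Replace this with any command
# that reads the text to speak from standard input.
DEFAULT_TTS_CMD = ("espeak-ng", "--stdin")


def speak(text: str, cmd: Sequence[str] = DEFAULT_TTS_CMD) -> int:
    """Pipe `text` to the TTS command `cmd` via stdin; return its exit code."""
    result = subprocess.run(list(cmd), input=text, text=True, check=False)
    return result.returncode
```

Calling `speak("It was pretty simple")` would hand the sentence to the configured engine; how the text is obtained in the first place (tooltip text, Org headings, and so on) is left open here.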

On Sat, Sep 9, 2023 at 2:06 PM Jens Lechtenboerger <lechten@wi.uni-muenster.de> wrote:
Dear all,

does someone here produce audio via Text-To-Speech (TTS) from Org
sources?  I plan to do that in the context of emacs-reveal to
generate voice-over for reveal.js presentations, with open questions
[1] concerning my initial, experimental approach.

Currently, I like the default model of Coqui-AI TTS [2] and
Microsoft SpeechT5 [3] best.  Any suggestions for free and open TTS
implementations that produce even better results?  Other models of
Coqui-AI?  The solution should work without GPU support, which seems
to rule out Suno Bark [4].

The above models do not pronounce numbers/digits, and they fail to
pronounce most acronyms.  In a preprocessing step I could replace
those.  I use preprocessing anyways to get rid of Org markup that
might confuse the language models.  Anyone here who did that
already?  Maybe gruut [5] in conjunction with SSML [6] handling?
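
A minimal sketch of what such a preprocessing step could look like, assuming plain Python with regular expressions only (it uses neither gruut nor SSML). It strips a small subset of Org markup, replaces acronyms from a lookup table (spelling unknown ones letter by letter), and reads numbers digit by digit. The function names and the `ACRONYMS` table are hypothetical, and the regular expressions cover only common cases; they are not a full Org parser.

```python
import re
from typing import Dict, Optional

# Hypothetical acronym table; extend it for your own material.
ACRONYMS = {"TTS": "text to speech"}

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def strip_org_markup(text: str) -> str:
    """Remove a small subset of Org syntax (not a complete parser)."""
    # Drop keyword lines such as "#+TITLE: ...".
    text = re.sub(r"^[ \t]*#\+.*\n?", "", text, flags=re.MULTILINE)
    # Drop leading heading stars.
    text = re.sub(r"^\*+[ \t]+", "", text, flags=re.MULTILINE)
    # [[target][description]] -> description; [[target]] -> target.
    text = re.sub(r"\[\[([^\]]+)\]\[([^\]]+)\]\]", r"\2", text)
    text = re.sub(r"\[\[([^\]]+)\]\]", r"\1", text)
    # *bold*, /italic/, _underline_, =verbatim=, ~code~, +strike+ -> inner text.
    text = re.sub(r"(?<!\w)([*/_=~+])(\S(?:.*?\S)?)\1(?!\w)", r"\2", text)
    return text


def prepare_for_tts(text: str, acronyms: Optional[Dict[str, str]] = None) -> str:
    """Strip markup, then expand acronyms and digits into speakable words."""
    table = ACRONYMS if acronyms is None else acronyms
    text = strip_org_markup(text)
    # Known acronyms come from the table; unknown ones are spelled out.
    text = re.sub(r"\b[A-Z]{2,}\b",
                  lambda m: table.get(m.group(), " ".join(m.group())), text)
    # Read every number digit by digit, e.g. "42" -> "four two".
    text = re.sub(r"[0-9]+",
                  lambda m: " ".join(DIGITS[int(d)] for d in m.group()), text)
    # Collapse runs of spaces and tabs left over from the substitutions.
    return re.sub(r"[ \t]+", " ", text).strip()
```

For example, `prepare_for_tts("* Intro\nSee [[https://example.org][the site]] for *free* TTS on 2 slides.\n")` returns `"Intro\nSee the site for free text to speech on two slides."`. Reading numbers digit by digit is a deliberate simplification; a proper number-to-words conversion would need a dedicated library.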

Any other suggestions?

Best wishes
Jens

[1] https://gitlab.com/oer/emacs-reveal/-/issues/20
[2] https://github.com/coqui-ai/TTS/
[3] https://huggingface.co/microsoft/speecht5_tts
[4] https://github.com/suno-ai/bark
[5] https://github.com/rhasspy/gruut
[6] https://www.w3.org/TR/speech-synthesis11/
