From: Jens Lechtenboerger
Subject: Suggestions for Text-To-Speech (TTS) from Org sources?
Mail-Followup-To: emacs-orgmode@gnu.org
Date: Sat, 9 Sep 2023 20:05:40 +0200

Dear all,

Does anyone here produce audio from Org sources via Text-To-Speech (TTS)? I plan to do that in the context of emacs-reveal to generate voice-over for reveal.js presentations. Open questions concerning my initial, experimental approach are collected in [1].

Currently, I like the default model of Coqui-AI TTS [2] and Microsoft SpeechT5 [3] best. Can you suggest free and open TTS implementations that produce even better results? Other models of Coqui-AI? The solution should work without GPU support, which seems to rule out Suno Bark [4].

The above models do not pronounce numbers/digits, and they fail to pronounce most acronyms. I could replace those in a preprocessing step; I use preprocessing anyway to get rid of Org markup that might confuse the models. Has anyone here done that already?
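A preprocessing step of the kind described above could look roughly like the following Python sketch. It is not the emacs-reveal implementation: the acronym table, its spoken forms, and the function names (strip_org_markup, int_to_words, preprocess) are hypothetical, and the regular expressions only cover headings, links, and inline emphasis rather than full Org syntax.

```python
import re

# Hypothetical acronym table; the spoken forms are illustrative only and
# would need tuning for each TTS model.
ACRONYMS = {"TTS": "T T S", "GPU": "G P U", "SSML": "S S M L"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def int_to_words(n: int) -> str:
    """Spell out a non-negative integer in English (below one million)."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("" if rest == 0 else "-" + ONES[rest])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + ("" if rest == 0 else " " + int_to_words(rest))
    if n < 1_000_000:
        thousands, rest = divmod(n, 1000)
        return int_to_words(thousands) + " thousand" + ("" if rest == 0 else " " + int_to_words(rest))
    # Larger numbers: fall back to reading them digit by digit.
    return " ".join(ONES[int(d)] for d in str(n))


def strip_org_markup(text: str) -> str:
    """Remove a few common Org constructs that a TTS model might read aloud."""
    # Heading stars at the start of a line: "** Title" -> "Title".
    text = re.sub(r"^\*+[ \t]+", "", text, flags=re.MULTILINE)
    # Links: [[target][description]] -> description, [[target]] -> target.
    text = re.sub(r"\[\[([^\]]+)\]\[([^\]]+)\]\]", r"\2", text)
    text = re.sub(r"\[\[([^\]]+)\]\]", r"\1", text)
    # Emphasis markers around a phrase on one line, e.g. *bold*, /italic/,
    # =verbatim=, ~code~, +strike+, _underline_ (only when preceded by
    # whitespace or line start, so URLs and identifiers stay untouched).
    text = re.sub(r"(^|\s)([*/=~+_])(\S(?:[^\n]*?\S)?)\2(?=[\s.,;:!?)]|$)",
                  r"\1\3", text, flags=re.MULTILINE)
    return text


def preprocess(text: str) -> str:
    """Drop Org markup, expand known acronyms, and spell out integers."""
    text = strip_org_markup(text)
    acronym_re = r"\b(" + "|".join(map(re.escape, ACRONYMS)) + r")\b"
    text = re.sub(acronym_re, lambda m: ACRONYMS[m.group(1)], text)
    text = re.sub(r"\b\d+\b", lambda m: int_to_words(int(m.group())), text)
    return text
```

For example, preprocess("* Intro\nTTS runs on 3 slides without a *GPU*.") yields "Intro\nT T S runs on three slides without a G P U." Decimals, dates, ordinals, and negative numbers are not handled by this sketch ("3.5" becomes "three.five"), so such cases would need additional rules or a dedicated text normalizer.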
Maybe gruut [5] in conjunction with SSML [6] handling? Any other suggestions?

Best wishes
Jens

[1] https://gitlab.com/oer/emacs-reveal/-/issues/20
[2] https://github.com/coqui-ai/TTS/
[3] https://huggingface.co/microsoft/speecht5_tts
[4] https://github.com/suno-ai/bark
[5] https://github.com/rhasspy/gruut
[6] https://www.w3.org/TR/speech-synthesis11/