Text to Speech Synthesis in Celebrity’s Voice

Ajinkya P. Gaddime; Dhananjay P. Mane; Ruchita K. Vehale; Vaishnavi S. Khawale; D. G. Bhalke

doi:10.18090/samriddhi.v12iS2.6

Published Nov 30, 2020

Ajinkya P. Gaddime

Electronics and Telecommunication Department, All India Shri Shivaji Memorial Society's College of Engineering, Pune, Maharashtra, India

Dhananjay P. Mane

Electronics and Telecommunication Department, All India Shri Shivaji Memorial Society's College of Engineering, Pune, Maharashtra, India

Ruchita K. Vehale

Electronics and Telecommunication Department, All India Shri Shivaji Memorial Society's College of Engineering, Pune, Maharashtra, India

Vaishnavi S. Khawale

Electronics and Telecommunication Department, All India Shri Shivaji Memorial Society's College of Engineering, Pune, Maharashtra, India

D. G. Bhalke

Electronics and Telecommunication Department, All India Shri Shivaji Memorial Society's College of Engineering, Pune, Maharashtra, India

Abstract

This paper proposes a text-to-speech synthesis system that uses a neural network architecture to generate speech directly from text in a celebrity’s voice. The system consists of a recurrent sequence-to-sequence prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model that acts as a vocoder to generate time-domain waveforms from those spectrograms. We evaluate the impact of using mel spectrograms as the conditioning input to WaveNet instead of linguistic features, duration, and F0. We further show that this compact acoustic intermediate representation allows a significant reduction in the size of the WaveNet architecture.
Using this technique, we modulate the output of the vocoder according to the frequency and pitch of a specific celebrity. A database of prerecorded voice is collected for the unit selection method of concatenative synthesis. The work includes creating a voice database of an Indian celebrity, then clustering, indexing, and synthesizing it to produce voice output corresponding to the input text. We also address text normalization, which covers abbreviations, acronyms, and linguistic analysis. The system outputs phonemic features such as vowel length, vowel height, frontness, consonant voicing, consonant poi, and position in the syllable and word.
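
The text normalization step mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abbreviation table, the function names `expand_token` and `normalize_text`, and the rule that an all-caps word is an acronym read letter by letter are all assumptions made for illustration.

```python
import re

# Hypothetical abbreviation table; the paper does not list its entries.
ABBREVIATIONS = {
    "dr.": "doctor",
    "mr.": "mister",
    "st.": "street",
    "etc.": "et cetera",
}


def expand_token(token: str) -> str:
    """Expand one whitespace-delimited token into speakable words."""
    lowered = token.lower()
    if lowered in ABBREVIATIONS:
        return ABBREVIATIONS[lowered]
    # Separate trailing punctuation so "TTS," is still seen as an acronym.
    match = re.fullmatch(r"([A-Za-z]+)([.,!?;:]*)", token)
    if match:
        word, trailing = match.groups()
        # Assumption: an all-caps word of two or more letters is an
        # acronym and is read letter by letter.
        if len(word) >= 2 and word.isupper():
            return " ".join(word) + trailing
    return token


def normalize_text(text: str) -> str:
    """Normalize a sentence token by token."""
    return " ".join(expand_token(tok) for tok in text.split())


if __name__ == "__main__":
    print(normalize_text("Dr. Rao demoed the TTS system."))
    # prints: doctor Rao demoed the T T S system.
```

A table lookup of this kind only covers known abbreviations; a fuller front end would also need context to resolve ambiguous cases such as "St." (street or saint), which is where the linguistic analysis described in the abstract would come in.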

How to Cite

Gaddime, A., Mane, D., Vehale, R., Khawale, V., & Bhalke, D. (2020). Text to Speech Synthesis in Celebrity’s Voice. SAMRIDDHI : A Journal of Physical Sciences, Engineering and Technology, 12(SUP 2), 27-30. https://doi.org/10.18090/samriddhi.v12iS2.6

Issue

Vol 12 No SUP 2 (2020): SAMRIDDHI: A Journal of Physical Sciences, Engineering and Technology

Section

Research Article

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
