04/10/2006 Entry: "Tom Baker sings."
Tom Baker sings. Kind of. British Telecom got the old Dr. Who to record 11,000 words and made the service available to subscribers. I guess it's for reading text messages out loud, but it's been used for general wackiness since its release. I'm digging him doing Bohemian Rhapsody right now. I wish BT would release this freely so Baker's voice could become the new default voice for text-to-speech, as opposed to the old Hawking-like DECtalk.
Replies: 2 comments
The article is phrased a bit misleadingly; it makes it sound as though the resulting speech is simply pasted together from a bunch of recorded words, which isn't the case. The training session does generate a table of redundant morphemes, but more importantly it uses Hidden Markov Models (evocative of a misanthropic midget, for RAW fans) to build a relational model that infers how the speaker would apply timbre and (most importantly) prosody (inflection and intonation) to the input text, based on how they enunciated the training text. For example, if an Eastern Canadian trained the synthesizer, it would take into account that, usually, if a diphthong comes before a voiceless consonant, its pitch will be slightly higher than it would be otherwise.
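To make the "learn it from the training text" idea concrete, here is a toy sketch. It is not a Hidden Markov Model and has nothing to do with BT's actual system; it is a much simpler stand-in that just averages the pitch a speaker used in each context and plays that average back. The phone set, the function names, and the pitch numbers are all made up for illustration, and the rule itself is simply the example above taken at face value.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, deliberately tiny set of voiceless consonants.
VOICELESS = {"p", "t", "k", "f", "s"}

def context_of(next_phone):
    """Classify a diphthong by the consonant that follows it."""
    return "before_voiceless" if next_phone in VOICELESS else "elsewhere"

def train(samples):
    """Average the observed pitch per context.

    samples: iterable of (next_phone, pitch_hz) pairs, one per
    diphthong the speaker produced in the training text.
    """
    by_context = defaultdict(list)
    for next_phone, pitch_hz in samples:
        by_context[context_of(next_phone)].append(pitch_hz)
    return {ctx: mean(values) for ctx, values in by_context.items()}

def predict_pitch(model, next_phone):
    """Look up the learned pitch for a diphthong in new input text."""
    return model[context_of(next_phone)]

# Made-up training observations, not real measurements.
samples = [("t", 124.0), ("s", 126.0), ("d", 118.0), ("z", 120.0)]
model = train(samples)
```

With these invented samples, `predict_pitch(model, "k")` falls back on what the speaker did before "t" and "s", even though "k" never appeared in training. A real system would model far more context than the next consonant, but the principle of generalizing from the speaker's own habits is the same.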
It also captures subtle personal characteristics, such as the droll way that Tom Baker often treats the last syllable of multisyllabic words when they appear at the end of a sentence, giving them a bit of sustain and then a sharp falling tone.
F*cking marvelous!
Posted by Larry Mudd @ 04/10/2006 10:36 AM CST
Yeah, I notice some post-processing weirdness with the song lyrics that end with a question mark. They raise the pitch of his voice just a little bit. Sounds like BT went all out for this.
Posted by lowbot @ 04/10/2006 11:38 PM CST