Messenger - Vol. 2, No. 3, Page 13
Summer 1993

On Research

Transitional sound improves synthetic voice

World-renowned physicist Stephen William Hawking, like many people with amyotrophic lateral sclerosis, a progressive disease of the nervous system, uses a special device to communicate in a synthetic voice. In the past, unfortunately, people like Hawking have been forced to choose between a pleasant, pre-recorded voice with a very limited vocabulary or a jarring, computerized voice that could say anything.

To address this problem, researchers at the Center for Applied Science and Engineering in Rehabilitation, a joint program of the University of Delaware and the A.I. du Pont Institute, developed a way to create natural-sounding voices that don't limit what the speaker can say. The new technology works so well, in fact, that it recently was licensed to ACS Technologies of Pittsburgh, Pa., and Echo Speech Products of Carpinteria, Calif., which plan to market it to the public.

Thanks to the efforts of center researchers and the University's patent and research office, improved synthetic voices will soon be available for men, women and children who speak either English or Spanish, reports center director Richard Foulds, who also is a research professor in the Department of Computer and Information Sciences.

How does the new speech synthesis technology work? Traditionally, Foulds explains, "natural-sounding" synthetic voices have been based on audio recordings of words that can be strung together in a limited number of sentences. Telephone companies often use such recordings to let a caller know, for instance, when a number has been changed. More flexible voice systems use synthetic versions of the sounds of the 44 phonemes in the English language. A phoneme is a class of closely related speech sounds that are represented linguistically by the same symbol. By typing each phoneme of a word, such as c-a-t for "cat," a speaker can say anything.
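The phoneme-typing approach described above can be sketched in a few lines of Python. Everything here is illustrative: the tiny phoneme table and its toy "audio" values are assumptions for the sketch, not part of the center's actual system, which would store recorded or modeled sound for each phoneme.

```python
# Hypothetical lookup table: phoneme symbol -> audio samples (toy numbers
# standing in for real waveform data).
PHONEME_AUDIO = {
    "k": [0.1, 0.2],
    "ae": [0.3, 0.4],
    "t": [0.5, 0.6],
}

def synthesize(phonemes):
    """Concatenate the stored audio for each phoneme, in order.

    This shows why phoneme concatenation is flexible (any word can be
    spelled out from ~44 units) but sounds choppy: nothing models the
    transition between one sound and the next.
    """
    samples = []
    for p in phonemes:
        samples.extend(PHONEME_AUDIO[p])
    return samples

# "cat" typed as its three phonemes:
print(synthesize(["k", "ae", "t"]))
```

The abrupt joins between the per-phoneme segments are exactly the "hard on the ears" quality the article goes on to describe.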
Synthetic speech based on phonemes is versatile, Foulds points out, but it's so hard on the ears that it can create emotional barriers to one-on-one communication. This is because human speech actually flows from one sound to the next. Between two phonemes like "ee" and "oo," there is a transitional sound known to researchers as a diphone.

"Saying that each of the 44 phonemes is completely separate from each other is like saying that dance is a bunch of static positions," Foulds explains. "Obviously, what makes dance beautiful is the flowing of one movement into the next. Speech is the same way."

Using diphones instead of phonemes, center researchers produced natural-sounding voices without compromising vocabulary. Diphones were first extracted from audio recordings, then "digitized," or converted into a form a computer could process.

In the future, Foulds says, researchers would like to generate a more diverse menu of voices, as well as individualized voices for those with amyotrophic lateral sclerosis and others who slowly lose their speech. The center is also developing a method to let non-speakers control the tone and inflection of a synthetic voice: expressing anger, perhaps, by typing in red letters or underlining certain passages.

Better synthetic speech is just one of the successful technologies produced at the center, which is housed at the A.I. du Pont Institute in Wilmington, Del. Also under way are computer programs that speed the rate of synthetic speech by 30 to 40 percent, camera systems that some day could help a non-speaker type by simply gazing at a keyboard, "intuitive" robots, and programs to transmit sign language pictures over standard telephone lines.

With private, state and federal support, the center is removing barriers that prevent individuals with disabilities from achieving their full potential.

-Ginger Pinholster
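The diphone idea in the article can also be sketched briefly. A diphone unit spans the transition from the middle of one sound into the middle of the next, so a concatenative synthesizer looks up sound-pairs rather than single phonemes. The function name, the "#" silence marker, and the example phonemes below are illustrative assumptions, not details of the center's system.

```python
def to_diphones(phonemes, silence="#"):
    """Map a phoneme sequence to the diphone units a concatenative
    synthesizer would look up.

    Each unit covers the transition between adjacent sounds, including
    the transitions from and to silence at the word edges; joining audio
    at these points preserves the flow between sounds.
    """
    seq = [silence] + list(phonemes) + [silence]
    return [(seq[i], seq[i + 1]) for i in range(len(seq) - 1)]

# "cat" as phonemes k-ae-t becomes four transition units:
print(to_diphones(["k", "ae", "t"]))
# → [('#', 'k'), ('k', 'ae'), ('ae', 't'), ('t', '#')]
```

Because every cut falls inside a steady portion of a sound rather than at a boundary between sounds, the joins are far less audible, which is the gain over plain phoneme concatenation the article describes.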