Mechatronics and Controls lab - Part A

Ulugbek Akhmedo

My main focus was on identifying, installing, and running open-source software for human voice synthesis and recognition. While voice synthesis is fairly straightforward, voice recognition has many issues.

Voice synthesis involves code (the open-source Festival and Flite projects: http://www.cstr.ed.ac.uk/projects/festival/ and http://www.festvox.org/flite/) along with extensive, sometimes massive, data files that are collections of voice patterns typical of the pronunciation and accent of different English-speaking male and female speakers. Other languages are also available (German, Russian, French, Spanish, etc.). Additional resources let the user submit a short list of typed words to an online tool, which encodes them and returns a few files: a text file that can be included in Festival, and some other files. I am still testing this.
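As an illustration of how the synthesizers can be driven from a script, here is a minimal Python sketch. It assumes the `festival` and `flite` command-line programs are installed and on the PATH, and it relies only on their text-to-speech modes (`festival --tts`, which reads text from standard input, and `flite -t`, which takes the text as an argument). The function names `build_tts_command` and `speak` are hypothetical helpers made up for this example; they are not part of either project.

```python
import shutil
import subprocess


def build_tts_command(engine, text):
    """Return (argv, stdin_text) for speaking `text` with the chosen engine.

    Hypothetical helper for illustration only.
    """
    if engine == "festival":
        # festival --tts reads the text to speak from standard input.
        return ["festival", "--tts"], text
    if engine == "flite":
        # flite -t takes the text to speak as a command-line argument.
        return ["flite", "-t", text], None
    raise ValueError(f"unknown engine: {engine}")


def speak(text, engine="festival"):
    """Speak `text` aloud; return False if the engine is not installed."""
    argv, stdin_text = build_tts_command(engine, text)
    if shutil.which(argv[0]) is None:
        return False
    subprocess.run(argv, input=stdin_text, text=True, check=True)
    return True


if __name__ == "__main__":
    if not speak("Hello, I am the lab robot."):
        print("No speech engine found; install festival or flite first.")
```

Keeping the command construction separate from the call makes it easy to check what would be run without producing any sound.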

Voice recognition is full of problems. One requirement for proper experimentation is a quiet room; a second is that a native English speaker would be the better choice of test speaker. Even so, there may be an issue later: if a different speaker talks to the robot, it may have a hard time understanding the commands. Another issue is that even in ideal conditions recognition accuracy is below 80%, and it is typically lower than 50% for an inexperienced speaker. The open-source software that we will try to use is OpenEars (http://www.politepix.com/openears/), which also requires massive data files to compare the incoming sound patterns with previously recognized patterns. It typically compares from 20,000 to 300,000 variations of pronunciation of combinations of sounds. The process can take from a few seconds to a few minutes to produce text from spoken words.
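The percentages above are quoted without a stated measurement method. One common way to put a number on recognition quality is word-level accuracy, computed from the edit distance between what was said and what the recognizer returned. The sketch below is an assumption about how such a figure could be measured in our tests, not the method behind the numbers quoted here; the function names are hypothetical.

```python
def word_edit_distance(reference, hypothesis):
    """Minimum number of word insertions, deletions, and substitutions
    needed to turn the reference sentence into the hypothesis."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] is the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Fraction of reference words recognized correctly, floored at 0."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference sentence must not be empty")
    errors = word_edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / ref_len)
```

For example, if the speaker says "turn on the lights" and the recognizer returns "turn off the light", two of the four words are wrong, so the word accuracy is 0.5.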

There is another option that people have experimented with: the personal assistant program called JARVIS (modeled after the AI Jarvis from "Iron Man"). It is a free program available on multiple platforms, and it can be freely customized (specifically the profile, voice commands, responses, and more in-depth coding of programs that work in the background). Projects that people have attempted involve home automation, where voice commands turn lights, electronic equipment, and electronic locks on and off, and perform various tasks on a computer, from checking the weather or a calendar to creating files and typing dictated text. With proper coding, software support, and porting to the cloud, it could potentially grow into a very decent personal assistant that resembles AI. I have been trying to expand the list of available Jarvis commands. With a certain level of success, I was able to make it open and close the internet browser as well as give appropriate answers to my questions. Jarvis is not AI, but the background code can be sophisticated enough that it will mimic AI.
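The idea of pairing voice commands with responses can be sketched as a simple lookup table. The Python below is only an illustration of that idea under my own assumptions; it does not reproduce the JARVIS program's actual command format, and the class name, phrases, and replies are all hypothetical.

```python
def normalize(phrase):
    """Lower-case a phrase and collapse repeated whitespace."""
    return " ".join(phrase.lower().split())


class CommandTable:
    """Maps recognized phrases to handlers that return a spoken reply."""

    def __init__(self):
        self._handlers = {}

    def register(self, phrase, handler):
        self._handlers[normalize(phrase)] = handler

    def dispatch(self, heard):
        handler = self._handlers.get(normalize(heard))
        if handler is None:
            # Unknown phrases get a fixed fallback reply.
            return "Sorry, I did not understand that."
        return handler()


# Hypothetical commands; a real handler would also perform the action.
table = CommandTable()
table.register("open the browser", lambda: "Opening the browser.")
table.register("what is your name", lambda: "My name is Jarvis.")
```

Calling `table.dispatch("Open  the Browser")` returns the registered reply because matching ignores case and extra spaces. Exact-phrase matching like this is the simplest possible design; it does nothing to tolerate recognition errors.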

Overall, I learned a lot about ways to manipulate sounds and use Linux/Ubuntu to recognize and synthesize human voice commands.

Further development: I have to search for engineering or, more likely, computer science articles that involve similar research using similar or different tools. Additionally, we have to make sure that the software functions properly and develop a more user-friendly environment. Finally, we should try to link the gesture commands with the voice reproduction. Voice recognition would be the next level of research if the above is successfully accomplished.