Name:: BanglaSR
Summary::
This project aims to develop a Speech Recognizer that can understand the spoken units (characters, words and sentences) of the Bangla language. We used the Hidden Markov Model (HMM) technique for pattern classification and also incorporated a stochastic language model into the recognizer. The Hidden Markov Model Toolkit (HTK) is used to develop and implement the Speech Recognizer.
Details::
Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. The recognized words can be the final result, as in applications such as command & control, data entry, and document preparation. Research in this area has attracted a great deal of attention over the past five decades. During that time many technologies have been applied in an effort to raise performance to marketplace standards so that users can benefit in a variety of ways. Among these, the combination of the hidden Markov model (HMM) and a stochastic language model has been found to give high performance. To date, most research on recognizing Bangla speech has used classifiers based on artificial neural networks (ANNs). No work has yet been reported that uses the dynamic time warping (DTW) technique or an HMM-based classifier for Bangla, and none of the existing work includes a language model.
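HMM-based classification for the isolated-word case can be sketched as follows. Each vocabulary word has its own HMM, and the forward algorithm computes the likelihood of an utterance under each model. The word whose model gives the highest likelihood is the recognized word. This is a minimal illustration under stated assumptions, not the project's HTK implementation. It assumes discrete observation symbols, as if each speech frame had already been vector-quantized into a codebook index. HTK-based recognizers typically use continuous-density HMMs over acoustic feature vectors such as MFCCs instead. The word labels `ek` and `dui`, the `DiscreteHMM` and `recognize` names, and all probabilities are hypothetical toy values.

```python
import math
from typing import Dict, List, Sequence

NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    """log(p), mapping zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


class DiscreteHMM:
    """HMM whose states emit discrete symbols (e.g. vector-quantized frames)."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[List[float]]):
        self.log_start = [safe_log(p) for p in start]
        self.log_trans = [[safe_log(p) for p in row] for row in trans]
        self.log_emit = [[safe_log(p) for p in row] for row in emit]

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm: log P(obs | model), summed over all state paths."""
        n = len(self.log_start)
        alpha = [self.log_start[s] + self.log_emit[s][obs[0]] for s in range(n)]
        for o in obs[1:]:
            alpha = [
                logsumexp([alpha[r] + self.log_trans[r][s] for r in range(n)])
                + self.log_emit[s][o]
                for s in range(n)
            ]
        return logsumexp(alpha)


def recognize(obs: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Isolated-word decision: pick the word whose model best explains obs."""
    return max(models, key=lambda word: models[word].log_likelihood(obs))


# Hypothetical 2-state left-to-right word models over a 3-symbol codebook.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
models = {
    "ek": DiscreteHMM([1.0, 0.0], LEFT_TO_RIGHT,
                      [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
    "dui": DiscreteHMM([1.0, 0.0], LEFT_TO_RIGHT,
                       [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]),
}

print(recognize([0, 0, 2, 2], models))  # prints "ek"
```

The computation works in log space to avoid numerical underflow. The product of many small probabilities over a long utterance quickly falls below floating-point range.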
The area of automatic speech recognition (ASR) is divided into isolated speech recognition (ISR) and continuous speech recognition (CSR). An ISR system requires the speaker to pause briefly between words, whereas a CSR system does not. In isolated word recognition, the assumption is that the speech to be recognized comprises a single word or phrase. That word or phrase is recognized as a complete entity, with no explicit knowledge of, or regard for, its phonetic content. The notion of ISR can be extended to connected speech recognition if the vocabulary is small and the co-articulation problem that arises between words is solved. In continuous speech recognition, continuously uttered sentences are recognized, so it is very important to use sophisticated linguistic knowledge.
The most appropriate units for successful recognition depend on the type of recognition and on the size of the vocabulary. Various units for reference templates/models, from phonemes to words, have been studied. When words are used as units, word recognition can be expected to be highly accurate. However, this requires more memory and more computation, since a separate model is needed for every word. Using phonemes as units does not greatly increase memory or computation requirements, because a small set of phoneme models is shared by every word in the vocabulary. In our research project we used the word as the unit for ISR and the phoneme as the unit for CSR.
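Continuous speech is where linguistic knowledge matters most. There, a stochastic language model scores how plausible a word sequence is, and the decoder combines that score with the acoustic score. The description does not specify the form of the language model, so this sketch assumes a simple bigram model with add-k smoothing. The toy corpus of romanized Bangla sentences, the `BigramLM` class, and the `combined_score` helper with its `lm_weight` parameter are hypothetical illustrations.

```python
import math
from collections import Counter
from typing import Iterable, Sequence

BOS, EOS = "<s>", "</s>"  # sentence-boundary markers


class BigramLM:
    """Bigram language model with add-k smoothing (k=1 gives Laplace smoothing)."""

    def __init__(self, sentences: Iterable[Sequence[str]], k: float = 1.0):
        self.k = k
        self.history_counts: Counter = Counter()  # c(prev) as a bigram history
        self.bigram_counts: Counter = Counter()   # c(prev, cur)
        vocab = {EOS}  # every token that can follow a history
        for sentence in sentences:
            tokens = [BOS, *sentence, EOS]
            vocab.update(sentence)
            for prev, cur in zip(tokens, tokens[1:]):
                self.history_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1
        self.vocab_size = len(vocab)

    def log_prob(self, prev: str, cur: str) -> float:
        """Smoothed log P(cur | prev); unseen pairs still get nonzero mass."""
        numerator = self.bigram_counts[(prev, cur)] + self.k
        denominator = self.history_counts[prev] + self.k * self.vocab_size
        return math.log(numerator / denominator)

    def sentence_log_prob(self, words: Sequence[str]) -> float:
        """log P(words), including the transitions from <s> and into </s>."""
        tokens = [BOS, *words, EOS]
        return sum(self.log_prob(p, c) for p, c in zip(tokens, tokens[1:]))


def combined_score(acoustic_log_lik: float, words: Sequence[str],
                   lm: BigramLM, lm_weight: float = 1.0) -> float:
    """Rank a hypothesis by acoustic score plus a weighted language-model score."""
    return acoustic_log_lik + lm_weight * lm.sentence_log_prob(words)


# Hypothetical toy corpus of romanized Bangla sentences.
corpus = [["ami", "bhat", "khai"], ["ami", "jol", "khai"], ["tumi", "bhat", "khao"]]
lm = BigramLM(corpus)

# Same words, different order: the model prefers the order seen in training.
seen = lm.sentence_log_prob(["ami", "bhat", "khai"])
scrambled = lm.sentence_log_prob(["khai", "bhat", "ami"])
print(seen > scrambled)  # prints True
```

The language-model score is weighted because acoustic and language-model log probabilities usually sit on very different scales. The weight is typically tuned on held-out data.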
The goal of this Bangla speech recognition project is to improve the interaction between users and computers. This will help to overcome the literacy barrier and hence encourage people to use the technology through interactive voice response. The outcome of this project will be usable for command & control and various data entry applications.
Team::
Status::
· An implemented version of the Isolated Speech Recognizer is ready for release.
· A prototype of the Continuous Speech Recognizer has been implemented; however, experiments on the training procedure and language models are ongoing.
Research Scope::
· Experiment on training issues for Continuous Speech Recognition.
· Experiment on Language Models for Continuous Speech Recognition.
· Experiment with other available techniques and tools.
· Move towards Audio-Visual Speech Recognition.
Development Scope::
· Implement the existing versions using a different language.
· Implement speech recognizers for domain-specific applications.
Timeline:: Not Defined.