Studying and Simulating Human Abilities in Speech Recognition
Motivation
Current speech recognition systems usually fail in real environments. While most of the speech community is working on more complex signal processing methods (wavelets, detecting and canceling reverberation noise, ...), we believe that a fuzzy method should be able to solve the problem. Speech coding has shown that a few bits are sufficient to code speech (e.g. MELP). This implies that current features (e.g. filter bank coefficients) are sufficient as far as intelligibility is concerned. On the other hand, existing fuzzy methods have not improved the recognition rate. We are therefore trying to tackle the speech recognition problem and to propose a new fuzzy recognition method at the same time.
Summary
In this project we want to build a system based on the following notions:
  1. Instance-based learning. We do not intend to normalize features in order to eliminate the differences between male and female speakers; instead, we try to learn each type of speech separately.
  2. Fuzzy/possibility theory. Our experiments show that phonemes are fuzzy sets. In other words, some instances of two different phonemes are indistinguishable, and it is the context that determines which phoneme is correct. With this approach we try to model phonemes better.
  3. Unsupervised and supervised learning. These methods are used in exactly the opposite places from an HMM-based system: speech segmentation is performed using unsupervised methods, whilst features are weighted using a supervised method.
  4. Be precise, but not more than humans. Speech coding has shown that a few bits are sufficient to transmit speech signals as far as intelligibility is concerned.
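As a rough illustration of notions 1 and 2, the following Python sketch stores a few labelled instances, computes a fuzzy membership degree for each phoneme, and lets a context possibility resolve the ambiguity. It is not the project's actual method: the instances, the feature weights, the exponential similarity function, and the min/max combination (a common choice in possibility theory) are all assumptions made for this example.

```python
import math

# Hypothetical stored instances: (feature vector, phoneme, speaker type).
# Instances are kept per speaker type instead of being normalized away.
# All numbers are invented for illustration.
INSTANCES = [
    ((1.0, 2.0), "b", "male"),
    ((1.2, 2.1), "p", "male"),
    ((3.0, 0.5), "b", "female"),
    ((5.0, 5.0), "a", "female"),
]

# Hypothetical per-feature weights; in the scheme described above these
# would be learned with a supervised method.
WEIGHTS = (1.0, 0.5)


def similarity(x, y, weights=WEIGHTS):
    """Map a weighted squared Euclidean distance to a similarity in (0, 1]."""
    d2 = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))
    return math.exp(-d2)


def memberships(x, instances=INSTANCES):
    """Membership of x in each phoneme: the best match over stored instances."""
    degrees = {}
    for features, phoneme, _speaker in instances:
        s = similarity(x, features)
        degrees[phoneme] = max(degrees.get(phoneme, 0.0), s)
    return degrees


def decide(x, context_possibility):
    """Combine acoustic membership and context possibility with min,
    then return the phoneme with the highest combined degree."""
    acoustic = memberships(x)
    combined = {
        p: min(m, context_possibility.get(p, 0.0)) for p, m in acoustic.items()
    }
    return max(combined, key=combined.get), combined


if __name__ == "__main__":
    x = (1.1, 2.05)  # lies midway between a "b" instance and a "p" instance
    print(memberships(x))
    print(decide(x, {"b": 1.0, "p": 0.2})[0])
    print(decide(x, {"b": 0.2, "p": 1.0})[0])
```

In this toy setup the query point has almost the same membership in "b" and "p", so the acoustic evidence alone cannot separate them; only the (hypothetical) context possibility determines the outcome.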
People

Advisor: Saeed Bagheri Shouraki (mail: bagheri@sharif.edu)

Advisor: Hossein Sameti (mail: sameti@sharif.edu)

Student: Sayed Kamal Aldin Ghiathi (mail: ghiathi@mehr.sharif.edu)

Publications:
   
Presentations:
Studying and simulating humans' abilities in speech recognition
  What is not meant by fuzzy
Speech segmentation
References:
1. Zadeh L. A., "Fuzzy sets as a basis for a theory of possibility", Fuzzy Sets and Systems, Vol. 1, No. 1, pp. 3-28, 1978.
2. Dubois D., Prade H., Possibility Theory. New York, London, 1988.
3. Hermansky H., "Should recognizers have ears?", Proc. ESCA Tutorial and Research Workshop on Robust Speech Recognition for Unknown Communication Channels, pp. 1-10, France, 1997.
4. Young S., Odell J., Ollason D., Valtchev V., Woodland P., The HTK Book, HTK 2.1 Manual, 1997.
5. Yu H.-J., Oh Y.-H., "Fuzzy Expert System for Continuous Speech Recognition", Expert Systems with Applications, Vol. 9, No. 1, pp. 81-89, 1995.
6. De Mori R., Computer Models of Speech Using Fuzzy Algorithms. New York: Plenum Press, 1983.
7. Oppizzi O., Fournier D., Gilles P., Meloni H., "A fuzzy acoustic-phonetic decoder for speech recognition", Proc. ICSLP, 1996.
8. Yoneyama K., "Segmentation Strategies for Spoken Language Recognition: Evidence from Semi-Bilingual Japanese Speakers of English", Proc. ICSLP, 1996.
9. Deller J. R., Proakis J. G., Hansen J. H. L., Discrete-Time Processing of Speech Signals. Macmillan Publishing Company, p. 409, 1993.
10. Shafer G., A Mathematical Theory of Evidence. Princeton, NJ: Princeton University Press, 1976.
Links to other people working on similar ideas:
Steven Greenberg, Berkeley University.
Research Group on Artificial Intelligence, Attila University, Szeged, Hungary.
SPREX Co.
Mark Steedman, University of Edinburgh.