Speech recognition with hidden Markov models (HMM)

Recommended link/lecture (extremely well-written)
1. ASR with HMM
John-Paul Hosom, CSLU, Oregon Health & Science Univ

2. Statistical Data Mining tutorials:
– Clustering with Gaussian Mixtures
– Probability Densities in Data Mining
– Learning Gaussian Bayes Classifiers
– Learning with Maximum Likelihood
– others
Andrew W. Moore, CS, CMU


HMM Terminology

What is the difference between Lexicon, Dictionary, Grammar, and Language Model (LM)?

Summarized from several online sources.

For Continuous Digit Recognition (CDR), according to the OGI CSLU Toolkit package documentation:
Grammar: finite-state task grammar (word network)
Lexicon: pronunciation/dictionary

For LVCSR (Large Vocabulary Continuous Speech Recognition):
– Task Grammar –> Language Model: combines words into utterances
(A Task Grammar is a simple kind of Language Model, used for much simpler tasks, e.g., CDR)
– Dictionary –> Lexicon (pronunciation): combines phones into words
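
To make the split concrete, here is a minimal Python sketch of the two pieces for a toy digit task. Everything in it is made up for illustration: the three-word vocabulary, the phone symbols, and the names LEXICON, GRAMMAR, is_allowed and to_phones are not taken from HTK or the CSLU Toolkit. The lexicon maps each word to its pronunciation, and the task grammar is a word network saying which word may follow which.

```python
# Lexicon (pronunciation dictionary): word -> phone sequence.
# The phone symbols are illustrative, not an official phone set.
LEXICON = {
    "one": ["w", "ah", "n"],
    "two": ["t", "uw"],
    "three": ["th", "r", "iy"],
}

# Task grammar as a word network (adjacency list): after START any word
# may come, and after any word another word or END may come.
WORDS = list(LEXICON)
GRAMMAR = {"START": list(WORDS)}
for word in WORDS:
    GRAMMAR[word] = WORDS + ["END"]


def is_allowed(words):
    """Return True if the word sequence is a START-to-END path in the network."""
    node = "START"
    for word in words:
        if word not in GRAMMAR.get(node, []):
            return False
        node = word
    return "END" in GRAMMAR.get(node, [])


def to_phones(words):
    """Expand an utterance (word sequence) into phones using the lexicon."""
    return [phone for word in words for phone in LEXICON[word]]


if __name__ == "__main__":
    utterance = ["two", "one"]
    print(is_allowed(utterance))  # grammar level: is this utterance permitted?
    print(to_phones(utterance))   # lexicon level: how are the words pronounced?
```

For LVCSR the hand-written network would be replaced by a statistical language model, while the lexicon keeps the same role.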

A Collection of HTK Errors

It is rather late to start, but I am trying to note down the errors I ran into while troubleshooting.
Also trying my hand at making a table while I am at it, hehe 😀


Error | Explanation and analysis
+7321 CreateInsts: Unknown label q | Phoneme q, found in the data, is not in the phoneme list
+5021 GetChkedInt: Integer Arg Required for m option | When passing arguments to HInit, the -m option was not followed by any integer
+2121 HInit: Too Few Observation Sequences [2] | There are too few observation sequences (utterances). On checking, the minimum number of utterances is 3
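
Two of these errors (+7321 and +2121) could probably be caught before running HInit at all. Below is a rough Python sketch of such a pre-flight check. It does not call HTK; it only works on labels already loaded into memory. The function names are hypothetical, and the default minimum of 3 is just the value observed above, not something taken from the HTK documentation.

```python
def find_unknown_labels(utterance_labels, phone_list):
    """Return the labels that occur in the data but not in the phone list.

    An unknown label is what would trigger something like
    '+7321 CreateInsts: Unknown label q'.
    """
    known = set(phone_list)
    seen = {label for labels in utterance_labels for label in labels}
    return sorted(seen - known)


def has_enough_sequences(utterance_labels, minimum=3):
    """Check the number of utterances against an assumed minimum (3 here)."""
    return len(utterance_labels) >= minimum


if __name__ == "__main__":
    phones = ["a", "k", "s"]                   # toy phone list
    data = [["k", "a", "s"], ["s", "a", "q"]]  # two toy utterances
    print(find_unknown_labels(data, phones))   # ['q']
    print(has_enough_sequences(data))          # False
```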

The fundamentals of HTK

Recognizer: a system that effects a mapping between sequences of speech vectors and the wanted underlying symbol sequences.

Two problems make this very difficult.
Firstly, the mapping from symbols to speech is not one-to-one since different underlying symbols can give rise to similar speech sounds. Furthermore, there are large variations in the realised speech waveform due to speaker variability, mood, environment, etc.
Secondly, the boundaries between symbols cannot be identified explicitly from the speech waveform. Hence, it is not possible to treat the speech waveform as a sequence of concatenated static patterns.

Isolated word recognition

Objective: to overcome the problem of not knowing the word boundary locations.

Means: assume the speech waveform corresponds to a single underlying symbol (e.g. a word) chosen from a fixed vocabulary.
Limitation: this simpler problem is somewhat artificial; real-life speech is continuous.

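The isolated word recognition problem can then be regarded as that of computing

arg max_i { P(w_i | O) }

where w_i is the i-th vocabulary word and O is the observed sequence of speech vectors.

How to compute this probability: Bayes' Rule:
P(w_i | O) = P(O | w_i) P(w_i) / P(O)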
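
As a quick numeric check of the rule, here is a small Python sketch with made-up numbers. The two-word vocabulary, the likelihood values and the function names are all assumptions for illustration. P(O) is computed by summing P(O|w)P(w) over the vocabulary, which assumes the utterance is exactly one of the vocabulary words. Since P(O) is the same for every word, it does not change which word wins the arg max.

```python
def posteriors(likelihoods, priors):
    """Apply Bayes' rule: P(w|O) = P(O|w) P(w) / P(O) for every word w."""
    # P(O) by total probability over the (assumed closed) vocabulary.
    evidence = sum(likelihoods[w] * priors[w] for w in likelihoods)
    return {w: likelihoods[w] * priors[w] / evidence for w in likelihoods}


def recognise(likelihoods, priors):
    """Return the word with the highest posterior probability."""
    post = posteriors(likelihoods, priors)
    return max(post, key=post.get)


if __name__ == "__main__":
    likelihoods = {"yes": 0.02, "no": 0.005}  # made-up values of P(O|w)
    priors = {"yes": 0.5, "no": 0.5}          # made-up values of P(w)
    print(posteriors(likelihoods, priors))
    print(recognise(likelihoods, priors))     # yes
```

In a real recognizer the likelihoods P(O|w) would come from the word models rather than being typed in by hand.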

prior probabilities P(w_i),