Recommended link/lecture (extremely well-written)
1. ASR with HMM
John-Paul Hosom, CSLU, Oregon Health & Science Univ, click here
2. Statistical Data Mining tutorials:
– Clustering with Gaussian Mixtures
– Probability Densities in Data Mining
– Learning Gaussian Bayes Classifiers
– Learning with Maximum Likelihood
Andrew W. Moore, CS, CMU, click here
What is the difference between a Lexicon, a Dictionary, a Grammar, and a Language Model (LM)?
Summarized from here, here, here, here
For Continuous Digit Recognition (CDR), according to the OGI CSLU Toolkit Package documentation,
Grammar: finite-state task grammar (word network)
For LVCSR (Large Vocabulary Continuous Speech Recognition),
– Task Grammar –> Language Model: combines words into utterances
(A Task Grammar is a simple kind of Language Model, used for much simpler tasks, e.g., CDR)
– Dictionary –> Lexicon (pronunciation): combines phonemes into words
It's really late for this, but I'm trying to note down the errors I ran into while troubleshooting.
Also trying out making a table while I'm at it, hehe 😀
Collection of Errors in HTK
||CreateInsts: Unknown label q||The phoneme q found in the data is not present in the phoneme list||
||GetChkedInt: Integer Arg Required for m option||When passing arguments to HInit, the -m option was not followed by any integer||
||HInit: Too Few Observation Sequences||The number of observation sequences (i.e., utterances) is too small. On checking, the minimum number of utterances is 3||
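The "Unknown label" error in the first row can be caught before running HInit by comparing the labels in the transcriptions against the phoneme list. Below is a minimal sketch in Python, not part of HTK. The function names `label_of` and `find_unknown_labels` are hypothetical, and the sketch assumes each label line is either `label` or `start end label`, with MLF header, file name, and terminator lines skipped.

```python
def label_of(line):
    """Extract the label name from one label-file line (assumed format:
    optional integer start/end times, followed by the label name)."""
    fields = line.split()
    if not fields:
        return None
    # Skip MLF header, quoted file-name lines, and the "." terminator.
    if fields[0] == "#!MLF!#" or fields[0].startswith('"') or fields[0] == ".":
        return None
    i = 0
    # Skip up to two leading integer fields (start and end times).
    while i < len(fields) and i < 2 and fields[i].lstrip("-").isdigit():
        i += 1
    return fields[i] if i < len(fields) else None


def find_unknown_labels(label_lines, phoneme_list):
    """Return the set of labels used in the data but missing from the phoneme list."""
    known = set(phoneme_list)
    unknown = set()
    for line in label_lines:
        label = label_of(line)
        if label is not None and label not in known:
            unknown.add(label)
    return unknown


# Hypothetical data: the label q appears in the transcription
# but not in the phoneme list.
lines = ["#!MLF!#", '"*/utt1.lab"', "0 1000000 a", "1000000 2000000 q", "sil", "."]
phonemes = ["a", "sil"]
missing = find_unknown_labels(lines, phonemes)
print(missing)
```

Any label reported here would need to be added to the phoneme list or corrected in the transcription.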
Recognizer: effects a mapping between sequences of speech vectors and the desired underlying symbol sequences.
Two problems make this very difficult.
Firstly, the mapping from symbols to speech is not one-to-one, since different underlying symbols can give rise to similar speech sounds. Furthermore, there are large variations in the realised speech waveform due to speaker variability, mood, environment, etc.
Secondly, the boundaries between symbols cannot be identified explicitly from the speech waveform. Hence, it is not possible to treat the speech waveform as a sequence of concatenated static patterns.
Isolated word recognition
Objective: To overcome problem of not knowing the word boundary locations
Means: the speech waveform corresponds to a single underlying symbol (e.g. word) chosen from a fixed vocabulary.
Limitation: this simpler problem is somewhat artificial; real-life speech is continuous.
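The isolated word recognition problem can then be regarded as that of computing arg max_i P(wi|O), where O is the observed sequence of speech vectors and wi is the i-th word in the vocabulary.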
How to compute this probability: Bayes' rule:
P(wi|O) = P(O|wi) P(wi) / P(O)
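As a rough illustration of Bayes' rule above, here is a small Python sketch that picks the most probable word from per-word likelihoods and priors. It is not HTK code: the function name `recognize` and all numbers are made up for illustration, and the likelihoods P(O|wi) are assumed to be already available as log values (in practice they would come from the word models).

```python
import math


def recognize(log_likelihoods, priors):
    """Return (best_word, posteriors) using P(wi|O) = P(O|wi) P(wi) / P(O).

    log_likelihoods: dict word -> log P(O|wi)  (assumed given)
    priors:          dict word -> P(wi)
    """
    # Numerator in the log domain: log P(O|wi) + log P(wi)
    log_joint = {w: log_likelihoods[w] + math.log(priors[w]) for w in log_likelihoods}
    # Denominator: log P(O) = log sum_i P(O|wi) P(wi), computed stably (log-sum-exp)
    m = max(log_joint.values())
    log_evidence = m + math.log(sum(math.exp(v - m) for v in log_joint.values()))
    posteriors = {w: math.exp(v - log_evidence) for w, v in log_joint.items()}
    best_word = max(posteriors, key=posteriors.get)
    return best_word, posteriors


# Hypothetical values for a three-word vocabulary.
log_likelihoods = {"one": -120.0, "two": -118.5, "three": -125.0}
priors = {"one": 0.5, "two": 0.25, "three": 0.25}
best_word, posteriors = recognize(log_likelihoods, priors)
print(best_word, posteriors)
```

Since P(O) is the same for every word, the arg max could also be taken over the numerator alone; the normalisation is included here only so that the posteriors sum to 1.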
prior probabilities P(wi),