The Institute of Statistical Mathematics, Tokyo

$Id: index.html,v 1.3 2021/10/24 03:03:13 daichi Exp $

bhmm.py is a simple Python implementation of a Bayesian (discrete) hidden Markov model (HMM).

It is written mainly for educational and research purposes, and implements
standard forward filtering-backward sampling (FFBS, the Bayesian version of the
forward-backward algorithm; Scott (2002)) as well as direct Gibbs sampling, in Python.
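
For reference, FFBS computes normalized forward probabilities by dynamic programming and then draws the hidden state sequence backward from the exact posterior. The following is a minimal sketch of the technique for fixed parameters, not the actual code in hmm.py; the function name and NumPy interface are illustrative assumptions.

```python
import numpy as np

def ffbs(obs, trans, emit, init):
    """Forward filtering-backward sampling for a discrete HMM.

    obs   : sequence of observed symbol ids
    trans : (K, K) state transition probabilities
    emit  : (K, V) emission probabilities p(v | k)
    init  : (K,)  initial state distribution

    Returns one hidden state sequence drawn from p(z_1..z_T | obs),
    as in Scott (2002).
    """
    T, K = len(obs), len(init)
    alpha = np.zeros((T, K))
    # forward filtering: alpha[t, k] = p(z_t = k | obs[0..t])
    alpha[0] = init * emit[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
        alpha[t] /= alpha[t].sum()
    # backward sampling: draw z_T, then z_t given z_{t+1}
    z = np.zeros(T, dtype=int)
    z[-1] = np.random.choice(K, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        p = alpha[t] * trans[:, z[t + 1]]
        z[t] = np.random.choice(K, p=p / p.sum())
    return z
```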

- bhmm.py-0.3.tar.gz
- Added a "-G" option to use Gibbs sampling instead of FFBS (see the sketch below).

If you need to run it under Python 2.x, please use bhmm.py-0.1.tar.gz.

- bhmm.py (main driver)
- hmm.py (Bayesian HMM module)
- alice.{dat,txt,lex} (sample data)
- alice.{zz,emit,trans} (sample output of HMM)
- flatten.awk (sample script)
- index.html (this file)

```
% ./bhmm.py -h
bhmm.py: Bayesian HMM in Python.
usage: % bhmm.py OPTIONS train model
OPTIONS
 -G         use Gibbs sampling instead of dynamic programming
 -K states  number of states in HMM
 -N iters   number of MCMC iterations
 -a alpha   Dirichlet hyperparameter on transitions (default 0.1)
 -b beta    Dirichlet hyperparameter on emissions (default 0.01)
 -h         displays this help
$Id: bhmm.py,v 1.8 2021/10/24 02:02:42 daichi Exp $
```

train is a data file: each line consists of a sequence of integers starting from 0, which usually corresponds to a sequence of word ids in a sentence.
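
For example, a train file might look like the following (these word ids are made up for illustration):

```
0 1 2 3 4 5
6 1 7 8
2 9 10 3 5
```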

model is the basename of the output files for the learned parameters.

- **model**.zz : learned hidden states at the end of the MCMC iterations.
- **model**.trans : (K+1)x(K+1) state transition matrix, where K is the number of hidden states. Note that state K+1 is a special state that corresponds to BOS/EOS.
- **model**.emit : VxK emission matrix for p(v|k), where V is the number of words in the vocabulary.
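
As an illustration only (not part of bhmm.py), and assuming the output matrices are saved as plain whitespace-separated text (verify against your own output), they could be inspected with NumPy; the file names refer to the sample output below.

```python
import numpy as np

# a minimal sketch: assumes alice.trans and alice.emit are plain
# whitespace-separated text matrices
trans = np.loadtxt('alice.trans')  # (K+1) x (K+1); the last state is BOS/EOS
emit = np.loadtxt('alice.emit')    # V x K; emit[v, k] = p(v | k)

K = trans.shape[0] - 1
print('most likely transition from each state:')
print(trans[:K, :K].argmax(axis=1))
print('most probable word id for each state:')
print(emit.argmax(axis=0))
```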

```
% bhmm.py -K 7 -N 400 alice.dat alice
% paste =(flatten.awk alice.zz) =(flatten.awk alice.txt) > alice.states
% for n in `seq 0 6`; do
    echo $n
    awk "\$1==$n" alice.states | sort | uniq -c | sort -nr | head -25 |
    awk '{printf("%4s %s\n",$1,$3)}' > alice.$n
  done
```
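
Equivalently, a short Python script could replace the paste/awk pipeline above; this is just a sketch, and it assumes alice.zz and alice.txt flatten to the same number of whitespace-separated tokens (one state id per word).

```python
from collections import Counter

# count (state, word) pairs directly instead of paste/awk
states = open('alice.zz').read().split()
words = open('alice.txt').read().split()
counts = Counter(zip(states, words))

for n in range(7):
    print('state %d' % n)
    pairs = [(c, w) for (s, w), c in counts.items() if s == str(n)]
    for c, w in sorted(pairs, reverse=True)[:25]:
        print('%4d %s' % (c, w))
```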

Top 25 words for each state (count word):

| state | top 25 words (count word) |
|---|---|
| 0 | 1642 the 631 a 162 her 96 his 63 this 59 no 58 my 57 its 57 an 49 their 40 very 37 your 37 some 37 one 31 any 20 another 14 that 12 every 12 alice's 10 great 8 which 8 these 8 our 8 four 7 those |
| 1 | 678 and 257 as 170 but 150 that 115 so 96 if 82 what 79 when 75 then 70 or 54 i'm 51 how 50 for 48 of 47 well 45 it's 41 oh 40 why 38 before 34 that's 33 now 32 two 31 no 25 up 25 round |
| 2 | 729 to 466 of 362 in 341 said 212 at 180 with 168 on 103 for 72 about 70 like 67 into 65 all 58 by 51 not 50 quite 50 down 46 be 40 very 40 over 36 from 35 after 34 such 31 looking 26 without 26 upon |
| 3 | 357 was 178 had 121 said 94 is 85 were 83 went 77 could 65 would 62 know 61 don't 56 thought 56 began 48 did 45 looked 45 got 44 must 43 are 40 came 38 have 38 all 37 never 35 can 30 think 29 replied 28 might |
| 4 | 121 little 68 queen 61 king 58 time 57 turtle 56 way 56 mock 55 hatter 55 gryphon 49 head 48 voice 47 rabbit 43 mouse 40 other 39 dormouse 38 duchess 37 tone 35 cat 34 thing 33 very 33 march 31 hare 30 white 30 door 29 moment |
| 5 | 541 she 410 i 277 it 261 you 193 and 161 alice 130 they 119 he 75 there 57 who 53 that 48 what 31 i've 30 we 30 i'll 28 very 25 your 24 which 20 this 19 then 19 not 17 how 12 it's 10 you'd 10 they're |
| 6 | 253 it 225 alice 117 out 102 be 88 you 88 them 86 her 83 herself 83 again 79 all 75 up 73 off 68 me 63 that 63 not 59 see 51 this 50 one 50 go 47 down 46 come 44 much 44 just 43 him 42 have |

daichi<at>ism.ac.jp
Last modified: Sun Oct 24 12:04:28 2021