Count-vector-based distributional semantic approaches
todo ah ah ah
Download some data (the next step expects it at data/text8):
./download_data.sh
Count the words!
./count.sh data/text8 | head -n 20
1061396 the
593677 of
416629 and
411764 one
372201 in
325873 a
316376 to
264975 zero
250430 nine
192644 two
183153 is
131815 as
125285 eight
118445 for
116710 s
115789 five
114775 three
112807 was
111831 by
109510 that
ah ah ah
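
The contents of count.sh are not shown here, so the following is only a minimal sketch of how a word count like the one above could be produced with standard Unix tools. The function name `count_words` is hypothetical, and the sketch assumes the corpus is plain text whose tokens are separated by whitespace. Ties between equally frequent words are broken alphabetically, which may differ from what count.sh does.

```shell
# Hypothetical stand-in for count.sh (not the repository's actual script).
# Prints "count word" lines, most frequent word first.
count_words() {
  # 1. Put every whitespace-separated token on its own line.
  # 2. Drop empty lines left over from leading whitespace.
  # 3. Sort so identical tokens are adjacent, then count each run.
  # 4. Order by descending count, breaking ties alphabetically.
  # 5. Strip the padding that uniq -c puts in front of each count.
  tr -s '[:space:]' '\n' < "$1" \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | LC_ALL=C sort -k1,1nr -k2,2 \
    | awk '{ print $1, $2 }'
}
```

Under those assumptions, `count_words data/text8 | head -n 20` would print a frequency list in the same "count word" format as the output above.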