We’re pleased to announce a new release of the CMU ARK Twitter Part-of-Speech
Tagger, version 0.3.
- The new version is much faster (40x) and more accurate (89.2 -> 92.8) than
before.
- We also have released new POS-annotated data, including a dataset of one
tweet for each of 547 days.
- We have made available large-scale word clusters from unlabeled Twitter data
(217k words, 56m tweets, 847m tokens).
Tools, data, and a new technical report describing the release are available at:
www.ark.cs.cmu.edu/TweetNLP.
0100100 a 1111100101110 111100000011, Brendan