We’re pleased to announce a new release of the CMU ARK Twitter Part-of-Speech
Tagger, version 0.3.
- The new version is much faster (40x) and more accurate (89.2 -> 92.8) than
the previous version.
- We have also released new POS-annotated data, including a dataset of one
tweet for each of 547 days.
- We have made available large-scale word clusters from unlabeled Twitter data
(217k words, 56m tweets, 847m tokens).
Tools, data, and a new technical report describing the release are available at: