tokenizers.bpe: Byte Pair Encoding Text Tokenization

Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library, which is a fast implementation of Byte Pair Encoding (BPE).
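
To illustrate what Byte Pair Encoding does, here is a minimal sketch in Python. It is not the package's R interface and not the optimized algorithm used by 'YouTokenToMe'; it is a naive version under stated assumptions. The function names `learn_bpe`, `merge_pair`, and `encode` are hypothetical, and so are the character-level starting symbols and the lexicographic tie-breaking rule. The idea: start from single characters, repeatedly merge the most frequent adjacent pair of symbols, and record each merge as a rule that can later be replayed on new text.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of training words."""
    # Each distinct word starts as a tuple of characters, weighted by frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs over the whole (weighted) vocabulary.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties are broken lexicographically
        # (an assumption made here only to keep the result deterministic).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        # Apply the new rule to every word before the next round.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split `word` into subwords by replaying the merge rules in learned order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

With a small corpus such as `["low", "low", "low", "lower", "lowest"]` and two merges, frequent character sequences collapse into single subword units while rarer suffixes stay split into characters. A production tokenizer additionally handles word boundaries, unknown symbols, and a fixed target vocabulary size, none of which this sketch attempts.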

Version: 0.1.0
Depends: R (≥ 2.10)
Imports: Rcpp (≥ 0.11.5)
LinkingTo: Rcpp
Published: 2019-08-02
Author: Jan Wijffels [aut, cre, cph] (R wrapper), BNOSAC [cph] (R wrapper), [cph], Gregory Popovitch [ctb, cph] (Files at src/parallel_hashmap (Apache License, Version 2.0)), The Abseil Authors [ctb, cph] (Files at src/parallel_hashmap (Apache License, Version 2.0)), Ivan Belonogov [ctb, cph] (Files at src/youtokentome (MIT License))
Maintainer: Jan Wijffels <jwijffels at>
License: MPL-2.0
NeedsCompilation: yes
Materials: README
In views: NaturalLanguageProcessing
CRAN checks: tokenizers.bpe results


Reference manual: tokenizers.bpe.pdf
Package source: tokenizers.bpe_0.1.0.tar.gz
Windows binaries: r-devel:, r-devel-gcc8:, r-release:, r-oldrel:
OS X binaries: r-release: tokenizers.bpe_0.1.0.tgz, r-oldrel: tokenizers.bpe_0.1.0.tgz

