udpipe: Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Besides text parsing, the package also allows you to train annotation models based on 'treebank' data in 'CoNLL-U' format, as described at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functions for commonly used data manipulations on texts enriched with the output of the parser, namely functions and algorithms for collocations, token co-occurrence, document-term matrix handling, term frequency-inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.
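To make the 'CoNLL-U' format mentioned above more concrete, the following is a minimal sketch of how such data could be read into token records. It is written in Python for illustration only and is not the package's R interface. The names `parse_conllu`, `CONLLU_FIELDS` and `SAMPLE` are hypothetical, and the sample sentence is hand-written rather than real parser output. The ten tab-separated columns follow the Universal Dependencies format description.

```python
# The ten columns of a CoNLL-U token line, in the order given by the
# Universal Dependencies format description.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def parse_conllu(text):
    """Parse CoNLL-U text into a list of sentences (hypothetical helper).

    Each sentence is a list of dicts keyed by CONLLU_FIELDS.
    Comment lines (starting with '#') are skipped; blank lines end a sentence.
    """
    sentences, current = [], []
    for line in text.split("\n"):
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level metadata such as '# text = ...' is ignored here.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        current.append(dict(zip(CONLLU_FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hand-written illustrative sample, not output produced by UDPipe.
SAMPLE = (
    "# sent_id = 1\n"
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
)

if __name__ == "__main__":
    for token in parse_conllu(SAMPLE)[0]:
        print(token["id"], token["form"], token["lemma"], token["upos"], token["deprel"])
```

The columns `form`, `lemma`, `upos`, `head` and `deprel` correspond to the tokenization, lemmatization, parts of speech tagging and dependency parsing steps listed in the description. All values are kept as strings, since the `id` column can also hold ranges for multi-word tokens.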

Version: 0.8.11
Depends: R (≥ 2.10)
Imports: Rcpp (≥ 0.11.5), data.table (≥ 1.9.6), Matrix, methods, stats
LinkingTo: Rcpp
Suggests: knitr, rmarkdown, topicmodels, lattice, parallel
Published: 2023-01-06
DOI: 10.32614/CRAN.package.udpipe
Author: Jan Wijffels [aut, cre, cph], BNOSAC [cph], Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic [cph], Milan Straka [ctb, cph], Jana Straková [ctb, cph]
Maintainer: Jan Wijffels <jwijffels at bnosac.be>
License: MPL-2.0
URL: https://bnosac.github.io/udpipe/en/index.html, https://github.com/bnosac/udpipe
NeedsCompilation: yes
Materials: README NEWS
In views: NaturalLanguageProcessing
CRAN checks: udpipe results

Documentation:

Reference manual: udpipe.pdf
Vignettes: UDPipe Natural Language Processing - Annotating text, UDPipe Natural Language Processing - Parallel, UDPipe Natural Language Processing - Model Building, UDPipe Natural Language Processing - Try it out, UDPipe Natural Language Processing - Universe, UDPipe Natural Language Processing - Basic Analytical Use Cases, UDPipe Natural Language Processing - Topic Modelling Use Cases

Downloads:

Package source: udpipe_0.8.11.tar.gz
Windows binaries: r-devel: udpipe_0.8.11.zip, r-release: udpipe_0.8.11.zip, r-oldrel: udpipe_0.8.11.zip
macOS binaries: r-release (arm64): udpipe_0.8.11.tgz, r-oldrel (arm64): udpipe_0.8.11.tgz, r-release (x86_64): udpipe_0.8.11.tgz, r-oldrel (x86_64): udpipe_0.8.11.tgz
Old sources: udpipe archive

Reverse dependencies:

Reverse imports: cleanNLP, corpustools, finnsurveytext, MadanText, MadanTextNetwork, TextForecast
Reverse suggests: aifeducation, BTM, crfsuite, doc2vec, nametagger, pseudobibeR, ruimtehol, text2vec, textplot, textrank, textrecipes, topicmodels.etm, word2vec
Reverse enhances: NLP

Linking:

Please use the canonical form https://CRAN.R-project.org/package=udpipe to link to this page.