1 Why amcat
1.1 What is amcat?
The amcat-suite consists of several packages for text analysis. It has two main goals: to standardize text analysis tasks with easy-to-use software, and to offer quality-of-life features for power users. Its core consists of the following packages, which are usually used together:
- amcat4 takes care of document storage, provides fine-grained data access control (e.g., to restrict access to the parts of a dataset that are copyrighted, proprietary or (privacy-)sensitive) and supports fast queries using Elasticsearch.
- middlecat provides authentication methods to support the fine-grained data access control built into amcat4 (e.g., to make datasets available for which data owners have restricted full-text access).
- amcat4client offers a web interface that makes it easy to query documents from amcat4, share data with collaborators or the public, and present your corpora to stakeholders, the community or the public.
- amcat4apiclient provides bindings to manage and query corpora from the Python programming language via amcat4's REST API.
- amcat4r provides bindings to manage and query corpora from the R programming language via amcat4's REST API.
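The idea behind fine-grained access control can be illustrated with a minimal Python sketch: depending on a user's access level, a document is returned in full, with its restricted fields removed, or not at all. The `Role` levels, the `visible_document` function and the choice of `text` as the restricted field are hypothetical and only meant for illustration; they are not amcat4's actual API. In the setup described above, amcat4 enforces access control while middlecat handles authentication.

```python
from enum import IntEnum
from typing import Optional


class Role(IntEnum):
    """Hypothetical access levels, ordered from least to most privileged."""
    NONE = 0
    METAREADER = 1  # may see metadata, but not the restricted fields
    READER = 2      # may see every field, including the full text


# Hypothetical choice of which fields count as restricted (e.g. copyrighted full text).
RESTRICTED_FIELDS = {"text"}


def visible_document(doc: dict, role: Role) -> Optional[dict]:
    """Return the part of `doc` that a user with `role` may see, or None for no access."""
    if role < Role.METAREADER:
        return None  # no access to this dataset at all
    if role >= Role.READER:
        return dict(doc)  # full access, including the full text
    # Metadata-only access: drop every restricted field.
    return {field: value for field, value in doc.items() if field not in RESTRICTED_FIELDS}


doc = {"title": "Example article", "date": "2023-01-01", "text": "Full text ..."}
print(visible_document(doc, Role.METAREADER))
# prints {'title': 'Example article', 'date': '2023-01-01'}
```

Ordering the roles as an `IntEnum` keeps the checks simple: a higher level always includes everything a lower level may see.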
These core packages can be extended by powerful add-ons that provide additional features:
- annotinder lets you manually annotate documents with an appealing web interface (which also looks great on mobile!) and can be deployed to the web. There is also an R client!
- nlpipe can be used for advanced document pre-processing and machine learning tasks. You can't share your full text? How about letting nlpipe apply word embeddings to your corpus and sharing the embeddings instead of the full text?
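The idea of sharing embeddings instead of text can be sketched in a few lines of Python. This is a minimal illustration, not nlpipe's actual method: it assumes each document is represented by the average of its word vectors (a common, simple technique), and the toy `WORD_VECTORS` table and the `document_embedding` function are hypothetical.

```python
from typing import Dict, List

# Toy, hypothetical word vectors; a real pipeline would use pre-trained embeddings.
WORD_VECTORS: Dict[str, List[float]] = {
    "economy": [0.9, 0.1],
    "growth": [0.8, 0.2],
    "football": [0.1, 0.9],
}
DIM = len(next(iter(WORD_VECTORS.values())))


def document_embedding(text: str) -> List[float]:
    """Represent `text` as the average vector of its known words."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return [0.0] * DIM  # no known words: fall back to a zero vector
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


corpus = {
    "doc1": "Economy growth slows",
    "doc2": "Football season starts",
}
# Only these vectors would be shared; the texts themselves stay with the data owner.
shared = {doc_id: document_embedding(text) for doc_id, text in corpus.items()}
print(shared)
```

Averaging discards word order, but the vectors still carry information about the content, so sharing them is a trade-off rather than a guarantee.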
You can use many of these packages individually, or you can create a full setup, which would look something like this:
This might seem overly complicated, given that all of these features are also available in other software packages, some of which you may already be familiar with. However, the main reason that functions are split between different software modules is to make development easier and more transparent. If you are only interested in amcat's capabilities, you can use it on our servers or conveniently install everything at once through Docker.
If you want to learn more about the project, have a look at the about chapter.