2 Getting started

Will change soon

This is work in progress. Passages highlighted in red are likely to change soon.

In this chapter, we show you how to set up the data layer and front-end of the amcat suite.

amcat instance after following this chapter

In the Data Layer section, it is sufficient to follow one of the sub-sections. We explain how you can run amcat:

on our servers, which we recommend for testing purposes;
through a Docker image, which we recommend for most people who want to conduct a research project and/or share data online;
or install amcat directly on your system, which we only recommend for advanced users, who want a customised setup.

In the Frontend section, start with amcat4client, which provides a React web interface to query data. After that, you can install either the R or the Python client.

2.1 Data Layer

2.1.1 Run on our servers

Coming soon…

2.1.2 Setup through Docker

Why do we use Docker for installation?

Functionally, Docker containers are a cross-platform installation format that makes it trivially easy to install software packages on Linux, MacOS and Windows without needing users to deal with dependency issues or installation quirks on different systems.¹ A side effect is that we can easily develop amcat for all operating systems at once and you can be sure that we do not fall behind on developing amcat for your operating system of choice.

If you have never used Docker before, the first step is to install the infrastructure on your system. Head over to the Docker website to get Docker Desktop or the Docker Engine and Docker Compose to use Docker from the command line. We do not really need Docker Desktop, but it comes with both the Docker Engine and Docker Compose, which makes installation easier if you do not use a package manager (which many Windows and MacOS users do not).

To install the amcat data layer, you should use our Docker Compose file. You can get it from here (save the file as docker-compose.yml).

We wrote the file to contain sensible defaults for testing it locally. If you plan to work with amcat for a research project and/or plan to update the images in the future, you should have a look at the customization options. Otherwise you can continue.

Customization

The current default docker-compose.yml looks like this:

version: "3.8"
services:
  web_server:
    image: ccsamsterdam/ngincat:4.0.4
    build: ./ngincat
    container_name: ngincat
    restart: unless-stopped
    networks:
      - amcat-net
    environment:
      - amcat4_client=http://amcat4client:3000
      - amcat4_host=http://amcat4:5000/
    ports:
      - 80:80 # [local port]:[container port]
    depends_on:
      - "web_client"
      - "api"
  web_client:
    image: ccsamsterdam/amcat4client:4.0.4
    build: ./amcat4client
    container_name: amcat4client
    restart: unless-stopped
    networks:
      - amcat-net
    environment:
      # this can be changed later, it is just the suggested default
      - amcat4_host=http://localhost/amcat
    depends_on:
      - "api"
  api:
    image: ccsamsterdam/amcat4:4.0.4
    build: ./amcat4
    container_name: amcat4
    restart: unless-stopped
    networks:
      - amcat-net
    environment:
      # note that these take precedence over values set in `amcat4 config`
      - amcat4_elastic_host=elastic7:9200
      - amcat4_host=http://localhost/amcat
    depends_on:
      - "db"
  db:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.9
    container_name: elastic7
    restart: unless-stopped
    # for security reasons, the database is only exposed to the other containers in the amcat-net network
    # If you want to be able to access it locally, uncomment the following two lines
    # ports:
    # - 9200:9200
    networks:
      - amcat-net
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms4g -Xmx4g"
      - xpack.security.enabled=false
    # your database should have a folder on the host machine to permanently store data
    # run: `mkdir -p /path/to/elastic-data && sudo chown -R 1000:1000 /path/to/elastic-data`
    # to make it accessible for docker. Then uncomment the lines below (using a real path)
    # volumes:
    #   - /path/to/elastic-data:/usr/share/elasticsearch/data # [local path]:[container path]

networks:
  amcat-net:

Setting the containers up from R or Python uses the same settings. You can leave most lines as they are, but we want to draw your attention to a couple of settings you might want to change:

In lines 13, 14, we set the port to 80 on the host machine. This means you will be able to access the amcat client without specifying a port (80 is the default port your browser uses to access an address). If the port is already in use, the container will crash. In this case, change the first 80 (the local port) to a different port, for example 5000:80, and access amcat through localhost:5000.
In lines 54, 55, 56, we configured Elasticsearch to form a single-node cluster and use a maximum of 4GB memory.
In lines 61, 62 we suggested a setting so Elasticsearch will store your data on a volume on the host machine. If you do not use this, your data will be destroyed when you remove the Elasticsearch container! We recommend this to make it easier to back up your database and reuse it with a different installation of Elasticsearch (e.g., after an update) in the future. However, the container will not run if it does not have proper access to this folder. See the comment in the file to solve this. Note that the suggested local path is just an example. Learn more about this in the chapter on backups and updates.
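
The two pitfalls mentioned in this list (a host port that is already taken and a data folder the container cannot write to) can be checked before you start the containers. The following is a minimal Python sketch and not part of amcat: the helper names `port_is_free` and `owned_by_uid` are made up for illustration, the user id 1000 is taken from the `chown` comment in the compose file, and `/path/to/elastic-data` is the same placeholder path used there.

```python
import os
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts connections on host:port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 only when a listener accepted the connection
        return probe.connect_ex((host, port)) != 0


def owned_by_uid(path: str, uid: int = 1000) -> bool:
    """Return True if path is an existing folder owned by the numeric user id."""
    return os.path.isdir(path) and os.stat(path).st_uid == uid


if __name__ == "__main__":
    print("port 80 free:", port_is_free(80))
    # replace the placeholder with the local path from your volumes entry
    print("data folder ok:", owned_by_uid("/path/to/elastic-data"))
```

The port check only probes the loopback address, so a service bound to a different interface would not be noticed. Treat the result as a hint rather than a guarantee.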

To download and start the amcat containers (Elasticsearch, amcat4, and amcat4client) use one of the approaches below:

Start a terminal and navigate to the directory where you downloaded the docker-compose.yml file²:

docker-compose up --pull="missing" -d

Check if the containers are running with:

docker ps
#> CONTAINER ID   IMAGE                                                  COMMAND                  CREATED              STATUS          PORTS                                   NAMES
#> 0628bc852c79   ccsamsterdam/amcat4client:0.0.1                        "nginx -g 'daemon of…"   About a minute ago   Up 58 seconds   0.0.0.0:80->5000/tcp, :::80->5000/tcp   amcat4client
#> 8134bcd1cbe8   ccsamsterdam/amcat4:0.0.1                              "./wait-for-it.sh el…"   About a minute ago   Up 59 seconds   5000/tcp                                amcat4
#> 2d59e128e748   docker.elastic.co/elasticsearch/elasticsearch:7.17.7   "/bin/tini -- /usr/l…"   About a minute ago   Up 59 seconds   9200/tcp, 9300/tcp                      elastic7

# install the required packages first:
# remotes::install_github("JBGruber/dockr")
# remotes::install_github("ccs-amsterdam/amcat4r")
amcat4r::run_amcat_docker()

This downloads the default docker-compose.yml file from the same link as above. If you want to change the file first, download and edit it, then supply its path to the function:

amcat4r::run_amcat_docker("docker-compose.yml")

Check if the containers are running with:

amcat4r::docker_lc()
#> # A tibble: 3 × 5
#>   name          image                                                status        id                                       ports
#>   <chr>         <chr>                                                <chr>         <chr>                                    <list>
#> 1 /amcat4client ccsamsterdam/amcat4client:0.0.1                      Up 58 seconds 0628bc852c79b88a9047034cc5a03867e1d00fc… <list>
#> 2 /amcat4       ccsamsterdam/amcat4:0.0.1                            Up 59 seconds 8134bcd1cbe82bb6c0cb8beb10df1e3a03ddc8f… <list>
#> 3 /elastic7     docker.elastic.co/elasticsearch/elasticsearch:7.17.7 Up 58 seconds 2d59e128e748fa6ac4f5023945b5b7ab28f7ac2… <list>

# coming soon

If you are using Docker Desktop (which we recommend only for local installations on e.g., Windows), you can also monitor the containers there:

It might take a couple of seconds for Elasticsearch to start up. Then you can navigate to http://localhost/ in your browser to access the amcat client.
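
If you start the containers from a script, you may want to wait for the web server instead of guessing how long startup takes. Below is a minimal Python sketch using only the standard library. The function name `wait_until_up` and its default timings are assumptions for illustration and are not part of amcat.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll url until any HTTP response arrives or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # the server answered, even if with an error status such as 404
            return True
        except (urllib.error.URLError, OSError):
            # connection refused or timed out: nothing is listening yet
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_up("http://localhost/")` returns `True` as soon as something answers on port 80. Note that a response from the web server alone does not prove that Elasticsearch has finished starting, so a short extra wait may still be needed.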

First view of the amcat react app in your browser

Before you access your newly created amcat suite, you should first check the settings and adapt them to your needs. You can do this with amcat4 config:

docker exec -it amcat4 amcat4 config

If you choose anything but no_auth for the authentication option, you should also add a global admin user via:

docker exec -t amcat4 amcat4 add-admin admin@example.com

# coming soon

# coming soon

You can also create an example data collection (called an index in Elasticsearch):

docker exec -t amcat4 amcat4 create-test-index

docker_exec(id = "amcat4", "amcat4 create-test-index")

# coming soon

You can now use Middlecat to authenticate as the admin user:

After logging into the amcat react app in your browser

Now you can access the test index at http://localhost/:

2.1.3 Setup on your own server

If you decide not to go with Docker, for example, because you feel you need more control over what is happening, you can also run amcat on your system directly. We do not recommend this anymore and have, in fact, switched our own servers over to use the docker image. If there is something wrong with the images or you simply want to customise the setup, we suggest you head over to the GitHub repo and change the files as you like.

If you still want to go without docker, feel free to use the example configuration below. We assume that if you are going this route, you are running a Linux server. Below we show one example setup. Obviously feel free to replace the suggested Linux tools like systemd or nginx with your own choice.

2.1.3.1 amcat4 – aka amcat server

The first piece to set up is the amcat server and the Elasticsearch database it interacts with. To download and install Elasticsearch, refer to their website, or, preferably, install it through a package manager. For example, if you are running Debian or Ubuntu or another distro which uses apt you can install Elasticsearch 7.x (which we are currently working with) like this:

curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic.gpg] https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
sudo apt update
sudo apt install elasticsearch

In the next step, you need to configure Elasticsearch:

sudo nano /etc/elasticsearch/elasticsearch.yml

Configure the database to your own liking in terms of user management and exposure. Since we are controlling it through amcat4, the only two things that really matter are the address and port (and that amcat4 still has access after you’re done configuring Elasticsearch). So within elasticsearch.yml, we only look for two lines:

network.host: localhost
...
http.port: 9200

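Optionally, you can configure the memory usage of Elasticsearch. For example, to fix the heap at 4GB as in the Docker setup above, set both the initial (-Xms) and the maximum (-Xmx) heap size:

echo "-Xms4g" | sudo tee -a /etc/elasticsearch/jvm.options.d/memory.options
echo "-Xmx4g" | sudo tee -a /etc/elasticsearch/jvm.options.d/memory.options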

Leaving the values at their defaults here, we can enable the systemd service (skip this step if you’ve installed Elasticsearch through Docker):

sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

You can check if everything is working with:

curl -X GET 'http://localhost:9200'
#> {
#>   "name" : "amcat-opted-trekdrop0",
#>   "cluster_name" : "amcat-opted",
#>   "cluster_uuid" : "Sx-D89zmSx2zAcwl62u32A",
#>   "version" : {
#>     "number" : "7.17.6",
#>     "build_flavor" : "default",
#>     "build_type" : "deb",
#>     "build_hash" : "f65e9d338dc1d07b642e14a27f338990148ee5b6",
#>     "build_date" : "2022-08-23T11:08:48.893373482Z",
#>     "build_snapshot" : false,
#>     "lucene_version" : "8.11.1",
#>     "minimum_wire_compatibility_version" : "6.8.0",
#>     "minimum_index_compatibility_version" : "6.0.0-beta1"
#>   },
#>   "tagline" : "You Know, for Search"
#> }
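
If you prefer to check the response in a provisioning script, the JSON can be parsed to confirm that the server reports a 7.x version. This is a minimal Python sketch: the helper name `es_major_version` is hypothetical, and the sample string is a shortened copy of the output above.

```python
import json


def es_major_version(response_text: str) -> int:
    """Extract the major version from the JSON that Elasticsearch returns at its root URL."""
    info = json.loads(response_text)
    return int(info["version"]["number"].split(".")[0])


# shortened copy of the response shown above
sample = '{"name": "amcat-opted-trekdrop0", "version": {"number": "7.17.6"}, "tagline": "You Know, for Search"}'
print(es_major_version(sample))  # → 7
```

To run the check against a live server, pass in the text fetched from http://localhost:9200, for example with `urllib.request.urlopen("http://localhost:9200").read()`.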

Next, you want to set up the amcat server. You can do this wherever you like, but we will set things up at /srv/amcat:

sudo git clone https://github.com/ccs-amsterdam/amcat4 /srv/amcat
sudo chown -R $USER:$USER /srv/amcat
cd /srv/amcat
python3 -m venv env
env/bin/pip install -e .[dev]

To test if it runs as expected, you can use:

env/bin/python -m amcat4 run
#> /srv/amcat/env/lib/python3.9/site-packages/elasticsearch/connection/base.py:200: ElasticsearchWarning: Elasticsearch
#> built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See
#> https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
#>   warnings.warn(message, category=ElasticsearchWarning)
#> [INFO   :root           ] Starting server at port 5000, debug=True
#> INFO:     Started server process [1001112]
#> [INFO   :uvicorn.error  ] Started server process [1001112]
#> INFO:     Waiting for application startup.
#> [INFO   :uvicorn.error  ] Waiting for application startup.
#> INFO:     Application startup complete.
#> [INFO   :uvicorn.error  ] Application startup complete.
#> INFO:     Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
#> [INFO   :uvicorn.error  ] Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)

To check and adapt the settings of amcat use:

env/bin/python -m amcat4 config

Since you probably don’t want to run amcat in an open ssh tab all the time, you should set it up as a service, for example with systemd. So head over to /etc/systemd/system and create a new file, for example, amcat.service.

Here is a small example to set things up:

[Unit]
Description=Amcat4 API
After=network.target
Requires=elasticsearch.service

[Service]
Type=simple
User=amcat
Group=amcat
WorkingDirectory=/srv/amcat/amcat4

Environment=AMCAT4_ELASTIC_HOST=http://localhost:9200
Environment=AMCAT4_DB_NAME=/srv/amcat/amcat4.db

ExecStart=/srv/amcat/env/bin/uvicorn \
        --proxy-headers \
        --forwarded-allow-ips='*' \
        --workers=2 \
        --no-access-log \
        --uds /tmp/amcat.socket \
        --root-path /api \
        amcat4.api:app

ExecReload=/bin/kill -HUP ${MAINPID}
RestartSec=1
Restart=always

[Install]
WantedBy=multi-user.target

In the above service, we run the amcat server as the user amcat. To create this user and hand over the ownership of the amcat server folder to it use:

sudo useradd amcat
sudo chown -R amcat:amcat /srv/amcat

Then you can start the service and enable it to run on startup:

sudo systemctl daemon-reload
sudo systemctl start amcat.service
sudo systemctl enable amcat.service

Now you can check if everything is working with:

systemctl status amcat.service
#> ● amcat.service - Amcat4 API
#>      Loaded: loaded (/etc/systemd/system/amcat.service; enabled; vendor preset: enabled)
#>      Active: active (running) since Thu 2022-11-03 10:39:23 CET; 3min 29s ago
#>    Main PID: 197173 (uvicorn)
#>       Tasks: 4 (limit: 33532)
#>      Memory: 86.8M
#>         CPU: 1.770s
#>      CGroup: /system.slice/amcat.service
#>              ├─197173 /srv/amcat/env/bin/python3 /srv/amcat/env/bin/uvicorn --proxy-headers --forwarded-allow-ips=* --workers=2 --no-access-log --uds /tmp/amcat.socket --root-path /api amcat4.api:app
#>              ├─197174 /srv/amcat/env/bin/python3 -c from multiprocessing.resource_tracker import main;main(4)
#>              ├─197175 /srv/amcat/env/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe_handle=7) --multiprocessing-fork
#>              └─197176 /srv/amcat/env/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe_handle=9) --multiprocessing-fork

If something went wrong, you can troubleshoot with sudo journalctl -eu amcat.service.

2.2 Frontend

2.2.1 amcat4client

Note

If you are using amcat on our servers or through Docker, you can skip this section and move on to installing an API client to start managing amcat from either R or Python.

If you have checked port 5000 of your new amcat server while testing it above (i.e., http://0.0.0.0:5000), you were probably disappointed by a simple {"detail":"Not Found"} message. This is because the client has been split from the main package to make it easier to develop. You can install the React client next to amcat in /srv/amcat4client using:

cd /srv/
sudo git clone https://github.com/ccs-amsterdam/amcat4client.git
sudo chown -R $USER: amcat4client
cd amcat4client
npm install

If you get error messages about outdated versions of dependencies (which is likely on Ubuntu and Debian), you should update Node.js. On Debian, you can do this like so:

su
curl -fsSL https://deb.nodesource.com/setup_19.x | bash - &&\
apt-get install -y nodejs
exit

And the equivalent on Ubuntu:

curl -fsSL https://deb.nodesource.com/setup_19.x | sudo -E bash - &&\
sudo apt-get install -y nodejs

See this repository for instructions for other Linux flavours.

After that, you can build the React app:

npm run build

If your amcat instance will be publicly reachable, you can build the React app permanently attached to only your instance of amcat:

REACT_APP_FIXED_HOST=https://example.com/api npm run build

Once this has finished, you should hand over ownership of the React application to the previously created amcat user:

sudo chown -R amcat:amcat .

Now Elasticsearch and amcat4 are running, but they are not yet accessible. To solve this, we can use, for example, nginx to provide users access to the React frontend and the amcat API. Create a new nginx config file with, for example, nano:

sudo nano /etc/nginx/sites-available/amcat.conf

Below is a minimal example of the amcat.conf file, which you can copy and paste. For more information, visit the uvicorn documentation website.

server {
    client_max_body_size 4G;

    listen 5000;

    location /api/ {
      rewrite  ^/api/(.*) /$1 break;
      proxy_set_header Host $http_host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection $connection_upgrade;
      proxy_redirect off;
      proxy_buffering off;
      proxy_pass http://amcat;
    }

    location / {
      root /srv/amcat4client/build;
      index index.html;
      try_files $uri $uri/ /index.html;
    }

}

map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}

upstream amcat {
    server unix:/tmp/amcat.socket;
}

Warning about HTTP

This setup assumes that your amcat server will only be available in the local network. If it should be accessible via the internet, we strongly recommend enabling HTTPS. You can find more information about that on the nginx website or in this guide.

To enable the site, use:

sudo ln -s /etc/nginx/sites-available/amcat.conf /etc/nginx/sites-enabled/amcat.conf

Then simply restart nginx, for example, through systemd:

sudo systemctl restart nginx.service

To test if the API is reachable, use this:

curl http://localhost:5000/api/
#> {"detail":"Not Found"}

If everything works, you can now access the client at http://localhost:5000, or the address of your server, if you installed amcat remotely:

The React app is always running locally in your browser, even if you’ve accessed it on another computer. So the appropriate host needs to be the route to the amcat server. In the example above, I set up an amcat instance in my local network on a computer with the IP address 192.168.2.180 and port 5000. To access that host, you need to enter:

Note

Host: “http://192.168.2.180:5000/api”

Email: “admin”

Password: “admin”

Just replace 192.168.2.180 with the address of the machine you set up amcat on.

Success! However, the interface doesn’t show much at this point, since we added no data yet. We will do that in the storage chapter.

2.2.2 API Client

The R client is called amcat4r and can be installed via the following command in R (install remotes first if you don’t have it yet):

remotes::install_github("ccs-amsterdam/amcat4r")

If you have set up the amcat suite as shown above, you should be able to log into the database:

library(amcat4r)
login("http://localhost/api",  username = "admin", password = "supergeheim")

If this does not throw an error, you have set everything up correctly.

Install amcat4py from the command line through pip:

pip install amcat4py

Then you can open Python and log in:

from amcat4py import AmcatClient

amcat = AmcatClient("http://localhost/amcat")

If this does not throw an error, you have set everything up correctly.

¹ Technically, it is a little more complicated, as Docker containers have many similarities to virtual machines. However, for most users that technical background is not really important. If you want to learn more, have a look here.
² The yaml file is written for Docker Compose V2. If you are having trouble, check your version with docker-compose --version. Get the newest version as described here.