version: "3.8"
services:
web_server:
image: ccsamsterdam/ngincat:4.0.4
build: ./ngincat
container_name: ngincat
restart: unless-stopped
networks:
- amcat-net
environment:
- amcat4_client=http://amcat4client:3000
- amcat4_host=http://amcat4:5000/
ports:
- 80:80 # [local port]:[container port]
depends_on:
- "web_client"
- "api"
web_client:
image: ccsamsterdam/amcat4client:4.0.4
build: ./amcat4client
container_name: amcat4client
restart: unless-stopped
networks:
- amcat-net
environment:
# this can be changed later, it is just the suggested default
- amcat4_host=http://localhost/amcat
depends_on:
- "api"
api:
image: ccsamsterdam/amcat4:4.0.4
build: ./amcat4
container_name: amcat4
restart: unless-stopped
networks:
- amcat-net
environment:
# note that these take precedence over values set in `amcat4 config``
- amcat4_elastic_host=elastic7:9200
- amcat4_host=http://localhost/amcat
depends_on:
- "db"
db:
image: docker.elastic.co/elasticsearch/elasticsearch:7.17.9
container_name: elastic7
restart: unless-stopped
# for security reasons, the database is only exposed to the other containers in the amcat-net network
# If you want to be able to access it locally, uncomment the following two lines
# ports:
# - 9200:9200
networks:
- amcat-net
environment:
- discovery.type=single-node
- bootstrap.memory_lock=true
- "ES_JAVA_OPTS=-Xms4g -Xmx4g"
- xpack.security.enabled=false
# your database should have a folder on the host machine to permanently store data
# run: `mkdir -p /path/to/elastic-data && sudo chown -R 1000:1000 ~/.elasticsearch/database`
# to make it accessible for docker. Then uncomment the lines below (using a real path)
# volumes:
# - /path/to/elastic-data:/usr/share/elasticsearch/data # [local path]:[container path]
networks:
amcat-net:
2 Getting started
In this chapter, we show you how to set up the data layer and front-end of the amcat suite.
In the Data Layer section, it is sufficient if you choose one of the sub-sections to follow. We explain how you can run amcat
- on our servers, which we recommend for testing purposes;
- through a Docker image, which we recommend for most people who want to conduct a research project and/or share data online;
- or install amcat directly on your system, which we only recommend for advanced users, who want a customised setup.
In the Frontend section, it makes sense to cover the amcat4client, which provides a react web interface to query data. Then you can select to either install the R or Python client.
2.1 Data Layer
2.1.1 Run on our servers
2.1.2 Setup through Docker
If you have never used Docker before, the first step is to install the infrastructure on your system. Head over to the Docker website to get Docker Desktop or the Docker Engine and Docker Compose to use Docker from the command line. We do not really need Docker Desktop, but it comes with both the Docker Engine and Docker Compose, which makes installation easier if you do not use a package manager (which many Windows and MacOS users do not).
To install the amcat data layer, you should use our Docker Compose file. You can get it from here (save the file as docker-compose.yml
).
We tried to write the file to contain sensible defaults for testing it locally. If you plan to work with amcat for a research project and/or plan to update the images in the future, you should have a look at the customization option. Oterwise you can continue.
To download and start the amcat containers (Elasticsearch, amcat4, and amcat4client) use one of the approaches below:
Start a terminal and navigate to the directory where you downloaded the docker-compose.yml file2:
docker-compose up --pull="missing" -d
Check if the containers are running with:
docker ps
#> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
#> 0628bc852c79 ccsamsterdam/amcat4client:0.0.1 "nginx -g 'daemon of…" About a minute ago Up 58 seconds 0.0.0.0:80->5000/tcp, :::80->5000/tcp amcat4client
#> 8134bcd1cbe8 ccsamsterdam/amcat4:0.0.1 "./wait-for-it.sh el…" About a minute ago Up 59 seconds 5000/tcp amcat4
#> 2d59e128e748 docker.elastic.co/elasticsearch/elasticsearch:7.17.7 "/bin/tini -- /usr/l…" About a minute ago Up 59 seconds 9200/tcp, 9300/tcp elastic7
# install the required packages first:
# remotes::install_github("JBGruber/dockr")
# remotes::install_github("ccs-amsterdam/amcat4r")
::run_amcat_docker() amcat4r
This pulls downloads the default docker-compose.yml
file from the same link as above. If you want to change the file first, just supply the path to the function afterwards:
::run_amcat_docker("docker-compose.yml") amcat4r
Check if the containers are running with:
::docker_lc()
amcat4r#> # A tibble: 3 × 5
#> name image status id ports
#> <chr> <chr> <chr> <chr> <list>
#> 1 /amcat4client ccsamsterdam/amcat4client:0.0.1 Up 58 seconds 0628bc852c79b88a9047034cc5a03867e1d00fc… <list>
#> 2 /amcat4 ccsamsterdam/amcat4:0.0.1 Up 59 seconds 8134bcd1cbe82bb6c0cb8beb10df1e3a03ddc8f… <list>
#> 3 /elastic7 docker.elastic.co/elasticsearch/elasticsearch:7.17.7 Up 58 seconds 2d59e128e748fa6ac4f5023945b5b7ab28f7ac2… <list>
# coming soon
If you are using Docker Desktop (which we recommend only for local installations on e.g., Windows), you can also monitor the containers there:
It might take a couple of seconds for Elasticsearch to start up. Then you can navigate to http://localhost/ in your browser to access the amcat client.
Before you access your newly created amcat suite, you should first check the settings and change to your needs. You can do this with amcat4 config
:
docker exec -it amcat4 amcat4 config
If you choose anything but no_auth
for authentification options, you should also add a global admin user via:
docker exec -t amcat4 amcat4 add-admin admin@example.com
# coming soon
# coming soon
You can also create an example data collection (which are called index in Elasticsearch):
docker exec -t amcat4 amcat4 create-test-index
docker_exec(id = "amcat4", "amcat4 create-test-index")
# coming soon
You can now use Middlecat to authenticate as the admin user:
Now you can access the test index at http://localhost/
:
2.1.3 Setup on your own server
If you decide not to go with Docker, for example, because you feel you need more control over what is happening, you can also run amcat on your system directly. We do not recommend this anymore and have, in fact, switched our own servers over to use the docker image. If there is something wrong with the images or you simply want to customise the setup, we suggest you head over to the GitHub repo and change the files as you like.
If you still want to go without docker, feel free to use the example configuration below. We assume that if you are going this route, you are running a Linux server. Below we show one example setup. Obviously feel free to replace the suggested Linux tools like systemd or nginx with your own choice.
2.1.3.1 amcat4 – aka amcat server
The first piece to set up is the amcat server and the Elasticsearch database it interacts with. To download and install Elasticsearch, refer to their website, or, preferably, install it through a package manager. For example, if you are running Debian or Ubuntu or another distro which uses apt
you can install Elasticsearch 7.x (which we are currently working with) like this:
curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic.gpg] https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
sudo apt update
sudo apt install elasticsearch
In the next step, you need to configure Elasticsearch:
sudo nano /etc/elasticsearch/elasticsearch.yml
Configure the database to your own liking in terms of user management and exposure. Since we are controlling it through amcat4
, the only two things that really matter is the address and port (and that amcat4
still has access after you’re done configuring Elasticsearch). So within elasticsearch.yml
, we only look for two lines:
network.host: localhost
...
http.port: 9200
You can configure the memory usage of Elasticsearch
echo "-Xms4g" | sudo tee -a /etc/elasticsearch/jvm.options.d/memory.options
Leaving the values at their defaults here, we can enable the systemd service (skip this step if you’ve installed Elasticsearch through Docker):
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch
You can check if everything is working with:
curl -X GET 'http://localhost:9200'
#> {
#> "name" : "amcat-opted-trekdrop0",
#> "cluster_name" : "amcat-opted",
#> "cluster_uuid" : "Sx-D89zmSx2zAcwl62u32A",
#> "version" : {
#> "number" : "7.17.6",
#> "build_flavor" : "default",
#> "build_type" : "deb",
#> "build_hash" : "f65e9d338dc1d07b642e14a27f338990148ee5b6",
#> "build_date" : "2022-08-23T11:08:48.893373482Z",
#> "build_snapshot" : false,
#> "lucene_version" : "8.11.1",
#> "minimum_wire_compatibility_version" : "6.8.0",
#> "minimum_index_compatibility_version" : "6.0.0-beta1"
#> },
#> "tagline" : "You Know, for Search"
#> }
Next, you want to setup the amcat server. You can do this wherever you like, but will will set things up at /srv/amcat
:
sudo git clone https://github.com/ccs-amsterdam/amcat4 /srv/amcat
sudo chown -R $USER:$USER /srv/amcat
cd /srv/amcat
python3 -m venv env
env/bin/pip install -e .[dev]
To test if it runs as expected, you can use:
env/bin/python -m amcat4 run
#> /srv/amcat/env/lib/python3.9/site-packages/elasticsearch/connection/base.py:200: ElasticsearchWarning: Elasticsearch #> built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See #> https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
#> warnings.warn(message, category=ElasticsearchWarning)
#> [INFO :root ] Starting server at port 5000, debug=True
#> INFO: Started server process [1001112]
#> [INFO :uvicorn.error ] Started server process [1001112]
#> INFO: Waiting for application startup.
#> [INFO :uvicorn.error ] Waiting for application startup.
#> INFO: Application startup complete.
#> [INFO :uvicorn.error ] Application startup complete.
#> INFO: Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
#> [INFO :uvicorn.error ] Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
To check and adapt the settings of amcat use:
env/bin/python -m amcat4 config
Since you probably don’t want to run amcat in an open ssh tab all the time, you should set it up as a service, for example with systemd
. So head over to /etc/systemd/system
and create a new file, for example, amcat.service
.
Here is a small example to set things up:
[Unit]
Description=Amcat4 API
After=network.target
Requires=elasticsearch.service
[Service]
Type=simple
User=amcat
Group=amcat
WorkingDirectory=/srv/amcat/amcat4
Environment=AMCAT4_ELASTIC_HOST=http://localhost:9200
Environment=AMCAT4_DB_NAME=/srv/amcat/amcat4.db
ExecStart=/srv/amcat/env/bin/uvicorn \
--proxy-headers \
--forwarded-allow-ips='*' \
--workers=2 \
--no-access-log \
--uds /tmp/amcat.socket \
--root-path /api \
amcat4.api:app
ExecReload=/bin/kill -HUP ${MAINPID}
RestartSec=1
Restart=always
[Install]
WantedBy=multi-user.target
In the above service, we run the amcat server as the user amcat
. To create this user and hand over the ownership of the amcat server folder to it use:
sudo useradd amcat
sudo chown -R amcat:amcat /srv/amcat
Then you can start the service and enable it to run on startup:
sudo systemctl daemon-reload
sudo systemctl start amcat.service
sudo systemctl enable amcat.service
Now you can check if everything is working with
systemctl status amcat.service
● amcat.service - Amcat4 API
#> Loaded: loaded (/etc/systemd/system/amcat.service; enabled; vendor preset: enabled)
#> Active: active (running) since Thu 2022-11-03 10:39:23 CET; 3min 29s ago
#> Main PID: 197173 (uvicorn)
#> Tasks: 4 (limit: 33532)
#> Memory: 86.8M
#> CPU: 1.770s
#> CGroup: /system.slice/amcat.service
#> ├─197173 /srv/amcat/env/bin/python3 /srv/amcat/env/bin/uvicorn --proxy-headers --forwarded-allow-ips=* --workers=2 --no-access-log --uds /tmp/amcat.socket --root-path /api amcat4.api:app
#> ├─197174 /srv/amcat/env/bin/python3 -c from multiprocessing.resource_tracker import main;main(4)
#> ├─197175 /srv/amcat/env/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe_handle=7) --multiprocessing-fork
#> └─197176 /srv/amcat/env/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=5, pipe_handle=9) --multiprocessing-fork
If something went wrong, you can troubleshoot with sudo journalctl -eu amcat.service
.
2.2 Frontend
2.2.1 amcat4client
If you have checked port 5000 of your new amcat server while testing it above (i.e., http://0.0.0.0:5000
), you were probably disappointed by a simple {"detail":"Not Found"}
message. This is because the client has been split from the main package to make it easier to develop. You can install the React client next to amcat in/srv/amcat4client
using:
cd /srv/
sudo git clone https://github.com/ccs-amsterdam/amcat4client.git
sudo chown -R $USER: amcat4client
cd amcat4client
npm install
If you get error messages about outdated versions of dependencies (which is likely on Ubuntu and Debian) you should update Node.js
. On Debian, you can do this likes so:
su
curl -fsSL https://deb.nodesource.com/setup_19.x | bash - &&\
apt-get install -y nodejs
exit
And the equivalent on Ubuntu:
curl -fsSL https://deb.nodesource.com/setup_19.x | sudo -E bash - &&\
sudo apt-get install -y nodejs
See this repository for instructions for other Linux flavours.
After that, you can build the React app:
npm run build
If your amcat instance will be publicly reachable, you can build the React app permanently attached to only your instance of amcat:
REACT_APP_FIXED_HOST=https://example.com/api npm run build
Once this has finished, you should hand over ownership of the React application to the previously created amcat user
sudo chown -R amcat:amcat .
Now we have an Elasticsearch and amcat4 running. But they are currently not accesible. To solve this, we can use, for example, nginx
to provide users access to the React frontend and the amcat API. Create a new nginx
config file with, for example, nano:
sudo nano /etc/nginx/sites-available/amcat.conf
Below is a minimal example of the amcat.conf
file, which you can copy and paste: For more information, visit the uvicorn documentation website.
server {
client_max_body_size 4G;
listen 5000;
location /api/ {
rewrite ^/api/(.*) /$1 break;
proxy_set_header Host $http_host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
proxy_redirect off;
proxy_buffering off;
proxy_pass http://amcat;
}
location / {
root /srv/amcat4client/build;
index index.html;
try_files $uri $uri/ /index.html;
}
}
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
upstream amcat {
server unix:/tmp/amcat.socket;
}
To enable the site, use:
sudo ln -s /etc/nginx/sites-available/amcat.conf /etc/nginx/sites-enabled/amcat.conf
Then simply restart nginx
, for example, through systemd:
sudo systemctl restart nginx.service
To test if the API is reachable, use this:
curl http://localhost:5000/api/
#> {"detail":"Not Found"}
If everything works, you can now access the client at http://localhost:5000, or the address of your server, if you installed amcat remotely:
The React app is always running locally in your browser, even if you’ve accessed it on another computer. So the appropriate host needs to be the route to the amcat server. In the example above, I set up an amcat instance in my local network on a computer with the IP address 192.168.2.180
and port 5000
. To access that host, you need to enter:
Just replace 192.168.2.180
with the address of the machine you set up amcat on.
Success! However, the interface doesn’t show much at this point, since we added no data yet. We will do that in the storage chapter.
2.2.2 API Client
The R
client is called amcat4r
and can be installed via the following command in R
(install remotes
first if you don’t have it yet):
::install_github("ccs-amsterdam/amcat4r") remotes
If you have set up the amcat suite as shown above, you should be able to log into the database:
library(amcat4r)
login("http://localhost/api", username = "admin", password = "supergeheim")
If this does not throw an error, you have set everything up correctly.
Install amcat4py
from the command line through pip
:
pip install amcat4py
The you can open Python and log in:
from amcat4py import AmcatClient
= AmcatClient("http://localhost/amcat") amcat
If this does not throw an error, you have set everything up correctly.
Technically, it is a little more complicated, as Docker containers have many similarities to virtual machines. However, for most users that technical background is not really important. If you want to learn more, have a look here.↩︎
The yaml file is written for Docker Compose V2. If you are having trouble, check your version with
docker-compose --version
. Get the newest version as described here.↩︎