Docker debugging
This document describes troubleshooting methods for our Docker environment. It applies when a service managed by Docker is not running properly.
Get the list of running containers
First, check whether the containers needed for the service are running. To find the names of the containers associated with a service, go to the service list and look for the documentation or the source code of the service. For example, if Wekan is not working, the documentation points you to this repository. There you can look for the README.md or the Ansible playbook. The playbook or the README will give the names of the Docker containers associated with the service. For example, the playbook for Wekan shows here and here that the containers responsible for Wekan are named wekandb and wekan, respectively. Once you know what you are looking for, check the list of containers.
You can then get the list of Docker containers in two ways:
From the Docker status page
Please go to docker.fsfe.org. You'll find a list of docker containers along with their logs.
Directly on lund
On the lund server, type the following command:
docker ps -a
docker logs <container name>
Without the -a flag the stopped containers are excluded from the list.
If the container is running, check the logs of the container for errors.
If the container is stopped, then you will have to figure out why it does not start. Below is a list of methods to troubleshoot why a container is not starting.
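The check described above can be scripted. The following is a minimal sketch, not part of the existing tooling: the function name container_state is hypothetical, and it assumes the docker CLI is on the PATH and that the current user may talk to the Docker daemon. It uses docker inspect with a --format template to print the state of each container.

```shell
# Hypothetical helper (not from the original page): print the state of each
# named container, or "missing" if Docker does not know the name.
container_state() {
  for name in "$@"; do
    # .State.Status is e.g. "running" or "exited"; docker inspect exits
    # with a non-zero status when no object with that name exists.
    if state=$(docker inspect --format '{{.State.Status}}' "$name" 2>/dev/null); then
      printf '%s: %s\n' "$name" "$state"
    else
      printf '%s: missing\n' "$name"
    fi
  done
}

# Example: container_state wekandb wekan
```

A container reported as "exited" or "missing" is the one to investigate first.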
Simply start the container
If the container is stopped and you want to start it, use the following command:
docker start <container name>
Then you can use the docker ps command to make sure the container has been started. If the service still does not work after using the docker start command, there are two possibilities:
1- docker start reports an error: the problem is related to the Docker configuration or to the container itself.
2- docker start works: the problem is in the application running in the container, and the application makes the container crash (exit code != 0).
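Telling the two cases apart can be sketched as follows. This is an illustration under assumptions, not an established script: the function name diagnose_start and the default wait of 5 seconds are made up for the example, and a container that is caught in a restart loop may still be reported as running at the moment of the check.

```shell
# Hypothetical helper: start a container and report which case applies.
# Returns 0 if the container is running, 1 for case 1, 2 for case 2.
diagnose_start() {
  name=$1
  wait_seconds=${2:-5}   # assumed grace period for a crashing app to exit

  # Case 1: docker start itself fails (its error message goes to stderr).
  if ! docker start "$name" >/dev/null; then
    echo "case 1: docker start failed for $name"
    return 1
  fi

  sleep "$wait_seconds"
  status=$(docker inspect --format '{{.State.Status}}' "$name")
  if [ "$status" = "running" ]; then
    echo "$name is running"
    return 0
  fi

  # Case 2: the container started, but the application inside exited.
  exit_code=$(docker inspect --format '{{.State.ExitCode}}' "$name")
  echo "case 2: $name is $status, exit code $exit_code"
  return 2
}

# Example: diagnose_start wekan
```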
Recreate the container
In case 1, docker start will give you indications about the error. The error is probably in the configuration of the container. If the container was deployed on lund, it was probably deployed using an Ansible playbook. Fix the playbook (depending on the error you got from the docker start command) and redeploy it. The command to do this is in the README file of the service. It should look something like ansible-playbook -i hosts drone.deploy.yml.
Run the container manually
In case 2, the docker start command worked but the container crashes immediately, so you will have to run the container manually to get more information about the error. One way to do that is to find out which parameters were used to run the container, and use them in a docker run command so you can see what is going on. All parameters of a container are given by the docker inspect <container name> command, but the best way to get them is to read the Ansible playbook and build the corresponding docker run command from it.
For example, if the Docker container wekan does not start, you can find all the parameters required to run it in the playbook:
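As a rough illustration of the docker inspect route, the sketch below prints a few of the parameters that matter most when rebuilding a docker run command. The function name show_run_params is hypothetical, and only a handful of fields are shown. Note that the environment list also contains variables set by the image itself, not only those from the playbook.

```shell
# Hypothetical helper: print selected settings of an existing container.
show_run_params() {
  name=$1
  echo "image: $(docker inspect --format '{{.Config.Image}}' "$name")"
  echo "restart policy: $(docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$name")"
  echo "environment:"
  # One VAR=value entry per line.
  docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$name"
}

# Example: show_run_params wekan
```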
- name: run the wekan container
  docker_container:
    name: wekan
    image: wekanteam/wekan:v0.75
    state: started
    restart: yes
    restart_policy: always
    networks:
      - name: wekan-net
        ipv4_address: '192.168.201.20'
    links:
      - wekandb:wekandb
    env:
      VIRTUAL_HOST: kan.fsfe.org
      LETSENCRYPT_HOST: kan.fsfe.org
      LETSENCRYPT_EMAIL: max.mehl@fsfe.org
      MONGO_URL: mongodb://wekandb/wekan
      ROOT_URL: https://kan.fsfe.org
      MAIL_URL: smtp://mail.fsfe.org:25/
      MAIL_FROM: admin@fsfe.org
Then use docker run --help to find the correct arguments:
docker run --name wekan \
  --restart always \
  --network wekan-net \
  --ip 192.168.201.20 \
  --link wekandb:wekandb \
  -e VIRTUAL_HOST=kan.fsfe.org \
  -e LETSENCRYPT_HOST=kan.fsfe.org \
  -e LETSENCRYPT_EMAIL=max.mehl@fsfe.org \
  -e MONGO_URL=mongodb://wekandb/wekan \
  -e ROOT_URL=https://kan.fsfe.org \
  -e MAIL_URL=smtp://mail.fsfe.org:25/ \
  -e MAIL_FROM=admin@fsfe.org \
  wekanteam/wekan:v0.75
This command starts the Docker container manually, so you can see all error messages and hopefully understand what is causing them.
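Translating the env entries of the playbook into -e flags by hand is where typos tend to creep in. The sketch below shows one possible way to do the conversion mechanically. The function name env_to_flags is hypothetical, and it assumes the input consists only of the simple "KEY: value" lines found beneath env: in the playbook, with no quoting and no spaces inside the values.

```shell
# Hypothetical helper: read "KEY: value" lines on stdin and print one
# "-e KEY=value" flag per line. Lines without a value are skipped.
env_to_flags() {
  sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\):[[:space:]]*\(..*\)$/-e \1=\2/p'
}

# Example:
#   printf 'MONGO_URL: mongodb://wekandb/wekan\n' | env_to_flags
```

Only the first colon after the key is treated as the separator, so URL values such as https://kan.fsfe.org are kept whole.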