Docker
Docker is a great tool, but you have to understand it first. Otherwise, you may get into a lot of trouble!
Generally speaking, Docker is a virtualisation tool. For you, it means the possibility to train on different versions of packages in a clean environment, with you in charge of everything!
Image vs Container
In the beginning, you may get confused about which is which and what each one is for.
Image
An image is like a clean installation of your OS. There may be some other packages and tools installed, but it is the starting point for your container. You can also think of it as a template.
Let's get into an example... Imagine you have a clean installation of Ubuntu 18.04 LTS. You want to train a new deep super-resolution GAN network. What do you need?
- CUDA
- Python (probably 3.x)
- tensorflow-gpu
What would you do? I guess you would use apt-get install -y python3 to install Python. Then you would open the NVIDIA website to find the install guide for CUDA. After that, pip install tensorflow-gpu to install tensorflow-gpu. It is a long way. A few months later, you need another version of CUDA, for instance a downgrade to 9.0. Install CUDA again? Pain! Let's meet Docker!
With Docker, you can use others' work as a starting point! There are two ways to do it: one is using the official tensorflow-gpu image, the other is your own build.
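If you go the official-image route, a minimal sketch could look like this (the image tag is an example; check Docker Hub for the currently supported tags):

```shell
# Pull the official TensorFlow image with GPU support (tag is an example)
docker pull tensorflow/tensorflow:latest-gpu

# Quick sanity check: list GPUs visible to TensorFlow inside the container
docker run --gpus all --rm tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```

If the list prints empty, the container started fine but no GPU is visible, which usually points to a missing NVIDIA container runtime on the host.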
build
Create Dockerfile in some empty directory.
# see https://hub.docker.com/r/nvidia/cuda/
FROM nvidia/cuda:10.1-base-ubuntu18.04
RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip3 install tensorflow-gpu
Now you need to build your image. It is super easy! Just run the docker build command with a few parameters.
docker build -t dobromi1:cuda-tensorflow .
Let's explain it.
docker build \ ### base command
-t dobromi1:cuda-tensorflow \ ### your own imagename:tag
. ### build directory context
That's all! For more options, see the official build documentation.
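You can check that the image landed in your local image store (the exact output format may differ between Docker versions):

```shell
# List local images in the dobromi1 repository;
# the freshly built image should appear at the top
docker images dobromi1
```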
Container
For now, no matter which image you decide to use, you can run your first container. Running a container means starting your virtual machine. Let's do it!
docker run -u $(id -u):$(id -g) --gpus all --rm nvidia/cuda nvidia-smi
This command will start a container with CUDA installed, and it will show you information about GPU usage, as you know it from Jupyter examples.
Let’s start some training!
docker run \
-u $(id -u):$(id -g) \
--gpus all \
-it \
-v /home/dobromi1/Experiments/device-list/:/workingdir \
-w /workingdir \
dobromi1:cuda-tensorflow \
python ./tf_device_list.py
PERSISTENCE
Always use Docker with persistent storage (the -v parameter). If you store anything in a Docker container in a folder that is not mounted, you will LOSE it.
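A small sketch of the difference (paths and file names are examples): a file written into a mounted directory survives on the host, while a file written anywhere else disappears together with the container.

```shell
# File written into the mounted directory survives on the host
docker run --rm -v "$PWD/output:/workingdir" -w /workingdir ubuntu:18.04 \
    sh -c 'echo "checkpoint" > model.txt'
cat output/model.txt   # the file is still here after the container is gone

# File written outside the mount is lost when the container is removed
docker run --rm ubuntu:18.04 sh -c 'echo "checkpoint" > /tmp/model.txt'
# /tmp/model.txt no longer exists anywhere
```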
Below are some useful tips for your training.
run
You may get confused by the many parameters used in the run command. They are explained below.
parameters
docker run ### base command
-u $(id -u):$(id -g) ### map your user and group IDs into the container
--gpus all ### make GPUs visible inside the container
--rm ### remove the container after training (execution) is done; use it to free space!
--name dobromi1-class-tumor-rmi ### container name (must come before the image)
-v /path/on_alisa/:/path/in/container ### map a directory or file from alisa to a path inside the container
-w /wd ### working directory inside the container
dobromi1:cuda-tensorflow ### image:tag
python -c "from tensorflow.python.client import device_lib; device_lib.list_local_devices()" ### command to run inside the container
interactive mode
docker run ### base command
-u $(id -u):$(id -g) ### map your user and group IDs into the container
--gpus all ### make GPUs visible inside the container
--rm ### remove the container after execution is done; use it to free space!
-it ### -i keeps stdin open, -t allocates a tty
dobromi1:cuda-tensorflow ### image:tag
bash ### run a bash console inside the container
log
If you are running a container in the background (with the -d parameter) and you need to see its console output, there is an easy way with docker logs.
docker logs -f container_name
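For example, you could start the training detached and then follow its output (the container name and train.py script are illustrative; the image is the one built above):

```shell
# Start the training in the background and give the container a name
docker run -d --name my-training \
    -v "$PWD:/workingdir" -w /workingdir \
    dobromi1:cuda-tensorflow python ./train.py

# Follow the output; Ctrl+C stops following, not the container
docker logs -f my-training
```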
connecting to a running container
Sometimes you want to connect to a running container and execute a command inside it.
docker exec -it container_name_or_id bash
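If you do not remember the container name or ID, docker ps lists them:

```shell
# List running containers with their IDs and names
docker ps

# Add -a to include stopped containers as well
docker ps -a
```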