## Containerizing your geospatial applications with Docker

#### FOSS4G Boston 2017

Note: Hello everyone, and thanks for attending my presentation on geospatial containers and how they can help you develop, deploy, and manage applications more easily. In this talk I hope to make clear why containers are popular in software development, but also underscore the ways in which Docker and programs like it can reduce rework and cut mundane tasks from your workflows. Let's begin.
### About me

* Software Architect @ [Reinventing Geospatial, Inc. (RGi)](http://rgi-corp.com)
* Level 20 CLI Wizard (_can cast 9th level GDAL commands_)
* Geospatial API development in Java, Android, JavaScript
* My [GitHub](https://github.com/stevendlander) and [GitLab](https://gitlab.com/stevendlander)
* See my [previous](https://github.com/GitHubRGI/FOSS4G-Seoul-2015/blob/master/Friday/1125-1150_BuildingCI/BuildingCI.pdf) [talks](https://github.com/GitHubRGI/FOSS4G-Seoul-2015/blob/master/Thursday/1125-1150_OGCGeoPackageInPractice/OGCGeoPackageInPractice.pdf) from FOSS4G Seoul 2015

Note: My name is Steve Lander, and I have been in the geospatial domain for going on seven years now in various roles, most recently as a project technical lead and architect. Applied research is a good way of describing the sorts of projects I have been working on recently, for example, API development for tile caching and storage. I also happen to dabble in the dark art of GDAL command line utilities.

---

### Outline

* Why even use containers? How will they benefit me?
* Ok...what are some ways I can make one? How do I get started?
* What are some more advanced topics?

Note: What will we be discussing today: How are containers fundamentally different from other technologies, and why is that a good thing? What are some easy ways to get started making containers that help me do my job? And finally, for those of you well-versed in Docker and similar programs already, what are some advanced topics to take you further?
---

### What is a container?

The delta between the files on your system and another file system running in a separate space.

Note: Through kernel namespaces, cgroups, file permissions, SELinux, and other isolation mechanisms, containers exist as separate filesystems that reuse as much of your base operating system as possible.

---

### Container vs. VM

A VM is a _copy_ of an entire computer _within_ your computer. Efficiency is lost in this duplication.

Note: Do you really need everything a VM brings with it? With VMs you end up paying for things you don't necessarily need.

---

### Relevant use cases

Note: From here on, I am going to assume Docker is your tool of choice for manipulating containers. There are other options, such as Vagrant, should you be interested in other offerings.

---

### A build environment

Build your applications consistently and reliably

```Dockerfile
# Start from this base linux image...
# debian:8.9 is "jessie" -- but be specific about the version!
FROM debian:8.9

# Install this software to the container from a pkg manager
RUN apt-get update -y && apt-get install -y \
    curl \
    wget \
    openjdk-8-jdk

# Customize my environment the way I need it
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64
```

Note:
* Containers work well as build environments. If you have ever used TravisCI through GitHub, you have been using containers already.
* Your Dockerfile is a recipe on how to make the same thing over and over exactly the same way, so it pays to ensure specific versions of software are used instead of the latest ones. You wouldn't want updates to your base OS breaking your software unexpectedly!

---

### A build environment (cont'd)

Avoid polluting your desktop environment with dependencies you don't need outside of your development environment

```Dockerfile
FROM debian:8.9

# Move to a new folder to work in, creating it if necessary
WORKDIR /workspace

# I install so many packages, it isn't worth listing them all
COPY install_packages.sh install_packages.sh
RUN /bin/bash install_packages.sh

# Build GDAL from scratch (let's say I need MrSID)...
COPY gdal_build_script_of_doom.sh gdal_build_script_of_doom.sh
RUN /bin/bash gdal_build_script_of_doom.sh
```

Note: Maybe your program needs tons of python libraries that you would not otherwise use on other projects. Perhaps you build a library like GDAL from source to get additional driver support not otherwise provided due to licensing restrictions, along with some SWIG bindings to other languages. In either case the computer you rely on to do your job stays neat and tidy, and when you no longer need the dependencies, simply delete your container.

---

### A build environment (cont'd)

Reduce the time it takes for other people to build, run, test, and contribute to your project

```Dockerfile
# A Docker container with my code
FROM debian:8.9

WORKDIR /workspace

RUN git clone https://github.com/stevendlander/foss4g2017 \
    && cd foss4g2017

COPY helperScript.sh helperScript.sh

# My web service port
EXPOSE 8080
# Debugging port
EXPOSE 8000

ENV JPDA_ADDRESS 8000
ENV JPDA_TRANSPORT dt_socket
```

Note: Create a container that drops other developers into a command prompt, ready to build the code and start debugging from their IDE over the exposed debug port. If you have ever had to spend days getting your environment configured, I hope you can appreciate how much time and frustration this can save.
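
As a minimal usage sketch, assuming the image above is built under a hypothetical tag like `foss4g-dev` and eventually launches Tomcat (or another JPDA-enabled JVM), a developer would publish both exposed ports so their IDE can attach a remote debugger:

```bash
# Hypothetical tag; build the dev image from the Dockerfile above
docker build -t foss4g-dev .

# Publish the web service port (8080) and the JPDA debug port (8000)
docker run --rm -it \
  -p 8080:8080 \
  -p 8000:8000 \
  foss4g-dev
```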
---

### Local builds

Use a simple container to build your code and artifacts

```bash
# Mount your source tree into the container, run the build inside it,
# and remove the container when it finishes (-w sets the working directory)
docker run --rm \
  -v /your/code:/workspace \
  -w /workspace \
  maven:3.5-jdk-7 \
  mvn package
```

Note: This example will mount your code directory to the Docker container's workspace folder, then run a Maven build command within it. Once it is done, the container will be cleaned up. Since you mounted your code directory into the container, all the files it created during your build will stick around on YOUR file system.

---

### Continuous integration

* Create a build-ready container with static analysis tools, security scan tools, code coverage libraries
* Save the container with a friendly name

```bash
docker commit <container-id> my_java_app:1.0
```

* Publish it to [DockerHub](https://hub.docker.com)

```bash
docker push my_java_app:1.0
```

Note: Load up a machine with everything you need to automate your project to give you as much information on its health as possible. Save your machine and push it to either DockerHub or another repository of your choosing.

---

### Continuous integration (cont'd :D)

Source that container into your .gitlab-ci.yml file

```yaml
test:my_test_phase:
  image: my_java_app:1.0
  script:
    - ./gradlew build
    - ./gradlew javadocs
```

Note: Use your new container to automate how you build your application in the future. If you forget something, container versions can be updated by changing 1.0 to 1.1.
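
For example, a hedged sketch of bumping that version (the container ID and image name are placeholders): commit the tweaked container under a new tag, push it, and then point the `image:` line in `.gitlab-ci.yml` at the new tag.

```bash
# Placeholder container ID and image name; commit the updated container
# as 1.1 and publish it alongside 1.0
docker commit <container-id> my_java_app:1.1
docker push my_java_app:1.1
```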
---

### Container creation

Do this

```Dockerfile
...
RUN git clone <repo-url> \
    && cd <repo> \
    && ./gradlew assemble
...
```

Not this

```Dockerfile
...
RUN git clone <repo-url>
RUN cd <repo>
RUN ./gradlew assemble
...
```

Note: Docker likes to make what it calls layers. Layers can be rolled back if they fail to build, so you don't have to start a long, complicated build from scratch. This sounds useful, and it is if you plan for it, but normally it just creates unnecessary bloat. Also note that each RUN instruction starts fresh, so the `cd` in the second example would not carry over to the next instruction.

---

### Container creation (cont'd)

Do this

```Dockerfile
COPY filename.sh new_filename.sh
```

Not this

```Dockerfile
ADD filename.sh new_filename.sh
```

Note: The ADD command can behave unexpectedly in some instances, when you probably just wanted to simply add a file to a container. If that is your intent, use COPY. If you really did want local-only tar file extraction or remote URL support, however, then by all means, ADD away.

---

### Container creation (cont'd)

* Use the FROM command to inherit from an image that is already useful, such as Tomcat, GDAL, PostGIS, etc.
* Stick to one purpose per container, so scaling becomes easier. More on that later...
* Treat your containers as ephemeral

Note: Don't do extra work if you don't have to, right? If you don't need a GDAL driver that you can only get by compiling from source, then just start with a known good GDAL container. Keeping your container's purpose modest becomes very important when you start using containers together, as we will talk about later. Lastly, don't expect a new container to be able to pick up where you last left off with the last one. Unless you mount a directory with state information, it won't work like you would hope.
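
For example, a minimal sketch of keeping state outside an ephemeral container by mounting a host directory (the host path and image tag here are examples only):

```bash
# The database files live on the host in /data/postgres, so this container
# can be deleted and recreated at will without losing any data
docker run --rm \
  -v /data/postgres:/var/lib/postgresql/data \
  postgres:9.6
```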
---

### Advanced topics

Note: By now you should all be drinking the container kool-aid. But how can you take your Level 19 CLI Wizarding skills to the 20th level?

---

### Orchestration

Many containers all working together in a coherent way. Examples:

* GeoServer + PostGIS
* NodeJS front-end webapp + Tomcat REST endpoint + Postgres data store
* Tomcat serving Leaflet + MapProxy + RabbitMQ

Note: Once you separate out your containers by responsibility, you can combine them in ways that make sense. Maybe your development environment extends GeoServer and uses OSM data stored in PostGIS (a compose sketch of that pairing appears just before the questions slide). Perhaps you have a NodeJS webapp and want Tomcat and GDAL to handle some raster manipulation. Maybe Tomcat serves out a Leaflet front end with some custom functionality that creates tile packages by sending jobs to a messaging server, which get picked up by multiple instances of MapProxy.

---

### 12 factor applications

* A dozen helpful considerations to keep in mind when making containers that scale
* Some range from the obvious (use SCM, duh) to the kinda-scary (put your credentials in ENV variables)
* Quick read: [12factor.net](https://12factor.net)

Note: That brings me to my next point: we should really start to think about building scalable apps from the get-go. These might not all apply to the application you are building, but I would urge you to at least keep them in the front of your mind when making a container you think might end up as a legit application.

---

### Alpine

* Small footprint linux distribution, perfect for a production container
* No, really, it is SUPER small (_under 4MB_)
* Based on BusyBox (apk instead of apt)

Note: To get the most efficiency out of your containerized applications, you want them to be as small as possible. This is where Alpine, a stripped-down linux distribution, comes in. Alpine is based on BusyBox, so it kinda-sorta acts like Android's linux. Many official Docker images, such as Postgres, Tomcat, and Node, also publish an Alpine version of their releases so Docker users downstream can use those images in production environments.

---

### Gripes >_<

Build environment containers typically run as _root_, which a linux host will honor

Note: I find it frustrating when I run my Java build in my build environment container and, once it is done, I have a build directory full of files that belong to root. This is because Docker containers typically run as the root user (a possible workaround is sketched just before the questions slide). Windows hosts don't have that problem, but I still won't switch, no offense.

---

### Gripes >_< (cont'd)

Containers can only run on like platforms, i.e., Windows-on-Windows, Linux-on-Linux

Note: If you want to run a Windows container on your linux machine, I am afraid you are out of luck unless you get spicy with a VirtualBox setup. Windows can get around this problem for linux containers because the Docker platform there uses Boot2Docker as a base image with VirtualBox.

---

### Gripes >_< (cont'd)

Windows containers are _getting there_...

Note: Windows containers do exist, but I admit to having little experience with them. As far as using them for Windows builds or continuous integration, I can't say they would provide substantial benefit over a Jenkins machine running your builds.

---

### Gripes >_< (cont'd)

...but they still cause headaches with Hyper-V and other emulation platforms like Android when your host is Windows

Note: Even when you are on Windows, rocking a linux container, expect some headaches with virtualization BIOS settings, even on modern hardware. Android emulators don't play nice in the same sandbox. My coworkers have switched to linux on their laptops to avoid this issue.
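
Two quick sketches to close out these notes. First, a hedged workaround for the root-owned-files gripe, reusing the earlier local Maven build: run the container as your own UID and GID so anything written to the mounted volume belongs to you (depending on the image, you may also need to point HOME or cache directories somewhere writable).

```bash
# Run the build as the host user so artifacts in the mounted volume are
# not owned by root; the flags and image tag mirror the earlier example
docker run --rm \
  -v /your/code:/workspace \
  -w /workspace \
  --user "$(id -u):$(id -g)" \
  maven:3.5-jdk-7 \
  mvn package
```

Second, to make the GeoServer + PostGIS orchestration example concrete, a minimal docker-compose sketch. The image names are community images from Docker Hub and, along with the password and ports, should be treated as placeholders rather than recommendations.

```yaml
# docker-compose.yml -- illustrative only
version: '2'
services:
  postgis:
    image: mdillon/postgis:9.6      # community PostGIS image (placeholder)
    environment:
      POSTGRES_PASSWORD: example    # don't ship real credentials like this
  geoserver:
    image: kartoza/geoserver        # community GeoServer image (placeholder)
    ports:
      - "8080:8080"
    depends_on:
      - postgis
```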
---

### Questions?

---

### Thanks!