thwidge


Maren Beam is thwidge on the internet

Feel free to send me an email, or message me on keybase if it needs to be secure.

You can look at things I'm programming on GitHub, things I've written about on my blog, and things I've spouted off about on my thoughts page.

Contact

You can also send me a letter or a package:

Maren Beam
PO Box 678
New York, NY 10031

All original content and source code on this site is licensed under CC BY-SA and MIT respectively, unless otherwise noted.

Blog

What if I just wrote HTML?

June 6th, 2020

I went down a rabbit hole the last few days. As is tradition, the rabbit hole started with ugh, I don't like my website.

This feeling kind of pervades my life. As an ops-skewing gal I both don't know how browsers work and find too much magic hard to swallow. But as a tech-employed gal I feel obligated to have an internet presence with some degree of intentionality. So, a website.

I hadn't ever written HTML until last year. Honestly I'm not sure I'd ever right-clicked inspect until the year before that — though that was certainly the result of my own hardheadedness. I got on the Browsers Are Dumb And The Internet Shouldn't Have Happened train early. I've since decided that browsers are definitely dumb but the internet is important and browsers are what we have so you gotta just do it, too bad.

As I soon came to learn, one person doesn't typically produce a pretty website without deploying a substantial amount of magic. And it is truly magical! Just npm install something and then type your website words and then probably npm something else (I have no idea what I'm talking about) and then npm just poofs fifteen more magical things and then firefox localhost:1234 and oh my goodness where did that stunning vision of a website come from.

But I couldn't do it. I had to know. I had to muck enough in the details that I could trick myself into believing that I knew what was going on when that sweet, sweet 200 came back.

So I went straight for HTML from the start. Since I don't know how browsers work and I'd never written HTML or right-clicked inspect before it was extremely slow going. But I made a thing. And then I remade it. I think I remade it a third time before I realized that what I was calling 'learning' was actually 'iterating uselessly on my brand' and I needed to pull myself together.

So of course I remade it one more time but this time I shamelessly prioritized look and was able to let it sit for a bit.

But something was still eating at me and I couldn't figure out what.

I moved on with my life — hanging out with friends and mothering clusters with my coworkers. Then the pandemic hit and my gaze turned navel and I started fussing with more side projects. One of these side projects was a bash script that got completely out of control and turned into a microblog. It's called Thoughts and there's definitely no "plan" but since you type opinions in vim and get back a website I started actually learning how HTML works.

And then one day, I realized. It was my blog. My blog was making my But What Is It Really alarms go off veeery quietly.

Now don't get me wrong, my blog was fine — a smattering of pandoc and python and bash that made it so I could just type markdown and receive an aesthetically coherent, syntax-highlighted webpage — but when I right-clicked inspect there was so much...stuff. Where was it coming from? What was it doing? How could I sleep when my brand depended on all this magic?

So I went down the rabbit hole. I wanted to be able to type markdown but get dumb HTML back. The kind of HTML that makes you say yep, that's a website.

First I found Txti. I'm still convinced it's one of the best things on the internet, but it doesn't support codeblocks and this is mostly a programming blog so that wouldn't work.

Then I remembered that rwtxt exists and it's honestly beautiful, but: still a decent amount of magic, not clear if you can self-host publicly while limiting public domain creation, and designed for the "wiki" use case. I decided to pass there, too.

Then I remembered that I had just written an AWK function which, when it's low tide and Mercury's in retrograde, turns fake-markdown into very dumb HTML. Maybe I could expand this a bit to cover the other essential markdown things I'd want for a blog, and then I'd have a really opinionated fake-markdown parser that gave me the exact HTML I want. But I'd probably need to open source it and that means I'd have to worry about portability, and I'd have to learn so much more AWK and am I really trying to sink tens of hours into learning AWK right now? Because I'm not sure that's a very Career Oriented Decision but also it's pretty important not to limit my learning based on career utility so maybe I could just—

And then it hit me. Bricks, etc. What if I just...wrote HTML?

So I did. I am right now. It's 2020, and I'm just writing HTML. And honestly, it's perfect.

Hosting a static site with Docker, Traefik v2, SSL, and cron

February 25th, 2020

I hope this post might be helpful for someone using Traefik for the first time, someone moving from Traefik v1 to v2, or someone who's getting familiar with Docker compose.

My use case and constraints

I want to host many different things on one box. Currently, the most boring way to do that is with Docker. My previous setup, though technically simple, felt overwhelming because I was holding state in my head rather than in files. Docker would force me to put more system state in files.

Parts

Why swarm mode and Traefik? I think swarm gets you most of the declarative things that make deployment easy, without the wild overhead of Kubernetes. I'm using Traefik because it's what I know and it hasn't let me down yet! Also, the SSL story is very straightforward.

Prep

SSH into the host and decide where you want all your Docker service configurations to live. I put mine in ~/docker. They could all go in one docker-compose.yaml file, but I put mine in different directories because I have unrelated services running on the same host. If you adhere to this framework, then you'll want a parent folder for everything, a folder for the ingress controller configuration, a folder for the website configuration, and a subfolder for the source code of the website itself.

$ mkdir -p ~/docker/traefik
$ mkdir -p ~/docker/mywebsite.com/site

And for our last piece of host-setup, we need to enable swarm mode and create a Docker network for our services to use.

$ docker swarm init
$ docker network create --driver overlay proxy

We're creating an overlay network because this is a swarm node and we'll be deploying swarm services to it. All services will connect to this network so that they can talk to Traefik, and Traefik will be the only thing that can talk to the internet. You can find more information about overlay networks here.
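
If you'd like to confirm both commands took effect before moving on, here's a minimal sketch. The function name check_swarm_prep is made up for this example, and it assumes the network is named proxy as above. It only reads state, so it's safe to run repeatedly.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original setup): confirm that swarm
# mode is on and that the 'proxy' overlay network exists. Read-only.
check_swarm_prep() {
  local network="${1:-proxy}"
  # reports "active" when this node is part of a swarm
  docker info --format 'swarm: {{.Swarm.LocalNodeState}}' &&
  # reports the network's driver and scope; we want "overlay swarm"
  docker network inspect "$network" --format 'network: {{.Driver}} {{.Scope}}'
}

if command -v docker >/dev/null 2>&1; then
  check_swarm_prep proxy || echo "something is off; try re-running the two commands above"
else
  echo "docker not found on this machine; skipping check"
fi
```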

One way to deploy swarm services is to write a docker-compose.yaml configuration for each service, and then deploy them with docker stack deploy. This is well-supported in the docker documentation, so it's what we're going to do.

$ vim ~/docker/traefik/docker-compose.yaml

Now we can really start doing stuff!

Configure Traefik

First we're going to set up Traefik. Paste this configuration into the file you just opened, and edit as necessary for your use case. At the very least, you'll need to change the email address. I've included comments explaining most lines.

version: "3"

services:
  traefik:
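    # specify the docker image we're deploying as a service.
    # note: the options below are Traefik v2 options, so consider pinning a v2 image tag instead of "latest"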
    image: "traefik:latest"
    # this specifies the name of the network the service will connect to
    networks:
      - "proxy"
    # these commands override service configuration defaults
    command:
      # set the service port for incoming http connections
      - "--entrypoints.web.address=:80"
      # set the service port for incoming https connections
      - "--entrypoints.websecure.address=:443"
      # enable the traefik api. this would be used by the traefik dashboard if we set that up
      - "--api=true"
      # tell traefik that it's connecting to a swarm, rather than regular docker
      - "--providers.docker.swarmMode=true"
      # traefik automatically finds services deployed in the swarm ("service discovery").
      # this setting restricts the scope of service discovery to services that set traefik.enable=true
      - "--providers.docker.exposedbydefault=false"

      ### these three lines configure the thing inside of traefik that's going to get/renew/manage SSL certificates for us.
      ### It's called a "certificate resolver"
      # 'leresolver' ("Let's Encrypt resolver") is just the name we're giving to the certificate resolver.
      # The name you choose can be different.
      # set the email address to give Let's Encrypt. we should give them a real email address whose inbox gets checked by a human
      - "--certificatesresolvers.leresolver.acme.email=myemail@mailbox.org"
      # set the location inside the container to store all certificates
      - "--certificatesresolvers.leresolver.acme.storage=/acme.json"
      # tell the certificate resolver the method we want to use to get an SSL certificate.
      # you can read about challenge types here:  https://letsencrypt.org/docs/challenge-types/
      - "--certificatesresolvers.leresolver.acme.tlschallenge=true"

    # because traefik is the ingress controller and thus must talk directly to the internet,
    # we want to bind ports on the traefik container to ports on the debian host. this does that
    ports:
      # host-port:container-port
      - "80:80"
      - "443:443"
    # make things on the host accessible to the container by mounting them in the container
    # /host/path:/container/path
    volumes:
      # mount the docker unix socket inside the traefik container.
      # this is essential for traefik to know about the services it's sending traffic to.
      # we mount it read-only as a small precaution, but note that this does not stop a compromised traefik
      # from sending instructions to the docker daemon: a socket can still be written to through a read-only mount.
      # you can learn about unix sockets here:  https://en.wikipedia.org/wiki/Unix_domain_socket
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      # mount this file inside the traefik container. this is where SSL certificates are stored.
      # if we don't do this, when traefik reboots (which is guaranteed), we'll lose all our SSL certificates
      - "./acme.json:/acme.json"
    # the deploy block is here because this is a swarm service.
    # other than setting labels, we're using all the swarm mode defaults for this service
    # more information is here: https://docs.docker.com/compose/compose-file/#deploy
    deploy:
      labels:
        # redirect all incoming http requests to https.
        # this will apply to all services sitting behind traefik. for us, that's all services
        - "traefik.http.routers.http-catchall.rule=hostregexp(`{host:.+}`)"
        - "traefik.http.routers.http-catchall.entrypoints=web"
        - "traefik.http.routers.http-catchall.middlewares=redirect-to-https"

        # define a traefik 'middleware' to perform the actual redirect action.
        # more information about traefik middlewares:  https://docs.traefik.io/middlewares/overview/
        # more information about the RedirectScheme middleware:  https://docs.traefik.io/middlewares/redirectscheme/
        - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"

# this is necessary because we're connecting to a pre-existing network that we made ourselves. in this case, the 'proxy' network
networks:
  # the name of the network
  proxy:
    # this tells docker, "Don't make this network yourself, because I've already made it." It's 'external' to docker-compose
    external: true

And that's the main Traefik configuration! This may seem like a lot, but we'll never have to touch this configuration again -- even if we deploy 50 unrelated services behind this Traefik instance.

Before starting Traefik, let's create and set permissions for the acme.json file where our certificates will be stored. This file will be full of mission-critical secrets, so it's important to do this right.

$ touch ~/docker/traefik/acme.json
$ chmod 600 ~/docker/traefik/acme.json
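
Traefik checks these permissions itself and will refuse to use an acme.json that's more open than 600, so it's worth verifying. Here's a minimal sketch of that check, run against a throwaway file rather than the real one. The helper name has_mode_600 is made up for this example, and it assumes GNU stat, which is what Debian ships.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the file's mode is exactly 600.
# Relies on GNU stat's -c flag (an assumption; BSD/macOS stat differs).
has_mode_600() {
  [ "$(stat -c '%a' "$1")" = "600" ]
}

# Demonstrated on a temporary file so nothing real is touched.
demo_file="$(mktemp)"

chmod 644 "$demo_file"
if has_mode_600 "$demo_file"; then echo "ok"; else echo "too open: $(stat -c '%a' "$demo_file")"; fi

chmod 600 "$demo_file"
if has_mode_600 "$demo_file"; then echo "ok"; else echo "too open: $(stat -c '%a' "$demo_file")"; fi
```

To check the real file, you'd call has_mode_600 ~/docker/traefik/acme.json instead.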

And that's actually the end of the Traefik configuration.

Traefik has a fancy dashboard, but for clarity of configuration we've not set it up. Without the dashboard, and with no website deployed yet, we don't have a good way to test our setup at the moment. The best you can do is deploy Traefik and then check to see if it's running. We're deploying with docker stack deploy because this is a swarm service.

$ docker stack deploy --compose-file ~/docker/traefik/docker-compose.yaml traefik
$ docker container ls

If you see one Traefik container running, that's great! You could unplug your server from the wall right now (not recommended), plug it back in, and the Traefik service would automatically come back up as soon as Docker was able to make it happen.
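
For a slightly closer look than docker container ls, the swarm-level commands show replica counts and logs. This is a minimal sketch, assuming the stack was deployed under the name traefik as above; stack_status is a made-up helper name.

```shell
#!/bin/bash
# Hypothetical helper: show the services in a stack, then recent logs for one
# of them. Swarm names each service <stack>_<service>, so the "traefik"
# service in the "traefik" stack is called "traefik_traefik".
stack_status() {
  local stack="$1" service="$2"
  docker stack services "$stack" &&
  docker service logs --tail 20 "${stack}_${service}"
}

if command -v docker >/dev/null 2>&1; then
  stack_status traefik traefik || echo "couldn't read the stack; is it deployed?"
else
  echo "docker not found on this machine; skipping check"
fi
```

The logs are where certificate problems tend to show up once a website is deployed behind Traefik, so this is handy to keep around.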

Now let's configure our actual website.

Configure the website

First let's work on the Docker service configuration. Open a new compose file:

$ vim ~/docker/mywebsite.com/docker-compose.yaml

And paste in the following configuration. Again, edit as necessary for your use case:

version: '3'

services:
  nginx:
    # we specify that we want to use the alpine-based nginx image
    image: "nginx:alpine"
    # connect to this network in order to connect to traefik
    networks:
      - "proxy"
    # mount the directory containing the source code for our website inside the container.
    # this is the directory that the default nginx configuration automatically serves content from.
    # by putting our site here, we avoid having to write any nginx configuration ourselves
    volumes:
      - "./site:/usr/share/nginx/html:ro"
    deploy:
      labels:
        # tell traefik that it can automatically "discover" this service
        - "traefik.enable=true"
        # tell traefik that all requests for 'mywebsite.com' should be sent to this service
        - "traefik.http.routers.mywebsite.rule=Host(`mywebsite.com`)"
        # only allow incoming https connections
        - "traefik.http.routers.mywebsite.entrypoints=websecure"
        # tell traefik which certificate resolver to use to issue an SSL certificate for this service
        # the one we've created is called 'leresolver', so this must also use 'leresolver'
        - "traefik.http.routers.mywebsite.tls.certresolver=leresolver"
        # tell traefik which port *on this service* to connect to.
        # this is necessary only because it's a swarm service.
        # more info is here: https://docs.traefik.io/providers/docker/#port-detection_1
        - "traefik.http.services.mywebsite.loadbalancer.server.port=80"

# again, we have to specify that we've already created this network
networks:
  proxy:
    external: true

Now we can do a quick test to see whether everything's working up to this point.

$ echo 'hello world' > ~/docker/mywebsite.com/site/index.html
$ docker stack deploy --compose-file ~/docker/mywebsite.com/docker-compose.yaml mywebsite

Wait for 30 seconds, just for good measure. Consider making some tea! Then visit mywebsite.com in a browser, or run this on your local machine:

$ curl https://mywebsite.com

If you get a response (or a page) containing only hello world, success!

Now we can do the last step: setting up automatic deployments with GitHub and cron. If you don't already have a static site you'd like to use for this, you can use this template to start with.

Automatic deployment

Our end-goal workflow for making changes to our site is:

  1. Make changes to our website on our local machine
  2. Assuming our source code is in a public repo on GitHub, commit our changes and run git push
  3. At the top of the next hour, our changes are visible on the internet

We're going to use a cron job running on the host to achieve this. This is a pretty funny combination of new computer (traefik, swarm mode, etc.), and old computer (cron). But do you want to set up a whole CI pipeline for a personal, static website? Me neither! I think cron is perfect for something like this.

We're going to skip over creating a new repo and just work with this template which you should absolutely feel free to use for your own website!

Assuming you've forked my repo, or are otherwise set up with a git repo you'd like to use, now we just need to set up a cron job on our host that'll pull the repo each hour and copy it into ~/docker/mywebsite.com/site for Nginx to serve.

First, SSH into the host. Then:

$ mkdir ~/cronjobs
$ mkdir ~/.mywebsite.com
$ vim ~/cronjobs/update-my-website-dot-com.sh

In the file you just opened, paste the following:

#!/bin/bash
cd ~/.mywebsite.com/mywebsite.com
git pull
# we only want to give nginx the files that we actually want to serve.
# we include the --delete flag so that if we permanently remove a file from our site's source code,
# it's removed from the directory that nginx is serving.
# basically, a true "sync" with rsync requires the --delete flag
rsync -a --delete --exclude '.*' --exclude 'README.md' --exclude 'LICENSE' . ~/docker/mywebsite.com/site/
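
If the rsync flags feel a bit opaque, here's a small sketch that runs the same command against two throwaway directories, so you can see what --delete and --exclude do without touching the real site. The directories and file names are made up for the demo.

```shell
#!/bin/bash
# Temporary directories standing in for the cloned repo (src) and
# ~/docker/mywebsite.com/site (dest). Both are made up for this demo.
src="$(mktemp -d)"
dest="$(mktemp -d)"

# a pretend repo: one page, plus files we don't want nginx to serve
echo 'hello world' > "$src/index.html"
echo 'readme' > "$src/README.md"
echo 'license' > "$src/LICENSE"
mkdir "$src/.git"
echo 'ref' > "$src/.git/HEAD"

# a stale file left over from an earlier version of the site
echo 'old' > "$dest/removed-page.html"

# same flags as the update script. the trailing slash on "$src/" means
# "the contents of this directory", like "." does in the script
if command -v rsync >/dev/null 2>&1; then
  rsync -a --delete --exclude '.*' --exclude 'README.md' --exclude 'LICENSE' "$src/" "$dest/"
else
  echo "rsync is not installed"
fi

# list what nginx would end up serving
ls -A "$dest"
```

After the sync, the destination should contain only index.html: the dotfiles, README.md, and LICENSE were never copied, and the stale page was removed because it no longer exists in the source.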

Make the update script executable, and for good measure be sure rsync and git are installed:

$ chmod +x ~/cronjobs/update-my-website-dot-com.sh
$ sudo apt update
$ sudo apt install rsync git

Now get the repo onto the host, and into the right place — we only have to do this once.

$ cd ~/.mywebsite.com
$ git clone https://github.com/thwidge/mynamedotcom.git mywebsite.com

Run the update script once manually to sync the repo right now:

$ ~/cronjobs/update-my-website-dot-com.sh

Finally, cron it:

$ crontab -e

And in that file add the line:

@hourly ~/cronjobs/update-my-website-dot-com.sh
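
@hourly is cron shorthand for 0 * * * *, meaning minute zero of every hour. If you'd rather not open an editor, here's a sketch of adding the line non-interactively. add_cron_line is a made-up helper for this example: it reads a crontab on stdin and prints it back with the new line appended only if it isn't already there, so running it twice doesn't duplicate the job.

```shell
#!/bin/bash
# Hypothetical helper: read a crontab on stdin, print it back, and append
# the given line unless an identical line is already present.
add_cron_line() {
  local line="$1" current
  current="$(cat)"
  # keep whatever was already there
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  # -x matches whole lines only, -F treats the line as a literal string
  if ! printf '%s\n' "$current" | grep -qxF -- "$line"; then
    printf '%s\n' "$line"
  fi
}

# Demo on sample input, so the real crontab isn't touched:
printf '%s\n' '# existing jobs' | add_cron_line '@hourly ~/cronjobs/update-my-website-dot-com.sh'

# To apply it for real, you could pipe the live crontab through it:
#   crontab -l 2>/dev/null | add_cron_line '@hourly ~/cronjobs/update-my-website-dot-com.sh' | crontab -
```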

And that's it! You've now got a single node Docker swarm cluster; Traefik accepting incoming requests, routing them to the appropriate service, and programmatically handling SSL provisioning and termination; an Nginx container serving your static site over HTTPS; and a simple cron job reliably syncing and deploying all changes merged to the main branch at the top of each hour.