Absurdist Arcana Ben Weintraub's blog

What's a URI?

There are some things that almost everyone is familiar with, but which relatively few people understand deeply. You might go your whole life never awakening to your own lack of knowledge on a topic that you deal with in a passing manner daily.

For me, URIs were like this (and still are, to some extent). During a recent work discussion about how to structure internally-used URIs, a co-worker helped me realize this.

RFC 3986 formally defines the structure of a URI, along with related concepts like relative references. If you’ve never tried, it’s worth reading (or at least skimming the table of contents)! I’ll admit to only having tried for the first time a few weeks ago, despite working as a software engineer for the last 15 years.

One of the most surprising things I learned while reading it was that URIs do not need to contain two slashes (//). In order to understand the alternative constructions of URIs that looked different from what was in my head, I spent some time making railroad diagrams from (a subset of) the ABNF definitions in Appendix A of RFC 3986. They’re pretty, and you might enjoy them:

Two railroad diagrams: “what I thought a URI was” (scheme, ‘//’, authority, path segments, optional query and fragment) versus “what I learned a URI was”, in which the authority is optional and the path takes one of four forms: path-abempty (begins with ‘/’ or is empty), path-absolute (begins with ‘/’ but not ‘//’), path-rootless (begins with a segment), or path-empty (zero-length).
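You can see the same flexibility by handing a few authority-less URIs to Python’s urllib.parse; the netloc (authority) component simply comes back empty:

from urllib.parse import urlsplit

# Only the first of these has an authority component (the part introduced
# by '//'), but all four are valid URIs.
for uri in [
    "https://example.com/a/b?q=1#frag",  # path-abempty, with an authority
    "mailto:user@example.com",           # path-rootless, no '//' anywhere
    "urn:isbn:0451450523",               # path-rootless (an example from RFC 3986)
    "file:/etc/hosts",                   # path-absolute, no authority
]:
    print(urlsplit(uri))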

Sticky Badness: Metastability and you

In the physical sciences, there’s a concept called metastability which is an apt metaphor for understanding certain kinds of software engineering failure modes. My grasp of this physics concept is entirely based on Wikipedia’s description, so take this with a grain of salt, but imagine you have a ball sitting in a well, like the red one labelled ‘1’ below:

An image showing a ball in three different positions: in a local minimum (1), atop a local maximum (2), and at a global minimum (3)

Undisturbed, it will sit happily. Nudge it a bit and it’ll roll right back into the bottom of the depression on the left and eventually come to rest. Nudge it enough, though, and it’ll hit position 2 and then tip down into the deeper well of position 3, from which it’ll be more difficult to dislodge.

Software systems sometimes exhibit behavior like this: small perturbations (a spike in load, an interruption in network availability, a slow disk access) might cause a temporary degradation of service, but generally will resolve without intervention. Sometimes, however, a critical threshold is crossed wherein a feedback loop takes over, holding the system in a bad state until a radical intervention (rebooting, load shedding, etc) is undertaken.

The paper Metastable Failures in Distributed Systems by Bronson, Aghayev, Charapko, and Zhu discusses these kinds of failures in more detail, and provides a succinct definition:

Metastable failures occur in open systems with an uncontrolled source of load where a trigger causes the system to enter a bad state that persists even when the trigger is removed.

The authors stress that it is the feedback loop (the depth of the second well in our ball analogy), rather than the trigger (the nudge) that should rightly be thought of as the ‘cause’ of a metastable failure:

It is common for an outage that involves a metastable failure to be initially blamed on the trigger, but the true root cause is the sustaining effect.

A case study

This failure mode might sound rare, but my team at work recently spent some time tracking down a metastable failure state in a very standard looking system: a pool of stateless HTTP servers, each with a database connection pool, which looked something like this:

A diagram of the system: a load balancer routing requests to two App instances, each holding a pool of three database connections, all of which connect to a single Database.

This system was replacing an older one, but before the replacement we ran an extended load test by mirroring all production traffic from the old system to the new one (silently discarding the results from the new system). In so doing, we encountered a failure mode that only manifested occasionally, but was catastrophic when it did. The symptoms:

  • Near 100% CPU utilization on the target database
  • Application latency pegged at our app-layer timeout (2 seconds)
  • 100% error rate for the application (with all failures being timeouts)
  • Observed across multiple application instances concurrently
  • No obvious changes in inbound application throughput or access patterns

Once triggered, the application would remain in this bad state until it was stopped and restarted (scaled to zero and then back up to the target instance count).

Closer examination revealed that the application instances were cycling through database connections at a very high rate. In fact, each request was attempting to establish a new database connection, but timing out before it was able to finish doing so, leaving the connection pools effectively empty.

Reproducing the issue

A failure mode like this can be hard to believe until you’ve successfully triggered it on-demand. While investigating this failure, we relied on a slimmed-down test case involving a single application instance and a local database. Locally, the cost differential between establishing a new DB connection and issuing a query over an existing one was smaller (we tested without TLS on the database connection, and without the database proxy component that was present in production), so it took some finagling to get a suitable reproduction case.

In order to better simulate the cost of new connection establishment, we wrapped a rate limiter around DB connection creation, forcing new connection attempts to block once the threshold rate was exceeded. In reality, the database CPU effectively functioned as this rate limiter: a full TLS handshake takes an appreciable amount of CPU time. Add to that other database-specific CPU work needed to prepare a new connection for use, and you can begin to see how a small number of instances, each with a modest connection pool, might saturate the CPU of a database with reconnection attempts.
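As a rough sketch of that idea (this is illustrative Python, not our actual test harness), a blocking limiter wrapped around whatever connection factory the pool uses might look like this:

import threading
import time

class BlockingRateLimiter:
    """Spaces acquisitions at least 1/rate seconds apart, blocking callers
    in the meantime -- a stand-in for a CPU-saturated database."""

    def __init__(self, rate_per_sec):
        self.interval = 1.0 / rate_per_sec
        self.lock = threading.Lock()
        self.next_allowed = time.monotonic()

    def acquire(self):
        with self.lock:
            now = time.monotonic()
            wait = max(0.0, self.next_allowed - now)
            self.next_allowed = max(now, self.next_allowed) + self.interval
        if wait > 0:
            time.sleep(wait)  # block until it's this caller's turn

def rate_limited(connect_fn, limiter):
    # Wrap the driver's connect function (hypothetical) so that every new
    # connection attempt pays the simulated establishment cost first.
    def connect(*args, **kwargs):
        limiter.acquire()
        return connect_fn(*args, **kwargs)
    return connect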

Inducing the issue in production

In parallel with the local reproduction work, my team had also been working on ways to induce the issue at production load levels. The initial idea was to consume all available database CPU resources for a brief period of time (long enough for us to trigger application timeouts and thus connection reestablishment). This proved to be more challenging than it might sound (unsurprisingly, most material you’ll find online about consuming database CPU resources is written with the premise that you want to use less of them, not more).

The thing that did finally work to trigger this failure mode was to hold an ACCESS EXCLUSIVE lock against a table read by most application requests for a few seconds, like this:

BEGIN;
LOCK TABLE t IN ACCESS EXCLUSIVE MODE;
SELECT pg_sleep(5);
COMMIT;

Using this approach, we were able to reliably trigger the failure state, which was critical to building our confidence that we had actually addressed the issue once we implemented a mitigation strategy.

Mitigation

Metastable Failures in Distributed Systems discusses several strategies for dealing with metastable failure states. The one we employed on my team to address this particular problem was a simple circuit breaker around all database interactions.

Our circuit breaker works by keeping a running count of the number of consecutive query failures it has observed, and immediately aborting any database queries for a certain amount of time once a configurable number of failures has been observed. The amount of time queries are disabled for scales exponentially with the number of failures (up to a fixed upper bound), and any successful query resets the failure count back to zero.

The state diagram looks like this:

A state diagram of the circuit breaker. With queries enabled, the breaker is either Closed (query successes keep it Closed; a query failure moves it to Half open) or Half open (a success returns it to Closed; further failures keep it Half open until the failure threshold trips). Tripping the threshold moves it to Open, disabling queries; once the trip duration elapses it returns to Half open.

A circuit breaker breaks the feedback loop: requests that arrive during the period when the circuit breaker is open are immediately aborted, shedding load on the database. The consecutive failure counter ensures that we don’t needlessly open the circuit breaker unless we have strong evidence that the next query is likely to fail.
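The core of such a breaker is small. Here’s a minimal Python sketch of the scheme described above (illustrative only; the names, defaults, and error handling are made up, and a real implementation would also need to be thread-safe):

import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, base_delay=0.1, max_delay=30.0):
        self.failure_threshold = failure_threshold
        self.base_delay = base_delay   # trip duration when the threshold is first crossed
        self.max_delay = max_delay     # fixed upper bound on the trip duration
        self.consecutive_failures = 0
        self.open_until = 0.0          # monotonic timestamp; queries disabled until then

    def call(self, query_fn):
        # While open, shed load: abort immediately instead of touching the database.
        if time.monotonic() < self.open_until:
            raise RuntimeError("circuit breaker open; query aborted")
        try:
            result = query_fn()
        except Exception:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                # The disable period grows exponentially with the consecutive
                # failure count, up to the fixed upper bound.
                excess = self.consecutive_failures - self.failure_threshold
                self.open_until = time.monotonic() + min(
                    self.base_delay * 2 ** excess, self.max_delay)
            raise
        self.consecutive_failures = 0  # any success resets the failure count
        self.open_until = 0.0
        return result

Every database interaction is then routed through the breaker (for example, breaker.call(lambda: run_query(conn, sql)) for some hypothetical run_query helper), so that during an open period requests fail fast instead of piling onto the database.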

Results

Without the circuit breaker, we were able to repeatedly trigger a persistent failure state using the locking approach described above. Once the circuit breaker was implemented, the same reproduction steps yielded recovery within seconds after the release of the lock. We’ve not seen any organic recurrences of the issue since the addition of the circuit breaker either.

Simulation

We can better understand this failure mode using a simplified simulation. I created one using SimPy, a discrete event simulation framework for Python. (Discrete event simulation just means that instead of simulating every ‘tick’ of simulated time, the simulation jumps between discrete events, which may trigger or schedule other events.)

The simulation models a fixed number of application instances, each with a fixed-size database connection pool, all connecting to a single database instance. Requests are randomly distributed amongst instances, and each request involves executing a single database query. The database query consumes a small amount of database CPU time, and simulates network latency between the application and the database by holding the connection from the pool for some additional time.

Requests have a timeout, and if the timeout fires during the execution of the database query, the connection that was being used for that request is marked as dead, and will be refreshed on the next use. The refresh process incurs a higher database CPU cost.

In order to simulate a trigger, the model holds all database CPU resources for a few seconds after an initial delay. A well-behaved system is expected to experience application timeouts during this period, since no other database queries get CPU time to execute. At lower throughput values, this is indeed what happens: the application sees a brief period of timeouts that is similar in duration to the triggering event itself. However, at higher throughputs, the application never recovers from its error state, as repeated timeouts lead to reconnection attempts, which in turn consume database CPU resources, and cause further timeouts.
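If you’d like to play with a model of this shape yourself, here’s a compressed SimPy sketch (an illustrative reconstruction with made-up parameters, not the code behind the charts below; tune the request rate and reconnection cost to move between the recovering and the stuck regimes):

import random
import simpy

# Illustrative parameters only.
SIM_TIME = 70.0         # simulated seconds
NUM_CONNS = 50          # pooled connections across all app instances
REQUEST_RATE = 400.0    # request arrivals per second
REQUEST_TIMEOUT = 2.0   # app-layer timeout
QUERY_CPU = 0.002       # DB CPU seconds per query
LATENCY = 0.01          # extra time a connection is held per query
RECONNECT_CPU = 0.08    # DB CPU seconds to refresh a dead connection
TRIGGER_AT, TRIGGER_LEN = 20.0, 5.0  # pin the DB CPU at t=20s for 5 seconds

env = simpy.Environment()
db_cpu = simpy.Resource(env, capacity=1)      # the database's single simulated CPU
pool = simpy.Store(env, capacity=NUM_CONNS)   # the shared connection pool
pool.items = [{"dead": False} for _ in range(NUM_CONNS)]
stats = {"ok": 0, "timeout": 0}

def use_db_cpu(seconds):
    with db_cpu.request() as req:             # queue for the DB CPU, then burn time on it
        yield req
        yield env.timeout(seconds)

def query(conn):
    try:
        if conn["dead"]:
            yield from use_db_cpu(RECONNECT_CPU)   # costly connection refresh
            conn["dead"] = False
        yield from use_db_cpu(QUERY_CPU)           # the query itself
        yield env.timeout(LATENCY)                 # network time; connection still held
    except simpy.Interrupt:
        pass                                       # abandoned by an app-layer timeout

def request():
    deadline = env.timeout(REQUEST_TIMEOUT)
    checkout = pool.get()
    got = yield checkout | deadline
    if checkout not in got:                        # timed out waiting for a connection
        checkout.cancel()
        stats["timeout"] += 1
        return
    conn = checkout.value
    work = env.process(query(conn))
    done = yield work | deadline
    if work not in done:                           # timed out mid-query
        if not work.triggered:
            work.interrupt()
        conn["dead"] = True                        # must be refreshed before reuse
        stats["timeout"] += 1
        yield work                                 # let query() unwind and free the CPU
    else:
        stats["ok"] += 1
    yield pool.put(conn)                           # return the connection to the pool

def arrivals():
    while True:
        yield env.timeout(random.expovariate(REQUEST_RATE))
        env.process(request())

def trigger():
    yield env.timeout(TRIGGER_AT)
    yield from use_db_cpu(TRIGGER_LEN)             # hold the CPU to induce the failure

def report(every=5.0):
    prev = dict(stats)
    while True:
        yield env.timeout(every)
        print(f"t={env.now:5.1f}s ok={stats['ok'] - prev['ok']:6d} "
              f"timeouts={stats['timeout'] - prev['timeout']:6d}")
        prev = dict(stats)

env.process(arrivals())
env.process(trigger())
env.process(report())
env.run(until=SIM_TIME)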

The addition of a circuit breaker to the system allows for recovery from this state. In the charts below, you can see the results of simulating 70 seconds worth of time both without (left column) and with (right column) a circuit breaker.

The y-axis in each chart shows the database CPU utilization, and the x-axis shows time. The pink shading indicates the time during which the database CPU is artificially pinned at 100% usage by the simulation in order to induce the failure mode.

A figure showing two columns of graphs. All graphs have time on the x-axis, and database CPU utilization on the y-axis. The left-hand column shows results from the simulation without a circuit breaker, and the right-hand column shows results from the simulation with a circuit breaker. The highest-throughput simulation without the circuit breaker never recovers after being perturbed, but the simulations with the circuit breaker recover in all cases.

You might have a metastable system!

If your system seems to ‘enter a bad state’ under load from which it can’t recover without operator intervention, you too might have a metastable system. They’re not as rare as you might think! Check out section 3 (‘Approaches to Handling Metastability’) of the paper I linked to previously for some ideas as to how you might break the feedback loop in your system.

Acknowledgements

Many thanks to my friend Bill Dirks, who was a sounding board and consistent source of ideas and encouragement during the process of tracking this issue down!

Rendering Graphviz in your terminal

This post was inspired by a Twitter thread by @thingskatedid:

You should read the whole thread; there are lots of beautiful examples! Her main point is that you should enable your programs to dump their internal state representations visually. Graphviz is a convenient way to do this, since it’s easy to generate and read, and it handles the actual rendering for you, but it can feel clunky to repeatedly cycle through the following workflow:

  1. Make code change
  2. Run your program (with flags to dump its internal state in Graphviz form)
  3. Redirect output to a temporary .dot file
  4. Remember the right Graphviz incantation to turn the .dot file into a .png or something (dot -T png in.dot >out.png, FWIW, but it took me years to commit that to memory)
  5. Open the resulting .png in your graphics viewer of choice

Kate talks about how to shorten this feedback cycle in this thread. This is just my distillation of her suggestions.

I use a terminal called kitty which, among other things, offers the ability to display high-quality graphics inline. The kitty documentation explains how to do it:

kitty +kitten icat path/to/some/image.png

That’s a bit wordy, so I’ve created a shell alias with the same name as Kate’s (idot), like this:

alias idot="dot -T png | kitty +kitten icat"

Now I can take some program that emits Graphviz, and just pipe it to idot, like this:

echo 'digraph { rankdir=LR; a -> b -> c }' | idot

Accommodating dark terminal background colors

If you use any terminal background color other than white, you’ll find that by default, your graphs don’t look as nice as Kate’s examples:

A screenshot showing Graphviz rendered in the terminal, with a white background

To remove the background colors and make the text, borders, and edges white, you can set a few attributes:

digraph {
  margin=0.75
  bgcolor="#ffffff00"
  color=white
  fontcolor=white
  node [color=white, fontcolor=white]
  edge [color=white]

  // ... the rest of your graph ...
}

If you don’t want to bother with setting these in every chunk of dot you produce, you can modify your idot alias to set them as defaults, using the -G, -N, and -E flags to dot to set default graph, node, and edge attributes on the command line, like this:

alias idot="dot -Gmargin=0.7 '-Gbgcolor=#ffffff00' -Gcolor=white -Gfontcolor=white -Ncolor=white -Nfontcolor=white -Ecolor=white -T png | kitty +kitten icat"

A screenshot showing Graphviz rendered in the terminal, with a transparent background and white text / lines

You can also throw whatever other defaults you like into your alias. Check out my previous post about making Graphviz’s output prettier for some ideas.

Prettier Graphviz Diagrams

Graphviz is an incredible tool. It allows you to visualize complex graphs by writing a simple declarative domain-specific language that’s equally easy to write by hand or to generate programmatically. Unfortunately, Graphviz’s defaults leave something to be desired. Without any attention to styling, an example graph might look something like this:

The example graph rendered with Graphviz’s defaults: a Client, a Load balancer, three Backends, a DB, and a Cache, laid out top-to-bottom with oval nodes.

Compare that to the same graph rendered using a newer tool, Mermaid.js:

The same graph rendered with Mermaid.js’s default styling.

To my eyes, Mermaid produces more visually pleasing results without any need to tweak the defaults. I use it for most simple diagrams that I need to make, but sometimes I really do need some of the additional flexibility that Graphviz provides. When I want to use Graphviz and have my results look not-terrible, here are the most important tips that I use.

Layout direction

I find that left-to-right graphs usually look better than top-down graphs. To achieve that, just set the rankdir=LR attribute on your graph:

The example graph laid out left-to-right.
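In source form that’s just something like this (a minimal example, not the exact source of the diagrams in this post):

digraph {
  rankdir=LR
  client -> lb -> backend
  backend -> db
  backend -> cache
}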

Node shape

Graphviz’s default node shape (oval) is ugly. There are lots of alternatives. box is a sensible default. You can set it as the default for all nodes in your graph with node [shape=box].

The example graph with box-shaped nodes.
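In context, that looks something like:

digraph {
  rankdir=LR
  node [shape=box]
  client -> lb -> backend
}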

Ports

By default, Graphviz draws edges by connecting the centers of each node pair and then clipping to the node boundaries. I often find graphs easier to read if I force edges to originate from and arrive at specific ports, using the headport and tailport edge attributes. For a left-to-right graph, you can set reasonable defaults for all edges in your graph by adding edge [headport=w, tailport=e]:

The example graph with edges forced to leave each node on its east side and arrive on its west side.

Note that there’s a shorthand for setting these attributes too. Instead of saying:

a -> b [headport=w, tailport=e]

You can equivalently say:

a:e -> b:w

Fonts

Everyone’s got a favorite. For technical diagrams, sans serif fonts look better to me. You can set a default font for node labels using the fontname node attribute. Set the default for all nodes via node [fontname=<your-favorite-font>]:

The example graph with node labels in a sans serif font.
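For example, to use Helvetica (substitute whatever font you prefer):

digraph {
  rankdir=LR
  node [shape=box, fontname="Helvetica"]
  client -> lb -> backend
}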

The fontname attribute can be applied to edges (for edge label text) and graphs (for subgraph labels) as well.

Colors

Graphviz recognizes lots of built-in color names. If you just want colors that look OK together, use one of the Brewer color schemes. You can set the style=filled and colorscheme default node attributes, and then assign individual nodes colors with color=N, where N is just a numeric index into the color scheme of your choice:

The example graph with filled nodes colored using a Brewer color scheme.
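Putting those three attributes together might look something like this (dark26 is one of the Brewer scheme names Graphviz understands):

digraph {
  rankdir=LR
  node [shape=box, style=filled, colorscheme=dark26]
  client  [color=1]
  lb      [color=2]
  backend [color=3]
  client -> lb -> backend
}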

Ranksep

The default layout can get a little squishy, especially with high edge densities. Set ranksep=0.8 (or higher, to taste) to push nodes of different ranks a bit further apart:

The example graph with ranksep=0.8, giving more space between ranks.
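A minimal example, with ranksep set as a graph-level attribute:

digraph {
  rankdir=LR
  ranksep=0.8
  node [shape=box]
  client -> lb -> backend -> db
}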

Dealing with backwards edges

Sometimes adding an edge that points ‘backwards’ in your graph will really mess up the layout:

The example graph with a backwards edge added from Streamer to Client, which badly distorts the layout.

There are several ways to fix this, but I usually start by giving the backwards edge weight=0 and assigning edge ports:

The same graph with the backwards Streamer-to-Client edge given weight=0 and explicit ports; the layout is back to normal.
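In DOT, that looks something like this (the Streamer-to-Client edge is the backwards one):

digraph {
  rankdir=LR
  node [shape=box]
  edge [headport=w, tailport=e]
  client -> lb -> backend -> cache -> streamer
  streamer:n -> client:n [weight=0]   // backwards edge: zero weight, explicit ports
}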

Grouping

Sometimes it’s nice to emphasize one particular flow within your graph. One way to do that is by ensuring that all of the nodes within that flow are co-linear with each other. The group node attribute tells Graphviz you want that, although sometimes it’ll come up with, um … creative ways of satisfying your request:

The example graph with the group attribute applied; Graphviz satisfies the co-linearity request in a rather creative way.

One way to convince Graphviz to lay things out the way you want is to include ‘extra’ edges in your graph, and force the nodes connected by those edges to have the same rank by putting them into a subgraph with rank=same:

The example graph with extra edges added between the backends, which are now constrained to the same rank; the extra edges are still visible.

If you don’t want those edges to render, you can then hide them by setting style=invis on them:

The final example graph, with the extra alignment edges hidden via style=invis.

Graphviz sources for these examples

Here’s the Graphviz code used to produce the final diagram in this sequence, which demonstrates all of these techniques combined together:

digraph {
  rankdir=LR
  node [shape=box]
  edge [headport=w, tailport=e]
  node [fontname="Courier New"]
  node [style=filled colorscheme=dark26]
  ranksep=0.8

  client [label="Client", color=1, group=main]
  lb [label="Load balancer", color=2, group=main]
  backend1 [label="Backend", color=3]
  backend2 [label="Backend", color=3, group=main]
  backend3 [label="Backend", color=3]
  db [label="DB", color=4]
  cache [label="Cache", color=5, group=main]
  streamer [label="Streamer", color=6, group=main]

  subgraph f {
    rank=same
    edge [style=invis, headport=s, tailport=n]
    backend1 -> backend2 -> backend3
  }

  client -> lb
  lb -> {backend1, backend2, backend3}
  {backend1, backend2, backend3} -> db
  {backend1, backend2, backend3} -> cache
  cache -> streamer [weight=1]
  streamer:n -> client:n [weight=0]
}

Acknowledgements

Thanks to @johanneshoff for pointing out a bug in one of the examples in this post!