Surviving Chaos

A Dysfunctional Docker (`pumba`)

Pumba takes the ideas behind the previous tools (such as killing processes and degrading the network) and applies them to Docker containers. Because it targets containers by name, it can be used alongside the container tooling already present in your infrastructure.

Start off by spinning up some Docker containers to represent a system:

git clone https://github.com/pingcap/tidb-docker-compose
cd tidb-docker-compose
docker-compose up -d

Then you can access the system:

mycli -h 127.0.0.1 -P 4000 -u root

At this point, you can use Pumba to apply some 'tweaks' to the system.

Pause a Container

Here we pause two of the three storage containers in PingCAP's distributed database:

pumba pause -d 15s tidbdockercompose_tikv0_1 tidbdockercompose_tikv1_1

Before running this command, CREATE DATABASE example; succeeds almost immediately. Run during the 15s pause window, the same statement is delayed until the paused containers resume and the system recovers.
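One way to observe the delay described above is to time the statement before and during the pause window. The following is a minimal sketch, not part of Pumba: `elapsed_s` is a hypothetical helper name, and the commented usage line assumes your `mycli` build accepts `-e` to execute a single statement and quit.

```shell
# Hypothetical helper: run a command and print how many whole seconds it took.
# The command's own output goes to stderr, so stdout carries only the number.
elapsed_s() {
    start=$(date +%s)
    "$@" >&2
    end=$(date +%s)
    echo $((end - start))
}

# Example use (left commented out; needs the cluster above to be running):
#   elapsed_s mycli -h 127.0.0.1 -P 4000 -u root -e 'CREATE DATABASE example;'
```

Running the usage line once on a healthy cluster and once right after starting the `pumba pause` command should make the difference between the two cases visible as a number rather than a feeling.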

Kill a Container

Send the main process inside a container the KILL signal:

pumba kill tidbdockercompose_tikv0_1
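To confirm what the kill did, you can ask Docker for the container's state afterwards. This is a small sketch that assumes the `docker` CLI is on your PATH; `container_status` is a hypothetical helper name, not something Pumba provides.

```shell
# Hypothetical helper: print the state Docker reports for a container,
# for example "running" or "exited".
container_status() {
    docker inspect --format '{{.State.Status}}' "$1"
}

# Example use (left commented out; needs the containers above):
#   pumba kill tidbdockercompose_tikv0_1
#   container_status tidbdockercompose_tikv0_1
```

Depending on the restart policy in the compose file, the container may already be running again by the time you check.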

Network Emulation

If your image doesn't have tc installed, run docker pull gaiadocker/iproute2 and use netem --tc-image gaiadocker/iproute2 instead of plain netem, so that Pumba runs tc from that helper image.

Introducing a 3000 ms delay, with 40 ms of normally distributed jitter, to two of the storage nodes:

pumba netem --tc-image gaiadocker/iproute2 \
    --duration 15s \
    delay \
      --time 3000 \
      --jitter 40 \
      --distribution normal \
    tidbdockercompose_tikv0_1 tidbdockercompose_tikv1_1

Introducing 99% packet loss on all three storage nodes:

pumba netem --tc-image gaiadocker/iproute2 \
    --duration 15s \
    loss \
      --percent 99 \
    tidbdockercompose_tikv0_1 tidbdockercompose_tikv1_1 tidbdockercompose_tikv2_1
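Rather than jumping straight to 99%, it can be informative to step through several loss levels and watch where the system starts to struggle. The sketch below reuses the exact command shape from above. `loss_sweep` is a hypothetical helper name, the percentages 10, 50 and 99 are arbitrary choices, and it assumes each `pumba netem` call returns only once its `--duration` has elapsed.

```shell
# Hypothetical helper: apply increasing packet loss to the given containers,
# one level at a time, 15 seconds per level.
# Usage: loss_sweep CONTAINER...
loss_sweep() {
    for percent in 10 50 99; do
        echo "applying ${percent}% packet loss for 15s"
        pumba netem --tc-image gaiadocker/iproute2 \
            --duration 15s \
            loss --percent "$percent" \
            "$@"
    done
}

# Example use (left commented out; needs the containers above):
#   loss_sweep tidbdockercompose_tikv0_1 tidbdockercompose_tikv1_1 tidbdockercompose_tikv2_1
```

Issuing queries from a second terminal while the sweep runs shows how latency and errors change at each level.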

Exercises

  • Try using the --random parameter to randomly select a single node from the set of containers passed.

  • Try using the regex syntax to target specific nodes.

  • Compare a true distributed system (like TiDB) against a traditional system (Postgres + Client).


Last updated 6 years ago