The Bot Who Cried Wolf

At Radify, we practise Continuous Delivery and have all sorts of tools and processes that support us in this. Many of these tools can send notifications, and it’s tempting to turn on ALL THE NOTIFICATIONS! Unfortunately, this led to a case of “the Bot who cried wolf”: the notifications came to be ignored and viewed merely as annoying noise. It’s not so much about crying “wolf” when there is no wolf; rather, the bot could be seen as essentially going to a zoo, standing by the wolf cage, and screaming WOLF! WOLF! THERE’S A WOLF! LOOK! RIGHT THERE IN THE WOLF CAGE! IT’S A WOLF!

Now, if you were in your local store shopping for carrots, you’d want to know if there was a wolf. At the wolf enclosure in a zoo, not so much. The bot is dumb, though, and by default it just screams at every wolf it sees. How can we make this bot smarter without discouraging it and making it cry? After all, it exists to alert us to wolves...

It was getting crazy up in here

Before we started addressing our notification strategy, a typical merge to master could trigger the following barrage of notifications in our IRC channel:

Github: HEY I GOT A COMMIT, U JELLY?
Pivotal Tracker: THIS IS RELEVANT TO MY INTERESTS, AH YES, THIS IS SOMETHING ON THIS HERE TICKET, IS IT NOT?
Jenkins: I’m starting a build!
Jenkins: AWWW YIS this commit is now all UP IN MY PIPELINE… I’m starting a build!
Jenkins: OK it’s fixed now, good work, and now I’m pushing it out to dev via Ansible! Bet you can’t wait!
Jenkins: I’ve deployed code to dev!
Github: This has been merged!
Pivotal: I am updated as well!
Jenkins: RIGHT! Time to deploy to staging!
Jenkins: … OK it’s deployed to staging!

Noise annoys, and this was way too much! Important messages were getting lost amongst the chatter of the bots. It’s pretty tempting to simply turn on ALL THE MESSAGES, but is that really a good idea? This made us think: what’s actually a USEFUL notification?

What is the best policy?

One thing we realized is that, quite often, notifications had become a proxy for status information that our other tools did not readily present to us. One example: the build status of in-development branches. Notifications are a transient medium, which makes them a poor fit for persistent information, like the status of something. Resolving this was one of the driving forces behind the creation of StationMaster, our branch status tool (as promised, we’ll talk more about StationMaster in an upcoming post).

After some consideration, our notification policy became: “if you receive a notification, it should either (a) be about something that you need to act on, or (b) let you know that something important has happened that you would be expected to know about if asked”. Otherwise, we have too much wolf-crying!
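As a rough illustration, that policy can be read as a two-condition filter. The sketch below is in Python; the `Notification` type, its `needs_action` and `must_be_aware` flags, and the `passes_policy` function are hypothetical names made up for this example, not part of Jenkins, Leeroy, or any other tool mentioned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    """Hypothetical notification record; the field names are assumptions."""
    source: str
    message: str
    needs_action: bool = False   # (a) the recipient has to do something about it
    must_be_aware: bool = False  # (b) important enough that they should know


def passes_policy(n: Notification) -> bool:
    """Keep a notification only if it meets condition (a) or (b)."""
    return n.needs_action or n.must_be_aware


events = [
    Notification("jenkins", "Starting a build"),
    Notification("jenkins", "Release build failed", needs_action=True),
    Notification("jenkins", "New release went out", must_be_aware=True),
]

# Only the messages that satisfy the policy survive the filter.
kept = [n.message for n in events if passes_policy(n)]
print(kept)
```

Deciding which events count as (a) or (b) is the hard part and remains a team judgement call; the filter itself is deliberately trivial.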

Once we’d discussed our strategy, we centralised our notifications into Leeroy (which is what I, and probably a thousand other devs, named our Jenkins IRC bot). Only Leeroy (and occasionally Pivotal) speaks in the channel now. Other notifications come via email or pull request build statuses in GitHub fed back from Jenkins. That way, only the people involved have to see them, others are not distracted, and the channels become less noisy. Here are the notifications that we commonly use:

Team notifications

These are notifications that go to the entire team. These include:

New code on demo! We auto-deploy each feature branch so that it can be tested by manual QA (see our post “Four Principles of DevOps” for more on our workflow).
Something has been merged to staging and deployed!
A release to production is about to happen! Everyone pay attention! When we do a release to a customer, everyone in that project’s channel should be aware.
Build failure of releases. No release should be partial or incomplete due to our policy of immutable infrastructure for our API and UI nodes.
A new release went out. Phew!

All of these are of relevance to two or more members of each project team.

Direct to user notifications

These are notifications that go only to the person they concern.

Email of build failure/fixed. We generally use feature branches for development - Alice doesn’t need to know if Bob broke the build. Therefore, only Bob gets broken build notifications for builds that Bob broke. Alice (and the rest of the team) don’t need to know about what’s going on with that feature branch until Bob is ready to merge it.
Pull request builder. If a pull request build fails, only the people involved need to know about it. If, however, a release is in progress and fails, everyone on the team needs to know, because that’s headline news (due to our immutable infrastructure policy, if it does fail, nothing will happen from the client’s perspective, but the dev team definitely need to jump to attention!).
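The split between team and direct notifications can be sketched as a small routing rule. This is a minimal Python sketch under stated assumptions: the `Kind` enum, the `Event` fields, the `route` function, and the `("irc", "#project")` / `("direct", user)` return values are all hypothetical and only mirror the categories listed above; they are not how Leeroy or Jenkins is actually configured.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """Hypothetical event kinds, one per notification described above."""
    DEMO_DEPLOYED = "demo_deployed"
    STAGING_DEPLOYED = "staging_deployed"
    RELEASE_STARTING = "release_starting"
    RELEASE_FAILED = "release_failed"
    RELEASE_DONE = "release_done"
    BRANCH_BUILD_FAILED = "branch_build_failed"
    BRANCH_BUILD_FIXED = "branch_build_fixed"
    PULL_REQUEST_BUILD_FAILED = "pull_request_build_failed"


# Kinds that concern the whole project team and so go to the shared channel.
TEAM_KINDS = frozenset({
    Kind.DEMO_DEPLOYED,
    Kind.STAGING_DEPLOYED,
    Kind.RELEASE_STARTING,
    Kind.RELEASE_FAILED,
    Kind.RELEASE_DONE,
})


@dataclass(frozen=True)
class Event:
    kind: Kind
    project: str  # assumed to match the project's channel name
    author: str   # the person whose branch or pull request triggered the event


def route(event: Event) -> tuple[str, str]:
    """Return (medium, target): the team channel, or just the author."""
    if event.kind in TEAM_KINDS:
        return ("irc", f"#{event.project}")
    # Everything else reaches only the person involved, for example by
    # email or a pull request build status.
    return ("direct", event.author)


print(route(Event(Kind.BRANCH_BUILD_FAILED, "acme", "bob")))
print(route(Event(Kind.RELEASE_FAILED, "acme", "bob")))
```

Here Bob's broken feature branch build reaches only Bob, while a failed release lands in the project channel where the whole team sees it.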

Notify us!

What is your organisation’s notification strategy? How do you balance signal and noise? What are the weaknesses in our approach? Let us know in the box below!
