SignalFX & AWS EventBridge

Shortly after I started at Steamhaus as an SRE, it was decided we should move to using SignalFX for our monitoring, rather than relying solely on AWS CloudWatch.

There were a number of reasons behind this move, which I won’t go into in great detail in this post, but for those curious the highlights are:

  • Well-considered historical metric aggregation, allowing for superb anomaly detection
  • The ability to create and configure as many custom metrics and dashboards as you please
  • A wide and rapidly expanding selection of integrations
  • Flexible pricing model

And last, but not least:

  • It has a Dark Theme, which is greatly appreciated when you’re trying to interpret an alert having just been woken up at 3 AM

Recently we came across a memory leak in one of my customer’s applications. This isn’t too uncommon; memory leaks are a fact of life. Unfortunately, this coincided with what every blanket email I receive calls “These Uncertain Times™”, and the customer’s developers were suddenly confronted with an urgent and seemingly bottomless to-do list.

Enter your friendly neighbourhood DevOps engineer, keen to try out the latest addition to his toolkit, and even more keen to sleep through the night without PagerDuty waking him up to deal with a known issue.

At this point, I should introduce AWS EventBridge.

At first glance, this may seem to be just another Event Bus service, and those familiar with CloudWatch Events may be wondering what’s new here, but AWS have added some crucial features that caught our attention.

Of these, the one that interested us most is Partner Event Sources: as the name suggests, a list of sources approved by AWS to trigger events in EventBridge, without the usual faffing around with webhooks.

You can see where this is going. SignalFX detects the memory leak and informs EventBridge, which in turn triggers a Lambda to deal with the EC2 instance in question, with extreme prejudice.

Now for the good bit. 

Setting up the integration between SignalFX and EventBridge is straightforward. Log into the SignalFX portal, go to Integrations, and then search for the EventBridge integration option. Select ‘New Integration’, input the AWS account ID, ensure the region is correct (it looks greyed out here but actually worked fine), and then Save and Enable.

We have some bad news here: at the time of writing, the AWS provider for Terraform has no resource for EventBridge yet (Boo!), so we’ll have to do the next step in the AWS console. There are plans to include EventBridge in the AWS provider soon-ish, so it might be worth checking the registry first if you’re from the future.

Go into the Event Source in the ‘Partner Event Sources’ section, and choose to ‘Associate with Event Bus’. You’ll have options to grant access to the Event Bus to other sources, either by listing individual AWS accounts, or by selecting an Organization as a whole.
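If you’d rather script this step than click through the console, the same association can probably be done with Boto. What follows is a minimal sketch, not what I actually ran: the function name and the `aws.partner/` prefix filter are my own choices, and it assumes you pass in a `boto3.client("events")` for the right account and region. It relies on the fact that a partner event bus has to be given exactly the same name as the partner event source it is associated with.

```python
def associate_partner_sources(events_client, name_prefix="aws.partner/"):
    """Create a partner event bus for every partner event source still PENDING.

    events_client is expected to be a boto3 EventBridge client, e.g.
    boto3.client("events"). The default prefix is an assumption; narrow it
    to the source name shown in your own console.
    """
    created = []
    # Only the first page of results is read here; follow NextToken if you
    # have a lot of partner event sources.
    response = events_client.list_event_sources(NamePrefix=name_prefix)
    for source in response.get("EventSources", []):
        if source.get("State") != "PENDING":
            continue  # already associated (ACTIVE) or gone (DELETED)
        # The bus name must match the partner event source name exactly.
        events_client.create_event_bus(
            Name=source["Name"],
            EventSourceName=source["Name"],
        )
        created.append(source["Name"])
    return created
```

Calling `associate_partner_sources(boto3.client("events"))` should return the names of the buses it created, which is handy for the rule in the next step.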

Now that we have the integration between EventBridge and SignalFX set up, everything else is relatively straightforward.
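For completeness, “everything else” is mostly a rule on the partner event bus with the Lambda as its target. Here is a rough sketch under a few assumptions: the rule name and target ID are made up, `events_client` is a `boto3.client("events")`, and the pattern matches on `source` being the partner event source name (which, as far as I can tell, is also the name of the partner bus). Check a real event before trusting that pattern.

```python
import json


def build_pattern(source_name):
    """Event pattern matching everything sent by one partner event source."""
    # Assumption: partner events carry the partner event source name in
    # their "source" field.
    return {"source": [source_name]}


def route_bus_to_lambda(events_client, bus_name, lambda_arn,
                        rule_name="signalfx-alerts-to-lambda"):
    """Create (or update) a rule on the partner bus that invokes the Lambda."""
    rule = events_client.put_rule(
        Name=rule_name,
        EventBusName=bus_name,
        EventPattern=json.dumps(build_pattern(bus_name)),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        EventBusName=bus_name,
        Targets=[{"Id": "remediation-lambda", "Arn": lambda_arn}],
    )
    return rule["RuleArn"]
```

The Lambda also needs a resource-based permission allowing `events.amazonaws.com` to invoke it for that rule ARN. The console adds this for you; in a script you would add it yourself with the Lambda client’s `add_permission` call.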

There’s one ‘gotcha’ to consider here: I’d recommend first creating your Lambda Function with nothing more than a simple logging setup, so you can see how the message from SignalFX is formatted once it has passed through EventBridge. Initially, I wrote my Lambda based on how the JSON was formatted in Postman, which led to much weeping and gnashing of teeth when it didn’t work like I wanted it to.
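A logging-only Lambda for this can be as small as the sketch below. It assumes nothing about the SignalFX payload: it dumps the whole event to CloudWatch Logs and returns the top-level keys of `detail`, which is where EventBridge puts the sender’s payload inside its standard envelope.

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    """Log the raw event exactly as EventBridge delivers it."""
    # default=str keeps json.dumps from failing on anything unexpected.
    logger.info("Received event: %s", json.dumps(event, default=str))

    detail = event.get("detail")
    detail_keys = sorted(detail) if isinstance(detail, dict) else []
    return {
        "detail_type": event.get("detail-type"),
        "source": event.get("source"),
        "detail_keys": detail_keys,
    }
```

Trigger a test alert, read the log line, and only then write the code that picks fields out of it.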

Having set all this up and tested it, I was delighted to find that from the alert being raised in SignalFX to the Lambda resolving it via an API call now took less than a minute. This was good progress, but there were more improvements to be made to my pH 14 basic Boto Python script.
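To give a flavour of what such a basic script might look like, here is a sketch rather than my actual code. Two loud assumptions: first, it does not rely on any particular SignalFX field name and instead scans the `detail` payload for anything shaped like an EC2 instance ID; second, it treats “resolving” as terminating the instance, on the premise that something like an Auto Scaling group brings up a replacement. If that isn’t true for you, swap in `reboot_instances`.

```python
import json
import re

# EC2 instance IDs: "i-" followed by 17 (current) or 8 (legacy) hex characters.
INSTANCE_ID_RE = re.compile(r"\bi-(?:[0-9a-f]{17}|[0-9a-f]{8})\b")


def find_instance_ids(event):
    """Heuristic: return every instance-ID-shaped string in the event detail."""
    text = json.dumps(event.get("detail", {}), default=str)
    return sorted(set(INSTANCE_ID_RE.findall(text)))


def remediate(event, ec2_client):
    """Terminate the instance named in the alert, but only if it is unambiguous."""
    instance_ids = find_instance_ids(event)
    if len(instance_ids) != 1:
        # Zero or several candidates: do nothing rather than guess.
        return {"action": "none", "instance_ids": instance_ids}
    ec2_client.terminate_instances(InstanceIds=instance_ids)
    return {"action": "terminated", "instance_ids": instance_ids}


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtimes

    return remediate(event, boto3.client("ec2"))
```

Keeping `remediate` separate from the handler means it can be exercised with a stub client before it is ever let loose on a real instance.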

First of all, SignalFX have put together a wonderful Lambda Wrapper for Python that lets you record better metrics on your Lambda. I used it to configure an alert in case the Lambda should ever misfire.

Another contender for that was AWS Lambda Destinations, which could be configured to go to an SNS topic for SignalFX in the event of an error in the Lambda.
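If you go down the Destinations route, the failure destination can be set with a single Boto call. This is a sketch under assumptions: the helper name is mine, `lambda_client` is a `boto3.client("lambda")`, and the SNS topic already exists and is wired up to whatever you use to notify SignalFX.

```python
def send_failures_to_sns(lambda_client, function_name, topic_arn):
    """Route failed asynchronous invocations of a Lambda to an SNS topic."""
    return lambda_client.put_function_event_invoke_config(
        FunctionName=function_name,
        DestinationConfig={"OnFailure": {"Destination": topic_arn}},
    )
```

Destinations only apply to asynchronous invocations, which is how EventBridge invokes Lambda targets, and the function’s execution role needs permission to publish to the topic.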

That pretty much wraps it up. Stay tuned for more adventures in AWS and SignalFX in future.
