Sperry Rail

For 85 years, Sperry has been the world leader in Rail Health® solutions, helping railroads achieve continuous safety and performance improvements. Their integrated full-coverage solutions are designed to detect more defects in less time, so customers can resume operations quickly. Inspired by the passion of their founder, Dr. Elmer Sperry, the inventor of the first non-destructive method for testing rail, Sperry is committed to delivering fit-for-purpose Rail Health® solutions that make railway travel safer and more reliable for everyone.

The Brief

A combined Well-Architected and engineering review was carried out with the primary aim of mapping out the Sperry estate and confirming that it adheres to best practices.

Sperry's engineers wanted to step back from monitoring, fixing, and the other day-to-day distractions of operating a large-scale machine learning workload. To support this, Steamhaus identified two actions that would benefit Sperry in the long term: implementing CI/CD pipelines and defining the infrastructure as code.

Alongside this, we recommended rebuilding the infrastructure to incorporate these changes and to make use of newer technologies such as serverless services and Fargate, improving the stability of the platform and optimising cost.

The Solution

We rebuilt the Elmer platform using a combination of services, including Lambda and ECS Fargate. AWS Step Functions had been used in the original design; on the new platform, its use was extended to coordinate these services into a robust serverless workflow.

The Step Functions state machines are triggered via S3 event notifications. We used a number of state types, including map states, which allow us to run a number of Fargate tasks in parallel, each with different inputs.
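As a rough illustration of the map-state pattern described above, the sketch below builds an Amazon States Language definition in which a Map state fans out over a list of files and runs one Fargate task per item. It is not Sperry's actual definition: the ARNs, subnet ID, container name, environment variable names, and the `files` input shape are all hypothetical placeholders, and the real platform may wire its inputs differently.

```python
import json


def build_state_machine(cluster_arn, task_definition_arn, subnets,
                        container_name, max_concurrency=10):
    """Return an Amazon States Language definition as a Python dict.

    All argument values are illustrative assumptions, not details taken
    from the Sperry platform.
    """
    # A single Fargate task; ".sync" makes the state wait for the task to finish.
    run_task = {
        "Type": "Task",
        "Resource": "arn:aws:states:::ecs:runTask.sync",
        "Parameters": {
            "LaunchType": "FARGATE",
            "Cluster": cluster_arn,
            "TaskDefinition": task_definition_arn,
            "NetworkConfiguration": {
                "AwsvpcConfiguration": {
                    "Subnets": subnets,
                    "AssignPublicIp": "DISABLED",
                }
            },
            "Overrides": {
                "ContainerOverrides": [
                    {
                        "Name": container_name,
                        # Each map iteration receives one item from $.files,
                        # so every task gets a different bucket/key pair.
                        "Environment": [
                            {"Name": "INPUT_BUCKET", "Value.$": "$.bucket"},
                            {"Name": "INPUT_KEY", "Value.$": "$.key"},
                        ],
                    }
                ]
            },
        },
        "End": True,
    }

    return {
        "StartAt": "ProcessFiles",
        "States": {
            "ProcessFiles": {
                "Type": "Map",
                "ItemsPath": "$.files",  # hypothetical input field
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "RunFargateTask",
                    "States": {"RunFargateTask": run_task},
                },
                "End": True,
            }
        },
    }


# Placeholder values for illustration only.
definition = build_state_machine(
    cluster_arn="arn:aws:ecs:eu-west-1:111111111111:cluster/example",
    task_definition_arn="arn:aws:ecs:eu-west-1:111111111111:task-definition/example:1",
    subnets=["subnet-00000000"],
    container_name="worker",
    max_concurrency=5,
)
print(json.dumps(definition, indent=2))
```

An execution of this hypothetical state machine would be started with input such as `{"files": [{"bucket": "b", "key": "a.dat"}, {"bucket": "b", "key": "b.dat"}]}`, giving one Fargate task per entry. `MaxConcurrency` caps how many tasks run at once, which is one way to stay within account-level service limits.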

AWS CodePipeline and CodeBuild are used for CI/CD. The pipelines are managed in a central AWS account and deployments are carried out in a cross-account manner.

You can see Sperry’s infrastructure diagram below:

The Results

The main benefit of the new design is that the platform is now entirely serverless. As well as reducing operational overhead, this makes the platform more cost-effective: when there are no files to process, it sits idle. The platform was built in line with Well-Architected principles, which included reviewing the workload and right-sizing services such as RDS and Fargate to further optimise costs.

The platform is also more efficient; when the workflow is triggered via S3, capacity is available immediately. This has increased the throughput of the application, meaning that Sperry Rail can process more jobs concurrently than it could previously. Steamhaus closely monitors AWS service limits to ensure the platform remains available.

The platform, along with a managed service from Steamhaus, has reduced the time Sperry engineers spend on systems management and deployment by 15%, freeing them to focus on adding new features.