Monday, January 29, 2018
Beyond simply including more data, a total rewrite of the heatmap code permitted major improvements in rendering quality. Highlights include twice the resolution, rasterizing activity data as paths instead of as points, and an improved normalization technique that ensures a richer and more beautiful visualization.
The heatmap is now available on Strava Labs and the Strava Route Builder. The rest of this post is a technical deep dive on the details of this update.
The heatmap generation code has been fully rewritten using Apache Sparkand Scala. The new code leverages new infrastructure enabling bulk activity stream access and is parallelized at every step from input to output. With these changes, we have fully conquered all scaling challenges. The full global heatmap was built across several hundred machines in just a few hours, with a total compute cost of only a few hundred dollars. Going forward, these improvements will enable updating the heatmap on a regular basis.
The remaining sections in this post describe in some detail how each step of Spark job for building the heatmap works, and provides details on the specific rendering improvements that we have made.