Rendering 2 million data points on an interactive map

I put together a short demo of loading and displaying 2 million waypoints (GPS locations) on an interactive map, to visualize all the places I have been riding my bike. All data processing and rendering is implemented in Racket:

The visualization scales up to at least 4.36 million points (the most I have to test with, if I include running, sailing and other activities), but it should be able to handle a much larger number of waypoints.

Enjoy,
Alex.

14 Likes

That's really cool! Consider sharing on other (even non-Racket) forums, like:

Lobsters: https://lobste.rs/
Reddit: https://www.reddit.com/r/Racket/ (and maybe r/Lisp and r/programming)
Hacker News: https://news.ycombinator.com/
Racket Stories: racket-stories.com
Twitter

2 Likes

It has been shared on Reddit and Twitter here

Bw
Stephen

3 Likes

Wow. That's impressive, Alex. Great job.

@alexh For Hacker News, I recommend that you post it yourself, because people are usually friendlier when the author is the submitter and answers the questions in the comments. If you don't have an account, I can gladly post it.

It's difficult to guess what will get more traction, but sometimes a blog post about a single detail does better. Not too deep, not too shallow, with a few relevant photos or graphics. You can post the video now and later post a discussion of one technical detail.

Some ideas:

  • How do you choose the color of the trails? I remember that when I had to draw some fake height maps using greyscale, I had to tweak the mapping a lot; for example, I had to apply a power before transforming the number to grey.

  • How difficult was it to integrate the map? Is it OpenStreetMap? Are you assuming that the Earth is flat, or do you need something interesting to transform the coordinates into dots? (Is this more important for plane trips than bike trips?)

  • Do you draw each dot, or do you group nearby dots? What about intersections? Does something weird happen when the municipality moves a street? What about GPS glitches?

And obviously include the video at the bottom!

1 Like

I don't have a Hacker News account, so I cannot post it there myself. In the past, others have posted my previous blog posts on HN and they didn't get much response, so I don't think this one would be any different.

In any case, I already wrote a blog post about this: Heat Maps Revisited, and a previous one explains the UI changes in a bit more detail: Interactive Heat Maps.

Not sure if you are actually interested in the questions you asked or just showed them as examples, but I will try to answer them briefly:

This was a huge time sink, as I experimented with a lot of options and I'm still not completely happy with the result. Currently, the color scheme uses a magenta base with linear luminance changes (it is magenta, because this color does not appear to be used on the map itself). The colors are allocated according to the number of data points in each dot.

I already had a package for this, written several years ago: map-widget.

Something more interesting is the geoid library (alex-hhh/geoid on GitHub), which is a Racket implementation of this idea: https://s2geometry.io/. This library is the difference between being able to show ~200k points interactively in the first version and being able to draw 4.36 million in the current version (which is all the GPS points I have in my database, including running, cycling, etc.).

Dots represent a small area (small relative to the zoom level of the map) around them, and all GPS points in that area are grouped inside the dot. The color of the dot changes with the number of points inside it.
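For readers who want to experiment with this kind of grouping, here is a minimal sketch. It is written in Python for brevity and is not the actual Racket code: the real implementation relies on the geoid library, whereas this sketch assumes a plain pixel grid over the standard Web Mercator projection. The names `cell_for`, `group_points` and the `cell_px` parameter are hypothetical.

```python
import math
from collections import Counter

def cell_for(lat, lon, zoom, cell_px=4):
    """Return the (column, row) of the grid cell containing a point.

    Assumes the Web Mercator projection with 256-pixel tiles; a cell is a
    square of cell_px by cell_px pixels at the given zoom level, so cells
    cover a smaller area on the ground as the zoom level increases.
    """
    world_px = 256 * 2 ** zoom
    x = (lon + 180.0) / 360.0 * world_px
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * world_px
    return (int(x // cell_px), int(y // cell_px))

def group_points(points, zoom, cell_px=4):
    """Count how many (lat, lon) points fall into each grid cell."""
    return Counter(cell_for(lat, lon, zoom, cell_px) for lat, lon in points)
```

Each key of the returned counter corresponds to one dot, and its value is the number of GPS points grouped inside that dot.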

Each point is drawn independently of all the others, so there are no intersections, unless you mean something else?

The visualization shows the data that was collected by the user. To give an example, if you ride your bike on a road in 2020 and the council tears down the road in 2021 to build a park, the fact that you rode the bike there in 2020 does not change and will be shown on the map, which might show a park there.

As for "GPS glitches", they are shown too, as the application displays the data that the user has. The glitches could be corrected, but I didn't have too many problems in my own data, so I didn't implement such a correction. Interestingly, elevation data is more prone to errors and I do have a correction algorithm for that.

Alex.

1 Like

Both. I'm curious and also I think they may be good starting points for a more detailed post. In particular showing how the naive scales are bad and your scale is better looks like an interesting topic that is easy to explain and easy to show. (Nevertheless, writing it and picking the right image to show each problem and solution will take a lot of time.)

More questions: Are you saturating at 100% or at the 95th percentile? Is the lowest level 0, or is there a minimal amount of color for tracks used once?

I was imagining drawing small segments, and the intersections of the segments could be problematic.

More questions: If you are counting the number of events in a pixel, do you distinguish between a track that was used on many different days and a track driven at a low speed? What about a stop to eat in the middle of a journey?

The saturation of the colors does not change (it is about 80%), it is the luminance that changes from 12.5% to 75%. These values were found through experimentation, and they don't represent the best values, but simply the point where I declared that it looks good enough and stopped experimenting.
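As an illustration of the numbers above, here is a small Python sketch (not the actual Racket code). It assumes the HSL color model, a pure magenta hue of 300 degrees, and that a higher rank maps to a higher luminance; none of these details are confirmed above, and the name `heat_color` is hypothetical.

```python
import colorsys

MAGENTA_HUE = 300 / 360          # assumed hue; pure magenta in HSL
SATURATION = 0.80                # constant saturation, about 80%
LUM_MIN, LUM_MAX = 0.125, 0.75   # luminance range, 12.5% to 75%

def heat_color(rank):
    """Map a rank in [0, 1] to an (r, g, b) tuple of integers in 0..255.

    The luminance changes linearly with the rank; hue and saturation stay
    fixed. The direction (higher rank is brighter) is an assumption.
    """
    rank = min(1.0, max(0.0, rank))
    luminance = LUM_MIN + rank * (LUM_MAX - LUM_MIN)
    # Note that colorsys takes its arguments in hue, lightness, saturation order.
    r, g, b = colorsys.hls_to_rgb(MAGENTA_HUE, luminance, SATURATION)
    return tuple(round(c * 255) for c in (r, g, b))
```

Calling `heat_color(0.0)` gives the darkest magenta and `heat_color(1.0)` the brightest; values outside the range are clamped.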

Another interesting thing is that the mapping from rank to color uses histogram normalization, meaning that the rank of each point is the sum of the rank of the point itself and that of all the points of lesser rank. This produces better contrast for the routes travelled more frequently.
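One common way to implement this idea is through the cumulative histogram of the per-dot counts. The Python sketch below is an illustration under that assumption and not the actual Racket code; the name `normalised_ranks` and the shape of its input (a mapping from dot to point count) are hypothetical.

```python
from collections import Counter
from itertools import accumulate

def normalised_ranks(cell_counts):
    """Map each distinct per-dot count to a rank in (0, 1].

    The rank of a count is the fraction of dots whose count is less than or
    equal to it, that is, the cumulative histogram divided by the total.
    """
    histogram = Counter(cell_counts.values())   # count value -> number of dots
    values = sorted(histogram)
    cumulative = accumulate(histogram[v] for v in values)
    total = sum(histogram.values())
    return {v: c / total for v, c in zip(values, cumulative)}
```

For example, with five dots holding 1, 1, 1, 2 and 50 points, a linear scale would place the counts 1 and 2 at 2% and 4% of the range, which is barely distinguishable. The cumulative histogram places them at 60% and 80% instead, with 50 at 100%.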

The code does not distinguish between moving speed and stationary subjects. These are valid concerns, but in practice it does not seem to make a difference for my data set.

The sampling rate for the GPS device is usually 1 second, and most of them have auto-stop, so they don't record if the owner is not moving. These assumptions would not be correct for everyone, and there are ways to correct for problems of this kind, but I didn't implement them.
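One possible correction, shown here purely as an illustration and not something the code above implements, is to count each activity at most once per dot, so that a long stop or a slow ride through an area weighs the same as a quick pass. The Python sketch below assumes each activity is a list of points; `visits_per_cell` and the `cell_of` callback (any function mapping a point to its dot) are hypothetical names.

```python
from collections import Counter

def visits_per_cell(activities, cell_of):
    """Count, for each cell, how many distinct activities passed through it.

    Building a set of cells per activity collapses repeated samples from the
    same activity, so a stationary recording contributes only one visit.
    """
    visits = Counter()
    for points in activities:
        visits.update({cell_of(p) for p in points})
    return visits
```

With this counting, a lunch stop that records hundreds of samples in one place still adds just one visit to that dot.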

If you are interested, the blog post below deals with the same problem at a much larger scale, and they do deal with these problems (I got the idea of histogram normalization from there). One of the differences between Strava Heatmap and mine is that they run this in batch mode every few weeks and everybody looks at the same heat map. My code builds it in real time and allows the user to select the routes they want to see (e.g. "All cycling routes" or just "Build a heat map of all the places I raced with my Azzuri bike").


I think this post is now a bit off topic for the Racket group, so perhaps we should leave it here. This is a hobby project, mostly for visualizing my own data, so the implementation options were not thoroughly explored and the code might not be robust enough to handle all the corner cases that would show up with a wider user base -- I only fix problems that I encounter in my own data.

Alex.

1 Like

One of the common questions is whether Racket is only useful for processing lists or whether it can also handle real-world data. Your project is peak real-world data handling, so it's a nice example to know about. Nice work!

4 Likes