The Internet Archive discovers and captures web pages through many different web crawls.
At any given time several distinct crawls are running, some for months, and some every day or longer.
View the web archive through the Wayback Machine.
Thanks to BigQuery, running complex correlations over billions or trillions of attributes is surprisingly easy. But don’t forget if your data does have some spatial component, a quick mapping visual can add some great context to your results.
Josh Livni works with maps at Google, where he helps developers tell compelling stories using the Google Maps APIs. As you read this, he's probably writing code, thinking about snowboarding, or both.