These crawls are part of an effort to archive pages as they are created and archive the pages that they refer to. That way, as the pages that are referenced are changed or taken from the web, a link to the version that was live when the page was written will be preserved.
Then the Internet Archive hopes that references to these archived pages will be put in place of a link that would be otherwise be broken, or a companion link to allow people to see what was originally intended by a page's authors.
Thanks to BigQuery, running complex correlations over billions or trillions of attributes is surprisingly easy. But don’t forget if your data does have some spatial component, a quick mapping visual can add some great context to your results.
Josh Livni works with maps at Google, where he helps developers tell compelling stories using the Google Maps APIs. As you read this, he's probably writing code, thinking about snowboarding, or both.