Posted by: Jeremy Fox | December 12, 2011

Darwin in space, or spurious correlation exemplified

Google Trends allows you to look at changes over time in the popularity of different search terms, and to find search terms whose popularity is correlated, which makes for a great lesson in spurious correlation. For instance, the search term most correlated with “darwin” is… “satellites”! Graph here (r=0.89). Other terms whose popularity tightly correlates with that of “darwin” include “african history”, “composers”, and “pulleys”.

The reason for the spurious correlations is obvious–so many people search Google so often on so many terms that any term is bound to be tightly correlated with some other unrelated terms. But even if you know that, it’s really hard to look at a graph like the one linked to above and not see it as suggesting some sort of causal connection. I could totally see Google Trends being a great teaching tool for undergrad stats classes.
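For anyone who wants to see the "so many terms" effect for themselves, here is a minimal simulation sketch. It has nothing to do with real Google Trends data or with however Google actually computes its correlations: the assumption is simply that each search term's weekly popularity is an independent random walk, and the function names and the default numbers of terms and weeks are made up for illustration. It generates one "target" series plus a couple of thousand unrelated ones, then reports the strongest Pearson correlation found among them.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def random_walk(n_weeks, rng):
    """One simulated 'search term': weekly popularity drifting at random."""
    level = 0.0
    series = []
    for _ in range(n_weeks):
        level += rng.gauss(0, 1)
        series.append(level)
    return series


def best_spurious_match(n_terms=2000, n_weeks=200, seed=1):
    """Largest correlation between a target series and n_terms unrelated ones.

    Every series is generated independently, so any strong correlation
    found here is spurious by construction.
    """
    rng = random.Random(seed)
    target = random_walk(n_weeks, rng)
    others = [random_walk(n_weeks, rng) for _ in range(n_terms)]
    return max(pearson(target, series) for series in others)


print(round(best_spurious_match(), 2))
```

Part of what this sketch illustrates is that series which drift over time correlate with one another far more easily than independent week-to-week noise would, so picking the best match out of thousands of candidates will reliably turn up something impressive-looking.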

UPDATE: Turns out that searches on “niche” are tightly correlated with (among other things) searches on “important”! Clearly not a spurious correlation! ;-) Searches on “niche” also have strong seasonality, peaking in April and October, and plummeting around Christmas and in June and July. I guess that’s because most searches on “niche” come from undergraduate ecology students who’ve just been taught the concept in their classes.

I really need to stop playing with Google Trends and get to work…

Responses

  1. Here’s another recent lesson in spurious correlation:

    http://www.businessweek.com/magazine/correlation-or-causation-12012011-gfx.html#

    • Yes, I’ve seen this. Discussion of it on another site included a link to Google Trends, which in turn inspired me to write this post.

  2. OK I admit, this is the best demonstration of test repetition and spurious correlation that I have ever seen. A great teaching tool indeed.

  3. This is not so spurious given that Darwin was the name of a project by the European Space Agency. It would have involved many small satellites which together form an interferometer with which to scan the sky for Earth-like planets. Hence they’d have searched for origins of life in space.

    I wonder how Google does the correlation. If it means that specific searchers have searched for both “Darwin” and “satellites” at the same time, then it would not be spurious but reflect something about the terms that searchers combine. If, however, one group of people searched for Darwin and another group for satellites, and the correlation is just a temporal coincidence, then it would be spurious of course.

    • It’s weekly data on different searches, i.e. the weekly number of searches on “darwin” is highly correlated with the weekly number of (separate) searches on “satellites”.

      How old is the European Space Agency Darwin project? The data in the plot I linked to go back years. And even if that’s the explanation for the Darwin-satellite correlation, I don’t see how there could be a similar explanation for a Darwin-pulleys correlation.

