Posted by: Jeremy Fox | March 26, 2012

Cool new statistical method not so cool after all?

A while back I posted on a cool new nonparametric method, which goes by the acronym “MINE”, for detecting associations between variables in multivariate datasets. The method can detect even nonlinear (and non-monotonic!) relationships between pairs of variables, and it provides a measure of the strength of the relationship analogous to the familiar R^2.
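To see why non-monotonic relationships matter here, consider a rough sketch in Python (standard library only). It does not implement MIC or MINE. It compares the ordinary Pearson correlation, whose square is the familiar R^2 of a straight-line fit, with distance correlation, a different dependence measure that comes up in the comments below. The data (a noiseless parabola on an evenly spaced grid) and the function names `pearson_r` and `distance_correlation` are my own illustrative choices, not anything from the MINE paper.

```python
import math


def pearson_r(x, y):
    """Ordinary Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _centered_distances(v):
    """Pairwise |v_i - v_j| matrix, double-centered (row, column, grand mean)."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # equals the column means, since d is symmetric
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation for two 1-D samples (O(n^2) time and memory)."""
    n = len(x)
    A = _centered_distances(x)
    B = _centered_distances(y)
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(A[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    dvar_y = sum(B[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    if dvar_x * dvar_y == 0:
        return 0.0  # at least one variable is constant
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


# Illustrative data: y depends entirely on x, but not monotonically.
x = [i / 10 for i in range(-20, 21)]
y = [v * v for v in x]

r = pearson_r(x, y)
dcor = distance_correlation(x, y)
print(f"Pearson r = {r:.3f}, distance correlation = {dcor:.3f}")
```

Because the x values are symmetric about zero, the Pearson correlation comes out essentially zero even though y is a deterministic function of x, while the distance correlation is clearly positive. Real ecological data are noisy, of course, and how well any of these measures performs on noisy data is exactly what the commentary discussed below is about.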

It turns out that this approach has some drawbacks, perhaps quite serious ones. Andrew Gelman’s blog has a good summary of recent commentary. Not surprisingly for such a flexible nonparametric method, it seems to lack statistical power. But there may be other issues as well, to do with things like the scope and rigor of the proofs of the method’s statistical properties. I’m not qualified to pass judgment on how serious these issues are. But if you’re thinking of using this method, you should definitely click through and check out the commentary.

UPDATE: The MINE authors themselves show up in the comments, briefly addressing the issues I’ve raised and noting that they’ve posted a detailed reply to the comments they’ve received over on Andrew Gelman’s blog. Great to see authors and their readers engaging in such a productive and substantial discussion. So if you’re interested in the MINE method, and alternative approaches, you really ought to click through to Andrew Gelman’s blog.


Responses

  1. This has all made me wonder much more about the Distance Correlation method, although I have not yet seen it used by ecologists. It seems to offer a simple and powerful way to test for association, whatever the functional form, and, heck, it’s already implemented in R! Then again, it has some power requirements of its own, but reading the linked posts and commentary, it seems like it would be more robust to the kinds of noisy data that we often see in ecological settings.

  2. Oh that I had more time to read stuff. I recently devised a method for estimating the probability that a relationship exists between two variables, without the need to fit a curve to them. It’s based on a Monte Carlo evaluation of 2nd order differences (i.e. differences of differences) of successive values of the response variable. I have no idea if this is similar to what they’ve done or not. Seems not at first glance.

  3. Hi all,

    We’re happy to see so much discussion of MIC and MINE. We wanted to point out that we had posted a response to the concerns summarized in Andrew Gelman’s blog when they were originally posted online. We have re-posted these on Gelman’s blog as a comment, so you can go there to read the discussion.

    We do want to clarify here, since it came up in this post, the issue of the scope and rigor of our proofs. First, to the best of our understanding, no one has challenged the mathematical rigor of our proofs. As for their scope: it was indeed mentioned that the results in our paper were proven about MIC and not about the approximation algorithm for it that we use in practice. However, almost all of our results actually hold without modification for the approximation algorithm as well, a fact that we perhaps should have made clear in our manuscripts. (See our response for more details.)

    • I very much appreciate your comments on this. It’s great to see authors really engaging with the questions, feedback, and critiques they’ve received. I’ll update the post to further encourage our readers to click through and read the entire discussion going on over at Andrew’s blog.

