As a complement to the previous post on “why do mathematical modeling”, I thought it would be fun to compile a list of all the reasons why one might conduct an experiment. But I am lazy* (though not as lazy as this man), and so rather than compiling my own list I’ll share the list from Wootton and Pfister 1998 (in Resetarits and Bernardo’s nice Experimental Ecology book).
To see what happens. At its simplest, an experiment is a way of answering questions of the form “What would happen if…?” Such experiments often are conducted simply out of curiosity. This sort of experiment teaches you something about how the system works that you couldn’t have learned through observation, it gives you a starting point for further investigation (e.g., you can develop a model and/or do follow-up experiments to explain what happened), and it can be of direct applied relevance (e.g., if you want to know what effect trampling has on a grassland you’re trying to conserve, go out and trample on randomly-selected bits of it).
There are limitations to such experiments, of course. Because they’re conducted without any hypothesis in mind, they’re typically difficult or impossible to interpret in light of existing hypotheses. And on their own, they don’t provide a good foundation for generalization (e.g., would the experiment come out the same way if you repeated it under different conditions, or in a different system?)
Interestingly, Wootton and Pfister suggest that experiments conducted just to see what happens are most usefully conducted in tractable model systems about which we already know a fair bit (analogous to developmental biologists focusing their experiments on C. elegans and a few other model species). They worry that curiosity-driven experiments, conducted haphazardly across numerous systems, leave us not only with a very incomplete understanding of any given system, but also with no basis for cross-system comparative work. This illustrates how the decision as to what kind of experiment to conduct often is best made in the context of a larger research program, an issue to which I’ll return at the end of the post.
As a means of measurement. These experiments are conducted to measure the quantitative relationship between two variables. Feeding trials to measure the shape of a consumer’s functional response are a common example: you provide individual predators with different densities of prey, and then plot predator feeding rate as a function of prey density. These experiments are a good way of isolating the relationship between two variables. For instance, in nature a predator’s feeding rate will depend on lots of things besides prey density, including some things that are likely confounded with prey density, making it difficult or impossible to use observational data to reliably estimate the true shape of the predator’s functional response. Or, maybe prey density just doesn’t vary that much in nature, so in order to measure how predator feeding rate would vary if prey density were to vary (which of course it might in future), you need to experimentally create variation in prey density. This is an example of a general principle: in order to learn how natural systems work, we’re often forced to create unnatural conditions (i.e. conditions that don’t currently exist, and may never exist or have existed).
Of course, the challenge with these experiments is to make sure that the controls needed to isolate the relationship of interest don’t also distort the relationship of interest. For instance, feeding trials conducted in small arenas are infamous for overestimating predator feeding rates because prey have nowhere to hide, and because prey and predators behave differently in small arenas than they do in nature.
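To make the feeding-trial idea concrete, here is a minimal sketch of how such data might be analyzed. It assumes a Holling type II functional response, which is only one of several possible shapes and is my choice for illustration, not something Wootton and Pfister specify. The prey densities, parameter values, and function names are all hypothetical, and the "data" are simulated without noise. The fit uses the simple linearization 1/f = (1/a)(1/N) + h, which is convenient for a sketch; a real analysis would more likely use a nonlinear fit and account for prey being depleted during the trial.

```python
def holling_type_ii(prey_density, attack_rate, handling_time):
    """Per-predator feeding rate under an (assumed) Holling type II response."""
    return attack_rate * prey_density / (1.0 + attack_rate * handling_time * prey_density)


def fit_type_ii(prey_densities, feeding_rates):
    """Estimate (attack_rate, handling_time) by least squares on the
    linearized form 1/f = (1/a) * (1/N) + h."""
    x = [1.0 / n for n in prey_densities]
    y = [1.0 / f for f in feeding_rates]
    count = len(x)
    mean_x = sum(x) / count
    mean_y = sum(y) / count
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # slope estimates 1 / attack_rate
    intercept = mean_y - slope * mean_x  # intercept estimates handling_time
    return 1.0 / slope, intercept


# Hypothetical design: the experimenter sets prey density, including
# densities that might never occur in nature.
densities = [5, 10, 20, 40, 80, 160]
true_a, true_h = 0.5, 0.1  # made-up "true" parameters for the simulation

# Simulated, noise-free feeding rates standing in for trial results.
rates = [holling_type_ii(n, true_a, true_h) for n in densities]

est_a, est_h = fit_type_ii(densities, rates)
print(f"estimated attack rate: {est_a:.3f}, handling time: {est_h:.3f}")
```

The point of the design is in the `densities` list: because the experimenter chooses those values, prey density varies widely and is not confounded with anything else, which is exactly what observational data usually can't give you.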
To test theoretical predictions. Probably the most common sort of experiment reported in leading ecology journals. Again, often most usefully performed in tractable model systems**.
But as Wootton and Pfister point out, these kinds of experiments, at least as commonly conducted and interpreted by ecologists, have serious limitations that aren’t widely recognized. For instance, testing the predictions of only a single ecological model, while ignoring the predictions of alternative models, prevents you from inferring much about the truth of your chosen model. If model 1 predicts that experiment A will produce outcome X, and you conduct experiment A and find outcome X, you can’t treat that as evidence for model 1 if alternative models 2, 3, and 4 also predict the same outcome. It’s for this reason that Platt (1964) developed his famous argument for “strong inference”, with its emphasis on lining up alternative hypotheses and conducting “crucial experiments” that distinguish between those hypotheses.
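One way to see why a shared prediction tells you nothing is a small Bayesian calculation. This framing is an illustration added here, not something taken from Wootton and Pfister or Platt, and every number in it is made up. The sketch compares four hypothetical models with equal prior weight under two scenarios: one where all four predict outcome X equally well, and a "crucial experiment" where only model 1 does.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of models.

    priors[i] is the prior probability of model i;
    likelihoods[i] is the probability of the observed outcome under model i.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Four hypothetical models, equally plausible before the experiment.
priors = [0.25, 0.25, 0.25, 0.25]

# Scenario 1: every model predicts outcome X with the same probability.
shared_prediction = posterior(priors, [0.9, 0.9, 0.9, 0.9])

# Scenario 2: a "crucial experiment" in which only model 1 predicts X.
crucial_experiment = posterior(priors, [0.9, 0.1, 0.1, 0.1])

print("shared prediction: ", [round(p, 3) for p in shared_prediction])
print("crucial experiment:", [round(p, 3) for p in crucial_experiment])
```

In the first scenario, observing X leaves the models exactly where they started. In the second, the same observation shifts most of the weight onto model 1, which is the sense in which a crucial experiment distinguishes between hypotheses.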
There’s another limitation of experiments conducted to test theoretical predictions, which Wootton and Pfister don’t recognize, but which is well-illustrated by one of their own examples. Wootton and Pfister’s first example of an experiment testing a theoretical prediction is the experiment of Sousa (1979) testing the intermediate disturbance hypothesis (IDH). Which, as readers of this blog know, is a really, really unfortunate example. Experiments to test predictions are only as good as the predictions they purport to test. So if those predictions derive from a logically-flawed model that doesn’t actually predict what you think it predicts (as is the case for several prominent versions of the IDH), then there’s no way to infer anything about the model from the experiment. The experiment is shooting at the wrong target. Or, if the prediction actually “derives” from a vague or incompletely specified model, then the experiment isn’t really shooting at a single target at all–it’s shooting at some vaguely- or incompletely-specified family of targets (alternative models), and so allows only weak or vague inferences about those targets (this is what I think was going on in the case of Sousa 1979).
One way to avoid such ill-aimed experiments is for experimenters to rely more on mathematical models and less on verbal models for hypothesis generation. But another way to avoid such ill-aimed experiments is to quit focusing so much on testing predictions and instead conduct an experiment…
To test theoretical assumptions. It is quite commonly the case in ecology that different alternative models will make many similar predictions. For instance, models with and without selection (non-neutral and neutral models) infamously make the same predictions about many features of ecological and evolutionary systems. This makes it difficult to distinguish models by testing their predictions. So why not test their assumptions instead, thereby revealing which alternative model makes the right prediction for the right reasons, and which alternative is merely getting lucky and making the right prediction for the wrong reasons? For instance, I’ve used time series analysis techniques to estimate the strength of selection in algal communities (Fox et al. 2010), thereby directly testing whether algal communities are neutral or not (they’re not). In this context, this is a much more direct and powerful approach than trying to distinguish neutral and non-neutral models by testing their predictions (e.g., Walker and Cyr 2007 Oikos) (UPDATEx2: The example of Fox et al. 2010 isn’t the greatest example here, because while it is an assumption-testing study, it’s not actually an experiment. Probably should’ve stuck with Wootton and Pfister’s first example of testing evolution by natural selection by conducting experiments to test for heritable variation in fitness-affecting traits, which are the conditions or assumptions required for evolution by natural selection to occur. And as pointed out in the comments, the Walker and Cyr example isn’t great either because they actually were able to reject the neutral model for many of the species-abundance distributions they checked, in contrast to many similar studies).
A virtue of focusing on assumptions as opposed to predictions is that it forces you to pay attention to model assumptions and their logical link to model predictions, rather than treating models as black boxes that just spit out testable predictions. Because heck, if all you want is predictions, without caring about where they come from, you might as well get them here.
Another virtue of tests of assumptions, especially when coupled with tests of predictions, is learning which assumptions are responsible for any predictive failures of the model(s) being tested. This is really useful to know, because it sets up a powerful iterative process of modifying your model(s) appropriately, and then testing the assumptions and predictions of the modified models.
Another reason to test assumptions rather than predictions is that it might be easier to do. Of course, in some situations it could be easier to test predictions than assumptions. And in any case, you all know what I think of doing science by simply pursuing the path of least resistance (not much).
Of course, testing assumptions has its own limitations. Since theoretical assumptions are rarely if ever perfectly met, we’re typically interested in whether a model’s predictions are robust to violations of its assumptions. Does the model “capture the essence”, the important factors that drive system behavior, and over what range of circumstances does it do so? So you run into the issue of how big a violation of model assumptions is worth worrying about. I don’t have any great insight to offer on how to deal with this; sometimes it’s a judgment call. Sometimes one person’s “capturing the essence” is another person’s “wrong”. For what it’s worth, we make similar judgment calls in other contexts (e.g., how big a violation of statistical assumptions of normality and homoscedasticity is worth worrying about?)
Wootton and Pfister conclude their chapter by discussing how to choose what kind of experiment to conduct. For instance, if you’re studying a system about which not much is known (and assuming you have a good reason for doing that!), you may have no choice but to conduct a “see what happens” experiment (“kick the system and see who yells”, as my undergrad adviser David Smith put it). You might want different experiments depending on whether you’re seeking a general, cross-system understanding of some particular phenomenon, vs. intensively studying a particular system. Or you might want different experiments depending on whether you’re setting out to test mathematical theory, or to identify the likely consequences of, say, a dam or some other human intervention in the environment. Problems arise when you don’t think this through. For instance, conducting an experiment just to see what happens, and then retroactively trying to treat it as a test of some theoretical model, hardly ever works (but it’s often tempting, which is why people keep doing it).
So what do you think? Is this a complete list?
*Or, depending on your point of view, “resourceful”.
**Someday, I need to do a post on what makes for a good “model system”.