Knowledge representation and species interactions



The food web

One of the most fascinating aspects of ecology is the complex web of interactions between individuals (or populations, or species), and how it shapes the structure of communities. But what is the best way to represent these interactions?

The classical representation is a directed network. Here's one from the mangal database [2]. We can have predator-prey interactions, parasitism, mutualism, competition, and so on.

However, it's a somewhat limited representation. A good way to evaluate a knowledge representation is to look at the possible queries. What questions can be answered? What questions cannot be answered? How expressive can the queries be? How expressive are the answers?

Despite their widespread use, these interaction networks can only answer one question: is there a binary interaction between X and Y? The binary part is important, because these networks cannot deal with indirect relationships such as: X eats Y except when W is there.
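As a minimal sketch of what this looks like in practice, a directed network can be stored as a set of edges. The species names and helper functions below are hypothetical and only serve to show how narrow the available queries are:

```python
# Hypothetical food web: a directed edge (x, y) means "x eats y".
food_web = {
    ("fox", "rabbit"),
    ("fox", "mouse"),
    ("owl", "mouse"),
    ("rabbit", "grass"),
}

def interacts(web, x, y):
    """The one query a plain directed network answers: is there an edge x -> y?"""
    return (x, y) in web

def prey_of(web, x):
    """All species y with an edge x -> y, sorted for stable output."""
    return sorted(y for (a, y) in web if a == x)
```

Nothing in this structure can say "fox eats mouse except when owl is there": an edge is either in the set or it is not.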

Nor does the representation scale well: two species might interact only in a given region. Distinct networks solve that problem, but we would need one network for every region with distinct interactions, and a lot of information would be repeated.

Going stochastic

Instead of saying X eats Y, we could say X eats Y with probability Z. Adopting a probabilistic perspective does solve quite a few issues. We can now handle uncertainty and to some extent spatio-temporal variations. That said, we still cannot handle indirect relationships, and it doesn't scale that well.

For example, if species A and B compete on the west coast with probability 0.9 but only with probability 0.1 on the east coast, we would lose important information by having a network say they compete with probability 0.5. Dividing the network in two would improve our model's accuracy, but we are again getting into the messy network-of-networks business.
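The loss of information can be made concrete with a few lines of arithmetic. This sketch assumes the two coasts are pooled by a simple unweighted average, which is one way to arrive at the 0.5 figure; the function names are made up for illustration:

```python
# Probability that A and B compete, per region (values from the example above).
competition = {"west": 0.9, "east": 0.1}

def pooled_probability(by_region):
    """Collapse regions into a single link by unweighted averaging (an assumption)."""
    return sum(by_region.values()) / len(by_region)

def max_error(by_region):
    """Largest gap between the pooled value and any regional value."""
    pooled = pooled_probability(by_region)
    return max(abs(p - pooled) for p in by_region.values())
```

Here the pooled link is off by 0.4 on both coasts, so it describes neither region well.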

My supervisors, Timothée Poisot and Dominique Gravel, recently proposed a metaweb concept that includes, among many other ideas, probabilistic links, but in their case it's more than just adding probabilities to links [3].

A unified path to unification

To criticize an approach to ecology for missing some features is too easy. All representations have limitations, and any approach to science relies on some simplification of reality. However, in this case we can have our cake and eat it too by looking at what is going on in the field of knowledge representation (KR), which has been defined as [0]:

[...] the scientific domain concerned with the study of computational models able to explicitly represent knowledge by symbols and to process these symbols in order to produce new ones representing other pieces of knowledge. Systems built upon such computational models are called knowledge-based systems. Their main components are a knowledge base and a reasoning engine.

To illustrate, let's look at two salamanders: P. cinereus and P. hoffmani [1]. The two species essentially hunt the same prey but, when they are found in the same region, character displacement pushes them to specialize on different prey. This fact cannot be easily expressed with standard knowledge representations for food webs, but it can with a bit of logic:

\[presence(\mbox{cinereus}, r) \land presence(\mbox{hoffmani}, r) \Rightarrow eat(\mbox{cinereus}, A, r).\]

The formula reads: the presence of cinereus in region \(r\) and the presence of hoffmani in \(r\) implies that cinereus eats prey A in \(r\). Formulas like this can be seen as templates: presence(cinereus, r) will be either true or false at any given place, but a probability is assigned to the entire formula so it can be revised with new evidence [4]. This is an important point: it means we can use the standard food webs as evidence to build these logical formulas and still get a probabilistic model capable of handling uncertainty.
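To make the link with probabilities more explicit: in Markov logic networks [4], the number attached to a formula is a real-valued weight \(w_i\), and the probability of a possible world \(x\) (one truth assignment to every ground predicate) is

\[P(X = x) = \frac{1}{Z} \exp\left(\sum_i w_i \, n_i(x)\right),\]

where \(n_i(x)\) counts the true groundings of formula \(i\) in \(x\) and \(Z\) normalizes over all worlds. As a small worked case, assume the formula above is the only one in the model, with a hypothetical weight \(w\), and that both species are known to be present in \(r\). The world where cinereus eats A satisfies the formula and the world where it does not violates it, so

\[P(eat(\mbox{cinereus}, A, r) \mid presence(\mbox{cinereus}, r), presence(\mbox{hoffmani}, r)) = \frac{e^{w}}{e^{w} + 1}.\]

A weight of \(w = \ln 9\), for instance, gives a probability of 0.9, and larger weights push the formula toward a hard logical rule.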

The formula is readable, can be used with modern inference tools, and is very flexible, but there's more. What if the presence of cinereus in \(r\) could itself be predicted from a few conditions? We could have something like this:

\[northAmerica(r) \land (presence(A, r) \lor presence(B, r)) \Rightarrow presence(\mbox{cinereus}, r).\]

If the region \(r\) is in North America and if either species A or species B is present, then cinereus will be present. Again, this formula would be assigned a probability. This is a simplistic model of presence/absence, but the point is that this knowledge representation allows different formulas to be combined. Various pieces of evidence can be used together instead of having a wall between ecological facts, and such a database of probabilistic logical formulas supports sophisticated queries.
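To show how the two formulas can be combined in a single query, here is a minimal sketch in the spirit of Markov logic [4]. It is not how real inference software works: it enumerates every possible world for one region by brute force, which is only feasible for a handful of predicates. The weights, the atom names, and the `query` helper are all assumptions made for illustration.

```python
import itertools
import math

# Ground atoms for a single region r (names are hypothetical).
ATOMS = ["north_america", "present_A", "present_B",
         "present_cinereus", "present_hoffmani", "cinereus_eats_A"]

def implies(p, q):
    """Material implication: p => q."""
    return (not p) or q

# (weight, formula) pairs. The weights are made up, not fitted to data.
FORMULAS = [
    # northAmerica(r) and (presence(A, r) or presence(B, r)) => presence(cinereus, r)
    (math.log(9), lambda w: implies(
        w["north_america"] and (w["present_A"] or w["present_B"]),
        w["present_cinereus"])),
    # presence(cinereus, r) and presence(hoffmani, r) => eat(cinereus, A, r)
    (math.log(9), lambda w: implies(
        w["present_cinereus"] and w["present_hoffmani"],
        w["cinereus_eats_A"])),
]

def world_weight(world):
    """Unnormalized weight: exp of the summed weights of satisfied formulas."""
    return math.exp(sum(wt for wt, formula in FORMULAS if formula(world)))

def query(target, evidence):
    """P(target is true | evidence), summing over all worlds that match the evidence."""
    numerator = denominator = 0.0
    for values in itertools.product([False, True], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, values))
        if any(world[atom] != value for atom, value in evidence.items()):
            continue
        weight = world_weight(world)
        denominator += weight
        if world[target]:
            numerator += weight
    return numerator / denominator

evidence = {"north_america": True, "present_A": True, "present_hoffmani": True}
p_eat = query("cinereus_eats_A", evidence)
```

With these made-up weights, `p_eat` comes out at about 0.83 even though the presence of cinereus was never observed: the first formula makes its presence likely, and the second turns that into a prediction about its diet. A single directed network cannot express this kind of chained query.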

There are many discussions about having integrative frameworks and theories in ecology. This is especially true for interaction networks, which are present in one form or another throughout ecology. Ironically, these discussions of integration often ignore modern research in other fields. Knowledge representation offers powerful tools with mature software that are worth exploring, especially if we want to understand how different ideas and evidence can be combined.


[0] M Chein and M-L Mugnier. Graph-based Knowledge Representation. Springer, 2009.

[1] JB Losos. Ecological character displacement and the study of adaptation. PNAS, 2000.

[2] T Poisot, B Baiser, JA Dunne, S Kéfi, F Massol, N Mouquet, TN Romanuk, DB Stouffer, SA Wood, and D Gravel. Mangal - making complex ecological network analysis simpler. bioRxiv, 2014.

[3] T Poisot, DB Stouffer, and D Gravel. Beyond species: why ecological interactions vary through space and time. bioRxiv, 2014.

[4] M Richardson and P Domingos. Markov logic networks. Machine Learning 62 (1-2): 107–136, 2006.
