The other day I noticed, in my own weird way, that Darwinian evolution, characterized by natural selection acting on random mutation, has interesting parallels to Object Oriented Programming (OOP). I’ve been thinking about this lately because by day I write code to build data systems, and in my free time I’m writing a book that touches on Christian apologetics. Both programming and evolution involve information being shared and passed along. The parallels are striking and worth exploring, so I wrote this article.

In some ways, this isn’t new territory. Software engineering and information theory are often employed as analysis tools in evolutionary biology. Using them, we can learn something about evolution and its competing hypotheses. Though the current dogma is to accept Darwinian evolution, and more importantly universal common descent, unequivocally, I prefer to take the Popperian route and put our theories to the test against reality, in hopes that we can get closer to the truth. The analogy we explore here will help us reason about shared information in genetics, and it also lines up with some surprisingly similar scientific evidence that we discuss later on.
Object Oriented Programming
OOP is a popular programming paradigm centered around software objects and their associated behavior. OOP is used widely in industry to help tame the inherent complexity in software engineering. By bundling data and methods (read: functions) acting on that data in a shared framework, OOP ideally lowers the mental effort needed to understand, maintain, and extend a software system. Emphasis on the ideally.
One technique that virtually every OOP language gives you is inheritance. One class can inherit virtually any data or method from another class. This allows for levels of abstraction in the program architecture, which in turn make it easier to reuse code and to change one part of a system without having to change everything else. In other words, the system can adapt. The popular example is a software system that models a species of animal, which is apt given that we’re talking about shared genetic information. A child class such as Dog inherits from a higher-level parent class such as Animal, which may in turn have inherited from another. There is usually no limit to how many levels there can be. In practice, it often balloons out of control. Below is an example of this hierarchy in Python.
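```python
class Animal:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def describe(self):
        print(f"{self.name} is {self.age} years old.")

    def make_sound(self):
        # Generic fallback so every Animal can respond to make_sound()
        print(f"{self.name} makes a sound.")

    def eat(self):
        print("Eating food...")

    def digest_food(self):
        print(f"{self.name} is digesting their food.")


class Dog(Animal):
    def __init__(self, name, age, breed):
        # Reuse the parent's initialization, then add Dog-specific data
        super().__init__(name, age)
        self.breed = breed

    def fetch(self):
        print(f"{self.name} is fetching the ball!")


my_dog = Dog("Buddy", 3, "Labrador")
my_dog.describe()    # inherited from Animal
my_dog.make_sound()  # inherited from Animal
my_dog.eat()         # inherited from Animal
my_dog.fetch()       # defined on Dog
```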
This approach is analogous to sexual reproduction in living organisms. The genetic information in the parent is passed on to the child. In fact, any class that inherits from another in OOP is called a child class.
In nature, we see that different organisms share sections of their DNA. The famous example is the similarity of ape and human DNA. While recent findings suggest the degree of similarity has been exaggerated, some overlap remains. Where this becomes contentious is the explanation placed on what we observe. What caused this shared DNA? For the Darwinian, it is universal common ancestry. For those who question that worldview, it is something else.
How can OOP help us choose between these two explanations? I think that looking at what works and what doesn’t in practice helps illuminate what would and wouldn’t work in nature, given that the two processes share many features. You may think at first glance that there is only one way to use inheritance in OOP, but that is not the case. Let me outline the two major approaches.
Inheritance chains
The first way is to create deep chains of abstraction using direct inheritance, shown graphically below. This approach is characterized by the phrase “Is a”: every child IS A descendant of its parent class.
A -> B -> C -> D -> so on
In order to reach the intended software behavior, classes are continuously modified through a direct chain of inherited data and behavior. This creates highly specialized code that can accomplish the task. However, the code is tightly coupled, and the programmer will soon find themselves neck deep when they want to make a modification based on the requirements of the project. Inheritance is analogous to reproduction, and the fitness landscape of natural selection can be seen as the project requirements.
Note that this way of using inheritance is the direct implementation of the Darwinian paradigm in software: long chains of information getting passed down and modified slightly as needed based on requirements. It is also the least preferred way to do OOP in practice. Using this approach will create monolithic, inflexible systems. You will find tons of horror stories in the online forums and videos that discuss OOP. When people criticize the paradigm as being a bad way to develop software, they are usually referring to the above approach. But it’s not the only one.
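To make the coupling concrete, here is a minimal sketch of such a chain. The class names A through D are borrowed from the diagram above and the `describe` method is invented purely for illustration; real hierarchies would carry far more behavior at each level.

```python
class A:
    def describe(self):
        return "A"


class B(A):
    def describe(self):
        # B builds on whatever A returns
        return super().describe() + " -> B"


class C(B):
    def describe(self):
        # C builds on whatever B (and therefore A) returns
        return super().describe() + " -> C"


class D(C):
    def describe(self):
        # D depends on every class above it in the chain
        return super().describe() + " -> D"


chain = D().describe()
print(chain)

# The method resolution order lists every ancestor that D depends on.
ancestors = [cls.__name__ for cls in D.__mro__]
print(ancestors)
```

Because each level calls up to its parent, a change to `A.describe` silently changes the result of B, C, and D as well. That is the tight coupling described above.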
Interfaces
The other way to do OOP is to use interfaces. Instead of deep lines of inheritance, you outline a contract of behavior that does not directly define what the object does but lays out the high-level logic and constraints. A child class then inherits the interface and “implements” it by overriding the interface’s abstract methods with concrete implementations. Other classes then hold objects that satisfy the interface rather than inheriting from them, which is why this is characterized as a “Has a” approach. The key is to compose behavior instead of dictating strict hierarchies of inheritance directly, which is why this method is often referred to as Composition. The following code demonstrates this using Python Abstract Base Classes.
```python
from abc import ABC, abstractmethod


class AbstractEngine(ABC):
    """The interface: what any engine must be able to do."""

    @abstractmethod
    def start(self):
        pass

    @abstractmethod
    def stop(self):
        pass


class Engine(AbstractEngine):
    """A concrete implementation of the interface."""

    def __init__(self, horsepower):
        self.horsepower = horsepower

    def start(self):
        print(f"Engine with {self.horsepower} HP is starting.")

    def stop(self):
        print(f"Engine with {self.horsepower} HP is stopping.")


class Car:
    def __init__(self, make, model, engine_horsepower):
        self.make = make
        self.model = model
        # Composition: a Car HAS AN engine rather than being one
        self.engine = Engine(engine_horsepower)

    def drive(self):
        self.engine.start()
        print(f"{self.make} {self.model} is driving.")

    def park(self):
        print(f"{self.make} {self.model} is parking.")
        self.engine.stop()


my_car = Car("Toyota", "Camry", 200)
my_car.drive()
my_car.park()
```
The AbstractEngine class defines what an engine generally does at a high level, but does not supply the details. The Engine class inherits from AbstractEngine, but must implement the methods it defines, meaning it must supply the actual behavior within the constraints of the abstract class. We know that engines will always start and stop. For our Car, however, we may not know what kind of engine we want yet, and we may put a different engine in it one day if we are absolutely made of money. The main point is that interfaces allow more flexible code reuse, allowing adaptability to changing conditions.
The interface method is preferable in dynamic environments, which is almost every software context, except perhaps some very old legacy systems running on mainframes with extremely limited scope. Dynamic certainly describes the fitness landscapes of Darwinian evolution. Interfaces are simply more successful in these dynamic areas than the inheritance approach that more closely matches evolution.
This raises a question about the ubiquity of life on earth. That life is successful and widespread on our planet is not disputed. And yet we are told that a sub-optimal algorithm led to this diversity and adaptability, when another one more closely follows the story and possibly explains it better. We see that processes that mirror the evolution hypothesis are not as successful at adaptation as processes that mirror interfaces. And adaptation is the mechanism that allows species to avoid extinction.
The evolutionist’s reply here is likely that they never claimed the Darwinian model was optimized, only that given enough time it could lead to life that can adapt to the requirements of its environment. On the face of it, that is fair. But we should go further and explore what the scientific literature says about this topic.
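In the Car class above, the engine is still constructed inside `__init__`, so swapping it means editing Car. One way to get the flexibility described here is to pass the engine in from outside. The following is a sketch under that assumption; the `GasEngine` and `ElectricEngine` classes are hypothetical and are not part of the original example, and the methods return strings instead of printing so the result is easy to inspect.

```python
from abc import ABC, abstractmethod


class AbstractEngine(ABC):
    @abstractmethod
    def start(self):
        """Return a message describing the engine starting."""

    @abstractmethod
    def stop(self):
        """Return a message describing the engine stopping."""


class GasEngine(AbstractEngine):  # hypothetical implementation
    def start(self):
        return "Gas engine is starting."

    def stop(self):
        return "Gas engine is stopping."


class ElectricEngine(AbstractEngine):  # hypothetical implementation
    def start(self):
        return "Electric engine is starting."

    def stop(self):
        return "Electric engine is stopping."


class Car:
    def __init__(self, make, model, engine):
        self.make = make
        self.model = model
        # Any object that satisfies AbstractEngine will do
        self.engine = engine

    def drive(self):
        return f"{self.engine.start()} {self.make} {self.model} is driving."


my_car = Car("Toyota", "Camry", GasEngine())
before_swap = my_car.drive()

# Swap the engine without touching the Car class at all
my_car.engine = ElectricEngine()
after_swap = my_car.drive()

print(before_swap)
print(after_swap)
```

Car never needs to know which engine it holds, only that the engine honors the `start` and `stop` contract.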
Dependency Graphs
The previous exposition is a strong analogy, but we would need more evidence to conclude that the Composition/Dependency Graph hypothesis is true. Being a superior method that aligns with what we would expect given life’s success on earth is not enough on its own; we still need to evaluate the evidence more directly. It turns out we have peer-reviewed research that does just that. The OOP analogy is very similar to the hypothesis that Winston Ewert proposes in the paper “The Dependency Graph of Life”.
Ewert explores the previous discussion in the context of dependency graphs vs trees. For our purposes, dependency graphs are interfaces which flatten the hierarchy of information flow and allow for efficient reuse of data and methods. Trees on the other hand represent the commonly accepted inheritance and mutation model of life.
The paper uses a Bayesian model comparison approach built on the following expression

$$P(D \mid M)$$

That is, given the chosen model $M$, what is the probability that it generated the data $D$? To compare two models, a Bayes factor is calculated as such

$$\log_2 \frac{P(D \mid M_2)}{P(D \mid M_1)}$$

The base-2 logarithm is monotonic, so a higher Bayes factor corresponds to a stronger case for model 2 over model 1, while negative values indicate that model 1 is the better fit. In this comparison framework, three different models were created as follows:
Null Model: there is no pattern to the reuse of gene families
Tree Model: any genes shared between two species derive from another species ancestral to the extant species.
Dependency Graph: each module may have multiple modules it depends on. Every gene family is introduced in a single module and inherited by all modules that depend on that module.
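The structural difference between the Tree and Dependency Graph models can be sketched in a few lines. Everything below is invented for illustration: the module names, the gene family names, and the helper functions `gene_families` and `is_tree` are hypothetical and are not taken from Ewert's paper or its software.

```python
# Hypothetical modules: each maps to the modules it depends on.
DEPENDS_ON = {
    "base": [],
    "mod_p": ["base"],
    "mod_q": ["base"],
    "species_x": ["mod_p", "mod_q"],  # two dependencies: not possible in a strict tree
    "species_y": ["mod_p"],
}

# Hypothetical gene families: each family is introduced in exactly one module.
INTRODUCES = {
    "base": {"fam1"},
    "mod_p": {"fam2"},
    "mod_q": {"fam3"},
    "species_x": set(),
    "species_y": {"fam4"},
}


def gene_families(module):
    """Collect every family a module inherits through its dependencies."""
    families = set(INTRODUCES[module])
    for dependency in DEPENDS_ON[module]:
        families |= gene_families(dependency)
    return families


def is_tree(depends_on):
    """A tree allows at most one parent per node; a dependency graph does not."""
    return all(len(parents) <= 1 for parents in depends_on.values())


print(sorted(gene_families("species_x")))
print(sorted(gene_families("species_y")))
print(is_tree(DEPENDS_ON))
```

In this toy setup, `species_x` picks up families from both `mod_p` and `mod_q`, which a single line of descent could not express without extra events such as independent gains or losses.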
The paper then goes on to compare the models against each other:
- Null vs Tree
- Null vs Dependency Graph
- Tree vs Dependency Graph
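As a rough sketch of how these pairwise comparisons work, assume the Bayes factor is the base-2 logarithm of the ratio of the two models' likelihoods, which is what the base-2 logarithms and the "bits" unit in this article suggest. The log-likelihood numbers below are made up for illustration and have nothing to do with the paper's data; the function name `log2_bayes_factor` is likewise hypothetical.

```python
def log2_bayes_factor(log2_like_model1, log2_like_model2):
    """Return log2(P(D|M2) / P(D|M1)) in bits, given log2-likelihoods.

    Positive values favor model 2, negative values favor model 1.
    """
    return log2_like_model2 - log2_like_model1


# Made-up log2-likelihoods of the same data under each model
log2_likelihood = {"null": -1000.0, "tree": -400.0, "graph": -350.0}

# The three comparisons listed above, as (model 1, model 2) pairs
pairs = [("null", "tree"), ("null", "graph"), ("tree", "graph")]

factors = {
    f"{m2} vs {m1}": log2_bayes_factor(log2_likelihood[m1], log2_likelihood[m2])
    for m1, m2 in pairs
}

for name, bits in factors.items():
    print(f"{name}: {bits:+.1f} bits")
```

Working with log-likelihoods turns the ratio into a subtraction, which avoids underflow when the raw probabilities are astronomically small.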
The paper compiles genetic data from a variety of databases such as UniRef, Pfam, and several others. This data was used as input to the program EvolSimulator and later post-processed by OrthoFinder. The point of this is to produce gene families that are known to have been produced by common descent. And just for fun, the paper includes JavaScript code from single page applications produced by a compiler, for comparison.
Results
After computing the Bayes factors for both the JavaScript control and the EvolSim outputs, we see that the Tree hypothesis beats both the Null and the Dependency Graph in every case except JavaScript. This is basically a sanity check to show that the comparison process can accurately discern between gene families known to have been produced by common descent and those that are only postulated to have been. In every EvolSim case, we get the results we expected. Sanity check complete, and on to the actual data in question.
Table 3: The log Bayes factors for the models in the synthetic datasets
Dataset | Tree vs Null | Graph vs Null | Graph vs Tree |
---|---|---|---|
JavaScript | 9,791 | 15,078 | 4,749 |
EvolSim 1 | 83,301 | 83,023 | -886 |
EvolSim 2 | 45,939 | 45,005 | -1,582 |
EvolSim 3 | 29,851 | 28,485 | -1,598 |
EvolSim 4 | 27,982 | 26,554 | -1,793 |
EvolSim 5 | 36,120 | 35,553 | -874 |
Finally, the direct gene family data is put through the comparison process, and the Dependency Graph hypothesis wins handily over the other two explanations. Ewert notes that the literature treats 6.6 bits and above as decisive evidence for one model over another. In our case, we are in another galaxy when we compare Graph vs Tree. Tree is still better than the Null hypothesis of no connection, but it is inferior to the Dependency Graph hypothesis across all genetic datasets.
Table 4: The log Bayes factors for combinations of models and datasets
Dataset | Tree vs Null | Graph vs Null | Graph vs Tree |
---|---|---|---|
UniRef-50 | 6,193,801 | 6,308,988 | 111,823 |
OrthoDB | 9,214,606 | 9,730,055 | 515,450 |
Ensembl | 875,350 | 962,274 | 86,924 |
TreeFam | 1,362,985 | 1,403,952 | 40,967 |
Hogenom | 884,815 | 1,022,243 | 137,428 |
EggNOG | 1,497,174 | 1,579,650 | 82,476 |
Pfam | 1,173,599 | 1,251,841 | 78,244 |
OMA | 3,265,608 | 3,451,745 | 184,777 |
HomoloGene | 106,010 | 116,080 | 10,064 |
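To get a feel for the scale of these numbers, the sketch below converts bits into odds, assuming a log Bayes factor of b bits corresponds to a likelihood ratio of 2^b. The helper names are hypothetical. The only inputs are the 6.6-bit threshold mentioned above and the smallest Graph vs Tree entry in Table 4 (HomoloGene).

```python
import math


def bits_to_odds(bits):
    """Convert a log2 Bayes factor (bits) into a plain likelihood ratio."""
    return 2.0 ** bits


def bits_to_log10_odds(bits):
    """Same ratio expressed as a power of ten, safe for very large inputs."""
    return bits * math.log10(2)


decisive_threshold_bits = 6.6
smallest_graph_vs_tree_bits = 10_064  # HomoloGene row of Table 4

threshold_odds = bits_to_odds(decisive_threshold_bits)
table_log10_odds = bits_to_log10_odds(smallest_graph_vs_tree_bits)

print(f"6.6 bits is roughly {threshold_odds:.0f} to 1")
print(f"10,064 bits is roughly 10^{table_log10_odds:.0f} to 1")
```

The threshold works out to odds of a little under 100 to 1, while even the smallest Graph vs Tree value in the table corresponds to a ratio with thousands of digits.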
There you have it folks, scientific evidence that doesn’t support the Darwinian model that we are constantly told explains everything. I highly encourage you to read the paper for yourself. It is approachable like a good wine and doesn’t need advanced math to understand. In fact, we should all be able to ask questions and submit theories to testing to sift through the good and the bad.
That is, after all, what science is all about.