Publications
2026
Hybridized Evolution Accelerating Thermostability (HEAT): A Novel MOEA Approach to EASME Protein Engineering
James S. L. Browning, Daniel R. Tauritz, John Beckmann
The vital functions of life are driven by proteins, long folded strings of amino acids. Millions of proteins exist in nature, all optimized for a certain set of operating conditions. One particularly important trait of a protein is thermostability, or the tendency to maintain form and function at high temperatures. Protein thermostability is important for both biological organisms and industrial enzymatic processes that rely on them. This work proposes a novel hybrid evolutionary computation (EC) approach, employing the EAs Simulating Molecular Evolution (EASME) software, for evolving thermostable variants of proteins with minimal computational overhead and customized problem-specific fitness functions. A variation of NSGA-II with a novel recombination/mutation interplay was designed to tackle the specific challenges of working with proteins. The results of this work, which computationally improved the thermostability of a protein more effectively than a state-of-the-art machine learning model from 2025, demonstrate that EC may have been historically undervalued in the realm of protein design. The pipeline designed for this work could be applied to numerous other proteins, easily expanded to optimize other traits of those proteins, or employed on another problem entirely where mutations have a notable fitness cost.
DOI: To Be Announced
2024
Evolutionary algorithms simulating molecular evolution: a new field proposal (Briefings in Bioinformatics)
James S. L. Browning, Daniel Tauritz, John Beckmann
The genetic blueprint for the essential functions of life is encoded in DNA, which is translated into proteins—the engines driving most of our metabolic processes. Recent advancements in genome sequencing have unveiled a vast diversity of protein families, but compared with the massive search space of all possible amino acid sequences, the set of known functional families is minimal. One could say nature has a limited protein “vocabulary.” A major question for computational biologists, therefore, is whether this vocabulary can be expanded to include useful proteins that went extinct long ago or have never evolved (yet). By merging evolutionary algorithms, machine learning, and bioinformatics, we can develop highly customized “designer proteins.” We dub the new subfield of computational evolution, which employs evolutionary algorithms with DNA string representations, biologically accurate molecular evolution, and bioinformatics-informed fitness functions, Evolutionary Algorithms Simulating Molecular Evolution.
DOI: 10.1093/bib/bbae360
2024
Evolutionary Algorithms Simulating Molecular Evolution: A New Field Proposal (arXiv Extended)
James S. L. Browning, Daniel Tauritz, John Beckmann
The genetic blueprint for the essential functions of life is encoded in DNA, which is translated into proteins -- the engines driving most of our metabolic processes. Recent advancements in genome sequencing have unveiled a vast diversity of protein families, but compared to the massive search space of all possible amino acid sequences, the set of known functional families is minimal. One could say nature has a limited protein "vocabulary." The major question for computational biologists, therefore, is whether this vocabulary can be expanded to include useful proteins that went extinct long ago, or maybe never evolved in the first place. We outline a computational approach to solving this problem. By merging evolutionary algorithms, machine learning (ML), and bioinformatics, we can facilitate the development of completely novel proteins which have never existed before. We envision this work forming a new sub-field of computational evolution we dub evolutionary algorithms simulating molecular evolution (EASME).
2023
Modeling emergence of Wolbachia toxin-antidote protein functions with an evolutionary algorithm
John Beckmann, Joe Gillespie, Daniel Tauritz
Evolutionary algorithms (EAs) simulate Darwinian evolution and adeptly mimic natural evolution. Most EA applications in biology encode high levels of abstraction in top-down population ecology models. In contrast, our research merges protein alignment algorithms from bioinformatics into codon-based EAs that simulate molecular protein string evolution from the bottom up. We apply our EA to reconcile a problem in the field of Wolbachia-induced cytoplasmic incompatibility (CI). Wolbachia is a microbial endosymbiont that lives inside insect cells. CI is conditional insect sterility that operates as a toxin-antidote (TA) system. However, CI exhibits complex phenotypes not fully explained under a single discrete model. We instantiate in-silico genes that control CI, CI factors (cifs), as strings within the EA chromosome. We monitor the evolution of their enzymatic activity, binding, and cellular localization by applying selective pressure on their primary amino acid strings. Our model helps rationalize why two distinct mechanisms of CI induction might coexist in nature. We find that nuclear localization signals (NLS) and Type IV secretion system signals (T4SS) are of low complexity and evolve fast, whereas binding interactions have intermediate complexity, and enzymatic activity is the most complex. Our model predicts that as ancestral TA systems evolve into eukaryotic CI systems, the placement of NLS or T4SS signals can stochastically vary, imparting effects that might impact CI induction mechanics. Our model highlights how preconditions and sequence length can bias evolution of cifs toward one mechanism or another.
Frontiers in Microbiology, Volume 14 (2023)