Review Article | Open Access
Olga Khersonsky, Sarel J. Fleishman, "What Have We Learned from Design of Function in Large Proteins?", BioDesign Research, vol. 2022, Article ID 9787581, 11 pages, 2022. https://doi.org/10.34133/2022/9787581
What Have We Learned from Design of Function in Large Proteins?
The overarching goal of computational protein design is to gain complete control over protein structure and function. The majority of sophisticated binders and enzymes, however, are large and exhibit diverse and complex folds that defy atomistic design calculations. Encouragingly, recent strategies that combine evolutionary constraints from natural homologs with atomistic calculations have significantly improved design accuracy. In these approaches, evolutionary constraints mitigate the risk from misfolding and aggregation, focusing atomistic design calculations on a small but highly enriched sequence subspace. Such methods have dramatically optimized diverse proteins, including vaccine immunogens, enzymes for sustainable chemistry, and proteins with therapeutic potential. The new generation of deep learning-based ab initio structure predictors can be combined with these methods to extend the scope of protein design, in principle, to any natural protein of known sequence. We envision that protein engineering will come to rely on completely computational methods to efficiently discover and optimize biomolecular activities.
1. Introduction
The high versatility and specificity that protein binders and enzymes exhibit make them exceptionally attractive in biomolecular research, medicine, and biotechnology. Fields as diverse as biomedical engineering, sustainable chemistry, and commodity production have come to rely on proteins to provide efficient, economical, and environmentally sustainable solutions. These fields are likely to increasingly focus on proteins due to the pressing need to minimize energy use and the environmental impact of industrial processes. As a rule, however, natural proteins very rarely meet the stringent demands of real-world applications. For instance, depending on the source organism, as many as half of nonmembrane proteins exhibit low solubility in heterologous expression systems [4, 5], limiting their usefulness even in basic research, let alone in applications. Furthermore, with the exception of proteins from thermophilic organisms, many proteins are not stable at high temperatures or in long-term storage and may exhibit suboptimal levels of activity or selectivity. Thus, natural proteins present enormous opportunities to control the “chemistry of life” (the intricate molecular processes that underlie life), but a host of obstacles related to stability and activity must be overcome to study and exploit them. Understanding the underlying biophysical reasons for these obstacles and developing a general strategy to address them have been the subjects of immense basic and applied interest for decades [7–9].
Atomistic protein design strategies are based on the thermodynamic hypothesis, which stipulates that the protein native-state energy must be lower than that of any competing misfolded or unfolded states. Accordingly, design calculations search for a sequence and conformation that exhibit low native-state energy. Early successes in the atomistic design of stable and accurate protein structures, including a completely new fold, led to optimism that protein design would replace traditional, iterative, and laborious protein optimization methods with a completely rational approach. According to this view, if stable new-to-nature folds could be designed at atomic accuracy, it stood to reason that general and reliable atomistic methods for stabilizing proteins or altering their activities could not be far from reach.
Nevertheless, atomistic design of natural proteins exhibited only limited success. Binder and enzyme designs almost always exhibited only a low level of activity and demanded intensive iterative experimental optimization to reach acceptable levels [14–19]. Furthermore, experimental structures often revealed significant differences from the design conception, including substantial and unpredicted deviations in backbone conformation or active-site sidechain constellations [20–22]. These inaccuracies highlighted a fundamental challenge to protein design methodology; namely, a prerequisite to the successful design of function is accurate control over all conformational degrees of freedom. Phrased more precisely, the key determining question for protein design methodology has been to identify a general computational strategy to encode the significant energy gap between native and nonnative protein conformations [8, 11, 23].
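The role of the energy gap can be illustrated with a textbook Boltzmann calculation. The following is a minimal sketch, not part of any design method described in this article: it assumes a toy model in which the protein occupies either one native state or one of a set of discrete competing states, and the function name `native_occupancy` and all energy values are hypothetical.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def native_occupancy(native_energy, competing_energies, temperature=298.0):
    """Boltzmann probability of the native state in a toy discrete-state model.

    Energies are in kcal/mol. The model and its numbers are illustrative
    assumptions, not values taken from the article.
    """
    kt = K_B * temperature
    energies = [native_energy] + list(competing_energies)
    # Shift by the minimum energy so the exponentials cannot overflow.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    return weights[0] / sum(weights)


# Same gap, more competing states: native occupancy drops.
for n_states in (10, 1000, 100000):
    p = native_occupancy(-5.0, [0.0] * n_states)
    print(f"gap 5 kcal/mol, {n_states} competing states: P(native) = {p:.3f}")
```

In this toy model, a fixed gap protects the native state less and less as the number of competing states grows, which is one way to read the requirement that design must encode a significant energy gap.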
The number of possible nonnative conformations scales exponentially with the size of the protein. Thus, small proteins may not exhibit many nonnative alternative states that must be countered in design, and such proteins are therefore more amenable to complete computational design. Indeed, over the past decade, impressive progress was made in understanding the folding and stability of small de novo designed proteins (typically <90 amino acids) or idealized versions of natural folds [11, 25–28]. Such proteins can now be generated completely on the computer, though they exhibit no significant sequence relationship to natural proteins. These successes demonstrate significant progress and a high level of understanding and control over the fundamentals of protein folding. We refer the reader to excellent recent reviews on de novo designed proteins [28, 30–33].
Despite these dramatic achievements, however, reliable and fully atomistic design of large proteins of a complex fold has not made comparable breakthroughs. Certainly, protein engineering studies often use computational design calculations to focus experiments or to construct “smart” libraries for experimental screening [34–39]. Due to the limited accuracy of the atomistic design calculations, however, these workflows typically iterate computations with experimental screening and structure determination and do not provide a complete computational optimization solution. Critically, they also demand system-specific expertise and are difficult to generalize to proteins for which such expertise has not yet been developed. Thus, the key question that has guided our research is whether there are general design principles that can be universally applied to proteins of all folds and sizes.
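To make the exponential scaling concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each residue can adopt a fixed number of backbone states (three by default), so a chain of N residues has 3^N conformations. The per-residue count and the function name `log10_conformations` are assumptions and are not taken from the article.

```python
import math


def log10_conformations(n_residues, states_per_residue=3):
    """Return log10 of states_per_residue ** n_residues.

    This is a crude conformation count under the assumption that every
    residue independently adopts one of a few backbone states.
    """
    return n_residues * math.log10(states_per_residue)


# Compare a small de novo protein with an average-sized and a very large protein.
for n in (90, 250, 1000):
    print(f"{n} residues: about 10^{log10_conformations(n):.0f} conformations")
```

Whatever per-residue count is assumed, going from 90 to 250 residues multiplies the exponent, not the count, which is why countering nonnative states becomes so much harder in large proteins.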
2. Design Essentials in Large Proteins
Natural proteins tend to be large. The average size of proteins in all organisms is approximately 250 amino acids (350 in eukaryotes), and fewer than 2% of natural proteins are smaller than 100 amino acids. Furthermore, regardless of their functional class, enzymes tend to have large sizes (Figure 1). A possible explanation for this propensity for large size is that proteins evolve through the accretion of subdomain fragments [41, 42], an inefficient process of exploring the space of potential folds that may result in structural redundancy. Although this argument is plausible, we favor the view that proteins are large for a fundamental reason: many enzymes and binders must encode destabilizing molecular features in their active sites, such as desolvated nucleophilic or acidic amino acids [6, 43]. Accurately positioning such features demands a significant thermodynamic compensation from large regions outside the protein active site. Thus, even though an active site may comprise only a handful of amino acids, its accurate formation may require hundreds of amino acids that fold into a low-energy state. Other important functional features, such as regulatory sites and large ligand interaction surfaces, may also demand large protein sizes.
As we were studying how to design large proteins of a complex fold, we hoped to find molecular features that were common to diverse proteins. But one of the obstacles to inferring general protein design principles is the sheer diversity of protein structures. Protein domains are classified by SCOP into more than 1,500 folds and 2,500 superfamilies. Nevertheless, we were struck by the fact that the proteins we studied invariably broke some of the fundamental rules that have been successfully implemented in de novo design methodology. First, many protein folds, particularly the functionally more versatile ones like TIM barrels, β propellers, and immunoglobulins, comprise long loops at the active sites. By contrast, de novo designed proteins are typically dominated by secondary-structure elements that are connected through very short unstructured linkers [13, 29, 46] (Figure 2(a)). Critically, in natural proteins, irregular but structured backbone loops often form large parts of the active site and have important functional roles. Second, in de novo design methodology, amino acid positions in the protein core are programmed to exhibit only hydrophobic identities [25, 29], bolstering the hydrophobic effect, which is one of the primary driving forces for protein folding. Nevertheless, in every natural fold we examined, we found polar and even charged amino acid residues buried in the protein core [44, 47–49] (Figures 2(b) and 2(c)). The buried polar amino acids often interact with the loop backbones (Figure 2(b)), suggesting that these amino acids are important for the structural stability of the irregular loop regions. We also found that these buried polar amino acids are evolutionarily conserved among homologs, further suggesting that they have an essential structural role.
Since hydrogen bond networks demand high structural precision and often link distant parts of the protein sequence, we were intrigued by the possibility that they might provide a powerful mechanism to specify the backbone conformation in large proteins of a complex fold. They could thereby accentuate the energy gap between the native and nonnative states that is the hallmark of natural proteins. Thus, our working hypothesis was that these buried polar networks, though they diverge from ideality, may hold a key to the problem of designing large and complex folds.
3. Backbone Design in Large and Functional Proteins
To test this hypothesis, we developed a general strategy for protein backbone design in large proteins through the assembly of subdomain fragments [48, 51]. In this work, we were inspired by bioinformatics studies that had demonstrated that large natural folds likely emerged by the accretion of small subdomains [41, 42, 52, 53]. Furthermore, protein engineering studies had implemented this strategy in the lab, demonstrating that fragments could be recombined to generate new proteins with different stability and specificity profiles [54–59]. Recently, assembly strategies have also been applied to extend the size of de novo designed proteins [60–64]. Nevertheless, recombination events can lead to structural inaccuracy in the form of “hopeful monsters” and typically yield proteins that exhibit low (or no) activity. Thus, accuracy and control over the outcome of the assembly and design process are significant challenges.
In our work, we asked whether we could design new backbones and functional proteins through an evolution-guided atomistic design approach [48, 49, 51]. In this approach, following the assembly of backbone fragments, we subjected the entire protein to atomistic sequence design. Here, we were conscious that atomistic design calculations were very likely to eliminate crucial hydrogen bonding networks in the protein core. Therefore, instead of allowing all amino acid choices at each position during the sequence design phase, we biased design calculations to mutations commonly observed among homologs and forbade rare mutations.
As a first test, we applied this strategy to the design of antibody variable domains. Antibodies were recognized as modular proteins already in the 1970s, leading to a wave of innovation in therapeutic antibody engineering. We demonstrated that antibody variable domains designed strictly according to Rosetta atomistic design calculations eliminated conserved and critical buried hydrogen bond networks and exhibited very low protein expression levels. By contrast, by applying evolutionary sequence constraints, we retained these critical networks, generating highly expressed and structurally accurate antibodies that exhibited dozens of mutations from any natural antibody.
To test the generality of modular assembly and design, we applied it to two additional long-standing challenges of protein design methodology. First, we generated new enzymes through fragment assembly and atomistic design (Figure 3(a)). Despite encoding more than 100 mutations from any natural enzyme, some of the designs were as stable and functional as natural enzymes in the same functional family, and some exhibited substantially different substrate selectivity profiles. Second, we used this strategy to recombine backbone fragments from nonhomologous proteins, generating new backbones and sequences in a high-affinity pair of interacting proteins. This procedure yielded atomically accurate designs, including in the new backbone and designed hydrogen-bonded networks (Figure 3(b)). Remarkably, some of the binding pairs exhibited very high binding specificity relative to the natural pair, demonstrating that accurate control over the backbone and sidechain degrees of freedom is the key to the design of high-specificity interactions. Taken together, these results suggested that evolution-guided atomistic design could provide a general solution to outstanding problems in protein design of function.
4. Reliable and Completely Computational Protein Optimization
Encouraged by finding that evolution-guided atomistic design could design accurate and stable new backbones, we turned to protein optimization. Here, we developed two complementary strategies: PROSS for optimizing protein native-state stability (by designing the sequence outside the active or binding site) and FuncLib for designing stable and preorganized constellations of amino acid residues within enzyme active sites or protein binding sites. In developing these design approaches, we relied on insights from decades of research on protein engineering and evolution that demonstrated that (1) the vast majority of point mutations are neutral or deleterious to protein activity or stability; (2) large differences in activity demand multipoint mutations in the active site, but multipoint mutants are even more likely than single-point mutations to destabilize or reduce protein activity levels; and (3) the active-site constellation of amino acid residues is extremely sensitive even to remote mutations that may deform the protein backbone. Additionally, we sought a general protein optimization framework that would only rely on data that are readily available, in principle, for any natural protein structure and not on protein-specific expertise.
The strategy that we developed uses data from multiple-sequence alignments of homologous proteins in addition to atomistic design calculations. In natural evolution, homologous proteins diverged from a common ancestor, and selection pressures ensured that all of the extant proteins retained their primary activity and foldability. Thus, a sequence alignment of natural homologs indicates which mutations are likely to be tolerated. Indeed, inferences from phylogenetic calculations have been successfully used in “consensus” design and ancestral sequence reconstruction for decades. They have also been implemented successfully in other atomistic stability design methods [36, 75]. Additionally, in order to mitigate the risk of deforming the active site, mutations are only accepted if they are predicted not to alter the catalytic constellation [50, 68, 69]. As a last step, we applied combinatorial sequence design either inside the active site to modify protein activity (in FuncLib) or outside the active site to stabilize the protein (in PROSS).
For several decades, directed evolution has been the method of choice for protein optimization. Although there are numerous examples of directed evolution successes, this method is iterative, laborious, and applicable only to systems that are amenable to medium or even high-throughput screening (>10³ and even >10⁶ variants). Many proteins, however, can only be assayed at low throughput, either because of complicated production requirements or because measuring their activity requires sensitive instruments. We started by applying PROSS and FuncLib to proteins that can only be assayed at low throughput, thus testing whether evolution-guided atomistic design is reliable enough to address “real-world” engineering challenges that are difficult or impossible for iterative methods. In these studies, we demonstrated that PROSS and FuncLib could dramatically improve protein expression levels, stability, and activity by testing a handful of designs (in the case of PROSS) [68, 77–79] or a few dozen (in the case of FuncLib) [50, 69, 80]. Due to their reliability, we enabled both methods as web servers. Opening these methods to general use had a profound impact on our understanding of the generality and reliability of these methods. In the following, we briefly describe lessons from studies mostly by other labs to address long-standing protein engineering problems using these methods.
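The idea that an alignment of homologs indicates which mutations are likely to be tolerated can be sketched in a few lines. This is a deliberately simplified stand-in, assuming a plain per-position frequency cutoff. The published methods combine alignment-derived scores with atomistic energy calculations, which are not reproduced here. The function name `allowed_identities`, the cutoff value, and the toy sequences are all hypothetical.

```python
from collections import Counter


def allowed_identities(alignment, wild_type, min_frequency=0.3):
    """For each position, keep the wild-type residue plus every residue that
    appears in at least min_frequency of the aligned homologs (gaps ignored).

    A simplified frequency filter used for illustration only.
    """
    if any(len(seq) != len(wild_type) for seq in alignment):
        raise ValueError("all aligned sequences must match the wild-type length")
    allowed = []
    for pos, wt_residue in enumerate(wild_type):
        column = [seq[pos] for seq in alignment if seq[pos] != "-"]
        counts = Counter(column)
        keep = {wt_residue}  # the wild-type identity is always permitted
        for residue, count in counts.items():
            if count / len(column) >= min_frequency:
                keep.add(residue)
        allowed.append(sorted(keep))
    return allowed


# Toy alignment with made-up sequences (not real homologs).
wild_type = "MKTS"
homologs = ["MKTS", "MRTS", "MKSA", "MRT-", "LKTS"]
print(allowed_identities(homologs, wild_type))
```

In this toy case, only the second position admits an alternative identity, so a downstream atomistic design step would search a much smaller sequence space than the full 20 amino acids per position.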
5. Applied Protein Design
In the first community benchmark of a design method, 12 labs applied PROSS to 14 diverse proteins. Remarkably, in nine of these, at least one of the designs exhibited increased expression levels, and in 90% of the tested proteins, thermal stability improved in the designs. Furthermore, thermal stability increased more substantially in designs that incorporated a greater number of mutations. This observation suggested that the designed mutations were mostly additive and explained how the designs tolerate even large numbers of mutations (sometimes >50). Indeed, in a recent application, PROSS was successfully used to stabilize the bacterial chondroitinase ABC enzyme. This enzyme comprises more than 1,000 amino acids and is, to our knowledge, the largest enzyme that has been successfully subjected to design calculations (Figure 4(a)). This enzyme is attractive due to its therapeutic potential to regenerate nerves in the aftermath of spinal cord injuries, but its low stability in the body () has limited its applicability. By contrast, one of the PROSS designs exhibited prolonged stability (>4 days) and an increased level of activity.
The FuncLib method computes a set of active-site multipoint mutants for experimental testing. Since substrate-bound and transition-state models are often inaccurate or difficult to compute, FuncLib design calculations can be applied to the enzyme apo state. In this case, the resulting designs do not target a specific substrate. Rather, the designs explore different active-site sequences, each of which is predicted to be stable and to stabilize the core catalytic amino acid residues in their functionally competent constellation. This strategy thus increases the likelihood that the designs would exhibit diverse selectivities and high activity. By changing active-site shape and electrostatics, FuncLib dramatically altered selectivity profiles, generating nerve agent hydrolases that exhibited three orders of magnitude improved breakdown of toxic nerve agents compared to the natural enzyme that served as a starting point. Furthermore, FuncLib was used to improve the regioselectivity of a nicotinamide N-methyltransferase (Figure 4(b)). In this study, nearly 30% of the designs exhibited improved activity (by up to two orders of magnitude) and one of the designs exhibited 99% regioselectivity. This design may be useful for the precise production of N-alkylated pyrazoles, which are important intermediates in producing small-molecule therapeutics.
Enzyme active sites and protein-protein binding sites share a high density of amino acid interactions. We therefore also applied FuncLib to optimize protein-protein interactions, finding that it can improve protein-binding affinity and antibody stability and affinity [83, 84] by optimizing atomic interactions across the interacting surfaces. Furthermore, improving the interactions across the homooligomeric interfaces in a trimeric bacterial enzyme called PodA (Figure 4(c)) led to improved stability and an order of magnitude increase in its production yields. This enzyme is a candidate to serve as a novel antibiotic that targets recalcitrant Pseudomonas aeruginosa biofilms, and the design has enabled finding optimal treatment options.
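The combinatorial step, in which a set of active-site multipoint mutants is computed, can be pictured as enumerating combinations of permitted identities at a few positions. The sketch below is an assumption-laden illustration and not the FuncLib implementation: the function name `enumerate_multipoint_mutants`, the mutation-count bounds, and the toy active site are hypothetical, and the energy-based modeling and ranking that a real design calculation would apply to each mutant is omitted.

```python
from itertools import product


def enumerate_multipoint_mutants(wild_type, allowed, min_mutations=2, max_mutations=3):
    """Yield (sequence, n_mutations) for every combination of allowed
    identities that differs from the wild type at between min_mutations
    and max_mutations positions.

    wild_type lists only the designable active-site positions; allowed
    gives the permitted identities at each of those positions.
    """
    for combo in product(*allowed):
        n_mut = sum(1 for new, old in zip(combo, wild_type) if new != old)
        if min_mutations <= n_mut <= max_mutations:
            yield "".join(combo), n_mut


# Hypothetical four-position active site; the first position is held fixed,
# as a catalytic residue would be.
wild_type = "HDYW"
allowed = [["H"], ["D", "E"], ["Y", "F", "W"], ["W", "F"]]
mutants = list(enumerate_multipoint_mutants(wild_type, allowed))
print(len(mutants), "multipoint mutants")
for sequence, n_mut in mutants:
    print(sequence, n_mut)
```

Restricting the identities at each position beforehand keeps the enumeration small enough that every candidate could, in principle, be modeled and ranked before a few dozen are chosen for low-throughput testing.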
As this brief summary shows, many challenges in basic and applied protein science are difficult to address with laboratory methods that demand high-throughput experiments. By contrast, the high reliability of evolution-guided atomistic design allows one-shot protein optimization through low-throughput experimental screening.
6. What Have We Learned from Evolution-Guided Atomistic Design?
We started exploring ways to incorporate evolutionary data in atomistic design calculations about a decade ago. At the time, we were frustrated with the incomplete control that atomistic design processes exerted over the design outcome [11, 20]: designed proteins exhibited low activity and often misfolded relative to the design conception. We assumed that by subjecting naturally occurring and versatile folds such as TIM barrels and antibodies to atomistic design, we would expose the design principles of natural and functional folds. We were particularly hopeful to identify negative-design principles, that is, those principles that underlie the accurate folding of complex protein domains and rule out the myriads of misfolded (and nonfunctional) alternatives. In this section, we attempt to explain why evolution-guided atomistic design is reliable and what unexpected design principles we have learned from its application. We hope that these principles will be useful in protein design challenges that have not yet been addressed, not least, in the de novo design of large and functional proteins.
The failure of atomistic design calculations to reliably optimize stability and activity in large proteins was seen as a liability for many years [11, 86]. Speculations on the sources of error suggested that energy calculations are inherently inaccurate due to the approximate nature of the energy potentials related to solvation and electrostatics. Furthermore, the inability to provide a general framework to explain the mutational effects observed in protein engineering and directed-evolution experiments implied that perhaps protein optimization cannot be rationalized at all and would continue to rely on iterative experimental exploration. By contrast with these views, however, evolution-guided protein stability design methods are able to improve the thermal stability and expressibility of probably more than half of the proteins subjected to them. Some successfully designed proteins comprised hundreds and even more than 1,000 amino acids, indicating that the design strategy is not very sensitive to the compounding of error in energy calculations, which was held to be a difficult challenge to overcome.
Evolution-guided atomistic design owes its accuracy to eliminating mutations that are likely to destabilize the protein, induce misfolding, or distort the active-site constellation of residues. Particularly, critical sequence and structure features that are not ideal, such as loops, buried polar interaction networks, and bent secondary structure elements, are maintained in evolution-guided atomistic design owing to the use of natural backbones and sequence constraints [44, 47–49]. Therefore, although atomistic design calculations, on their own, exhibit limited accuracy when applied to large proteins [49, 91], together with evolutionary constraints, they nonetheless exhibit high accuracy. The nonideal sequence and structural features are likely to serve a negative-design purpose as they drastically limit the number of nonnative low-energy conformations (Figure 2). Implementing such nonideal features may also provide an important key to increasing the size and fold complexity of de novo designed proteins (Figure 5) [88–90].
Dynamics is a critical determinant in many protein functions. The fact that protein design calculations are limited to considering one state (or a handful of states in some cases) severely restricts their ability to address dynamics. Furthermore, in some cases, dynamics and stability may trade off, since stability design calculations introduce new stabilizing contacts to only one of the protein states. Several of the proteins that were successfully designed using evolution-guided atomistic design processes nevertheless exhibit functionally important dynamics. Remarkably, the human estrogen receptor, which undergoes critical conformational changes in response to ligand binding, was subjected to PROSS stability design calculations, yielding a design with 24 mutations that improved its stability and yet maintained a very similar activation profile to the human protein. Possibly, the dynamics that are critical to protein activity are maintained in these designs due to the sequence constraints derived from natural homologs. It is too early to say whether these intriguing results can be generalized, and we are actively studying this question in other dynamic proteins.
Finally, the most important lesson has been to build on the insights gained over the past four decades of protein engineering and simulation. The critical importance of negative-design principles to counter misfolding and aggregation [8, 23, 94, 95], the reliability of sequence-based “consensus” design and ancestral sequence reconstruction, the modularity of so many of the most versatile protein folds, and the fact that most mutations are neutral or deleterious [18, 71] have shaped our design strategy. We were also fortunate to have an intense and fruitful dialogue with one of the leaders of modern enzyme evolution and engineering, Dan Tawfik, who tragically died last year. He made profound contributions to clarifying these principles [98–101] and insisted that design methods should be tested on real-world protein engineering challenges, collaborating with us to design large proteins that were intransigent even to the most reliable computational and experimental optimization strategies [68, 69, 80].
The most dramatic development in computational structural biology of the recent decade is the emergence of deep learning-based ab initio structure predictors such as AlphaFold2 and RoseTTAFold that generate atomically accurate model structures directly from sequence. Using these methods, essentially any protein can be accurately modeled without requiring large computational resources. This is an exciting development for structure-based protein optimization methods since they can now be used to generate functionally expressed designs even in proteins that are so unstable that they had not previously been characterized experimentally. This combined modeling and design strategy, therefore, goes beyond mere optimization of known activities to discover new activities encoded in natural proteins. We believe that this combined strategy will contribute significantly to research and utilization of proteins that are critical to human health, industry, and the environment but have not yielded to experimental characterization.
Looking beyond the optimization of natural proteins, a long-standing goal of protein design methodology is to design new activities completely from scratch. It is still unclear, however, how to use evolutionary data in guiding the design of activities that are not encoded in nature. Recent results using deep learning-based predictors suggest an intriguing possibility that they may be able to assess the foldability of protein designs [106–108]. These methods may therefore replace evolutionary data in ensuring that designs accurately fold as conceived. Thus, the next phase of innovation in protein design methodology is likely to rely in part on statistical learning methods. These may open the way to one of the most long-standing goals of protein engineering: the completely computational design of new or improved molecular activities without recourse to experimental data.
OK and SJF are named inventors on patents regarding designs and methods mentioned in the manuscript.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
Acknowledgments
We thank Shiran Barber-Zucker for critical reading and Rosalie Lipsh-Sokolik for preparing Figure 3(a). Research was supported by a European Research Council Consolidator Award (815379), the Israel Science Foundation (1844), the Volkswagen Foundation (94747), the Dr. Barry Sherman Institute for Medicinal Chemistry, and a charitable donation in memory of Sam Switzer.
References
[1] P. J. Krohl, S. D. Ludwig, and J. B. Spangler, “Emerging technologies in protein interface engineering for biomedical applications,” Current Opinion in Biotechnology, vol. 60, pp. 82–88, 2019.
[2] E. L. Bell, W. Finnigan, S. P. France et al., “Biocatalysis,” Nature Reviews Methods Primers, vol. 1, no. 1, pp. 1–21, 2021.
[3] S. Wu, R. Snajdrova, J. C. Moore, K. Baldenius, and U. T. Bornscheuer, “Biocatalysis: enzymatic synthesis for industrial applications,” Angewandte Chemie (International Ed. in English), vol. 60, no. 1, pp. 88–119, 2021.
[4] C. Mehlin, E. Boni, F. S. Buckner et al., “Heterologous expression of proteins from Plasmodium falciparum: results from 1000 genes,” Molecular and Biochemical Parasitology, vol. 148, no. 2, pp. 144–160, 2006.
[5] D. Christendat, A. Yee, A. Dharamsi et al., “Structural proteomics: prospects for high throughput sample preparation,” Progress in Biophysics and Molecular Biology, vol. 73, no. 5, pp. 339–345, 2000.
[6] A. Goldenzweig and S. J. Fleishman, “Principles of protein stability and their application in computational design,” Annual Review of Biochemistry, vol. 87, pp. 105–129, 2018.
[7] B. W. Matthews, “Studies on protein stability with T4 lysozyme,” Advances in Protein Chemistry, vol. 46, pp. 249–278, 1995.
[8] J. S. Richardson, D. C. Richardson, N. B. Tweedy et al., “Looking at proteins: representations, folding, packing, and design. Biophysical Society National Lecture, 1992,” Biophysical Journal, vol. 63, no. 5, pp. 1185–1209, 1992.
[9] A. R. Fersht and L. Serrano, “Principles of protein stability derived from protein engineering experiments,” Current Opinion in Structural Biology, vol. 3, no. 1, pp. 75–83, 1993.
[10] C. B. Anfinsen, “Principles that govern the folding of protein chains,” Science, vol. 181, no. 4096, pp. 223–230, 1973.
[11] S. J. Fleishman and D. Baker, “Role of the biomolecular energy gap in protein design, structure, and evolution,” Cell, vol. 149, no. 2, pp. 262–273, 2012.
[12] B. I. Dahiyat and S. L. Mayo, “De novo protein design: fully automated sequence selection,” Science, vol. 278, no. 5335, pp. 82–87, 1997.
[13] B. Kuhlman, G. Dantas, G. C. Ireton, G. Varani, B. L. Stoddard, and D. Baker, “Design of a novel globular protein fold with atomic-level accuracy,” Science, vol. 302, no. 5649, pp. 1364–1368, 2003.
[14] O. Khersonsky, G. Kiss, D. Röthlisberger et al., “Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59,” Proceedings of the National Academy of Sciences of the United States of America, vol. 109, no. 26, pp. 10358–10363, 2012.
[15] F. Richter, R. Blomberg, S. D. Khare et al., “Computational design of catalytic dyads and oxyanion holes for ester hydrolysis,” Journal of the American Chemical Society, vol. 134, no. 39, pp. 16197–16206, 2012.
[16] S. J. Fleishman, T. A. Whitehead, D. C. Ekiert et al., “Computational design of proteins targeting the conserved stem region of influenza hemagglutinin,” Science, vol. 332, no. 6031, pp. 816–821, 2011.
[17] J. Karanicolas, J. E. Corn, I. Chen et al., “A de novo protein binding pair by computational design and directed evolution,” Molecular Cell, vol. 42, no. 2, pp. 250–260, 2011.
[18] T. A. Whitehead, A. Chevalier, Y. Song et al., “Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing,” Nature Biotechnology, vol. 30, no. 6, pp. 543–548, 2012.
[19] T. A. Whitehead, D. Baker, and S. J. Fleishman, “Computational design of novel protein binders and experimental affinity maturation,” Methods in Enzymology, vol. 523, pp. 1–19, 2013.
[20] S. D. Khare and S. J. Fleishman, “Emerging themes in the computational design of novel enzymes and protein-protein interfaces,” FEBS Letters, vol. 587, no. 8, pp. 1147–1154, 2013.
[21] P. B. Stranges and B. Kuhlman, “A comparison of successful and failed protein interface designs highlights the challenges of designing buried hydrogen bonds,” Protein Science, vol. 22, no. 1, pp. 74–82, 2013.
[22] I. V. Korendovych and W. F. DeGrado, “Catalytic efficiency of designed catalytic proteins,” Current Opinion in Structural Biology, vol. 27, pp. 113–121, 2014.
[23] M. J. Rooman and S. J. Wodak, “Extracting information on folding from the amino acid sequence: consensus regions with preferred conformation in homologous proteins,” Biochemistry, vol. 31, no. 42, pp. 10239–10249, 1992.
[24] K. A. Dill, “Polymer principles and protein folding,” Protein Science, vol. 8, no. 6, pp. 1166–1180, 1999.
[25] N. Koga, R. Tatsumi-Koga, G. Liu et al., “Principles for designing ideal protein structures,” Nature, vol. 491, no. 7423, pp. 222–227, 2012.
[26] P. Liò and P. Zuliani, Automated Reasoning for Systems Biology and Medicine, Springer, 2019.
[27] L. Cao, I. Goreshnik, B. Coventry et al., “De novo design of picomolar SARS-CoV-2 miniprotein inhibitors,” Science, vol. 370, no. 6515, pp. 426–431, 2020.
[28] E. G. Baker, G. J. Bartlett, K. L. Porter Goff, and D. N. Woolfson, “Miniprotein design: past, present, and prospects,” Accounts of Chemical Research, vol. 50, no. 9, pp. 2085–2092, 2017.
- G. J. Rocklin, T. M. Chidyausiku, I. Goreshnik et al., “Global analysis of protein folding using massively parallel design, synthesis, and testing,” Science, vol. 357, no. 6347, pp. 168–175, 2017.
- X. Pan and T. Kortemme, “Recent advances in de novo protein design: principles, methods, and applications,” The Journal of Biological Chemistry, vol. 296, article 100558, 2021.
- P.-S. Huang, S. E. Boyken, and D. Baker, “The coming of age of de novo protein design,” Nature, vol. 537, no. 7620, pp. 320–327, 2016.
- B. Kuhlman and P. Bradley, “Advances in protein structure prediction and design,” Nature Reviews. Molecular Cell Biology, vol. 20, no. 11, pp. 681–697, 2019.
- H. T. Kratochvil, R. W. Newberry, B. Mensa, M. Mravic, and W. F. DeGrado, “Spiers Memorial Lecture: analysis and de novo design of membrane-interactive peptides,” Faraday Discussions, vol. 232, pp. 9–48, 2021.
- H. J. Wijma, M. J. L. J. Fürst, and D. B. Janssen, “A computational library design protocol for rapid improvement of protein stability: FRESCO,” Methods in Molecular Biology, vol. 1685, pp. 69–85, 2018.
- H. J. Wijma, R. J. Floor, S. Bjelic, S. J. Marrink, D. Baker, and D. B. Janssen, “Enantioselective enzymes by computational design and in silico screening,” Angewandte Chemie (International Ed. in English), vol. 54, no. 12, pp. 3726–3730, 2015.
- M. Musil, J. Stourac, J. Bendl et al., “FireProt: web server for automated design of thermostable proteins,” Nucleic Acids Research, vol. 45, no. W1, pp. W393–W399, 2017.
- L. Sumbalova, J. Stourac, T. Martinek, D. Bednar, and J. Damborsky, “HotSpot Wizard 3.0: web server for automated design of mutations and smart libraries based on sequence input information,” Nucleic Acids Research, vol. 46, no. W1, pp. W356–W362, 2018.
- C. E. Sequeiros-Borja, B. Surpeta, and J. Brezovsky, “Recent advances in user-friendly computational tools to engineer protein function,” Briefings in Bioinformatics, vol. 22, no. 3, 2021.
- S. M. Marques, J. Planas-Iglesias, and J. Damborsky, “Web-based tools for computational enzyme design,” Current Opinion in Structural Biology, vol. 69, pp. 19–34, 2021.
- L. Brocchieri and S. Karlin, “Protein length in eukaryotic and prokaryotic proteomes,” Nucleic Acids Research, vol. 33, no. 10, pp. 3390–3400, 2005.
- L. N. Kinch and N. V. Grishin, “Evolution of protein structures and functions,” Current Opinion in Structural Biology, vol. 12, no. 3, pp. 400–408, 2002.
- R. V. Eck and M. O. Dayhoff, “Evolution of the structure of ferredoxin based on living relics of primitive amino acid sequences,” Science, vol. 152, no. 3720, pp. 363–366, 1966.
- P. A. Srere, “Why are enzymes so big?” Trends in Biochemical Sciences, vol. 9, no. 9, pp. 387–390, 1984.
- O. Khersonsky and S. J. Fleishman, “Why reinvent the wheel? Building new proteins based on ready-made parts,” Protein Science, vol. 25, no. 7, pp. 1179–1187, 2016.
- A. Andreeva, D. Howorth, C. Chothia, E. Kulesha, and A. G. Murzin, “Investigating protein structure and evolution with SCOP2,” Current Protocols in Bioinformatics, vol. 49, no. 1, pp. 1.26.1–1.26.21, 2015.
- P.-S. Huang, K. Feldmeier, F. Parmeggiani, D. A. F. Velasco, B. Höcker, and D. Baker, “De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy,” Nature Chemical Biology, vol. 12, no. 1, pp. 29–34, 2016.
- G. Lapidoth, O. Khersonsky, R. Lipsh et al., “Highly active enzymes by automated combinatorial backbone assembly and sequence design,” Nature Communications, vol. 9, no. 1, p. 2780, 2018.
- G. D. Lapidoth, D. Baran, G. M. Pszolla et al., “AbDesign: an algorithm for combinatorial backbone design guided by natural conformations and sequences,” Proteins, vol. 83, no. 8, pp. 1385–1406, 2015.
- D. Baran, M. G. Pszolla, G. D. Lapidoth et al., “Principles for computational design of binding antibodies,” Proceedings of the National Academy of Sciences of the United States of America, vol. 114, no. 41, pp. 10900–10905, 2017.
- R. Netzer, D. Listov, R. Lipsh et al., “Ultrahigh specificity in a network of computationally designed protein-interaction pairs,” Nature Communications, vol. 9, no. 1, p. 5286, 2018.
- R. Lipsh-Sokolik, D. Listov, and S. J. Fleishman, “The AbDesign computational pipeline for modular backbone assembly and design of binders and enzymes,” Protein Science, vol. 30, no. 1, pp. 151–159, 2021.
- R. Sterner and B. Höcker, “Catalytic versatility, stability, and evolution of the (βα)8-barrel enzyme fold,” Chemical Reviews, vol. 105, no. 11, pp. 4038–4055, 2005.
- A. J. Michael, “Evolution of biosynthetic diversity,” The Biochemical Journal, vol. 474, no. 14, pp. 2277–2299, 2017.
- B. Höcker, J. Claren, and R. Sterner, “Mimicking enzyme evolution by generating new (βα)8-barrels from (βα)4-half-barrels,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 47, pp. 16448–16453, 2004.
- B. Höcker, S. Beismann-Driemeyer, S. Hettwer, A. Lustig, and R. Sterner, “Dissection of a (βα)8-barrel enzyme into two folded halves,” Nature Structural Biology, vol. 8, no. 1, pp. 32–36, 2001.
- I. Yadid and D. S. Tawfik, “Functional β-propeller lectins by tandem duplications of repetitive units,” Protein Engineering, Design & Selection, vol. 24, no. 1-2, pp. 185–195, 2011.
- I. Yadid and D. S. Tawfik, “Reconstruction of functional β-propeller lectins via homo-oligomeric assembly of shorter fragments,” Journal of Molecular Biology, vol. 365, no. 1, pp. 10–17, 2007.
- C. A. Voigt, C. Martinez, Z.-G. Wang, S. L. Mayo, and F. H. Arnold, “Protein building blocks preserved by recombination,” Nature Structural Biology, vol. 9, no. 7, pp. 553–558, 2002.
- S. Shanmugaratnam, S. Eisenbeis, and B. Höcker, “A highly stable protein chimera built from fragments of different folds,” Protein Engineering, Design & Selection, vol. 25, no. 11, pp. 699–703, 2012.
- T. M. Jacobs, B. Williams, T. Williams et al., “Design of structurally distinct proteins using strategies inspired by evolution,” Science, vol. 352, no. 6286, pp. 687–690, 2016.
- J. A. Fallas, G. Ueda, W. Sheffler et al., “Computational design of self-assembling cyclic protein homo-oligomers,” Nature Chemistry, vol. 9, no. 4, pp. 353–360, 2017.
- T. J. Brunette, F. Parmeggiani, P.-S. Huang et al., “Exploring the repeat protein universe through computational protein design,” Nature, vol. 528, no. 7583, pp. 580–584, 2015.
- C. Reichen, S. Hansen, C. Forzani et al., “Computationally designed armadillo repeat proteins for modular peptide recognition,” Journal of Molecular Biology, vol. 428, no. 22, pp. 4467–4489, 2016.
- L. Doyle, J. Hallinan, J. Bolduc et al., “Rational design of α-helical tandem repeat proteins with closed architectures,” Nature, vol. 528, no. 7583, pp. 585–588, 2015.
- B. Höcker, “Engineering chimaeric proteins from fold fragments: ‘hopeful monsters’ in protein design,” Biochemical Society Transactions, vol. 41, no. 5, pp. 1137–1140, 2013.
- T. T. Wu and E. A. Kabat, “An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity,” The Journal of Experimental Medicine, vol. 132, no. 2, pp. 211–250, 1970.
- G. Winter and C. Milstein, “Man-made antibodies,” Nature, vol. 349, no. 6307, pp. 293–299, 1991.
- A. Goldenzweig, M. Goldsmith, S. E. Hill et al., “Automated structure- and sequence-based design of proteins for high bacterial expression and stability,” Molecular Cell, vol. 70, no. 2, p. 380, 2018.
- O. Khersonsky, R. Lipsh, Z. Avizemer et al., “Automated design of efficient and functionally diverse enzyme repertoires,” Molecular Cell, vol. 72, no. 1, pp. 178–186.e5, 2018.
- J. Weinstein, O. Khersonsky, and S. J. Fleishman, “Practically useful protein-design methods combining phylogenetic and atomistic calculations,” Current Opinion in Structural Biology, vol. 63, pp. 58–64, 2020.
- M. Kimura and T. Ohta, “Protein polymorphism as a phase of molecular evolution,” Nature, vol. 229, no. 5285, pp. 467–469, 1971.
- N.-S. Hong, D. Petrović, R. Lee et al., “The evolution of multiple active site configurations in a designed enzyme,” Nature Communications, vol. 9, no. 1, p. 3900, 2018.
- B. Steipe, B. Schiller, A. Plückthun, and S. Steinbacher, “Sequence statistics reliably predict stabilizing mutations in a protein domain,” Journal of Molecular Biology, vol. 240, no. 3, pp. 188–192, 1994.
- M. A. Spence, J. A. Kaczmarski, J. W. Saunders, and C. J. Jackson, “Ancestral sequence reconstruction for protein engineers,” Current Opinion in Structural Biology, vol. 69, pp. 131–141, 2021.
- R. T. Khan, M. Musil, J. Stourac, J. Damborsky, and D. Bednar, “Fully automated ancestral sequence reconstruction using FireProtASR,” Current Protocols, vol. 1, no. 2, article e30, 2021.
- F. H. Arnold, “Innovation by evolution: bringing new chemistry to life (Nobel Lecture),” Angewandte Chemie (International Ed. in English), vol. 58, no. 41, pp. 14420–14426, 2019.
- I. Campeotto, A. Goldenzweig, J. Davey et al., “One-step design of a stable variant of the malaria invasion protein RH5 for use as a vaccine immunogen,” Proceedings of the National Academy of Sciences of the United States of America, vol. 114, no. 5, pp. 998–1002, 2017.
- Y. Peleg, R. Vincentelli, B. M. Collins et al., “Community-wide experimental evaluation of the PROSS stability-design method,” Journal of Molecular Biology, vol. 433, no. 13, article 166964, 2021.
- H. Allouche-Arnon, O. Khersonsky, N. D. Tirukoti et al., “Computationally designed dual-color MRI reporters for noninvasive imaging of transgene expression,” Nature Biotechnology, 2022.
- D. L. Trudeau, C. Edlich-Muth, J. Zarzycki et al., “Design and in vitro realization of carbon-conserving photorespiration,” Proceedings of the National Academy of Sciences of the United States of America, vol. 115, no. 49, pp. E11455–E11464, 2018.
- M. H. Hettiaratchi, M. J. O’Meara, T. R. O’Meara, A. J. Pickering, N. Letko-Khait, and M. S. Shoichet, “Reengineering biocatalysts: computational redesign of chondroitinase ABC improves efficacy and stability,” Science Advances, vol. 6, no. 34, article eabc6378, 2020.
- L. L. Bengel, B. Aberle, A.-N. Egler-Kemmerer, S. Kienzle, B. Hauer, and S. C. Hammer, “Engineered enzymes enable selective N-alkylation of pyrazoles with simple haloalkanes,” Angewandte Chemie (International Ed. in English), vol. 60, no. 10, pp. 5554–5560, 2021.
- S. Warszawski, A. Borenstein Katz, R. Lipsh et al., “Optimizing antibody affinity and stability by the automated design of the variable light-heavy chain interfaces,” PLoS Computational Biology, vol. 15, no. 8, article e1007207, 2019.
- A. Borenstein-Katz, S. Warszawski, R. Amon et al., “Biomolecular recognition of the glycan neoantigen CA19-9 by distinct antibodies,” Journal of Molecular Biology, vol. 433, no. 15, article 167099, 2021.
- C. M. VanDrisse, R. Lipsh-Sokolik, O. Khersonsky, S. J. Fleishman, and D. K. Newman, “Computationally designed pyocyanin demethylase acts synergistically with tobramycin to kill recalcitrant Pseudomonas aeruginosa biofilms,” Proceedings of the National Academy of Sciences of the United States of America, vol. 118, no. 12, 2021.
- D. Baker, “What has de novo protein design taught us about protein folding and biophysics?” Protein Science, vol. 28, no. 4, pp. 678–683, 2019.
- H. Zhao and F. H. Arnold, “Directed evolution converts subtilisin E into a functional equivalent of thermitase,” Protein Engineering, vol. 12, no. 1, pp. 47–53, 1999.
- J. G. Wiese, S. Shanmugaratnam, and B. Höcker, “Extension of a de novo TIM barrel with a rationally designed secondary structure element,” Protein Science, vol. 30, no. 5, pp. 982–989, 2021.
- N. Koga, R. Koga, G. Liu, J. Castellanos, G. T. Montelione, and D. Baker, “Role of backbone strain in de novo design of complex α/β protein structures,” Nature Communications, vol. 12, no. 1, p. 3921, 2021.
- S. Kordes, S. Romero-Romero, L. Lutz, and B. Höcker, “A newly introduced salt bridge cluster improves structural and biophysical properties of de novo TIM barrels,” Protein Science, vol. 31, no. 2, pp. 513–527, 2022.
- K. E. Johansson, N. T. Johansen, S. Christensen et al., “Computational redesign of thioredoxin is hypersensitive toward minor conformational changes in the backbone template,” Journal of Molecular Biology, vol. 428, no. 21, pp. 4361–4377, 2016.
- V. Vongsouthi, J. H. Whitfield, P. Unichenko et al., “A rationally and computationally designed fluorescent biosensor for d-serine,” ACS Sensors, vol. 6, no. 11, pp. 4193–4205, 2021.
- M. Kriegel, H. J. Wiederanders, S. Alkhashrom, J. Eichler, and Y. A. Muller, “A PROSS-designed extensively mutated estrogen receptor α variant displays enhanced thermal stability while retaining native allosteric regulation and structure,” Scientific Reports, vol. 11, no. 1, article 10509, 2021.
- C. F. Wright, S. A. Teichmann, J. Clarke, and C. M. Dobson, “The importance of sequence diversity in the aggregation and evolution of proteins,” Nature, vol. 438, no. 7069, pp. 878–881, 2005.
- L. Pauling, H. A. Itano, S. J. Singer, and I. C. Wells, “Sickle cell anemia, a molecular disease,” Science, vol. 110, no. 2865, pp. 543–548, 1949.
- M. Wang and G. Caetano-Anollés, “The evolutionary mechanics of domain organization in proteomes and the rise of modularity in the protein world,” Structure, vol. 17, no. 1, pp. 66–78, 2009.
- A. Aharoni and S. J. Fleishman, “Obituary: Dan S Tawfik (1955–2021),” The FEBS Journal, vol. 288, no. 13, pp. 3880–3883, 2021, https://febs.onlinelibrary.wiley.com/doi/abs/10.1111/febs.16019.
- D. Davidi, L. M. Longo, J. Jabłońska, R. Milo, and D. S. Tawfik, “A bird’s-eye view of enzyme evolution: chemical, physicochemical, and physiological considerations,” Chemical Reviews, vol. 118, no. 18, pp. 8786–8797, 2018.
- S. G. Peisajovich, L. Rockah, and D. S. Tawfik, “Evolution of new protein topologies through multistep gene rearrangements,” Nature Genetics, vol. 38, no. 2, pp. 168–174, 2006.
- D. L. Trudeau and D. S. Tawfik, “Protein engineers turned evolutionists: the quest for the optimal starting point,” Current Opinion in Biotechnology, vol. 60, pp. 46–52, 2019.
- D. L. Trudeau, M. Kaltenbach, and D. S. Tawfik, “On the potential origins of the high stability of reconstructed ancestral proteins,” Molecular Biology and Evolution, vol. 33, no. 10, pp. 2633–2641, 2016.
- J. Jumper, R. Evans, A. Pritzel et al., “Highly accurate protein structure prediction with AlphaFold,” Nature, vol. 596, no. 7873, pp. 583–589, 2021.
- M. Baek, F. DiMaio, I. Anishchenko et al., “Accurate prediction of protein structures and interactions using a three-track neural network,” Science, vol. 373, no. 6557, pp. 871–876, 2021.
- S. Barber-Zucker, V. Mindel, E. Garcia-Ruiz, J. J. Weinstein, M. Alcalde, and S. J. Fleishman, “Stable and functionally diverse versatile peroxidases designed directly from sequences,” Journal of the American Chemical Society, vol. 144, no. 8, pp. 3564–3571, 2022.
- V. Frappier and A. E. Keating, “Data-driven computational protein design,” Current Opinion in Structural Biology, vol. 69, pp. 63–69, 2021.
- C. Norn, B. I. M. Wicky, D. Juergens et al., “Protein sequence design by conformational landscape optimization,” Proceedings of the National Academy of Sciences of the United States of America, vol. 118, no. 11, article e2017228118, 2021.
- I. Anishchenko, S. J. Pellock, T. M. Chidyausiku et al., “De novo protein design by deep network hallucination,” Nature, vol. 600, no. 7889, pp. 547–552, 2021.
- S. J. Fleishman, D. Listov, R. Lipsh-Sokolik, C. Yang, and B. E. Correia, “Assessing and enhancing foldability in designed proteins,” bioRxiv, 2021, https://europepmc.org/article/ppr/ppr418107.
Copyright © 2022 Olga Khersonsky and Sarel J. Fleishman. Exclusive Licensee Nanjing Agricultural University. Distributed under a Creative Commons Attribution License (CC BY 4.0).