Many sequences identified through transcriptomic and proteomic technologies do not align with annotated genes1. Across various species, hundreds to thousands of noncanonical open reading frames (ORFs) have been detected, and many of these are associated with detectable levels of protein (for example, ref. 2).
These ORFs are typically much shorter than well-characterized and evolutionarily conserved genes and are present in substantially lower quantities3. Most such sequences are rapidly purged by selection, but numerous instances of surviving proteins originating de novo from previously noncoding DNA have been documented4.
Although the mechanisms that govern their emergence and adaptation remain poorly understood, these sequences represent ongoing evolutionary 'experiments'. This raises the question of how frequently random sequences can interact with the cellular environment in non-deleterious ways, and how often they can assume novel functional roles.
Writing in this issue of Nature Ecology & Evolution, Frumkin and Laub5 tackle these questions by searching sequence space for random sequences that rescue Escherichia coli from the deleterious effects of the RNase toxin MazF.