Viruses rapidly co-evolve with their hosts. The 9 million SARS-CoV-2 genomes sequenced by March 2022 provide a detailed account of viral evolution, showing that all amino acids have been mutated many times.
However, only a few mutations became prominent in the viral population. Here, we investigated the emergence of the same mutations in unrelated parallel lineages and the extent of such convergent evolution at the molecular level in the spike (S) protein.
We found that during the first phase of the pandemic (until mid-2021, before mass vaccination), 31 mutations evolved independently at least three times within separate lineages. These included all the key mutations in SARS-CoV-2 variants of concern (VOCs) at that time, indicating their fundamental adaptive advantage.
The Omicron variant added many more mutations not frequently seen before, which can be attributed to the synergistic nature of these mutations, a property that is more difficult to evolve. The great majority (24/31) of S-protein mutations under convergent evolution cluster tightly in three functional domains: the N-terminal domain, the receptor-binding domain, and the furin cleavage site.
Furthermore, among the S-protein receptor-binding motif mutations, ACE2 affinity-improving substitutions are favoured. Next, we determined the mutation space in the S protein that has been covered by SARS-CoV-2.
We found that all amino acid substitutions reachable by single nucleotide changes had been probed multiple times by early 2021. Substitutions requiring two nucleotide changes have recently (late 2021) gained momentum, and their numbers are increasing rapidly.
These substitutions provide a large mutational landscape for the future evolution of SARS-CoV-2, on which research should now focus.
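
The distinction between substitutions reachable by one nucleotide change and those requiring two can be illustrated with a minimal sketch. It assumes the standard genetic code and is not the analysis pipeline used in this study; the names `CODON_TABLE`, `single_nt_neighbors`, and `min_nt_changes` are hypothetical helpers introduced only for illustration, and the codon `AAT` is used simply as an example asparagine codon.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_nt_neighbors(codon):
    """Amino acids reachable from `codon` by exactly one nucleotide substitution.

    Synonymous changes and stop codons are excluded.
    """
    original = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa != original and aa != "*":
                reachable.add(aa)
    return reachable


def min_nt_changes(codon, target_aa):
    """Smallest number of nucleotide changes that turns `codon` into any codon for `target_aa`."""
    return min(
        sum(a != b for a, b in zip(codon, candidate))
        for candidate, aa in CODON_TABLE.items()
        if aa == target_aa
    )


if __name__ == "__main__":
    print(sorted(single_nt_neighbors("AAT")))  # ['D', 'H', 'I', 'K', 'S', 'T', 'Y']
    print(min_nt_changes("AAT", "Y"))  # 1: a single change (AAT to TAT) suffices
    print(min_nt_changes("AAT", "L"))  # 2: e.g. AAT to CTT needs two changes
```

Under this sketch, an asparagine encoded by `AAT` can reach seven other amino acids through one nucleotide change, whereas the remaining substitutions need two or three changes. This is why the latter would be expected to appear later and less often in the sequenced population.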