Monday, May 18, 2026

Orphan Genes: What Are They?

 








I don’t know whether any of you out there know what orphan genes are. Until recently I had never heard of them, yet every plant and animal has them. Orphan genes are genes unique to a particular species or lineage, with no detectable counterpart in any other. Creationists claim they pose a problem for evolutionists because they appear suddenly and without any trace of evolutionary ancestry. [1]
 
But that has not stopped evolutionists from attempting to explain how these orphan genes may have evolved. Two explanations are commonly proposed:
 
 
 
1.  Divergence Beyond Recognition (DBR): Proponents of this theory argue that orphan genes arose from already existing genes, often through duplication, after which mutations accumulated until the new gene no longer resembled its ancestors closely enough to be recognized.
 
2.  De novo emergence from ancestral non-genic sequences: Proponents of this theory argue that orphan genes arose from stretches of DNA that previously did not encode any gene.
 
 
 
As I looked into how evolutionists might counter creationist claims that the existence of orphan genes is evidence against evolution, I came across one paper in eLife that mainly examined the DBR proposal and another in Nature Communications written by proponents of de novo emergence.
 
In the eLife paper, which mainly tested the DBR theory, the researchers went into great detail about how they distinguished genes shared by different species (homologous or orthologous genes) from genes not shared (non-homologous or non-orthologous), and how they tried to estimate how many genes emerged through this process:
 
 
 
 
we first select a set of target genomes to compare to our focal genome.  Using precomputed pairs of homologous genes (those belonging to the same OrthoDB group) we identify regions of conserved micro-synteny. Our operational definition of conserved micro-synteny consists of cases where a gene in the focal genome is found within a conserved chromosomal block of at least three genes: the immediate downstream and upstream neighbours of the focal gene must have homologues in the target genome that are themselves separated by at most one or two genes and, if the genes immediately next to these neighbours (second neighbours of the focal gene) have homologues in the target genome, these must also be separated from the homologues of the immediate neighbours by at most one gene. Since the choice of synteny criterion can have an impact on downstream analyses we have also used one more relaxed and one more stringent definition. All focal genes for which at least one region of conserved micro-synteny, in any target genome, is identified, are retained for further analysis. This step establishes a list of focal genes with at least one presumed homologue in one or more target genomes (i.e., the gene located in the conserved location in the micro-synteny block).
 
We then examine whether the focal gene has any sequence similarity in the target species. We search for sequence similarity in two ways: comparison with annotated genes (proteome), and comparison with the genomic DNA (genome). First, we search within BLASTP matches that we have precomputed ourselves (these are different from the OrthoDB data) using the complete proteome of the focal species as query against the complete proteome of the target species. Within this BLASTP output we look for matches between the query gene and the candidate gene. If none is found then we use TBLASTN to search the genomic region around the candidate gene b’ for similarity to the query gene b. If no similarity is found, the search is extended to the rest of the target proteome and genome. If there is no sequence similarity after these successive searches, then we infer that the sequence has diverged beyond recognition. After having recorded whether similarity can be detected for all eligible query genes, we finally retrieve the focal-target pairs and produce the found and not found proportions for each pair of genomes…
 
Homology detection is highly sensitive to the technical choices made during sequence similarity searches…
 
We next sought to estimate how much the process of divergence beyond recognition contributes to the genome-wide pool of genes without detectable similarity. To do so, we need to assume that the proportion of genes that have diverged beyond recognition in micro-synteny blocks can be used as a proxy for the genome-wide rate of origin-by-divergence for genes without detectable similarity, irrespective of the presence of micro-synteny conservation. This in turn depends on the distribution of evolutionary rates inside and outside micro-synteny blocks…[2]
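For readers curious how a rule like this might look in practice, here is a minimal Python sketch of the micro-synteny test described in the quotation. It is a simplification, not the authors' code: it assumes each genome is a single list of gene names in chromosomal order, uses a hypothetical `homolog` dictionary in place of OrthoDB group membership, ignores gene orientation and multiple chromosomes, and reads "separated by at most one or two genes" as a hypothetical `max_between` parameter defaulting to 2.

```python
from typing import Dict, List, Optional


def pos(genome: List[str], gene: Optional[str]) -> Optional[int]:
    """Index of a gene in an ordered gene list, or None if it has no homologue there."""
    if gene is None or gene not in genome:
        return None
    return genome.index(gene)


def in_microsynteny(focal: List[str], i: int, target: List[str],
                    homolog: Dict[str, str], max_between: int = 2) -> bool:
    """Simplified check of the quoted micro-synteny criterion for focal[i].

    focal, target: gene names in chromosomal order (hypothetical, one chromosome each).
    homolog: focal gene -> target gene, a stand-in for shared OrthoDB groups.
    """
    # The focal gene needs both an upstream and a downstream neighbour.
    if i == 0 or i == len(focal) - 1:
        return False
    left = pos(target, homolog.get(focal[i - 1]))
    right = pos(target, homolog.get(focal[i + 1]))
    if left is None or right is None:
        return False
    # Homologues of the immediate neighbours: at most max_between genes between them.
    if abs(right - left) - 1 > max_between:
        return False
    # Second neighbours: if they have homologues, those must lie at most one gene
    # away from the homologue of the corresponding immediate neighbour.
    for offset, anchor in ((-2, left), (2, right)):
        j = i + offset
        if 0 <= j < len(focal):
            p = pos(target, homolog.get(focal[j]))
            if p is not None and abs(p - anchor) - 1 > 1:
                return False
    return True
```

For example, with `focal = ["a", "b", "c", "d", "e"]` and a target genome `["A", "B", "X", "D", "E"]` where a, b, d and e map to A, B, D and E, gene `"c"` passes the test, and `"X"` would be the candidate homologue whose sequence similarity is then checked with BLASTP and TBLASTN.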
 
 
 
But they admitted that their methods were not necessarily foolproof:
 
 
 
we can see how the ratios of undetectable and false homologies vary as a function of the BLAST E-value threshold used. The proportion of undetectable homologies depended quasi-linearly on the E-value cut-off. By contrast, false homologies depended exponentially on the cut-off, as expected from the E-value definition. Furthermore, the impact of E-value cut-off was more pronounced in comparisons of species separated by longer evolutionary distances, whereas it was almost non-existent for comparisons amongst the most closely related species. Conversely, there seems to be no dependence between percentage of false homologies and evolutionary time across the range of E-values that we have tested.
 
This means that, when comparing relatively closely related species, failing to appropriately control for false homologies would have an overall more severe effect on homology detection than failing to account for false negatives.
 
In the context of phylostratigraphy (estimation of phylogenetic branch of origin of a gene based on its taxonomic distribution, gene age underestimation due to BLAST ‘false negatives’ has been considered a serious issue, although the importance of spurious BLAST hits generating false positives has also been stressed…[3]
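For reference, BLAST E-values rest on the Karlin-Altschul statistics. The expected number of alignments scoring at least S purely by chance is given below, where m and n are the lengths of the query and the database, and K and λ are parameters of the scoring system. Because E is the expected number of chance hits, a looser (larger) E-value cut-off admits more spurious matches, while a stricter one risks missing genuine but distant homologues, which is the trade-off the authors describe.

```latex
E = K \, m \, n \, e^{-\lambda S}
```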
 
 
 
 
In other words, genes that appeared similar might not have been as similar as they seemed (false homologies), and in some cases genes that appeared different from each other might have been more closely related than they seemed (undetectable homologies). They also found that DBR accounts for only a minority of orphan genes:
 
 
 
We found that, in most pairwise species comparisons, the observed proportion of all genes without similarity far exceeds that estimated to have originated by divergence…
 
We also applied the same reasoning to estimate how much divergence beyond recognition contributes to TRGs. To this aim we calculated the fraction of focal genes lacking detectable homologues in a phylogeny-based manner, in the target species and in all species more distantly related to the focal species than the target species.  Again, the observed proportion of TRGs far exceeded that estimated to have originated by divergence…Thus, we conclude that the origin of most genes without similarity cannot be attributed to divergence beyond recognition. This implies a substantial role for other evolutionary mechanisms such as de novo emergence…
 
we have specifically addressed this problem and demonstrated that sequence divergence of ancestral genes explains only a minority of orphans and TRGs.
 
We were very conservative when estimating the proportion of orphans and TRGs that have evolved by complete divergence inside regions of conserved micro-synteny. Indeed, we simultaneously underestimated the number of orphans and TRGs while overestimating the number that originated by divergence. We underestimated the total number of orphans and TRGs by relying on relaxed similarity search parameters. As a result, we can be confident that those genes without detectable similarity really are orphans and TRGs, but in turn we also know that some will have spurious similarity hits giving the illusion that they have homologues when they do not in reality…[4]
 
 
 
This led the researchers to conclude that the origin of orphan genes could not be attributed solely to divergence beyond recognition, and that de novo emergence must also play a substantial role, perhaps an even larger one than DBR:
 
 
 
 
 
We looked for cases of focal genes that resulted from complete lineage-specific divergence along a specific phylogenetic branch. When comparing the CDS lengths of these focal genes to those of their undetectable homologues, we found that focal genes tend to be much shorter. This finding could partially explain the shorter lengths frequently associated with young genes…
 
We were very conservative when estimating the proportion of orphans and TRGs that have evolved by complete divergence inside regions of conserved micro-synteny. Indeed, we simultaneously underestimated the number of orphans and TRGs while overestimating the number that originated by divergence. We underestimated the total number of orphans and TRGs by relying on relaxed similarity search parameters. As a result, we can be confident that those genes without detectable similarity really are orphans and TRGs, but in turn we also know that some will have spurious similarity hits giving the illusion that they have homologues when they do not in reality. Furthermore, the annotation that we used in yeast does not include the vast majority of dubious ORFs, labelled as such because they are not evolutionarily conserved even though most are supported by experimental evidence. [5]
 
 
 
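(ORF stands for open reading frame, a stretch of DNA that runs from a start codon to a stop codon and could potentially be translated into protein.)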
 
 
 
A paper proposing de novo emergence goes into much greater detail about how evolutionists believe novel genes may have come about:
 
 
 
 
it has been proposed that pervasive translation of non-genic transcripts can expose genetic variation, in the form of novel polypeptides, to natural selection, thereby purging toxic sequences and providing adaptive potential to the organism...[6]
 
 
 
To test this proposal, a study was conducted on a number of species of yeast:
 
 
 
Analyses of genome-wide TM propensities led us to hypothesize that novel adaptive TM peptides may spontaneously emerge when thymine-rich non-genic regions become translated: a “TM-first” model of gene birth. The plausibility of this model is supported by a detailed reconstruction of the evolutionary history of one locus where an ORF (YBR196C-A) emerged de novo in a thymine-rich ancestral non-genic region, accumulated substantial changes under positive selection and progressively increased its TM propensity to give rise to a protein that integrates into the membrane of the endoplasmic reticulum (ER) while retaining the potential for adaptive change…[7]
 
 
 
(TM is an abbreviation of transmembrane.)
 
 
 
Across kingdoms, one type of evolutionary change that typically accompanies the maturation of young genes is an increase in expression level. It follows that, according to our prediction, increasing the expression level of emerging ORFs should increase the organism’s fitness more frequently than when the same perturbation is imposed on established ORFs (whose expression levels have presumably been molded by natural selection). Alternatively, if emerging ORFs mostly correspond to spurious non-genic loci with no role in de novo gene birth, increasing their expression level should generally be neutral or toxic, and not provide fitness benefits…
Ancestral reconstruction of the genomic region along the clade showed that no potential ORF longer than 30 codons was present in the Saccharomyces ancestor, in any reading frame, confirming the de novo origination of YBR196C-A…[8]
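To make the check "no potential ORF longer than 30 codons ... in any reading frame" concrete, the following Python sketch scans all six reading frames (three on each strand) for the longest stretch that begins with ATG and ends at a stop codon. This is an assumed definition for illustration only: the quotation does not give the authors' exact ORF definition, and stretches that run off the end of the sequence without reaching a stop codon are simply not counted here. The function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_codons(seq: str) -> int:
    """Longest ATG...stop ORF over all six frames, in codons (stop codon excluded)."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None  # codon index of the first ATG in the current open stretch
            for k in range(frame, len(strand) - 2, 3):
                codon = strand[k:k + 3]
                idx = (k - frame) // 3
                if start is None and codon == "ATG":
                    start = idx
                elif start is not None and codon in STOPS:
                    best = max(best, idx - start)
                    start = None
    return best


def has_orf_longer_than(seq: str, min_codons: int = 30) -> bool:
    """True if any reading frame contains an ORF longer than min_codons."""
    return longest_orf_codons(seq) > min_codons
```

Run on a reconstructed ancestral sequence, `has_orf_longer_than(ancestor)` returning `False` would correspond to the statement that no potential ORF longer than 30 codons was present.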
 
 
 
Here they presume the established open reading frames to be older and the emerging open reading frames to be those still in the process of evolving, and they predict that raising the expression of emerging ORFs will benefit the organism more often than raising the expression of established ones. But are these really ORFs in the process of emerging, or are the researchers simply observing two different types of ORFs serving different purposes?
 
 
 
The initial ORF that became YBR196C-A (YBR_Initial) likely originated at the common ancestor of S. kudriavzevii, S. mikatae, S. paradoxus and S. cerevisiae and already encoded putative TM domains. In fact, the ancestral non-genic sequence at the base of the clade already contained a suite of codons that would have had the capacity to encode TM domains, had it not been interrupted by stop codons…[9]
 
 
 
The non-genic sequence already had within itself the potential to produce this gene.
 
 
 
This TM propensity persisted in most extant sequences despite substantial primary sequence changes. Consistent with our previous analyses, YBR196C-A is extremely T-rich (48%, 99th percentile of all annotated ORFs) and so are its extant relatives and reconstructed ancestors. The inferred evolutionary history of the YBR196C-A locus was therefore consistent with a TM-first scenario…
 
By combining evolutionary, structural and overexpression analyses of the YBR196C-A locus, we provided an unprecedented view of how a thymine-rich intergenic sequence with high TM propensity may, upon acquisition of translation signals, be molded by positive selection into a genuine TM protein with the potential for adaptive change, and mature over millions of years. Future studies are needed to determine in which circumstances Ybr196c-a is natively translated and uncover what specific activities of the protein are under positive selection. To date, this is the only locus whose evolutionary history has been investigated in enough detail to corroborate a TM-first model of de novo gene emergence. The TM-first model is an attractive hypothesis that may explain how sequences that were not translated previously could spontaneously exhibit secondary structures with the potential for adaptive change.
 
Our analyses suggest that a simple thymine bias suffices to generate a diverse reservoir of novel TM peptides, and that incipient proto-genes with TM domains are more likely to increase fitness than proto-genes without TM domains. This could account for the observation that young ORFs have high TM propensities across multiple yeast species. Beyond yeast, putative de novo genes with TM domains have also been characterized. Furthermore, evidence suggests that the fitness-enhancing capacities of small TM proteins might extend to bacteria as well as to mouse. Finally, unannotated TM sequences may also be pervasively translated in bacteria, insects and mammals…[10]
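The "48%, 99th percentile" figure combines two simple calculations: the fraction of bases in a sequence that are thymine, and where that value ranks among the same measure for all annotated ORFs. The sketch below assumes T-content is counted over A, C, G and T on the strand as given, and defines percentile rank as the percentage of values strictly below the query value; the paper may use slightly different conventions, and the function names are hypothetical.

```python
from typing import Iterable


def thymine_fraction(seq: str) -> float:
    """Fraction of A/C/G/T bases in seq that are T (other characters ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return seq.count("T") / acgt if acgt else 0.0


def percentile_rank(value: float, population: Iterable[float]) -> float:
    """Percentage of population values strictly below value."""
    values = list(population)
    if not values:
        raise ValueError("empty population")
    below = sum(1 for v in values if v < value)
    return 100.0 * below / len(values)
```

Applied to an ORF and to the T-fractions of every annotated ORF, `percentile_rank(thymine_fraction(orf), all_fractions)` would give a figure comparable to the quoted 99th percentile.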
 
 
 
 
They deduce that because the ORF YBR196C-A contains a substantial proportion of thymine, it must have been produced from a thymine-rich intergenic sequence with high transmembrane propensity, and they believe that once it acquired translation signals, it was molded into a transmembrane protein with the potential for adaptive change. The potential for producing this possibly adaptive protein and gene was already in place. It didn’t just suddenly come about but arose from material already in existence.
 
The Triangle Association for the Science of Creation further explains what might actually be taking place:
 
 
 
Although they may not fully understand the function of the entire genetic sequence and the complex regulatory systems of organisms, biologists are able to identify the genes—those shorter sequences of hundreds or thousands of base pairs along the DNA strand that translate specifically into amino acid sequences that are then folded into proteins. It is these relatively short but numerous genes that define individual specific proteins that are used in the structure and function of living organisms…
 
Not only are these orphan genes unique for their species or group, but they are also separated in the “genetic space” by such large distances (i.e., sequence-level differences in terms of base pairs from any known other gene sequences) that it is not understood how they could have originated by any slow, incremental process from other known genes based on known genetic mechanisms…[11]
 
 
 
Returning to the eLife paper that tested the DBR hypothesis, the researchers reported observations suggesting that what DBR produces may not be “new” genes at all but the result of a loss of genetic information:
 
 
 
 
Many studies have previously reported that genes without detectable homologues tended to be shorter than conserved ones.  This relationship has been interpreted as evidence that young genes can arise de novo from short open reading frames but also as the result of a bias due to short genes having higher evolutionary rates, which may explain why their homologues are hard to find.  Our results enable another view of these correlations of evolutionary rate, gene age and gene length.  We have shown that an event akin to incomplete pseudogenization could be taking place, wherein a gene loses functionality through some disruption, thus triggering rapid divergence due to absence of constraint. After a period of evolutionary ‘free fall', this would eventually lead to an entirely novel sequence. If this is correct, then it could explain why some short genes, presenting as young, evolve faster.  [12]
 
 
 
 
A loss of genetic function and information is not evolution but degeneration. Jeffrey P. Tomkins of the Institute for Creation Research explains in further detail why creationists are not convinced that either divergence beyond recognition or de novo emergence can satisfactorily explain why every form of life possesses genes unique to itself that share nothing in common with any other kind of life:
 
 
 
Because evolutionists do not ascribe the design and complexity of the genome to the Creator God, they have come up with a wide variety of speculative mechanisms for the origin of OGs. The most popular idea is called de novo gene birth, where they claim genes somehow arise from noncoding regions of the genome such as areas between genes (intergenic segments) or noncoding regions within genes (introns). This idea is absolutely untenable since genes are very complex, containing promoters, regulatory elements, open reading frames (if coding for proteins), and many different types of embedded signal sequences to regulate transcription, cellular transport of the RNA product, and translation (protein production). To think that such information-rich code could magically pop out of so-called random DNA sequences borders on absurdity. And de novo gene birth has never been scientifically documented. And because the process of this type of “gene birth” has never been observed in the gradual development of a new gene, evolutionists claim that it happens rapidly.
 
Another speculative mechanism is that OGs diverge from existing genes, where a gene is duplicated and then somehow becomes so mutated in its sequence through copying errors or other damage that the duplicated gene appears “orphanish.” The problem with this idea is that random genetic errors, especially on a massive scale, can never create new and useful information. Furthermore, genetic corruption on the scale required to radically alter a gene are not allowed to occur in the genome due to the effective application of built-in DNA surveillance and error-correction systems that are constantly at work to protect the chromosomes from such dangerous activity.
 
Yet another proposed mechanism is the precise rearrangement of preexisting genes that actually occurs in a single step during chromosomal recombination during meiosis. This can include gene fusion, gene fission, exon shuffling, and other rearrangements. These structural changes in the genome can produce novel combinations and new reading frames. The problem with this idea for evolutionists is that genetic recombination is a highly regulated, nonrandom process, and these sorts of functional structural variants are part of built-in design features to create adaptive variability. If genetic recombination were not strictly controlled, organisms would soon die. [13]
 
 
 
Even evolutionists will admit that neither DBR nor de novo emergence has yet provided a satisfactory explanation for the presence of orphan genes:
 
 
 
The origin of ‘orphan’ genes, species-specific sequences that lack detectable homologues, has remained mysterious since the dawn of the genomic era. There are two dominant explanations for orphan genes: complete sequence divergence from ancestral genes, such that homologues are not readily detectable; and de novo emergence from ancestral non-genic sequences, such that homologues genuinely do not exist. The relative contribution of the two processes remains unknown… The persistent presence of orphans and TRGs in almost every genome studied to date despite the growing number of available sequence databases demands an explanation. [14]
 
 
It is particularly unclear how non-genic sequences could spontaneously encode proteins with specific and useful capacities…it remains unknown if, how, how often, and how rapidly, may native proto-genes accumulate adaptive fitness-enhancing changes to become established genes…[15]
 
Orphan genes are found every time a new genome is sequenced. Their ubiquity has been one of the biggest surprises of genomics over the last 20 years. Many researchers had hypothesized that the number of orphan genes found would steadily diminish as more and more genomes were sequenced – but this is not the case. Orphan genes continue to comprise a sizeable proportion of each new genome sequenced.
 
Orphan genes are "the hard problem" for evolutionary genomics. Because we can't find other genes similar to them in other species, we can't build family trees for them. We cannot hypothesise their gradual evolution; instead they seem to appear out of nowhere. Various attempts have been made at explaining their origins but…the problem remains unsolved.
 
Given their ubiquity in all genome sequences, orphan genes receive comparatively little attention from the research community. I suspect this is partly because they are such a difficult problem.  Popularizers and communicators of science have also had surprisingly little to say about orphan genes. This is a pity: what can be more interesting and more inspiring than an unsolved mystery? Who could choose to ignore a lost orphan? [16]
 
 
And they are right about one thing:
 
 
 
in a strict sense, nothing in evolution is created de novo. Each new gene must have arisen from an existing gene…[17]
 
 
 
A greater challenge that evolutionists face is explaining how genes came into existence at all. If “new” genes don’t just suddenly appear, then neither could genes and DNA have suddenly come into existence, much less with a blueprint for each form of life.
 
Even for unrelated forms of life designed to carry out similar functions or to operate in similar ways, it should be no surprise that they share some degree of genetic similarity. But the fact that each also possesses genetic differences unique to itself, found in no other kind of life and traceable to nothing ancestral to all life, should serve as evidence that microbes, plants, animals, and mankind were created to reproduce after their own kinds and nothing more. (Gen. 1:11-12, 21, 24-25)

 

 

 

End notes:


 
1.  Jeffrey P. Tomkins, Ph.D., “Newly Discovered 'Orphan Genes' Defy Evolution,”
Institute for Creation Research, August 23, 2016
https://www.icr.org/article/newly-discovered-orphan-genes-defy
 
2.  Nikolaos Vakirlis, Anne-Ruxandra Carvunis, and Aoife McLysaght, “Synteny-based analyses indicate that sequence divergence is not the main source of orphan genes,”
eLife, February 18, 2020
https://elifesciences.org/articles/53500
 
3.  Ibid.
 
4.  Ibid.
 
5.  Ibid.
 
6.  Omer Acar, Brian Hsu, Nikolaos Vakirlis, Aaron Wacholder, Kate Medetgul-Ernar, Ray W. Bowman II, Nelson Castilho Coelho, Saurin Bipin Parikh, Aoife McLysaght, Carlos J. Camacho, S. Branden Van Oss, Cameron P. Hines, John Iannotta, Allyson F. O’Donnell, Trey Ideker, Anne-Ruxandra Carvunis, “De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences,”
Nature Communications, pg. 2, February 18, 2020
https://www.nature.com/articles/s41467-020-14500-z.pdf
 
7.  Ibid. pg. 2
 
8.  Ibid. pg. 4
 
9.  Ibid. pg. 10
 
10.  Ibid. pg. 10-11
 
11.  Matt Welborn, PhD, “The Discovery and Implications of Orphan Genes,”
Triangle Association for the Science of Creation, October 1, 2023
https://tasc-creationscience.org/article/discovery-and-implications-orphan-genes
 
12.  Vakirlis, “Synteny-based analyses indicate that sequence divergence is not the main source of orphan genes,” eLife, February 18, 2020
 
13.  Jeffrey P. Tomkins, “Novel Orphan Genes Aid in Regulated Adaptation,”
Institute for Creation Research, December 30, 2025
https://www.icr.org/article/novel-orphan-genes-aid-regulated-adaptation
 
14.  Vakirlis, “Synteny-based analyses indicate that sequence divergence is not the main source of orphan genes,” eLife, February 18, 2020
 
15.  Acar, “De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences,” Nature Communications, pg. 2, February 18, 2020
 
16.  Richard Buggs, “The evolutionary mystery of orphan genes,” 
Springer Nature Research Communities, December 28, 2016
https://communities.springernature.com/posts/the-evolutionary-mystery-of-orphan-genes
 
17.  Vivian Callier, contributing writer, “Where Do New Genes Come From?”
Quanta Magazine, April 9, 2020
https://www.quantamagazine.org/where-do-new-genes-come-from-20200409/
 
 
 
 
Scripture references:
 
 
 
 
1.  Genesis 1:11-12, 21, 24-25





                                                                                                                                                                                                                                                                                                                                              
