
GWAS & Polygenic Scores

This page is still a work in progress.

Genetic basis of intelligence variation

The molecular genetic analysis of variation in intelligence presents a different picture from the analysis of some pathological conditions or illnesses, a few of which can be traced to a single gene. The first thing to say, and to say loudly, is that no single gene, or even a small number of genes, is responsible for variation in intelligence in the normal range. If there were such genes, modern techniques would have found them by now. Yet behavioral genetic studies show that, in the populations studied, at least half of the variation in intelligence is due to genetic variation. We are therefore dealing with a large number of genes, each having a small effect, which is a polygenic model of inheritance. Unless a sample is extremely large, the chances of detecting such small effects are poor. The challenge is to collect DNA and measures of intelligence from samples larger than any individual research group has ever studied. Remarkably, with the advent of multinational consortia designed to collect and share data, DNA research related to intelligence has reached sample sizes of more than a million people.
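To make the polygenic idea concrete, here is a minimal Python simulation under deliberately simple, hypothetical assumptions: 500 SNPs, each with allele frequency 0.5 and the same small additive effect, plus normally distributed environmental noise, with no dominance, gene-gene, or gene-environment interplay. The function name `simulate_polygenic_trait` and all parameter values are illustrative only.

```python
import random
import statistics


def simulate_polygenic_trait(n_people=2000, n_snps=500, heritability=0.5, seed=1):
    """Toy polygenic model: many equal, tiny additive SNP effects plus environmental noise.

    Assumptions (illustrative only): every SNP has allele frequency 0.5, all effects
    are equal and additive, and there is no dominance, epistasis, or gene-environment
    interplay.
    """
    rng = random.Random(seed)
    # An allele count (0, 1, or 2) at frequency 0.5 has variance 2 * 0.5 * 0.5 = 0.5,
    # so this per-allele effect makes the SNPs jointly explain `heritability` of the variance.
    beta = (heritability / (n_snps * 0.5)) ** 0.5
    env_sd = (1.0 - heritability) ** 0.5
    genetic_values, phenotypes = [], []
    for _ in range(n_people):
        # One allele from each parent at every SNP; count the trait-raising alleles.
        allele_count = sum(rng.getrandbits(1) + rng.getrandbits(1) for _ in range(n_snps))
        g = beta * allele_count
        genetic_values.append(g)
        phenotypes.append(g + rng.gauss(0.0, env_sd))
    observed_h2 = statistics.pvariance(genetic_values) / statistics.pvariance(phenotypes)
    per_snp_share = heritability / n_snps  # expected variance explained by any one SNP
    return observed_h2, per_snp_share


h2, per_snp = simulate_polygenic_trait()
print(f"variance explained by all SNPs together: {h2:.2f}")
print(f"variance explained by each single SNP:   {per_snp:.4f}")
```

With these settings the SNPs together explain close to half of the trait variance, yet each one explains only 0.1 percent, which is why a search for one gene of large effect comes up empty.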

A brief overview of the genome

DNA (deoxyribonucleic acid) is the fundamental molecule that stores our genetic information. It is double-stranded, shaped like a helix, and built from four bases: adenine (A), thymine (T), guanine (G), and cytosine (C). The bases pair specifically, A with T and C with G, and each such pair is called a base pair. Genes are segments of DNA that code for proteins; each gene is a sequence of base pairs that provides the instructions for building a specific protein. The order of the bases forms a kind of language: short stretches spell out genes, which the cell transcribes into mRNA and then translates into proteins, while other stretches include regulatory sequences that determine when, where, and how strongly each gene is expressed.

Because you inherit one copy of your genome from each parent, nearly every position in this genetic code comes in pairs, and the specific combination you receive is known as your genotype. A given position may come in different forms called alleles, which can influence how traits are expressed depending on whether an allele is dominant, recessive, or co-dominant. Whether a gene is used is also governed by epigenetic marks: methyl groups on DNA and chemical modifications of histones (the proteins DNA is wound around) help loosen or tighten its packaging, making a gene readable or silent. When meiosis shuffles DNA for the next generation, pieces that sit close together on the same chromosome tend to be inherited as a package; these blocks of linked alleles are called haplotypes.
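The pairing rules and the idea of a genotype can be written down directly. This sketch uses hypothetical helper names: `paired_bases` returns the base that pairs with each position of a strand, and `allele_dosage` codes a genotype as the number of copies (0, 1, or 2) of one chosen allele, the additive coding commonly used in association studies.

```python
# Watson-Crick pairing rules: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def paired_bases(strand):
    """Return the base that pairs with each position of `strand`, in the same order.

    (The real partner strand runs in the opposite direction, so it is usually
    read reversed; that detail is ignored here.)
    """
    return "".join(PAIRS[base] for base in strand.upper())


def allele_dosage(maternal_allele, paternal_allele, counted_allele):
    """Code a genotype as the number of copies (0, 1, or 2) of `counted_allele`."""
    return int(maternal_allele == counted_allele) + int(paternal_allele == counted_allele)


print(paired_bases("ATGC"))            # TACG
print(allele_dosage("A", "G", "G"))    # 1: one copy, inherited from the father
```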

A mutation is a change in the DNA sequence, such as a change in a single base pair, and it creates a new allele. When such a variant becomes common in a population (conventionally, present on more than about 1 percent of chromosomes), it is called a polymorphism. Mutations can arise from errors during DNA replication or be triggered by exposure to environmental hazards such as radiation, ultraviolet light, and certain chemicals. Importantly, some mutations appear to arise by chance, from random events of unknown cause.
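A single-base substitution is the simplest kind of mutation. Assuming two sequences are already aligned and of equal length (so insertions and deletions are ignored), the positions where they differ can be listed as below; `substitutions` is a hypothetical helper name, and positions are counted from 0.

```python
def substitutions(reference, variant):
    """List (position, reference_base, variant_base) wherever two aligned sequences differ.

    Assumes equal-length, pre-aligned sequences, so only single-base substitutions are
    found; insertions and deletions would need a proper alignment step first.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i, ref, alt) for i, (ref, alt) in enumerate(zip(reference, variant)) if ref != alt]


print(substitutions("ACGTAC", "ACGTTC"))  # [(4, 'A', 'T')]
```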

There are approximately 3 billion (3,000,000,000) base pairs in the human genome, but the number of protein-coding genes is estimated at only about 20,000. Any two humans share at least 99 percent of their base pairs (closer to 99.9 percent), and humans share about 96 percent with chimpanzees. All the differences within humans and between humans and chimpanzees therefore lie in roughly 1 to 4 percent of the 3 billion base pairs. That still leaves about 30 to 120 million potential genetic differences to sort through when searching for those relevant to complex traits like intelligence. Storing and analyzing data on this scale poses enormous problems, and those problems have fueled the field of bioinformatics.
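The arithmetic behind these figures is easy to check. The storage estimate is an illustrative assumption: a naive encoding of 2 bits per base (four possible bases) for a single copy of one genome; real file formats and datasets covering many people differ considerably.

```python
GENOME_BASE_PAIRS = 3_000_000_000  # approximate size of one copy of the human genome


def candidate_sites(fraction_differing):
    """Number of base pairs in the differing fraction of the genome."""
    return round(GENOME_BASE_PAIRS * fraction_differing)


def naive_storage_megabytes(base_pairs=GENOME_BASE_PAIRS, bits_per_base=2):
    """Storage for one raw sequence: 4 possible bases fit in 2 bits; 8 bits per byte."""
    return base_pairs * bits_per_base / 8 / 1_000_000


print(candidate_sites(0.01))      # 30000000  (1 percent)
print(candidate_sites(0.04))      # 120000000 (4 percent)
print(naive_storage_megabytes())  # 750.0
```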

The development of screening methods for identifying genetic effects

Early discoveries followed a discouraging pattern. Initially, putative genes emerged as reasonable candidates because of their involvement with neural efficiency, brain development, or some other key function that ought to be related to intelligence. But initial associations did not replicate in independent studies. The fact that this vexing pattern occurred so frequently led scientists to speculate that no single gene accounts for more than 0.5 percent of the variance in intelligence. They reiterated this theme in an address to the International Society for Intelligence Research, Amsterdam, in December and in an interview with the same society in Albuquerque in.

At this stage of research, the complexity of the hunt has three basic aspects: at least hundreds of genes are likely involved, and each one has only a tiny effect; individual genes can affect more than one trait or condition (pleiotropy); and the functional expression of genes (how they turn on and off across the life span) often depends on complex interactions with other genes, environmental factors like stress, and even random events that influence the developing brain.

In fact, some researchers believe that the dynamic cascade of neurobiological events linking these interacting levels, from genes to brain development to traits, is so complicated that it is not possible to establish any causal links at all, now or in the future. Are they right? Perhaps not, because there has been progress.

Early attempts to find specific genes for intelligence relied on needle-in-a-haystack methods. Essentially, these methods checked whether particular variants of specific genes were more frequent in a group defined by high IQ scores than in a low-IQ group, and genes that showed such differences were flagged as candidates for further study. One problem was that, even in the best of these studies, no single candidate gene accounted for much variance. Not surprisingly, successful replications were rare.
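The extreme-groups comparison can be sketched as a simple allelic test: count the two alleles of one variant in the high-IQ and low-IQ groups and compute a Pearson chi-square statistic on the resulting 2x2 table. The function name and the counts are hypothetical, and this is only one of several designs such studies could have used.

```python
import math


def allelic_chi_square(high_a, high_b, low_a, low_b):
    """Pearson chi-square (1 degree of freedom) on a 2x2 table of allele counts.

    Rows: high-IQ group, low-IQ group. Columns: copies of allele A, copies of allele B.
    """
    table = [[high_a, high_b], [low_a, low_b]]
    total = high_a + high_b + low_a + low_b
    row_totals = [high_a + high_b, low_a + low_b]
    col_totals = [high_a + low_a, high_b + low_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df equals a two-sided normal tail at sqrt(chi2).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 100 people per group, so 200 alleles per group.
chi2, p = allelic_chi_square(high_a=120, high_b=80, low_a=100, low_b=100)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")  # chi-square = 4.04, p = 0.044
```

A result like this passes the nominal 0.05 threshold, but when many candidate genes are tested, chance alone produces such "hits" regularly, which fits the poor replication record described here.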

After about fifteen years of trying these methods, it became apparent that the group sample sizes were far too small to overcome the statistical problem of multiple small effects and random false positives. Estimates based on mathematical models indicated that sample sizes of at least a million people might be needed to find relevant genes, even if an entire genome could be tested. The cost would be astronomical.
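The scale of the problem can be checked with a standard power calculation. This sketch is not the model behind the estimates mentioned here; it assumes a single SNP tested with a simple additive regression, the conventional genome-wide significance threshold of 5 x 10^-8, and 80 percent power, using the normal approximation N = (z_{1-alpha/2} + z_power)^2 (1 - q^2) / q^2, where q^2 is the share of trait variance the SNP explains. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(variance_explained, alpha=5e-8, power=0.8):
    """Approximate N needed to detect one SNP that explains `variance_explained`
    (a proportion, e.g. 0.001 = 0.1 percent) of trait variance.

    Normal approximation for a single-predictor regression test:
        N = (z_{1 - alpha/2} + z_{power})^2 * (1 - q2) / q2
    alpha=5e-8 is the conventional genome-wide threshold; power=0.8 is a common target.
    """
    q2 = variance_explained
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil((z_alpha + z_power) ** 2 * (1 - q2) / q2)


for share in (0.005, 0.0005, 0.0001, 0.00004):
    print(f"{share:.3%} of variance -> about {required_sample_size(share):,} people")
```

Under these assumptions, a variant explaining 0.5 percent of the variance needs roughly 8,000 people, one explaining 0.01 percent needs roughly 400,000, and the requirement approaches a million people for effects of about 0.004 percent.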

How could this research progress?

The GWAS solution

Genetic researchers understood that the best solution to this problem was to form multinational consortia that pool data into samples larger than any one research group could assemble. Pooling data from different research groups is harder than it sounds. Complex questions about standardized protocols, data ownership, authorship of publications, the logistics of managing and accessing huge data sets, individual institutional regulations, and other issues all had to be resolved, and they were. Moreover, to accelerate progress, most of these large data sets are available to all researchers, not just to consortium members. By any standard, the emergence of these cooperative consortia is a major scientific advance worth celebrating.
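One standard way to pool results across cohorts without sharing individual-level data is a fixed-effect, inverse-variance-weighted meta-analysis of each cohort's summary statistics: each cohort's effect estimate is weighted by 1/SE^2, so more precise cohorts count more. This sketch assumes every cohort reports a per-allele effect (`beta`) and its standard error for the same SNP and the same counted allele; it is not the pipeline of any particular consortium, and the function name is hypothetical.

```python
import math


def fixed_effect_meta(cohort_results):
    """Inverse-variance-weighted fixed-effect meta-analysis for one SNP.

    cohort_results: list of (beta, standard_error) pairs, one per cohort, all for the
    same SNP and the same counted allele (an assumption real pipelines must check).
    """
    weights = [1.0 / se ** 2 for _, se in cohort_results]  # precise cohorts count more
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, cohort_results)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


cohorts = [(0.02, 0.01), (0.02, 0.01), (0.02, 0.01)]  # hypothetical summary statistics
beta, se = fixed_effect_meta(cohorts)
print(f"pooled beta = {beta:.3f}, SE = {se:.4f}, z = {beta / se:.2f}")
```

Each hypothetical cohort alone gives z = 2, far from genome-wide significance; pooled, the estimate is unchanged but the standard error shrinks by the square root of three, lifting z to about 3.46. Adding more cohorts continues this trend, which is the statistical payoff of consortia.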

One more critical development was a game-changer: genetic technology advanced dramatically, so that researchers could cost-effectively genotype thousands or even millions of sites across the entire genome for each research participant. This astonishing technology ushered in a new phase of genome-wide association studies (GWAS), which analyze variation at individual base pairs, called single-nucleotide polymorphisms (SNPs). GWAS have expanded rapidly from initial sample sizes of thousands of people to millions, and the resulting increase in statistical power has raised the probability of finding SNPs possibly related to intelligence measures.
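At its core, a GWAS repeats one simple test at every genotyped SNP: regress the trait on each participant's allele count (0, 1, or 2) and keep the SNPs whose p-value passes a stringent genome-wide threshold. This sketch assumes an additive model with no covariates (real analyses also adjust for factors such as ancestry, age, and sex), uses a normal approximation for the p-value, and takes the conventional 5 x 10^-8 threshold; `snp_association` and `gwas_scan` are hypothetical names.

```python
import math
import random

GENOME_WIDE_SIGNIFICANCE = 5e-8  # conventional GWAS threshold (assumed here)


def snp_association(dosages, trait):
    """Additive single-SNP test: least-squares slope of trait on allele count."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx  # per-allele effect estimate
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    # Two-sided p-value from the normal approximation; erfc keeps tiny p-values accurate.
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p_value


def gwas_scan(genotypes, trait, threshold=GENOME_WIDE_SIGNIFICANCE):
    """genotypes[j] lists every participant's allele count for SNP j."""
    hits = []
    for j, dosages in enumerate(genotypes):
        if len(set(dosages)) < 2:
            continue  # a SNP with no variation in the sample cannot be tested
        beta, se, p = snp_association(dosages, trait)
        if p < threshold:
            hits.append((j, beta, p))
    return hits


if __name__ == "__main__":
    rng = random.Random(42)
    n_people, n_snps = 2000, 50
    genotypes = [[rng.getrandbits(1) + rng.getrandbits(1) for _ in range(n_people)]
                 for _ in range(n_snps)]
    # Only SNP 0 truly affects the simulated trait; the other 49 are null.
    trait = [0.32 * genotypes[0][i] + rng.gauss(0.0, 1.0) for i in range(n_people)]
    for j, beta, p in gwas_scan(genotypes, trait):
        print(f"SNP {j}: beta={beta:.3f}, p={p:.2e}")
```

In the simulated demo only the SNP given a real effect should pass the threshold. The 49 null SNPs are tested too, and a real GWAS tests millions of them, which is why the threshold must be so much stricter than 0.05.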

