General Intelligence (g Factor)

Key Points

The g factor (general intelligence) is a statistical construct used to describe the common source of individual differences across diverse cognitive tasks.
It is inferred from the positive manifold, the consistent finding that scores on different mental ability tests are positively correlated.
It is estimated using psychometric techniques such as factor analysis and is measured through IQ scores.
g shows substantial heritability and is consistently associated with a range of biological and neurological measures, including brain structure, brain function, and neural processing efficiency.
g is one of the most extensively researched and empirically validated constructs in psychology and is a strong predictor, if not the strongest, of a wide range of educational, occupational, and life outcomes.

If you wish to develop a deep understanding of g, we highly recommend reading The g Factor by Arthur Jensen.

Definition and Origin

The concept of g, or general intelligence, originated in 1904 with the work of Charles Spearman, an English psychologist who observed that individuals’ performances across a wide variety of seemingly unrelated mental tasks tended to be positively correlated. Whether tests measured memory, reasoning, verbal ability, or sensory discrimination, people who performed well on one type of task were more likely to perform well on others.

Spearman saw this pattern of positive correlations (later termed the positive manifold) among diverse mental abilities as evidence for a single factor underlying all manner of cognitive performance, which was dubbed g, for general intelligence.

Table 1. Reproduced from Haier (2017), correlation coefficients among scholarly subjects reported by Charles Spearman in 1904.

Variable	Classics	French	English	Math	Pitch	Music	g
Classics		0.83	0.78	0.70	0.66	0.63	0.96
French			0.67	0.67	0.65	0.57	0.88
English				0.64	0.54	0.51	0.80
Math					0.45	0.51	0.75
Pitch						0.40	0.67
Music							0.65
Average r	0.72	0.68	0.63	0.59	0.54	0.52

Note. All coefficients are positive, illustrating the positive manifold. The table is reproduced from Haier (2017), summarizing correlations originally reported by Spearman (1904). The average correlation (r) of each subject with all other subjects and the loading of each subject on the general factor (g) are shown.
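As a quick check on Table 1, the following minimal sketch re-derives the "Average r" row from the fifteen pairwise correlations and confirms that every coefficient is positive. The names `UPPER`, `r`, and `average_r` are illustrative and not from the source. The g column is not recomputed, since that would require assuming a particular factor-extraction method.

```python
# Pairwise correlations copied from Table 1 (upper triangle only).
SUBJECTS = ["Classics", "French", "English", "Math", "Pitch", "Music"]
UPPER = {
    ("Classics", "French"): 0.83, ("Classics", "English"): 0.78,
    ("Classics", "Math"): 0.70, ("Classics", "Pitch"): 0.66,
    ("Classics", "Music"): 0.63, ("French", "English"): 0.67,
    ("French", "Math"): 0.67, ("French", "Pitch"): 0.65,
    ("French", "Music"): 0.57, ("English", "Math"): 0.64,
    ("English", "Pitch"): 0.54, ("English", "Music"): 0.51,
    ("Math", "Pitch"): 0.45, ("Math", "Music"): 0.51,
    ("Pitch", "Music"): 0.40,
}


def r(a, b):
    """Look up a correlation regardless of the order of the two subjects."""
    return UPPER[(a, b)] if (a, b) in UPPER else UPPER[(b, a)]


def average_r(subject):
    """Mean correlation of one subject with the five other subjects."""
    others = [s for s in SUBJECTS if s != subject]
    return sum(r(subject, other) for other in others) / len(others)


# Positive manifold: every pairwise correlation is above zero.
positive_manifold = all(value > 0 for value in UPPER.values())
averages = {s: round(average_r(s), 2) for s in SUBJECTS}

print(positive_manifold)
print(averages)
```

The rounded averages match the "Average r" row of the table (0.72, 0.68, 0.63, 0.59, 0.54, 0.52).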

Table 1 illustrates this pattern using correlations among diverse academic and sensory measures from Spearman’s original reports¹. Rather than proposing g as a specific mental skill or as emergent from content similarity between tests, Spearman interpreted it as a common source of variance shared by all cognitive abilities, completely stripped of their distinctive features of information content, skill, strategy, and the like.

The idea of g was therefore introduced to explain why cognitive abilities are correlated in the first place. Since then, g has become one of the most replicated psychological findings². Almost always, cognitive tests give rise to a positive manifold that can be accounted for by a general factor explaining approximately 40-50% of the total test variance in both Western and non-Western cultures³. This is even true of tests designed not to produce a general factor, but rather to measure different and unrelated abilities⁴. There is strong evidence that g is a universal phenomenon among humans, given that it has been found in 31 non-Western, nonindustrialized nations³. The g factor has been found in many other species, including but not limited to dogs, rats and mice, donkeys, and dozens of non-human primates⁵.

Psychometric Structure of Cognitive Abilities

To explain the recurring pattern of the positive manifold among diverse mental tests, Spearman proposed a two-factor theory of intelligence where variance between scores on cognitive tests could be explained by a general factor of intelligence (g) that all tasks shared, as well as test-specific factors (s) unique to each task.

To model his theory of intelligence, Spearman developed a statistical technique known as factor analysis. Factor analysis is designed to identify latent variables, which are traits that cannot be observed directly and are instead inferred from patterns arising in observable measures, with intelligence being the latent variable in this case. Through factor analysis, Spearman was able to determine that the positive correlations between different cognitive tests could be explained by a single factor, g, alongside smaller task-specific influences. Since then, factor analysis has evolved, and modern psychometrics distinguishes between Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA). However, the basic idea of a central factor g underlying cognitive performance remains the foundation of modern psychometrics.

Modern Full-Scale IQ (FSIQ) tests use large batteries of diverse subtests, and researchers have found that, beyond g, related subtests also cluster with one another. For example, verbal subtests correlate more strongly with each other than with other subtests, as do reasoning subtests, memory subtests, and so on. From these clusters, many group factors have been identified under g, representing broad subdomains. As such, variance in scores on IQ tests can be attributed to variance due to the general factor (g), variance shared among a subset of related tests (f), variance unique to the subtest itself (s), and measurement error (e).

Classical test theory represents this for a given test as follows:

$$ X_i = g + f_i + s_i + e_i $$

Although g is certainly not the only important factor, its extraordinary generality makes it the most important factor. In a large battery of diverse cognitive tests, g typically accounts for some 30–50 percent of the total population variance in test scores, far exceeding any of the subordinate factors⁶.
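The decomposition of a test score into g, f, s, and e can be illustrated with a small simulation. This is only a sketch under stated assumptions: the four test names, the two group factors, and the standard deviations (1.0 for g, 0.5 for each other component) are all hypothetical choices, and every component is drawn independently from a normal distribution.

```python
import random
from itertools import combinations
from statistics import fmean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical battery: each test belongs to one group factor.
GROUPS = {
    "vocab": "verbal", "similarities": "verbal",
    "matrices": "reasoning", "series": "reasoning",
}


def simulate_scores(n_people=5000, seed=0):
    """Generate scores as X = g + f + s + e for every person and test."""
    rng = random.Random(seed)
    scores = {name: [] for name in GROUPS}
    for _ in range(n_people):
        g = rng.gauss(0, 1.0)                       # shared by all tests
        f = {"verbal": rng.gauss(0, 0.5),           # shared within a group
             "reasoning": rng.gauss(0, 0.5)}
        for name, group in GROUPS.items():
            s = rng.gauss(0, 0.5)                   # specific to this test
            e = rng.gauss(0, 0.5)                   # measurement error
            scores[name].append(g + f[group] + s + e)
    return scores


scores = simulate_scores()
correlations = {
    (a, b): pearson(scores[a], scores[b]) for a, b in combinations(GROUPS, 2)
}
for pair, value in correlations.items():
    print(pair, round(value, 2))
```

Under these assumptions every pair of tests correlates positively because all of them share g, and tests within the same group correlate more strongly than tests from different groups because they also share f.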

Factor Analysis

Factor analysis models these relationships in two different ways: through a hierarchical model and through a bifactor model. In hierarchical models, individual tests cluster together to form first-order factors that represent constructs narrower than g (common examples being verbal comprehension, fluid reasoning, visual-spatial ability, working memory, and processing speed). Since these first-order factors strongly correlate with one another, their shared variance can be summarized by a higher-order factor, g.

Below⁷ is an example of a hierarchical model taken from a study performing confirmatory factor analysis on the WAIS-4, a gold-standard professional IQ test commonly used by psychologists and researchers. The square boxes contain the following tested subtests: Similarities (SI), Vocabulary (VO), Information (IN), Block Design (BD), Matrix Reasoning (MR), Visual Puzzles (VP), Digit Span (DS), Arithmetic (AR), Symbol Search (SS), and Coding (CD).

**Figure 1.** Confirmatory factor analysis with a hierarchical structure from the WAIS-4. Note the pyramid structure, with variance flowing downwards from g.

Another way to model g is through a bifactor model, which represents the same covariance between subtests in a different way. In a bifactor model, each subtest loads directly on a general factor (g) while also loading on a domain-specific factor. This allows researchers to separate the influence of g on each subtest from the influence of specific abilities directly, rather than through multiple levels of hierarchy. Below⁷ is an example of a bifactor model.

**Figure 2.** Confirmatory factor analysis with a bifactor structure performed on the WAIS-4. Note that subtests are allowed to load freely on both g and the broad factors. Most subtests load predominantly on g, with broad factors explaining smaller residual domain-specific variance.
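The link between the two representations can be sketched with the Schmid-Leiman transformation, a standard way of re-expressing a higher-order solution as direct loadings on g plus residualized group loadings. The loadings used here (0.80 and 0.90) are hypothetical and are not taken from the WAIS-4 study cited above.

```python
import math


def schmid_leiman(first_order_loading, second_order_loading):
    """Convert higher-order loadings into bifactor-style loadings.

    first_order_loading:  subtest loading on its broad factor.
    second_order_loading: loading of that broad factor on g.
    Returns (direct g-loading, residualized group-factor loading).
    """
    g_loading = first_order_loading * second_order_loading
    group_loading = first_order_loading * math.sqrt(1 - second_order_loading ** 2)
    return g_loading, group_loading


# Hypothetical subtest: loads 0.80 on a broad factor that loads 0.90 on g.
g_loading, group_loading = schmid_leiman(0.80, 0.90)
print(round(g_loading, 2), round(group_loading, 2))
```

The squared g-loading and squared group loading sum to the squared first-order loading, so the transformation only reallocates the subtest's common variance between g and its broad factor. A freely estimated bifactor model is less constrained than this and need not reproduce these values exactly.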

The Cattell-Horn-Carroll Theory of Intelligence

The most widely accepted and empirically supported model of cognitive abilities used by contemporary researchers is known as the Cattell-Horn-Carroll (CHC) theory⁸, which is the culmination of decades of factor analytic research on specific and diverse cognitive abilities. CHC was highly influenced by the seminal Human Cognitive Abilities: A Survey of Factor-Analytic Studies by John Carroll (1993). This behemoth of a book showed the results of a then-modern method of factor analysis on over 450 datasets from 19 countries. The vast majority of datasets produced a hierarchy of factors with g at the top; this soon led to the development of the CHC model.

**Figure 3.** A diagram of the CHC model. General intelligence (g) is placed at the top, followed by several broad abilities, such as fluid reasoning (gf), crystallized knowledge (gc), visual-spatial ability (gv), processing speed (gs), and so on, which are measured by numerous narrow abilities and subtests. Adapted from Schneider & McGrew (2012).

The CHC model resolved a great number of controversies that had plagued intelligence research for decades. The CHC model showed, for example, that Spearman was correct that g existed, but also that Louis Thurstone was correct that broad, non-g abilities existed and were important. Today, the CHC theory of intelligence is used as the primary theoretical framework through which modern intelligence research maps out psychometric relationships between general, broad, and specific cognitive abilities.

g-Loadings and Test Characteristics

Not all tests measure g to the same degree. Through factor analysis, how strongly a subtest (or composite of subtests) correlates with g can be statistically determined; this correlation is referred to as its g-loading. A test with a high g-loading is a strong measure of general intelligence, and a test with a low g-loading is a weak one. Tests with a g-loading above 0.8 are considered great, and gold-standard professional tests tend to correlate with g at >0.9.

Similarly, not all tasks correlate equally with g. Tasks that require complex reasoning, abstract problem solving, or the coordination of multiple cognitive processes tend to show higher g-loadings, since performance depends more on cognitive ability. On the other hand, tasks that rely more on specialized knowledge, learned routines, or perceptual speed often show lower g-loadings (as more of their variance can be explained by specific abilities or other factors).

Importantly, g is not related to a specific subject, skill, or domain per se. Highly g-loaded tests may involve numbers, words, shapes, or even nonsensical symbols. What matters are the cognitive demands of the task. Almost every conceivable task involving cognition taps into g (even something seemingly unrelated, such as driving or dancing ability); what varies is how g-loaded the task is. Modern IQ tests are therefore designed using decades of psychometric research to include tasks that show high correlations with the general factor, to estimate g as reliably as possible.
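Because a g-loading is a correlation, squaring it gives the share of a test's score variance that is associated with g. The short sketch below applies this to a few illustrative loadings; the function name `g_variance_share` is hypothetical.

```python
def g_variance_share(g_loading):
    """Proportion of a test's variance shared with g (the squared loading)."""
    if not -1.0 <= g_loading <= 1.0:
        raise ValueError("a g-loading is a correlation and must lie in [-1, 1]")
    return g_loading ** 2


for loading in (0.5, 0.8, 0.9):
    share = g_variance_share(loading)
    print(f"g-loading {loading:.1f} -> {share:.0%} of variance shared with g")
```

On this reading, a test loading 0.8 shares about 64% of its variance with g, and one loading 0.9 shares about 81%.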

What g Is and Is Not

Firstly, g is not a mixture or an average of a number of diverse tests representing different abilities. Rather, it is a distillate, reflecting the single factor that all different manifestations of cognition have in common. In fact, g is not really an ability at all. It does not reflect the tests’ contents per se, or any particular skill or type of performance. It defies description in purely psychological terms. In actuality, it reflects some physical properties of the brain that ultimately cause diverse forms of cognitive activity to be positively correlated, not only in psychometric tests but in all of life’s mental demands.

Secondly, g should not be interpreted as the sole determinant of cognitive performance or life outcomes. Although it has strong predictive capabilities, there are many factors that contribute to differences between individuals in achievement and behavior, such as personality traits, childhood, motivation, environment, etc. This is not to diminish g’s predictive power, but to caution readers to have realistic standards for its use. However, g still remains one of the (if not the) single best predictors of many life outcomes, such as income, life expectancy, job performance, and educational attainment, even though these outcomes are not themselves direct measures of intelligence.

In IQ tests

It is important to distinguish g from the IQ score reported on intelligence tests. An IQ score is a representation of your ability in terms of its rarity among the general population. For example, getting 16 questions correct on one test vs. another will not be directly interpretable due to differences in test content, format, age, and so on. Thus, scores are normed into IQ scores, which represent their rarity among the general population.
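A minimal sketch of norming, assuming the conventional IQ scale (mean 100, standard deviation 15) and a simple linear conversion from raw scores. The norm-group means and standard deviations below are hypothetical, and real tests are typically normed within age bands with more elaborate procedures.

```python
from statistics import NormalDist

IQ_MEAN, IQ_SD = 100, 15  # conventional scale, assumed here


def raw_to_iq(raw_score, norm_mean, norm_sd):
    """Place a raw score on the IQ scale relative to a norm group."""
    z = (raw_score - norm_mean) / norm_sd
    return IQ_MEAN + IQ_SD * z


def share_scoring_below(iq):
    """Expected proportion of the population scoring below a given IQ."""
    return NormalDist(IQ_MEAN, IQ_SD).cdf(iq)


# The same raw score of 16 on two hypothetical tests with different norms.
test_a = raw_to_iq(16, norm_mean=12, norm_sd=4)  # one SD above the norm mean
test_b = raw_to_iq(16, norm_mean=20, norm_sd=4)  # one SD below the norm mean

print(test_a, round(share_scoring_below(test_a), 3))
print(test_b, round(share_scoring_below(test_b), 3))
```

The same 16 correct answers correspond to an IQ of 115 on the first test and 85 on the second, which is why raw scores are not comparable until they are normed.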

But because an IQ test is just a vehicle for g, it inevitably reflects other broad factors as well, such as verbal, numerical, and spatial abilities, besides the specific properties of any particular IQ test. Yet, under proper conditions, the IQ is a good estimate of an individual’s relative standing on g⁶. However, some distinctions exist.

An IQ score can be represented in CTT as follows:

$$ IQ = g + e $$

In the formula, IQ scores are represented as a summation of g and measurement error. In proper testing conditions (as taken by the norming population), with proper sleep, good health, no distractions, etc., tests will be about as accurate as the g-loading calculated on the population sample. However, if a test is taken under improper conditions, performance may be affected by factors unrelated to the examinee’s true ability. In such cases, measurement error increases, reducing the extent to which the observed score reflects the underlying g factor and lowering the accuracy of the score itself.

This is why interpretations of IQ scores need to take into account error along with g. For example, many people retake a test and then score significantly higher. This does not mean that their general intelligence actually rose; rather, the test’s accuracy decreased, as measurement error e grew through non-g effects, such as test familiarity and practice. Many proposed training interventions succumb to this pitfall; gains in IQ scores are not actually reflective of increased g but rather increased e.

Another common example is when a foreign speaker takes an English IQ test. Unfamiliarity with the language will impact the accuracy of the test to measure g, as it is picking up extraneous factors, such as (lack of) English ability.
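The effect of extra error on IQ = g + e can be made concrete. Assuming g and e are uncorrelated, the correlation between the observed score and g is the square root of the share of score variance due to g. The variance values below are hypothetical.

```python
import math


def correlation_with_g(var_g, var_e):
    """Correlation between an observed score (g + e) and g itself.

    Assumes g and e are uncorrelated, so Var(score) = var_g + var_e.
    """
    return math.sqrt(var_g / (var_g + var_e))


# Hypothetical variances: identical g variance, different amounts of error.
proper_conditions = correlation_with_g(var_g=0.81, var_e=0.19)
poor_conditions = correlation_with_g(var_g=0.81, var_e=0.60)

print(round(proper_conditions, 2), round(poor_conditions, 2))
```

With the error variance roughly tripled, the score's correlation with g falls from about 0.90 to about 0.76, even though the examinees' g is unchanged.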

The Jensen Effect

One way to distinguish whether differences in scores reflect g or measurement error e is through the Jensen effect. The Jensen effect refers to the empirical finding that differences on IQ tests are larger on subtests with higher g-loadings and smaller on ones with lower g-loadings. In other words, when the magnitude of score differences increases in proportion to how g-loaded a subtest is, we can likely attribute that difference to g. Conversely, if the differences are not correlated with g-loading, then they are likely due to non-g influences, or e.
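In practice this check is often run as a correlation between two vectors: the subtests' g-loadings and the size of the score difference on each subtest (Jensen's method of correlated vectors). The sketch below uses entirely hypothetical numbers for five subtests.

```python
from statistics import fmean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical g-loadings of five subtests.
g_loadings = [0.45, 0.55, 0.65, 0.75, 0.85]

# Hypothetical score differences (in SD units) on the same five subtests.
tracks_g = [0.20, 0.28, 0.31, 0.40, 0.47]       # grows with g-loading
retest_gains = [0.40, 0.15, 0.35, 0.10, 0.30]   # unrelated to g-loading

r_tracks_g = pearson(g_loadings, tracks_g)
r_retest = pearson(g_loadings, retest_gains)
print(round(r_tracks_g, 2), round(r_retest, 2))
```

A strongly positive correlation, as in the first case, is the pattern described as a Jensen effect. A correlation near zero or below, as in the second case, suggests the differences arise mostly from non-g sources. With only a handful of subtests such correlations are noisy, so they are best treated as suggestive rather than conclusive.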

g and Real World Outcomes

Although g is defined as a statistical construct arising from mental tests, it also predicts a wide range of real-world outcomes. More than any other factors, g is correlated with many important variables in the practical world, like education, job proficiency, occupational level, creativity, spouse selection, health status, longevity, accident rates, delinquency, and crime⁶. The figure below summarizes results from multiple meta-analyses examining the relationship between intelligence and a variety of life outcomes⁹. The strongest correlations are typically observed for job performance and academic achievement, where correlations often fall between r ≈ .40 and .60. More moderate correlations are found for income, leadership success, and creativity, while small relationships are observed for variables that are largely unrelated to cognitive ability (such as happiness or popularity).

**Figure 4.** Relationship between intelligence and measures of success (Strenze, 2015).

Note. A helpful guide to interpreting correlations in individual differences research can be found in the glossary here.

Results of a 30-year research program on general and specific abilities (g and s) demonstrate that the g factor is responsible for most of the predictive validity for job and training performance. Specific abilities, while sometimes statistically significant, add little incremental validity for prediction after accounting for g. This illustrates that although g is measured through cognitive tests, its predictive validity extends far beyond just test-taking ability. A more detailed discussion of the broader literature on intelligence and life outcomes can be found in the Everyday Life Outcomes article.

But what is intelligence?

In practice, g-loaded tests capture a large part of what individuals mean to impart when they colloquially use the word "intelligence". Wedding a long history of psychometrics with useful folk ideas, Linda Gottfredson’s definition of intelligence encapsulates it well¹⁰:

Intelligence is a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test-taking smarts. Rather, it reflects a broader and deeper capability for comprehending our surroundings — ‘catching on,’ ‘making sense’ of things, or ‘figuring out’ what to do.

(Gottfredson, 1997, p. 13)

This is part of a statement that outlines conclusions on intelligence regarded as mainstream among researchers. Fifty-two prominent intelligence researchers signed it, and it has since become one of the most cited definitions of intelligence in the psychometric literature. Nevertheless, laypeople and researchers may still strenuously disagree over the precise definition of intelligence. The lack of agreement is not an indictment of the science of human intelligence or psychometrics as having been unproductive or the like, because science does not concern itself with debating definitions at the level of colloquial language. Science does not demand or require perfect, universally accepted definitions for particular words. Science also does not adjudicate or prescribe conventional definitions, especially for a word in popular parlance that is so fraught with value judgments, emotions, and prejudices. It makes no sense, as a scientific matter, to disagree over them.

Science concerns itself with translating observations and everyday ideas into an epistemically objective and testable form that can help make predictions and advance understanding. This process of operationalization is central to psychometrics (and much of the social sciences), since features of the mind cannot be directly measured in the same way that properties of physical objects can. Psychometrics proceeds indirectly. Factor analytic methods infer the latent variables that explain covariance in performance among a diverse array of mental tests (i.e., the observed variables).

Consequently, in scientific, psychiatric, and educational practice, “intelligence” has been operationalized as performance on a diverse battery of tests that are normatively scaled (e.g., the Wechsler scales). The tests and their contents are the result of a century of trial and error and gradual improvement. Seasoned test creators know how to devise tests so that they correlate highly with performance on tests that vastly differ in content (high g-load, and hence more correlated to various real-world outcomes, etc.). Most basically, these tests tend to require inductive and deductive reasoning, grasping relationships, inferring rules, generalizing, seeing the similarity in things that differ (e.g., reward-punishment) or the difference between similar things, and decontextualization, regardless of the specific content (visual transformations, verbal abstractions, or quantities). Nowhere in this process need researchers be concerned with the abstract essence of the word “intelligence”, if there exists such a thing. They are ultimately concerned with empirical validity, as assessed by the tests' construct validity, criterion (e.g., predictive) validity, discriminant validity, and so on.

Ultimately, the everyday word “intelligence” may remain imprecise and contested, but the psychometric construct is not. In his The g Factor: The Science of Mental Ability (1998), Arthur Jensen entirely dispensed with the word "intelligence" as an intraspecies concept, because, "it has proved to be either undefinable or arbitrarily defined without a scientifically acceptable degree of consensus" (p. 45)⁶.

Biological correlates of g

The fact that psychometric g has many physical correlates proves that g is not just a methodological artifact of the content and formal characteristics of mental tests or of the mathematical properties of factor analysis, but is a biological phenomenon. The correlations of g with physical variables can be functional (causal), or genetically pleiotropic (two or more different phenotypic effects attributable to the same gene), or genetically correlated through cross-assortative mating on both traits, or the nongenetic result of both being affected by some environmental factor (e.g., nutrition). The physical characteristics correlated with g that are empirically best established are head size, brain size, frequency of alpha brain waves, latency and amplitude of evoked brain potentials, rate of brain glucose metabolism, and general health (p. 137)⁶.

The more a test loads on g, the higher the biological correlations are. Crucially, g reflects biological components of intelligence more strongly than any other psychometric factors (or any combination thereof) that are statistically independent of g. It is clear that g, since it is a product of human evolution, is strongly enmeshed with many other organismic variables¹¹.

The Brain

One of the best known correlations is that of brain size and intelligence. Brain size is usually measured as total volume, assessed in magnetic resonance imaging (MRI) scans. A meta-analysis of data from over 148 studies across more than 8000 individuals estimated the association at r = 0.24. A re-analysis of those data including only healthy adults estimated the association at r = 0.31; this rose to r = 0.39 when it included only the studies judged to have used better-quality intelligence testing¹². That means roughly 10–15% of intelligence variance is explained by brain volume alone, which is a large effect in biology¹¹. The most replicated neural correlate of human intelligence to date is total brain volume.
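The step from a correlation to "variance explained" is simply squaring the coefficient. A short sketch applying this to the three estimates quoted above:

```python
def variance_explained(r):
    """Share of variance in one variable accounted for by another (r squared)."""
    return r ** 2


estimates = {
    "all samples": 0.24,
    "healthy adults only": 0.31,
    "better-quality testing": 0.39,
}
for label, r in estimates.items():
    print(f"{label}: r = {r:.2f}, r^2 = {variance_explained(r):.1%}")
```

The two higher estimates give roughly 9.6% and 15.2%, which is where the 10–15% range comes from. The all-samples estimate of r = 0.24 corresponds to about 5.8%.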

Brain size is a crude measure. A more informative measure is white matter integrity, assessed using diffusion tensor imaging. White matter integrity reflects the speed and fidelity of communication between brain regions. Fractional anisotropy (FA), a standard measure of white matter quality, correlates with intelligence at about r = 0.20–0.35¹². More intelligent people tend to have better-organized white matter tracts, especially in frontoparietal pathways.

Exploiting the brain’s structural connectome allows network-based analyses with greater fidelity than measuring fewer, larger pathways. Global measures such as connectomic efficiency, or variation in the ‘degree’ of nodes in morphometric similarity networks, have shown potential to predict intelligence differences, explaining up to a remarkable 40% of the variance in one study of 296 young adults¹³. Emerging research tries to move beyond morphometric correlates into functional correlates, with one study finding that resting-state fMRI connectivity matrices predicted 20% of the variance in intelligence among young adults (N = 884)¹⁴.

A 2010 review of the neuroscience of intelligence described parieto-frontal integration theory (P-FIT) as "the best available answer to the question of where in the brain intelligence resides". Under P-FIT, intelligence is not localized in a single "center" but arises from a distributed network of 14 specific brain regions, primarily in the frontal and parietal lobes, connected by white matter tracts like the arcuate fasciculus.

Further Reading. The Neuroscience of Intelligence (2nd ed., 2023) by Richard J. Haier.

Heritability and Inbreeding Depression

The heritability (i.e., the proportion of trait variance attributable to genetic differences) of various tests is directly related to the tests’ g loadings (p. 184)⁶. The more a test is saturated with the g factor, the higher its heritability coefficient. Genetic variance is concentrated in g rather than in specific abilities (also see wiki section on heritability of intelligence).

Inbreeding depression of test scores is a genetic effect that lowers a quantitative trait. It results from the greater frequency of double-recessive alleles in the offspring of genetically related parents, such as cousins. The degree of inbreeding depression on various mental test scores is strongly related to the tests’ g loadings (p. 189)⁶. The larger the g loading, the greater is the effect of inbreeding depression on the test scores.

Criticism and Alternative Models

In the mid-to-late twentieth century, there was dissent (which we will cover next) about the existence and/or importance of general intelligence. However, that changed in 1993 with the publication of John Carroll's Human Cognitive Abilities: A Survey of Factor-Analytic Studies, which reported the results of a then-modern method of factor analysis applied to over 450 datasets from 19 countries. The vast majority of datasets produced a hierarchy of factors with g at the top. This soon led to the development of the CHC model, which resolved a great number of controversies that had plagued intelligence research for decades.

Just five years later, Arthur Jensen (1998) published a landmark book, The g Factor: The Science of Mental Ability. This now-classic work compiled all of the evidence available at the time on the existence of g, how to measure it, its practical importance, and genetic and environmental influences on people’s intelligence. Jensen also addressed many alternative interpretations of intelligence research and convincingly demonstrated that g theory was the best theory to explain the totality of the data on intelligence. These two books effectively ended many debates about intelligence among experts and got the field to focus on g⁵. One prominent psychologist explained the impact of these books by stating:

Verbal definitions of the intelligence concept have never been adequate or commanded consensus. Carroll’s (1993) Human Cognitive Abilities and Jensen’s (1998) The g Factor ... essentially solve the problem. Development of more sophisticated factor analytic methods than Spearman and Thurstone had makes it clear that there is a g factor, that it is manifested in either omnibus IQ tests or elementary cognitive tasks, that it is strongly hereditary, and that its influence permeates all areas of competence in human life. What remains is to find out what microanatomic or biochemical features of the brain are involved in the hereditable component of g. A century of research ... has resulted in a triumph of scientific psychology, the footdraggers being either uninformed, deficient in quantitative reasoning, or impaired by political correctness (Meehl, 2006, p. 435)².

Multiple Intelligences

One of the most famous proposed alternatives to g is the Multiple Intelligences (MI) theory proposed by Howard Gardner in 1983. Gardner starts with two premises: (1) there is no general trait of overall mental competence, and (2) there are a variety of different and unrelated intelligences. His original framework had seven intelligences, which were later increased to nine, covering linguistic, mathematical, spatial, musical, kinesthetic, interpersonal, intrapersonal, naturalistic, and existential intelligence. Despite the popularity of Gardner’s views (especially among educators), there is virtually no objective supporting evidence (Ferrero et al., 2021)¹⁵. Factor analyses invariably reveal a general factor rather than various statistically independent abilities. Science proceeds when it defines testable hypotheses, and so far, there are no measures of multiple intelligences that show independent factors consistent with his ideas.

Further details on why the MI framework isn’t well supported can be found here in the wiki.

Triarchic Theory of Intelligence

In his triarchic theory of intelligence, Robert Sternberg (1985) distinguished three classes of intelligence (analytic, creative, and practical) that he thought were mostly independent. However, this has been demonstrated to be false (Brody, 2003a¹⁶, 2003b¹⁷). Indeed, the predictive validity of these three types of intelligence is due to a common factor that is nearly indistinguishable from general intelligence (g).

For further reading, see Gottfredson's critical analysis (2003) of Sternberg's book Practical Intelligence in Everyday Life (2000). Replying to Sternberg's (2003) “Reply to Gottfredson”, Gottfredson concludes:

No one has yet demonstrated that practical intelligence rests on scientifically valid evidence or that it is even a useful construct. No one has yet shown that the various tests of tacit knowledge, the “important aspect” of practical intelligence, measure anything that is not already effectively captured by measures of personality, interests, cognitive abilities, specialized knowledge, and other well-studied human traits and competencies (Gottfredson, 2003).

Gould and The Mismeasure of Man

In The Mismeasure of Man (1981), the Harvard paleontologist Stephen J. Gould wrote a scathing analysis of virtually all aspects of intelligence research. He rejected the fundamental idea that intelligence is a meaningful term or can be quantified. He stated, incorrectly, that g was merely a statistical artifact of factor analysis. He concluded that there was no reliable evidence relating brain size to intelligence. When confronted with detailed technical refutations of his key points and even with new MRI-based data on brain size and intelligence⁵, Gould declined to correct his mistakes or modify his opinions in the second edition (1996) of his book. As Rushton (1997, p. 170)⁵ stated:

I know Gould is aware of them [the studies on the brain size–IQ relationship] because my colleagues and I routinely sent him copies as they appeared and asked him what he thought! For the record, let it be known that Gould did not reply to the missives regarding the published scientific data that destroyed the central thesis of his first edition [of The Mismeasure of Man].

Gould had previously achieved considerable public credibility as a commentator on science, so his views were widely accepted despite compelling negative reviews of his work in the technical literature (Jensen, 1982; Davis, 1983; Carroll, 1995). These negative reviews have been sustained and amplified by more recent critiques that include newer research (Warne et al., 2019; Lewis et al., 2011). Today, this book is still widely used in psychology courses to “debunk” intelligence research in general. The term “Gould effect” has been coined to describe the deliberate practice of creating false controversy about a scientific question (Woodley et al., 2018). Of The Mismeasure of Man, Arthur Jensen wrote¹⁸:

Instead of taking on the real issues of contemporary research in these fields, paleontologist Gould tilts at a museum collection of scientific fossils and at many a straw person of his own making. ... Present-day workers in these fields will have nothing to worry about!

...Of all the book's references, a full 27 percent precede 1900. Another 44 percent fall between 1900 and 1950 (60 percent of those are before 1925), and only 29 percent are more recent than 1950. From the total literature spanning more than a century, the few "bad apples" have been hand-picked most aptly to serve Gould's purpose. Yet what relevance to current issues in mental testing are the inadequacies and errors of early anatomical studies by Samuel Morton (who died in 1851) or Paul Broca (who died in 1880) concerning racial variation in cranial capacity (to which Gould devotes the better part of two chapters)? Who now wishes to resurrect Lombroso's (1836-1909) theory of physical criminal types...?

...Readers expecting to find a forthright critique of the present status of issues and controversies in these fields are in for disappointment. The closest thing they will find to criticism of contemporary mental testing is the insinuation of its guilt through remote historic lineage. ... Gould's exclusive critical focus on forebears (and the worst examples, at that) is much like trying to condemn the modern automobile by merely pointing out the faults of the Model T.

Jensen's review of The Mismeasure of Man, on which the next section draws heavily, is listed in the references (Jensen, 1982).

Reification

Gould charges the pioneer psychometricians with reifying g: they supposedly converted an abstract concept, intelligence, into a "single substance" (Gould's words) that occupies space inside the brain. The g factor as conceived by Spearman is variously referred to by Gould as "ineluctable, innate general intelligence," "innate essence of intelligence," a "hard, quantifiable thing," a "quantifiable fundamental particle," a "single, scalable, fundamental 'thing' residing in the human brain," "a 'thing' in the most direct, material sense," and so forth¹⁸. This language is completely misleading. More importantly, it is Gould's language, not that of Spearman or the other pioneers in the field, and it appears nowhere in the serious literature on factor analysis and intelligence.

Spearman himself remained agnostic about the physical basis of g. As is apparent in The Abilities of Man (1927), he was fully aware of the reification issue. Accordingly, he treated his notion of g as "mental energy" (the energy being metaphorical) as a theoretical attempt to account for the phenomenon that the g factor highlights and quantifies, namely the positive manifold. In this respect, Spearman made no apologies for hypothesizing causal mechanisms to explain g. Quite the contrary¹⁸:

[Psychology] is a science in its own right, and can no more fulfill this mission without hypotheses than a man can run properly with his legs tied in a sack. What would physics do without its electrons, its ether, or its heat, none of which are, or perhaps even can be, directly perceived? Indeed, there is no necessity for believing that such entities really exist at all. (p. 128)

In fact, what Gould has mistaken for "reification" is neither more nor less than the common practice in every science of hypothesizing explanatory models, constructs, or theories to account for the observed relationships within a given domain. Well-known examples include the Bohr atom, the electromagnetic field, gravitation, quarks, Mendelian genes, mass, velocity, and so forth. None of these constructs exists as a palpable entity occupying physical space, yet they are essential for scientific progress. The g factor, and theories attempting to explain g in terms of models independent of factor analysis itself, are essentially no different. Psychology has the same right as physics to hypothesize models (such as "mental energy" or "neural efficiency") to account for its data, even if those entities cannot be directly observed.

The Brain—Where Else?

While g is a unitary factor at the psychometric level, it likely results from multiple independent biological processes at the molecular or physiological level. Observing correlations between g and physical variables therefore does not mean one is "reifying" g as a single physical substance; it means rather that g is enmeshed in underlying biological systems. One of the best-known correlations is that between brain size and intelligence, which, when brain size is measured via brain-imaging techniques in living individuals, falls between r = .20 and r = .40⁵. However, brain size is certainly not the only (or the most important) biological variable correlated with IQ, nor does it fully explain why some people are smarter than others.
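One way to see why brain size cannot fully explain differences in intelligence is to square the correlation, which gives the share of variance the two variables have in common under a simple linear model. The sketch below is illustrative only and is not taken from the cited sources; the helper name `variance_explained` is hypothetical, and only the endpoints r = .20 and r = .40 come from the text.

```python
def variance_explained(r: float) -> float:
    """Return r squared, the share of variance two variables have in common
    (assumes a simple linear relationship between them)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie in [-1, 1]")
    return r * r


# Endpoints of the brain size-IQ correlation range quoted in the text.
reported_range = (0.20, 0.40)

# Round to two decimals to hide floating-point noise.
shares = {r: round(variance_explained(r), 2) for r in reported_range}

for r, share in shares.items():
    print(f"r = {r:.2f} -> shared variance = {share:.0%}")
```

Under this assumption, a correlation in the quoted range corresponds to roughly 4 to 16 percent of shared variance, leaving most of the variation to other factors.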

References

1. Haier, R. J., Colom, R., & Hunt, E. (2024). The Science of Human Intelligence (2nd ed.). Cambridge University Press.
2. Meehl, P. E. (2006). The power of quantitative thinking. In N. G. Waller, L. J. Yonce, W. M. Grove, D. Faust, & M. F. Lenzenweger (Eds.), A Paul Meehl reader: Essays on the practice of scientific psychology (pp. 433–444). Lawrence Erlbaum Associates Publishers.
3. Warne, R. T., & Burningham, C. (2019). Spearman’s g found in 31 non-Western nations: Strong evidence that g is a universal phenomenon. Psychological Bulletin, 145(3), 237–272. https://doi.org/10.1037/bul0000184
4. Chooi, W.-T., Long, H. E., & Thompson, L. A. (2014). The Sternberg Triarchic Abilities Test (Level-H) is a measure of g. Journal of Intelligence, 2(3), 56–67. https://doi.org/10.3390/jintelligence2030056
5. Warne, R. T. (2020). In the know: Debunking 35 myths about human intelligence. Cambridge University Press. https://doi.org/10.1017/9781108593298
6. Jensen, A. R. (1998). The g factor: The science of mental ability. Praeger. https://arthurjensen.net/wp-content/uploads/2020/04/The-g-factor-the-science-of-mental-ability-Arthur-R.-Jensen.pdf
7. Merz, Z. C., Van Patten, R., Hurless, N., Grant, A., & McGrath, A. B. (2019). Furthering the understanding of Wechsler Adult Intelligence Scale–Fourth Edition factor structure in a clinical sample. The Clinical Neuropsychologist, 33(1), 12–23. https://doi.org/10.1080/23279095.2019.1585351
8. Schneider, W. J., & McGrew, K. S. (2013). Cattell–Horn–Carroll (CHC) theory of cognitive abilities: Definitions (CHC v2.0). Institute for Applied Psychometrics. https://www.iapsych.com/chcv2.pdf
9. Strenze, T. (2015). Intelligence and socioeconomic success: A study of correlations, causes and consequences (Doctoral dissertation, University of Tartu). University of Tartu. https://dspace.ut.ee/server/api/core/bitstreams/6ea26618-56b2-43a0-8e4a-2586d117cac9/content
10. Gottfredson, L. S. (1997). Mainstream science on intelligence: An editorial with 52 signatories, history, and bibliography. Intelligence, 24(1), 13–23. https://www1.udel.edu/educ/gottfredson/reprints/1997mainstream.pdf
11. Gargus, J., & Haier, R. (2025). Toward a Molecular Biology of Human Intelligence: Psychometrics Meets Gene Expressions and Brain Metabolism. Intelligence & Cognitive Abilities, 1(2), 74–93. https://icajournal.scholasticahq.com/article/146520-toward-a-molecular-biology-of-human-intelligence-psychometrics-meets-gene-expressions-and-brain-metabolism
12. Deary, I. J., Cox, S. R., & Hill, W. D. (2022). Genetic variation, brain, and intelligence differences. Molecular Psychiatry, 27, 335–353. https://doi.org/10.1038/s41380-021-01027-y
13. Seidlitz, J., Váša, F., Shinn, M., Romero-Garcia, R., Whitaker, K. J., Vértes, P. E., Wagstyl, K., Kirkpatrick Reardon, P., Clasen, L., Liu, S., Messinger, A., Leopold, D. A., Fonagy, P., Dolan, R. J., Jones, P. B., Goodyer, I. M., NSPN Consortium, Raznahan, A., & Bullmore, E. T. (2018). Morphometric Similarity Networks Detect Microscale Cortical Organization and Predict Inter-Individual Cognitive Variation. Neuron, 97(1), 231–247.e7. https://doi.org/10.1016/j.neuron.2017.11.039
14. Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://pmc.ncbi.nlm.nih.gov/articles/PMC6107566/
15. Ferrero, M., Vadillo, M. A., & León, S. P. (2021). A valid evaluation of the theory of multiple intelligences is not yet possible: Problems of methodological quality for intervention studies. Intelligence, 88, 101566. https://doi.org/10.1016/j.intell.2021.101566
16. Brody, N. (2003). Construct validation of the Sternberg Triarchic Abilities Test: Comment and reanalysis. Intelligence, 31(4), 319–329. https://emilkirkegaard.dk/en/wp-content/uploads/1.-Construct-validation-of-the-Sternberg-Triarchic-Abilities-Test-Comment-and-reanalysis.pdf
17. Brody, N. (2003). What Sternberg should have concluded. Intelligence, 31(4), 339–342. https://humanvarietiesdotorg.wordpress.com/wp-content/uploads/2013/03/brody-what-sternberg-should-have-concluded.pdf
18. Jensen, A. (1982). The debunking of scientific fossils and straw persons. Contemporary Education Review, 1(2), 121–135. https://gwern.net/doc/iq/1982-jensen-3.pdf