The Big Five Personality Test in Hiring: Validity, Limits, and Practical Use

The Big Five personality model — sometimes called the Five Factor Model or OCEAN — is the dominant framework academic personality psychology has converged on over the last forty years. It maps personality onto five broad dimensions: openness, conscientiousness, extraversion, agreeableness, and neuroticism. For hiring teams deciding whether to use it as a signal in their candidate evaluation, the question isn’t whether the model is real (it is, with strong empirical support), but where it actually predicts performance and where it doesn’t.

This article walks through what the Big Five measures, what decades of hiring research tell us about its predictive validity, where it falls short, and how it compares to the more popular MBTI. By the end you should have a clear sense of whether — and how — to fold a Big Five assessment into your hiring loop, plus what to be skeptical of when a vendor pitches you a personality test.

Data Notice: Validity coefficients, meta-analytic estimates, and personality-research findings cited here reflect peer-reviewed sources available at time of writing. Effect sizes vary across job families, measurement instruments, and cultural contexts; consult primary sources and an industrial-organizational psychologist before deploying any assessment in a high-stakes hiring decision.

What the Big Five model is

The Big Five emerged from decades of factor analysis on natural-language personality descriptors. Researchers — most prominently Lewis Goldberg in the lexical-hypothesis tradition, and Paul Costa and Robert McCrae in their work on the NEO inventories — repeatedly found that nearly all the words people use to describe personality cluster into five broad factors (McCrae & John, 1992):

Openness to experience — curiosity, imagination, preference for novelty over routine, intellectual engagement.
Conscientiousness — organization, persistence, deliberateness, self-discipline.
Extraversion — sociability, assertiveness, positive activity, comfort being the center of attention.
Agreeableness — cooperation, trust, prosocial concern, compliance with social norms.
Neuroticism — emotional volatility, anxiety, vulnerability to stress.

Each factor lives on a continuum, not a category. A candidate’s score on extraversion isn’t “extraverted versus introverted” — it’s a position somewhere along the axis, with most people clustered toward the middle. The five factors are largely independent: high conscientiousness doesn’t predict high or low extraversion. That independence is why a single Big Five profile carries five distinct pieces of information rather than collapsing into one global “personality score”.

Why the Big Five became the consensus framework

The Big Five wasn’t designed by one researcher and adopted on authority. It was the structure that kept emerging, independently, when different research teams ran factor analyses on personality data — across self-report, peer-rating, and observer-rating designs (Costa & McCrae, 1992). The five-factor structure has been replicated across more than fifty languages and cultures (Schmitt et al., 2007), with the strongest cross-cultural stability for conscientiousness and extraversion and weaker stability for openness in some non-Western samples.

Twin and adoption studies suggest each Big Five factor has substantial genetic heritability — typically estimated at ~40–60% (Bouchard & Loehlin, 2001) — but personality is not fixed. Mean-level changes are observable across the lifespan: people tend to gain conscientiousness and agreeableness, and lose neuroticism, as they age into and through middle adulthood (Roberts et al., 2006). For hiring, this matters because a candidate’s Big Five profile at 25 is not their profile at 40.

What the Big Five predicts at work

The most cited workplace finding is that conscientiousness predicts job performance across nearly every job family studied. Barrick and Mount’s (1991) meta-analysis of 117 studies found that conscientiousness was the only Big Five factor that predicted performance across all five occupational groups they examined (professionals, police, managers, sales, skilled/semi-skilled), with a corrected validity coefficient of about 0.22. Subsequent meta-analyses have replicated this with modest variation (Salgado, 1997; Hurtz & Donovan, 2000).

Other Big Five factors are more job-specific:

Extraversion predicts performance in sales and management roles where social influence is the work, with a corrected validity of about 0.15 in those contexts but near zero in roles where sociability isn’t job-relevant (Barrick & Mount, 1991).
Emotional stability (low neuroticism) predicts performance in high-stress and safety-sensitive roles. Effect sizes are modest (~0.13) but consistent across studies (Salgado, 1997).
Agreeableness predicts performance in roles requiring teamwork and customer-facing cooperation, but its relationship with performance is weakly negative in roles requiring tough negotiation or competitive performance (Mount, Barrick & Stewart, 1998).
Openness to experience predicts training proficiency and adaptability in roles facing significant change, but is the weakest general performance predictor (Barrick, Mount & Judge, 2001).

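These are corrected validity coefficients, meaning they have been statistically adjusted for artifacts such as measurement unreliability and range restriction. Uncorrected coefficients (the raw correlations observed in the field) are typically smaller. The numbers are also averages; individual roles vary substantially.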
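To make the corrected-versus-raw distinction concrete, here is a minimal sketch of the classical correction for attenuation due to an unreliable performance criterion. The observed correlation and the criterion reliability below are illustrative assumptions, not figures taken from the cited meta-analyses, and published meta-analyses typically apply further corrections (for example, for range restriction) that this sketch leaves out.

```python
import math


def correct_for_criterion_unreliability(r_observed: float,
                                        criterion_reliability: float) -> float:
    """Classical disattenuation: divide the observed correlation by the
    square root of the reliability of the performance measure."""
    if not 0.0 < criterion_reliability <= 1.0:
        raise ValueError("criterion_reliability must be in (0, 1]")
    return r_observed / math.sqrt(criterion_reliability)


def variance_explained(r: float) -> float:
    """Share of performance variance associated with the predictor (r squared)."""
    return r ** 2


# Illustrative inputs (assumptions, not values from the article's sources).
r_obs = 0.15   # raw correlation between trait score and rated performance
r_yy = 0.52    # assumed reliability of the performance ratings

r_corrected = correct_for_criterion_unreliability(r_obs, r_yy)
print(f"observed r = {r_obs:.2f}, corrected r = {r_corrected:.2f}")

# Even a corrected validity of 0.22 corresponds to under 5% of variance.
print(f"variance explained at r = 0.22: {variance_explained(0.22):.1%}")
```

The second print is a useful sanity check when reading vendor claims: a validity coefficient in the low 0.20s is a real but small signal, which is why the rest of this article treats personality as one input among several.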

What the Big Five does NOT predict

The Big Five does not reliably predict three categories of important hiring outcomes:

Culture fit and team dynamics. Personality describes individual dispositions; team performance depends on the combination of dispositions, the team’s task structure, and the leadership context. A candidate who scores in the “high-fit” zone for one team can be a poor fit on a different team three months later (Morgeson et al., 2007).
Ethics, integrity, and counterproductive behavior. These are better captured by purpose-built integrity assessments than by Big Five inferences. Conscientiousness correlates with reduced counterproductive workplace behavior, but the correlation is too weak to substitute for direct measurement (Sackett & Wanek, 1996).
Current performance state. The Big Five measures trait-level personality: what the candidate is like across time and situations. It doesn’t capture motivation, current job satisfaction, or whether they’re considering leaving the role they’d be hired for. A candidate with high conscientiousness can still underperform if they’re disengaged from the actual job.

The trait-versus-state distinction matters most for senior hires. Personality predicts what someone is capable of bringing to work; it doesn’t predict whether they will, in the specific role you’re hiring for, in the specific organization you have.

Big Five vs MBTI

This is the comparison hiring teams ask about most often. The short version: the Big Five has decades of peer-reviewed validity evidence; the MBTI does not.

Three specific issues with the MBTI as a hiring tool:

Forced typing. MBTI sorts respondents into one of 16 four-letter types (“INTJ”, “ENFP”). Real personality is continuous, and a candidate scoring 51% on the introversion-extraversion axis gets labeled the same as one scoring 99% — but those are not the same candidate. Big Five preserves the continuum (Pittenger, 1993).
Test-retest reliability. A meaningful percentage of MBTI respondents get a different four-letter type when retested even a few weeks apart, especially when their scores fall near the dichotomy boundaries. Big Five scores are substantially more stable across retest (Boyle, 1995).
Factor structure. Independent factor analyses of MBTI items don’t cleanly recover the four MBTI dimensions. They tend to recover something closer to the Big Five (Stein & Swan, 2019). The MBTI’s own test publishers acknowledge it isn’t designed for personnel selection and recommend against using it for hiring decisions — a stance routinely ignored by HR teams who use it anyway.
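The forced-typing and retest problems above can be illustrated with a small sketch. The 0–100 score scale, the cutoff of 50, and the function name are hypothetical choices for illustration; they are not the MBTI's actual scoring procedure.

```python
def to_type_letter(extraversion_pct: float, cutoff: float = 50.0) -> str:
    """Collapse a continuous 0-100 score into a two-way label
    (hypothetical cutoff, for illustration only)."""
    return "E" if extraversion_pct >= cutoff else "I"


# Three hypothetical candidates with continuous extraversion scores.
candidates = {"A": 51.0, "B": 99.0, "C": 49.0}
labels = {name: to_type_letter(score) for name, score in candidates.items()}
print(labels)

# A and B share a label despite a 48-point gap, while A and C get
# different labels despite a 2-point gap.
gap_same_label = abs(candidates["A"] - candidates["B"])
gap_diff_label = abs(candidates["A"] - candidates["C"])
print(gap_same_label, gap_diff_label)

# A small retest fluctuation near the boundary flips the label.
retest_label_for_a = to_type_letter(candidates["A"] - 3.0)
print(labels["A"], "->", retest_label_for_a)
```

Keeping the continuous score avoids both problems: a 3-point shift on retest stays a 3-point shift instead of becoming a different type.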

For hiring use, the Big Five is the defensible choice. The MBTI’s popularity comes from its accessible four-letter labels, not from its predictive validity.

Practical considerations and limits

Five things to keep in mind before building Big Five into a hiring loop:

Self-report is gameable. Candidates motivated to look good will shift their answers toward the socially desirable end (high conscientiousness, high agreeableness, low neuroticism). Mitigation: use validated instruments with embedded social-desirability scales, and combine personality with behavioral evidence.
Adverse impact. Big Five factors generally show smaller demographic group differences than cognitive ability tests, but they’re not zero. Conscientiousness shows roughly half the standardized mean difference of cognitive tests across major demographic groups (Hough & Oswald, 2008). Validate locally before high-stakes use.
Recency. Personality drifts slowly over the lifespan. A score from three years ago is still meaningful but should be flagged as such; AIEH applies a roughly 5-year half-life decay curve to personality scores on the Skills Passport (see how scoring works for the full decay model).
Integration with structured interviews. Personality should be one signal in a multi-method assessment — combined with structured behavioral interviews, work samples, and (where role-relevant) cognitive ability — not a standalone hiring criterion (Schmidt & Hunter, 1998).
Legal landscape. Use of personality assessment in hiring is regulated in some jurisdictions and triggers EEOC guidance in the US. Consult employment counsel before deploying any test, especially in high-stakes selection contexts.
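On the recency point above, a half-life decay curve is easy to sketch in general terms. The code below shows generic exponential half-life weighting under the assumption of a 5-year half-life; it is not AIEH's actual scoring code, and the function name and the idea of using the result as a weight on an older score are illustrative assumptions.

```python
def recency_weight(age_years: float, half_life_years: float = 5.0) -> float:
    """Exponential half-life weighting: the weight halves every
    `half_life_years` years since the score was measured."""
    if age_years < 0:
        raise ValueError("age_years must be non-negative")
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    return 0.5 ** (age_years / half_life_years)


for age in (0, 3, 5, 10):
    print(f"score age {age:>2} years -> weight {recency_weight(age):.2f}")
```

Under these assumptions a three-year-old score keeps roughly two thirds of its weight, which matches the guidance that such a score is still meaningful but should be flagged as dated.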

How AIEH uses the Big Five

AIEH’s Big Five sample test uses the Mini-IPIP item bank (Donnellan et al., 2006) — a 20-item public-domain abbreviated form of Goldberg’s IPIP. The free Big Five sample is deliberately ultra-short (5 items, one per trait) and frames itself as a directional indicator, not a calibrated score. Anyone using it for a real hiring decision is misusing it; the sample exists to give respondents a feel for the construct before committing to the full assessment.
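For readers unfamiliar with how short Likert-style inventories of this kind are scored, here is a minimal sketch. The item identifiers, the responses, and which items are reverse-keyed are all hypothetical; they are not the actual Mini-IPIP key, and this is not AIEH's scoring code.

```python
from statistics import mean

LIKERT_MIN, LIKERT_MAX = 1, 5  # 5-point agreement scale


def score_trait(responses: dict[str, int], reverse_keyed: set[str]) -> float:
    """Average the 1-5 responses for one trait after flipping
    reverse-keyed items (so 1 becomes 5, 2 becomes 4, and so on)."""
    adjusted = []
    for item_id, value in responses.items():
        if not LIKERT_MIN <= value <= LIKERT_MAX:
            raise ValueError(f"response for {item_id} is out of range: {value}")
        if item_id in reverse_keyed:
            value = LIKERT_MIN + LIKERT_MAX - value
        adjusted.append(value)
    return mean(adjusted)


# Hypothetical item ids and responses for a four-item extraversion scale.
extraversion_responses = {"e1": 4, "e2": 2, "e3": 5, "e4": 1}
extraversion_score = score_trait(extraversion_responses,
                                 reverse_keyed={"e2", "e4"})
print(extraversion_score)
```

With only a handful of items per trait, a single response moves the mean a lot, which is one reason very short forms are better treated as directional indicators than as calibrated scores.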

The full Skills Passport assessment uses the IPIP-NEO 120-item form, which produces calibrated trait scores comparable to commercial NEO-PI-R results without the licensing fees. Each trait score is mapped onto AIEH’s 300–850 scale alongside scores from other test families (technical, cognitive, communication) — see how scoring works for the weighting and decay methodology.

For a worked example of a role where Big Five contributes to the recommended-test bundle, see the AI Product Manager role page — it lists Communication and ACL (prompt-to-spec translation) as the highest-relevance assessments, with Big Five contributing a smaller but non-trivial signal on conscientiousness and openness for that role family.

For a deeper dive into one specific Big Five sample item — what it measures and why the wording is diagnostic — see the explainer for the “life of the party” stem.

Takeaway

The Big Five is the right framework for hiring teams adding personality assessment to their pipeline — it has the strongest validity evidence, the cleanest construct structure, and the lowest adverse-impact profile of any major personality model. Conscientiousness is the most useful single signal it produces; the other four factors are role-conditional.

What it isn’t is a hiring decision. Use it as one input in a multi-method assessment alongside cognitive ability (where role-relevant), structured behavioral interviews, and direct work samples. Treat any vendor pitch that frames personality as the primary selection signal with appropriate skepticism — the research doesn’t support that framing, and your hiring loop will be more accurate if personality is one signal among several.

Sources

Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44(1), 1–26.
Barrick, M. R., Mount, M. K., & Judge, T. A. (2001). Personality and performance at the beginning of the new millennium. International Journal of Selection and Assessment, 9(1-2), 9–30.
Bouchard, T. J., & Loehlin, J. C. (2001). Genes, evolution, and personality. Behavior Genetics, 31(3), 243–273.
Boyle, G. J. (1995). Myers-Briggs Type Indicator (MBTI): Some psychometric limitations. Australian Psychologist, 30(1), 71–74.
Costa, P. T., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory professional manual. Psychological Assessment Resources.
Donnellan, M. B., Oswald, F. L., Baird, B. M., & Lucas, R. E. (2006). The mini-IPIP scales: Tiny-yet-effective measures of the Big Five factors of personality. Psychological Assessment, 18(2), 192–203.
Hough, L. M., & Oswald, F. L. (2008). Personality testing and industrial-organizational psychology: Reflections, progress, and prospects. Industrial and Organizational Psychology, 1(3), 272–290.
Hurtz, G. M., & Donovan, J. J. (2000). Personality and job performance: The Big Five revisited. Journal of Applied Psychology, 85(6), 869–879.
McCrae, R. R., & John, O. P. (1992). An introduction to the five-factor model and its applications. Journal of Personality, 60(2), 175–215.
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N. (2007). Reconsidering the use of personality tests in personnel selection contexts. Personnel Psychology, 60(3), 683–729.
Mount, M. K., Barrick, M. R., & Stewart, G. L. (1998). Five-factor model of personality and performance in jobs involving interpersonal interactions. Human Performance, 11(2-3), 145–165.
Pittenger, D. J. (1993). The utility of the Myers-Briggs Type Indicator. Review of Educational Research, 63(4), 467–488.
Roberts, B. W., Walton, K. E., & Viechtbauer, W. (2006). Patterns of mean-level change in personality traits across the life course. Psychological Bulletin, 132(1), 1–25.
Sackett, P. R., & Wanek, J. E. (1996). New developments in the use of measures of honesty, integrity, conscientiousness, dependability, trustworthiness, and reliability for personnel selection. Personnel Psychology, 49(4), 787–829.
Salgado, J. F. (1997). The five factor model of personality and job performance in the European community. Journal of Applied Psychology, 82(1), 30–43.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin, 124(2), 262–274.
Schmitt, D. P., Allik, J., McCrae, R. R., & Benet-Martínez, V. (2007). The geographic distribution of Big Five personality traits. Journal of Cross-Cultural Psychology, 38(2), 173–212.
Stein, R., & Swan, A. B. (2019). Evaluating the validity of Myers-Briggs Type Indicator theory. Social and Personality Psychology Compass, 13(2), e12434.