Quality-of-Hire Measurement: Composite Metrics and Lagging vs Leading Indicators

Quality of hire is the metric every talent leader claims to measure and almost no organization measures well. The Sullivan & Burnett framework and the Boudreau & Ramstad treatment have been available for years, but practical implementation remains rare because the measurement is genuinely hard: performance ratings drift, retention is multi-causal, and managerial-satisfaction surveys are noisy. The default substitute, time-to-fill plus offer-acceptance rate, is operationally tractable but measures the speed and efficiency of the hiring process, not the quality of the hire. This measurement gap is arguably the most consequential blind spot in talent acquisition.

This article frames quality of hire as a composite measurement problem, walks through the lagging and leading indicators that should compose the metric, examines the empirical evidence on which combinations work, covers a practical implementation workflow, and addresses how AIEH’s portable credential infrastructure changes the upstream signal that quality of hire ultimately measures.

Data Notice: Quality-of-hire measurement frameworks vary across the published literature. Numbers cited reflect Sullivan & Burnett, Boudreau & Ramstad, Cappelli, and Schmidt & Hunter at time of writing. Projected effect sizes are marked with ~ and reflect modeled estimates rather than empirical measurements from any specific organization.

The measurement problem

Quality of hire is conceptually simple: did the hire produce the value the role was supposed to produce? The measurement challenges arise because:

Outcomes are multi-causal. A new hire’s performance is shaped by their capability, the manager, the team, the resources, the strategy. Attributing outcomes to hiring quality versus context is not straightforward.
Outcomes lag. Performance ratings, retention milestones, and promotion decisions emerge over 6–24 months. Talent acquisition decisions made today don’t get measurable outcomes until well after the next hiring cycle.
Outcomes are noisy. Performance ratings have known reliability problems. Manager-satisfaction surveys reflect manager-candidate fit as much as candidate quality. Retention is shaped by external labor-market shifts.
Outcomes are partially observable. When a hire performs poorly, the organization sees the cost. When a hire performs strongly, the organization rarely measures the counterfactual (“how strongly would the alternative candidate have performed”).

The measurement framework that actually captures hire quality acknowledges these challenges rather than pretending they don’t exist. The Sullivan & Burnett approach explicitly composes multiple imperfect indicators into a composite that performs better than any single indicator.

For broader hiring-cost-economics framing, see hiring cost economics.

Lagging indicators

Lagging indicators measure outcomes after they occur. The canonical lagging indicators for quality of hire:

6-month performance rating. Manager-assigned rating at the first review milestone. Reliable enough to be useful; noisy enough that it shouldn’t be the only indicator.
12-month performance rating. Stronger signal because the ramp period is over and the manager has more evidence. The drawback is the long feedback loop.
18–24 month retention. Whether the hire is still in role at the milestone. Strong signal that aggregates many intermediate outcomes (performance, fit, culture, growth).
Promotion velocity. Whether the hire reaches the next level on the expected timeline. Signals not just performance but also growth potential.
First-year turnover. Hires departing within ~12 months represent quality-of-hire failures (or onboarding failures, which the talent function still partially owns). Rising first-year turnover is an early warning sign that hiring quality has deteriorated.
Manager-satisfaction at 6 months. Forced-distribution manager rating of “would hire again” calibrated against peer hires. Less subjective than free-form rating but still noisy.

Lagging indicators are the closest available approximation to ground truth for hire quality, but their lag makes them unsuitable for in-cycle decision-making. Leading indicators are needed to provide early signal.
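As an illustration of how one lagging indicator might be computed, the sketch below calculates first-year turnover from hire and termination dates. The function name, the tuple layout, and the 365-day window are assumptions made for this example, not part of any cited framework. The one design point worth noting is that hires who have not yet been employed for the full window are excluded, since counting them as "retained" would understate turnover.

```python
from datetime import date, timedelta

def first_year_turnover(hires, as_of, window_days=365):
    """Share of eligible hires who left within `window_days` of their hire date.

    `hires` is a list of (hire_date, termination_date_or_None) tuples
    (a hypothetical layout for this sketch).
    """
    eligible = 0
    departed = 0
    for hire_date, term_date in hires:
        # Skip hires who have not yet had a full window to be observed.
        if hire_date + timedelta(days=window_days) > as_of:
            continue
        eligible += 1
        if term_date is not None and (term_date - hire_date).days < window_days:
            departed += 1
    return departed / eligible if eligible else None

# Illustrative data only.
hires = [
    (date(2023, 1, 9), None),                 # still employed
    (date(2023, 2, 6), date(2023, 9, 1)),     # left after about 7 months
    (date(2023, 3, 13), date(2024, 8, 30)),   # left after about 17 months
    (date(2024, 11, 4), None),                # too recent to evaluate
]
rate = first_year_turnover(hires, as_of=date(2025, 1, 1))
print(f"First-year turnover: {rate:.1%}")
```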

Leading indicators

Leading indicators correlate with lagging outcomes but produce signal earlier in the timeline:

30-day onboarding milestones. Whether the hire completed the onboarding plan and reached the documented 30-day capability targets. Modest correlation with 6-month rating.
First-90-day deliverables. Did the hire produce the expected first-90-day work product? Stronger correlation with later performance than 30-day milestones.
Peer feedback at 90 days. Structured peer feedback is meaningfully predictive of 6–12 month outcomes when conducted with rubric grounding rather than free-form.
Hiring panel calibration. Did the candidate’s actual early performance match the predicted performance from the hiring loop? Tracking the calibration error across hires identifies systematic over- or under-confidence in the assessment process.
Time to first independent contribution. A measurable capability milestone — a first ship, a first independent decision, a first customer interaction handled solo. The time-to-milestone correlates with later productivity.

Leading indicators have weaker individual signal than lagging indicators but compose into a usable in-cycle quality measure. The Sullivan & Burnett framework explicitly recommends composite measurement that combines leading and lagging indicators with weights tuned to the organization’s signal-to-noise profile.
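The hiring panel calibration indicator above can be made concrete with a small calculation. The sketch below assumes that both the panel's prediction and the observed early performance are recorded on the same rating scale (a 1 to 5 scale here, chosen only for illustration). A positive mean error suggests the panel is systematically over-confident; the mean absolute error shows how far off predictions are regardless of direction.

```python
from statistics import mean

def calibration_summary(pairs):
    """Summarize prediction error for (predicted, actual) rating pairs."""
    errors = [predicted - actual for predicted, actual in pairs]
    return {
        # Signed error: positive means the panel over-predicted on average.
        "mean_error": mean(errors),
        # Unsigned error: typical size of a miss in either direction.
        "mean_abs_error": mean(abs(e) for e in errors),
    }

# Illustrative (predicted, actual) ratings on an assumed 1-5 scale.
pairs = [(4, 3), (3, 3), (5, 4), (4, 4), (3, 4)]
summary = calibration_summary(pairs)
print(summary)
```

Tracking these two numbers per panel or per assessment method over time is one way to spot the systematic over- or under-confidence described above.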

For broader onboarding-design coverage that overlaps with leading indicators, see onboarding design evidence.

Composite measurement frameworks

Three composite frameworks recur in the literature:

Sullivan & Burnett quality-of-hire composite

A weighted combination of: 6-month performance rating (~25%), 12-month performance rating (~25%), 18-month retention (~20%), manager-satisfaction at 6 months (~15%), and 90-day peer-feedback rating (~15%). The composite is calculated for each hire and aggregated to a per-recruiter, per-hiring-manager, and per-source quality score.
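A minimal sketch of this weighted composite follows. The weights are the approximate values listed above; the indicator key names, the assumption that every indicator has already been normalized to a 0 to 1 scale, and the choice to renormalize weights over whichever indicators are available so far are all assumptions of this example rather than part of the published framework.

```python
# Approximate weights from the composite described above.
WEIGHTS = {
    "rating_6mo": 0.25,
    "rating_12mo": 0.25,
    "retained_18mo": 0.20,
    "manager_sat_6mo": 0.15,
    "peer_feedback_90d": 0.15,
}

def composite_score(indicators, weights=WEIGHTS):
    """Weighted average of normalized (0-1) indicators for one hire.

    Indicators that are missing or None (not yet observed) are dropped and
    the remaining weights are renormalized. That is an assumption of this
    sketch; an organization may prefer to wait for complete data instead.
    """
    available = {
        name: value
        for name, value in indicators.items()
        if name in weights and value is not None
    }
    if not available:
        return None
    total_weight = sum(weights[name] for name in available)
    weighted_sum = sum(weights[name] * value for name, value in available.items())
    return weighted_sum / total_weight

# A hire with all five indicators observed.
full = composite_score({
    "rating_6mo": 0.8,
    "rating_12mo": 0.9,
    "retained_18mo": 1.0,
    "manager_sat_6mo": 0.7,
    "peer_feedback_90d": 0.6,
})

# A recent hire with only the early indicators observed.
early = composite_score({
    "rating_6mo": 0.8,
    "rating_12mo": None,
    "retained_18mo": None,
    "manager_sat_6mo": 0.7,
    "peer_feedback_90d": 0.6,
})
print(round(full, 3), round(early, 3))
```

Scores built from partial data are not directly comparable to complete ones, so it may be worth flagging them separately in any report.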

Boudreau & Ramstad utility framework

A more explicitly economic framing: each hire’s quality is measured as the dollar-value contribution above expected baseline performance. The framework requires utility estimation for each role family (typically published in selection-research literature). The output is comparable across roles in dollar terms but requires more measurement infrastructure.
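One way to express "dollar-value contribution above expected baseline" is to standardize a hire's performance against the role-family baseline and multiply by an estimate of the dollar value of one standard deviation of performance. The sketch below is a simplified illustration of that idea, not the authors' own procedure. The function and parameter names are hypothetical, and the 0.40 salary fraction used to estimate the dollar standard deviation is a placeholder assumption that each organization would need to replace with its own estimate.

```python
def dollar_value_above_baseline(rating, baseline_mean, baseline_sd, sd_y_dollars):
    """Estimated annual dollar contribution relative to a baseline performer.

    rating, baseline_mean, baseline_sd: performance on the role family's scale.
    sd_y_dollars: assumed dollar value of one standard deviation of performance.
    """
    if baseline_sd <= 0:
        raise ValueError("baseline_sd must be positive")
    z = (rating - baseline_mean) / baseline_sd  # standardized performance
    return z * sd_y_dollars

# Illustrative inputs only.
salary = 100_000
sd_y = 0.40 * salary  # placeholder assumption, not a figure from the source
value = dollar_value_above_baseline(
    rating=4.0, baseline_mean=3.5, baseline_sd=0.5, sd_y_dollars=sd_y
)
print(f"Estimated contribution above baseline: ${value:,.0f}")
```

Because the output is in dollars, results for different role families can be placed side by side, which is the comparability advantage described above. The cost is that every role family needs its own baseline and dollar-value estimate.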

Multi-stakeholder rating composite

A weighted average of hiring-manager rating, peer rating, and self-rating at 6 and 12 months. The multi-stakeholder approach reduces single-rater bias but introduces complexity in measurement collection.

The choice of framework depends on organizational maturity. Most organizations should start with the Sullivan & Burnett composite because the inputs are operationally feasible. The Boudreau & Ramstad utility framework is appropriate for organizations that already have infrastructure for role-level performance measurement. The multi-stakeholder composite works well when 360-feedback infrastructure already exists.

Practical implementation workflow

A workable quality-of-hire measurement program has six stages:

Define the composite. Choose the indicators, the weights, and the calculation method. Document the definition in a written artifact that survives leadership transitions.
Instrument data collection. Most organizations already have the input data scattered across HRIS, ATS, performance-management, and survey systems. Joining the data is the implementation work.
Establish baselines. Calculate the composite for the most recent ~12 months of hires to establish baseline distributions. Without baselines, the metric is directionless.
Segment by source and decision-maker. Quality of hire should be broken out by recruiter, hiring manager, sourcing channel, and assessment method used. The segmentation produces actionable insight; aggregate numbers don’t.
Close the feedback loop. Quality-of-hire data flows back to the hiring decision-makers — recruiters see their per-hire scores, hiring managers see theirs, sourcing channels are evaluated. The feedback loop is what makes the measurement program improve outcomes rather than just producing reports.
Periodic recalibration. Annual review of weights and indicator selection based on which indicators are most predictive in the organization’s actual data. The weights aren’t permanent; they should evolve.

The implementation work is the expensive part of the program. Choosing the framework is straightforward; instrumenting the data, establishing baselines, and closing the feedback loop take ~2–4 quarters of disciplined effort.
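The baseline and segmentation stages can be sketched in a few lines once composite scores exist per hire. The record layout, field names, and scores below are hypothetical; in practice the records would come from joining HRIS, ATS, and performance-management data. The sketch computes an overall baseline, then reports each segment's gap from that baseline.

```python
from collections import defaultdict
from statistics import mean

def segment_scores(hires, key):
    """Group composite scores by a segmentation field and summarize each group."""
    groups = defaultdict(list)
    for hire in hires:
        groups[hire[key]].append(hire["score"])
    return {
        segment: {"n": len(scores), "mean": mean(scores)}
        for segment, scores in groups.items()
    }

def gap_from_baseline(segments, baseline_mean):
    """Difference between each segment's mean score and the overall baseline."""
    return {seg: stats["mean"] - baseline_mean for seg, stats in segments.items()}

# Illustrative records only.
hires = [
    {"source": "referral", "recruiter": "r1", "score": 0.82},
    {"source": "referral", "recruiter": "r2", "score": 0.78},
    {"source": "job_board", "recruiter": "r1", "score": 0.61},
    {"source": "job_board", "recruiter": "r2", "score": 0.67},
    {"source": "agency", "recruiter": "r1", "score": 0.72},
]

baseline = mean(h["score"] for h in hires)
by_source = segment_scores(hires, "source")
gaps = gap_from_baseline(by_source, baseline)
for source, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{source:10s} n={by_source[source]['n']}  gap={gap:+.2f}")
```

The same function works for any other field, for example `segment_scores(hires, "recruiter")`. Segment sizes are reported alongside the means because a gap based on a handful of hires is weak evidence on its own.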

Common pitfalls

Several pitfalls dominate quality-of-hire programs:

Single-indicator measurement. Using only 6-month performance rating, or only first-year retention, produces a noisy and gameable metric. The composite is essential.
No baseline. Reporting “quality of hire is 7.2 out of 10” without context is meaningless. The baseline distribution makes the metric interpretable.
No segmentation. Aggregate quality-of-hire numbers hide the variance that matters — which sourcing channels work, which hiring managers underperform, which assessment methods produce the best signal.
Weak feedback loop. Quality data that doesn’t reach the decision-makers doesn’t change behavior. The feedback loop is the mechanism by which measurement improves outcomes.
Static weights. Composite weights set at program launch and never revisited drift away from what the organization’s actual data supports.
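To avoid static weights, a recalibration pass can check how strongly each indicator tracks a later outcome in the organization's own data. The sketch below uses one simple heuristic: compute the Pearson correlation of each indicator with the outcome, floor negative correlations at zero, and rescale so the weights sum to one. This heuristic, the function names, and the data are assumptions of this example; a regression-based approach or expert review may be more appropriate, especially with small samples.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)

def recalibrated_weights(indicator_columns, outcome):
    """Weights proportional to each indicator's non-negative correlation."""
    raw = {
        name: max(0.0, pearson(column, outcome))
        for name, column in indicator_columns.items()
    }
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no indicator is positively correlated with the outcome")
    return {name: value / total for name, value in raw.items()}

# Illustrative data: one value per hire, outcome observed later.
outcome = [1, 2, 3, 4, 5]
indicators = {
    "peer_feedback_90d": [2, 4, 6, 8, 10],   # tracks the outcome closely
    "onboarding_30d": [5, 4, 3, 2, 1],       # moves against the outcome
    "deliverables_90d": [1, 3, 2, 5, 4],     # tracks it moderately
}
weights = recalibrated_weights(indicators, outcome)
print({name: round(w, 3) for name, w in weights.items()})
```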

The Cappelli HBR critique of hiring measurement identifies the weak feedback loop as the dominant pitfall. Most organizations measure quality of hire in some form; few flow the data back to decisions effectively.

AIEH portable credentials and upstream signal

Quality-of-hire measurement is bounded by the quality of the upstream hiring signal. If the hiring loop produces noisy predictions, no amount of downstream measurement sophistication will produce reliably high quality of hire. The leverage is partly in measurement and substantially in the upstream signal that drives selection decisions.

AIEH’s Skills Passport infrastructure improves the upstream signal in two ways:

Calibrated multi-pillar evidence. Skills Passport composite scores aggregate cognitive, domain, AI fluency, and communication evidence on a calibrated scale. The selection-research literature consistently shows that calibrated multi-source evidence outperforms single-source or uncalibrated evidence on predictive validity.
Aggregated multi-vendor signal. Candidates who arrive with passport evidence aggregating multiple past assessments produce more reliable predictions than candidates assessed via a single in-process test. The aggregation effect is statistically meaningful at scale.
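The general statistical intuition behind the aggregation claim can be illustrated with the Spearman-Brown formula from classical test theory, which gives the reliability of an average of k measurements. The sketch assumes the measurements are parallel (equal reliability, independent errors), which real assessments from different vendors will only approximate. The single-assessment reliability of 0.6 is an illustrative value, and nothing here describes how the Skills Passport itself is scored.

```python
def spearman_brown(single_reliability, k):
    """Reliability of the mean of k parallel measurements."""
    if not 0.0 <= single_reliability <= 1.0:
        raise ValueError("single_reliability must be between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    r = single_reliability
    return (k * r) / (1 + (k - 1) * r)

# Illustrative: how reliability grows as more assessments are aggregated.
for k in (1, 2, 3, 5):
    print(k, round(spearman_brown(0.6, k), 3))
```

The gains shrink with each additional measurement, so most of the benefit under these assumptions comes from the first few aggregated assessments.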

For the underlying credential mechanics, see what is the skills passport and the scoring methodology. For broader skills-based hiring research, see skills-based hiring evidence.

The projected effect on quality of hire is ~10–15% improvement in 6–12 month performance-rating outcomes for hires made with substantial passport evidence relative to hires made without it, holding role family and hiring manager constant. The effect is largest for technical roles where calibrated skills evidence is most diagnostic; it is smaller for senior-leadership roles where assessment depth is dominated by judgment-and-influence factors that passports don’t fully capture.

Takeaway

Quality of hire is the metric talent leaders claim to measure and rarely measure well. Workable programs use composite indicators (Sullivan & Burnett style) combining leading and lagging signals, establish baselines, segment by decision-maker and source, and close the feedback loop to actual hiring decisions. The measurement is bounded by the quality of the upstream selection signal; portable credentials improve that upstream signal by aggregating calibrated multi-source evidence on a comparable scale.

For related coverage, see hiring cost economics, onboarding design evidence, and structured interview design.

Sources

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin, 124(2), 262–274.
Sackett, P. R., & Lievens, F. (2008). Personnel selection. Annual Review of Psychology, 59, 419–450.
Sullivan, J., & Burnett, M. (2018). Quality of hire: A framework for measuring hiring outcomes. ERE Recruiting Intelligence.
Boudreau, J. W., & Ramstad, P. M. (2007). Beyond HR: The New Science of Human Capital. Harvard Business School Press.
Cappelli, P. (2019). Your approach to hiring is all wrong. Harvard Business Review, 97(3), 48–58.
Society for Human Resource Management (SHRM). (2023–2024). Talent Acquisition Benchmarking Report.