The Cannarella tirzepatide pilot is one of the most cited studies in the GLP-1/testosterone literature. Eighty-three men across three arms — tirzepatide, lifestyle intervention, transdermal TRT — followed for two months. The results are dramatic: 100% reversal of hypogonadism in the tirzepatide arm, testosterone up 128.5%, LH surging 80%, FSH rising 72.2%. The TRT arm normalized testosterone too, but through the opposite biology — LH falling 24%, the axis shutting down as exogenous hormone replaced what the body had stopped making.
These are real findings. The study is competently designed within its constraints. But when you read it closely — not the conclusions, but the methods section, the assay platform, the calculation formulas, the units and thresholds — something else emerges. What you see is not a study about testosterone. What you see is a study about the instrument.
Level 1: The instrument defines the disease
The Cannarella pilot enrolls men with "functional hypogonadism." To qualify, their serum total testosterone had to fall below a threshold. But which threshold?
There are now at least thirteen published clinical guidelines for male hypogonadism. Each defines the condition using serum testosterone, and each draws the line somewhere different. The Endocrine Society uses 264 ng/dL (9.2 nmol/L). The EAU uses 350 ng/dL (12.1 nmol/L). The AUA uses 300 ng/dL. The BSSM uses 231 ng/dL for definite deficiency and 346 ng/dL for probable deficiency. The VA uses a two-test confirmatory system. The ISA treats values below 200 ng/dL as unambiguous and values below 400 ng/dL as probable.
These are not minor variations. A man with a testosterone of 280 ng/dL is hypogonadal under the AUA definition, normal under the Endocrine Society's, and in a gray zone under the BSSM's. The same blood sample. The same lab. Different diseases depending on which guideline the clinician follows.
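To make the threshold problem concrete, here is a minimal Python sketch that runs one total testosterone value through the cutoffs listed above. It is a simplification under stated assumptions: the `CUTOFFS_NG_DL` table and the helper names are hypothetical, the two-level guidelines (BSSM, ISA) are modeled as a lower "hypogonadal" cutoff plus an upper "gray zone" ceiling, the VA's two-test scheme is left out, and real diagnosis also depends on symptoms and repeat morning samples. The conversion factor of about 28.84 ng/dL per nmol/L follows from testosterone's molar mass (about 288.4 g/mol).

```python
NG_DL_PER_NMOL_L = 28.84  # testosterone: 1 nmol/L is about 28.84 ng/dL

# Hypothetical encoding of the cutoffs quoted in the text, in ng/dL.
# (lower, upper): below `lower` = hypogonadal; from `lower` up to `upper` = gray zone;
# at or above `upper` = not hypogonadal. Single-cutoff guidelines use lower == upper.
CUTOFFS_NG_DL = {
    "Endocrine Society": (264, 264),
    "EAU": (350, 350),
    "AUA": (300, 300),
    "BSSM": (231, 346),
    "ISA": (200, 400),
}


def to_nmol_per_l(ng_dl: float) -> float:
    """Convert total testosterone from ng/dL to nmol/L."""
    return ng_dl / NG_DL_PER_NMOL_L


def classify(total_t_ng_dl: float, lower: float, upper: float) -> str:
    """Label one value against a (lower, upper) cutoff pair."""
    if total_t_ng_dl < lower:
        return "hypogonadal"
    if total_t_ng_dl < upper:
        return "gray zone"
    return "not hypogonadal"


def classify_all(total_t_ng_dl: float) -> dict[str, str]:
    """Label the same value under every guideline in CUTOFFS_NG_DL."""
    return {
        name: classify(total_t_ng_dl, lower, upper)
        for name, (lower, upper) in CUTOFFS_NG_DL.items()
    }


if __name__ == "__main__":
    for guideline, label in classify_all(280).items():
        print(f"{guideline:18s} {label}")
```

Run on 280 ng/dL, the sketch labels the value hypogonadal under the AUA and EAU cutoffs, not hypogonadal under the Endocrine Society's, and gray zone under the BSSM and ISA cutoffs: one number, three verdicts.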
Pozzi and Ramasamy, writing in the International Journal of Impotence Research in 2025, noted that the terminological inconsistencies across guidelines ("male hypogonadism" versus "testosterone deficiency" versus "late-onset hypogonadism" versus "functional hypogonadism") are not just semantic. They "reflect potential disagreements about the pathophysiological basis" of the condition. This is the jingle-jangle problem operating at clinical scale: different words for what may be one construct (the jangle fallacy Kelley named), and one word, "hypogonadism," attached to different constructs (the jingle fallacy), creating the illusion of agreement where none exists.
When 30% of men diagnosed with hypogonadism normalize on retest without any intervention, the natural reading is measurement noise. The deeper reading is that the construct boundary is so uncertain that individual patients drift across it between blood draws. The instrument is not detecting a disease. It is defining one — and the definition changes depending on who holds the instrument.
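A toy model shows why values near a cutoff drift across it between blood draws. The sketch below assumes, hypothetically, that repeat results scatter normally around a man's underlying level with a combined analytical-plus-biological coefficient of variation `cv`. The 15% used in the example is an illustrative assumption, not a figure from the studies discussed, and the model is not meant to reproduce the 30% retest figure.

```python
import math


def prob_retest_at_or_above(true_level: float, cutoff: float, cv: float) -> float:
    """Probability that one retest reads at or above `cutoff`.

    Assumes (hypothetically) that results are normally distributed around
    `true_level` with standard deviation cv * true_level.
    """
    if true_level <= 0 or cv <= 0:
        raise ValueError("true_level and cv must be positive")
    sd = cv * true_level
    z = (cutoff - true_level) / sd
    # Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 1.0 - phi


if __name__ == "__main__":
    cutoff = 300.0  # ng/dL, the AUA cutoff quoted in the text
    for level in (250.0, 280.0, 300.0):
        p = prob_retest_at_or_above(level, cutoff, cv=0.15)  # cv is hypothetical
        print(f"true level {level:.0f} ng/dL -> P(retest >= {cutoff:.0f}) = {p:.2f}")
```

Under these assumptions, a man whose underlying level is 280 ng/dL would read at or above a 300 ng/dL cutoff on roughly one retest in three, and a man sitting exactly on the cutoff would cross it half the time.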
Level 2: The instrument measures success on an uncertified platform
The Cannarella pilot measured testosterone using electrochemiluminescence immunoassay (ECLIA) on a Roche Cobas 6000 analyzer. This is standard clinical practice. Most hospitals use Roche or similar automated immunoassay platforms for routine testosterone measurement.
In February 2026, Li and colleagues published in the Journal of Clinical Endocrinology & Metabolism the formal evaluation of the only automated immunoassay to achieve certification through the CDC's Hormone Standardization (HoSt) program for testosterone. It is the Siemens Atellica IM TSTII. Not the Roche Cobas. The certified platform demonstrated slope 0.98 versus LC-MS/MS reference, correlation r = 0.991, within-lab precision CV 3.46%. It met the HoSt-TT criterion of ±6.4% mean bias.
The Roche Cobas is not necessarily inaccurate. In proficiency testing, Cobas systems show a mean bias of approximately −1.2%, well within the ±6.4% HoSt-TT limit. But the platform has not been formally certified through the standardization program, and the distinction matters. In a field where thirteen guidelines cannot agree on what testosterone level defines disease, measuring the primary biomarker on an uncertified platform adds a second layer of instability. The diagnosis rests on a number. The number rests on a platform that has not undergone the formal validation designed to ensure the number means what it claims to mean.
The Roche Ionify gap
Roche has introduced mass spectrometry capability to its Cobas platform — the Ionify reagent system received CE marking for the cobas i 601 analyzer. Mass spectrometry is the gold standard for steroid hormone measurement. The Ionify panel covers six steroid hormones: estradiol, DHEA, DHEA-S, progesterone, 17-hydroxyprogesterone, and androstenedione. Testosterone is not among them. The most commonly measured steroid hormone in the diagnosis and monitoring of male hypogonadism — the one biomarker on which thirteen guidelines, millions of diagnoses, and every treatment decision depends — is the one Roche left out of its mass spectrometry revolution.
Level 3: The instrument masks the biological signal
The Cannarella pilot reports total testosterone increasing by 128.5% in the tirzepatide arm. This number anchors the study's central claim: GLP-1 receptor agonists reverse functional hypogonadism. The number is real. But what does total testosterone actually measure?
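Total testosterone is the sum of three fractions: testosterone bound to sex hormone-binding globulin (approximately 65%), testosterone bound to albumin (approximately 33%), and free testosterone (approximately 2%). SHBG binds testosterone tightly, so the SHBG-bound fraction is essentially sequestered: unavailable to tissues, unable to activate androgen receptors. Albumin binds it loosely, and under the free-hormone hypothesis the free fraction, plus the albumin-bound fraction that readily dissociates (together called bioavailable testosterone), is what tissues can actually use.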
Weight loss raises both testosterone and SHBG, but unequally. An umbrella review published in Endocrine Practice in February 2026 by Nayak and colleagues, synthesizing data across bariatric surgery studies, found that SHBG rose by 21.22 nmol/L while total testosterone rose by only 8.73 nmol/L: in molar terms, SHBG rose about 2.4 times as much as total testosterone. Because the data pool different bariatric procedures, the pattern likely reflects the biology of weight loss itself rather than any one operation. The Salvio meta-analysis confirmed a significant SHBG rise with GLP-1 receptor agonists specifically (standardized mean difference 2.39, p = 0.0007), though with high heterogeneity.
The Cannarella pilot calculated free and bioavailable testosterone using the Vermeulen formula, the most widely used calculation method. The formula combines measured total testosterone, SHBG, and albumin concentrations with published association constants to estimate free testosterone. It is the best available calculation, and it systematically overestimates free testosterone by 19–30% compared with equilibrium dialysis, the reference method; the median overestimation ratio is 1.19. The bias is relatively independent of SHBG and testosterone levels, so the error is systematic rather than random: it is consistent from sample to sample. But it is still an error. Each calculated free testosterone value in the study likely sits around 20% above what equilibrium dialysis would report for the same sample.
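The Vermeulen calculation can be sketched in a few lines. The version below is a minimal sketch, assuming the association constants commonly quoted for it (about 1 × 10^9 L/mol for SHBG and 3.6 × 10^4 L/mol for albumin) and an albumin molar mass of roughly 69,000 g/mol; it solves the binding mass balance as a quadratic in free testosterone. The `approx_dialysis_equivalent` helper, which simply divides by the 1.19 median ratio quoted above, is a hypothetical illustration of the size of the bias, not a validated correction.

```python
import math

K_SHBG = 1.0e9         # L/mol, testosterone-SHBG association constant (commonly used value)
K_ALB = 3.6e4          # L/mol, testosterone-albumin association constant (commonly used value)
ALBUMIN_MW = 69_000.0  # g/mol, approximate molar mass of albumin


def vermeulen(total_t_nmol: float, shbg_nmol: float, albumin_g_l: float = 43.0) -> dict[str, float]:
    """Estimate testosterone fractions in nmol/L from total T, SHBG (nmol/L) and albumin (g/L).

    With N = 1 + K_ALB * [albumin], mass balance gives
        N*K_SHBG*FT^2 + (N + K_SHBG*(SHBG - TT))*FT - TT = 0
    (all concentrations in mol/L); free T is the positive root.
    """
    tt = total_t_nmol * 1e-9
    shbg = shbg_nmol * 1e-9
    n = 1.0 + K_ALB * albumin_g_l / ALBUMIN_MW
    a = n * K_SHBG
    b = n + K_SHBG * (shbg - tt)
    ft = (-b + math.sqrt(b * b + 4.0 * a * tt)) / (2.0 * a)
    albumin_bound = (n - 1.0) * ft  # albumin is in large excess, so binding is linear in FT
    return {
        "free": ft * 1e9,
        "albumin_bound": albumin_bound * 1e9,
        "shbg_bound": (tt - ft - albumin_bound) * 1e9,
        "bioavailable": (ft + albumin_bound) * 1e9,
    }


def approx_dialysis_equivalent(calculated_free: float, median_ratio: float = 1.19) -> float:
    """Hypothetical: scale a calculated free T down by the median calculated/dialysis ratio."""
    return calculated_free / median_ratio


if __name__ == "__main__":
    fractions = vermeulen(total_t_nmol=8.0, shbg_nmol=25.0)  # hypothetical patient
    print({k: round(v, 3) for k, v in fractions.items()})
    print(round(approx_dialysis_equivalent(fractions["free"]), 3))
```

The split between albumin-bound and SHBG-bound testosterone moves with the SHBG input, which is why the fractions in the text are given as approximate.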
The combined effect: a man on tirzepatide loses weight. His total testosterone rises dramatically. His SHBG rises by even more in absolute molar terms. The instrument reports total testosterone, a sum that includes the sequestered fraction. The calculation estimates free testosterone using a formula that overestimates by about a fifth. The number on the lab report says "normalized." The amount of testosterone actually available to his tissues may tell a different story.
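The combined effect can be illustrated numerically. The sketch below solves the same mass-balance quadratic that underlies the Vermeulen formula and applies the pooled bariatric changes quoted above (+8.73 nmol/L total testosterone, +21.22 nmol/L SHBG) to a hypothetical baseline of 8 nmol/L total testosterone and 25 nmol/L SHBG, with albumin held at 43 g/L. Applying bariatric averages to a GLP-1 patient, and the baseline itself, are assumptions for illustration only.

```python
import math

K_SHBG, K_ALB = 1.0e9, 3.6e4        # L/mol, constants commonly used with the Vermeulen formula
N = 1.0 + K_ALB * 43.0 / 69_000.0   # albumin assumed fixed at 43 g/L (molar mass ~69,000 g/mol)


def free_t_nmol(tt_nmol: float, shbg_nmol: float) -> float:
    """Free T (nmol/L): positive root of N*Kt*FT^2 + (N + Kt*(SHBG - TT))*FT - TT = 0."""
    tt, shbg = tt_nmol * 1e-9, shbg_nmol * 1e-9
    a, b = N * K_SHBG, N + K_SHBG * (shbg - tt)
    return (-b + math.sqrt(b * b + 4.0 * a * tt)) / (2.0 * a) * 1e9


BASELINE_TT, BASELINE_SHBG = 8.0, 25.0  # nmol/L, hypothetical baseline
DELTA_TT, DELTA_SHBG = 8.73, 21.22      # nmol/L, pooled bariatric changes quoted in the text


def weight_loss_scenario() -> dict[str, float]:
    """Percent changes in total and free T, and the free fraction before/after."""
    after_tt = BASELINE_TT + DELTA_TT
    before_ft = free_t_nmol(BASELINE_TT, BASELINE_SHBG)
    after_ft = free_t_nmol(after_tt, BASELINE_SHBG + DELTA_SHBG)
    return {
        "total_t_change_pct": 100.0 * DELTA_TT / BASELINE_TT,
        "free_t_change_pct": 100.0 * (after_ft - before_ft) / before_ft,
        "free_fraction_before_pct": 100.0 * before_ft / BASELINE_TT,
        "free_fraction_after_pct": 100.0 * after_ft / after_tt,
    }


if __name__ == "__main__":
    for name, value in weight_loss_scenario().items():
        print(f"{name}: {value:.1f}")
```

With these assumed inputs, total testosterone roughly doubles (about +109%) while calculated free testosterone rises by only about 57%, and the free fraction falls from roughly 2.2% to roughly 1.7%. Because the formula's overestimate is described as largely independent of SHBG and testosterone levels, it inflates both time points and does not change the direction of this comparison.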
This is not hypothetical. Portillo-Canales presented data at ENDO 2025 from 110 men followed for 18 months on GLP-1 receptor agonists. At baseline, 53% had both normal total and normal free testosterone; at 18 months, 77% did. That progression from 53% to 77% means masking is not total: most men genuinely improve. But 23% still did not have both values in the normal range after 18 months. Among them, the men whose total testosterone is in range while their free testosterone is not are the population the instrument cannot see. Their total testosterone may look adequate. Their biologically active testosterone may not be. The instrument reports success. The biology reports something else.
Level 4: The instrument groups opposite populations
Close-read the Cannarella pilot's three-arm comparison and watch what happens across every measured variable:
| Variable | Tirzepatide | TRT | Interpretation |
|---|---|---|---|
| LH | +80% | −24% | Opposite gonadotropin direction |
| Estradiol | −60% | +21% | Opposite estrogen trajectory |
| Fat mass | −42% | −15% | Different metabolic effect |
| Lean mass | +18% | +11% | Both improve, different magnitude |
| HPG axis | Restored upstream | Replaced downstream | Opposite mechanism of action |
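The table's directional pattern can be checked mechanically. The short sketch below encodes the percent changes from the table (the dictionary layout and function name are illustrative) and sorts each variable by whether the two arms moved in the same or opposite directions. Serum total testosterone, which normalized in both arms, offers no such split.

```python
# Percent change from baseline: (tirzepatide arm, TRT arm), taken from the table.
CHANGES_PCT = {
    "LH": (80.0, -24.0),
    "Estradiol": (-60.0, 21.0),
    "Fat mass": (-42.0, -15.0),
    "Lean mass": (18.0, 11.0),
}


def split_by_direction(changes: dict[str, tuple[float, float]]) -> tuple[list[str], list[str]]:
    """Return (variables moving in opposite directions, variables moving the same way)."""
    opposite = [name for name, (tzp, trt) in changes.items() if tzp * trt < 0]
    same = [name for name, (tzp, trt) in changes.items() if tzp * trt > 0]
    return opposite, same


if __name__ == "__main__":
    opposite, same = split_by_direction(CHANGES_PCT)
    print("opposite:", opposite)  # LH and Estradiol
    print("same:", same)          # Fat mass and Lean mass
```

Only LH and estradiol separate the arms by direction; fat mass and lean mass differ in magnitude, not sign.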
Both arms normalize testosterone. The instrument sees two successes. But in the tirzepatide arm, the hypothalamic-pituitary-gonadal axis is waking up — LH surging, the brain resuming its signal to the testes. In the TRT arm, the axis is shutting down — exogenous testosterone telling the brain there is no need to signal. One treatment restores the system. The other bypasses it. Same lab value. Opposite biology.
The estradiol divergence is the sharpest cut. Tirzepatide drops estradiol by 60% — from 33 to 11 pg/mL — as fat mass (and the aromatase it contains) shrinks. TRT raises estradiol by 21% as exogenous testosterone is aromatized in peripheral tissue. Two "successful" treatments producing opposite hormonal environments. The instrument measuring success — serum total testosterone — cannot distinguish between them.
These are not two treatments for one disease. These are two treatments for two diseases that happen to share a lab value. The instrument — serum testosterone — created the illusion of a single condition. The close reading destroys it.
What a close reading reveals
Four levels. The instrument defines which men are sick (up to thirteen thresholds, each carving out a different patient population). The instrument measures whether treatment works, on a platform that has not been certified through the program designed to ensure the measurement is reliable (Li et al., JCEM 2026). The instrument reports a number, total testosterone, that systematically obscures the biologically active fraction behind disproportionate SHBG binding (Nayak et al., Endocrine Practice 2026) and a calculation formula with a built-in overestimate of roughly 20%. And the instrument groups two populations, one whose axis has been metabolically suppressed and one whose axis is structurally insufficient, under one diagnosis, directing them toward interventions that are biologically opposite.
None of this invalidates the Cannarella pilot. The within-study comparisons remain valid because all arms used the same platform, the same formula, the same thresholds. The relative differences are real. What the close reading reveals is not that this study is wrong. It is that the instrument through which we interpret all such studies carries four nested assumptions, each one shaping what the science can see and what it cannot.
The field is not measuring a disease. It is measuring a number. The number is not the disease. It is the shadow the disease casts on the wall of a particular instrument — and the shape of that shadow depends on the wall.
Sources
Cannarella R et al. Tirzepatide vs lifestyle vs TRT in functional male hypogonadism. Reprod Biol Endocrinol. 2025. PMC12220628.
Li H et al. Evaluation of the Siemens Atellica IM TSTII assay in the CDC HoSt-TT program. J Clin Endocrinol Metab. 2026.
Nayak SS et al. SHBG and testosterone changes after bariatric surgery: umbrella review. Endocrine Practice. 2026.
Pozzi E, Ramasamy R. Comment on Tsampoukas et al.: terminological inconsistencies in hypogonadism guidelines. Int J Impot Res. 2025.
Portillo-Canales R et al. Testosterone normalization with GLP-1 RAs: total and free T outcomes. Poster, ENDO 2025. n=110, 18 months.
Salvio G et al. GLP-1 receptor agonists and SHBG: systematic review and meta-analysis.