Using Codex to unpack the dead salmon fMRI methods section

A small example of turning a famous science-paper methods detail into an interactive model: what does p(uncorrected) mean, why do multiple-comparisons corrections matter, and how would different parameters change the result?

Codex methods helper demo

From a methods paragraph to a working intuition

The dead salmon paper is memorable because an uncorrected fMRI analysis found apparently significant activity in a fish that was not alive. Codex located the source, extracted the reported numbers, reproduced the basic false-positive arithmetic, then converted the methods detail into sliders.

Paper number

8,064 voxels searched at p(uncorrected) < 0.001.

Arithmetic

8,064 × 0.001 gives about 8 expected false positives.
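The arithmetic above can be sketched in a few lines. This is a minimal illustration, not the demo's actual code: the function and variable names are hypothetical, and it assumes every voxel is pure noise, so each test has probability alpha of crossing the threshold.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of null tests that cross the threshold by chance."""
    # Linearity of expectation: this holds even if the tests are correlated.
    return n_tests * alpha


poster_voxels = 8064   # search volume reported on the poster
poster_alpha = 0.001   # p(uncorrected) threshold reported on the poster

expected = expected_false_positives(poster_voxels, poster_alpha)
print(f"{expected:.3f}")  # prints 8.064
```

Correlation between voxels does not change this expected count. It changes how the false positives clump together, which is why the chance of at least one false positive is handled separately.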

Correction

After FDR and FWER correction, the poster reported no active voxels.

Prompts used

The interaction started with two plain requests

find the famous dead salmon paper and replicate the results based on the numbers - create a small visualisation of the results to illustrate the numbers

let me tweak parameters so I understand better the corrections made and what was the problem and how different parameters would have influenced the results

Search volume

How many voxelwise tests are being searched. A typical fMRI volume can be much larger than the salmon search mask.

Effective independent tests

Spatial smoothing makes neighbouring voxels correlated, so the number of truly independent tests is smaller than the voxel count. Lower this value to apply a rough effective-tests heuristic.
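One way to get a feel for this control is to sweep the effective-test count and watch the familywise risk. The sketch below is an assumption about how such a heuristic could work, not the page's actual formula: it treats the effective count as a number of fully independent tests. Only the 8,064 and 0.001 values come from the poster; the smaller counts are hypothetical.

```python
def chance_of_any_false_positive(alpha: float, effective_tests: float) -> float:
    """P(at least one false positive) across independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** effective_tests


alpha = 0.001  # poster's uncorrected threshold

# Hypothetical effective-test counts; only 8064 is a reported number.
for m_eff in (8064, 2000, 500, 100):
    risk = chance_of_any_false_positive(alpha, m_eff)
    print(f"{m_eff:>5} effective tests -> chance of any false positive {risk:.3f}")
```

Even at a few hundred effective tests, the risk stays far above the 5% level that a single test at p < 0.05 would suggest.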

Uncorrected voxel threshold

Log scale. The salmon poster used p < 0.001 before correction.

Observed significant voxels

The poster reported 16 significant voxels in the uncorrected analysis.

Weakest selected voxel p

Use this as the p-value of the least impressive voxel among the observed hits. We do not have the salmon's raw p-value list.

Target FWER

Familywise error rate: the chance of at least one false positive in the searched family.

Target FDR

False discovery rate: expected proportion of false discoveries among the declared discoveries.

Cluster extent threshold

The poster used a 3-voxel extent threshold. This alone is not the same as correcting across all possible clusters.

Expected false positives

Chance of any false positive

What FWER correction is trying to control.
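For independent tests this readout follows from 1 - (1 - alpha)^m. The sketch below is an illustration under that independence assumption, with hypothetical function names. It computes the exact value in a numerically stable way and compares it with the Poisson approximation 1 - exp(-m * alpha), which reuses the expected false-positive count.

```python
import math


def fwer_exact(alpha: float, n_tests: int) -> float:
    """1 - (1 - alpha)**n_tests, evaluated via log1p/expm1 for stability."""
    return -math.expm1(n_tests * math.log1p(-alpha))


def fwer_poisson(alpha: float, n_tests: int) -> float:
    """Poisson approximation: 1 - exp(-expected_false_positives)."""
    return -math.expm1(-n_tests * alpha)


# Poster defaults: 8,064 voxels at p(uncorrected) < 0.001.
exact = fwer_exact(0.001, 8064)
approx = fwer_poisson(0.001, 8064)
print(f"exact:   {exact:.6f}")
print(f"poisson: {approx:.6f}")
```

With the poster's numbers both values exceed 0.999, so under these assumptions at least one noise voxel crossing the threshold is close to certain.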

Sidak/FWER p threshold

Per-test threshold needed for the selected familywise risk.
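The Šidák threshold inverts the familywise formula: solve 1 - (1 - a)^m = FWER for the per-test level a. A minimal sketch follows, assuming independent tests and a target FWER of 0.05. That target is an illustrative choice, not a number taken from the poster, and the function names are hypothetical. Bonferroni is shown alongside as the simpler, slightly more conservative bound.

```python
def sidak_threshold(target_fwer: float, n_tests: int) -> float:
    """Per-test level that gives exactly target_fwer across independent tests."""
    return 1.0 - (1.0 - target_fwer) ** (1.0 / n_tests)


def bonferroni_threshold(target_fwer: float, n_tests: int) -> float:
    """Simpler bound that needs no independence assumption."""
    return target_fwer / n_tests


n_voxels = 8064      # poster search volume
target_fwer = 0.05   # assumed target, chosen for illustration

sidak = sidak_threshold(target_fwer, n_voxels)
bonferroni = bonferroni_threshold(target_fwer, n_voxels)
print(f"Sidak:      {sidak:.3e}")
print(f"Bonferroni: {bonferroni:.3e}")
```

Both thresholds land near 6e-6, more than two orders of magnitude stricter than the uncorrected p < 0.001.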

BH/FDR p threshold

Benjamini-Hochberg cutoff that the weakest selected voxel p must meet, given the currently selected count of discoveries.
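Under the Benjamini-Hochberg rule, declaring k discoveries out of m tests requires the k-th smallest p-value to be at most (k / m) * q, where q is the target FDR. The sketch below assumes that rule and q = 0.05. The weakest-voxel p-value is a hypothetical stand-in, since the raw p-values are not available, and the function names are illustrative.

```python
def bh_threshold_for_count(n_discoveries: int, n_tests: int, target_fdr: float) -> float:
    """BH cutoff that the weakest of n_discoveries p-values must not exceed."""
    return n_discoveries / n_tests * target_fdr


def survives_bh(weakest_p: float, n_discoveries: int, n_tests: int, target_fdr: float) -> bool:
    """True if all n_discoveries hits would be declared at this FDR level."""
    return weakest_p <= bh_threshold_for_count(n_discoveries, n_tests, target_fdr)


threshold = bh_threshold_for_count(16, 8064, 0.05)  # 16 hits, 8,064 voxels (poster)
print(f"BH threshold for 16 hits: {threshold:.3e}")

# Hypothetical weakest p-value just under the uncorrected cutoff of 0.001.
hypothetical_weakest_p = 0.0009
print(survives_bh(hypothetical_weakest_p, 16, 8064, 0.05))  # prints False
```

The cutoff is roughly ten times stricter than p < 0.001, so a set of 16 hits that only just cleared the uncorrected threshold would not survive.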

Counts: observed hits versus noise expectation

Bars compare the selected uncorrected result with the number of false positives expected by chance.

P-value gates on a log scale

Markers show how far a selected voxel p-value has to move to survive common correction gates.

What changes the conclusion?

What the salmon example shows

Uncorrected p-values answer the wrong-sized question.

A p < 0.001 threshold sounds strict for one test. Across the poster's 8,064 voxels, treated as independent tests, the chance of at least one false positive exceeds 99.9%.

Corrections change which error rate is controlled.

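FWER asks for a low chance of any false positive across the whole family. FDR instead tolerates a set proportion of false positives among the discoveries, so it is less strict when there are many discoveries, but each ranked p-value must still fall below its rank-dependent threshold.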
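The ranked-threshold idea can be made concrete with a small Benjamini-Hochberg step-up routine. This is a sketch of the textbook procedure, not the SPM implementation used for the poster, and both p-value lists below are invented for illustration because the salmon's raw p-values are not available.

```python
def benjamini_hochberg(p_values: list[float], target_fdr: float) -> list[bool]:
    """Return a rejection flag per p-value using the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its own threshold.
        if p_values[idx] <= rank / m * target_fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical salmon-like data: 16 marginal hits among 8,064 tests.
marginal = [0.0005] * 16 + [0.5] * (8064 - 16)
# Hypothetical contrast: the same 16 voxels with much stronger evidence.
strong = [1e-6] * 16 + [0.5] * (8064 - 16)

print(sum(benjamini_hochberg(marginal, 0.05)))  # prints 0
print(sum(benjamini_hochberg(strong, 0.05)))    # prints 16
```

In the first list every hit passes p < 0.001 yet none clears its rank threshold, which mirrors the qualitative outcome the poster reports after correction.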

Cluster extent is not a magic shield.

Requiring neighbouring voxels can help, especially with proper cluster-level correction, but a small uncorrected extent threshold can still let structured noise through.

The exact salmon correction used imaging software.

The poster reports FDR and FWER corrections in SPM, including random-field methods. This tuner uses transparent approximations so the arithmetic is visible.

Process copied into the demo

This is the visible workflow, not hidden reasoning: the useful part is the sequence of source-checking, calculation, model-building, and verification.

1 Find the source

Locate the Bennett, Baird, Miller, and Wolford poster PDF rather than relying on memory.

2 Extract the numbers

Record the search volume, uncorrected threshold, reported significant voxels, and correction outcome.

3 Replicate the arithmetic

Use the reported voxel count and threshold to estimate expected false positives and the chance of at least one false positive.

4 Make it adjustable

Expose search volume, effective tests, p-thresholds, observed hits, FWER, FDR, and extent threshold as controls.

5 Verify the page

Check the single-file HTML locally, test desktop and mobile rendering, then deploy as a direct-upload Cloudflare Pages site.

Source defaults: Bennett, Baird, Miller, and Wolford, post-mortem Atlantic salmon poster. The poster reports a search volume of 8,064 voxels, threshold t(131) > 3.15 / p(uncorrected) < 0.001, 3-voxel extent threshold, 16 significant voxels, and no active voxels after FDR/FWER correction.