◆ methods

How this atlas was built.

Organisms & sampling

About 112 pure-isolate soil organisms (~90 resolved to species level) spanning six kingdoms and 27 phyla. Spectra were acquired on an Orbitrap in six positive-mode and six negative-mode batches, each processed with feature-based molecular networking (FBMN).

Feature alignment

Retention times were LOESS-corrected across batches, and GNPS2-based consensus alignment then yielded 273,248 features × 270 samples (Analysis-15).
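The page does not say how the LOESS fit is set up. Below is a minimal sketch that assumes each batch shares a set of anchor features with a reference batch. It fits the RT drift (batch RT minus reference RT) against batch RT with a local linear, tricube-weighted LOESS, then subtracts the predicted drift from every feature. The function names, the anchor-based setup, and the default `span=0.75` are assumptions, and the robustness iterations that LOESS often adds are omitted.

```python
# Hypothetical sketch: cross-batch RT correction with LOESS.
# Assumes anchor features matched between a batch and a reference batch;
# the anchors and the default span are illustrative, not the atlas's settings.


def loess_predict(x, y, x0, span=0.75):
    """Local linear LOESS estimate of y at x0 (tricube weights, no robustness)."""
    n = len(x)
    k = min(n, max(2, round(span * n)))  # neighbours in the local window
    nearest = sorted(range(n), key=lambda i: abs(x[i] - x0))[:k]
    # Bandwidth: distance to the farthest neighbour, inflated slightly so that
    # neighbour keeps a small positive weight (avoids an all-zero weight sum).
    h = max(abs(x[i] - x0) for i in nearest) * 1.0001 or 1e-12
    w = {i: (1 - (abs(x[i] - x0) / h) ** 3) ** 3 for i in nearest}
    sw = sum(w.values())
    # Weighted least-squares line through the neighbours, evaluated at x0.
    xm = sum(w[i] * x[i] for i in nearest) / sw
    ym = sum(w[i] * y[i] for i in nearest) / sw
    sxx = sum(w[i] * (x[i] - xm) ** 2 for i in nearest)
    sxy = sum(w[i] * (x[i] - xm) * (y[i] - ym) for i in nearest)
    slope = sxy / sxx if sxx > 0 else 0.0
    return ym + slope * (x0 - xm)


def correct_rt(anchor_batch_rt, anchor_ref_rt, feature_rts, span=0.75):
    """Map a batch's feature RTs onto the reference batch's RT scale."""
    # Drift of each anchor: how much later it elutes in this batch.
    drift = [b - r for b, r in zip(anchor_batch_rt, anchor_ref_rt)]
    return [rt - loess_predict(anchor_batch_rt, drift, rt, span) for rt in feature_rts]
```

For example, `correct_rt([1.2, 2.3, 3.1, 4.4], [1.0, 2.0, 3.0, 4.0], [2.8, 3.5])` returns the two feature RTs shifted onto the reference scale. A smaller `span` tracks sharper drift at the cost of noisier corrections.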

Biomarker scoring

Three complementary scoring systems run in parallel, and each tests a different statistical claim about a feature. A single feature can belong to none, one, two, or all three sets. This page shows the union of all three, with one badge per system so a feature's memberships appear side by side.

composite: specificity × prevalence × intensity across phyla, tiered Platinum / Diamond / Silver.
indval: Dufrêne-Legendre IndVal computed within each batch, then combined into a consensus across batches.
simper: SIMPER fingerprint, i.e. the features that contribute most to between-phylum community dissimilarity.
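IndVal and SIMPER are standard statistics, so their core arithmetic can be sketched independently of the atlas code. Here IndVal is the product of specificity A (the group's mean intensity divided by the sum of all group means) and fidelity B (the fraction of the group's samples in which the feature is detected). A feature's SIMPER contribution is its share of the Bray-Curtis dissimilarity, averaged over all between-group sample pairs. The table layout (`sample -> list of intensities`, groups given as phylum labels) and the function names are assumptions. The per-batch consensus, the composite score, and the permutation tests usually attached to both statistics are not shown.

```python
# Sketch of two standard indicator statistics on a feature table.
# Assumed layout (hypothetical): table maps sample -> list of feature
# intensities in a fixed order; groups maps sample -> group label (e.g. phylum).
from itertools import product
from statistics import mean


def indval(table, groups, feature, group):
    """Dufrene-Legendre IndVal of one feature for one group, on a 0-1 scale."""
    by_group = {}
    for sample, values in table.items():
        by_group.setdefault(groups[sample], []).append(values[feature])
    # A (specificity): this group's mean intensity / sum of all group means.
    group_means = {g: mean(v) for g, v in by_group.items()}
    total = sum(group_means.values())
    a = group_means[group] / total if total > 0 else 0.0
    # B (fidelity): fraction of this group's samples where the feature is present.
    in_group = by_group[group]
    b = sum(v > 0 for v in in_group) / len(in_group)
    return a * b


def simper(table, groups, group_a, group_b):
    """Average Bray-Curtis contribution of each feature between two groups."""
    samples_a = [s for s in table if groups[s] == group_a]
    samples_b = [s for s in table if groups[s] == group_b]
    n_features = len(next(iter(table.values())))
    totals = [0.0] * n_features
    pairs = 0
    for sa, sb in product(samples_a, samples_b):
        x, y = table[sa], table[sb]
        denom = sum(x) + sum(y)
        if denom == 0:
            continue  # Bray-Curtis is undefined for two empty samples
        for i in range(n_features):
            totals[i] += abs(x[i] - y[i]) / denom
        pairs += 1
    # Contributions sum to the mean between-group Bray-Curtis dissimilarity.
    return [t / pairs for t in totals] if pairs else totals
```

Ranking features by their `simper` contribution gives the between-group fingerprint, and `indval` is evaluated once per feature and group.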

Annotation

Annotation draws on layered evidence: direct LipidSearch identifications, diagnostic-ion classification, SIRIUS+CANOPUS, CSI:FingerID, and MS²Query spectral matching. Each annotation is tiered Gold, Silver, or Bronze according to the degree of cross-method agreement. About 27.7% of biomarkers have no confident annotation and are published as "known unknowns".
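The page states only that tiers reflect cross-method agreement. The sketch below shows one hypothetical reading. It counts how many of the five methods propose the same compound class and maps 3 or more agreeing methods to Gold, 2 to Silver, and 1 to Bronze. A feature with no proposal becomes a "known unknown". The thresholds, the class-level comparison, and the method keys are all assumptions. Low-confidence calls are assumed to have been filtered out upstream.

```python
# Hypothetical sketch of tiering annotations by cross-method agreement.
# Method names follow the text above; the dictionary keys and the
# tier thresholds (3+ = Gold, 2 = Silver, 1 = Bronze) are assumptions.
from collections import Counter

METHODS = ("LipidSearch", "diagnostic-ion", "SIRIUS+CANOPUS", "CSI:FingerID", "MS2Query")


def tier(calls):
    """Return (tier, consensus_class) for one feature.

    calls maps a method name to the class it proposed, or None.
    """
    proposed = [calls[m] for m in METHODS if calls.get(m) is not None]
    if not proposed:
        return ("known unknown", None)
    # most_common orders equal counts by first occurrence (METHODS order here).
    consensus, votes = Counter(proposed).most_common(1)[0]
    if votes >= 3:
        return ("Gold", consensus)
    if votes == 2:
        return ("Silver", consensus)
    return ("Bronze", consensus)


def unknown_fraction(features):
    """Share of features that no method could annotate."""
    tiers = [tier(calls)[0] for calls in features]
    return tiers.count("known unknown") / len(tiers)
```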

Validation

Biomarkers are validated against two external resources. Public MASST searches against GNPS community data give each biomarker's detection rate and soil-specific detection rate. In ClimGrass, a long-term warming × drought grassland experiment, 24.4% of soil spectra match atlas biomarkers after the Analysis-19 IS+RIE correction.
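The page does not describe the scoring behind these match rates. The minimal sketch below assumes a plain greedy cosine score as a simplified stand-in for GNPS-style spectral similarity. A query spectrum counts as matched when its best cosine against any biomarker reference spectrum reaches a threshold, and the match rate is the fraction of matched queries. The `tol=0.02` m/z tolerance and `threshold=0.7` are illustrative assumptions, not the settings behind the 24.4% figure. The Analysis-19 IS+RIE correction is not modeled.

```python
# Hypothetical sketch: share of query spectra that match a biomarker library.
# Uses a greedy cosine score; tolerance and threshold are illustrative only.
from math import sqrt


def cosine(spec_a, spec_b, tol=0.02):
    """Greedy cosine similarity between two peak lists of (mz, intensity)."""
    norm_a = sqrt(sum(i * i for _, i in spec_a))
    norm_b = sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # Candidate peak pairs within the m/z tolerance, largest products first.
    pairs = sorted(
        (
            (ia * ib, a, b)
            for a, (mza, ia) in enumerate(spec_a)
            for b, (mzb, ib) in enumerate(spec_b)
            if abs(mza - mzb) <= tol
        ),
        reverse=True,
    )
    used_a, used_b, score = set(), set(), 0.0
    for weight, a, b in pairs:
        if a not in used_a and b not in used_b:  # each peak matched at most once
            used_a.add(a)
            used_b.add(b)
            score += weight
    return score / (norm_a * norm_b)


def match_rate(queries, library, threshold=0.7, tol=0.02):
    """Fraction of query spectra whose best library cosine reaches threshold."""
    hits = sum(
        any(cosine(q, ref, tol) >= threshold for ref in library) for q in queries
    )
    return hits / len(queries) if queries else 0.0
```

Greedy pairing keeps the score between 0 and 1 because each peak is used at most once. Production tools typically add intensity scaling and precursor-shift handling (modified cosine), and this sketch omits both.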

Reproducibility

Pipeline code, MS² reference spectra, and trained models are released under permissive open licenses. The full analysis corpus (scripts, notebooks, manuscript drafts) is kept in the soillifeatlas/soilmass-analysis repository. The atlas itself is regenerated by a single script that reads the canonical tables and writes atlas.duckdb.