◆ methods
How this atlas was built.
Organisms & sampling
Around 112 pure-isolate soil organisms (~90 identified to species level) spanning six kingdoms and 27 phyla. Six positive-mode and six negative-mode FBMN (feature-based molecular networking) batches were acquired on an Orbitrap mass spectrometer.
Feature alignment
Cross-batch LOESS retention-time correction; GNPS2-based consensus alignment yielded 273,248 features × 270 samples (Analysis-15).
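The retention-time step can be illustrated with a minimal sketch. It assumes each batch shares a set of anchor features with a reference batch, fits a LOESS curve (local linear regression, tricube weights) to the anchor shifts, and applies the predicted shift to every feature. The function names, the `frac` span, and the anchor-based setup are illustrative assumptions, not the pipeline's actual code or settings.

```python
def loess_fit(x, y, x0, frac=0.5):
    """Locally weighted linear fit of y on x, evaluated at x0 (tricube kernel)."""
    k = max(2, round(frac * len(x)))           # number of points in the local window
    dists = sorted(abs(xi - x0) for xi in x)
    h = dists[k - 1] * 1.001 + 1e-9            # bandwidth just past the k-th neighbour
    w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def correct_rt(feature_rts, anchor_rt_batch, anchor_rt_ref, frac=0.5):
    """Shift each feature RT by the LOESS-smoothed (reference - batch) anchor offset."""
    shifts = [ref - obs for obs, ref in zip(anchor_rt_batch, anchor_rt_ref)]
    return [rt + loess_fit(anchor_rt_batch, shifts, rt, frac) for rt in feature_rts]


# Hypothetical anchors (minutes): this batch drifts by 0.1 + 2% of RT.
anchors_batch = [float(i) for i in range(1, 11)]
anchors_ref = [x + 0.1 + 0.02 * x for x in anchors_batch]
corrected = correct_rt([2.5, 5.5, 9.0], anchors_batch, anchors_ref)
```

A local linear fit reproduces a purely linear drift exactly, so this toy case is easy to check by hand. Real drift is usually non-linear, which is the reason for using a local smoother rather than a single global line.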
Biomarker scoring
Three complementary tier systems run in parallel, each capturing a different statistical claim about a feature. A single feature can sit in zero, one, two, or all three sets. This page shows the union of the three sets, with a separate badge for each system a feature belongs to.
- composite — specificity × prevalence × intensity across phyla. Tiered Platinum / Diamond / Silver.
- indval — Dufrêne-Legendre IndVal computed within each batch, then consensus across batches.
- simper — SIMPER fingerprint: features that contribute most to between-phylum community dissimilarity.
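As an illustration of the indval tier, here is a minimal sketch of the Dufrêne-Legendre indicator value for one feature: specificity (the group's share of the summed group-mean intensities) times fidelity (the fraction of the group's samples in which the feature is detected). Scores are returned on a 0 to 1 scale rather than as percentages. The cross-batch consensus rule is not specified above; the median used here is an assumption, as are the function names.

```python
from collections import defaultdict
from statistics import median


def indval(intensities, groups):
    """IndVal per group for one feature: specificity x fidelity, on a 0-1 scale."""
    by_group = defaultdict(list)
    for value, group in zip(intensities, groups):
        by_group[group].append(value)
    means = {g: sum(v) / len(v) for g, v in by_group.items()}
    total = sum(means.values())
    scores = {}
    for g, values in by_group.items():
        specificity = means[g] / total if total > 0 else 0.0
        fidelity = sum(1 for x in values if x > 0) / len(values)
        scores[g] = specificity * fidelity
    return scores


def consensus_indval(per_batch_scores):
    """Assumed consensus rule: median of the per-batch IndVal scores."""
    return median(per_batch_scores)


# Hypothetical feature detected in every sample of phylum A and nowhere in B.
scores = indval([5.0, 5.0, 0.0, 0.0], ["A", "A", "B", "B"])
```

In this toy case the feature scores 1.0 for phylum A and 0.0 for phylum B, the maximum and minimum the statistic can take.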
Annotation
Layered evidence: LipidSearch direct, diagnostic-ion classification, SIRIUS+CANOPUS, CSI:FingerID, and MS²Query spectral matching. Annotations are tiered Gold / Silver / Bronze by cross-method agreement; ~27.7% of biomarkers have no confident annotation and are published as "known unknowns".
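Tiering by cross-method agreement can be sketched as a vote count. The thresholds below (three or more agreeing methods for Gold, two for Silver, one for Bronze) and the method keys are assumptions chosen for illustration; the atlas's actual cut-offs are not stated here.

```python
from collections import Counter


def annotation_tier(calls):
    """Return (label, tier) from a dict of method name -> predicted label or None."""
    labels = [label for label in calls.values() if label]
    if not labels:
        return None, "known unknown"          # no method produced a call
    label, support = Counter(labels).most_common(1)[0]
    if support >= 3:                          # assumed threshold
        tier = "Gold"
    elif support == 2:                        # assumed threshold
        tier = "Silver"
    else:
        tier = "Bronze"
    return label, tier


# Hypothetical calls for a single feature.
example = annotation_tier({
    "lipidsearch": "phosphatidylcholine",
    "diagnostic_ion": "phosphatidylcholine",
    "canopus": "phosphatidylcholine",
    "csi_fingerid": None,
    "ms2query": "sphingomyelin",
})
```

Here three of the five methods agree, so the feature is labelled phosphatidylcholine at the Gold tier under the assumed thresholds.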
Validation
Two external checks: public MASST searches against GNPS community data (detection rate and soil-specific detection rate), and ClimGrass, a long-term warming × drought grassland experiment, where 24.4% of soil spectra match atlas biomarkers after the Analysis-19 IS+RIE correction.
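The two MASST rates can be sketched as simple fractions. This assumes the detection rate is the share of all searched datasets with at least one match, and the soil-specific rate is the share of soil-tagged datasets with a match; both definitions, and the function and argument names, are assumptions rather than the atlas's documented formulas.

```python
def detection_rates(hit_datasets, dataset_is_soil):
    """Return (detection_rate, soil_detection_rate) for one biomarker.

    hit_datasets: ids of datasets with at least one spectral match.
    dataset_is_soil: dict of every searched dataset id -> True if soil-tagged.
    """
    searched = set(dataset_is_soil)
    soil = {d for d, is_soil in dataset_is_soil.items() if is_soil}
    hits = set(hit_datasets) & searched       # ignore ids that were never searched
    rate = len(hits) / len(searched) if searched else 0.0
    soil_rate = len(hits & soil) / len(soil) if soil else 0.0
    return rate, soil_rate


# Hypothetical search over four datasets, two of them soil.
rates = detection_rates({"d1", "d2", "d3"},
                        {"d1": True, "d2": True, "d3": False, "d4": False})
```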
Reproducibility
Pipeline code, MS² reference spectra, and trained models are released under permissive open licenses. The full analysis corpus (scripts, notebooks, manuscript drafts) lives in soillifeatlas/soilmass-analysis.
This atlas is regenerated by a single script that reads the canonical tables and emits atlas.duckdb.