FoMo (`fomo`)¶

Quick Facts¶

Field	Value
Model ID	`fomo`
Family / Backbone	FoMo-Bench `MultiSpectralViT` (vendored local code + checkpoint loader)
Adapter type	`on-the-fly`
Training alignment	Medium-High when `S2_KEYS`, normalization, and model config match checkpoint assumptions

FoMo In 30 Seconds

FoMo (FoMo-Bench MultiSpectralViT) treats each Sentinel-2 band as its own spectral modality: alongside the 12-band image the adapter passes 12 modality keys, and the model emits a token sequence whose layout is [modalities, H, W, D], which rs-embed then averages over the modality dimension to produce a single spatial grid.

In rs-embed, its most important characteristics are:

required 12-value spectral modality key mapping (one integer per S2 channel), overridable via RS_EMBED_FOMO_S2_KEYS: see Input Contract
grid output is a modality-averaged patch grid, with a 1×1 vector grid fallback when token layout is incompatible: see Output Semantics
small default input size (64) and patch size (16) that must stay aligned with the checkpoint — changing model-config envs silently breaks loading: see Environment Variables / Tuning Knobs

Input Contract¶

Field	Value
Backend	provider only (`gee` / `auto`)
`TemporalSpec`	`range` recommended (normalized via shared helper)
Default collection	`COPERNICUS/S2_SR_HARMONIZED`
Default bands (order)	`B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B11, B12` (12-band)
Default fetch	`scale_m=10`, `cloudy_pct=30`, `composite="median"`, `fill_value=0.0`
`input_chw`	`CHW`, `C=12` in adapter band order, raw SR `0..10000`
Side inputs	required 12 spectral modality keys — adapter provides S2 defaults

Spectral modality keys

The FoMo forward path requires one modality key per channel. The default S2 mapping is encoded in _DEFAULT_S2_MODALITY_KEYS, and can be overridden via RS_EMBED_FOMO_S2_KEYS (exactly 12 comma-separated integers).

Preprocessing Pipeline¶

Resize is the default — tiling is also available

The pipeline below shows the default input_prep="resize" path. For large ROIs, use input_prep="tile" to split the input into tiles and preserve spatial detail. See Choosing Settings.

flowchart LR
    INPUT["S2 12-band"] --> PREP["Normalize → resize 64×64"]
    PREP --> FWD["forward(x, spectral_keys)"]
    FWD --> POOL["pooled: mean/max over tokens"]
    FWD --> GRID["grid: modality-averaged\npatch grid"]

Architecture Concept¶

flowchart LR
    subgraph Input
        S2["S2 12-band"] --> K["12 spectral\nmodality keys"]
    end
    subgraph "MultiSpectralViT"
        K --> |"Band 0→Key 0\nBand 1→Key 1\n...\nBand 11→Key 11"| FWD["forward(image,\nspectral_keys)"]
        FWD --> TOK["Token output\n[modalities, H, W, D]"]
    end
    subgraph Output
        TOK --> AVG["Modality averaging\n(mean across modality axis)"]
        AVG --> GRID["grid: (D,H,W)"]
        AVG --> POOL["pooled: mean/max"]
    end

Environment Variables / Tuning Knobs¶

Core model / preprocessing¶

Env var	Default	Effect
`RS_EMBED_FOMO_IMG`	`64`	Resize target image size
`RS_EMBED_FOMO_PATCH`	`16`	Patch size (used for model config + grid expectations)
`RS_EMBED_FOMO_NORM`	`unit_scale`	`unit_scale`, `per_tile_minmax`, or `none`
`RS_EMBED_FOMO_S2_KEYS`	adapter S2 default mapping	12 comma-separated modality keys
`RS_EMBED_FOMO_FETCH_WORKERS`	`8`	Provider prefetch workers for batch APIs

FoMo model config (advanced; keep aligned with checkpoint)¶

Env var	Default	Effect
`RS_EMBED_FOMO_DIM`	`768`	Model dim
`RS_EMBED_FOMO_DEPTH`	`12`	Transformer depth
`RS_EMBED_FOMO_HEADS`	`12`	Attention heads
`RS_EMBED_FOMO_MLP_DIM`	`2048`	MLP dim
`RS_EMBED_FOMO_NUM_CLASSES`	`1000`	Class head size (config compatibility)

Checkpoint loading¶

Env var	Default	Effect
`RS_EMBED_FOMO_CKPT`	unset	Local checkpoint path
`RS_EMBED_FOMO_AUTO_DOWNLOAD`	`1`	Allow checkpoint auto-download
`RS_EMBED_FOMO_CACHE_DIR`	`~/.cache/rs_embed/fomo`	Checkpoint cache dir
`RS_EMBED_FOMO_CKPT_FILE`	default FoMo checkpoint filename	Cached ckpt filename
`RS_EMBED_FOMO_CKPT_URL`	default Dropbox URL	Checkpoint download URL
`RS_EMBED_FOMO_CKPT_MIN_BYTES`	adapter threshold	Download size sanity check

Output Semantics¶

pooled: token mean/max over the full sequence; metadata records token_count, token_dim, and pooling mode.

grid: tokens are interpreted as [modalities, H, W, D], averaged over modalities → (D,H,W) with grid_kind="spectral_mean_patch_tokens"; falls back to a 1x1 vector grid with grid_kind="vector_as_1x1" when token layout is incompatible.

Examples¶

Minimal provider-backed example¶

from rs_embed import get_embedding, PointBuffer, TemporalSpec, OutputSpec

emb = get_embedding(
    "fomo",
    spatial=PointBuffer(lon=121.5, lat=31.2, buffer_m=2048),
    temporal=TemporalSpec.range("2022-06-01", "2022-09-01"),
    output=OutputSpec.pooled(),
    backend="gee",
)

Example FoMo tuning (env-controlled)¶

# Example (shell):
export RS_EMBED_FOMO_IMG=64
export RS_EMBED_FOMO_PATCH=16
export RS_EMBED_FOMO_NORM=unit_scale
export RS_EMBED_FOMO_S2_KEYS=6,7,8,9,10,11,12,13,14,15,17,18

Paper & Links¶

Publication: AAAI 2025
Code: RolnickLab/FoMo-Bench

Reference¶

RS_EMBED_FOMO_S2_KEYS must have exactly 12 values — length mismatches raise immediately.
Grid output uses modality-averaging over spectral keys; if the token layout is incompatible, the adapter falls back to a 1×1 grid.
The default image size is 64 (not 224) — this is intentional and matches FoMo's architecture.

FoMo (fomo)¶