
SatMAE RGB (satmae)

Quick Facts

Field                 Value
Model ID              satmae
Aliases               satmae_rgb
Family / Backbone     SatMAE via rshf.satmae.SatMAE
Adapter type          on-the-fly
Training alignment    Medium-High (higher when the wrapper's model.transform(...) is available and used)

SatMAE In 30 Seconds

SatMAE is an MAE-pretrained ViT trained on fMoW imagery, exposed via rshf.satmae.SatMAE. In rs-embed it is the simplest RGB-only token extractor: every forward pass calls forward_encoder(mask_ratio=0.0) to obtain patch tokens, which are then either pooled to a vector or reshaped into a ViT patch-token grid.

In rs-embed, its most important characteristics are:

  • RGB-only (B4,B3,B2); raw SR is converted to uint8 before model preprocessing: see Preprocessing Pipeline
  • token path is always used (mask_ratio=0.0), and any CLS token is auto-removed before pooling/grid: see Output Semantics
  • checkpoint selection via RS_EMBED_SATMAE_ID (Hugging Face model ID) — default targets the fMoW large checkpoint: see Environment Variables / Tuning Knobs

Input Contract

Field                    Value
Backend                  provider only (gee / auto)
TemporalSpec             range recommended (normalized via shared helper)
Default collection       COPERNICUS/S2_SR_HARMONIZED
Default bands (order)    B4, B3, B2
Default fetch            scale_m=10, cloudy_pct=30, composite="median"
input_chw                CHW, C=3 in (B4, B3, B2) order, raw SR 0..10000
Side inputs              none

The adapter converts raw SR 0..10000 to uint8 RGB before model preprocessing.
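The SR-to-uint8 step can be sketched as a linear rescale with clipping. This is a minimal sketch under stated assumptions: the adapter's exact scaling constants are not shown in this document, and sr_to_uint8 is a hypothetical helper name.

```python
import numpy as np

def sr_to_uint8(chw: np.ndarray, sr_max: float = 10000.0) -> np.ndarray:
    """Scale raw surface reflectance (0..sr_max) to uint8 (0..255), clipping outliers.

    Assumes a simple linear mapping; the real adapter may differ.
    """
    scaled = np.clip(chw.astype(np.float32) / sr_max, 0.0, 1.0) * 255.0
    return scaled.round().astype(np.uint8)

# (3, H, W) raw SR chip in (B4, B3, B2) order, mid-range reflectance
chip = np.full((3, 224, 224), 5000, dtype=np.int32)
rgb = sr_to_uint8(chip)
print(rgb.dtype, int(rgb.max()))  # uint8 128
```

Clipping matters because SR values above 10000 (e.g. bright clouds) would otherwise wrap around in the uint8 cast.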


Preprocessing Pipeline

Resize is the default — tiling is also available

The pipeline below shows the default input_prep="resize" path. For large ROIs, use input_prep="tile" to split the input into tiles and preserve spatial detail. See Choosing Settings.
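Conceptually, the tile path splits one large CHW chip into model-sized chips instead of downsampling it. The sketch below assumes non-overlapping 224-pixel tiles with edge remainders dropped; rs-embed's actual tiler may pad or overlap, and tile_chw is a hypothetical helper.

```python
import numpy as np

def tile_chw(chw: np.ndarray, tile: int = 224) -> list:
    """Split a CHW array into non-overlapping tile×tile chips (edge remainders dropped)."""
    c, h, w = chw.shape
    tiles = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            tiles.append(chw[:, y:y + tile, x:x + tile])
    return tiles

big = np.zeros((3, 672, 448), dtype=np.uint8)  # ROI covering a 3×2 grid of 224-px tiles
tiles = tile_chw(big)
print(len(tiles), tiles[0].shape)  # 6 (3, 224, 224)
```

Each tile is then embedded independently, so spatial detail survives that a single 224×224 resize of the whole ROI would discard.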

flowchart LR
    INPUT["S2 RGB → uint8"] --> PREP["Resize 224×224\n→ model.transform or fallback"]
    PREP --> FWD["forward_encoder\n(mask_ratio=0.0)"]
    FWD --> POOL["pooled: patch mean/max"]
    FWD --> GRID["grid: reshape (D,H,W)"]

Token extraction

The current adapter path always targets token output rather than pre-pooled wrapper outputs. If a CLS token is present, the pooling and grid helpers remove it automatically.
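The CLS removal plus pooled/grid post-processing described above can be sketched as follows. tokens_to_outputs is a hypothetical helper, and a square patch grid is assumed (e.g. 14×14 patches for a 224-pixel ViT-L input); the adapter's internal helpers may differ.

```python
import numpy as np

def tokens_to_outputs(tokens: np.ndarray, has_cls: bool = True):
    """From (N_tokens, D) encoder output: drop CLS, then build pooled and grid views."""
    if has_cls:
        tokens = tokens[1:]              # CLS is conventionally the first token
    n, d = tokens.shape
    side = int(round(n ** 0.5))          # assume a square patch grid
    assert side * side == n, "token count must form a square grid"
    pooled = tokens.mean(axis=0)         # (D,) mean pooling; max pooling: tokens.max(axis=0)
    grid = tokens.reshape(side, side, d).transpose(2, 0, 1)  # (D, H, W)
    return pooled, grid

# 1 CLS token + 14×14 patch tokens, ViT-L embedding dim 1024
toks = np.random.rand(197, 1024).astype(np.float32)
pooled, grid = tokens_to_outputs(toks)
print(pooled.shape, grid.shape)  # (1024,) (1024, 14, 14)
```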


Architecture Concept

flowchart LR
    subgraph Input
        RGB["S2 RGB\n(B4,B3,B2)"] --> U8["uint8"]
    end
    subgraph "MAE ViT (fMoW pretrained)"
        U8 --> ENC["forward_encoder\nmask_ratio=0.0\n(all patches visible)"]
        ENC --> CLS["Remove CLS\ntoken"]
        CLS --> TOK["Patch tokens\nN_patches, D"]
    end
    subgraph Output
        TOK --> POOL["pooled:\nmean / max"]
        TOK --> GRID["grid:\nreshape (D,H,W)"]
    end

Environment Variables / Tuning Knobs

Env var                          Default                                  Effect
RS_EMBED_SATMAE_ID               MVRL/satmae-vitlarge-fmow-pretrain-800   HF model ID used by SatMAE.from_pretrained(...)
RS_EMBED_SATMAE_IMG              224                                      Resize / preprocess image size
RS_EMBED_SATMAE_FETCH_WORKERS    8                                        Provider prefetch workers for batch APIs
RS_EMBED_SATMAE_BATCH_SIZE       CPU: 8, CUDA: 32                         Inference batch size for batch APIs
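Resolving these knobs typically follows the usual env-with-default pattern. A minimal sketch, assuming the defaults in the table above; satmae_env is a hypothetical helper and rs-embed's actual parsing may differ:

```python
import os

def satmae_env(env=None) -> dict:
    """Resolve SatMAE tuning knobs from environment variables with documented defaults."""
    env = os.environ if env is None else env
    return {
        "model_id": env.get("RS_EMBED_SATMAE_ID", "MVRL/satmae-vitlarge-fmow-pretrain-800"),
        "img_size": int(env.get("RS_EMBED_SATMAE_IMG", "224")),
        "fetch_workers": int(env.get("RS_EMBED_SATMAE_FETCH_WORKERS", "8")),
    }

print(satmae_env({}))  # pass an empty mapping to see the documented defaults
```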

Examples

Minimal provider-backed example

from rs_embed import get_embedding, PointBuffer, TemporalSpec, OutputSpec

emb = get_embedding(
    "satmae",
    spatial=PointBuffer(lon=121.5, lat=31.2, buffer_m=2048),
    temporal=TemporalSpec.range("2022-06-01", "2022-09-01"),
    output=OutputSpec.pooled(),
    backend="gee",
)

Example model/image-size tuning (env-controlled)

# Example (shell):
export RS_EMBED_SATMAE_ID=MVRL/satmae-vitlarge-fmow-pretrain-800
export RS_EMBED_SATMAE_IMG=224


Reference

  • Provider-only — backend="tensor" is not supported.
  • Requires rshf with a compatible SatMAE wrapper exposing forward_encoder.
  • The adapter auto-removes the CLS token; if rshf changes its output format, grid reshape may break.