
Supported Models (Advanced Reference)

How To Use This Page

Read Temporal Handling and Multi-frame Semantics before comparing temporal models. Use Modality and Extra Inputs Matrix when you need a fair benchmark across models with different side inputs. Preprocessing and Temporal Env Vars matters mainly when tuning preprocessing or reproducing training pipelines.

Precomputed Embeddings

| Model | ID | Output | Resolution | Dim | Time Coverage | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Tessera | tessera | pooled / grid | 10 m | 128 | 2017–2025 | GeoTessera global tile embeddings |
| Google Satellite Embedding (Alpha Earth) | gse | pooled / grid | 10 m | 64 | 2017–2024 | Annual embeddings via GEE |
| Copernicus Embed | copernicus | pooled / grid | 0.25° | 768 | 2021 | Official Copernicus embeddings |

On-the-fly Foundation Models

For per-model input contracts, preprocessing pipelines, and environment variables, see each model's detail page (linked from Models Overview). The tables below focus on cross-model comparison dimensions that are hard to see from individual pages.

Temporal Handling

Read this section before comparing any model that accepts TemporalSpec.range(...).

For most on-the-fly adapters, TemporalSpec.range(start, end) means "filter imagery in [start, end) and build one composite patch for model input." The compositing method is taken from SensorSpec.composite: usually median, optionally mosaic.
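As a rough illustration of the single-composite case, the sketch below collapses a stack of observations into one patch with a per-pixel median. It is not the rs-embed implementation: the function name `median_composite`, the [N, C, H, W] layout, and the use of NaN to mark invalid pixels are all assumptions made for the example. Mosaic is left out because its exact semantics are not described on this page.

```python
import numpy as np


def median_composite(stack: np.ndarray) -> np.ndarray:
    """Collapse [N, C, H, W] observations into one [C, H, W] patch.

    Assumption: invalid pixels (cloud, nodata) are stored as NaN and are
    ignored independently for every pixel and band.
    """
    if stack.ndim != 4 or stack.shape[0] == 0:
        raise ValueError("expected a non-empty [N, C, H, W] stack")
    return np.nanmedian(stack, axis=0)


# Four observations of a single-band, single-pixel patch; one is invalid.
obs = np.array([1.0, 3.0, np.nan, 2.0]).reshape(4, 1, 1, 1)
patch = median_composite(obs)
print(patch.shape, float(patch[0, 0, 0]))
```

The median is taken over the three valid values only, so the invalid observation does not pull the result toward a fill value.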

The multi-frame adapters agrifm, anysat, and galileo instead split the requested range into sub-windows and composite one frame per bin. Current single-composite adapters include remoteclip, satmae, satmaepp, satmaepp_s2_10b, scalemae, wildsat, prithvi, terrafm, terramind, dofa, fomo, thor, and satvision.
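The sub-window split used by the multi-frame adapters can be pictured with a small sketch. The helper `split_range` is hypothetical, and so is the choice to cut bins at sub-day precision; the actual adapters may round bin edges to whole days.

```python
from datetime import date, datetime, time


def split_range(start: date, end: date, frames: int):
    """Split [start, end) into `frames` equal, end-exclusive sub-windows."""
    if frames < 1:
        raise ValueError("frames must be >= 1")
    t0 = datetime.combine(start, time.min)
    t1 = datetime.combine(end, time.min)
    if t1 <= t0:
        raise ValueError("end must be after start")
    step = (t1 - t0) / frames
    # Reuse the exact end as the last edge so the bins cover [start, end).
    edges = [t0 + i * step for i in range(frames)] + [t1]
    return list(zip(edges[:-1], edges[1:]))


bins = split_range(date(2022, 1, 1), date(2023, 1, 1), 8)
for lo, hi in bins:
    print(lo.isoformat(), "->", hi.isoformat())
```

Each bin's end equals the next bin's start, so an observation on a boundary falls into exactly one sub-window.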

Multi-frame Semantics

This section only matters for adapters that construct multi-frame inputs from one requested time window.

Shared behavior for current multi-frame adapters (agrifm, anysat, galileo):

All three split TemporalSpec.range(start, end) into T equal end-exclusive sub-windows and composite each sub-window into one frame. If a sub-window has no valid observation, the provider path reuses a fallback composite so frame count stays stable. The runtime always enforces exactly T frames; for user-provided inputs that means CHW is repeated to T and TCHW is padded or truncated to T. Frame compositing follows SensorSpec.composite, with median as the default and mosaic as the main alternative.
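The frame-count rule for user-provided inputs can be sketched as follows. The function `enforce_frames` is a hypothetical stand-in, and padding by repeating the last frame is an assumption: this page says TCHW is padded but does not say with what.

```python
import numpy as np


def enforce_frames(x: np.ndarray, frames: int) -> np.ndarray:
    """Return a [T, C, H, W] array with exactly `frames` frames."""
    if frames < 1:
        raise ValueError("frames must be >= 1")
    if x.ndim == 3:
        # CHW: repeat the single image T times.
        return np.repeat(x[None, ...], frames, axis=0)
    if x.ndim != 4 or x.shape[0] == 0:
        raise ValueError("expected CHW or a non-empty TCHW array")
    t = x.shape[0]
    if t >= frames:
        # TCHW with too many frames: truncate.
        return x[:frames]
    # TCHW with too few frames: pad (assumption: repeat the last frame).
    pad = np.repeat(x[-1:], frames - t, axis=0)
    return np.concatenate([x, pad], axis=0)


single = enforce_frames(np.zeros((4, 8, 8)), 8)
short = enforce_frames(np.arange(3.0).reshape(3, 1, 1, 1), 5)
long = enforce_frames(np.zeros((10, 4, 8, 8)), 8)
print(single.shape, short.ravel().tolist(), long.shape)
```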

Per-model temporal packaging:

| Model ID | Frame count env (default) | Temporal side input | Notes |
| --- | --- | --- | --- |
| agrifm | RS_EMBED_AGRIFM_FRAMES (8) | none (uses TCHW directly) | Temporal information is encoded only in the frame stack. |
| anysat | RS_EMBED_ANYSAT_FRAMES (8) | s2_dates (per-frame DOY, 0..364) | DOY values are derived from each frame bin midpoint date. |
| galileo | RS_EMBED_GALILEO_FRAMES (8) | months (per-frame month, 1..12) | By default from frame bin midpoints; RS_EMBED_GALILEO_MONTH can force a constant month for all frames. |
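The two temporal side inputs in the table can be derived from a bin midpoint roughly as shown below. All helper names are hypothetical. Clamping the day-of-year to 364 is an assumption made here to keep leap-year dates inside the documented 0..364 range, and the `forced_month` parameter only imitates what RS_EMBED_GALILEO_MONTH is described as doing.

```python
from datetime import datetime
from typing import Optional


def bin_midpoint(start: datetime, end: datetime) -> datetime:
    """Midpoint of an end-exclusive bin [start, end)."""
    return start + (end - start) / 2


def doy_zero_based(d: datetime) -> int:
    """Zero-based day of year, clamped to 0..364 (assumption for leap years)."""
    return min(d.timetuple().tm_yday - 1, 364)


def month_for_frame(d: datetime, forced_month: Optional[int] = None) -> int:
    """Month in 1..12; a forced value overrides the midpoint month."""
    if forced_month is not None:
        if not 1 <= forced_month <= 12:
            raise ValueError("forced_month must be in 1..12")
        return forced_month
    return d.month


mid = bin_midpoint(datetime(2022, 1, 1), datetime(2022, 1, 3))
print(mid.date(), doy_zero_based(mid), month_for_frame(mid))
```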

Modality and Extra Inputs Matrix

Use this table to avoid unfair comparisons between plain image encoders and adapters that require side inputs.

Interpretation

"Backbone multimodal" means the upstream model family supports multiple modalities. "Current rs-embed path" means what this implementation actually feeds today. "Requires extra metadata" means the forward path needs non-image inputs as a hard requirement.

| Model ID | Backbone multimodal? | Current rs-embed path uses multiple modalities? | Multi-input forward (beyond image tensor)? | Requires extra metadata? |
| --- | --- | --- | --- | --- |
| remoteclip | No | No | No | No |
| satmae | No | No | No | No |
| satmaepp | No | No | No | No |
| satmaepp_s2_10b | No (this adapter path) | No | No | No (but strict 10-band order is required) |
| scalemae | No | No | Yes (input_res_m) | Yes: scale/resolution (sensor.scale_m) |
| anysat | Yes | Partially (S2-only imagery, plus temporal date tokens) | Yes (s2, s2_dates) | Yes: day-of-year/date signal (derived from temporal range) |
| galileo | Yes | Mostly S2 path in current adapter + temporal month tokens | Yes (multiple tensors + masks + months) | Yes: month/time signal (derived from temporal range) |
| wildsat | No | No | No | No |
| prithvi | No (this adapter path) | No | Yes (x, temporal_coords, location_coords) | Yes: location + time are required |
| terrafm | Yes (S1/S2) | Yes (select one modality per call: s1 or s2) | No | No hard extra metadata (optional S1 options: orbit, linear/DB path) |
| terramind | Yes | Usually single selected modality (S2L2A default) | No (single selected modality tensor in this adapter) | No hard extra metadata |
| dofa | Yes (spectral generalization) | Yes (multi-band spectral input) | Yes (image + wavelength list) | Yes: per-band wavelengths (explicit or inferable from bands) |
| fomo | No | No | No | No |
| thor | Yes (S1/S2) | Yes (select one modality per call: s1 or s2) | No | No hard extra metadata (optional S1 options: orbit, linear/DB path) |
| agrifm | No (this adapter path) | No | No extra side tensor, but temporal stack [T,C,H,W] required | Temporal coverage is important (no separate metadata tensor) |
| satvision | No (this adapter path) | No | No separate side tensor | Yes: strict 14-channel order/calibration schema (band semantics) |

In practice, the most obviously multi-input models here are prithvi (image plus temporal and location coordinates), anysat (time series plus s2_dates), galileo (image-derived tensors plus masks and months), dofa (image plus wavelengths), and scalemae (image plus input_res_m).
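When setting up a benchmark, it can help to keep this distinction in a small lookup so that image-only and side-input models are reported separately. The sketch below is hand-copied from the summary above and is not an rs-embed API; `SIDE_INPUTS` and `split_by_side_inputs` are hypothetical names, and the labels "masks" and "wavelengths" are descriptive rather than actual argument names.

```python
# Hand-maintained lookup (assumption: mirrors the summary on this page).
SIDE_INPUTS = {
    "prithvi": ("temporal_coords", "location_coords"),
    "anysat": ("s2_dates",),
    "galileo": ("masks", "months"),
    "dofa": ("wavelengths",),
    "scalemae": ("input_res_m",),
}


def split_by_side_inputs(model_ids):
    """Partition model IDs into (image_only, needs_side_inputs)."""
    image_only = [m for m in model_ids if m not in SIDE_INPUTS]
    with_side = [m for m in model_ids if m in SIDE_INPUTS]
    return image_only, with_side


image_only, with_side = split_by_side_inputs(
    ["remoteclip", "prithvi", "satmae", "dofa"]
)
print(image_only, with_side)
```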

Preprocessing and Temporal Env Vars

This table only lists env vars that materially change model input construction or temporal packaging.

| Model ID | Main preprocessing env keys |
| --- | --- |
| remoteclip | fixed image_size=224 in code path; no per-model preprocess env switch |
| satmae | RS_EMBED_SATMAE_IMG |
| satmaepp | RS_EMBED_SATMAEPP_ID, RS_EMBED_SATMAEPP_IMG, RS_EMBED_SATMAEPP_CHANNEL_ORDER, RS_EMBED_SATMAEPP_BGR |
| satmaepp_s2_10b | RS_EMBED_SATMAEPP_S2_CKPT_REPO, RS_EMBED_SATMAEPP_S2_CKPT_FILE, RS_EMBED_SATMAEPP_S2_MODEL_FN, RS_EMBED_SATMAEPP_S2_IMG, RS_EMBED_SATMAEPP_S2_PATCH, RS_EMBED_SATMAEPP_S2_GRID_REDUCE, RS_EMBED_SATMAEPP_S2_WEIGHTS_ONLY |
| scalemae | RS_EMBED_SCALEMAE_IMG |
| anysat | RS_EMBED_ANYSAT_IMG, RS_EMBED_ANYSAT_NORM, RS_EMBED_ANYSAT_FRAMES, RS_EMBED_ANYSAT_GRID_MODE, RS_EMBED_ANYSAT_POOLED_SOURCE |
| galileo | RS_EMBED_GALILEO_IMG, RS_EMBED_GALILEO_PATCH, RS_EMBED_GALILEO_NORM, RS_EMBED_GALILEO_FRAMES, RS_EMBED_GALILEO_MONTH |
| wildsat | RS_EMBED_WILDSAT_IMG, RS_EMBED_WILDSAT_NORM |
| prithvi | RS_EMBED_PRITHVI_PREP, RS_EMBED_PRITHVI_IMG, RS_EMBED_PRITHVI_PATCH_MULT |
| terrafm | modality and sensor-side options (s2/s1); image size fixed to 224 in implementation |
| terramind | RS_EMBED_TERRAMIND_NORMALIZE (default z-score stats), image size fixed 224 |
| dofa | image size fixed 224; provider/tensor channels and wavelengths drive preprocessing |
| fomo | RS_EMBED_FOMO_IMG, RS_EMBED_FOMO_NORM |
| thor | RS_EMBED_THOR_IMG, RS_EMBED_THOR_NORMALIZE, plus modality and sensor-side options (s2/s1) |
| agrifm | RS_EMBED_AGRIFM_IMG, RS_EMBED_AGRIFM_NORM, RS_EMBED_AGRIFM_FRAMES |
| satvision | RS_EMBED_SATVISION_TOA_IMG, RS_EMBED_SATVISION_TOA_NORM, channel-index and calibration env keys |
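As an illustration of how a frame-count key might be consumed, the sketch below reads an integer env var with a fallback of 8, matching the documented default for the *_FRAMES keys. The parser `frames_from_env` is hypothetical; how rs-embed actually parses and validates these values, and when it reads them, is not specified here.

```python
import os


def frames_from_env(name: str, default: int = 8) -> int:
    """Read a positive frame count from the environment, else use default."""
    raw = os.environ.get(name, "").strip()
    if not raw:
        return default
    value = int(raw)  # raises ValueError on non-integer input
    if value < 1:
        raise ValueError(f"{name} must be >= 1, got {value}")
    return value


# Override one adapter's frame count; leave another at its default.
os.environ["RS_EMBED_ANYSAT_FRAMES"] = "12"
os.environ.pop("RS_EMBED_AGRIFM_FRAMES", None)
anysat_frames = frames_from_env("RS_EMBED_ANYSAT_FRAMES")
agrifm_frames = frames_from_env("RS_EMBED_AGRIFM_FRAMES")
print(anysat_frames, agrifm_frames)
```

Setting such variables before any model is loaded is the safer habit, since a value read once at startup would not pick up later changes.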

Practical Guidance

For the highest reproducibility, keep each model's default normalization mode unless you can match the original training pipeline exactly. For strict-schema models such as satvision, terramind, thor, and agrifm, do not change channel order unless checkpoint metadata explicitly allows it. If you are comparing embeddings across models, standardize ROI and temporal compositing first: preprocessing differences between models are already substantial, so these two factors should not add further variation.