Supported Models (Advanced Reference)

How To Use This Page

Read Temporal Handling and Multi-frame Semantics before comparing temporal models. Use Modality and Extra Inputs Matrix when you need a fair benchmark across models with different side inputs. Preprocessing and Temporal Env Vars matters mainly when tuning preprocessing or reproducing training pipelines.

Precomputed Embeddings

Model	ID	Output	Resolution	Dim	Time Coverage	Notes
Tessera	`tessera`	pooled / grid	10 m	128	2017–2025	GeoTessera global tile embeddings
Google Satellite Embedding (Alpha Earth)	`gse`	pooled / grid	10 m	64	2017–2024	Annual embeddings via GEE
Copernicus Embed	`copernicus`	pooled / grid	0.25°	768	2021	Official Copernicus embeddings

On-the-fly Foundation Models

For per-model input contracts, preprocessing pipelines, and environment variables, see each model's detail page (linked from Models Overview). The tables below focus on cross-model comparison dimensions that are hard to see from individual pages.

Temporal Handling

Read this section before comparing any model that accepts TemporalSpec.range(...).

For most on-the-fly adapters, TemporalSpec.range(start, end) means "filter imagery in [start, end) and build one composite patch for model input." The compositing method comes from SensorSpec.composite: median is the usual choice, and mosaic is the alternative.

The multi-frame adapters agrifm, anysat, and galileo instead split the requested range into sub-windows and composite one frame per bin. Current single-composite adapters include remoteclip, satmae, satmaepp, satmaepp_s2_10b, scalemae, wildsat, prithvi, terrafm, terramind, dofa, fomo, thor, and satvision.
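The single-composite semantics can be illustrated with a minimal pure-Python sketch. It is not the rs-embed implementation: the `median_composite` helper, the `(date, pixels)` scene layout, and the use of `None` for invalid pixels are all assumptions made for illustration. It only shows the end-exclusive `[start, end)` filter followed by a per-pixel median.

```python
from datetime import date
from statistics import median


def median_composite(scenes, start, end):
    """Filter scenes to [start, end) and take a per-pixel median.

    Hypothetical helper: `scenes` is a list of (acquisition_date, pixels)
    pairs, `pixels` is a flat list of floats, and None marks an invalid
    (for example, cloudy) pixel.
    """
    # The end bound is exclusive, matching the [start, end) convention.
    selected = [px for d, px in scenes if start <= d < end]
    if not selected:
        raise ValueError("no scenes inside the requested window")
    composite = []
    for values in zip(*selected):
        valid = [v for v in values if v is not None]
        # A pixel with no valid observation stays invalid in the composite.
        composite.append(median(valid) if valid else None)
    return composite


scenes = [
    (date(2022, 6, 1), [0.10, 0.20, None]),
    (date(2022, 6, 11), [0.30, None, None]),
    (date(2022, 6, 21), [0.20, 0.40, None]),
    (date(2022, 7, 1), [0.90, 0.90, 0.90]),  # falls on `end`, so it is excluded
]
patch = median_composite(scenes, date(2022, 6, 1), date(2022, 7, 1))
```

The scene dated exactly on `end` does not contribute, which is the detail most likely to cause off-by-one differences when reproducing a composite outside the library.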

Multi-frame Semantics

This section only matters for adapters that construct multi-frame inputs from one requested time window.

Shared behavior for current multi-frame adapters (agrifm, anysat, galileo):

All three split TemporalSpec.range(start, end) into T equal, end-exclusive sub-windows and composite each sub-window into one frame. If a sub-window has no valid observation, the provider path reuses a fallback composite so that the frame count stays stable. The runtime always enforces exactly T frames: for user-provided inputs, a single CHW input is repeated T times, and a TCHW input is padded or truncated to T frames. Frame compositing follows SensorSpec.composite, with median as the default and mosaic as the main alternative.
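The shared behavior above can be sketched as follows. This is a minimal illustration, not the adapters' code: `split_range` and `enforce_frames` are hypothetical helper names, frames are stand-in Python objects rather than tensors, and padding by repeating the last frame is an assumption, since this page does not state how padding is filled.

```python
from datetime import datetime


def split_range(start, end, t):
    """Split [start, end) into t equal, end-exclusive (lo, hi) sub-windows."""
    if t < 1:
        raise ValueError("t must be at least 1")
    total = end - start
    edges = [start + total * i / t for i in range(t + 1)]
    return list(zip(edges[:-1], edges[1:]))


def enforce_frames(frames, t):
    """Return exactly t frames from a list of per-frame objects.

    A single frame (the CHW case) is repeated t times. A longer stack
    (the TCHW case) is truncated, and a shorter one is padded. Padding
    with the last frame is an assumption made for this sketch.
    """
    if not frames:
        raise ValueError("need at least one frame")
    if len(frames) == 1:
        return frames * t
    if len(frames) >= t:
        return frames[:t]
    return frames + [frames[-1]] * (t - len(frames))


bins = split_range(datetime(2022, 1, 1), datetime(2022, 1, 9), 8)
stack = enforce_frames(["f0", "f1", "f2"], 8)
```

Each sub-window's upper bound is the next sub-window's lower bound, so no observation is counted in two frames.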

Per-model temporal packaging:

Model ID	Frame count env (default)	Temporal side input	Notes
`agrifm`	`RS_EMBED_AGRIFM_FRAMES` (`8`)	none (uses `TCHW` directly)	Temporal information is encoded only in the frame stack.
`anysat`	`RS_EMBED_ANYSAT_FRAMES` (`8`)	`s2_dates` (per-frame DOY, `0..364`)	DOY values are derived from each frame bin midpoint date.
`galileo`	`RS_EMBED_GALILEO_FRAMES` (`8`)	`months` (per-frame month, `1..12`)	By default from frame bin midpoints; `RS_EMBED_GALILEO_MONTH` can force a constant month for all frames.
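To make the temporal side inputs in the table concrete, the sketch below derives per-frame day-of-year and month values from sub-window midpoints. The helper names are hypothetical, the `forced_month` parameter only mimics the effect described for `RS_EMBED_GALILEO_MONTH` (how the adapter parses that variable is not shown on this page), and clamping leap-year day 366 to `364` is an assumption made to stay inside the documented `0..364` range.

```python
from datetime import datetime


def bin_midpoints(start, end, t):
    """Midpoint datetime of each of t equal sub-windows of [start, end)."""
    step = (end - start) / t
    return [start + step * (i + 0.5) for i in range(t)]


def day_of_year0(d):
    """Zero-based day of year, clamped to 0..364 (assumption for leap years)."""
    return min(d.timetuple().tm_yday - 1, 364)


def temporal_side_inputs(start, end, t, forced_month=None):
    """Return (s2_dates, months) lists with one entry per frame."""
    mids = bin_midpoints(start, end, t)
    s2_dates = [day_of_year0(m) for m in mids]
    # A forced month replaces every midpoint-derived month with one constant.
    months = [forced_month if forced_month is not None else m.month for m in mids]
    return s2_dates, months


s2_dates, months = temporal_side_inputs(datetime(2022, 1, 1), datetime(2023, 1, 1), 4)
_, july_only = temporal_side_inputs(
    datetime(2022, 1, 1), datetime(2023, 1, 1), 4, forced_month=7
)
```

For a full-year range split into four frames, the midpoints fall in February, May, August, and November, so the two side inputs carry the same seasonal signal at different granularity.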

Modality and Extra Inputs Matrix

Use this table to avoid unfair comparisons between plain image encoders and adapters that require side inputs.

Interpretation

"Backbone multimodal" means the upstream model family supports multiple modalities. "Current rs-embed path" means what this implementation actually feeds today. "Requires extra metadata" means the forward path needs non-image inputs as a hard requirement.

Model ID	Backbone multimodal?	Current rs-embed path uses multiple modalities?	Multi-input forward (beyond image tensor)?	Requires extra metadata?
`remoteclip`	No	No	No	No
`satmae`	No	No	No	No
`satmaepp`	No	No	No	No
`satmaepp_s2_10b`	No (this adapter path)	No	No	No (but strict 10-band order is required)
`scalemae`	No	No	Yes (`input_res_m`)	Yes: scale/resolution (`sensor.scale_m`)
`anysat`	Yes	Partially (S2-only imagery, plus temporal date tokens)	Yes (`s2`, `s2_dates`)	Yes: day-of-year/date signal (derived from temporal range)
`galileo`	Yes	Mostly S2 path in current adapter + temporal month tokens	Yes (multiple tensors + masks + `months`)	Yes: month/time signal (derived from temporal range)
`wildsat`	No	No	No	No
`prithvi`	No (this adapter path)	No	Yes (`x`, `temporal_coords`, `location_coords`)	Yes: location + time are required
`terrafm`	Yes (`S1`/`S2`)	Yes (select one modality per call: `s1` or `s2`)	No	No hard extra metadata (optional S1 options: orbit, linear/DB path)
`terramind`	Yes	Usually single selected modality (`S2L2A` default)	No (single selected modality tensor in this adapter)	No hard extra metadata
`dofa`	Yes (spectral generalization)	Yes (multi-band spectral input)	Yes (image + wavelength list)	Yes: per-band wavelengths (explicit or inferable from bands)
`fomo`	No	No	No	No
`thor`	Yes (`S1`/`S2`)	Yes (select one modality per call: `s1` or `s2`)	No	No hard extra metadata (optional S1 options: orbit, linear/DB path)
`agrifm`	No (this adapter path)	No	No extra side tensor, but temporal stack `[T,C,H,W]` required	Temporal coverage is important (no separate metadata tensor)
`satvision`	No (this adapter path)	No	No separate side tensor	Yes: strict 14-channel order/calibration schema (band semantics)

In practice, the most obviously multi-input models here are prithvi (image plus temporal and location coordinates), anysat (time series plus s2_dates), galileo (image-derived tensors plus masks and months), dofa (image plus wavelengths), and scalemae (image plus input_res_m).
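When planning a benchmark, it can help to separate plain image encoders from the multi-input models named above before comparing scores. The sketch below hard-codes that grouping from this page. The `SIDE_INPUTS` mapping and `split_by_side_inputs` helper are hypothetical, and the strings are descriptive labels taken from the table, not guaranteed parameter names.

```python
# Side inputs transcribed from the "Multi-input forward" column on this page.
# The labels are descriptive only and may not match real argument names.
SIDE_INPUTS = {
    "scalemae": ["input_res_m"],
    "anysat": ["s2_dates"],
    "galileo": ["masks", "months"],
    "prithvi": ["temporal_coords", "location_coords"],
    "dofa": ["wavelengths"],
}


def split_by_side_inputs(model_ids):
    """Partition model IDs into plain image encoders and multi-input models."""
    plain = [m for m in model_ids if m not in SIDE_INPUTS]
    with_side = {m: SIDE_INPUTS[m] for m in model_ids if m in SIDE_INPUTS}
    return plain, with_side


plain, with_side = split_by_side_inputs(["remoteclip", "prithvi", "satmae", "dofa"])
```

Reporting the two groups separately makes it explicit which results depend on metadata that a plain image encoder never sees.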

Preprocessing and Temporal Env Vars

This table only lists env vars that materially change model input construction or temporal packaging.

Model ID	Main preprocessing env keys
`remoteclip`	fixed `image_size=224` in code path; no per-model preprocess env switch
`satmae`	`RS_EMBED_SATMAE_IMG`
`satmaepp`	`RS_EMBED_SATMAEPP_ID`, `RS_EMBED_SATMAEPP_IMG`, `RS_EMBED_SATMAEPP_CHANNEL_ORDER`, `RS_EMBED_SATMAEPP_BGR`
`satmaepp_s2_10b`	`RS_EMBED_SATMAEPP_S2_CKPT_REPO`, `RS_EMBED_SATMAEPP_S2_CKPT_FILE`, `RS_EMBED_SATMAEPP_S2_MODEL_FN`, `RS_EMBED_SATMAEPP_S2_IMG`, `RS_EMBED_SATMAEPP_S2_PATCH`, `RS_EMBED_SATMAEPP_S2_GRID_REDUCE`, `RS_EMBED_SATMAEPP_S2_WEIGHTS_ONLY`
`scalemae`	`RS_EMBED_SCALEMAE_IMG`
`anysat`	`RS_EMBED_ANYSAT_IMG`, `RS_EMBED_ANYSAT_NORM`, `RS_EMBED_ANYSAT_FRAMES`, `RS_EMBED_ANYSAT_GRID_MODE`, `RS_EMBED_ANYSAT_POOLED_SOURCE`
`galileo`	`RS_EMBED_GALILEO_IMG`, `RS_EMBED_GALILEO_PATCH`, `RS_EMBED_GALILEO_NORM`, `RS_EMBED_GALILEO_FRAMES`, `RS_EMBED_GALILEO_MONTH`
`wildsat`	`RS_EMBED_WILDSAT_IMG`, `RS_EMBED_WILDSAT_NORM`
`prithvi`	`RS_EMBED_PRITHVI_PREP`, `RS_EMBED_PRITHVI_IMG`, `RS_EMBED_PRITHVI_PATCH_MULT`
`terrafm`	modality and sensor-side options (`s2`/`s1`); image size fixed to 224 in implementation
`terramind`	`RS_EMBED_TERRAMIND_NORMALIZE` (default z-score stats), image size fixed 224
`dofa`	image size fixed 224; provider/tensor channels and wavelengths drive preprocessing
`fomo`	`RS_EMBED_FOMO_IMG`, `RS_EMBED_FOMO_NORM`
`thor`	`RS_EMBED_THOR_IMG`, `RS_EMBED_THOR_NORMALIZE`, plus modality and sensor-side options (`s2`/`s1`)
`agrifm`	`RS_EMBED_AGRIFM_IMG`, `RS_EMBED_AGRIFM_NORM`, `RS_EMBED_AGRIFM_FRAMES`
`satvision`	`RS_EMBED_SATVISION_TOA_IMG`, `RS_EMBED_SATVISION_TOA_NORM`, channel-index and calibration env keys
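As a rough illustration of how a frame-count variable from the table might be set and read, the sketch below uses only `os.environ`. The `frames_from_env` helper is hypothetical, and the parsing rules (blank means default, values below 1 are rejected) are assumptions. The default of `8` comes from the temporal packaging table. Whether the library reads these variables at import time or at call time is not stated here, so setting them before importing the package is the safer order.

```python
import os


def frames_from_env(name, default=8):
    """Read a positive integer frame count from an environment variable.

    Hypothetical helper: the real adapters may parse or validate differently.
    """
    raw = os.environ.get(name, "").strip()
    if not raw:
        return default
    value = int(raw)
    if value < 1:
        raise ValueError(f"{name} must be a positive integer, got {value}")
    return value


# Override one adapter's frame count for this process only.
os.environ["RS_EMBED_ANYSAT_FRAMES"] = "12"
anysat_frames = frames_from_env("RS_EMBED_ANYSAT_FRAMES")
# Falls back to the default unless the variable is exported elsewhere.
agrifm_frames = frames_from_env("RS_EMBED_AGRIFM_FRAMES")
```

Recording the effective values alongside benchmark results makes it easier to tell later whether two runs used the same temporal packaging.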

Practical Guidance

For the highest reproducibility, keep each model's default normalization mode unless you can match the original training pipeline exactly. For strict-schema models such as satvision, terramind, thor, and agrifm, do not change the channel order unless checkpoint metadata explicitly allows it. When comparing embeddings across models, standardize the ROI and temporal compositing first, because per-model preprocessing already differs substantially.