DOFA (dofa)
DOFA adapter for multispectral inputs with explicit per-channel wavelengths, supporting provider and tensor backends.
Quick Facts
| Field | Value |
|---|---|
| Model ID | dofa |
| Family / Backbone | DOFA ViT (base / large, official checkpoints) |
| Adapter type | on-the-fly |
| Typical backend | provider backend (gee), also supports backend="tensor" |
| Primary input | Raw Sentinel-2 SR CHW + wavelengths (µm) |
| Default resolution | 10 m for the default provider fetch (sensor.scale_m) |
| Temporal mode | provider path requires TemporalSpec.range(...) |
| Output modes | pooled, grid |
| Model config keys | model_config["variant"] (default: base; choices: base, large) |
| Extra side inputs | required wavelength vector (wavelengths_um) |
| Training alignment (adapter path) | Medium-High (when wavelengths and band semantics are correct) |
When To Use This Model
Good fit for
- multispectral experiments where wavelength-aware modeling matters
- custom sensor/band combinations (if you provide matching wavelengths)
- comparing spectral models against S2-specific models
Be careful when
- wavelengths are missing or mismatched with channels
- assuming arbitrary bands can be inferred automatically (only known sets like S2 are inferable)
- comparing results without logging the `variant` and wavelengths used
Input Contract (Current Adapter Path)
Spatial / temporal
- Provider path requires `TemporalSpec.range(start, end)`
- Tensor path does not use provider/temporal fetch semantics
Sensor / channels (provider path)
Default `SensorSpec` if omitted:
- Collection: `COPERNICUS/S2_SR_HARMONIZED`
- Bands: official DOFA S2 9-band order (`B4,B3,B2,B5,B6,B7,B8,B11,B12`)
- `scale_m=10`, `cloudy_pct=30`, `composite="median"`, `fill_value=0.0`
Wavelengths:
- Adapter requires one wavelength (µm) per channel
- If `sensor.wavelengths` is not provided, the adapter tries to infer it from `sensor.bands`
- `len(wavelengths_um)` must equal channel count `C`
- official preprocessing currently only supports subsets/re-orderings of the default DOFA S2 9-band set
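The wavelength rules above can be sketched as a small lookup-and-validate helper. This is an illustration under stated assumptions, not the adapter's code: the function name `resolve_wavelengths_um` and the table `S2_NOMINAL_WAVELENGTHS_UM` are hypothetical, and the values are approximate nominal Sentinel-2 band centers. The adapter's actual inference map may use different values.

```python
from typing import Optional, Sequence

# Approximate nominal Sentinel-2 central wavelengths in micrometers.
# Illustrative values only; the adapter's own map may differ.
S2_NOMINAL_WAVELENGTHS_UM = {
    "B4": 0.665, "B3": 0.560, "B2": 0.490,
    "B5": 0.705, "B6": 0.740, "B7": 0.783,
    "B8": 0.842, "B11": 1.610, "B12": 2.190,
}


def resolve_wavelengths_um(
    bands: Sequence[str],
    wavelengths: Optional[Sequence[float]] = None,
) -> list:
    """Return one wavelength (µm) per band, preferring explicit values."""
    if wavelengths is None:
        # No explicit vector: fall back to the lookup table of known bands.
        unknown = [b for b in bands if b not in S2_NOMINAL_WAVELENGTHS_UM]
        if unknown:
            raise ValueError(f"cannot infer wavelengths for bands: {unknown}")
        wavelengths = [S2_NOMINAL_WAVELENGTHS_UM[b] for b in bands]
    if len(wavelengths) != len(bands):
        raise ValueError(
            f"expected {len(bands)} wavelengths, got {len(wavelengths)}"
        )
    return [float(w) for w in wavelengths]


print(resolve_wavelengths_um(("B2", "B3", "B4", "B8")))
```

Explicit wavelengths take priority in this sketch, so band names outside the lookup table are accepted as long as a vector of matching length is supplied.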
`input_chw` contract (provider override path):
- must be `CHW` with `C == len(bands)`
- raw SR values expected (`0..10000`)
Tensor backend contract
- `backend="tensor"` requires `input_chw` as `CHW`
- batch tensor inputs should use `get_embeddings_batch_from_inputs(...)`
- `sensor.bands` is required so official preprocessing can be applied
- `sensor.wavelengths` should be provided, or `sensor.bands` must allow wavelength inference
- `input_chw` is expected to contain raw SR values (`0..10000`), not pre-normalized `[0,1]`
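Before calling the tensor backend, it can help to check an array against the contract above. The sketch below is a hypothetical pre-flight helper (`check_raw_sr_input` is not part of rs-embed). It takes only the array's shape and value range, so it works with any array library. The `[0,1]` detection is a heuristic assumption, not the adapter's own logic.

```python
from typing import Sequence, Tuple


def check_raw_sr_input(
    shape: Tuple[int, ...],
    value_min: float,
    value_max: float,
    bands: Sequence[str],
) -> list:
    """Collect human-readable problems with a candidate input_chw array."""
    problems = []
    if len(shape) != 3:
        problems.append(f"expected CHW (3 dims), got shape {shape}")
    elif shape[0] != len(bands):
        problems.append(f"C={shape[0]} does not match len(bands)={len(bands)}")
    if value_min < 0 or value_max > 10000:
        problems.append("values fall outside the raw SR range 0..10000")
    elif value_max <= 1.0:
        # Heuristic only: a maximum of 1.0 or less suggests pre-normalized data.
        problems.append("values look pre-normalized to [0,1], not raw SR")
    return problems


bands = ("B4", "B3", "B2")
print(check_raw_sr_input((3, 256, 256), 0.0, 4200.0, bands))
print(check_raw_sr_input((3, 256, 256), 0.0, 0.93, bands))
```

With a NumPy array `x`, the call would be `check_raw_sr_input(x.shape, float(x.min()), float(x.max()), bands)`. An all-zero (fully filled) patch also trips the heuristic, so treat the result as a warning rather than a hard failure.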
Preprocessing Pipeline (Current rs-embed Path)
Provider path
- Fetch raw multiband Sentinel-2 SR patch
- Optional input inspection on raw SR (`expected_channels=len(bands)`, value range `[0,10000]`)
- Convert raw SR `0..10000` to `0..255`-like scale
- Apply official DOFA S2 per-band mean/std normalization
- Resize to fixed `224x224` (bilinear; no crop/pad)
- Load DOFA model variant (`base`/`large`)
- Forward with image tensor + wavelength vector
- Return pooled embedding or reshape tokens to patch-token grid
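The rescale and normalization steps above can be sketched for a single pixel as follows. Two things here are assumptions: the rescale is taken to be a clipped linear map from `0..10000` to `0..255`, and the mean/std values are placeholders, not the official DOFA statistics. The helper name `normalize_pixel` is hypothetical, and the resize and model steps are omitted.

```python
from typing import Sequence


def normalize_pixel(
    raw_sr: Sequence[float],
    mean: Sequence[float],
    std: Sequence[float],
) -> list:
    """Scale one pixel's raw SR values to 0..255, then standardize per band."""
    if not (len(raw_sr) == len(mean) == len(std)):
        raise ValueError("raw_sr, mean, and std must have one entry per band")
    out = []
    for value, m, s in zip(raw_sr, mean, std):
        clipped = min(max(value, 0.0), 10000.0)  # keep within the raw SR range
        scaled = clipped / 10000.0 * 255.0       # assumed linear rescale
        out.append((scaled - m) / s)             # per-band standardization
    return out


# Placeholder statistics for three bands, NOT the official DOFA values.
PLACEHOLDER_MEAN = [100.0, 100.0, 100.0]
PLACEHOLDER_STD = [50.0, 50.0, 50.0]

print(normalize_pixel([0.0, 5000.0, 10000.0], PLACEHOLDER_MEAN, PLACEHOLDER_STD))
```

The per-band statistics must follow the same band order as the input channels, which is why band order matters as much as band identity on this path.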
Tensor path
- Read raw SR `input_chw` (`CHW`)
- Apply the same official DOFA S2 per-band normalization used by the provider path
- Resize to `224x224`
- Resolve wavelengths from `sensor.wavelengths` or infer from `sensor.bands`
- Forward DOFA with image + wavelengths
Fixed adapter behavior:
- image size fixed to `224` in current implementation
- current official preprocessing path is defined for Sentinel-2 subsets of `B4,B3,B2,B5,B6,B7,B8,B11,B12`
Environment Variables / Tuning Knobs
| Env var | Default | Effect |
|---|---|---|
| `RS_EMBED_DOFA_FETCH_WORKERS` | `8` | Provider prefetch workers for batch APIs |
| `RS_EMBED_DOFA_BATCH_SIZE` | CPU: `8`, CUDA: `64` | Inference batch size for batch APIs |
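A minimal sketch of how these two variables and their documented defaults might be resolved is shown below. The helper `dofa_batch_settings` is hypothetical and only mirrors the table. How and when the adapter itself reads the environment is not specified here, so setting the variables before importing or calling rs-embed is the safer assumption.

```python
import os


def dofa_batch_settings(device: str = "cpu") -> dict:
    """Resolve fetch workers and batch size from the environment."""
    # Documented defaults: batch size 8 on CPU, 64 on CUDA.
    default_batch = "64" if device.startswith("cuda") else "8"
    return {
        "fetch_workers": int(os.environ.get("RS_EMBED_DOFA_FETCH_WORKERS", "8")),
        "batch_size": int(os.environ.get("RS_EMBED_DOFA_BATCH_SIZE", default_batch)),
    }


# Override before the batch APIs run, e.g. to fit a smaller GPU.
os.environ["RS_EMBED_DOFA_BATCH_SIZE"] = "16"
print(dofa_batch_settings("cuda"))
```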
Non-env model selection knobs:
- `model_config["variant"]`: `base`/`large` (default: `base`)
- `sensor.bands`: channel semantics for provider fetch and wavelength inference
- `sensor.wavelengths`: explicit wavelength vector (µm)
If `model_config["variant"]` is omitted, rs-embed uses the base DOFA checkpoint by default. Set `model_config={"variant": "large"}` to switch to the larger model.
Quick reminder:
- DOFA supports `variant` directly through `model_config`
- current public usage is `model_config={"variant": "base"}` or `model_config={"variant": "large"}`
- for export jobs, pass the same setting via `ExportModelRequest("dofa", model_config={"variant": ...})`
Output Semantics
OutputSpec.pooled()
- Returns DOFA pooled vector `(D,)`
- Metadata includes wavelength vector, variant, preprocess strategy, token metadata
OutputSpec.grid()
- Reshapes DOFA patch tokens to `xarray.DataArray` `(D,H,W)` (usually square token grid)
- Requires token count to be a perfect square
- Grid is ViT patch-token layout, not georeferenced raster pixels
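The perfect-square requirement can be illustrated with a short check. The helper `token_grid_side` is hypothetical, and the 16-pixel patch size is an assumption for illustration. Confirm the actual patch size against the loaded DOFA checkpoint.

```python
import math


def token_grid_side(num_tokens: int) -> int:
    """Return the side length of a square token grid, or raise if not square."""
    side = math.isqrt(num_tokens)
    if side * side != num_tokens:
        raise ValueError(f"{num_tokens} tokens do not form a square grid")
    return side


# Assumption: a 224x224 input with 16x16 patches gives 14 * 14 = 196 patch tokens.
num_tokens = (224 // 16) ** 2
side = token_grid_side(num_tokens)
print(f"grid shape: (D, {side}, {side})")
```

Because each grid cell corresponds to one patch of the resized `224x224` input, cell positions map to fractions of the input patch rather than to fixed ground coordinates.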
Examples
Minimal provider-backed example (S2 wavelengths inferred automatically)
from rs_embed import get_embedding, PointBuffer, TemporalSpec, OutputSpec
emb = get_embedding(
"dofa",
spatial=PointBuffer(lon=121.5, lat=31.2, buffer_m=2048),
temporal=TemporalSpec.range("2022-06-01", "2022-09-01"),
model_config={"variant": "base"},
output=OutputSpec.pooled(),
backend="gee",
)
Switch to the large checkpoint
from rs_embed import get_embedding, PointBuffer, TemporalSpec, OutputSpec
emb = get_embedding(
"dofa",
spatial=PointBuffer(lon=121.5, lat=31.2, buffer_m=2048),
temporal=TemporalSpec.range("2022-06-01", "2022-09-01"),
model_config={"variant": "large"},
output=OutputSpec.pooled(),
backend="gee",
)
Custom bands / wavelengths example (conceptual)
from rs_embed import SensorSpec
sensor = SensorSpec(
collection="COPERNICUS/S2_SR_HARMONIZED",
bands=("B2", "B3", "B4", "B8"),
scale_m=10,
)
# If bands are non-standard, provide wavelengths explicitly via an extended sensor object/field used by your code path.
Common Failure Modes / Debugging
- provider path called with non-`range` temporal spec
- wavelength vector missing or wrong length for channel count
- unsupported bands for the official S2 preprocessing path
- tensor backend called with already-normalized `[0,1]` inputs
- tensor backend called without `input_chw`
- unknown `variant` (must be `base` or `large`)
Recommended first checks:
- print/log `bands` and `wavelengths_um` used by the adapter
- verify provider input is scaled/ordered as expected before forward pass
Reproducibility Notes
Keep fixed and record:
- `variant` (`base` vs `large`)
- exact `bands` and `wavelengths_um`
- temporal window and compositing (provider path)
- output mode (`pooled` vs `grid`)
- whether backend is `provider` or `tensor`
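One way to record the items above is a small JSON document stored next to each embedding. The sketch below is only a suggestion: the field names are hypothetical and are not an rs-embed format. The wavelength values are approximate nominal Sentinel-2 band centers used for illustration, so log whatever the adapter actually reports in its metadata instead.

```python
import json

run_record = {
    "model": "dofa",
    "variant": "base",
    "backend": "provider",
    "bands": ["B4", "B3", "B2", "B5", "B6", "B7", "B8", "B11", "B12"],
    # Approximate nominal values for illustration; log what the adapter reports.
    "wavelengths_um": [0.665, 0.56, 0.49, 0.705, 0.74, 0.783, 0.842, 1.61, 2.19],
    "temporal": {"start": "2022-06-01", "end": "2022-09-01"},
    "composite": "median",
    "output": "pooled",
}

# sort_keys gives a stable serialization that is easy to diff between runs.
serialized = json.dumps(run_record, sort_keys=True, indent=2)
print(serialized)
```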
Source of Truth (Code Pointers)
- Registration/catalog: `src/rs_embed/embedders/catalog.py`
- Adapter implementation: `src/rs_embed/embedders/onthefly_dofa.py`
- Wavelength inference map: `src/rs_embed/embedders/onthefly_dofa.py`