DOFA (`dofa`)¶

Quick Facts¶

Field	Value
Model ID	`dofa`
Family / Backbone	DOFA ViT (`base` / `large`, official checkpoints)
Adapter type	`on-the-fly`
Model config keys	`variant` (default: `base`; choices: `base`, `large`)
Training alignment	Medium-High (when wavelengths and band semantics are correct)

DOFA In 30 Seconds

DOFA ("Dynamic One-For-All") is a wavelength-conditioned ViT: it takes a per-channel wavelength vector (µm) as an explicit side input and uses it to generate its patch embeddings, so arbitrary multispectral band combinations work as long as you provide matching wavelengths — the wavelengths are part of the model input, not metadata.

In rs-embed, its most important characteristics are:

required per-channel wavelengths_um side input; inference from sensor.bands only works for known S2 subsets: see Input Contract
official DOFA S2 per-band mean/std preprocessing locked to fixed 224×224: see Preprocessing Pipeline
supports both provider backend and backend="tensor", with the tensor path rejecting already-normalized [0,1] inputs: see Input Contract

Input Contract¶

Provider backend (gee / auto)Tensor backend (tensor)

Field	Value
`TemporalSpec`	required `TemporalSpec.range(start, end)`
Default collection	`COPERNICUS/S2_SR_HARMONIZED`
Default bands (order)	`B4, B3, B2, B5, B6, B7, B8, B11, B12` (official DOFA S2 9-band order)
Default fetch	`scale_m=10`, `cloudy_pct=30`, `composite="median"`, `fill_value=0.0`
`input_chw` override	`CHW`, `C == len(bands)`, raw SR `0..10000`
Side inputs	required `wavelengths_um` (µm, one per channel)

Field	Value
`TemporalSpec`	ignored (no provider fetch)
`input_chw`	required, `CHW`, raw SR `0..10000` — not pre-normalized `[0,1]`
`sensor.bands`	required — gates the official per-band preprocessing path
Side inputs	required `wavelengths_um`, or inferable from `sensor.bands` for known S2 subsets
Batch API	use `get_embeddings_batch_from_inputs(...)` for batched tensor inputs

Wavelength inference

If sensor.wavelengths is not provided, the adapter tries to infer from sensor.bands, but this only works for the known Sentinel-2 subsets. Whatever the source, len(wavelengths_um) must always match C. For custom band combinations, pass sensor.wavelengths explicitly.

Preprocessing Pipeline¶

Resize is the default — tiling is also available

The pipeline below shows the default input_prep="resize" path. For large ROIs, use input_prep="tile" to split the input into tiles and preserve spatial detail. See Choosing Settings.

Provider path¶

flowchart LR
    FETCH["Fetch multiband\nS2 SR"] --> PREP["Scale → per-band norm\n→ resize 224×224"]
    PREP --> FWD["DOFA forward\n(image, wavelengths)"]
    FWD --> POOL["pooled: embedding vector"]
    FWD --> GRID["grid: patch-token (D,H,W)"]

Tensor path¶

flowchart LR
    INPUT["Read raw SR\ninput_chw"] --> PREP["Reject [0,1] inputs\n→ provider-equiv norm\n→ resize 224×224"]
    PREP --> WL["Resolve wavelengths\n(sensor or infer from bands)"]
    WL --> FWD["forward(image, wavelengths)"]
    FWD --> POOL["pooled: embedding vector"]
    FWD --> GRID["grid: patch-token grid"]

Fixed adapter behavior

The current implementation fixes image size at 224, and the official preprocessing path is defined for Sentinel-2 subsets of B4,B3,B2,B5,B6,B7,B8,B11,B12.

Architecture Concept¶

flowchart LR
    BANDS["Variable S2 bands\n(default 9-band)"] --> DPE["Dynamic patch embedding\n← wavelengths (µm) per channel"]
    DPE --> TF["Transformer"]
    TF --> POOL["pooled: embedding vector"]
    TF --> GRID["grid: patch-token (D,H,W)"]

Environment Variables / Tuning Knobs¶

Env var	Default	Effect
`RS_EMBED_DOFA_FETCH_WORKERS`	`8`	Provider prefetch workers for batch APIs
`RS_EMBED_DOFA_BATCH_SIZE`	CPU:`8`, CUDA:`64`	Inference batch size for batch APIs
`RS_EMBED_DOFA_BASE_WEIGHTS`	unset	Local override for the base checkpoint file
`RS_EMBED_DOFA_LARGE_WEIGHTS`	unset	Local override for the large checkpoint file
`RS_EMBED_DOFA_WEIGHTS_DIR`	unset	Directory override containing DOFA checkpoint files
`RS_EMBED_DOFA_HF_REPO_ID`	`earthflow/DOFA`	Hugging Face repo used for checkpoint download
`RS_EMBED_DOFA_HF_REVISION`	`main`	Hugging Face revision used for checkpoint download

Non-env model selection knobs:

The main non-env knobs are variant (base or large), sensor.bands for channel semantics and wavelength inference, and sensor.wavelengths for an explicit wavelength vector in micrometers.

Model-specific Settings¶

variant selects the DOFA backbone size. In rs-embed, pass it as variant="base" | "large" (or their one-letter aliases "b"/"l").

Variant	Vendored builder	HF checkpoint file	Patch size	Embed dim	Transformer blocks	Attention heads	Notes
`base`	`vit_base_patch16`	`DOFA_ViT_base_e100.pth`	16	768	12	12	Current default. ViT-B/16-class backbone with the DOFA wavelength-dynamic patch embed (~113M params total).
`large`	`vit_large_patch16`	`DOFA_ViT_large_e100.pth`	16	1024	24	16	Highest capacity. ViT-L/16-class backbone (~329M params total).

How To Read Embed Dim

Embed dim is DOFA's embed_dim and becomes the pooled embedding width (D,) and the channel dimension of a grid output (D,H,W).

Checkpoints are downloaded from RS_EMBED_DOFA_HF_REPO_ID (default earthflow/DOFA) at RS_EMBED_DOFA_HF_REVISION. You can fully override them with RS_EMBED_DOFA_BASE_WEIGHTS / RS_EMBED_DOFA_LARGE_WEIGHTS, or point RS_EMBED_DOFA_WEIGHTS_DIR at a directory that contains both files.

Quick reminder

Pass variant directly to get_embedding("dofa", ..., variant="base"). For export jobs, use ExportModelRequest.configure("dofa", variant="large").

Example:

from rs_embed import PointBuffer, TemporalSpec, OutputSpec, get_embedding

emb = get_embedding(
    "dofa",
    spatial=PointBuffer(lon=121.5, lat=31.2, buffer_m=2048),
    temporal=TemporalSpec.range("2022-06-01", "2022-09-01"),
    output=OutputSpec.pooled(),
    backend="gee",
    variant="large",
)

Examples¶

Minimal provider-backed example (S2 wavelengths inferred automatically)¶

from rs_embed import get_embedding, PointBuffer, TemporalSpec, OutputSpec

emb = get_embedding(
    "dofa",
    spatial=PointBuffer(lon=121.5, lat=31.2, buffer_m=2048),
    temporal=TemporalSpec.range("2022-06-01", "2022-09-01"),
    output=OutputSpec.pooled(),
    backend="gee",
)

Switch to the large checkpoint¶

from rs_embed import get_embedding, PointBuffer, TemporalSpec, OutputSpec

emb = get_embedding(
    "dofa",
    spatial=PointBuffer(lon=121.5, lat=31.2, buffer_m=2048),
    temporal=TemporalSpec.range("2022-06-01", "2022-09-01"),
    output=OutputSpec.pooled(),
    backend="gee",
    variant="large",
)

Custom bands / wavelengths example (conceptual)¶

from rs_embed import SensorSpec

sensor = SensorSpec(
    collection="COPERNICUS/S2_SR_HARMONIZED",
    bands=("B2", "B3", "B4", "B8"),
    scale_m=10,
)
# If bands are non-standard, provide wavelengths explicitly via an extended sensor object/field used by your code path.

Paper & Links¶

Publication: arXiv 2024
Code: zhu-xlab/DOFA

Reference¶

The tensor backend rejects inputs already in [0,1] — pass raw SR values (0..10000).
wavelengths_um must have exactly as many entries as input channels. If using non-standard bands, pass sensor.wavelengths explicitly.
Only base and large variants exist; other values raise immediately.

DOFA (dofa)¶