AgriFM (agrifm)¶
Quick Facts¶
| Field | Value |
|---|---|
| Model ID | agrifm |
| Family / Backbone | AgriFM (vendored Video Swin runtime + checkpoint loader) |
| Adapter type | on-the-fly |
| Training alignment | High when n_frames, normalization, and checkpoint version are fixed |
AgriFM In 30 Seconds
AgriFM is a Video-Swin-style temporal backbone trained on Sentinel-2 time series for agricultural/crop targets. In rs-embed, it fits best when you want a fixed-length multi-frame S2 stack turned into a pooled vector or a model feature-map grid.
In rs-embed, its most important characteristics are:
- fixed-T-frame stack with silent CHW -> T repeat behavior for single-frame inputs: see Input Contract
- crop-oriented AgriFM S2 normalization statistics (agrifm_stats) as the default path: see Environment Variables / Tuning Knobs
- model feature-grid output (D,H,W) rather than a ViT patch-token reshape: see Output Semantics
Input Contract¶
| Field | Value |
|---|---|
| Backend | provider only (gee / auto) |
| TemporalSpec | range recommended; the window is split into T frames by the shared helper |
| Default collection | COPERNICUS/S2_SR_HARMONIZED |
| Default bands (order) | B2, B3, B4, B5, B6, B7, B8, B8A, B11, B12 (10-band) |
| Default fetch | scale_m=10, cloudy_pct=30, composite="median", fill_value=0.0 |
| input_chw | CHW (C=10, repeated to T) or TCHW (C=10, padded/truncated to exact T); raw SR 0..10000 |
| Side inputs | none (adapter builds the T-frame stack from provider fetch) |
T is controlled by RS_EMBED_AGRIFM_FRAMES (default 8). Values are clipped to raw S2 SR range 0..10000 before normalization.
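The frame-coercion rule above can be sketched in NumPy. This is a minimal illustration, not the adapter's actual code: the function name coerce_to_t_frames is hypothetical, and the padding strategy (repeat the last frame) and truncation strategy (keep the first T frames) are assumptions, since this page only states that TCHW input is padded or truncated to exactly T.

```python
import numpy as np

S2_SR_MAX = 10000.0  # upper bound of the raw S2 SR range used for clipping


def coerce_to_t_frames(x: np.ndarray, n_frames: int = 8) -> np.ndarray:
    """Return a (T, C, H, W) float32 stack clipped to 0..10000 (hypothetical helper)."""
    x = np.asarray(x, dtype=np.float32)
    if x.ndim == 3:
        # Single-frame CHW: repeat the same frame T times (no real temporal signal).
        x = np.repeat(x[None, ...], n_frames, axis=0)
    elif x.ndim == 4:
        t = x.shape[0]
        if t == 0:
            raise ValueError("TCHW input must contain at least one frame")
        if t > n_frames:
            # Assumption: truncate by keeping the first T frames.
            x = x[:n_frames]
        elif t < n_frames:
            # Assumption: pad by repeating the last available frame.
            pad = np.repeat(x[-1:], n_frames - t, axis=0)
            x = np.concatenate([x, pad], axis=0)
    else:
        raise ValueError(f"expected CHW or TCHW input, got shape {x.shape}")
    # Clip to the raw S2 SR range before any normalization is applied.
    return np.clip(x, 0.0, S2_SR_MAX)
```

With the default T=8, a (10, H, W) input becomes (8, 10, H, W) with eight identical frames, which is why single-frame inputs run but carry no temporal variation.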
Preprocessing Pipeline¶
Resize is the default — tiling is also available
The pipeline below shows the default input_prep="resize" path. For large ROIs, use input_prep="tile" to split the input into tiles and preserve spatial detail. See Choosing Settings.
flowchart LR
INPUT["S2 10-band TCHW\n(coerce to T frames)"] --> PREP["Normalize → resize\nall frames 224×224"]
PREP --> FWD["Video-Swin forward\n→ feature grid"]
FWD --> POOL["pooled: spatial mean/max"]
FWD --> GRID["grid: (D,H,W)"]
Architecture Concept¶
flowchart LR
subgraph Input
F["S2 10-band"] --> T{Temporal?}
T -- "multi-frame TCHW" --> USE["Use as-is\n(trim/pad to T)"]
T -- "single-frame CHW" --> REP["Repeat to\nT frames"]
end
subgraph "Video-Swin Backbone"
USE --> VS["Video-Swin\nencoder"]
REP --> VS
end
subgraph "Output (feature-map, not patch tokens)"
VS --> POOL["pooled:\nspatial mean/max"]
VS --> GRID["grid: (D,H,W)\nfeature-map"]
end
Environment Variables / Tuning Knobs¶
Temporal / preprocessing¶
| Env var | Default | Effect |
|---|---|---|
| RS_EMBED_AGRIFM_FRAMES | 8 | Fixed frame count T |
| RS_EMBED_AGRIFM_IMG | 224 | Resize target image size |
| RS_EMBED_AGRIFM_NORM | agrifm_stats | agrifm_stats, unit_scale, or none |
| RS_EMBED_AGRIFM_FETCH_WORKERS | 8 | Provider prefetch workers for batch APIs |
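To make the three RS_EMBED_AGRIFM_NORM modes concrete, here is a rough sketch of what each one could do to a clipped (T, C, H, W) stack. It rests on assumptions not confirmed by this page: that unit_scale divides by 10000, and that agrifm_stats is a per-band mean/std standardization. The function normalize_frames is hypothetical, and the real agrifm_stats values are not listed here, so the caller supplies them.

```python
import numpy as np


def normalize_frames(x, mode="agrifm_stats", mean=None, std=None):
    """Normalize a (T, C, H, W) stack according to the chosen mode (hypothetical helper)."""
    x = np.asarray(x, dtype=np.float32)
    if mode == "none":
        # Pass raw (clipped) reflectance through unchanged.
        return x
    if mode == "unit_scale":
        # Assumption: map the raw 0..10000 range to 0..1.
        return x / 10000.0
    if mode == "agrifm_stats":
        if mean is None or std is None:
            raise ValueError("agrifm_stats needs per-band mean and std")
        # Broadcast per-band statistics over the T, H, and W axes.
        mean = np.asarray(mean, dtype=np.float32).reshape(1, -1, 1, 1)
        std = np.asarray(std, dtype=np.float32).reshape(1, -1, 1, 1)
        return (x - mean) / std
    raise ValueError(f"unknown normalization mode: {mode!r}")
```

Whichever mode is used, keeping it fixed across runs matters more than the choice itself, since embeddings produced under different normalizations are not directly comparable.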
Checkpoint loading¶
| Env var | Default | Effect |
|---|---|---|
| RS_EMBED_AGRIFM_CKPT | unset | Local checkpoint path |
| RS_EMBED_AGRIFM_AUTO_DOWNLOAD | 1 | Allow checkpoint auto-download |
| RS_EMBED_AGRIFM_CACHE_DIR | ~/.cache/rs_embed/agrifm | Checkpoint cache dir |
| RS_EMBED_AGRIFM_CKPT_FILE | AgriFM.pth | Cached checkpoint filename |
| RS_EMBED_AGRIFM_CKPT_URL | project default URL | Checkpoint download URL |
| RS_EMBED_AGRIFM_CKPT_MIN_BYTES | large-size threshold | Download validation threshold |
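One plausible way these variables combine is sketched below. The precedence shown (an explicit RS_EMBED_AGRIFM_CKPT wins, otherwise the cache directory plus filename) and the parsing of the auto-download flag are assumptions for illustration; the helper names are hypothetical and the loader's actual logic may differ.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_agrifm_ckpt(env: Optional[Mapping[str, str]] = None) -> Path:
    """Guess where the checkpoint is expected to live (hypothetical helper)."""
    env = os.environ if env is None else env
    explicit = env.get("RS_EMBED_AGRIFM_CKPT")
    if explicit:
        # Assumption: an explicit local path takes precedence over the cache.
        return Path(explicit).expanduser()
    cache_dir = Path(env.get("RS_EMBED_AGRIFM_CACHE_DIR", "~/.cache/rs_embed/agrifm"))
    return cache_dir.expanduser() / env.get("RS_EMBED_AGRIFM_CKPT_FILE", "AgriFM.pth")


def auto_download_allowed(env: Optional[Mapping[str, str]] = None) -> bool:
    """Assumption: only the string "1" enables auto-download."""
    env = os.environ if env is None else env
    return env.get("RS_EMBED_AGRIFM_AUTO_DOWNLOAD", "1") == "1"
```

For offline or air-gapped runs, pointing RS_EMBED_AGRIFM_CKPT at a local file and setting RS_EMBED_AGRIFM_AUTO_DOWNLOAD to 0 would be the natural combination under these assumptions.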
Output Semantics¶
pooled: spatial mean/max pooling over the AgriFM feature grid; metadata records frame count and normalization settings.
grid: (D,H,W) feature-map output from the Video-Swin encoder, not a ViT patch-token reshape.
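The relationship between the two outputs can be illustrated with a small NumPy sketch: the pooled vector is a reduction of the (D,H,W) grid over its two spatial axes. The function pool_grid is a hypothetical stand-in, not the rs-embed implementation, and how the adapter selects mean versus max is not shown on this page.

```python
import numpy as np


def pool_grid(grid: np.ndarray, mode: str = "mean") -> np.ndarray:
    """Reduce a (D, H, W) feature grid to a (D,) vector (hypothetical helper)."""
    if grid.ndim != 3:
        raise ValueError(f"expected a (D, H, W) grid, got shape {grid.shape}")
    if mode == "mean":
        # Average each feature channel over all spatial positions.
        return grid.mean(axis=(1, 2))
    if mode == "max":
        # Keep the strongest response of each feature channel.
        return grid.max(axis=(1, 2))
    raise ValueError(f"unknown pooling mode: {mode!r}")
```

Because H and W come from the backbone's feature map, downstream code should read them from the returned array rather than deriving them from the input image size.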
Examples¶
Minimal provider-backed example¶
from rs_embed import get_embedding, PointBuffer, TemporalSpec, OutputSpec

emb = get_embedding(
    "agrifm",
    spatial=PointBuffer(lon=121.5, lat=31.2, buffer_m=2048),
    temporal=TemporalSpec.range("2022-01-01", "2022-12-31"),
    output=OutputSpec.pooled(),
    backend="gee",
)
Example temporal packaging and normalization tuning¶
# Example (shell):
export RS_EMBED_AGRIFM_FRAMES=8
export RS_EMBED_AGRIFM_IMG=224
export RS_EMBED_AGRIFM_NORM=agrifm_stats
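The same knobs can also be set from Python through the standard os.environ mapping, which is convenient in notebooks. This is only a sketch: it assumes the adapter reads these variables after they are set, so placing the assignments before the first rs_embed call is the safe choice.

```python
import os

# Assumption: the adapter reads these values when the model is first used,
# so they are set before any rs_embed call is made.
os.environ["RS_EMBED_AGRIFM_FRAMES"] = "8"
os.environ["RS_EMBED_AGRIFM_IMG"] = "224"
os.environ["RS_EMBED_AGRIFM_NORM"] = "agrifm_stats"
```

Environment values are always strings, so numeric settings such as the frame count are written as "8" rather than 8.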
Paper & Links¶
- Publication: RSE 2026
- Code: flyakon/AgriFM
Reference¶
- Single-frame CHW input is silently repeated to T identical frames; this runs but does not represent real temporal variation.
- Output is a Video-Swin feature-map grid, not ViT patch tokens; the spatial dimensions are backbone-dependent, not image_size / patch_size.
- The default normalization (agrifm_stats) uses crop-oriented statistics; unit_scale may be more appropriate for non-agricultural scenes.