Copernicus Embed (copernicus_embed)¶
Precomputed embedding adapter using torchgeo.datasets.CopernicusEmbed, with bbox slicing and a default bbox expansion that improves tile overlap on small ROIs.
Quick Facts¶
| Field | Value |
|---|---|
| Model ID | copernicus_embed |
| Family / Source | TorchGeo CopernicusEmbed dataset |
| Adapter type | precomputed |
| Typical backend | auto |
| Primary input | BBox / PointBuffer in EPSG:4326, sliced via TorchGeo dataset bbox indexing |
| Temporal mode | strict TemporalSpec.year(2021) in v0.1 |
| Output modes | pooled, grid |
| Extra side inputs | none |
| Training alignment (adapter path) | N/A (precomputed product) |
When To Use This Model¶
Good fit for¶
- precomputed embedding workflows via TorchGeo
- quick spatial baseline features without provider requests
- experiments where coarse precomputed coverage is acceptable
Be careful when¶
- requesting years other than 2021 (unsupported in current adapter)
- assuming exact ROI slicing without expansion (adapter expands bbox by default)
- using non-auto backends (copernicus_embed currently expects backend="auto")
Input Contract (Current Adapter Path)¶
Spatial¶
Accepted SpatialSpec:
- BBox
- PointBuffer (converted to EPSG:4326 bbox)
The adapter internally slices CopernicusEmbed with bbox indexing:
ds[minlon:maxlon, minlat:maxlat]
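As a rough illustration of what converting a PointBuffer to an EPSG:4326 bbox involves, here is a minimal sketch. The helper name point_buffer_to_bbox and the spherical metres-per-degree constant are assumptions made for this example; the adapter's actual conversion may use a proper projection and give slightly different bounds.

```python
import math

# Rough spherical approximation; an assumption of this sketch, not the adapter's value.
METERS_PER_DEG_LAT = 111_320.0


def point_buffer_to_bbox(lon, lat, buffer_m):
    """Approximate a square EPSG:4326 bbox around a point (hypothetical helper)."""
    dlat = buffer_m / METERS_PER_DEG_LAT
    # A degree of longitude shrinks with cos(latitude); this breaks down near the poles.
    dlon = buffer_m / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


minlon, minlat, maxlon, maxlat = point_buffer_to_bbox(121.5, 31.2, 5000)
```

The resulting bounds are in the same (minlon, minlat, maxlon, maxlat) order used by the bbox indexing shown above.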
Temporal¶
- requires TemporalSpec.year(...)
- current adapter supports only 2021
- adapter validates the year before dataset access
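The year check described above can be pictured with a small sketch. The names SUPPORTED_YEARS and validate_year are hypothetical and are not taken from the adapter source; only the rule itself (a year is required, and only 2021 is accepted) comes from this page.

```python
# Hypothetical names; the rule (year required, 2021 only) is from this page.
SUPPORTED_YEARS = {2021}


def validate_year(year):
    """Fail early, before any dataset access, when the year is missing or unsupported."""
    if year is None:
        raise ValueError("copernicus_embed requires TemporalSpec.year(...)")
    if year not in SUPPORTED_YEARS:
        raise ValueError(
            f"year {year} is not supported; expected one of {sorted(SUPPORTED_YEARS)}"
        )
    return year
```

Validating before touching the dataset keeps an unsupported request from triggering a download or a slow dataset load.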
Backend / data directory¶
- backend should be auto (legacy local is accepted for compatibility)
- data directory resolution:
  - RS_EMBED_COP_DIR (default data/copernicus_embed)
  - optional per-call override via sensor.collection="dir:/path/to/copernicus_embed"
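The resolution order above might look roughly like the following sketch. The function name resolve_data_dir and its parameters are illustrative assumptions; only the environment variable, its default, and the dir: prefix come from this page.

```python
import os

DEFAULT_COP_DIR = "data/copernicus_embed"  # default stated on this page


def resolve_data_dir(collection=None, env=None):
    """Pick the data directory: per-call 'dir:' override first, then env var, then default.

    Hypothetical helper; `collection` stands in for the sensor.collection value.
    """
    env = os.environ if env is None else env
    if collection and collection.startswith("dir:"):
        return collection[len("dir:"):]
    return env.get("RS_EMBED_COP_DIR", DEFAULT_COP_DIR)
```

Passing env explicitly is only a convenience for testing the sketch without mutating the process environment.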
Retrieval Pipeline (Current rs-embed Path)¶
- Validate TemporalSpec.year(...) and supported year (2021)
- Resolve data_dir (env or sensor.collection override)
- Load/cache TorchGeo CopernicusEmbed dataset (download=True in current adapter)
- Convert SpatialSpec to EPSG:4326 bbox
- Expand bbox by fixed expand_deg=1.0 (centered) to increase tile overlap chance for small ROIs
- Slice dataset with bbox indexing and get sample["image"] (CHW)
- Return pooled vector or grid
Notes:
- temporal is validated, but metadata in the current adapter is built with temporal=None; record the requested year externally if strict provenance matters.
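The centered bbox expansion step in the pipeline above can be sketched as follows. This assumes expand_deg is padding added to every side of the bbox; the adapter may instead treat it as a total or minimum extent, so check the metadata fields expand_deg and bbox_4326 for the real behavior. The function name expand_bbox is hypothetical.

```python
def expand_bbox(bbox, expand_deg=1.0):
    """Grow an EPSG:4326 bbox symmetrically so its center stays fixed.

    Assumption of this sketch: expand_deg is applied as padding on each side.
    """
    minlon, minlat, maxlon, maxlat = bbox
    return (
        minlon - expand_deg,
        minlat - expand_deg,
        maxlon + expand_deg,
        maxlat + expand_deg,
    )


expanded = expand_bbox((121.0, 31.0, 122.0, 32.0))
```

Because the padding is symmetric, the expanded region keeps the original ROI at its center, which is why pooled outputs describe a neighborhood larger than the requested ROI.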
Environment Variables / Tuning Knobs¶
| Env var | Default | Effect |
|---|---|---|
| RS_EMBED_COP_DIR | data/copernicus_embed | Local TorchGeo CopernicusEmbed data directory |
| RS_EMBED_COPERNICUS_BATCH_WORKERS | 4 | Batch worker count for get_embeddings_batch(...) |
Non-env override:
- sensor.collection="dir:/path/to/copernicus_embed" overrides the data directory per call
Current fixed adapter behavior (not env-configurable in v0.1):
- download=True
- expand_deg=1.0
Output Semantics¶
OutputSpec.pooled()¶
- Pools CHW embedding grid over spatial dims:
  - mean -> mean_hw
  - max -> max_hw
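For intuition, pooling a CHW grid over its spatial dimensions can be written with NumPy as below. This is a sketch of the general technique, not the adapter's code; the function name pool_chw is hypothetical, and the adapter may handle missing values (for example NaN cells) differently.

```python
import numpy as np


def pool_chw(chw, how="mean"):
    """Reduce a (C, H, W) array to a length-C vector over the H and W axes."""
    if how == "mean":
        return chw.mean(axis=(1, 2))
    if how == "max":
        return chw.max(axis=(1, 2))
    raise ValueError(f"unknown pooling mode: {how!r}")


# Toy grid with 2 channels of 3x4 cells, values 0..23.
grid = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mean_vec = pool_chw(grid, "mean")  # one value per channel
max_vec = pool_chw(grid, "max")
```

Either mode yields one value per embedding channel, so the pooled vector length equals the channel count C of the grid.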
OutputSpec.grid()¶
- Returns TorchGeo sample embedding tensor as xarray.DataArray (D,H,W)
- Grid is precomputed product space (dataset slice), not raw imagery pixels
Examples¶
Minimal example¶
from rs_embed import get_embedding, PointBuffer, TemporalSpec, OutputSpec
emb = get_embedding(
    "copernicus_embed",
    spatial=PointBuffer(lon=121.5, lat=31.2, buffer_m=5000),
    temporal=TemporalSpec.year(2021),
    output=OutputSpec.pooled(),
    backend="auto",
)
Local dataset directory override¶
# Example (shell):
# export RS_EMBED_COP_DIR=/data/copernicus_embed
Common Failure Modes / Debugging¶
- year not supported (2021 only in current adapter)
- backend is not auto
- missing torchgeo dependency
- dataset files missing/corrupt under RS_EMBED_COP_DIR
- small ROI misses coverage even after expansion (shows up as dataset slicing errors)
Recommended first checks:
- confirm TemporalSpec.year(2021)
- inspect metadata data_dir, expand_deg, chw_shape, bbox_4326
- test a larger ROI if coverage seems empty
Reproducibility Notes¶
Keep fixed and record:
- dataset path/version snapshot
- requested year (must be 2021)
- ROI geometry
- output mode / pooling choice
Source of Truth (Code Pointers)¶
- Registration/catalog: src/rs_embed/embedders/catalog.py
- Adapter implementation: src/rs_embed/embedders/precomputed_copernicus_embed.py