API Specs & Data Structures¶

This page documents the core spec/data types used across the public API.

For task-oriented usage, see Common Workflows. For exact embedding/export/inspect functions, see:

Spatial: Start with SpatialSpec (BBox or PointBuffer) to define the ROI.

Temporal: Use TemporalSpec.year(...) for precomputed/year-indexed products and TemporalSpec.range(...) for provider/on-the-fly fetch windows.

Output: Use OutputSpec.pooled() first unless you specifically need spatial structure (grid).

Data Structures¶

SpatialSpec¶

SpatialSpec describes the spatial region for which you want to extract an embedding.

`BBox`¶

BBox(minlon: float, minlat: float, maxlon: float, maxlat: float, crs: str = "EPSG:4326")

A lon/lat bounding box in EPSG:4326, given as (minlon, minlat, maxlon, maxlat). The current version supports only EPSG:4326.
validate() checks that the bounds are valid.
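The page does not list which checks validate() performs. The following is a minimal sketch of plausible bounds checks, using a hypothetical stand-in class `BBoxSketch` rather than the library's own BBox; the specific rules (coordinate domain, strict min < max ordering) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBoxSketch:
    """Hypothetical stand-in for BBox; not the library's implementation."""

    minlon: float
    minlat: float
    maxlon: float
    maxlat: float
    crs: str = "EPSG:4326"

    def validate(self) -> None:
        # Only EPSG:4326 is supported in the current version, per the docs.
        if self.crs != "EPSG:4326":
            raise ValueError(f"unsupported CRS: {self.crs}")
        # Assumed check: coordinates lie inside the lon/lat domain.
        if not (-180.0 <= self.minlon <= 180.0 and -180.0 <= self.maxlon <= 180.0):
            raise ValueError("longitude must be within [-180, 180]")
        if not (-90.0 <= self.minlat <= 90.0 and -90.0 <= self.maxlat <= 90.0):
            raise ValueError("latitude must be within [-90, 90]")
        # Assumed check: the box is non-empty (min strictly below max).
        if self.minlon >= self.maxlon or self.minlat >= self.maxlat:
            raise ValueError("min bounds must be smaller than max bounds")


BBoxSketch(-122.5, 37.7, -122.3, 37.9).validate()  # passes silently
```

A box with swapped bounds, such as `BBoxSketch(10.0, 5.0, 9.0, 6.0)`, raises `ValueError` in this sketch.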

`PointBuffer`¶

PointBuffer(lon: float, lat: float, buffer_m: float, crs: str = "EPSG:4326")

A square ROI centered on a point, with the buffer distance given in meters (internally projected into the coordinate system required by the provider).
Requires buffer_m > 0.
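To get a feel for how large a given buffer is in degrees, a rough conversion can help. This is only an illustrative approximation: the helper `approx_square_bbox` is hypothetical, it assumes buffer_m is the half-side length of the square, and it uses a simple spherical-Earth constant instead of the projection the library performs internally.

```python
import math

METERS_PER_DEGREE_LAT = 111_320.0  # rough spherical-Earth value (assumption)


def approx_square_bbox(lon: float, lat: float, buffer_m: float):
    """Approximate (minlon, minlat, maxlon, maxlat) of a square ROI.

    Hypothetical helper: treats buffer_m as the half-side length and is
    unreliable near the poles, where cos(latitude) approaches zero.
    """
    if buffer_m <= 0:
        raise ValueError("buffer_m must be > 0")
    dlat = buffer_m / METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with cos(latitude).
    dlon = buffer_m / (METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


print(approx_square_bbox(8.55, 47.37, 1280.0))
```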

TemporalSpec¶

TemporalSpec describes the time range (by year or by date range).

TemporalSpec(mode: Literal["year", "range"], year: int | None, start: str | None, end: str | None)

Recommended constructors:

TemporalSpec.year(2022)
TemporalSpec.range("2022-06-01", "2022-09-01")

Temporal range is a window

TemporalSpec.range(start, end) is treated as a half-open interval [start, end), so end is excluded.

Temporal semantics in provider/on-the-fly paths:

In GEE-backed on-the-fly fetch, the range filters an image collection over the full window, and a compositing reducer is then applied to the filtered images (median by default, mosaic optionally).
So the fetched input is usually a composite over the whole time window, not an automatically selected single-day scene.
To approximate a single-day query, pass a one-day window such as TemporalSpec.range("2022-06-01", "2022-06-02").
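The half-open convention can be checked with plain date arithmetic. The sketch below uses only the standard library; `window_days` is a hypothetical helper, and rejecting an empty or reversed window is an assumption rather than documented behavior.

```python
from datetime import date, timedelta


def window_days(start: str, end: str) -> list[date]:
    """List the calendar days covered by the half-open window [start, end)."""
    first, stop = date.fromisoformat(start), date.fromisoformat(end)
    if stop <= first:
        # Assumed policy: an empty or reversed window is an error.
        raise ValueError("end must be later than start")
    return [first + timedelta(days=i) for i in range((stop - first).days)]


one_day = window_days("2022-06-01", "2022-06-02")
summer = window_days("2022-06-01", "2022-09-01")
print(one_day)      # contains only 2022-06-01
print(len(summer))  # 92
print(summer[-1])   # 2022-08-31; the end date itself is excluded
```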

About input_time in metadata:

Many embedders store meta["input_time"] as the midpoint date of the temporal window.
This midpoint is metadata (and for some models, an auxiliary time signal), not evidence that imagery was fetched from exactly that single date.

Common gotcha

input_time often looks like a single date, but the actual provider fetch may still be a composite over the full temporal window.
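As an illustration of why input_time looks like a single date, the midpoint of a window can be computed as below. This is a sketch only: `window_midpoint` is a hypothetical helper, and rounding down to a whole day is an assumption; individual embedders may round differently.

```python
from datetime import date, timedelta


def window_midpoint(start: str, end: str) -> date:
    """Midpoint date of the window [start, end), rounded down to a whole day."""
    first, stop = date.fromisoformat(start), date.fromisoformat(end)
    return first + timedelta(days=(stop - first).days // 2)


print(window_midpoint("2022-06-01", "2022-09-01"))  # 2022-07-17
print(window_midpoint("2022-06-01", "2022-06-02"))  # 2022-06-01
```

In the first call the fetched input would still be a composite over the whole June to August window; only the metadata collapses to one date.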

SensorSpec¶

SensorSpec is mainly for on-the-fly models (fetch a patch from GEE online and feed it into the model). It specifies which collection to pull from, which bands, and what resolution/compositing strategy to use.

SensorSpec(
    collection: str,
    bands: Tuple[str, ...],
    scale_m: int = 10,
    cloudy_pct: int = 30,
    fill_value: float = 0.0,
    composite: Literal["median", "mosaic"] = "median",
    modality: Optional[str] = None,
    orbit: Optional[str] = None,
    use_float_linear: bool = True,
    check_input: bool = False,
    check_raise: bool = True,
    check_save_dir: Optional[str] = None,
)

collection: GEE collection or image ID
bands: band names (tuple)
scale_m: sampling resolution (meters)
cloudy_pct: cloud filter (best-effort; depends on collection properties)
fill_value: no-data fill value
composite: image compositing method over the temporal window (median/mosaic)
modality: optional model-facing modality selector used by models with multiple input branches
orbit: optional orbit/pass filter for sensor families that support it
use_float_linear: selects linear-scale floating-point products when a sensor family offers both linear and dB variants
check_*: optional input checks and quicklook saving (see inspect_gee_patch)

Note

For precomputed models (e.g., directly reading offline embedding products), sensor is usually ignored or set to None.

Note

Public embedding/export APIs also accept a top-level modality=... convenience argument. When provided, rs-embed resolves it into the model's sensor/input contract and validates that the model explicitly supports that modality.

FetchSpec¶

FetchSpec is the lightweight public override for sampling / fetch policy. Use it when you want to change common knobs such as resolution or compositing, but do not want to define a full SensorSpec.

FetchSpec(
    scale_m: int | None = None,
    cloudy_pct: int | None = None,
    fill_value: float | None = None,
    composite: Literal["median", "mosaic"] | None = None,
)

scale_m: sampling resolution override
cloudy_pct: cloud filter override
fill_value: no-data fill override
composite: temporal compositing override

Recommended rule:

use fetch=FetchSpec(...) for normal public API usage
use sensor=SensorSpec(...) only when you need advanced control over collection, bands, modality, or similar source-level details

Important constraints:

fetch and sensor cannot be passed together in the same request
fetch is applied on top of the model's default sensor contract
some models use scale_m as more than fetch resolution: for example, scalemae uses it as semantic scale conditioning, and anysat uses it as both fetch resolution and patch-size control

Example:

from rs_embed import FetchSpec, get_embedding

emb = get_embedding(
    "prithvi",
    spatial=...,
    temporal=...,
    fetch=FetchSpec(scale_m=10, cloudy_pct=10),
)
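The first two constraints (fetch and sensor are mutually exclusive; fetch is layered on top of the model's defaults) can be sketched with plain dataclasses. The classes `SensorDefaults` and `FetchOverride` and the function `resolve_sensor` are hypothetical stand-ins for illustration, not rs-embed internals.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class SensorDefaults:
    """Hypothetical stand-in for a model's default sensor contract."""

    scale_m: int = 10
    cloudy_pct: int = 30
    fill_value: float = 0.0
    composite: str = "median"


@dataclass(frozen=True)
class FetchOverride:
    """Hypothetical stand-in for FetchSpec: None means 'keep the default'."""

    scale_m: Optional[int] = None
    cloudy_pct: Optional[int] = None
    fill_value: Optional[float] = None
    composite: Optional[str] = None


def resolve_sensor(defaults, fetch=None, sensor=None):
    """Return the effective sensor settings for one request."""
    if fetch is not None and sensor is not None:
        raise ValueError("pass either fetch or sensor, not both")
    if sensor is not None:
        return sensor  # a full sensor spec replaces the defaults
    if fetch is None:
        return defaults
    # Copy only the fields the caller actually set onto the defaults.
    overrides = {
        f.name: getattr(fetch, f.name)
        for f in fields(fetch)
        if getattr(fetch, f.name) is not None
    }
    return replace(defaults, **overrides)


effective = resolve_sensor(SensorDefaults(), fetch=FetchOverride(cloudy_pct=10))
print(effective)
```

Here only cloudy_pct changes; scale_m, fill_value, and composite keep their default values.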

OutputSpec¶

OutputSpec controls the embedding output shape: a pooled vector or a dense grid.

OutputSpec(
    mode: Literal["grid", "pooled"],
    pooling: Literal["mean", "max"] = "mean",
    grid_orientation: Literal["north_up", "native"] = "north_up",
)

Recommended constructors:

Pooled (default):

OutputSpec.pooled(pooling="mean")   # shape: (D,)

Grid (spatial):

OutputSpec.grid()         # shape: (D, H, W), normalized to north-up when possible
OutputSpec.grid(grid_orientation="native")  # keep model/provider native orientation

Sampling resolution is no longer configured on OutputSpec. Use fetch=FetchSpec(scale_m=...) for resolution overrides.

`pooled`¶

ROI-level Vector Embedding

Semantic meaning

pooled represents one whole ROI (Region of Interest) using a single vector (D,).

Best suited for:

Classification / regression
Retrieval / similarity search
Clustering
Cross-model comparison (recommended)

Unified output format:

Embedding.data.shape == (D,)

How it is produced:

ViT / MAE-style models (e.g., RemoteCLIP / Prithvi / SatMAE / ScaleMAE):

Native output is patch tokens (N, D) (with optional CLS token)
Remove CLS token if present, then pool tokens across the token axis (mean by default, optional max)

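Mean-pooling formula, where N' is the number of tokens remaining after CLS removal and t_{i,d} is component d of token i: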

\[ v_d = \frac{1}{N'} \sum_{i=1}^{N'} t_{i,d} \]

Precomputed embeddings (e.g., Tessera / GSE / Copernicus):

Native output is an embedding grid (D, H, W)
Pool over spatial dimensions (H, W)

\[ v_d = \frac{1}{HW} \sum_{y,x} g_{d,y,x} \]

Why prefer pooled for benchmarks:

Model-agnostic and stable
Less sensitive to spatial/token layout differences
Easiest output to compare across models
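Both pooling paths described in this section can be written out in a few lines of plain Python. This is a sketch on nested lists, not the library's implementation; the helper names are hypothetical, and it assumes that a CLS token, when present, is the first token.

```python
def pool_tokens(tokens, has_cls=False, pooling="mean"):
    """Pool (N, D) tokens into a (D,) vector after dropping CLS if present."""
    patches = tokens[1:] if has_cls else tokens  # assumption: CLS comes first
    if not patches:
        raise ValueError("no patch tokens to pool")
    columns = list(zip(*patches))  # one tuple of N' values per dimension d
    if pooling == "mean":
        return [sum(col) / len(patches) for col in columns]
    if pooling == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown pooling: {pooling}")


def pool_grid(grid):
    """Mean-pool a (D, H, W) grid over its spatial axes into a (D,) vector."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in grid
    ]


tokens = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0]]  # first row plays the CLS role
print(pool_tokens(tokens, has_cls=True))                 # [2.0, 3.0]
print(pool_tokens(tokens, has_cls=True, pooling="max"))  # [3.0, 4.0]

grid = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]  # D=2, H=W=2
print(pool_grid(grid))                                   # [2.5, 2.0]
```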

`grid`¶

ROI-level Spatial Embedding Field

Semantic meaning

grid returns a spatial embedding field (D, H, W), where each spatial location maps to a vector.

Best suited for:

Spatial visualization (PCA / norm / similarity maps)
Pixel-wise / patch-wise tasks
Intra-ROI structure analysis

Unified output format:

Embedding.data.shape == (D, H, W)

Notes:

data can be returned as xarray.DataArray with metadata in meta/attrs
For precomputed geospatial products, metadata may include CRS/crop context
For ViT token grids, this is usually patch-grid metadata (not georeferenced pixel coordinates)

How it is produced:

ViT / MAE-style models:

Native output: tokens (N, D)
Remove CLS token if present, reshape remaining tokens:
(N', D) -> (H, W, D) -> (D, H, W)
(H, W) comes from patch layout (for example, 8x8, 14x14)

Precomputed embeddings:

Native output is already (D, H, W)
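The token-to-grid reshape for ViT / MAE-style models can be sketched without any array library. The helper `tokens_to_grid` is hypothetical; it assumes row-major token order (left to right, then top to bottom) and a CLS token in first position when present, which may not hold for every model.

```python
def tokens_to_grid(tokens, height, width, has_cls=False):
    """Rearrange (N, D) tokens into a (D, H, W) nested-list grid."""
    patches = tokens[1:] if has_cls else tokens  # assumption: CLS comes first
    if len(patches) != height * width:
        raise ValueError("token count does not match the patch layout")
    dim = len(patches[0])
    # Token index y * width + x is the patch at row y, column x (row-major).
    return [
        [[patches[y * width + x][d] for x in range(width)] for y in range(height)]
        for d in range(dim)
    ]


tokens = [[0, 10], [1, 11], [2, 12], [3, 13]]  # N'=4 tokens, D=2
grid = tokens_to_grid(tokens, height=2, width=2)
print(grid[0])  # [[0, 1], [2, 3]]
print(grid[1])  # [[10, 11], [12, 13]]
```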

InputPrepSpec¶

Optional Large-ROI Input Policy

InputPrepSpec controls API-level handling of large on-the-fly inputs before model inference. This is mainly useful when you want to choose between the model's normal resize path and API-side tiled inference.

InputPrepSpec(
    mode: Literal["resize", "tile", "auto"] = "resize",
    tile_size: Optional[int] = None,
    tile_stride: Optional[int] = None,
    max_tiles: int = 9,
    pad_edges: bool = True,
)

Recommended constructors:

InputPrepSpec.resize()               # default behavior (fastest)
InputPrepSpec.tile()                 # tile size inferred from model defaults.image_size when available
InputPrepSpec.auto(max_tiles=4)      # choose tile or resize automatically
InputPrepSpec.tile(tile_size=224)    # explicit tile size override

You can also pass a string to public APIs as a shorthand:

input_prep="resize"   # default
input_prep="tile"
input_prep="auto"

Current tiled design (API layer):

Tile size defaults to embedder.describe()["defaults"]["image_size"] when available (can be overridden).
Boundary tiles use a cover-shift layout (for example 300 -> [0,224] and [76,300]) to avoid edge padding when possible.
Grid stitching uses midpoint-cut ownership in overlap regions (instead of hard overwrite).
tile_stride currently must equal tile_size (explicit overlap/gap configuration is not enabled yet), but boundary shifting can still create overlap on the last tile.
auto is conservative and currently prefers tiling mainly for OutputSpec.grid() when tile count is small enough (max_tiles).
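The cover-shift layout and midpoint-cut ownership can be illustrated along one axis. The sketch below reproduces the 300 -> [0,224] and [76,300] example; the function names are hypothetical, and the integer rounding of the cut position is an assumption rather than the library's documented behavior.

```python
import math


def cover_shift_starts(extent: int, tile: int) -> list[int]:
    """Tile start offsets along one axis, with stride equal to tile size.

    The last tile is shifted back so that it ends exactly at the boundary.
    """
    if extent <= tile:
        return [0]  # a single tile; padding would be needed if extent < tile
    count = math.ceil(extent / tile)
    starts = [i * tile for i in range(count)]
    starts[-1] = extent - tile  # shift the boundary tile instead of padding
    return starts


def owned_spans(extent: int, tile: int) -> list[tuple[int, int]]:
    """Half-open spans each tile contributes to the stitched grid.

    Neighbouring tiles split any overlap at its midpoint (assumed rounding).
    """
    starts = cover_shift_starts(extent, tile)
    cuts = [(starts[i] + tile + starts[i + 1]) // 2 for i in range(len(starts) - 1)]
    edges = [0] + cuts + [extent]
    return list(zip(edges[:-1], edges[1:]))


print(cover_shift_starts(300, 224))  # [0, 76]
print(owned_spans(300, 224))         # [(0, 150), (150, 300)]
print(cover_shift_starts(500, 224))  # [0, 224, 276]
print(owned_spans(500, 224))         # [(0, 224), (224, 362), (362, 500)]
```

Under this sketch a 300 x 300 input with 224-pixel tiles needs 2 x 2 = 4 tiles, which fits within the default max_tiles of 9.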


ExportTarget / ExportConfig / ExportModelRequest¶

export_batch(...) now uses small public request objects so large export jobs do not need dozens of top-level keywords.

ExportTarget.combined("exports/run")
ExportTarget.per_item("exports/items", names=["p1", "p2"])

ExportConfig(
    save_inputs=True,
    save_embeddings=True,
    chunk_size=32,
    num_workers=8,
    resume=True,
)

ExportModelRequest("remoteclip")
ExportModelRequest("terrafm", modality="s1", sensor=my_s1_sensor)
ExportModelRequest("thor", model_config={"variant": "large"})

ExportTarget: where outputs should be written
ExportConfig: how the export should run
ExportModelRequest: optional per-model overrides when one export job mixes different model-specific settings such as sensor, modality, or model_config

Legacy out + layout, out_dir / out_path, and per-model dict overrides are still accepted for backward compatibility.

Embedding¶

get_embedding / get_embeddings_batch return an Embedding:

from rs_embed.core.embedding import Embedding

Embedding(
    data: np.ndarray | xarray.DataArray,
    meta: Dict[str, Any],
)

data: the embedding data (float32, vector or grid)
meta: metadata such as model info, optional input info, and export/check reports
