API: Export¶
This page covers dataset export APIs.
If you only remember one function, remember export_batch(...).
export_batch (primary / recommended)¶
export_batch(
*,
spatials: List[SpatialSpec],
temporal: Optional[TemporalSpec],
models: List[str | ExportModelRequest],
target: ExportTarget,
config: ExportConfig,
backend: str = "auto",
device: str = "auto",
output: OutputSpec = OutputSpec.pooled(),
sensor: Optional[SensorSpec] = None,
fetch: Optional[FetchSpec] = None,
modality: Optional[str] = None,
) -> Any
Use export_batch(...) when you want to export:
- one or many ROIs
- one or many models
- inputs, embeddings, and manifests together
Although the public function still exposes many keyword arguments, the actual implementation first normalizes requests into:
- `ExportTarget`
- `ExportConfig`
- `ExportModelRequest` entries
That is the real shape of the API internally, and it is the shape new code should follow.
Mental Model¶
Think of export_batch(...) as four decisions:
- What to export: `spatials`, `temporal`, `models`
- Where to write: `target=ExportTarget(...)`
- How to run: `config=ExportConfig(...)`
- Any shared model settings: `backend`, `sensor`, `modality`, `output` (in most public usage, prefer `fetch` over `sensor`)

For new code, prefer the object-style API:
- `target=ExportTarget(...)`
- `config=ExportConfig(...)`
- `models=[..., ExportModelRequest(...)]` only when one model needs special overrides
Start Here¶
If you are not sure what to pass, this is the default pattern:
from rs_embed import export_batch, ExportConfig, ExportTarget, PointBuffer, TemporalSpec
export_batch(
spatials=[PointBuffer(121.5, 31.2, 2048)],
temporal=TemporalSpec.range("2022-06-01", "2022-09-01"),
models=["remoteclip"],
target=ExportTarget.combined("exports/run"),
config=ExportConfig(),
)
That gives you:
- one combined export artifact
- default `.npz` format
- inputs + embeddings + manifest
- default runtime behavior
Parameters, Grouped by Job¶
1. Required dataset definition¶
These are the inputs most users always set:
- `spatials`: non-empty list of `BBox` or `PointBuffer`
- `temporal`: `TemporalSpec` or `None`
- `models`: non-empty list of model IDs or `ExportModelRequest(...)`
2. Output location and layout¶
Prefer target=ExportTarget(...) in new code.
from rs_embed import ExportTarget
ExportTarget.per_item("exports", names=["p1", "p2"])
ExportTarget.combined("exports/run")
- `ExportTarget.per_item(...)`: one file per ROI
- `ExportTarget.combined(...)`: one merged file for the whole run
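The difference between the two layouts can be sketched as a path mapping. This is a hedged illustration only: the exact file naming below is an assumption, not a documented contract of rs-embed.

```python
# Hedged sketch: how per_item vs combined targets might map ROIs to
# output paths. File naming is an assumption, not rs-embed's contract.
from pathlib import Path
from typing import List


def per_item_paths(root: str, names: List[str], fmt: str = "npz") -> List[Path]:
    """per_item: one artifact per ROI, named after each entry in names."""
    return [Path(root) / f"{name}.{fmt}" for name in names]


def combined_path(prefix: str, fmt: str = "npz") -> Path:
    """combined: one merged artifact for the whole run."""
    return Path(f"{prefix}.{fmt}")
```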
3. Shared model/runtime settings¶
These usually apply to all models in the call:
- `backend`: keep `backend="auto"` unless you need a specific provider such as `"gee"`
- `device`: `"auto"` is the normal choice
- `output`: usually `OutputSpec.pooled()`
- `fetch`: shared `FetchSpec` for resolution/compositing overrides
- `sensor`: shared `SensorSpec` for advanced on-the-fly source overrides
- `modality`: shared modality override for models that expose multiple public branches
Use per-model overrides only when one model needs different settings.
Recommended rule:
- use `fetch=FetchSpec(...)` for shared resolution/composite overrides
- use `sensor=SensorSpec(...)` only when one job really needs custom `collection`/`bands`
- `fetch` and `sensor` cannot be passed together
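A minimal sketch of the mutual-exclusion rule between fetch and sensor. The real check lives inside export_batch; both the helper name and the `ValueError` type here are assumptions for illustration.

```python
# Sketch of the "fetch and sensor cannot be passed together" rule.
# check_fetch_sensor and ValueError are illustrative assumptions.
from typing import Optional


def check_fetch_sensor(fetch: Optional[object] = None,
                       sensor: Optional[object] = None) -> None:
    if fetch is not None and sensor is not None:
        raise ValueError("pass either fetch= or sensor=, not both")
```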
4. ExportConfig: the knobs that matter most¶
config=ExportConfig(...) is the recommended place for runtime settings.
In the implementation, flat config keywords are folded into an ExportConfig object anyway, so new code should construct that object directly.
The most important ones are:
- `format`: `"npz"` or `"netcdf"`
- `save_inputs`: save model-ready input patches
- `save_embeddings`: save embedding arrays
- `save_manifest`: save JSON manifest metadata
- `resume`: skip items already exported
- `input_prep`: large-ROI policy, usually `"resize"` or `"tile"`
You can usually ignore the rest until you need performance tuning or failure recovery.
5. Advanced runtime controls¶
These matter mainly for larger runs:
- `chunk_size`: how many ROIs to process at a time
- `infer_batch_size`: batch size for models that implement batch inference
- `num_workers`: provider fetch concurrency
- `continue_on_error`: keep going if one item/model fails
- `max_retries`, `retry_backoff_s`: retry policy
- `async_write`, `writer_workers`: asynchronous writing in per-item mode
- `show_progress`: enable progress display
- `fail_on_bad_input`: fail immediately on invalid inputs
If you do not know what these mean, leave them at defaults.
Example:
from rs_embed import ExportConfig
config = ExportConfig(
format="npz",
save_inputs=True,
save_embeddings=True,
save_manifest=True,
resume=True,
input_prep="resize",
)
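What resume=True means in practice can be sketched as a planning step: items whose artifact already exists are skipped rather than re-exported. The function name and the existence check below are illustrative assumptions, not the library's internals.

```python
# Sketch of resume semantics: already-exported items are skipped.
# plan_items and the set-based check are illustrative assumptions.
from typing import Iterable, List, Set


def plan_items(names: Iterable[str], already_exported: Set[str],
               resume: bool) -> List[str]:
    """Return the item names that still need exporting."""
    if not resume:
        return list(names)
    return [n for n in names if n not in already_exported]
```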
Per-Model Overrides¶
Most runs should pass plain model IDs:
models=["remoteclip", "prithvi"]
Use ExportModelRequest(...) only when a specific model needs its own fetch, sensor, or modality:
from rs_embed import ExportModelRequest, FetchSpec
models=[
"remoteclip",
ExportModelRequest("prithvi", fetch=FetchSpec(scale_m=30)),
]
ExportModelRequest(...) also carries per-model model_config, for example:
from rs_embed import ExportModelRequest
models=[
"remoteclip",
ExportModelRequest("thor", model_config={"variant": "large"}),
]
Typical use cases:
- one model needs a different `FetchSpec`
- one model needs `modality="s1"`
- one model needs a different `SensorSpec`
- one model needs a different `model_config`, such as `{"variant": "large"}`
- one model should override the shared export settings
This also matches the implementation path: string model IDs are first converted into ExportModelRequest(name=...), then resolved.
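That conversion step can be sketched roughly as follows. The `ExportModelRequest` dataclass here is a stand-in with a minimal set of fields, not the real class.

```python
# Sketch of the normalization described above: string model IDs are
# wrapped into request objects before resolution. Stand-in types only.
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class ExportModelRequest:
    name: str
    modality: Optional[str] = None
    model_config: dict = field(default_factory=dict)


def normalize_models(
    models: List[Union[str, ExportModelRequest]]
) -> List[ExportModelRequest]:
    """Wrap bare model IDs so every entry is an ExportModelRequest."""
    return [
        m if isinstance(m, ExportModelRequest) else ExportModelRequest(name=m)
        for m in models
    ]
```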
Modality rules:
- `export_batch(...)` accepts a global `modality`
- one model can override it via `ExportModelRequest(...)`
- unsupported modality choices raise `ModelError`
`model_config` rules:
- `export_batch(...)` does not have one global `model_config` shared across all models
- pass per-model runtime settings through `ExportModelRequest(..., model_config=...)`
- unsupported `model_config` usage raises `ModelError`
What Gets Returned¶
- `ExportTarget.per_item(...)`: returns `List[dict]`
- `ExportTarget.combined(...)`: returns `dict`
In both cases, the return value is manifest-style metadata describing what was exported.
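If downstream code needs to treat both shapes uniformly, a small normalizer is enough. The manifest key shown is an assumption for illustration.

```python
# Sketch: normalize both return shapes to a list of manifest dicts.
# The "path" manifest key is an illustrative assumption.
from typing import List, Union


def as_manifest_list(result: Union[dict, List[dict]]) -> List[dict]:
    """per_item returns List[dict]; combined returns dict. Unify them."""
    return result if isinstance(result, list) else [result]
```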
Common Patterns¶
One combined export file¶
from rs_embed import export_batch, ExportConfig, ExportTarget, PointBuffer, TemporalSpec
export_batch(
spatials=[PointBuffer(121.5, 31.2, 2048)],
temporal=TemporalSpec.range("2022-06-01", "2022-09-01"),
models=["remoteclip"],
target=ExportTarget.combined("exports/combined_run"),
config=ExportConfig(save_inputs=True, resume=True),
)
One file per ROI¶
from rs_embed import export_batch, ExportConfig, ExportTarget, PointBuffer, TemporalSpec
spatials = [
PointBuffer(121.5, 31.2, 2048),
PointBuffer(120.5, 30.2, 2048),
]
export_batch(
spatials=spatials,
temporal=TemporalSpec.range("2022-06-01", "2022-09-01"),
models=["remoteclip", "prithvi"],
target=ExportTarget.per_item("exports", names=["p1", "p2"]),
config=ExportConfig(
input_prep="tile",
chunk_size=32,
num_workers=8,
),
)
One model needs its own modality¶
from rs_embed import (
export_batch,
ExportModelRequest,
ExportTarget,
PointBuffer,
TemporalSpec,
)
export_batch(
spatials=[PointBuffer(121.5, 31.2, 2048)],
temporal=TemporalSpec.range("2022-06-01", "2022-09-01"),
models=[ExportModelRequest("terrafm", modality="s1")],
target=ExportTarget.combined("exports/terrafm_s1_run"),
backend="gee",
)
Shared fetch override across models¶
from rs_embed import FetchSpec, export_batch, ExportTarget, PointBuffer, TemporalSpec
export_batch(
spatials=[PointBuffer(121.5, 31.2, 2048)],
temporal=TemporalSpec.range("2022-06-01", "2022-09-01"),
models=["remoteclip", "prithvi"],
fetch=FetchSpec(scale_m=10),
target=ExportTarget.combined("exports/shared_sampling"),
)
One model needs its own variant¶
from rs_embed import (
export_batch,
ExportModelRequest,
ExportTarget,
PointBuffer,
TemporalSpec,
)
export_batch(
spatials=[PointBuffer(121.5, 31.2, 2048)],
temporal=TemporalSpec.range("2022-06-01", "2022-09-01"),
models=[ExportModelRequest("thor", model_config={"variant": "large"})],
target=ExportTarget.combined("exports/thor_large_run"),
backend="gee",
)
Runtime Behavior You Usually Need to Know¶
Inference scheduling¶
- model scheduling is serial: one model at a time
- batch inference is used when the embedder supports it
- GPU or accelerator backends benefit the most from batch inference
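The scheduling above can be sketched as a simple loop. The embedder protocol, including the `embed_batch` method name, is an assumption for illustration.

```python
# Sketch of serial model scheduling with optional batch inference.
# The embedder protocol (embed / embed_batch) is an assumption.
def run_models(models, items):
    results = {}
    for model in models:  # serial: one model at a time
        embed_batch = getattr(model, "embed_batch", None)
        if callable(embed_batch):  # batch inference when supported
            results[model.name] = embed_batch(items)
        else:  # fall back to per-item inference
            results[model.name] = [model.embed(x) for x in items]
    return results
```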
Per-item vs combined mode¶
- `per_item` mode writes one artifact per ROI
- `combined` mode writes one merged artifact for the run
- combined mode keeps the older behavior of preferring batch model APIs when possible
Input reuse¶
If provider-backed export is used and both save_inputs=True and save_embeddings=True, rs-embed reuses the fetched input patch for both writing and embedding inference instead of downloading it twice.
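The reuse behavior can be sketched like this: the patch is fetched once, then shared between the input writer and the embedder. The fetch/embed/write callables are illustrative assumptions, not the library's actual interfaces.

```python
# Sketch of input reuse: fetch the patch once and share it between
# the input writer and the embedder. Callables are illustrative.
def export_item(fetch, embed, write, save_inputs, save_embeddings):
    patch = fetch()  # fetched exactly once
    if save_inputs:
        write("input", patch)
    if save_embeddings:
        write("embedding", embed(patch))
```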
Simple rule
Start with ExportTarget.combined(...) + ExportConfig().
Add ExportModelRequest(...) only for the few models that need per-model sensor, modality, or model_config overrides.