
Datasets

Dataset abstractions for DEM and landcover data access.

s2gos_generator.dataset.dataset.Dataset pydantic-model

Bases: ABC, BaseModel

Fields:

  • name (str)
  • crs (str)

Attributes

crs pydantic-field

crs: str = 'EPSG:4326'

Coordinate reference system.

name pydantic-field

name: str

Name of the dataset, used for logging.

Functions

open

open(path=None) -> Any

Open the dataset, or a specific file within it.

Parameters:

  • path (default: None): Specific file to open within the dataset.

Returns:

  • Any: The opened dataset, which can be used within a with statement.

query

query(
    polygon: Polygon, ctx: dict | None = None
) -> list[PathLike]

Return the data files whose spatial extent overlaps a polygon.

Parameters:

  • polygon (Polygon, required): Shape to query against.
  • ctx (dict | None, default: None): Context used to pass additional query information, e.g. time.

Returns:

  • list[PathLike]: Paths of the data files whose spatial extent overlaps polygon.
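A minimal, dependency-free sketch of the open/query contract that subclasses fulfil. It assumes a hypothetical in-memory subclass and replaces shapely's Polygon with an (minx, miny, maxx, maxy) bounding box; DatasetSketch, InMemoryTiles and bbox_intersects are illustrative names, not part of s2gos_generator.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Any

# Hypothetical stand-in for a shapely Polygon: (minx, miny, maxx, maxy).
BBox = tuple[float, float, float, float]


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """True when two boxes overlap; touching edges count as overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class DatasetSketch(ABC):
    """Mirrors the open/query contract of Dataset (not the real class)."""

    name: str
    crs: str = "EPSG:4326"

    @abstractmethod
    def open(self, path: Any = None) -> Any:
        """Open the dataset, or a specific file within it."""

    @abstractmethod
    def query(self, polygon: BBox, ctx: dict | None = None) -> list[PurePosixPath]:
        """Return paths of data files overlapping the query region."""


@dataclass
class InMemoryTiles(DatasetSketch):
    """Hypothetical subclass: tiles described by a path -> bbox mapping."""

    name: str
    tiles: dict[str, BBox] = field(default_factory=dict)
    crs: str = "EPSG:4326"

    def open(self, path: Any = None) -> Any:
        # A real subclass would return e.g. an xarray.Dataset; this returns
        # the whole mapping (no path) or the bbox of one tile.
        if path is None:
            return dict(self.tiles)
        return self.tiles[str(path)]

    def query(self, polygon: BBox, ctx: dict | None = None) -> list[PurePosixPath]:
        # ctx is accepted for interface parity but ignored in this sketch.
        return [
            PurePosixPath(p)
            for p, box in self.tiles.items()
            if bbox_intersects(box, polygon)
        ]
```

For example, InMemoryTiles(name="demo", tiles={"a.tif": (0.0, 0.0, 1.0, 1.0)}).query((0.5, 0.5, 2.0, 2.0)) returns [PurePosixPath("a.tif")].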

s2gos_generator.dataset.indexed_geotiff.IndexedGeoTiff pydantic-model

Bases: Dataset

Spatially indexed collection of GeoTIFF tiles.

Provides efficient spatial querying over a large archive of GeoTIFF files using a pre-built GeoDataFrame index (stored as a .feather file). Each row in the index describes one tile and includes a path column pointing to the corresponding GeoTIFF relative to root_directory.

Attributes:

  • index_path (PathRef): Path to the .feather index file.
  • root_directory (PathRef): Root directory under which tile paths in the index are resolved.
  • path_column (str | None): Name of the index column that holds relative tile paths. When left as None, it is auto-detected from any column whose name contains "path".
  • variable_name (str | None): Optional variable name used when opening tiles with xarray.
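The path_column auto-detection can be sketched as follows. This illustration works on a plain list of column names rather than a GeoDataFrame; detect_path_column is a hypothetical helper, and picking the first matching column and rejecting an explicit name that is absent from the index are assumptions, not documented behaviour.

```python
from __future__ import annotations


def detect_path_column(columns: list[str], path_column: str | None = None) -> str:
    """Pick the index column that holds relative tile paths.

    An explicit path_column wins; otherwise the first column whose name
    contains "path" is used. Raises ValueError when nothing matches.
    """
    if path_column is not None:
        # Assumption: an explicit column must exist in the index.
        if path_column not in columns:
            raise ValueError(f"column {path_column!r} not found in index")
        return path_column
    for col in columns:
        if "path" in col:
            return col
    raise ValueError("no path column can be determined")
```

The ValueError branch corresponds to the ValueError that query is documented to raise when no path column can be determined.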

Fields:

  • name (str)
  • crs (str)
  • index_path (PathRef)
  • root_directory (PathRef)
  • path_column (str | None)
  • variable_name (str | None)

Validators:

  • validate_path_exists

Functions

open

open(path: UPath | str, **kwargs: Any) -> xr.Dataset

Open a single GeoTIFF tile as an xarray.Dataset.

Parameters:

  • path (UPath | str, required): Path to the GeoTIFF file to open.
  • **kwargs (Any, default: {}): Additional keyword arguments forwarded to open_dataset (e.g. chunks for Dask).

Returns:

  • Dataset: An xarray.Dataset backed by rasterio.

query

query(polygon: Polygon, **kwargs: Any) -> list[UPath]

Return paths of all GeoTIFF tiles that intersect polygon.

Performs a spatial join between the index GeoDataFrame and the supplied polygon to identify overlapping tiles.

Parameters:

  • polygon (Polygon, required): Query region in EPSG:4326 coordinates.
  • **kwargs (Any, default: {}): Optional keyword arguments; path_column overrides the instance-level column name for this query only.

Returns:

  • list[UPath]: One authenticated UPath object per matching tile.

Raises:

  • ValueError: If no path column can be determined.
  • FileNotFoundError: If no tiles intersect polygon.
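A dependency-free sketch of the query flow: keep the index rows that intersect the query region, resolve each relative tile path against root_directory, and raise FileNotFoundError when nothing matches. It assumes each row carries a precomputed bounding box instead of a geometry and returns PurePosixPath objects rather than authenticated UPath objects; the documented method performs a GeoDataFrame spatial join instead. query_index and its row layout are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath

BBox = tuple[float, float, float, float]  # (minx, miny, maxx, maxy) in EPSG:4326


def intersects(a: BBox, b: BBox) -> bool:
    """Axis-aligned overlap test; touching edges count as overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query_index(
    rows: list[dict], root_directory: str, query: BBox, path_column: str = "path"
) -> list[PurePosixPath]:
    """Return root-resolved paths of index rows whose bbox intersects query."""
    root = PurePosixPath(root_directory)
    hits = [root / row[path_column] for row in rows if intersects(row["bbox"], query)]
    if not hits:
        raise FileNotFoundError(f"no tiles intersect {query}")
    return hits
```

A bounding-box filter like this can over-select tiles for non-rectangular polygons, which is one reason a geometry-aware spatial join is preferable in practice.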

validate_path_exists pydantic-validator

validate_path_exists(v)

Validate that local files or directories exist.
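One plausible shape for such a validator, assuming that strings with a URL scheme (for example s3:// or https://) are remote and skipped, while scheme-less or file:// paths must exist on disk. The standalone function and its ValueError are assumptions; when raised inside a pydantic field validator, a ValueError is reported as a validation error.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlparse


def validate_path_exists(v: str | Path) -> str | Path:
    """Reject local paths that do not exist; pass remote URLs through unchanged."""
    parsed = urlparse(str(v))
    # Scheme-less strings and file:// URLs are local. A one-letter scheme is
    # treated as a Windows drive letter (e.g. C:\data), so also local.
    is_local = parsed.scheme in ("", "file") or len(parsed.scheme) == 1
    if is_local:
        local = Path(parsed.path) if parsed.scheme == "file" else Path(str(v))
        if not local.exists():
            raise ValueError(f"path does not exist: {v}")
    return v
```

Skipping remote URLs avoids a network round trip at model construction time; their availability is only checked when the data is actually opened.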

s2gos_generator.dataset.zarr.Zarr pydantic-model

Bases: Dataset

Zarr-backed dataset for cloud-optimised geospatial data.

Wraps a single Zarr store (local or remote) and provides spatial querying by comparing the polygon extent against the dataset's bounding box. The entire store is returned when it intersects the query polygon.

Attributes:

  • path (PathRef): Path or URL to the Zarr store.
  • variable_name (str | None): Optional variable name used when slicing the opened dataset.

Fields:

  • name (str)
  • crs (str)
  • path (PathRef)
  • variable_name (str | None)

Validators:

  • validate_path_exists

Functions

open

open(
    path: PathRef | None = None, **kwargs: Any
) -> xr.Dataset

Open the Zarr store as an xarray.Dataset.

The path argument is accepted for interface compatibility but ignored; the store is always opened from self.path.

Parameters:

  • path (PathRef | None, default: None): Unused; present for compatibility with the Dataset base class interface.
  • **kwargs (Any, default: {}): Additional keyword arguments forwarded to open_dataset (e.g. chunks for Dask lazy loading).

Returns:

  • Dataset: An xarray.Dataset backed by the Zarr engine.

query

query(polygon: Polygon, **kwargs: Any) -> list[PathRef]

Return the store path if its spatial extent intersects polygon.

Opens the Zarr store to read coordinate bounds, then checks whether the dataset bounding box overlaps the supplied polygon. Supports datasets with (x, y) or (lon, lat) coordinate dimensions.

Parameters:

  • polygon (Polygon, required): Query region in EPSG:4326 coordinates.
  • **kwargs (Any, default: {}): Accepted but unused; present for interface compatibility.

Returns:

  • list[PathRef]: A single-element list [self.path] when there is spatial overlap, or an empty list when there is none.
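The bounds check can be sketched as follows, assuming the coordinate values are available as plain sequences (with xarray they would come from the store's x/y or lon/lat coordinates). Taking min and max handles descending latitude axes; because coordinates usually mark cell centres, this extent may be up to half a cell smaller than the true pixel footprint. dataset_bounds and query_store are hypothetical names, and the polygon is again simplified to a bounding box.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence

BBox = tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def dataset_bounds(coords: Mapping[str, Sequence[float]]) -> BBox:
    """Bounding box from (x, y) or (lon, lat) coordinate values."""
    for xname, yname in (("x", "y"), ("lon", "lat")):
        if xname in coords and yname in coords:
            xs, ys = coords[xname], coords[yname]
            return (min(xs), min(ys), max(xs), max(ys))
    raise KeyError("expected (x, y) or (lon, lat) coordinates")


def query_store(
    store_path: str, coords: Mapping[str, Sequence[float]], query: BBox
) -> list[str]:
    """Return [store_path] if the store extent overlaps query, else []."""
    minx, miny, maxx, maxy = dataset_bounds(coords)
    overlaps = (
        minx <= query[2] and query[0] <= maxx and miny <= query[3] and query[1] <= maxy
    )
    return [store_path] if overlaps else []
```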

validate_path_exists pydantic-validator

validate_path_exists(v)

Validate that local files or directories exist.