API Reference

Python Brazil Data Cube Collection Builder.

Command Line

bdc_collection_builder.cli.load_providers(*args: Any, **kwargs: Any) Any

Command line to load providers JSON into database.

Note

Make sure you have exported the variable SQLALCHEMY_DATABASE_URI beforehand, for example SQLALCHEMY_DATABASE_URI=postgresql://postgres:postgres@localhost/bdc.

Note

It skips providers that already exist. You must give at least the --ifile or --from-dir parameter.

To load a single JSON file, use the option -i or its long form --ifile path/to/json:

bdc-collection-builder load-provider --ifile examples/data/providers/nasa-usgs.json -v

The following output will be displayed:

Provider USGS created

To read a directory containing several JSON provider files:

bdc-collection-builder load-provider --from-dir examples/data/providers
Parameters:
  • ifile (str) – Path to JSON file. Default is None.

  • from_dir (str) – Readable directory containing JSON files. Defaults to None.

  • update (bool) – Update entries if data already exists. Defaults to False.

  • verbose (bool) – Display verbose output. Defaults to False.
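
The loading rules above (at least one input source, skip existing providers unless update is set) can be sketched as follows. This is only an illustration, not the command's implementation: the registry dict stands in for the database, and the "name" key in the JSON file is an assumption about the provider schema.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


def load_providers_sketch(registry: Dict[str, dict],
                          ifile: Optional[str] = None,
                          from_dir: Optional[str] = None,
                          update: bool = False) -> List[str]:
    """Load provider definitions into ``registry`` and return log messages.

    ``registry`` is a hypothetical in-memory stand-in for the database table.
    """
    if ifile is None and from_dir is None:
        raise ValueError('Give at least ifile or from_dir.')

    files = []
    if ifile is not None:
        files.append(Path(ifile))
    if from_dir is not None:
        # Sort the directory listing for a deterministic load order.
        files.extend(sorted(Path(from_dir).glob('*.json')))

    messages = []
    for path in files:
        provider = json.loads(path.read_text())
        name = provider['name']  # Assumed key; the real schema may differ.
        if name in registry and not update:
            messages.append(f'Provider {name} skipped')
            continue
        action = 'updated' if name in registry else 'created'
        registry[name] = provider
        messages.append(f'Provider {name} {action}')
    return messages
```

Calling the sketch twice on the same file reports the provider as created and then as skipped, unless update=True is passed.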

bdc_collection_builder.cli.set_provider(*args: Any, **kwargs: Any) Any
bdc_collection_builder.cli.overview(*args: Any, **kwargs: Any) Any

Display information about a Collection, including its data collection order by default.

Collect & Processing

bdc_collection_builder.collections.collect.get_provider_order(collection: Any, include_inactive=False, **kwargs) List[DataCollector]

Retrieve the list of providers with which the bdc_catalog.models.Collection is associated.

Note

This method requires the initialization of extension bdc_catalog.ext.BDCCatalog.

For a given collection, it searches the ProviderSetting and CollectionsProvidersSetting associations and then looks for the supported providers in the entry point bdc_collectors.providers.

Parameters:
  • collection – An instance of bdc_catalog.models.Collection.

  • include_inactive – List also the inactive providers. Defaults to False.

  • **kwargs – Extra parameters to pass to the Provider instance.

Returns:

A list of DataCollector, ordered by priority.
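
The selection and ordering described above can be sketched without a database. ProviderLink is a hypothetical stand-in for an association row, and the sketch assumes that a lower number means a higher priority; neither is taken from the library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProviderLink:
    """Hypothetical stand-in for a collection/provider association row."""
    name: str
    priority: int
    active: bool = True


def provider_order_sketch(links: List[ProviderLink],
                          include_inactive: bool = False) -> List[ProviderLink]:
    """Return the associations sorted by priority, optionally with inactive ones."""
    selected = [link for link in links if include_inactive or link.active]
    # Assumption: a lower number means a higher priority.
    return sorted(selected, key=lambda link: link.priority)
```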

bdc_collection_builder.collections.processor.sen2cor(scene_id: str, input_dir: str, output_dir: str, docker_container_work_dir: list, version: str | None = None, timeout=None, **env)

Execute Sen2Cor data processor using Docker images.

Note

Make sure you have exported the variables SEN2COR_AUX_DIR, SEN2COR_DOCKER_IMAGE, and SEN2COR_DIR properly.

This method calls the Sen2Cor processor and generates the Surface Reflectance products. Once the required variables are set, it tries to execute Sen2Cor with each of the following versions in turn: ‘2.10.0’, ‘2.8.0’, ‘2.5.5’.

Parameters:
  • scene_id (str) – The Scene Identifier (Item id)

  • input_dir (str) – Base input directory of scene id.

  • output_dir (str) – Path where Surface reflectance product will be generated.

  • docker_container_work_dir (list) – List of base working directories for Docker.

  • version (str) – Sen2Cor version to execute. The version must exist in the Docker registry. Defaults to None, which automatically tries the versions ‘2.10.0’, ‘2.8.0’, ‘2.5.5’, in that order.

  • timeout (int) – Timeout for Sen2Cor exec. Defaults to SEN2COR_TIMEOUT.

Keyword Arguments:

any – Custom Environment variables, use Python spread kwargs.
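
The version fallback described above can be sketched as a small loop. The run callable is hypothetical (in practice it would start the Docker container for a given version), and treating RuntimeError as the failure signal is an assumption made for this illustration.

```python
from typing import Callable, Optional, Sequence

DEFAULT_VERSIONS = ('2.10.0', '2.8.0', '2.5.5')


def run_with_fallback(run: Callable[[str], None],
                      version: Optional[str] = None,
                      versions: Sequence[str] = DEFAULT_VERSIONS) -> str:
    """Call ``run(version)`` until one version succeeds and return that version."""
    # A pinned version disables the fallback list.
    candidates = [version] if version is not None else list(versions)
    errors = []
    for candidate in candidates:
        try:
            run(candidate)
            return candidate
        except RuntimeError as exc:
            errors.append(f'{candidate}: {exc}')
    raise RuntimeError('All Sen2Cor versions failed: ' + '; '.join(errors))
```

Collecting the per-version errors keeps the final message useful when every attempt fails.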

Module to publish a collection item in the database.

bdc_collection_builder.celery.publish.compress_raster(input_path: str, output_path: str, algorithm: str = 'deflate')

Compress a raster using GDAL compression algorithm.

bdc_collection_builder.celery.publish.create_quick_look(file_output, red_file, green_file, blue_file, rows=768, cols=768, no_data=-9999)

Generate a Quick Look file (PNG based) from a list of files.

Note

The file order in files represents the bands Red, Green and Blue, respectively.

Exceptions:

RasterIOError when a raster file band could not be opened.

Parameters:
  • file_output – Path to store the quicklook file.

  • red_file – Path to the band assigned to the red channel.

  • green_file – Path to the band assigned to the green channel.

  • blue_file – Path to the band assigned to the blue channel.

  • rows – Image height. Default is 768.

  • cols – Image width. Default is 768.

  • no_data – Use custom value for nodata.

bdc_collection_builder.celery.publish.generate_quicklook_pvi(safe_folder: Path, quicklook: Path)

Generate QuickLook preview from a Sentinel-2 PVI file.

bdc_collection_builder.celery.publish.get_footprint_sentinel(mtd_file: str) Polygon

Get image footprint from a Sentinel-2 MTD file.

bdc_collection_builder.celery.publish.get_item_path(relative: str) str

Retrieve the Item absolute path from published asset.

bdc_collection_builder.celery.publish.guess_mime_type(extension: str, cog=False) str | None

Try to identify file mimetype.
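
A minimal sketch of such a lookup is shown below. The table of extensions is hypothetical and the media type strings follow the STAC conventions; the mapping used by the real function may differ.

```python
from typing import Optional

# Hypothetical lookup table; the extensions handled by the real function may differ.
MIME_TYPES = {
    '.tif': 'image/tiff; application=geotiff',
    '.tiff': 'image/tiff; application=geotiff',
    '.png': 'image/png',
    '.jpg': 'image/jpeg',
    '.json': 'application/json',
    '.xml': 'application/xml',
}
COG_SUFFIX = '; profile=cloud-optimized'


def guess_mime_type_sketch(extension: str, cog: bool = False) -> Optional[str]:
    """Return a media type for ``extension`` or None when it is unknown."""
    mime = MIME_TYPES.get(extension.lower())
    # Only GeoTIFF files can be flagged as Cloud Optimized.
    if mime is not None and cog and mime.startswith('image/tiff'):
        mime += COG_SUFFIX
    return mime
```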

bdc_collection_builder.celery.publish.publish_collection_item(scene_id: str, data: BaseCollection, collection: Collection, file: str, cloud_cover=None, provider_id: int | None = None, scene_meta=None, **kwargs) Item

Generate the Cloud Optimized Files for Image Collection and publish meta information in database.

Notes

This method relies on bdc_collectors.base.BaseCollection definition.

Raises:
  • NotFound – When tile information is not found in the database.

  • Exception – When the Cloud Optimized File could not be generated.

Parameters:
  • scene_id – Scene id reference

  • data – Provider collection structure

  • collection – Current collection scope

  • file – Path to seek

Returns:

The created collection item.

Band Index Generator

Module to generate collection bands dynamically using bdc.bands.metadata property.

class bdc_collection_builder.collections.index_generator.AutoCloseDataSet(file_path: str, mode='r', **options)

Class that wraps the rasterio.io.Dataset to automatically close the data set when it goes out of scope.

close()

Try to close a data set.

bdc_collection_builder.collections.index_generator.BandMapFile

Type in which each key (a collection band name) points to the generated file on disk.

alias of Dict[str, str]

bdc_collection_builder.collections.index_generator.generate_band_indexes(scene_id: str, collection: Collection, scenes: dict) Dict[str, str]

Generate Collection custom bands based on the string expressions in the table band_indexes.

This method looks for custom bands in the Collection Band definition. A custom band must have its metadata property filled out according to bdc_catalog.jsonschemas.band-metadata.json.

Notes

When the collection does not have any index band, an empty dict is returned.

Raises:

RuntimeError when an error occurs while interpreting the band expression in Python Virtual Machine.

Returns:

A dict mapping each generated band name to its generated file.

Utils

Define the utilities to execute string expressions in Python Interpreter.

bdc_collection_builder.interpreter.execute_expression(expression: str, context: dict) Dict[str, Any]

Evaluate a string expression as Python code and execute it in the Python interpreter.

This method executes a dynamic expression in the Python Virtual Machine. With this, you can generate custom bands based on user-defined values. The context defines which values are available to the expression by default.

TODO: Ensure that names not exported in the context, such as os, cannot be used, to avoid internal issues.

Parameters:
  • expression – String-like python expression

  • context – Context loaded variables

Examples

>>> import numpy
>>> from bdc_collection_builder.interpreter import execute_expression
>>> # Evaluate half of the coastal band, using random values as input
>>> res = execute_expression('coastalHalf = B1 / 2', context=dict(B1=numpy.random.rand(1000, 1000) * 10000))
>>> res['coastalHalf']

Notes

Any variable you set in context is available during code execution.

Returns:

Map of context values loaded in memory.
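
The general technique can be sketched with the built-in exec. This is an illustration only and not the module's implementation; emptying the builtins blocks import statements and open, but it is not a real sandbox and should not be relied on for untrusted input.

```python
from typing import Any, Dict


def execute_expression_sketch(expression: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Run ``expression`` with ``context`` as its variables and return the scope."""
    scope = dict(context)  # Copy so the caller's dict is not mutated.
    # Empty builtins block ``import`` and ``open``; this is NOT a real sandbox.
    exec(expression, {'__builtins__': {}}, scope)
    return scope


result = execute_expression_sketch('coastalHalf = B1 / 2', {'B1': 10})
print(result['coastalHalf'])
```

Names assigned by the expression (coastalHalf here) end up in the returned mapping next to the original context values.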

Define Brazil Data Cube utils.

bdc_collection_builder.collections.utils.extract_and_get_internal_name(zip_file_name, extract_to=None)

Extract zipfile and return internal folder path.

bdc_collection_builder.collections.utils.extractall(file, destination=None)

Extract zipfile.
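
The two zip helpers above can be approximated with the standard zipfile module. The sketch assumes that every entry of the archive lives under a single top-level folder and that extraction defaults to the archive's own directory; both are assumptions, not documented behavior.

```python
import zipfile
from pathlib import Path
from typing import Optional


def extract_and_get_internal_name_sketch(zip_file_name: str,
                                         extract_to: Optional[str] = None) -> Path:
    """Extract a zip archive and return the path of its top-level folder."""
    archive = Path(zip_file_name)
    # Assumption: extract next to the archive when no destination is given.
    destination = Path(extract_to) if extract_to else archive.parent
    with zipfile.ZipFile(archive) as zip_ref:
        names = zip_ref.namelist()
        if not names:
            raise ValueError(f'Archive {archive} is empty')
        zip_ref.extractall(destination)
    # Assumption: every entry lives under one top-level folder.
    return destination / names[0].split('/')[0]
```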

bdc_collection_builder.collections.utils.generate_cogs(input_data_set_path, file_path, profile='deflate', profile_options=None, **options)

Generate Cloud Optimized GeoTIFF files (COG).

Example

>>> tif_file = '/path/to/tif'
>>> generate_cogs(tif_file, '/tmp/cog.tif')
Parameters:
  • input_data_set_path (str)

  • file_path (str)

  • profile (str)

  • profile_options (dict)

Returns:

Path to COG.

bdc_collection_builder.collections.utils.get_collector_ext() CollectorExtension

Retrieve the loaded collector extension (BDC-Collectors).

bdc_collection_builder.collections.utils.get_credentials()

Retrieve global secrets with credentials.

bdc_collection_builder.collections.utils.get_epsg_srid(file_path: str) int

Get the Authority Code from a data set path.

Note

This function depends on GDAL.

When no code is found, it returns None.

bdc_collection_builder.collections.utils.get_or_create_model(model_class, defaults=None, engine=None, **restrictions)

Get or create Brazil Data Cube model.

Utility method for looking up an object with the given restrictions, creating one if necessary.

Parameters:
  • model_class (BaseModel)

  • defaults (dict)

  • restrictions (dict)

Returns:

BaseModel – The retrieved model instance.
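
The get-or-create lookup can be illustrated without a database. Band is a hypothetical model and the list stands in for a table; returning a (instance, created) pair is a choice made for this sketch and may not match the return value of the real function.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Band:
    """Hypothetical model used only for this illustration."""
    name: str
    collection_id: int
    data_type: str = 'int16'


def get_or_create_sketch(store: List[Band], defaults: Optional[dict] = None,
                         **restrictions) -> Tuple[Band, bool]:
    """Return the row matching ``restrictions``, creating it when missing."""
    for row in store:
        if all(getattr(row, key) == value for key, value in restrictions.items()):
            return row, False
    # ``defaults`` only applies to newly created rows.
    row = Band(**{**(defaults or {}), **restrictions})
    store.append(row)
    return row, True
```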

bdc_collection_builder.collections.utils.get_provider(catalog, **kwargs) Tuple[ProviderSetting, BaseProvider]

Retrieve the ProviderSetting related to bdc_catalog.models.Provider.

bdc_collection_builder.collections.utils.get_provider_type(catalog: str)

Retrieve the driver for Data Collector.

Look up, in the bdc-collectors app, the driver type that represents the catalog.

bdc_collection_builder.collections.utils.is_sen2cor(collection: Collection) bool

Check if the given collection is a Sen2cor product.

bdc_collection_builder.collections.utils.is_valid_compressed(file)

Check whether a tar.gz or zip file is valid.

bdc_collection_builder.collections.utils.is_valid_compressed_file(file_path: str) bool

Check if the given file is a compressed file and then check the file integrity.

bdc_collection_builder.collections.utils.is_valid_tar(file_path: str) bool

Check file integrity of a tar file.

bdc_collection_builder.collections.utils.is_valid_tar_gz(file_path: str)

Check tar file integrity.
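
Integrity checks of this kind can be written with the standard library alone. The sketch below is an assumption about how such checks may look, not the package's code: zip archives are verified with ZipFile.testzip, and tar archives by reading every member to the end.

```python
import tarfile
import zipfile
import zlib


def _zip_ok(file_path: str) -> bool:
    try:
        with zipfile.ZipFile(file_path) as archive:
            # testzip() returns the first corrupt member name, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        return False


def _tar_ok(file_path: str) -> bool:
    try:
        # Mode 'r:*' transparently handles plain, gzip, bz2 and xz archives.
        with tarfile.open(file_path, 'r:*') as archive:
            for member in archive:
                if member.isfile():
                    handle = archive.extractfile(member)
                    # Read in chunks so large files do not fill the memory.
                    while handle.read(1024 * 1024):
                        pass
        return True
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        return False


def is_valid_compressed_file_sketch(file_path: str) -> bool:
    """Dispatch on the file extension and check the archive integrity."""
    if file_path.endswith('.zip'):
        return _zip_ok(file_path)
    if file_path.endswith(('.tar', '.tar.gz', '.tgz')):
        return _tar_ok(file_path)
    return False
```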

bdc_collection_builder.collections.utils.post_processing(quality_file_path: str, collection: Collection, scenes: dict, resample_to=None)

Stack the merge bands in order to apply a filter on the quality band.

We have faced some issues regarding the nodata value in spectral bands, which resulted in a wrong provenance date on STACK data cubes: Fmask reports the pixel as valid (0), but a nodata value is found in other bands. To avoid that, we read all the other bands, looking for the nodata value. Where it is found, we set the pixel to nodata in the Fmask output:

Quality             Nir                   Quality

0 0 2 4      702  876 7000 9000      =>    0 0 2 4
0 0 0 0      687  987 1022 1029      =>    0 0 0 0
0 2 2 4    -9999 7100 7322 9564      =>  255 2 2 4

Notes

It may take too long to execute for a large grid.

Parameters:
  • quality_file_path – Path to the cloud masking file.

  • collection – The collection instance.

  • scenes – Map of band and file path

  • resample_to – Resolution to re-sample. Default is None, which uses default value.
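
The masking rule from the table above can be reproduced on plain nested lists. This is a sketch of the rule only: the real function works on raster files, and the values -9999 and 255 are taken from the example table rather than from the collection definition.

```python
from typing import List, Sequence

Grid = List[List[int]]


def mask_quality(quality: Grid, bands: Sequence[Grid],
                 band_nodata: int = -9999, quality_nodata: int = 255) -> Grid:
    """Set quality pixels to nodata wherever any spectral band is nodata."""
    result = [row[:] for row in quality]  # Leave the input grid untouched.
    for band in bands:
        for i, row in enumerate(band):
            for j, value in enumerate(row):
                if value == band_nodata:
                    result[i][j] = quality_nodata
    return result


quality = [[0, 0, 2, 4], [0, 0, 0, 0], [0, 2, 2, 4]]
nir = [[702, 876, 7000, 9000], [687, 987, 1022, 1029], [-9999, 7100, 7322, 9564]]
for row in mask_quality(quality, [nir]):
    print(row)
```

Only the first pixel of the last row changes, because that is the only position where the Nir band holds the nodata value.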

bdc_collection_builder.collections.utils.raster_convexhull(file_path: str, epsg='EPSG:4326', no_data=None) dict

Get raster image footprint.

Parameters:
  • file_path (str) – image file

  • epsg (str) – geometry EPSG

  • no_data – Use custom no data value. Default is dataset.nodata

See:

https://rasterio.readthedocs.io/en/latest/topics/masks.html

bdc_collection_builder.collections.utils.raster_extent(file_path: str, epsg='EPSG:4326') Polygon

Get raster extent in arbitrary CRS.

Parameters:
  • file_path (str) – Path to image

  • epsg (str) – EPSG Code of result crs

Returns:

geojson-like geometry

Return type:

dict

bdc_collection_builder.collections.utils.remove_file(file_path: str)

Remove file if exists.

Raises an error when the user doesn’t have access to the file at the given path.

bdc_collection_builder.collections.utils.safe_request()

Define a decorator to disable any SSL Certificate Validation while requesting data.

This snippet was adapted from https://stackoverflow.com/questions/15445981/how-do-i-disable-the-security-certificate-check-in-python-requests.

bdc_collection_builder.collections.utils.save_as_cog(destination: str, raster, mode='w', **profile)

Save the raster file as Cloud Optimized GeoTIFF.

See also

Cloud Optimized GeoTiff https://gdal.org/drivers/raster/cog.html

Parameters:
  • destination – Path to store the data set.

  • raster – Numpy raster values to persist on disk.

  • mode – Default rasterio mode. Default is ‘w’, but you can also set ‘r+’.

  • **profile – Rasterio profile values to add in dataset.

bdc_collection_builder.collections.utils.upload_file(file_name, bucket='bdc-ds-datacube', object_name=None)

Upload a file to an S3 bucket.

Adapted code from boto3 example.

Parameters:
  • file_name (str|_io.TextIO) – File to upload

  • bucket (str) – Bucket to upload to

  • object_name (str) – S3 object name. If not specified then file_name is used