API Reference
Python Brazil Data Cube Collection Builder.
Command Line
- bdc_collection_builder.cli.load_providers(*args: Any, **kwargs: Any) Any
Command line to load providers JSON into database.
Note
Make sure you have exported the variable SQLALCHEMY_DATABASE_URI beforehand, like SQLALCHEMY_DATABASE_URI=postgresql://postgres:postgres@localhost/bdc.
Note
It skips providers that already exist. You must give at least the --ifile or --from_dir parameter.
To load a single JSON file, use the parameter -i or its long form --ifile path/to/json:
bdc-collection-builder load-provider --ifile examples/data/providers/nasa-usgs.json -v
The following output will be displayed:
Provider USGS created
If you would like to read a directory containing several JSON collection files:
bdc-collection-builder load-provider --from-dir examples/data/providers
- Parameters:
ifile (str) – Path to JSON file. Default is None.
from_dir (str) – Readable directory containing JSON files. Defaults to None.
update (bool) – Update entries if data already exists. Defaults to False.
verbose (bool) – Display verbose output. Defaults to False.
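The rules stated for this command (at least one of ifile or from_dir is required, and existing providers are skipped unless update is set) can be sketched in plain Python. This is only an illustration of those rules, not the package's implementation: the helper plan_provider_load, the in-memory existing set, and the assumption that each JSON file has a top-level "name" key are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def plan_provider_load(ifile=None, from_dir=None, existing=(), update=False):
    """Return (to_create, to_update, skipped) provider names.

    Hypothetical helper: it mirrors the documented rules only and does
    not touch any database.
    """
    if ifile is None and from_dir is None:
        raise ValueError("Give at least ifile or from_dir")

    files = []
    if ifile is not None:
        files.append(Path(ifile))
    if from_dir is not None:
        files.extend(sorted(Path(from_dir).glob("*.json")))

    to_create, to_update, skipped = [], [], []
    for path in files:
        # Assumption: each file is a JSON object with a "name" key.
        name = json.loads(path.read_text())["name"]
        if name not in existing:
            to_create.append(name)
        elif update:
            to_update.append(name)
        else:
            skipped.append(name)
    return to_create, to_update, skipped


# Usage with throwaway files: "ESA" already exists, so it is skipped.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("USGS", "ESA"):
        Path(tmp, f"{name.lower()}.json").write_text(json.dumps({"name": name}))
    result = plan_provider_load(from_dir=tmp, existing={"ESA"})
```

Passing update=True in the same call would move "ESA" from the skipped list to the update list.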
- bdc_collection_builder.cli.set_provider(*args: Any, **kwargs: Any) Any
- bdc_collection_builder.cli.overview(*args: Any, **kwargs: Any) Any
Describe information for Collection, which includes the data collect order by default.
Collect & Processing
- bdc_collection_builder.collections.collect.get_provider_order(collection: Any, include_inactive=False, **kwargs) List[DataCollector]
Retrieve the list of providers with which the bdc_catalog.models.Collection is associated.
Note
This method requires the initialization of extension bdc_catalog.ext.BDCCatalog.
For a given collection, it searches the ProviderSetting and CollectionsProvidersSetting associations and then looks for the supported providers in the entry point bdc_collectors.providers.
- Parameters:
collection – An instance of bdc_catalog.models.Collection.
include_inactive – List also the inactive providers. Default=False.
**kwargs – Extra parameters to pass to the Provider instance.
- Returns:
A list of DataCollector, ordered by priority.
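The filtering and ordering described above can be illustrated with a small, self-contained sketch. ProviderEntry is a hypothetical stand-in for a collection/provider association row, and the choice that a lower priority value is tried first is an assumption; the source only says the result is "ordered by priority".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProviderEntry:
    """Hypothetical stand-in for a collection/provider association row."""
    name: str
    priority: int
    active: bool = True


def order_providers(entries: List[ProviderEntry],
                    include_inactive: bool = False) -> List[ProviderEntry]:
    # Drop inactive providers unless explicitly requested.
    selected = [e for e in entries if include_inactive or e.active]
    # Assumption: a lower priority value means "try first".
    return sorted(selected, key=lambda e: e.priority)


entries = [
    ProviderEntry("provider-b", priority=2),
    ProviderEntry("provider-c", priority=3, active=False),
    ProviderEntry("provider-a", priority=1),
]
names = [e.name for e in order_providers(entries)]
all_names = [e.name for e in order_providers(entries, include_inactive=True)]
```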
- bdc_collection_builder.collections.processor.sen2cor(scene_id: str, input_dir: str, output_dir: str, docker_container_work_dir: list, version: str | None = None, timeout=None, **env)
Execute Sen2Cor data processor using Docker images.
Note
Make sure you have exported the variables SEN2COR_AUX_DIR, SEN2COR_DOCKER_IMAGE, and SEN2COR_DIR properly.
This method calls the processor Sen2Cor and generates the Surface Reflectance products. Once the required variables are set, it tries to execute Sen2Cor from the given versions: ‘2.10.0’, ‘2.8.0’, ‘2.5.5’.
- Parameters:
scene_id (str) – The Scene Identifier (Item id)
input_dir (str) – Base input directory of scene id.
output_dir (str) – Path where Surface reflectance product will be generated.
docker_container_work_dir (str) – Base directory list of workdir for docker.
version (str) – Sen2Cor version to execute. Remember that the version must exist in the Docker registry. Default is None, which automatically tries the versions ‘2.10.0’, ‘2.8.0’, ‘2.5.5’, respectively.
timeout (int) – Timeout for Sen2Cor execution. Defaults to SEN2COR_TIMEOUT.
- Keyword Arguments:
any – Custom Environment variables, use Python spread kwargs.
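The version fallback described for the version parameter can be sketched as a simple loop. This is a minimal illustration under stated assumptions: run is a hypothetical callable standing in for the Docker invocation and is expected to raise on failure; the real function may handle errors differently.

```python
from typing import Callable, Optional, Sequence

DEFAULT_VERSIONS = ("2.10.0", "2.8.0", "2.5.5")


def run_with_fallback(run: Callable[[str], None],
                      version: Optional[str] = None,
                      versions: Sequence[str] = DEFAULT_VERSIONS) -> str:
    """Call run(version) for each candidate until one succeeds.

    Returns the version that succeeded, or raises RuntimeError when
    every candidate failed.
    """
    candidates = [version] if version is not None else list(versions)
    errors = []
    for candidate in candidates:
        try:
            run(candidate)
            return candidate
        except Exception as exc:  # broad on purpose: any failure moves on
            errors.append(f"{candidate}: {exc}")
    raise RuntimeError("No Sen2Cor version succeeded: " + "; ".join(errors))


def fake_run(version: str) -> None:
    # Pretend that only 2.8.0 is available in the registry.
    if version != "2.8.0":
        raise RuntimeError("image not found")


used = run_with_fallback(fake_run)
```

Passing an explicit version disables the fallback, so a missing image surfaces as an error instead of silently switching versions.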
Module to publish a collection item in the database.
- bdc_collection_builder.celery.publish.compress_raster(input_path: str, output_path: str, algorithm: str = 'deflate')
Compress a raster using GDAL compression algorithm.
- bdc_collection_builder.celery.publish.create_quick_look(file_output, red_file, green_file, blue_file, rows=768, cols=768, no_data=-9999)
Generate a Quick Look file (PNG based) from a list of files.
Note
The file order in files represents the bands Red, Green and Blue, respectively.
- Exceptions:
RasterIOError when a raster file band could not be opened.
- Parameters:
file_output – Path to store the quicklook file.
red_file – Path to the band attached into red channel.
green_file – Path to the band attached into green channel.
blue_file – Path to the band attached into blue channel.
rows – Image height. Default is 768.
cols – Image width. Default is 768.
no_data – Use custom value for nodata.
- bdc_collection_builder.celery.publish.generate_quicklook_pvi(safe_folder: Path, quicklook: Path)
Generate QuickLook preview from a Sentinel-2 PVI file.
- bdc_collection_builder.celery.publish.get_footprint_sentinel(mtd_file: str) Polygon
Get image footprint from a Sentinel-2 MTD file.
- bdc_collection_builder.celery.publish.get_item_path(relative: str) str
Retrieve the Item absolute path from published asset.
- bdc_collection_builder.celery.publish.guess_mime_type(extension: str, cog=False) str | None
Try to identify file mimetype.
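A function with this signature could be approximated with the standard library, as in the sketch below. The helper guess_mime is hypothetical and not the package's code; the Cloud Optimized GeoTIFF media type string is the one commonly used for COG assets, and treating the cog flag as "upgrade image/tiff to the COG media type" is an assumption.

```python
import mimetypes
from typing import Optional

# Media type commonly used for Cloud Optimized GeoTIFF assets.
COG_MIME_TYPE = "image/tiff; application=geotiff; profile=cloud-optimized"


def guess_mime(extension: str, cog: bool = False) -> Optional[str]:
    """Hypothetical re-creation: map a file extension to a MIME type."""
    if not extension.startswith("."):
        extension = "." + extension
    # guess_type expects a file name or URL, so prepend a dummy stem.
    mime, _encoding = mimetypes.guess_type("file" + extension.lower())
    if cog and mime == "image/tiff":
        return COG_MIME_TYPE
    return mime
```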
- bdc_collection_builder.celery.publish.publish_collection_item(scene_id: str, data: BaseCollection, collection: Collection, file: str, cloud_cover=None, provider_id: int | None = None, scene_meta=None, **kwargs) Item
Generate the Cloud Optimized Files for Image Collection and publish meta information in database.
Notes
This method relies on bdc_collectors.base.BaseCollection definition.
- Raises:
NotFound – When tile information is not found in the database.
Exception – When the Cloud Optimized File could not be generated.
- Parameters:
scene_id – Scene id reference
data – Provider collection structure
collection – Current collection scope
file – Path to seek
- Returns:
The created collection item.
Band Index Generator
Module to generate collection bands dynamically using bdc.bands.metadata property.
- class bdc_collection_builder.collections.index_generator.AutoCloseDataSet(file_path: str, mode='r', **options)
Class that wraps rasterio.io.Dataset to automatically close the data set when it goes out of scope.
- close()
Try to close a data set.
- bdc_collection_builder.collections.index_generator.BandMapFile
Type in which a key (the collection band name) points to the generated file on disk.
alias of Dict[str, str]
- bdc_collection_builder.collections.index_generator.generate_band_indexes(scene_id: str, collection: Collection, scenes: dict) Dict[str, str]
Generate Collection custom bands based on the string expressions in table band_indexes.
This method looks for custom bands in the Collection Band definition. A custom band must have its metadata property filled out according to bdc_catalog.jsonschemas.band-metadata.json.
Notes
When collection does not have any index band, returns empty dict.
- Raises:
RuntimeError – When an error occurs while interpreting the band expression in the Python Virtual Machine.
- Returns:
A dict with the generated bands.
Utils
Define the utilities to execute string expressions in Python Interpreter.
- bdc_collection_builder.interpreter.execute_expression(expression: str, context: dict) Dict[str, Any]
Evaluate a string expression as a Python object and execute it in the Python interpreter.
This method executes a dynamic expression in the Python Virtual Machine. With this, you can generate custom bands based on user-defined values. The context defines the scope of values that are available by default.
TODO: Ensure that non-exported variables (context) can’t be executed like os to avoid internal issues.
- Parameters:
expression – String-like python expression
context – Context loaded variables
Examples
>>> import numpy
>>> # Evaluate half of coastal band using numpy.random
>>> res = execute_expression('coastalHalf = B1 / 2', context=dict(B1=numpy.random.rand(1000, 1000) * 10000))
>>> res['coastalHalf']
Notes
You can set loaded variables in context and they will be available during code execution.
- Returns:
Map of context values loaded in memory.
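The mechanism described above, executing an expression string with a context that serves as its scope, can be sketched with the built-in exec. This is an illustration only and not the package's implementation: run_expression is a hypothetical name, and emptying __builtins__ is one possible, incomplete way to approach the concern raised in the TODO.

```python
from typing import Any, Dict


def run_expression(expression: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Execute `expression` with `context` as its local scope."""
    scope = dict(context)  # copy, so the caller's dict is not mutated
    # Empty builtins hide names such as open or __import__, but this is
    # NOT a real sandbox; never run untrusted expressions this way.
    exec(expression, {"__builtins__": {}}, scope)
    return scope


# A plain number stands in for a band array in this example.
res = run_expression("coastalHalf = B1 / 2", {"B1": 5000})
half = res["coastalHalf"]
```

Names assigned by the expression land in the returned mapping next to the original context values, which matches the documented return value "map of context values loaded in memory".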
Define Brazil Data Cube utils.
- bdc_collection_builder.collections.utils.extract_and_get_internal_name(zip_file_name, extract_to=None)
Extract zipfile and return internal folder path.
- bdc_collection_builder.collections.utils.extractall(file, destination=None)
Extract zipfile.
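Extracting an archive and locating its internal folder can be sketched with the standard zipfile module. The helper below is hypothetical and assumes that the archive contains a single root folder; the package functions may behave differently for other layouts.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_and_top_folder(zip_path, extract_to=None):
    """Extract a zip file and return the path of its top-level folder.

    Assumption: the archive holds a single root folder.
    """
    zip_path = Path(zip_path)
    target = Path(extract_to) if extract_to else zip_path.parent
    with zipfile.ZipFile(zip_path) as archive:
        # First path component of the first member is taken as the root.
        root = archive.namelist()[0].split("/")[0]
        archive.extractall(target)
    return target / root


# Usage with a throwaway archive.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp, "scene.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("scene.SAFE/MTD.xml", "<xml/>")
    folder = extract_and_top_folder(zip_path, Path(tmp, "out"))
    folder_name = folder.name
    has_file = (folder / "MTD.xml").exists()
```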
- bdc_collection_builder.collections.utils.generate_cogs(input_data_set_path, file_path, profile='deflate', profile_options=None, **options)
Generate Cloud Optimized GeoTIFF files (COG).
Example
>>> tif_file = '/path/to/tif'
>>> generate_cogs(tif_file, '/tmp/cog.tif')
- Parameters:
input_data_set_path (str)
file_path (str)
profile (str)
profile_options (dict)
- Returns:
Path to COG.
- bdc_collection_builder.collections.utils.get_collector_ext() CollectorExtension
Retrieve the loaded collector extension (BDC-Collectors).
- bdc_collection_builder.collections.utils.get_credentials()
Retrieve global secrets with credentials.
- bdc_collection_builder.collections.utils.get_epsg_srid(file_path: str) int
Get the Authority Code from a data set path.
Note
This function depends on GDAL.
When no code found, returns None.
- bdc_collection_builder.collections.utils.get_or_create_model(model_class, defaults=None, engine=None, **restrictions)
Get or create Brazil Data Cube model.
Utility method for looking up an object with the given restrictions, creating one if necessary.
- Parameters:
model_class (BaseModel)
defaults (dict)
restrictions (dict)
- Returns:
BaseModel – The retrieved model instance.
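The get-or-create pattern described here can be shown without a database. In this sketch the Band dataclass and the in-memory store list are hypothetical stand-ins for a model class and a database session; only the lookup-then-create logic is the point.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Band:
    """Hypothetical model used only for this illustration."""
    name: str
    collection_id: int
    nodata: int = 0


def get_or_create(store: List[Band], defaults: Optional[Dict] = None,
                  **restrictions) -> Band:
    # Look for an existing object matching every restriction.
    for obj in store:
        if all(getattr(obj, key) == value for key, value in restrictions.items()):
            return obj
    # Not found: build one from the defaults plus the restrictions.
    obj = Band(**{**(defaults or {}), **restrictions})
    store.append(obj)
    return obj


store: List[Band] = []
first = get_or_create(store, defaults={"nodata": -9999}, name="B04", collection_id=1)
second = get_or_create(store, name="B04", collection_id=1)
```

The second call finds the object created by the first one, so defaults are applied only at creation time.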
- bdc_collection_builder.collections.utils.get_provider(catalog, **kwargs) Tuple[ProviderSetting, BaseProvider]
Retrieve ProviderSetting related with bdc_catalog.models.Provider.
- bdc_collection_builder.collections.utils.get_provider_type(catalog: str)
Retrieve the driver for Data Collector.
Seek in bdc-collectors app for the driver type for catalog representation.
- bdc_collection_builder.collections.utils.is_sen2cor(collection: Collection) bool
Check if the given collection is a Sen2cor product.
- bdc_collection_builder.collections.utils.is_valid_compressed(file)
Check whether a tar.gz or zip file is valid.
- bdc_collection_builder.collections.utils.is_valid_compressed_file(file_path: str) bool
Check if the given file is a compressed file and then check file integrity.
- bdc_collection_builder.collections.utils.is_valid_tar(file_path: str) bool
Check file integrity of a tar file.
- bdc_collection_builder.collections.utils.is_valid_tar_gz(file_path: str)
Check tar file integrity.
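Integrity checks of this kind can be written with the standard zipfile and tarfile modules. The sketch below is a hypothetical combined helper, not the package's code; it treats an archive as valid when zipfile reports no corrupt member or when every tar header can be read.

```python
import os
import tarfile
import tempfile
import zipfile


def is_valid_archive(file_path: str) -> bool:
    """Return True when file_path is a readable zip or tar(.gz) archive."""
    if zipfile.is_zipfile(file_path):
        with zipfile.ZipFile(file_path) as archive:
            # testzip() returns the first corrupt member name, or None.
            return archive.testzip() is None
    try:
        # Mode "r:*" lets tarfile detect the compression transparently.
        with tarfile.open(file_path, "r:*") as archive:
            for _member in archive:  # walking all headers surfaces truncation
                pass
        return True
    except (tarfile.TarError, EOFError, OSError):
        return False


# Usage with throwaway files: two valid archives and one broken file.
with tempfile.TemporaryDirectory() as tmp:
    text_file = os.path.join(tmp, "a.txt")
    with open(text_file, "w") as handle:
        handle.write("data")
    good_zip = os.path.join(tmp, "ok.zip")
    with zipfile.ZipFile(good_zip, "w") as archive:
        archive.write(text_file, arcname="a.txt")
    good_tar = os.path.join(tmp, "ok.tar.gz")
    with tarfile.open(good_tar, "w:gz") as archive:
        archive.add(text_file, arcname="a.txt")
    broken = os.path.join(tmp, "broken.tar.gz")
    with open(broken, "wb") as handle:
        handle.write(b"not an archive")
    checks = [is_valid_archive(p) for p in (good_zip, good_tar, broken)]
```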
- bdc_collection_builder.collections.utils.post_processing(quality_file_path: str, collection: Collection, scenes: dict, resample_to=None)
Stack the merge bands in order to apply a filter on the quality band.
We have faced some issues regarding the nodata value in spectral bands, which resulted in a wrong provenance date on STACK data cubes, since Fmask tells that the pixel is valid (0) but a nodata value is found in other bands. To avoid that, we read all the other bands, looking for the nodata value. When found, we set the pixel to nodata in the Fmask output:
Quality      Nir                        Quality
0 0 2 4      702   876  7000 9000   =>  0   0 2 4
0 0 0 0      687   987  1022 1029   =>  0   0 0 0
0 2 2 4      -9999 7100 7322 9564   =>  255 2 2 4
Notes
It may take too long to execute for a large grid.
- Parameters:
quality_file_path – Path to the cloud masking file.
collection – The collection instance.
scenes – Map of band and file path
resample_to – Resolution to re-sample. Default is None, which uses default value.
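The masking rule from the example above can be reproduced with plain Python lists. This is a minimal sketch, not the package's implementation (which operates on raster files): the helper mask_quality is hypothetical, and the nodata values -9999 and 255 are taken from the example.

```python
from typing import List

SPECTRAL_NODATA = -9999  # nodata of the spectral band, as in the example
QUALITY_NODATA = 255     # value written to the quality band


def mask_quality(quality: List[List[int]], band: List[List[int]]) -> List[List[int]]:
    """Set quality pixels to nodata wherever the spectral band is nodata."""
    return [
        [QUALITY_NODATA if b == SPECTRAL_NODATA else q for q, b in zip(q_row, b_row)]
        for q_row, b_row in zip(quality, band)
    ]


quality = [[0, 0, 2, 4], [0, 0, 0, 0], [0, 2, 2, 4]]
nir = [[702, 876, 7000, 9000], [687, 987, 1022, 1029], [-9999, 7100, 7322, 9564]]
masked = mask_quality(quality, nir)
```

With several spectral bands, the same function can be applied once per band, feeding the result of each call into the next.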
- bdc_collection_builder.collections.utils.raster_convexhull(file_path: str, epsg='EPSG:4326', no_data=None) dict
Get raster image footprint.
- Parameters:
file_path (str) – image file
epsg (str) – geometry EPSG
no_data – Use custom no data value. Default is dataset.nodata
- bdc_collection_builder.collections.utils.raster_extent(file_path: str, epsg='EPSG:4326') Polygon
Get raster extent in arbitrary CRS.
- Parameters:
file_path (str) – Path to image
epsg (str) – EPSG Code of result crs
- Returns:
geojson-like geometry
- Return type:
dict
- bdc_collection_builder.collections.utils.remove_file(file_path: str)
Remove file if exists.
Raises an error when the user doesn’t have access to the file at the given path.
- bdc_collection_builder.collections.utils.safe_request()
Define a decorator to disable any SSL Certificate Validation while requesting data.
This snippet was adapted from https://stackoverflow.com/questions/15445981/how-do-i-disable-the-security-certificate-check-in-python-requests.
- bdc_collection_builder.collections.utils.save_as_cog(destination: str, raster, mode='w', **profile)
Save the raster file as Cloud Optimized GeoTIFF.
See also
Cloud Optimized GeoTiff https://gdal.org/drivers/raster/cog.html
- Parameters:
destination – Path to store the data set.
raster – Numpy raster values to persist on disk.
mode – Default rasterio mode. Default is ‘w’ but you also can set ‘r+’.
**profile – Rasterio profile values to add in dataset.
- bdc_collection_builder.collections.utils.upload_file(file_name, bucket='bdc-ds-datacube', object_name=None)
Upload a file to an S3 bucket.
Adapted code from boto3 example.
- Parameters:
file_name (str|_io.TextIO) – File to upload
bucket (str) – Bucket to upload to
object_name (str) – S3 object name. If not specified then file_name is used