Installation
The Brazil Data Cube Collection Builder (bdc-collection-builder) depends essentially on:
Sensor Harmonization (Optional)
Compatibility
Before deploy/install BDC-Collection-Builder, please, take a look into compatibility table:
BDC-Collection-Builder |
BDC-Catalog |
BDC-Collectors |
|---|---|---|
1.0.x |
1.0.1 |
0.9.0 |
0.8.4 |
0.8.2 |
0.6.0 |
0.6.x |
0.8.2 |
0.2.1 |
0.4.x |
0.2.x |
|
Development Installation
Clone the software repository:
$ git clone https://github.com/brazil-data-cube/bdc-collection-builder.git
Go to the source code folder:
$ cd bdc-collection-builder
Install in development mode:
$ pip3 install -e .[docs,tests,amqp]
Warning
The setuptools v67+ has breaking changes related
Pip versions requirements. For now, you should install setuptools<67 for compatibility.
The packages in BDC-Collection-Builder will be upgraded to support latest version.
Note
If you have problems during the librabbitmq install with autoreconf, please, install the autoconf build system. In Debian based systems (Ubuntu), you can install autoconf with:
$ sudo apt install autoconf
For more information, please, see [1].
Note
If you would like to publish Hierarchical Data Format (HDF) datasets, you may install the extra gdal.
Optionally, you can install all dependencies as following:
$ pip3 install -e .[all]
Make sure you have GDAL installed and available in PATH.
Generate the documentation:
$ python setup.py build_sphinx
The above command will generate the documentation in HTML and it will place it under:
docs/sphinx/_build/html/
Optionally, you can serve these files temporally on http://localhost:8000 using the following command:
cd docs/sphinx/_build/html/
python3 -m http.server
Running in Development Mode
Launch Redis, RabbitMQ and PostgreSQL Containers
The docker-compose command can be used to launch the Redis and RabbitMQ containers:
$ docker-compose up -d redis mq postgres
Let’s take a look at each parameter in the above command:
up: tells docker-compose to launch the containers.
-d: tells docker-compose that containers will run in detach mode (as a daemon).
redis: the name of a service in thedocker-compose.ymlfile with all information to prepare a Redis container.
mq: the name of a service in thedocker-compose.ymlfile with all information to prepare a RabbitMQ container.
postgres: the name of a service in thedocker-compose.ymlfile with all information to prepare a PostgreSQL container.
Note
Since docker-compose will map the services to the default system ports on localhost, make sure you are not running Redis, RabbitMQ or PostgreSQL on those ports in your system, otherwise you will have a port conflict during the attempt to launch the new containers.
Note
If you have a PostgreSQL DBMS you can omit the postgres service in the above command.
Note
After launching the containers, check if they are up and running:
$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8c94877e7017 rabbitmq:3-management "docker-entrypoint.s…" 34 seconds ago Up 26 seconds 4369/tcp, 5671/tcp, 0.0.0.0:5672->5672/tcp, 15671/tcp, 25672/tcp, 0.0.0.0:15672->15672/tcp bdc-collection-builder-rabbitmq
acc51ff02295 mdillon/postgis "docker-entrypoint.s…" 34 seconds ago Up 24 seconds 0.0.0.0:5432->5432/tcp bdc-collection-builder-pg
84bae6370fbb redis "docker-entrypoint.s…" 34 seconds ago Up 27 seconds 0.0.0.0:6379->6379/tcp bdc-collection-builder-redis
Prepare the Database System
You will need an instance of a PostgreSQL DBMS with a database prepared with the Collection Builder schema.
The following steps will show how to prepare the data model:
1. Create a PostgreSQL database and enable the PostGIS extension:
SQLALCHEMY_DATABASE_URI=postgresql://postgres:postgres@localhost:5432/bdc \
bdc-db db init
2. Create extension PostGIS:
SQLALCHEMY_DATABASE_URI=postgresql://postgres:postgres@localhost:5432/bdc \
bdc-db db create-extension-postgis
3. Create table namespaces:
SQLALCHEMY_DATABASE_URI=postgresql://postgres:postgres@localhost:5432/bdc \
bdc-db db create-namespaces
4. After that, run Flask-Migrate command to prepare the Collection Builder data model:
SQLALCHEMY_DATABASE_URI=postgresql://postgres:postgres@localhost:5432/bdc \
bdc-collection-builder alembic upgrade
5. Load BDC-Catalog triggers with command:
SQLALCHEMY_DATABASE_URI=postgresql://postgres:postgres@localhost:5432/bdc \
bdc-db db create-triggers
Note
For a initial data of collections, the BDC-Catalog has a command line utility to load a JSON like structure
into database as Collection. We have prepared a minimal JSON files in examples/data.
You load them with the following command:
SQLALCHEMY_DATABASE_URI=postgresql://postgres:postgres@localhost:5432/bdc \
bdc-catalog load-data --from-dir examples/data # or individual as --ifile examples/data/sentinel-2-l1.json
The BDC-Collection-Builder requires a list of providers registered in database to collect data.
Please, take a look into folder examples/data/providers and set the right credentials for this step.
Once credentials is set, you can load them with command:
SQLALCHEMY_DATABASE_URI=postgresql://postgres:postgres@localhost:5432/bdc \
bdc-collection-builder load-providers --from-dir examples/data/providers
If you would like to link a collection with a default provider (S2_L1C-1 with ESA) use the command:
SQLALCHEMY_DATABASE_URI=postgresql://postgres:postgres@localhost:5432/bdc \
bdc-collection-builder set-provider --collection S2_L1C-1 --provider ESA
Always related a Collection with Provider Name. Do not use driver_name.
You can check collection overview with command:
SQLALCHEMY_DATABASE_URI=postgresql://postgres:postgres@localhost:5432/bdc \
bdc-collection-builder overview --collection S2_L1C-1
The following output will be:
Collection S2_L1C-1
-> title: Sentinel-2 - MSI - Level-1C
-> name: S2_L1C
-> version: 1
-> description: Level-1C product provides orthorectified Top-Of-Atmosphere (TOA) reflectance.
-> collection_type: collection
-> Providers:
- ESA, driver=SciHub, priority=1, active=True
Note
Please refer to Configuration the section
Setting up the Credentials for EO Data Providers to set valid access credentials for data providers.
Prepare the containers Sen2Cor and LaSRC 1.3.0
Before launching Sen2Cor and LaSRC processors, please, read the Configuration documentation and make sure you have the right layout of auxiliary data in your filesystem.
If you have all the auxiliary data, edit docker-compose.yml the section atm-correction and fill the following configuration based in the directory where auxiliaries are stored:
# LaSRC / LEDAPS
- "LASRC_AUX_DIR=/path/to/landsat/auxiliaries/L8"
- "LEDAPS_AUX_DIR=/path/to/landsat/ledaps_auxiliaries"
# Sen2Cor
- "SEN2COR_AUX_DIR=/path/to/sen2cor/CCI4SEN2COR"
- "SEN2COR_CONFIG_DIR=/path/to/sen2cor/config/2.8"
Note
Remember that these variables are relative inside container. You may change the mount volume in the section volumes.
The ‘SEN2COR_CONFIG_DIR` is base configuration of Sen2Cor instance with folder cfg and file L2A_GIPP.xml.
Launching Collection Builder Workers
1. In order to launch the worker responsible for downloading data, run the following Celery command:
$ DATA_DIR="/home/user/data/bdc-collection-builder" \
SQLALCHEMY_DATABASE_URI="postgresql://postgres:postgres@localhost:5432/bdc" \
REDIS_URL="redis://localhost:6379" \
RABBIT_MQ_URL="pyamqp://guest@localhost" \
celery -A bdc_collection_builder.celery.worker:celery worker -l INFO --concurrency 2 -Q download
As soon as the worker is launched, it will present a message like:
-------------- celery@enghaw-dell-note v4.4.2 (cliffs)
--- ***** -----
-- ******* ---- Linux-5.3.0-46-generic-x86_64-with-Ubuntu-18.04-bionic 2020-04-30 08:51:18
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: bdc_collection_builder:0x7fa166e9a490
- ** ---------- .> transport: amqp://guest:**@localhost:5672//
- ** ---------- .> results: postgresql://postgres:**@localhost:5432/bdc
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> download exchange=download(direct) key=download
[tasks]
. bdc_collection_builder.celery.tasks.correction
. bdc_collection_builder.celery.tasks.download
. bdc_collection_builder.celery.tasks.harmonization
. bdc_collection_builder.celery.tasks.post
. bdc_collection_builder.celery.tasks.publish
[2020-04-30 08:51:18,737: INFO/MainProcess] Connected to amqp://guest:**@127.0.0.1:5672//
[2020-04-30 08:51:18,746: INFO/MainProcess] mingle: searching for neighbors
[2020-04-30 08:51:20,040: INFO/MainProcess] mingle: all alone
[2020-04-30 08:51:20,075: INFO/MainProcess] celery@enghaw-dell-note ready.
2. To launch the worker responsible for surface reflection generation (L2A processor based on Sen2Cor or LaSRC for Landsat 8), use the following Celery command:
$ DATA_DIR="/home/user/data/bdc-collection-builder" \
SQLALCHEMY_DATABASE_URI="postgresql://postgres:postgres@localhost:5432/bdc" \
REDIS_URL="redis://localhost:6379" \
RABBIT_MQ_URL="pyamqp://guest@localhost" \
LASRC_AUX_DIR=/path/to/auxiliaries/L8 \
LEDAPS_AUX_DIR=/path/to/auxiliaries/ledaps \
celery -A bdc_collection_builder.celery.worker:celery worker -l INFO --concurrency 4 -Q correction
As soon as the worker is launched, it will present a message like:
-------------- celery@enghaw-dell-note v4.4.2 (cliffs)
--- ***** -----
-- ******* ---- Linux-5.3.0-46-generic-x86_64-with-Ubuntu-18.04-bionic 2020-04-30 08:53:57
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: bdc_collection_builder:0x7ff25bff5390
- ** ---------- .> transport: amqp://guest:**@localhost:5672//
- ** ---------- .> results: postgresql://postgres:**@localhost:5432/bdc
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> atm-correction exchange=atm-correction(direct) key=atm-correction
[tasks]
. bdc_collection_builder.celery.tasks.correction
. bdc_collection_builder.celery.tasks.download
. bdc_collection_builder.celery.tasks.harmonization
. bdc_collection_builder.celery.tasks.post
. bdc_collection_builder.celery.tasks.publish
[2020-04-30 08:53:57,977: INFO/MainProcess] Connected to amqp://guest:**@127.0.0.1:5672//
[2020-04-30 08:53:58,055: INFO/MainProcess] mingle: searching for neighbors
[2020-04-30 08:53:59,389: INFO/MainProcess] mingle: all alone
[2020-04-30 08:53:59,457: INFO/MainProcess] celery@enghaw-dell-note ready.
Note
This configuration is only for LaSRC/LEDAPS with Fmask4. If you would like to run with Sen2Cor, check CONFIG.
3. To launch the worker responsible for publishing the generated surface reflection data products, use the following Celery command:
$ DATA_DIR="/home/user/data/bdc-collection-builder" \
SQLALCHEMY_DATABASE_URI="postgresql://postgres:postgres@localhost:5432/bdc" \
REDIS_URL="redis://localhost:6379" \
RABBIT_MQ_URL="pyamqp://guest@localhost" \
celery -A bdc_collection_builder.celery.worker:celery worker -l INFO --concurrency 4 -Q publish
As soon as the worker is launched, it will present a message like:
-------------- celery@enghaw-dell-note v4.4.2 (cliffs)
--- ***** -----
-- ******* ---- Linux-5.3.0-46-generic-x86_64-with-Ubuntu-18.04-bionic 2020-04-30 08:54:19
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: bdc_collection_builder:0x7f52d876e3d0
- ** ---------- .> transport: amqp://guest:**@localhost:5672//
- ** ---------- .> results: postgresql://postgres:**@localhost:5432/bdc
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> publish exchange=publish(direct) key=publish
[tasks]
. bdc_collection_builder.celery.tasks.correction
. bdc_collection_builder.celery.tasks.download
. bdc_collection_builder.celery.tasks.harmonization
. bdc_collection_builder.celery.tasks.post
. bdc_collection_builder.celery.tasks.publish
[2020-04-30 08:54:19,361: INFO/MainProcess] Connected to amqp://guest:**@127.0.0.1:5672//
[2020-04-30 08:54:19,400: INFO/MainProcess] mingle: searching for neighbors
[2020-04-30 08:54:20,504: INFO/MainProcess] mingle: all alone
[2020-04-30 08:54:20,602: INFO/MainProcess] celery@enghaw-dell-note ready.
Note
In these examples, we have launched individual workers download, atm-correction,
publish listening in different queues.
For convenience, you may set the parameter -Q download,atm-correction,publish to make the
worker listen all these queues in runtime.
Just make sure that the worker has the required variables for each kind of processing.
Launching Collection Builder
To launch the Flask application responsible for orchestrating the collection builder components, use the following command:
$ DATA_DIR="/home/user/data/bdc-collection-builder" \
SQLALCHEMY_DATABASE_URI="postgresql://postgres:postgres@localhost:5432/bdc" \
REDIS_URL="redis://localhost:6379" \
RABBIT_MQ_URL="pyamqp://guest@localhost" \
bdc-collection-builder run
As soon as the Flask application is up and running, it will present a message like:
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
Usage
Please, refer to the document Usage for information on how to use the collection builder to download and generate surface reflectance data products.
Footnotes