Master Thesis of Hubert Kriebitzsch at the TU Berlin Faculty IV Computer Vision and Remote Sensing Department
Data augmentation is a crucial challenge in deep learning and especially in remote sensing where data is often more difficult and costly to acquire especially when collecting data of rare events such as natural disasters. Many solutions have been proposed to this problem and data augmentation using synthetic data, mainly generated using Generative Adversarial Networks, is one of the most recent and efficient approaches to counter the effects of class imbalance. In this thesis, we further study data augmentation with synthetic data using state-of-the-art generative models. We use diffusion models to generate new remote sensing images for data augmentation purposes. To generate high-fidelity satellite images of active fire, we finetune the foundation model Stable Diffusion using Dreambooth and existing wildfire images. We apply it to the task of active fire detection by inpainting synthetic wildfires into existing satellite images. This allows us to augment semantic segmentation datasets and not only image classification datasets. We conduct a series of experiments to measure the efficiency of the methods proposed and compare different pretrained and finetuned diffusion models as well as different inpainting masks. We evaluate this approach on a small manually annotated active fire detection dataset and achieve an improvement of the dice coefficient from 58.5% up to 72.7%. This work provides new insights on remote sensing data generation with diffusion models, as well as the efficiency of data augmentation using synthetic data generated with them. It presents a novel way to generate semantic segmentation data in remote sensing.
We use the manually annotated dataset of active fire detection created by de Almeida Pereira et al. available here. It consists of 13 Landsat-8 images captured in September 2020 from different location around the world. They were split into 9,044 256x256 pixels non-overlapping patches. To simplify the image synthesis, we restrict the data to the RGB visible bands.
pip install -r requirements.txt
To finetune Stable Diffusion Inpainting you can use the inpainting_fintenuning.ipynb file, preferably run it using Google Colab. It uses the following repository to finetune the model. We use Google Drive to store training images and model checkpoints and connect the notebook to it.
To generate synthetic data we use a modified version of Automatic1111 Stable Diffusion WebUI that allows to generate a synthetic image for each pair of background and mask in batch mode.
Provided synthetic images and the corresponding masks in tif format (you can use png_to_tif.py to convert them), you can generate an augmented dataset using augment.py.
python data_augmentation/augment.py \
--input_path path_to_synthetic_images \
--output_path path_to_augmented_dataset_landsat_patches \
--mask_path path_to_inpainting_masks \
--output_mask_path path_to_augmented_dataset_masks \
--original_image_path path_to_original_dataset_landsat_patches \
--original_mask_path path_to_original_dataset_manual_annotations_patches \
--split_path path_to_original_dataset_split \
--output_split_path path_to_augmented_dataset_split \
--lmdb_path path_to_lmdb
Once you have a ready-to-use LMDB database, you can train the U-Net model.
python semantic_segmentation/run.py \
--epochs 200 \
--batch_size 8 \
--n_channels 3 \
--checkpoint_interval 10 \
--checkpoint_path path_to_checkpoints \
--lmdb_path path_to_lmdb_dir \
--split_path path_to_split