Watersheds Extraction From Satellite Images Using Attention U-Net

The accurate and efficient extraction of watersheds from satellite images enables the identification of watersheds and their boundaries, which can be used to assess the health of watersheds, monitor changes in their characteristics over time due to global warming, and identify potential sources of water pollution. While significant progress has been made in this area, further research is needed to address the challenges of accurately and efficiently extracting watersheds from satellite images in different environmental settings. This research aims to develop a more efficient method for watershed extraction. In this project, we have developed a balanced augmentation-based deep learning method to solve the problem.

The process of watershed extraction involves the identification of water channels, such as streams and rivers, and the delineation of the areas that contribute to the flow of water in these channels. The analysis of satellite imagery can provide valuable information about the distribution of water resources, land use patterns, and the impact of human activities on the environment. Traditionally, mapping watersheds has been labor-intensive and time-consuming, requiring extensive fieldwork and data collection. In recent years, there has been significant research on the use of satellite imagery for watershed extraction, which involves the identification and delineation of watershed boundaries using digital image processing techniques and deep learning.

Sentinel-2 satellite images of water bodies, published on Kaggle, serve as the dataset for this project. Each image comes with a black-and-white mask in which water is shown in white and everything else in black, and each of the two directories (images and masks) contains 2269 files. Because training time is a common bottleneck, we want to avoid unnecessary computation added by augmented samples. In this project, we implemented a deep learning algorithm for watershed extraction from satellite imagery, and we discuss the methods, techniques, and applications involved, highlighting the challenges and opportunities for future research in this field.

Balancing the dataset during the augmentation step is another issue to overcome. When augmentation is used to help the model detect minority classes, it is crucial to establish which samples need to be oversampled.

The accuracy, computational complexity, and appropriateness of the available algorithms vary for various kinds of satellite photos. Our study has therefore concentrated on enhancing the reliability and accuracy of watershed extraction and creating new techniques for river network analysis and monitoring.

Related Works

In recent years, several methods have been proposed to extract watersheds from high-resolution satellite images, including thresholding, texture-based segmentation, multi-scale watershed segmentation, and deep learning approaches. There has also been significant research on watershed monitoring and analysis using satellite images, such as using machine learning techniques to identify changes in watershed features over time. Yan et al. [3] proposed extracting river networks from imagery captured by unmanned aerial vehicles (UAVs). The method uses a UAV-mounted camera to capture high-resolution images of rivers and a deep learning model, trained on a large dataset of manually labeled river networks, to detect the river network.

River monitoring has been studied as well. Xu et al. [7] proposed a method for monitoring river morphology that extracts features such as river width and curvature from multi-temporal satellite images and uses machine learning to analyze the changes over time. Dhanachandra et al. [8] proposed a color histogram-based methodology to identify river pixels from satellite images. The method involves calculating the color histograms of the image, detecting the peaks corresponding to river pixels, and then refining the results using morphological degradation.

Other studies have investigated the use of radar images for river network extraction. Jia et al. [9] proposed a scattering mechanism-based feature extraction approach for river network extraction from polarimetric synthetic aperture radar (PolSAR) images. This method involves identifying the scattering mechanisms of the river pixels and using a minimum spanning tree algorithm to connect the river pixels. Rishikeshan and Ramesh [10] proposed a Mathematical Morphological (MM)-driven approach to extract water features from satellite data. This approach uses morphological operations to extract water bodies from the image and then detects the river pixels based on specific criteria.

Tao et al.[11] developed a method that uses a texture-based segmentation algorithm to extract river networks from high-resolution satellite images. The process starts with a texture analysis approach to divide the image into distinct areas, followed by a graph-based approach to connect the river pixels. In a similar approach, Li et al. [12] proposed a multi-scale watershed segmentation method for river network extraction. They also used a watershed segmentation algorithm to segment the image into distinct areas, then used a set of criteria to detect the river pixels and connect them to form the river network.

Several studies have explored the use of Attention U-Net for river network extraction. For example, Wu et al. [15] proposed a method that uses Attention U-Net to extract river networks from remote sensing images. The method involves dividing the image into blocks, applying Attention U-Net to each block, and then merging the results to obtain the final river network. The proposed method outperformed several existing approaches regarding accuracy and computational efficiency.

Similarly, Zhang et al. [16] used Attention U-Net for water body segmentation in satellite images. The proposed method involved dividing the image into patches, applying Attention U-Net to each patch, and then merging the results to obtain the final segmentation map. The authors reported that their method achieved higher accuracy and faster processing time compared to other state-of-the-art methods.

Another study by Guo et al. [17] explored the use of Attention U-Net for river extraction from UAV images. The proposed method divided the image into patches and applied Attention U-Net to each patch. The authors reported that their method achieved high accuracy and robustness in detecting river features from complex UAV images.
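The three studies above share a patch-based inference pattern: split the image, predict per patch, and merge the results. The following is a minimal NumPy sketch of that pattern, not the cited authors' code. The names `predict_in_patches` and `dummy_predictor`, the patch size of 256, and the edge padding are all assumptions made for illustration; a trained network would take the place of the placeholder predictor.

```python
import numpy as np

def predict_in_patches(image, predict_patch, patch=256):
    """Split an H x W x C image into non-overlapping square patches,
    run a per-patch predictor, and stitch the masks back together."""
    h, w = image.shape[:2]
    # Pad the bottom and right edges so both sides are multiples of `patch`.
    pad_h = (-h) % patch
    pad_w = (-w) % patch
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    out = np.zeros(padded.shape[:2], dtype=np.float32)
    for top in range(0, padded.shape[0], patch):
        for left in range(0, padded.shape[1], patch):
            tile = padded[top:top + patch, left:left + patch]
            out[top:top + patch, left:left + patch] = predict_patch(tile)
    # Crop the padding away so the mask matches the input size.
    return out[:h, :w]

def dummy_predictor(tile):
    """Hypothetical stand-in for a trained model: flags pixels whose
    blue channel exceeds the red channel."""
    return (tile[..., 2] > tile[..., 0]).astype(np.float32)
```

Non-overlapping tiles keep the sketch short. Overlapping tiles with blended borders are a common refinement when seams are visible in the merged mask.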

Attention U-Net’s ability to capture relevant features and localize objects of interest makes it a promising approach for monitoring and analyzing changes in watersheds and river networks over time.

Ling Chen et al. [5] discussed how variation in training sample labels affects results in supervised learning on labeled data. Amir et al. [6] discussed the challenges of data augmentation for unsupervised learning. Cody Coleman et al. [4] worked on data selection methods for active learning. Their idea was to run a simple, lightweight deep learning model before running the actual model, so that the overhead was far smaller than the convergence time. This amounts to a way of finding the important samples.


We have an image dataset of river bodies where each image has a corresponding mask image. In preparing the dataset, we noted that augmentation techniques have been applied to Sentinel-1 river images and chest X-ray data. We applied similar techniques to our water body image dataset to transform it and increase its size.

To improve the quality of our dataset and demonstrate the effectiveness of our approach, we eliminated several samples with corrupted or entirely black masks. This reduced the size of the dataset, which consists of images of varying sizes and resolutions, each with a corresponding mask image. We created two datasets to showcase our technique’s performance: ‘Regular Augmentation’ and ‘Weighted Augmentation.’ The former involves augmenting each image only once, while the latter uses a proposed technique that we will explain in more detail later.

Given that our data consists of images of water bodies, we found that flipping and rotation are effective augmentation techniques. To create unique augmented images using our proposed ‘Weighted Augmentation’ approach, we flipped and rotated each image by a distinct angle, generating new input images and expanding our dataset.
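As a rough illustration of the flip-and-rotate augmentation described above, the sketch below applies the same transform to an image and its mask so the two stay aligned. It is an assumption-laden example, not the project's code: rotations are restricted to multiples of 90 degrees to avoid interpolation, and the function names `augment_pair` and `distinct_variants` are hypothetical.

```python
import numpy as np

def augment_pair(image, mask, k_rot, flip):
    """Rotate an image and its mask by k_rot * 90 degrees and optionally
    flip both horizontally, keeping the pair pixel-aligned."""
    img_aug = np.rot90(image, k=k_rot, axes=(0, 1))
    msk_aug = np.rot90(mask, k=k_rot, axes=(0, 1))
    if flip:
        img_aug = np.flip(img_aug, axis=1)
        msk_aug = np.flip(msk_aug, axis=1)
    return img_aug, msk_aug

def distinct_variants(image, mask, n_copies):
    """Return up to 7 augmented pairs, each using a different transform.
    Four rotations times two flip states give 8 combinations; the
    identity (no rotation, no flip) is dropped."""
    variants = [(k, f) for f in (False, True) for k in range(4)][1:]
    if n_copies > len(variants):
        raise ValueError("only 7 distinct flip/rotation variants exist")
    return [augment_pair(image, mask, k, f) for k, f in variants[:n_copies]]
```

Giving each extra copy a different transform means an oversampled image never appears twice in identical form.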

To simplify the high-dimensional image data, we considered dimension reduction techniques such as PCA and t-SNE. After comparing both options, we chose t-SNE to reduce our input before running a k-means algorithm to cluster the dataset. This allowed us to determine which groups of water body images contain more images and which contain fewer. We conducted experiments to determine the optimal number of clusters, then augmented the smaller groups to match the largest group so that all groups had the same size.
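The balancing step can be sketched as follows, assuming cluster labels (for example from k-means) are already available as a list. This is one plausible reading of the procedure, not the exact implementation: the helper names `augmentation_plan` and `copies_per_sample` are hypothetical, and spreading the extra copies evenly within a cluster is an assumption.

```python
from collections import Counter

def augmentation_plan(cluster_labels):
    """For each cluster, return how many augmented samples must be added
    so that it reaches the size of the largest cluster."""
    counts = Counter(cluster_labels)
    target = max(counts.values())
    return {c: target - n for c, n in counts.items()}

def copies_per_sample(cluster_labels):
    """Spread each cluster's extra samples as evenly as possible over its
    members and return the number of augmented copies per sample."""
    counts = Counter(cluster_labels)
    plan = augmentation_plan(cluster_labels)
    seen = Counter()
    copies = []
    for c in cluster_labels:
        base, extra = divmod(plan[c], counts[c])
        # The first `extra` samples of a cluster get one additional copy.
        copies.append(base + (1 if seen[c] < extra else 0))
        seen[c] += 1
    return copies

labels = [0, 0, 0, 0, 1, 1, 2]
print(augmentation_plan(labels))   # {0: 0, 1: 2, 2: 3}
print(copies_per_sample(labels))   # [0, 0, 0, 0, 1, 1, 3]
```

In this toy example the largest cluster has four samples, so the two smaller clusters receive two and three augmented samples respectively, and every cluster ends with four.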


To address the difficulty of balancing samples, our strategy is to first classify the data using a naive deep learning model, the simplest and most lightweight available. Then, during data augmentation, we oversample the smaller classes to balance the training data.

To characterize open water and highlight its presence in remote sensing digital images, we used the Normalized Difference Water Index (NDWI) [1], a data analysis approach for evaluating water resources. To prepare the model for delineating river water-flow borders, we employed Attention U-Net, a variation of the U-Net architecture [2], a convolutional neural network (CNN) used for image segmentation. The Attention U-Net adds attention gates to the U-Net architecture to enhance segmentation performance.
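For reference, McFeeters' NDWI is computed per pixel as (Green − NIR) / (Green + NIR), with positive values typically indicating open water. The sketch below assumes the green and near-infrared bands are available as NumPy arrays of reflectance (bands B3 and B8 on Sentinel-2). The function names, the `eps` guard against division by zero, and the threshold of 0 are illustrative choices.

```python
import numpy as np

def ndwi(green, nir, eps=1e-6):
    """Normalized Difference Water Index: (Green - NIR) / (Green + NIR)."""
    green = green.astype(np.float32)
    nir = nir.astype(np.float32)
    # eps avoids division by zero where both bands are empty.
    return (green - nir) / (green + nir + eps)

def water_mask(green, nir, threshold=0.0):
    """Binary mask: True where NDWI exceeds the threshold (assumed 0)."""
    return ndwi(green, nir) > threshold
```

Water reflects green light and absorbs near-infrared, so water pixels tend toward positive NDWI, while vegetation and soil tend toward negative values.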

Fully Convolutional Network: U-Net

A Fully Convolutional Network (FCN) is a deep learning architecture for semantic segmentation in computer vision. It uses convolutional and transposed convolutional layers with skip connections to classify each pixel of an input image into predefined categories.

U-Net is a type of neural network architecture designed for image segmentation tasks. It is a convolutional neural network that is named after its U-shaped architecture, which consists of an encoder path and a decoder path.

The encoder path is responsible for extracting features from the input image, and it consists of a series of convolutional layers followed by max pooling layers. Each convolutional layer applies a set of filters to the input image to extract features, and the max pooling layers downsample the feature maps to reduce the spatial dimensions of the data.

The decoder path is responsible for upsampling the feature maps and generating the final segmentation mask. It consists of a series of up-convolutional layers (also known as transposed convolutional layers) that increase the spatial dimensions of the data, followed by a concatenation operation that merges the feature maps from the corresponding layer in the encoder path. This allows the decoder to combine low-level and high-level features from the input image to produce the final segmentation mask.

These concatenations are the skip connections between the encoder and decoder paths. They give the decoder direct access to the high-resolution features extracted by the encoder, which would otherwise be lost during downsampling and which can be used to refine the segmentation mask.
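To make the data flow concrete, here is a minimal NumPy sketch of one encoder downsampling step and the matching decoder step with its skip connection. It only tracks how feature maps are pooled, upsampled, and concatenated. Learned convolutions are omitted, and nearest-neighbour upsampling stands in for the transposed convolution, so this is an illustration of shapes rather than a working U-Net.

```python
import numpy as np

def max_pool_2x2(x):
    """Downsample a C x H x W feature map by taking the max of each 2x2 block."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample_2x(x):
    """Nearest-neighbour upsampling (stand-in for a transposed convolution)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def decoder_step(decoder_in, encoder_skip):
    """Upsample the decoder input and concatenate the encoder feature map
    of the same resolution along the channel axis (the skip connection)."""
    up = upsample_2x(decoder_in)
    return np.concatenate([encoder_skip, up], axis=0)

enc = np.random.default_rng(0).random((16, 64, 64))  # encoder features
bottleneck = max_pool_2x2(enc)                       # shape (16, 32, 32)
merged = decoder_step(bottleneck, enc)               # shape (32, 64, 64)
```

The channel count doubles after concatenation, which is why each decoder stage in a U-Net is followed by convolutions that reduce it again.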

Schematic of Additive Attention Gate

Attention Gates in U-Net

The Attention U-Net is a modification of the original U-Net architecture that integrates attention gates into its design. Attention gates are used to selectively filter out irrelevant features and highlight important ones, thereby improving the network’s performance in image segmentation tasks.

In the Attention U-Net [13] architecture, attention gates are placed on the skip connections between the encoder and decoder pathways of the U-Net. In the encoder pathway, convolutional layers and downsampling reduce the image features to a smaller spatial dimension. Before the encoder features at each level are passed to the decoder, an attention gate selectively filters out irrelevant information and highlights important features. Each gate takes the encoder feature maps together with a gating signal from the coarser decoder level, passes them through convolutional layers, and applies a sigmoid activation function to generate a weight map. This weight map is multiplied element-wise with the encoder feature maps to create the attention output.
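The additive attention gate of Oktay et al. [13] can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: the gating signal `g` is assumed to be already resampled to the spatial size of the encoder features `x`, the 1×1 convolutions are written as per-pixel linear maps, and the weights are random rather than learned. The names `conv1x1`, `init_params`, and `attention_gate` are hypothetical.

```python
import numpy as np

def conv1x1(x, weight, bias):
    """1x1 convolution on a C_in x H x W map (a per-pixel linear layer)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def init_params(c_x, c_g, c_int, seed=0):
    """Random small weights for illustration only (a real gate is trained)."""
    rng = np.random.default_rng(seed)
    return {
        "w_x": 0.1 * rng.normal(size=(c_int, c_x)), "b_x": np.zeros(c_int),
        "w_g": 0.1 * rng.normal(size=(c_int, c_g)), "b_g": np.zeros(c_int),
        "w_psi": 0.1 * rng.normal(size=(1, c_int)), "b_psi": np.zeros(1),
    }

def attention_gate(x, g, params):
    """Additive attention: alpha = sigmoid(psi(relu(W_x x + W_g g))).
    Returns the gated encoder features and the attention map alpha."""
    theta = conv1x1(x, params["w_x"], params["b_x"])
    phi = conv1x1(g, params["w_g"], params["b_g"])
    act = np.maximum(theta + phi, 0.0)                   # ReLU on the sum
    q = conv1x1(act, params["w_psi"], params["b_psi"])   # 1 x H x W
    alpha = 1.0 / (1.0 + np.exp(-q))                     # sigmoid, in [0, 1]
    return x * alpha, alpha                              # broadcast over channels
```

Because `alpha` has a single channel, the same spatial weight scales every channel of `x`, suppressing regions the gate considers irrelevant before the features reach the decoder.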

Architecture of Attention U-Net

In the decoder pathway, the attention output is concatenated with the decoder input, and a series of upsampling layers is used to produce the final segmentation map. The attention mechanism helps to address the limitations of the original U-Net architecture by allowing the network to focus on the most relevant features while filtering out irrelevant ones, resulting in more accurate segmentation.

Experiments & Results

The initial dataset contained 2269 samples before unwanted images were removed. With regular augmentation, the dataset size increased to 6757, and with weighted augmentation, it increased further to 8104. We trained with regular augmentation for 200 epochs, which amounts to 6757 × 200 = 1,351,400 sample passes. With weighted augmentation, we trained for only 150 epochs, or 8104 × 150 = 1,215,600 sample passes. That is about 135k fewer than with regular augmentation, equivalent to roughly 20 epochs on the regularly augmented dataset. In terms of samples processed, weighted augmentation therefore saved about 20 regular-augmentation epochs of training.
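The arithmetic behind these figures can be checked with a few lines of Python. The dataset sizes and epoch counts come from the text above, and the variable names are illustrative.

```python
regular_size, regular_epochs = 6757, 200
weighted_size, weighted_epochs = 8104, 150

# Total samples processed over the whole training run.
regular_seen = regular_size * regular_epochs      # 1,351,400
weighted_seen = weighted_size * weighted_epochs   # 1,215,600

# Samples saved by weighted augmentation, expressed in regular epochs.
saved = regular_seen - weighted_seen              # 135,800
saved_epochs = saved / regular_size               # about 20.1

print(regular_seen, weighted_seen, saved, round(saved_epochs, 1))
```

Weighted augmentation produces a larger dataset per epoch, so the saving comes entirely from needing fewer epochs.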

The Attention U-Net and the autoencoder share a basic encoder-decoder structure. So, before moving to the full model, we first implemented our approach with an autoencoder [14].

Performance on Autoencoder

We created an encoder and a decoder with four convolutional layers and four transposed convolutional layers, respectively. The encoder takes RGB images of size 256×256 pixels as input and produces a feature map of size 128×8×8 as output. The decoder then takes the feature map from the encoder and generates grayscale images of 256×256 pixels as output. We trained our model on a dataset of masked images to reconstruct the original images from their masked versions. To achieve this, we used the mean squared error (MSE) loss function and the Adam optimizer. We evaluated the model’s accuracy on the normally augmented dataset and on the advanced augmented dataset, training for 200 epochs with a learning rate of 0.001. The results are shown below:

Mean Square Error Loss & Accuracy on various Dataset
Performance of Auto-Encoder on Test Data

Performance on Attention U-Net

We have implemented an attention U-Net model utilizing PyTorch, which is composed of encoder and decoder blocks that are interconnected by attention gates. The encoder blocks are responsible for downsampling the input image, while the decoder blocks perform upsampling on the output obtained from the encoder blocks.

Our model was run on Kaggle using a P100 GPU, and as anticipated, the Attention U-Net implementation outperformed the autoencoder model.

Performance of Attention U-Net on Train Test data

For training, we used the Adam optimizer with cross-entropy as the loss function and MeanIoU as the evaluation metric. Training was set to run for up to 200 epochs with callbacks. We integrated the ‘EarlyStopping’ callback, which halts the training process if validation loss does not improve for three consecutive epochs, and set its ‘restore_best_weights’ parameter, which ensures that the model’s weights are restored to the best version seen during training.
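The metric and the stopping rule described above can be illustrated without any framework. The sketch below is a plain NumPy and Python approximation, not the callbacks used in the project. `mean_iou` averages the per-class intersection-over-union for integer label masks, and `early_stopping_epoch` reports where a patience-of-three rule would stop and which epoch's weights would be restored. Both function names are hypothetical.

```python
import numpy as np

def mean_iou(pred, target, num_classes=2):
    """Average intersection-over-union across classes for label masks."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both masks, so it is skipped
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a rule that stops once the
    validation loss has failed to improve for `patience` epochs in a row."""
    best, best_epoch, wait = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(round(mean_iou(pred, target), 4))                            # 0.5833
print(early_stopping_epoch([0.9, 0.7, 0.8, 0.75, 0.71, 0.6]))      # (4, 1)
```

In the second example the loss improves again at epoch 5, but training has already stopped at epoch 4. This is the trade-off of a short patience window.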

To save the best model version during training, we added the ‘ModelCheckpoint’ callback, writing to a file named “AttentionUNet.h5”. The ‘save_best_only’ parameter ensures that only the best version of the model seen so far is saved. The model’s performance is shown in the graph below:

Attention U-Net on Test data

To summarize, our research developed an automated system for extracting river networks and watersheds from satellite images, and the advanced augmented dataset yielded better results than the normal augmented dataset. Our model’s predictions were comparable, and in some cases superior, to the provided mask images.

Using satellite imagery for mapping natural resources such as forests and water bodies is increasingly significant, and monitoring them regularly is crucial for their sustainable management to prevent exploitation. Water bodies, in particular, have a significant role in the global carbon cycle and climate variations, and mapping them in the spatiotemporal domain is critical to assessing their degradation and disappearance.

Our research can serve as a basis for future studies on water resource management, which can aid in impact assessment and in implementing conservation measures. Furthermore, the generic methodology we have proposed has potential applications in medical imaging, for example in segmenting blood vessels or structures of the nervous system such as dendrites and axons.

Future Works

Our future work entails exploring fine-tuning a pre-trained model, such as ResNet or VGG, to potentially enhance the model’s performance and accelerate the training process.

We also aim to integrate multiple scales of features into the attention U-Net model, which can be achieved through the concatenation of different layers in the encoder block or the use of a pyramid pooling module. Furthermore, time permitting, we will develop an interactive user interface to refine segmentation results through user feedback and manual masking, enhancing the tool’s usability and accessibility for non-experts.


[1] McFeeters, S. K. (1996). The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. International Journal of Remote Sensing, 17, 1425-1432.

[2] Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 234-241).

[3] Yan, H., Ma, Y., & Liu, W. (2019). River network extraction from unmanned aerial vehicle imagery using deep learning. Remote Sensing, 11(6), 652.

[4] Coleman, C., et al. (2019). Selection via proxy: Efficient data selection for deep learning. CoRR, 15(7), 5249-5262.

[5] Ling Chen, et al. (2019). Enhancement of DNN-based multilabel classification by grouping labels based on data imbalance and label correlation.

[6] Amir Namavar Jahromi, et al. (2019). A deep unsupervised representation learning approach for effective cyber-physical attack detection and identification on highly imbalanced data.

[7] Xu, C., Yao, Y., & Yang, X. (2020). Monitoring river morphology with multi-temporal satellite images using feature extraction and machine learning methods. ISPRS International Journal of Geo-Information, 9(8), 458.

[8] Dhanachandra, N., Singh, T. S., & Singh, Y. (2016). River network extraction from satellite images: a review. International Journal of Remote Sensing, 37(23), 5661-5682.

[9] Jia, X., Zhang, L., Liu, Y., & Sun, X. (2018). River network extraction using a scattering mechanism-based feature extraction approach and a minimum spanning tree algorithm for polarimetric SAR imagery. Remote Sensing, 10(10), 1610.

[10] Rishikeshan, K., & Ramesh, H. (2018). An MM-driven approach for water feature extraction in satellite data. International Journal of Remote Sensing, 39(11), 3544-3564.

[11] Tao, J., Lin, H., Tang, L., Zhang, X., & Lu, S. (2018). River network extraction from high-resolution remote sensing images using a texture-based segmentation algorithm. Remote Sensing, 10(7), 1064.

[12] Li, D., Guo, H., Wang, Y., & Xu, H. (2019). A multi-scale watershed segmentation method for river network extraction from high-resolution remote sensing images. Remote Sensing, 11(7), 784.

[13] Oktay, O., Schlemper, J., Folgoc, L. L., Lee, M., Heinrich, M. P., Misawa, K., ... & Rueckert, D. (2018). Attention U-Net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999.

[14] Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.

[15] Wu, J., Liu, H., Mao, H., Huang, Y., & Chen, Y. (2019). River network extraction from remote sensing images using attention U-Net. Remote Sensing, 11(23), 2812.

[16] Zhang, J., Wang, Z., Huang, H., & Liu, Z. (2020). Attention U-Net for water body segmentation in satellite images. ISPRS Journal of Photogrammetry and Remote Sensing, 162, 229-243.

[17] Guo, C., Wang, H., Guo, B., & Zou, W. (2021). A novel attention U-Net method for river extraction from UAV images. Water, 13(3), 413.

About Diwas

🚀 I'm Diwas Pandey, a Computer Engineer with an unyielding passion for Artificial Intelligence, currently pursuing a Master's in Computer Science at Washington State University, USA. As a dedicated blogger at AIHUBPROJECTS.COM, I share insights into the cutting-edge developments in AI, and as a Freelancer, I leverage my technical expertise to craft innovative solutions. Join me in bridging the gap between technology and healthcare as we shape a brighter future together! 🌍🤖🔬
