Fundus image segmentation is a crucial task in ophthalmology, requiring precise identification of various structures within the eye.
A comprehensive approach to fundus image segmentation involves the use of boundary and entropy-driven adversarial learning. This approach leverages the strengths of both boundary-based and entropy-based methods to achieve improved segmentation results.
Boundary-based methods focus on identifying the edges or boundaries of different structures, while entropy-based methods analyze the uncertainty or randomness of the image data. By combining these two approaches, researchers have been able to develop more accurate and robust segmentation models.
The proposed method uses a U-Net architecture, which is a type of neural network that is well-suited for image segmentation tasks. The U-Net consists of an encoder and a decoder, with skip connections between the two that allow the network to capture both local and global features of the image data.
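To make the idea concrete, here is a minimal PyTorch sketch of an encoder-decoder with skip connections, a toy two-level U-Net rather than the exact network used in the work discussed below:

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions, each followed by batch norm and ReLU
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=3, num_classes=2):
        super().__init__()
        self.enc1 = double_conv(in_ch, 32)
        self.enc2 = double_conv(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec1 = double_conv(64, 32)   # 64 = 32 (skip) + 32 (upsampled)
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, x):
        s1 = self.enc1(x)                        # local features at full resolution
        s2 = self.enc2(self.pool(s1))            # coarser, more global features
        up = self.up(s2)                         # decoder: upsample back
        out = self.dec1(torch.cat([up, s1], 1))  # skip connection: concat encoder features
        return self.head(out)                    # per-pixel class logits (e.g. OD and OC)
```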
Methodology
Our methodology involves a boundary and entropy-driven adversarial learning framework, which is a key technical contribution of our method. This framework enables accurate and confident predictions on the target domain.
The proposed framework comprises two self-ensembling models, each taking a concatenated input consisting of predicted masks, Shannon entropy maps, and signed distance maps. Gaussian noise is added to the weights of the whole network to improve its domain-adaptive performance.
Our framework is designed to segment OD and OC in fundus images from different domains. The architecture of our framework is shown in Figure 2.
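As an illustration, here is a minimal sketch of how such a concatenated input could be assembled; the sign convention of the distance map and the 0.5 binarization threshold are our assumptions, not necessarily the exact construction used in EDSS:

```python
import numpy as np
import torch
from scipy.ndimage import distance_transform_edt

def signed_distance_map(mask):
    # mask: (H, W) binary numpy array; positive outside, negative inside (our sign convention)
    inside = distance_transform_edt(mask)
    outside = distance_transform_edt(1 - mask)
    return outside - inside

def build_concat_input(probs):
    # probs: (C, H, W) tensor of per-class probabilities (e.g. OD and OC maps)
    eps = 1e-8
    entropy = -(probs * torch.log(probs + eps)).sum(dim=0, keepdim=True)  # Shannon entropy map
    masks = (probs > 0.5).float()                                         # predicted binary masks
    sdm = torch.stack([
        torch.from_numpy(signed_distance_map(m.numpy())).float() for m in masks
    ])                                                                    # per-class signed distance maps
    return torch.cat([masks, entropy, sdm], dim=0)  # (2C+1, H, W) concatenated input
```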
Adversarial Learning
Adversarial learning is used here to improve fundus image segmentation by aligning the segmentation network's predictions on the target domain with the distribution of its predictions on the source domain. It bridges the gap between the source and target domains, making the network more robust and accurate.
To achieve this, the boundary-driven adversarial learning model is used, which enforces the predicted boundary structure in the target domain to be similar to that in the source domain. This is done by introducing a boundary prediction branch that regresses the boundary and a mask prediction branch for the OD and OC segmentation.
The boundary discriminator D_b is trained with a binary cross-entropy loss to align the distributions of the boundary predictions, where N and M denote the numbers of source- and target-domain images used to compute this loss.
The entropy-driven adversarial learning model is then used to further align the entropy maps of the target domain predictions with those of the source domain. This is done by constructing an entropy discriminator network D_e to align the distributions of entropy maps E(x_s) and E(x_t).
The entropy discriminator D_e is trained with a binary cross-entropy loss to distinguish whether an entropy map comes from the source or the target domain. The segmentation network is then optimized to fool the discriminator via an adversarial loss, which encourages it to produce prediction entropy on target-domain images similar to that on source-domain images.
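A minimal sketch of this two-step adversarial update, assuming label 1 for source-domain maps and 0 for target-domain maps (label conventions vary across implementations):

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def discriminator_step(d, d_opt, src_maps, tgt_maps):
    # Train D to tell source maps (label 1) from target maps (label 0)
    d_opt.zero_grad()
    src_logits = d(src_maps.detach())
    tgt_logits = d(tgt_maps.detach())
    loss = bce(src_logits, torch.ones_like(src_logits)) + \
           bce(tgt_logits, torch.zeros_like(tgt_logits))
    loss.backward()
    d_opt.step()
    return loss.item()

def adversarial_loss(d, tgt_maps):
    # Train the segmenter to fool D: push target maps toward the "source" label
    tgt_logits = d(tgt_maps)
    return bce(tgt_logits, torch.ones_like(tgt_logits))
```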
By using both boundary-driven and entropy-driven adversarial learning, the predictions on the target domain are improved, making the segmentation network more accurate and robust. This is especially useful in medical imaging applications, where accurate segmentation is crucial for diagnosis and treatment.
Experiments and Results
In the experiments, several methods were compared on the BASE1, BASE2, and BASE3 test sets, and the strongest methods achieved high Dice scores.
The fully supervised method achieved a mean Dice score of 0.9247 ± 0.0020 on the BASE1 test set, suggesting that with sufficient labeled training data, deep learning models can reach excellent performance.
EDSS achieved a mean Dice score of 0.9183 ± 0.0013 on the BASE1 test set, very close to the fully supervised result, indicating that EDSS is a strong competitor.
A fuller summary of the top-performing methods on the BASE1 test set appears in Table 6 below.
The comparison results on the BASE2 test set are shown in Fig. 5.
Implementation Details
We implemented the EDSS using PyTorch 1.4 on an NVIDIA GeForce GTX 1080. The initial learning rate was set to 1×10^(-4) and the weight decay parameter to 5×10^(-4).
The Adam optimizer was used to optimize the parameters. Data augmentation techniques were employed, including random flips, random brightness-contrast changes, additive Gaussian noise, transposition, and hue-saturation-value shifts; a sketch of one possible pipeline follows.
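One possible realization of this setup, assuming Albumentations for the augmentations; the probabilities and the placeholder model are illustrative, not the paper's exact settings:

```python
import torch
import albumentations as A

# Stand-in for the segmentation network; the real model is defined elsewhere
model = torch.nn.Conv2d(3, 2, kernel_size=3, padding=1)

# Adam with the stated hyperparameters
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=5e-4)

# One way to realize the listed augmentations with Albumentations
train_aug = A.Compose([
    A.Flip(p=0.5),                      # random horizontal/vertical flip
    A.Transpose(p=0.5),                 # transpose rows and columns
    A.RandomBrightnessContrast(p=0.5),  # random brightness/contrast
    A.GaussNoise(p=0.3),                # additive Gaussian noise
    A.HueSaturationValue(p=0.3),        # hue/saturation/value shifts
])
```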
The ROIs were resized to 400×400 to match EDSS's receptive field. The Dice coefficient and the absolute error of the vertical optic cup-to-disc ratio (δCDR) were used as evaluation metrics; both are sketched below.
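A short sketch of these two metrics, assuming binary masks and measuring the vertical cup-to-disc ratio from the vertical extents of the OC and OD masks:

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    # pred, gt: binary (H, W) arrays
    inter = (pred * gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def vertical_diameter(mask):
    # Vertical extent: number of rows containing foreground pixels
    rows = np.where(mask.any(axis=1))[0]
    return (rows.max() - rows.min() + 1) if len(rows) else 0

def delta_cdr(pred_od, pred_oc, gt_od, gt_oc):
    # Absolute error of the vertical cup-to-disc ratio (vCDR)
    pred_cdr = vertical_diameter(pred_oc) / max(vertical_diameter(pred_od), 1)
    gt_cdr = vertical_diameter(gt_oc) / max(vertical_diameter(gt_od), 1)
    return abs(pred_cdr - gt_cdr)
```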
Network Architecture and Training
We use an adapted DeepLabv3+ as the segmentation backbone of our BEAL framework, replacing Xception with a lightweight MobileNetV2 to reduce the number of parameters and accelerate computation.
The boundary branch consists of three convolutional layers with output channels of {256, 256, 1}, each followed by ReLU and batch normalization, except the last, which uses a Sigmoid activation. The mask branch has one convolutional layer that takes the concatenation of the boundary predictions and the shared features as input.
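A sketch of how the two branches might look in PyTorch, following the layer description above; the channel count of the shared features is an assumption:

```python
import torch
import torch.nn as nn

feat_ch = 256  # assumed channel count of the shared decoder features

# Boundary branch: three conv layers with output channels {256, 256, 1};
# the first two are followed by ReLU and batch norm (as stated above),
# the last by a Sigmoid activation
boundary_branch = nn.Sequential(
    nn.Conv2d(feat_ch, 256, 3, padding=1), nn.ReLU(inplace=True), nn.BatchNorm2d(256),
    nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True), nn.BatchNorm2d(256),
    nn.Conv2d(256, 1, 3, padding=1), nn.Sigmoid(),
)

# Mask branch: a single conv layer over [boundary prediction, shared features];
# two output channels give the OD and OC maps
mask_branch = nn.Conv2d(feat_ch + 1, 2, kernel_size=3, padding=1)

def forward_heads(shared_feats):
    boundary = boundary_branch(shared_feats)
    mask_logits = mask_branch(torch.cat([boundary, shared_feats], dim=1))
    return boundary, mask_logits
```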
We optimize the segmentation network and the discriminators alternately, minimizing the objective functions in Eqs. (1) and (4) for the boundary and entropy discriminators, respectively. To optimize the segmentation network, we calculate the mask prediction loss L_m and the boundary regression loss L_b on the source domain images, and the adversarial losses L_adv^b and L_adv^e on the target domain images.
The overall objective of the segmentation network is L = L_m + L_b + λ · (L_adv^b + L_adv^e), where λ is a balance coefficient, and y^m and y^b denote the ground-truth mask and boundary used in L_m and L_b, respectively. We formulate mask prediction as multi-label learning and generate the probability maps of OD and OC simultaneously.
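The overall objective in code form, as a minimal sketch; the choices of binary cross-entropy for the mask loss, mean squared error for the boundary regression, and the value of λ are illustrative assumptions:

```python
import torch.nn.functional as F

lam = 0.01  # balance coefficient λ; the value here is illustrative

def segmentation_objective(mask_logits, y_m, boundary_pred, y_b, l_adv_b, l_adv_e):
    # Supervised losses on source images (loss choices here are assumptions):
    # multi-label BCE for the OD/OC masks, MSE for boundary regression
    l_m = F.binary_cross_entropy_with_logits(mask_logits, y_m)
    l_b = F.mse_loss(boundary_pred, y_b)
    # Adversarial losses l_adv_b / l_adv_e are computed on target images
    return l_m + l_b + lam * (l_adv_b + l_adv_e)
```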
Here's a summary of the network architecture:
- Backbone: DeepLabv3+ with a MobileNetV2 encoder (replacing Xception)
- Boundary branch: three convolutional layers with output channels {256, 256, 1}, Sigmoid on the last
- Mask branch: one convolutional layer over the concatenated boundary predictions and shared features
The entropy map E can be calculated by the Shannon entropy formula E = −∑_{i=1}^{N} p_i · log(p_i), with E ∈ [0,1]^{H×W}, where N is the number of classes per pixel, p_i is the predicted probability of class i, and H and W are the height and width of the entropy map, respectively.
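In code, the entropy map can be computed from the per-pixel probability maps as follows; the division by log(N) is an assumption added so that E falls in [0, 1], as the text states:

```python
import torch

def entropy_map(probs, eps=1e-8):
    # probs: (N, H, W) per-pixel class probability maps, N >= 2
    e = -(probs * torch.log(probs + eps)).sum(dim=0)
    # Normalizing by log(N) is our assumption, added so that E ∈ [0, 1]^{H×W}
    return e / torch.log(torch.tensor(float(probs.shape[0])))
```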
Table 1. RIGA+ Dataset Details
The RIGA+ dataset is a central part of our experimental setup, so its composition is worth spelling out.
The RIGA+ dataset includes four domains: Source, Target 1, Target 2, and Target 3. Each domain has its own dataset names, numbers, and image sizes.
Here are the specifics of each domain in the RIGA+ dataset:
The numbers in the Numbers column represent the number of images in each dataset, with the first number being the training set and the second number being the test set.
Table 2. REFUGE Dataset Details
The REFUGE dataset is a valuable resource for researchers working on fundus image segmentation. It is divided into two main categories: Source and Target.
The Source category includes the REFUGE (Training) dataset, which consists of 400 images. Each image has a size of 800 × 800 pixels.
The Target category, on the other hand, includes the REFUGE (Validation + Test) dataset, which comprises 800 images. Like the Source images, these images are also 800 × 800 pixels in size.
Here's a summary of the REFUGE dataset details in table format:

| Split | Dataset | Images | Image size |
| --- | --- | --- | --- |
| Source | REFUGE (Training) | 400 | 800 × 800 |
| Target | REFUGE (Validation + Test) | 800 | 800 × 800 |
Table 6. Segmentation Results
Table 6 shows the performance of different methods on the BASE1, BASE2, and BASE3 test sets. The results are reported as mean ± standard deviation.
The UNet model achieves a Dice score of 0.8451 ± 0.0166 on the BASE1 test set, which is a relatively high score. In comparison, the Fully Supervised Method achieves a Dice score of 0.9247 ± 0.0020 on the BASE1 test set, which is significantly higher.
The CyCADA method achieves a Dice score of 0.8839 ± 0.0104 on the BASE1 test set, while the BEAL method achieves a Dice score of 0.8860 ± 0.0081. The pOSAL method achieves a Dice score of 0.8931 ± 0.0070, which is slightly higher than the CyCADA method.
Here are the top three methods on the BASE1 test set, ranked by Dice score:
- EDSS: 0.9634 ± 0.0050
- DoCR: 0.9616 ± 0.0025
- Zhou et al.: 0.9555 ± 0.0017
The Fully Supervised Method achieves a Dice score of 0.8803 ± 0.0043 on the BASE2 test set, which is a high score. The EDSS method achieves a Dice score of 0.8732 ± 0.0073 on the BASE2 test set, which is slightly lower.
Analysis and Comparison
The proposed EDSS demonstrates a marked improvement in OD segmentation over the baseline: the baseline attains a DiceOD of 0.8818, while the proposed EDSS reaches 0.9658, an absolute improvement of 0.0840.
Table 9 presents the experimental results on the BASE2 test set of RIGA+, showing the effectiveness of each proposed component of EDSS. Notably, DiceMean improves from 0.8989 to 0.9184 when the proposed domain adaptation strategy is introduced into SSE.
The introduction of G-EMA into the network to update weights of the self-ensembling in SSE leads to an increase in almost all evaluation metrics, indicating that more domain-invariant features are captured. Specifically, DiceOC is increased by 2.21% for the BASE2 dataset compared to the SSE without G-EMA.
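Under the assumption that G-EMA denotes an exponential-moving-average weight update perturbed by Gaussian noise (consistent with the noise-on-weights description earlier), a minimal sketch looks like this:

```python
import torch

@torch.no_grad()
def g_ema_update(teacher, student, alpha=0.99, sigma=1e-3):
    # EMA of the student weights, followed by a Gaussian perturbation
    # of the teacher weights (our reading of "G-EMA"; alpha and sigma
    # are illustrative values)
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1 - alpha)  # standard EMA step
        t_p.add_(torch.randn_like(t_p) * sigma)     # add Gaussian noise
```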
REFUGE Dataset Analysis
The REFUGE dataset also plays an important role in our analysis and comparison. As summarized in Table 2, its 400 training images serve as the source domain and its 800 validation and test images serve as the target domain, all at a resolution of 800 × 800 pixels.
Because the training images were acquired with a different fundus camera than the validation and test images, the two splits exhibit a natural domain shift, which makes REFUGE a suitable benchmark for evaluating domain-adaptive OD and OC segmentation.
Ablation Study Performance Comparison
In an ablation study, researchers evaluate the performance of each component within a model to understand its impact on the overall results. The study in question compared the performance of different models on the BASE2 test set of RIGA+.
Self-ensembling outperformed the baseline UNet model, with a DiceMean of 0.8912 compared to the baseline's 0.8352. This suggests that self-ensembling is more robust to domain shift problems in medical images.
The proposed base model SSE, built on U-Net, achieved a DiceMean of 0.8989, an improvement of 0.0077 over the original self-ensembling model.
The addition of the domain adaptation strategy to SSE resulted in a significant improvement in DiceMean, from 0.8989 to 0.9184. This suggests that the domain adaptation strategy plays a crucial role in improving the performance of the model.
The results of the ablation study are presented in Table 9, which shows the performance of different models on the BASE2 test set of RIGA+.
Sources
- https://github.com/EmmaW8/BEAL (github.com)
- https://doi.org/10.1007/978-3-030-32239-7_12 (doi.org)
- https://doi.org/10.1109/ICCV.2017.244 (doi.org)
- https://doi.org/10.1109/TMI.2019.2899910 (doi.org)
- https://doi.org/10.1109/CVPR.2019.00262 (doi.org)
- https://doi.org/10.1109/CVPR.2018.00780 (doi.org)
- https://doi.org/10.1007/978-3-319-59050-9_47 (doi.org)
- https://doi.org/10.1109/ISBI.2018.8363637 (doi.org)
- https://doi.org/10.24963/ijcai.2018/96 (doi.org)
- https://doi.org/10.1007/978-3-030-01234-2_49 (doi.org)
- https://doi.org/10.1007/978-3-030-00919-9_17 (doi.org)
- https://doi.org/10.1007/978-3-031-45673-2_30 (doi.org)
- https://doi.org/10.1186/s12886-024-03376-y (doi.org)
- https://doi.org/10.1016/j.bspc.2024.106200 (doi.org)
- https://doi.org/10.1007/978-3-031-16443-9_21 (doi.org)
- https://doi.org/10.1007/978-3-031-16434-7_58 (doi.org)
- https://doi.org/10.1007/978-3-031-16434-7_59 (doi.org)
- https://doi.org/10.1007/978-3-031-16449-1_62 (doi.org)
- https://doi.org/10.3390/s22228748 (doi.org)
- https://doi.org/10.1186/s12859-022-05058-2 (doi.org)
- Entropy and distance-guided super self-ensembling for ... (nih.gov)
- arXiv:2207.03684v1 [cs.CV] 8 Jul 2022 (arxiv.org)