Get Our e-AlertsSubmit Manuscript
BME Frontiers / 2022 / Article

Research Article | Open Access

Volume 2022 |Article ID 9837076 |

Alex Ling Yu Hung, Edward Chen, John Galeotti, "Weakly- and Semisupervised Probabilistic Segmentation and Quantification of Reverberation Artifacts", BME Frontiers, vol. 2022, Article ID 9837076, 15 pages, 2022.

Weakly- and Semisupervised Probabilistic Segmentation and Quantification of Reverberation Artifacts

Received27 Jul 2021
Accepted07 Feb 2022
Published01 Mar 2022


Objective and Impact Statement. We propose a weakly- and semisupervised, probabilistic needle-and-reverberation-artifact segmentation algorithm to separate the desired tissue-based pixel values from the superimposed artifacts. Our method models the intensity decay of artifact intensities and is designed to minimize the human labeling error. Introduction. Ultrasound image quality has continually been improving. However, when needles or other metallic objects are operating inside the tissue, the resulting reverberation artifacts can severely corrupt the surrounding image quality. Such effects are challenging for existing computer vision algorithms for medical image analysis. Needle reverberation artifacts can be hard to identify at times and affect various pixel values to different degrees. The boundaries of such artifacts are ambiguous, leading to disagreement among human experts labeling the artifacts. Methods. Our learning-based framework consists of three parts. The first part is a probabilistic segmentation network to generate the soft labels based on the human labels. These soft labels are input into the second part which is the transform function, where the training labels for the third part are generated. The third part outputs the final masks which quantifies the reverberation artifacts. Results. We demonstrate the applicability of the approach and compare it against other segmentation algorithms. Our method is capable of both differentiating between the reverberations from artifact-free patches and modeling the intensity fall-off in the artifacts. Conclusion. Our method matches state-of-the-art artifact segmentation performance and sets a new standard in estimating the per-pixel contributions of artifact vs underlying anatomy, especially in the immediately adjacent regions between reverberation lines. Our algorithm is also able to improve the performance of downstream image analysis algorithms.

1. Introduction

Ultrasound imaging is low-cost and safe. In addition, its real-time operation is perfect for monitoring needle insertions and other clinical interventions. However, highly reflective parallel surfaces, such as needle walls, can create significant reverberation artifacts because the sound wave reverberates between the posterior and anterior surfaces of the object [1]. When the amount of reflected energy is significant, it manifests itself as an additional echo from the same surface. The reverberation artifacts are relatively bright, looking like actual boundaries which sometimes would overlap with tissue present in the image. Such artifacts not only can be caused by needles and other metallic objects but they also might be the result of certain anatomical structures with large acoustic impedance [2]. This kind of artifact can cloud clinicians’ judgement [3] and confuse medical image analysis algorithms [4, 5]. For some pixels, it can be difficult to differentiate whether the pixel is an artifact, or to assign a percentage to the pixel indicating how much of the pixel’s value is artifact or actual tissue measurement. The brightness of artifacts somewhat predictably falls off as they get further away from the reflective object, but the artifacts have uncertain boundaries and differing intensity distributions. Consequently, pixel-wise labeling is challenging and time consuming for annotators, who may have considerable differences in their annotations. An example is shown in Figure 1(a), where different annotators agree on the general location of the reverberation artifacts but they disagree on the details. Besides, it can also be seen that it is extremely hard for annotators to differentiate between reverberations when it gets further away from the object casting the artifacts, leading to more difference in annotations. Semisupervised approaches can reduce human labeling time, and weak supervision can estimate how much of the pixel is corrupted by the reverberation artifact.

In this paper, we propose a novel weakly- and semisupervised probabilistic needle and reverberation artifact segmentation algorithm and show that our algorithm is useful in improving the performance of downstream tasks. The algorithm can be broken into three parts. The first part is a probabilistic segmentation network trained with hard binary and not-so-accurate labels. The probabilistic segmentation is used to cope with the labeling errors and ambiguity in the input images. The second part is a transform function that turns the output of the first network into soft-labeled images where the value of each pixel represents how much of the pixel value is affected by the artifact. The last part is another network that is trained with the newly generated labels as desired output. The main contributions of this paper are (1) We developed a weakly- and semisupervised segmentation framework to segment the needle and reverberation artifacts which could cope well with the insufficiency in labeled data. (2) Our probabilistic approach is able to deal with ambiguity in the artifact boundaries and variations in human labels. (3) Our approach estimates how much of the pixel values are corrupted by reverberation artifacts.

There already exists a long history of research algorithms to identify and remove reverberation artifacts from medical images. [6] produce a near optimal estimate of the reflectivity value due to reverberation by soft thresholding the 2D discrete wavelet transform. [7] improve the method proposed by [6] by utilizing radiofrequency (RF) data and soft thresholds on the wavelet coefficients to estimate the reflectivity values caused by the artifact. [8] model the reverberation mathematically, identifying and removing the artifact in RF data. Temporal information is used by [9] when they develop the 3D filter to reduce reverberation artifacts. These methods either made use of the RF data or temporal information, i.e., video sequence, but most of the times, we will not have access to RF data or a large number of ultrasound videos.

Many deep learning segmentation frameworks for medical images can be good at segmenting reverberation artifacts too, but to the best of our knowledge, there is no learning-based segmentation framework specifically for the purpose of reverberation artifacts in ultrasound images. U-Net [10] use an encoder-decoder framework with skip connections between the encoder and the decoder, enabling the network to tend to more fine-grained details. [11] put Long Short-Term Memory layers (LSTM) and U-Net together, enabling the proposed USVS-Net to excel at identifying ambiguous boundaries and be robust against speckle noises. Attention U-Net is proposed by [12] to suppress irrelevant regions and has the network focus more on the target structure with different shapes and sizes. [13] recalibrate the channels by putting Squeeze and Excitation (SE) blocks [14] into a normal U-Net to increase the values of meaningful features and suppress the values of the insignificant ones. Together with dual attentive fusion blocks, adversarial learning is used in [15], allowing semisupervision of the segmentation, instead of full supervision, so that it is able to ameliorate the problem of not having enough labeled data. These methods are robust in a fully supervised situation and are good at segmenting structures with hard labels, but they are not able to generate segmentation masks that have a meaningful probabilistic output, i.e. they are unable to model the ambiguity in the data.

[16] explicitly model the aleatoric and epistemic uncertainty in deep learning and increased the performance on noisy data sets with dropout layers and an additional variance term in the output. [17] adopt Monte Carlo dropout to explore the uncertainty in the network for medical image segmentation. The work further explains how to capture different types of uncertainty in medical images. [18] combined a conditional variational autoencoder with a U-Net. The model does not just provide a single segmentation for a given input; instead, it predicts multiple plausible segmentations drawn from the distribution learned from the training data. However, this framework only works well on images with a single object or with global variations. As a result, a hierarchical probabilistic U-Net (HPU-Net) [19] is proposed to solve such problems. Instead of global latent variables, coarse-to-fine hierarchical latent variable maps are used in this work. Besides U-Net like blocks in the decoder side, some blocks are also sampled from the prior which is trained to have similar distribution with the posterior. [20] model the conditional probability distribution of the segmentation given an input image with a network named PHiSeg. Like the HPU-Net, PHiSeg also uses a posterior network and a prior network. However, unlike the HPU-Net, the likelihood network in this work completely samples from the prior network

In terms of weakly-supervised learning methods, [21, 22] investigated the weakly supervised object detection problem by leveraging prior knowledge and context information. [23] discussed the techniques to train a deep saliency network without annotations with synthesized supervision. [24] phrased the training objective as a biconvex optimization for linear models which could be interpreted as a novel loss function with arbitrary linear constraints on the structured output space of pixel labels. [25] improved the performance of weakly-supervised semantic segmentation on natural images by a self-supervised difference detection module that estimates noise from the results. In the medical domain, [26] trained a convolutional neural network (CNN) with discrete constraints and regularization priors to approach this problem. [27] proposed a weakly-supervised framework to segment the shadows in ultrasound images. Essentially, the work uses one network for the initial segmentation, feeds the output of this network into a transform function, and then gets a new set of labels to train another network. The transform function in this paper transforms the hard labels to soft labels and, at the same time, reduces the effect of labeling error.

3. Method

3.1. Overview

Due to the difficulty in defining the actual boundaries of needles, our method is designed to deal with over-labeling from annotators. Over-labeling is a labeling scheme that includes all the pixels of interest in the label, but it would falsely include background pixels in the label as well. The goal of over-labeling is to have fewer false negatives, but the tradeoff is that it also includes more false positives. Over-labeling is much faster than the traditional accurate labeling while including more true positives. Inspired by [27], we proposed a multistep segmentation framework, shown in Figure 2. The entire framework can be divided into three parts: (1) a probabilistic segmentation trained on over-labeled hard labels, (2) a transformation function which takes in the output of the first network and transforms the probabilistic outputs into soft labels to remove the false positives and quantify the artifacts, and (3) another probabilistic segmentation network trained on the newly generated soft labels. Our algorithms differ from [27] in four main parts: (1) our transform function is designed based on the appearances and physical properties of reverberation artifacts instead of shadows, (2) we not only improve the segmentation masks in the transform function but also quantify the artifacts, (3) we propose to use probabilistic segmentation to compensate for the over-labeling by human experts instead of deterministic segmentation, and (4) we propose a new probabilistic segmentation network that learns variance from a known variance posterior. Even though the first part of the algorithm is fully supervised, accurate labeling is not needed, since the probabilistic segmentation method models the ambiguity in the labels and the original inputs. After getting the first network trained, the unlabeled ultrasound images can be passed through the network, creating hard segmentations on the unlabeled images. The amount of training data for the second network can also be increased with this approach. The output of the first network is then modified by a transform function, which converts the hard-labeled segmentation into soft labels. The transform function is designed specifically for the appearance and physical properties of reverberation artifacts in ultrasound images. The second network is designed to further eliminate the effects of uncertainty in human labels and ambiguity of the images, creating a soft-labeled segmentation which represents how much of each pixel value is corrupted by reverberation artifacts.

3.2. Probabilistic Segmentation on Hard Labels

The artifacts might differ by shape, intensity distribution, and unclear boundaries, so human labels could be different across annotators. Even when the same annotator labels the same image multiple times, the results can still differ. As a result, we conjecture that careful elimination of the ambiguity in the labeling can assist with subsequent analysis. Ideally, the segmentation algorithm needs to generate nearly identical results for the same image despite using data labeled by different annotators. The work in [19] models the ambiguity in the images and the labels, which is what we look for in this stage of work. Therefore, we utilize their network for our first step : segmentation on hard labels. In our network, even though we used the same general structure as [19], we used more local blocks than global blocks, so that we could model the ambiguity of the edges better.

We train the network with hand-labeled images and cross entropy as loss function, but after it is trained, we do not necessarily need to use the labeled training images for the following steps. Sampling from a learned distribution, the network will generate two mean maps and two standard deviation maps for the artifacts and needles segmentation. We denote these output mean maps for artifacts and needles as and , respectively. We denote the output standard deviation maps for artifacts and needles as and , respectively.

3.3. Transform Function

Even though the output of the first network inherently models the ambiguity in the input images and the human labels, it does not account for the fact that weaker reverberation artifacts do not affect the quality of the image as much as the strong ones do. Therefore, we transform the segmentation mask into a soft labeling mask, which models how much the artifact affects the pixel value. There are three components to our transform function: (1) reduction of false positives, (2) introduction of an exponential decay depending on how far the artifact is relative to the needle causing the artifact, and (3) lowering the artifact-segmentation soft labels for pixels in between adjacent reverberations.

The first part of the transform function takes in predicted hard needle mask and artifact mask . We begin by clustering the artifacts based on distance. We then determine if there is a needle closely above a cluster of artifacts: if there is, then keep the artifacts, and set the needle as the cause for this cluster of artifacts. This is to deal with the situation where there are multiple needles in the image. If not, then remove the cluster. The pseudocode for the algorithm is shown in Algorithm 3.3. The horizontal threshold will be small because needle artifacts are typically (near) continuous horizontal lines, whereas the vertical threshold will be larger to encompass the vertical spacing between artifact lines, which is based on the needle’s reverberating cross-section [2]. The threshold indicates the largest possible distance between the segmented artifacts and the corresponding needles for the artifacts to be considered true positives. In this paper, for images, we set , , and . The hyperparameters should not affect the result much as long as they are in a reasonable range. The hyperparameter selection process is further discussed in Section 4.

Result:: Artifact mask with false positives removed
input image size
; ;
  Create stack ;
  ; ;
  Whileis not emptydo
     ; ;
If, and, s.t. belowandthen

The second part of the the transform function’s purpose is to create the exponential decay in the segmentation and, at the same time, compensate for the pixels that do not comply with the decay. As sound waves travel through a medium, the intensity falls off as the distance increases. Normally, the intensity decays exponentially [28], and due to the fact that reverberation artifacts are caused by sound waves reverberating inside the tissue [1], the intensity of the reverberation artifacts should be no different. The first bounce of the sound wave at the needle represents the needle, followed by resonating bounces resulting in the artifact, so the intensity of the artifact should be on an exponential decay of the intensity of the needle. The intensity of the artifact falls until the last of the identified artifact pixels towards the bottom. We model this exponential decay as as follows: where is a hyper parameter, depending on the ultrasound imaging settings. The higher the frequency, the larger should be, as sound waves would then encounter more resistance in-depth. For the experiments in this paper, . represents the distance of pixel to the needle that is causing the artifact. denotes the distance between the deepest pixel (the furthest pixel away from the corresponding needle) in the cluster of artifacts containing and the nearest pixel in the corresponding needle.

However, other objects and tissues in the image would also have minor effects on the pixel values in the reverberation artifacts, e.g., other boundaries which overlap with the reverberation, shadows caused by vessels interacting with the reverberations, etc. The exponential-decay artifact model does not account for these other components of the pixel values. To deal with the problem, we also create an alternate measurement based on the pixel values in the input images. Denote the input image as . For normalization, we first find the maximum pixel value in the needle-region of . The normalized pixel values in can then be used as weights on the artifact soft-label mask as follows:

In cases where artifact pixels are unusually bright, e.g., is large due to overlapping with actual object boundaries. Preserving such property is desired because it represents the actual anatomy or a different artifact (such as a diagnostic B-line in a lung). We therefore combine and by taking the maximum:

At this point, we have removed the false positives and also created an exponential decay in the artifact soft labels while preserving the effects of underlying anatomic boundaries on artifacts. However, we still need to assign lower values to the soft labels in between each reverberation, since the hard human labels tend to (incorrectly) in-paint such regions. The pixels between reverberations tend to have lower values than the artifact pixels in the original input images. In the third part of our transform function we determine whether a certain pixel belongs to the nonartifact region in between reverberation lines. We need to compare the value of that pixel in the original input ultrasound image against the maximum value in a local patch. We also apply a sigmoid-like function to make sure that artifact soft labels are limited with respect to the highest intensity value within that region. Intuitively, the sigmoid-like function ensures that the values in which are close to the local maximum (which should be real artifact pixels) would be mostly preserved and the values close to zero (which should not be artifact pixels) would be mostly removed, while the values in between follow a smooth decay in the output artifact mask. The equations are listed as follows: where is a hyper-parameter that controls how fast the fall-off is. If the noise level is high, a larger should be used. Also, is a rectangular region where is the center point, and and is the height and width. and stand for vertical and horizontal window, respectively. should be large enough to include at least one line of true reverberation artifact in the patch. In this paper, we set , , and . Again, these parameters are not sensitive, as long as and stay relatively small and more details are discussed in Section 4.

As for the standard deviation map of artifacts, we want to rescale its values in the same manner as we rescale the mean map of artifacts. Therefore, the transform function for the standard deviation map can be simplified to where avoids division by zero.

Since needles are a lot more visible and less ambiguous than reverberation artifacts and the needle boundaries are better defined than the artifact boundaries, the probabilistic output of the first network would be enough for our purposes. Therefore, the needle labels are not processed, thus will not change in this part, as a result, we have , and .

Among the hyperparameters mentioned in the subsection, the values of , , , , and are selected based on the observation of our particular dataset. In other words, the values for these hyperparameters are dependent on the appearance of the needles and artifacts, as well as the scale (i.e., physical distance between pixels). These hyperparameters are more related to the imaging subject and the scale of the image rather than the imaging setting. On the contrary, and are more dependent on the imaging setting, as indicates the sound waves’ ability to penetrate and the selection of is affected by the noise level in the image.

3.4. Probabilistic Segmentation on Soft Labels

Our segmentation model here is built upon the HPU-Net [19], and we follow their notation below. We want to model the unknown ambiguity in the images and the labels, and at the same time, we also want to take the known variance in the labels into account. Our training objective is similar to that of the previous work: maximizing the evidence lower bound on the likelihood , except that we are modeling a variational posterior instead of , where is the input image, is the known mean of the segmentation label, and is the variance of the segmentation label. Denote . We calculate the posterior from two separate networks, where one network accounts for the mean and the other network models the variance . The latent features in the prior blocks should follow a normal distribution generated by the posterior blocks . During training, we directly sample from the posterior and train the normal distribution generated by the prior to be close to the one from the posterior. When sampling, we did exactly the same as the previous work did: sampling the latent features from the normal distribution modeled by the prior. The training and sampling process is illustrated in Figure 3.

In this particular application, we utilize a mean-squared-error-based custom loss function as a way to deal with the continuous values and unique meaning of soft labels. To deal with overfitting to the background, we only care about the pixels that have values over a certain threshold , which we set to . Lower weights are assigned to pixels where absolute error is within the known standard deviation, since we are less sure about the value in the label where standard deviation is larger. Therefore, the loss function can be expressed as the following: where

4. Experiments

The ultrasound imaging is performed with a UF-760AG Fukuda Denshi diagnostic ultrasound imaging machine, on both a chicken breast and an anthropomorphic phantom containing simulated vessels in a tissue medium produced by Advanced Medical Technologies. We used a linear transducer with 51 mm scanning width. It was set to 12 MHz with an imaging depth of 4 cm and a gain value of 21 dB. The dimension of each image is pixels, and the pixel pitch is 0.10 mm horizontally and 0.075 mm vertically. The images were then resized to . All the networks are trained on a single Nvidia Titan RTX for 15 epochs with horizontal flipping, Gaussian blurring (, where is taking the original image) and Gamma transform () as data augmentation, adam [29] as optimizer with a learning rate of . We note that one training epoch consists of all possible 18 (2 for flipping for Gaussian blurring for Gamma transform) combinations of data augmentation. Then, we picked the best model based on the validation results. There are 401, 100, and 40 images in the training, validation, and test set, respectively. In other words, 7218 training images were used to train each epoch due to data augmentation. Our dataset is split in this ratio due to the sophisticated labeling of the test set. The labeling in this work utilized the labeling tool Labelme [30]. The training of the first and second network in our framework both took about 90 minutes. The transform function was performed on Intel Core i5-8279U which takes about 80 minutes to process 401 images.

The training set is over-labeled (shown in the second images of Figure 1(a)) to show that our algorithm is good at dealing with inaccurate labels as well as with handling false positives in the training data. Since there is not a clear definition of the boundaries and quantification of reverberation artifacts, traditional evaluation metrics, e.g., Dice coefficient and Intersection over Union (IoU) would not work here. Therefore, we carefully label the test set in the following way for better evaluation. An example of our test set labeling is shown in the left image of Figure 1(b), where the yellow and pink regions indicate the possible areas containing needles and artifacts, respectively, green and white regions represent the first and second reverberations that we are confident in, the bright blue circles represent the patch between reverberation with no artifacts, the dark blue represents the needle patches that we are confident in, and the red patches are the regions with identifiable fuzzyartifacts. We define the artifact and needle pixels with values greater than 0.05 and 0.5 as positive, respectively. The labeling is designed to show that in our result, (1) the pixels that definitely belong to needles and to the significant artifacts are segmented, (2) the pixels that are between the reverberations and not part of the artifacts are not segmented, (3) the fuzzy artifacts are segmented with lower values, and (4) very few false positives are included.

Our notations are summarized in Figure 1(c). We denote the ratio between positive artifact and needle counts in the labeled first, second reverberations and needle and the total pixels counts in those regions as FAR (first artifact rate), SAR (second artifact rate), and NR (needle rate), respectively, and the average value for positives in those artifact patches as FAA (first artifact average) and SAA (second artifact average). We also denote the false positive rate of artifacts and needles as AFPR (artifact false positive rate) and NFPR (needle false positive rate), and the average value of the false positives as AFPA (artifact false positive average) and NFPA (needle false positive average). We also calculate the average of the nonartifact region between reverberations as NAA (nonartifact average). We further define the average positive artifact value in the identifiable fuzzy artifact patches as IFAA (identifiable fuzzy artifact average).

The following paragraph describes the intuition behind the proposed measurements. By design, FAR, SAR, and NR measure whether the segmentation algorithm is able to segment the pixels that absolutely belong to the first reverberation, the second reverberation, or the needles, respectively, so the closer they are to , the better the segmentation algorithm performs. Next, we examine the FAA, SAA, NAA, and IFAA measurements together. Our intuition is that the second reverberation (white region in Figure 1(b)) is deeper in position and darker in brightness than the first one (green region in Figure 1(b)), so the second reverberation corrupts the “real” values of the pixels less. The reverberations below the second (identifiable fuzzy artifacts, red region in Figure 1(b)) are usually more ambiguous and darker, so they should corrupt the pixel values less. To add to that, the nonartifact patch between reverberations (bright blue region in Figure 1(b)) should have lower values than both the reverberations above and underneath it, as those pixels should not be a part of the reverberation artifacts. As a result, FAA should be larger than SAA, SAA should be larger than IFAA, and IFAA should be significantly larger than NAA, because the output values describe how much the artifacts corrupt the pixels. In other words, a good quantification model should produce . In terms of AFPR and NFPR, anything outside the pink and yellow regions in Figure 1(b) should be false positives, as they are clearly not needles or artifacts. Therefore, the AFPR and NFPR values should be zero in a ideal segmentation. Even though there might be false positives in the segmentation method, different false positives might have different consequences on the downstream tasks. False positives with really small quantification values would not be as bad as the ones with large output values. Therefore, we also investigate AFPA and NFPA, whose values represent the average value of the false positives in the output. The lower these values are, the less impact the false positives would have on the subsequent algorithms.

Before any quantitative experiments, we want to show some of the intermediate results from the transform function in order to justify the operation. As shown in Figure 4(a), it can be seen that the first part of the transform function removes the false positives in the output of the first network and the second part of the network creates the exponential decay while picking up the bright artifacts in regions that are further down, while the third part differentiates the reverberation from the background.

Furthermore, we also want to show that the hyperparameters are not very sensitive with a few examples shown in Figure 4(b), where we show the intermediate results of the transform function. We multiply all the hyperparameters in the transform function by a coefficient of 0.67 and 1.5 and compare the results against the original values we set. This simple experiment should be able to show that the hyperparameters are not sensitive to changes since these hyperparameters are independent of each other. It can be observed that there is not much difference in the final output of the transform function. Thus, the final results are not very sensitive to the choice of hyperparameters.

In our first quantitative experiment, we compared our results against those from expert labels, U-Net [10], USVS-Net [11], and HPU-Net [19] to show that our proposed algorithm is superior to these current segmentation networks and expert labels for needle-and-reverberation-artifact segmentation. The experts tend to label the entire region that might contain artifacts, but they do not differentiate between the reverberations. The quantitative results can be found in Figure 5(a). As can be seen in the table, our approach performs similarly to HPU-Net at binary-thresholded segmentation of the significant artifacts and needles, and our approach performs better than the other two methods. Our goal, however, is more than just binary segmentation. Since the value in the output represents how much the artifact affects the underlying “real” pixel values, and there should be a decay in the intensity of the artifacts, the average value of first artifacts should be larger than that of second artifacts. Together, they also should not be too close to 1 as they normally do not completely obscure the real tissue there. Our method is the only algorithm that has achieved such results. Our algorithm also limits the false positive rate to around 0.02, whereas the best results achieved by other algorithms are 0.084 and 0.04, for artifacts and needles, respectively. Our algorithm also does a good job in differentiating between reverberations, since the average value in nonartifact patches between reverberations is 0.042, which is the lowest by far, compared to 0.506, 0.431, and 0.136 from U-Net, USVS-Net, and HPU-Net, respectively. The IFFA value further indicates that our network learns the decay in the artifact intensity. Sample results of needle segmentation and artifact segmentation can be found in right two image of Figure 1(b). Qualitative results of artifact segmentation are shown in Figure 5(c). Our method clearly does the best in differentiating between the actual artifacts and the patches around them, while simultaneously assigning lower values to less significant artifacts.

Our second quantitative experiment is done to show that our proposed second network in the pipeline works better than other networks. In this experiment, we test out U-Net, HPU-Net, and USVS-Net as the second network. We found that USVS-Net overfits to the background too much, so it is not included in the quantitative comparison. The quantitative evaluation can be found in Figure 5(b). It shows that our method is as good as HPU-Net in detecting the strong artifacts and needles, since the FAR, SAR, and NR values are similar. It also illustrates that our method works better in differentiating between the artifacts and the patches between reverberations, as well as giving higher values to the first reverberation than to the second one. Lastly, our methods also produce fewer false positives and has lower values in the false-positive patches.

Our next experiment is to show that we can further improve our results with additional unlabeled data, and that if the training data of the first network is unavailable, using only nonlabeled data also would not hurt the performance much. As for the unlabeled data used in this study, there are 480 unlabeled images from the same phantom and chicken breast as the labeled training data, but the unlabeled data are from different trials than the labeled ones. We train the first network with the same training data in this experiment, and then we define 3 different scenarios (1) we only have the labeled data to train the second network, (2) we have both labeled and unlabeled data for the second network, and (3) for some reason, the labeled training data is not available any longer so we only have the unlabeled data to train the second network. The comparison can be found in Figure 6(a), where it shows that using both labeled and unlabeled data slightly decreases the false positives of needles and also lowers the values between reverberations. On the other hand, using only unlabeled data produces only slightly more false positives.

To provide further validation of our algorithm on different sets of plausible hand labels, we apply aleatoric uncertainty maps obtained from a Monte Carlo dropout-based Bayesian 3-D convolutional neural network similar to [31]. In doing so, we are able to simulate the wave-like intensity decrease of needle artifacts in a realistic manner. For modeling the aleatoric uncertainty in the artifact segmentation region, we apply the model using the notation from [31], where represents softmax outputs evaluated with a set of random weights , represents each run of the Monte Carlo sampling, and represents the Kronecker matrix product. In an attempt to obtain more articulate hand labels, we apply the following steps on top of our hand labels: (1) use the labels to mask only the relevant regions in the aleatoric uncertainty map, (2) take the inverse of the uncertainty map, and (3) determine a local threshold for smaller patches within each labeled region and use that for deciding whether to keep or remove the labels. A sample of the aleatoric uncertainty map and uncertainty-pruned label are shown in Figure 6(c). We compare the results from networks trained on the original hand labels against the ones trained on the refined labels. The results are shown in Figure 6(b). Our algorithm still does a decent job on the uncertainty-pruned labels, even though the performance is slightly worse than on original hand labels. The intuition is that although the uncertainty-pruning removes some of the labels in the spaces between reverberations. Our algorithm is designed to differentiate the reverberations from the spaces between reverberations to compensate for the over-labeling, so it does not make much difference in that regards. However, the pruning step removes some of the true positives as well, which could not be learned in the networks, slighting hurting the performance. The algorithm still does a decent job on a different set of labels, while performing slightly better on the over-labeled annotations.

However, our method sometimes fails when there are no obvious artifacts. Shown in Figure 6(d), on the right side of the two images, the reverberation artifacts are either not present or not obvious and both the segmentation of the needles and artifacts are not able to detect anything in the region. We suspect that the network learns to segment the needles by looking at the artifacts and not the other way around, resulting in needles undetected when there are no artifacts or when artifacts are less visible. This is probably due to the fact that most of the reverberations artifacts in our training dataset are very visible, so the network would have bias towards this scenario. Besides, when there are patterns in the image that look similar to those of reverberation artifacts, the network tends to generate small values for those pixels on the artifact mask.

5. Applications

In this section, we show that our reverberation artifact segmentation and quantification method is useful in several downstream tasks, such as multiview image compounding and vessel segmentation. All of the imaging parameters are set exactly as they are in Section 4.

5.1. Multiview Image Compounding

The goal of multiview image compounding is to take the information from images taken at different viewpoints and reconstruct the true underlying structure. This task is important in ultrasound, for which images are path-dependent leading to certain structures being seen in an image taken from one viewpoint but not seen in a different image containing the same structure from a different viewpoint. However, the same object can cast reverberation artifacts in different directions in images from different viewpoints, making multiview compounding a challenging task. In compounding, we want to preserve the real objects and structures in the compounded image but remove the artifacts at the same time.

We apply our reverberation artifact segmentation and quantification algorithm in an image compounding task, to remove the reverberation artifacts in the compounded image. We propose to use a simple compounding algorithm that considers our reverberation artifact segmentation and quantification. For example, we can take two viewpoints—even though the simple algorithm can be easily expanded to more than two viewpoints. Denote the image from viewpoint as , our soft segmentation mask for image as and the compounded image as . We define a confidence map that for each pixel depicts the extent to which artifact corruption is absent, thus . For every pixel , if , then we set ; if , then we set ; else, we set , where is a confidence threshold which we set as in this paper. In other words, if the confidence from one viewpoint is significantly higher, we take the pixel value from that viewpoint. Otherwise, we take the maximum image intensity across different viewpoints.

As an experiment, we take pairs of ultrasound images from orthogonal viewpoints with a needle inserted into the phantom. Since we are performing the experiment on a square phantom, it is easy to make sure the viewpoints are orthogonal with free-hand imaging. We compare our simple method against existing compounding algorithms, such as average (avg) [32], maximum (max) [33], and uncertainty-based fusion (UBF) [34]. The results are shown in Figure 7. We can observe that our simple compounding method preserves the vessels and needles as good as max [33] while nearly removing all the reverberation artifacts.

5.2. Vessel Segmentation

Ultrasound vessel segmentation is important for diagnostic purposes and for guiding needle insertion by both humans and robotic systems [35]. However, needle reverberation artifacts could occlude the structures or objects of interest [3]. Such artifacts could also affect the performance of vessel segmentation [5] and needle tracking algorithms [4].

We believe our reverberation artifact segmentation and quantification algorithm could help with vessel segmentation when reverberation artifacts produced by needles are present in the image. During convolution, segmentation networks should pay less attention to the artifact pixels, so we propose to use the soft segmentation results from our algorithm as the masks in partial convolution [36]. Although the partial convolution method is built for image inpainting, the idea to mask certain regions during convolution is useful for our purpose, since we do not want the network to treat the artifact pixels equally as other pixels.

In this work, we use U-Net [10] as the backbone for the vessel segmentation. We compare our artifact-segmentation-and-quantification-based partial convolutional U-Net (partial U-Net) against a normal U-Net. For fair comparison, we have 5 layers in the encoder side, with 32, 64, 128, 256, and 512 filters for both the partial U-Net and the normal U-Net, and the decoder side is exactly symmetric to the encoder. We use Adam optimizer [29] with a learning rate of and binary cross entropy as loss function. There are 481 images in the training set and 97 images in both validation and test set. The only augmentation technique used during training is horizontal flipping.

Shown in the qualitative results in Figure 8(a), the results by normal U-Net create some false positives at the top left corner, where the needles are present. The needles together with the image boundaries or some other artifacts confuse the normal U-Net, since they look like vessel boundaries to some extent, while the partial U-Net does not generate false positive there. Quantitative results can be found in Figure 8(b). The precision for artifact-masked partial U-Net is notably higher than normal U-Net, indicating that our reverberation artifact quantification helps reduce the false positives in vessel segmentation. By other metrics, the partial U-Net also outperforms normal U-Net with the help of our soft reverberation artifact segmentation.

6. Discussion

Our proposed algorithm outperforms other existing segmentation algorithms in segmenting and quantifying reverberation artifacts. Our weakly- and semisupervised reverberation segmentation framework is able to take as input inaccurate human labels and predict accurate segmentation maps as well as quantitatively estimate how much the artifacts affect the pixel values. It works even better on over-labeled annotations. Our quantification of reverberation artifacts is useful in image compounding and for improving vessel segmentation results. Our probabilistic outputs could also potentially be used in ultrasound image uncertainty measurements, ultrasound image quality evaluation, and reverberation artifact removal. Lastly, this algorithm is also directly related to robot-controlled needle insertion, in which our work can be used to guide where to insert the needle, monitor the tissue under the needle, and determine the quality of the image.

Data Availability

The code and ultrasound data used to support the findings of this study are available at


Please note that patents may be pending, and patent rights are not included in the CC by license.

Conflicts of Interest

Alex Ling Yu Hung is currently a student at the Department of Computer Science and the Department of Radiological Sciences, UCLA. Edward Chen is currently a student at Computer Science Department, Stanford University. The work was done when they were students at Carnegie Mellon University. They have no other affiliations or relationship with other organizations. John Galeotti has financial, consulting, management, and advisory relationship with Activ Surgical, which (currently) deals with surgical video rather than ultrasound and Elio AI, Inc., which is developing ultrasound AI and is licensing this technology. Galeotti serves on the advisory board for Activ Surgical, Inc., and he is a founder and director for Elio AI, Inc.


This present work was sponsored in part by the US Army Medical contracts W81XWH-19-C0083, W81XWH-19-C0101, and W81XWH-19-C-0020 and by the funding from the state of Pennsylvania C000072473. We would like to thank our collaborators at the University of Pittsburgh, Triton Microsystems, Inc., Accipiter Systems, Inc., Sonivate Medical, Inc., and URSUS Medical, LLC.


  1. M. Ziskin, D. Thickman, N. Goldenberg, M. Lapayowker, and J. Becker, “The comet tail artifact,” Journal of Ultrasound in Medicine, vol. 1, no. 1, pp. 1–7, 1982. View at: Publisher Site | Google Scholar
  2. R. M. Kirberger, “Imaging artifacts in diagnostic ultrasound—a review,” Veterinary Radiology & Ultrasound, vol. 36, no. 4, pp. 297–306, 1995. View at: Publisher Site | Google Scholar
  3. J. Mohebali, V. I. Patel, J. M. Romero et al., “Acoustic shadowing impairs accurate characterization of stenosis in carotid ultrasound examinations,” Journal of vascular surgery, vol. 62, no. 5, pp. 1236–1244, 2015. View at: Google Scholar
  4. X. Xu, Y. Zhou, X. Cheng, E. Song, and G. Li, “Ultrasound intima-media segmentation using Hough transform and dual snake model,” Computerized Medical Imaging and Graphics, vol. 36, no. 3, pp. 248–258, 2012. View at: Publisher Site | Google Scholar
  5. G. Reusz, P. Sarkany, J. Gal, and A. Csomos, “Needle-related ultrasound artifacts and their importance in anaesthetic practice,” British Journal of Anaesthesia, vol. 112, no. 5, pp. 794–802, 2014. View at: Publisher Site | Google Scholar
  6. P. C. Tay, S. T. Acton, and J. Hossack, “A transform method to remove ultrasound artifacts,” in 2006 IEEE Southwest Symposium on Image Analysis and Interpretation, pp. 110–114, Denver, CO, USA, March 2006. View at: Publisher Site | Google Scholar
  7. P. C. Tay, S. T. Acton, and J. A. Hossack, “A wavelet thresholding method to reduce ultrasound artifacts,” Computerized Medical Imaging and Graphics, vol. 35, no. 1, pp. 42–50, 2011. View at: Publisher Site | Google Scholar
  8. K. K. Win, J. Wang, C. Zhang, and R. Yang, “Identification and Removal of Reverberation in Ultrasound Imaging,” in 2010 5th IEEE Conference on Industrial Electronics and Applications, pp. 1675–1680, Taichung, Taiwan, June 2010. View at: Publisher Site | Google Scholar
  9. N. E. Bylund, M. Andersson, and H. Knutsson, “Interactive 3d filter design for ultrasound artifact reduction,” in IEEE International Conference on Image Processing, Genova, Italy, Sept.2005. View at: Publisher Site | Google Scholar
  10. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, pp. 234–241, Springer, Cham, 2015. View at: Google Scholar
  11. T. S. Mathai, V. Gorantla, and J. Galeotti, “Segmentation of vessels in ultra high frequency ultrasound sequences using contextual memory,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 173–181, Springer, Cham, 2019. View at: Google Scholar
  12. O. Oktay, J. Schlemper, L. L. Folgoc et al., “Attention u-net: Learning where to look for the pancreas,” 2018, View at: Google Scholar
  13. A. G. Roy, N. Navab, and C. Wachinger, “Concurrent spatial and channel `squeeze & excitation'in fully convolutional networks,” in International conference on medical image computing and computer-assisted intervention, pp. 421–429, Springer, Cham, 2018. View at: Google Scholar
  14. J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7132–7141, Salt Lake City, Utah, 2018. View at: Google Scholar
  15. L. Han, Y. Huang, H. Dou et al., “Semisupervised segmentation of lesion from breast ultrasound images with attentional generative adversarial network,” Computer Methods and Programs in Biomedicine, vol. 189, article 105 275, 2020. View at: Publisher Site | Google Scholar
  16. A. Kendall and Y. Gal, “What uncertainties do we need in bayesian deep learning for computer vision?” Advances in neural information processing systems, vol. 30, pp. 5574–5584, 2017. View at: Google Scholar
  17. T. Nair, D. Precup, D. L. Arnold, and T. Arbel, “Exploring uncertainty measures in deep networks for multiple sclerosis lesion detection and segmentation,” Medical Image Analysis, vol. 59, article 101557, 2020. View at: Google Scholar
  18. S. Kohl, B. Romera-Paredes, C. Meyer et al., “A probabilistic u-net for segmentation of ambiguous images,” Advances in Neural Information Processing Systems, vol. 31, pp. 6965–6975, 2018. View at: Google Scholar
  19. S. A. Kohl, B. Romera-Paredes, K. H. Maier-Hein et al., “A hierarchical probabilistic u-net for modeling multi-scale ambiguities,” 2019, View at: Google Scholar
  20. C. F. Baumgartner, K. C. Tezcan, K. Chaitanya et al., “Phiseg: capturing uncertainty in medical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 119–127, Springer, Cham, 2019. View at: Publisher Site | Google Scholar
  21. D. Zhang, J. Han, L. Zhao, and D. Meng, “Leveraging prior-knowledge for weakly supervised object detection under a collaborative self-paced curriculum learning framework,” International Journal of Computer Vision, vol. 127, no. 4, pp. 363–380, 2019. View at: Publisher Site | Google Scholar
  22. D. Zhang, W. Zeng, J. Yao, and J. Han, “Weakly supervised object detection using proposal- and semantic-level relationships,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP, p. 1, 2020. View at: Publisher Site | Google Scholar
  23. D. Zhang, J. Han, Y. Zhang, and D. Xu, “Synthesizing supervision for learning deep saliency network without human annotation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 7, pp. 1755–1769, 2020. View at: Publisher Site | Google Scholar
  24. D. Pathak, P. Krahenbuhl, and T. Darrell, “Constrained Convolutional Neural Networks for Weakly Supervised Segmentation,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1796–1804, Santiago, Chile, Dec. 2015. View at: Google Scholar
  25. W. Shimoda and K. Yanai, “Self-supervised difference detection for weakly-supervised semantic segmentation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5208–5217, Seoul, Korea, 2019. View at: Google Scholar
  26. J. Peng, H. Kervadec, J. Dolz, I. B. Ayed, M. Pedersoli, and C. Desrosiers, “Discretely-constrained deep network for weakly supervised segmentation,” Neural Networks, vol. 130, pp. 297–308, 2020. View at: Publisher Site | Google Scholar
  27. Q. Meng, M. Sinclair, V. Zimmer et al., “Weakly supervised estimation of shadow confidence maps in fetal ultrasound imaging,” IEEE Transactions on Medical Imaging, vol. 38, no. 12, pp. 2755–2767, 2019. View at: Publisher Site | Google Scholar
  28. V. R. Amin, Ultrasonic Attenuation Estimation for Tissue Characterization, 1989.
  29. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014, View at: Google Scholar
  30. K. Wada, “labelme: Image Polygonal Annotation with Python,” 2016, View at: Google Scholar
  31. Y. Kwon, J.-H. Won, B. J. Kim, and M. C. Paik, “Uncertainty quantification using Bayesian neural networks in classification: Application to biomedical image segmentation,” Computational Statistics & Data Analysis, vol. 142, article 106816, 2020. View at: Publisher Site | Google Scholar
  32. J. W. Trobaugh, W. D. Richard, K. R. Smith, and R. D. Bucholz, “Frameless stereotactic ultrasonography: Method and applications,” Computerized Medical Imaging and Graphics, vol. 18, no. 4, pp. 235–246, 1994. View at: Publisher Site | Google Scholar
  33. A. Lasso, T. Heffter, A. Rankin, C. Pinter, T. Ungi, and G. Fichtinger, “Plus: open-source toolkit for ultrasound-guided intervention systems,” IEEE transactions on biomedical engineering, vol. 61, no. 10, pp. 2527–2537, 2014. View at: Publisher Site | Google Scholar
  34. C. S. Z. Berge, A. Kapoor, and N. Navab, “Orientation-driven ultrasound compounding using uncertainty information,” in International conference on information processing in computer-assisted interventions, pp. 236–245, Springer, Cham, 2014. View at: Google Scholar
  35. J. Guerrero, S. E. Salcudean, J. A. McEwen, B. A. Masri, and S. Nicolaou, “Real-Time vessel segmentation and tracking for ultrasound imaging applications,” IEEE Transactions on Medical Imaging, vol. 26, no. 8, pp. 1079–1090, 2007. View at: Publisher Site | Google Scholar
  36. G. Liu, F. A. Reda, K. J. Shih, T.-C. Wang, A. Tao, and B. Catanzaro, “Image inpainting for irregular holes using partial convolutions,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 85–100, Munich, Germany, 2018. View at: Google Scholar

Copyright © 2022 Alex Ling Yu Hung et al. Exclusive Licensee Suzhou Institute of Biomedical Engineering and Technology, CAS. Distributed under a Creative Commons Attribution License (CC BY 4.0).

 PDF Download Citation Citation
Altmetric Score