(a) A table showing the benefit of adding labeled or unlabeled data. Note that the first network is trained with the same training data in this experiment, and the unlabeled data come from the same subject as the training data but from different trials. (b) A table showing the robustness of our method across various binary labels. (c) Example of an aleatoric uncertainty map and an uncertainty-pruned label. Left to right: input image, aleatoric uncertainty map, hand label, and uncertainty-pruned label. (d) Examples of failure cases. Left to right: input images, hand labels (needles in gray, artifacts in white), needle outputs, and artifact outputs. When artifacts are less visible, the model fails to detect both needles and artifacts. In addition, small values appear in the artifact output where patterns resemble reverberation artifacts.