
Research Article | Open Access

Volume 2022 | Article ID 9818965 | https://doi.org/10.34133/2022/9818965

Yijie Zhang, Luzhe Huang, Tairan Liu, Keyi Cheng, Kevin de Haan, Yuzhu Li, Bijie Bai, Aydogan Ozcan, "Virtual Staining of Defocused Autofluorescence Images of Unlabeled Tissue Using Deep Neural Networks", Intelligent Computing, vol. 2022, Article ID 9818965, 13 pages, 2022. https://doi.org/10.34133/2022/9818965

Virtual Staining of Defocused Autofluorescence Images of Unlabeled Tissue Using Deep Neural Networks

Received: 05 Jul 2022
Accepted: 15 Sep 2022
Published: 27 Oct 2022

Abstract

Deep learning-based virtual staining was developed to introduce image contrast to label-free tissue sections, digitally matching histological staining, which is time-consuming, labor-intensive, and destructive to tissue. Standard virtual staining requires high autofocusing precision during the whole slide imaging of label-free tissue, which consumes a significant portion of the total imaging time and can lead to tissue photodamage. Here, we introduce a fast virtual staining framework that can stain defocused autofluorescence images of unlabeled tissue, achieving performance equivalent to virtual staining of in-focus label-free images while also saving significant imaging time by lowering the microscope’s autofocusing precision. This framework incorporates a virtual autofocusing neural network to digitally refocus the defocused images and then transforms the refocused images into virtually stained images using a successive network. These cascaded networks form a collaborative inference scheme: the virtual staining model regularizes the virtual autofocusing network through a style loss during training. To demonstrate the efficacy of this framework, we trained and blindly tested these networks using human lung tissue. Using 4× fewer focus points with 2× lower focusing precision, we successfully transformed the coarsely-focused autofluorescence images into high-quality virtually stained H&E images, matching the standard virtual staining framework that used finely-focused autofluorescence input images. Without sacrificing the staining quality, this framework decreases the total image acquisition time needed for virtual staining of a label-free whole-slide image (WSI) by ~32%, together with an ~89% decrease in the autofocusing time, and has the potential to eliminate the laborious and costly histochemical staining process in pathology.

1. Introduction

Histological analysis is considered the gold standard for tissue-based diagnostics. In the histological staining process, the tissue specimen is first sliced into 2–10 μm thin sections and then fixed on microscopy slides. These slides are stained in a process that dyes the specimen by binding markers, e.g., chromophores, to different tissue constituents, revealing the sample’s cellular and subcellular morphological information under a microscope [1]. However, traditional histological staining is a costly and time-consuming procedure. Some types of stains, such as immunohistochemical (IHC) stains, require specialized laboratory infrastructure and skilled histotechnologists to perform the tissue preparation steps.

The ability to virtually stain microscopic images of unlabeled tissue sections was demonstrated through deep neural networks, avoiding the laborious and time-consuming histochemical staining processes. These deep learning-based label-free virtual staining methods can use different input imaging modalities, such as autofluorescence microscopy [2–4], hyperspectral imaging [5], quantitative phase imaging (QPI) [6], reflectance confocal microscopy [7], and photoacoustic microscopy [8], among others [9–11]. Virtual staining, in general, has the potential to be used as a substitute for histochemical staining, providing savings in both costs and tissue processing time. It also enables the preservation of tissue sections for further analysis by avoiding destructive biochemical reactions during the chemical staining process [12].

In all these label-free virtual staining methods, the acquisition of in-focus images of the unlabeled tissue sections is essential. In general, focusing is a critical but time-consuming step in scanning optical microscopy used to correct focus drifts caused by mechanical or thermal fluctuations of the microscope body and the nonuniformity of the specimen’s topology [13]. Focus map surveying, the most adopted autofocusing method for whole slide imaging of tissue sections, creates a prescan focus map by sampling the focus points in a pattern [14]. At each focus point of this pattern, the autofocusing process captures an axial stack of images, from which it extracts image sharpness measures at different axial depths and locates the best focal plane using an iterative search algorithm [15–17]. To acquire finely-focused whole slide images (WSIs) of label-free tissue sections and generate high-quality virtually stained images, standard virtual staining methods demand many focus points across the whole slide area with high focusing precision to form an accurate prescan focus map. However, this fine focus search process is time-consuming to perform across a WSI and might introduce photodamage and photobleaching [18] on the tissue sample. To alleviate some of these problems, various hardware-based approaches were developed to reduce focusing and scanning time for microscopic imaging [19–22]. However, such hardware modifications require additional components, cost, and labor, and may not always be compatible with the existing microscope hardware already deployed in clinical labs. Recent works in optical microscopy have explored the use of deep learning for online autofocusing [23–26], offline autofocusing [27], and depth-of-field (DoF) enhancement [28–30]. Despite all this progress, integrating deep learning-based autofocusing methods with virtual staining of unstained tissue remains to be explored.

Here, we demonstrate a deep learning-based fast virtual staining framework that can generate high-quality virtually stained images using defocused autofluorescence images of label-free tissue. As shown in Figure 1, this framework uses an autofocusing neural network (Deep-R) [27] to digitally refocus the defocused autofluorescence images. Then, a virtual staining network is used to transform the refocused images into virtually stained images, matching the brightfield microscopic images of the histochemically stained tissue (ground truth). Instead of training the two cascaded networks (i.e., the autofocusing and virtual staining neural networks) separately, we first trained the virtual staining network and used the learned virtual staining model to regularize the Deep-R network using a style loss during the training stage, which formed a collaborative inference scheme.

To demonstrate the success of this deep learning-based fast virtual staining framework, we trained the networks using human lung tissue sections. Through blind testing on coarsely-focused autofluorescence images of unlabeled lung tissue sections, the fast virtual staining framework successfully generated virtual H&E stained images matching the staining quality of the standard virtual staining framework that used in-focus autofluorescence images of the same samples. These coarsely-focused autofluorescence images of unlabeled tissue were acquired with 4× fewer focus points and 2× lower focusing precision than their finely-focused counterparts used in the standard virtual staining framework; this resulted in a ~32% decrease in the total image acquisition time (per WSI) and an ~89% decrease in the autofocusing time using a benchtop scanning optical microscope. With its capability to stain defocused images of unstained tissue, we believe this virtual staining method will save time without sacrificing the quality of the virtually stained images and will be highly useful for histology.

2. Results

The standard virtual staining framework [2] uses in-focus autofluorescence microscopic images of label-free tissue to generate the corresponding digitally stained images. To generate high-quality virtually stained images using defocused autofluorescence images, we first use a Deep-R [27] network for virtual autofocusing, followed by the virtual staining of the resulting refocused autofluorescence images, as shown in Figures 1 and 2(a). To achieve accurate virtual staining on the refocused autofluorescence images (the output of Deep-R), the trained virtual staining model was used to regularize the Deep-R network during its training by introducing a style loss, which minimizes the difference between the multiscale virtual staining features of the Deep-R output and the target (see the Materials and Methods section for details). In other words, the presented defocused image virtual staining framework does not involve a simple cascade of two different, separately trained neural networks, one following another.
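For readers who prefer pseudocode, the cascaded inference can be summarized as follows. This is a minimal sketch assuming two trained Keras-style generator handles (deep_r_model and virtual_staining_model, hypothetical names), the per-channel normalization described in the Materials and Methods section, and a full-range YCbCr output scaled to [0, 1]; it is not the authors' implementation.

```python
# Minimal inference sketch of the cascaded framework. `deep_r_model` and
# `virtual_staining_model` are hypothetical handles to the two trained generators.
import numpy as np

def ycbcr_to_rgb(ycbcr):
    """Full-range (JPEG-style) YCbCr -> RGB for arrays scaled to [0, 1] (assumed convention)."""
    y, cb, cr = ycbcr[..., 0], ycbcr[..., 1] - 0.5, ycbcr[..., 2] - 0.5
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return np.clip(np.stack([r, g, b], axis=-1), 0.0, 1.0)

def stain_defocused_fov(defocused_af, deep_r_model, virtual_staining_model):
    """defocused_af: (H, W, 2) array holding the DAPI and TxRed autofluorescence channels."""
    # Zero-mean, unit-variance normalization per channel (as in the preprocessing).
    x = (defocused_af - defocused_af.mean(axis=(0, 1))) / (defocused_af.std(axis=(0, 1)) + 1e-8)
    x = x[np.newaxis, ...].astype(np.float32)                     # add batch dimension
    refocused = deep_r_model(x, training=False)                   # virtual autofocusing (Deep-R)
    stained = virtual_staining_model(refocused, training=False)   # virtual H&E staining (YCbCr)
    return ycbcr_to_rgb(np.asarray(stained)[0])                   # RGB image for display
```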

We trained this defocused image virtual staining framework with a dataset of 5,832 human lung tissue fields-of-view (FOVs), each of which had pixels, imaged using a 40×/0.95 NA objective lens. As shown in Figure 1(b), the Deep-R network was trained with accurately paired image data consisting of (1) autofluorescence images of label-free tissue (including DAPI and TxRed filter channels) acquired at different axial defocus distances (ranging from −2 μm to 2 μm with an axial step size of 0.5 μm, as illustrated in Figure 1(b)) as inputs and (2) the corresponding in-focus DAPI and TxRed autofluorescence images as targets. During the training of the Deep-R network, the defocused input autofluorescence images in each batch were randomly picked from the z-stacks. To train the virtual staining network, registered pairs of in-focus autofluorescence images (DAPI and TxRed channels) captured before the histochemical staining and brightfield images of the same tissue sections after their histochemical staining were used (see Figure 1(a) and the Materials and Methods section). The two networks (Deep-R and the successive virtual staining network) are linked together by a style loss during training (see the Materials and Methods section), forming a collaborative inference scheme.
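As an illustration of this batch construction, a minimal sketch is given below; it assumes each training z-stack is stored as an array with the in-focus plane at the center index, which is an implementation choice not specified in the paper.

```python
# Sketch of drawing a random defocused input plane per Deep-R training example,
# assuming each z-stack is stored as (num_planes, H, W, 2) with the in-focus
# plane (z = 0 um) at the center index and planes spaced by 0.5 um.
import numpy as np

def sample_deep_r_pair(z_stack, rng=np.random.default_rng()):
    """Return (defocused_input, in_focus_target) for one training example."""
    num_planes = z_stack.shape[0]            # e.g., 9 planes spanning -2 ... +2 um
    in_focus_idx = num_planes // 2           # center of the stack
    defocus_idx = rng.integers(num_planes)   # random axial plane per batch element
    return z_stack[defocus_idx], z_stack[in_focus_idx]
```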

Once trained, the defocused image staining framework can generate high-quality virtually stained images using defocused autofluorescence microscopic images of label-free tissue as its input; this capability enables using fewer autofocus points and lower focusing precision at each focus point during the WSI scanning process. To demonstrate its success, we blindly tested and compared the performance of the standard in-focus image virtual staining framework and our defocused image virtual staining framework on 2081 unique image FOVs (each image with pixels) from ten new patients whose samples were never seen by the networks before. For the standard in-focus virtual staining framework, we acquired finely-focused whole slide autofluorescence images of the test tissue sections (~23 mm2 of sample area per patient on average) by using focus points at 8.5% of the total acquired image FOVs, and a ±0.35 μm focusing precision at each focus point to form a fine focus map before the WSI scanning. On the other hand, for the defocused virtual staining framework, we used a smaller number of focus points that only took up 2.1% of the total acquired image FOVs, and reduced the focusing precision to ±0.83 μm for each focus point to acquire coarsely-focused whole slide autofluorescence images, as illustrated in Figure 2(a). These changes reduced the autofocusing time (per WSI) from 9.8 minutes to 1.1 minutes and the total image acquisition time from 27.1 minutes to 18.4 minutes, achieving an 88.8% decrease in the autofocusing time and a 32.1% decrease in the entire image acquisition process per WSI (see the Materials and Methods section). Because of the coarse focus map, the acquired autofluorescence image FOVs exhibit various defocus distances for each WSI. Figure 2(b) presents the zoomed-in regions of the acquired autofluorescence images and the generated virtually stained images. Both frameworks (in-focus vs. defocused image virtual staining networks) can generate high-quality staining that presents a good match to the corresponding histochemically stained ground truth images. Although the fast virtual staining framework took defocused autofluorescence images as its input, with an apparent loss of sharpness and contrast compared to their finely-focused counterparts, it still achieved virtual staining performance comparable to the standard network that used in-focus input images.
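The reported percentages follow directly from the quoted per-WSI times; a quick arithmetic check:

```python
# Quick check of the reported time savings from the quoted per-WSI acquisition times.
autofocus_fine, autofocus_coarse = 9.8, 1.1      # minutes per WSI
total_fine, total_coarse = 27.1, 18.4            # minutes per WSI

print(f"autofocusing time saved: {100 * (autofocus_fine - autofocus_coarse) / autofocus_fine:.1f}%")  # ~88.8%
print(f"total acquisition time saved: {100 * (total_fine - total_coarse) / total_fine:.1f}%")         # ~32.1%
```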

To further showcase the ability of the presented framework, we compared the performance of the standard in-focus image virtual staining network (termed framework 1) and the fast, defocused image virtual staining network (termed framework 2) using the same coarsely-focused autofluorescence images as input. Framework 1 directly applies the virtual staining network on these coarsely-focused autofluorescence images of label-free tissue sections, without using the Deep-R network for refocusing, whereas framework 2 uses the Deep-R network for refocusing of the defocused autofluorescence images, followed by the virtual staining. In this comparison, the results of the standard virtual staining using finely-focused image FOVs were also used as a baseline, which we termed framework 3. Figure 3 reports a detailed comparison of these three frameworks’ inference on various FOVs of different lung tissue sections, never used during the training phase. Using defocused autofluorescence images, framework 1 presented a noticeable sharpness and contrast degradation in its virtually stained images (Figures 3(a)–3(e)). Furthermore, it also caused hallucinations of nuclei and red blood cells, along with related artifacts, that are not present in the results of framework 3 (Figures 3(k)–3(o)). In contrast, framework 2 (Figures 3(f)–3(j)) successfully avoids these hallucinations and artifacts and produces sharp, in-focus virtually stained images with a good match to the results of framework 3, further confirming the conclusions reported in Figure 2(b).

We further quantified the virtual staining performance by calculating the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [31] between (1) framework 1 or framework 2 and (2) the corresponding baseline results in framework 3, as shown in Figures 3(a)–3(j). Compared to the inference of framework 1, both metrics (PSNR and SSIM) were significantly improved by the reported defocused image virtual staining method (framework 2), demonstrating its robustness to defocused image inputs.

Since the chromatic contrast among different tissue components serves as one of the most significant features/cues for pathologists to interpret tissue sections, we also quantified the color distribution of the virtually stained images by converting them from RGB to YCbCr color space and then plotting the histograms of the Cb and Cr channels, as shown in Figures 3(p)–3(t). The two chroma components (Cb and Cr) represent the blue and red information of the virtually stained images, respectively, reflecting the staining quality of H&E, where hematoxylin stains nuclei purplish-blue and eosin stains the extracellular matrix and cytoplasm pink [32]. For both the Cb and Cr channels, the distributions of the framework 1 image inference have obvious shifts compared to those of the other two frameworks (2 and 3). In contrast, the image inference results of framework 2 agree well with the distributions of framework 3, further validating the success of our defocused image virtual staining framework. It is also worth noting that the defocused image virtual staining shows performance degradation on autofluorescence images with a large defocus distance (e.g., with an axial defocus amount of >2.5 μm, see Figures 3(i)–3(j)), which is not surprising since this large defocus lies outside of its training range (±2 μm).
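A minimal sketch of this chroma analysis, assuming scikit-image is available and the virtually stained image is an RGB array scaled to [0, 1]; function names are illustrative.

```python
# Convert a virtually stained RGB image to YCbCr and histogram the Cb/Cr channels.
import numpy as np
from skimage.color import rgb2ycbcr

def chroma_histograms(rgb_image, bins=64):
    """rgb_image: (H, W, 3) float array in [0, 1]. Returns (cb_hist, cr_hist)."""
    ycbcr = rgb2ycbcr(rgb_image)              # Y in [16, 235], Cb/Cr in [16, 240]
    cb_hist, _ = np.histogram(ycbcr[..., 1], bins=bins, range=(16, 240), density=True)
    cr_hist, _ = np.histogram(ycbcr[..., 2], bins=bins, range=(16, 240), density=True)
    return cb_hist, cr_hist
```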

Next, we used the differences in the YCbCr color space to further quantify the relationship between the virtual staining performance and the axial image defocus distance (see Figure 4). To conduct this analysis, we acquired z-stacks of autofluorescence images of label-free lung tissue sections (over an axial range of −3 μm to 3 μm with a step size of 0.5 μm, see the Materials and Methods section), resulting in 562 unique image FOVs, each with 512 × 512 pixels. Then, we separately tested framework 1 and framework 2 on the acquired autofluorescence images at different axial defocus distances. The resulting virtually stained images were used to compute the absolute YCbCr color difference with respect to the virtual staining results of framework 3 that used in-focus autofluorescence images (i.e., using the in-focus, z = 0 μm, images of the same FOVs). The average absolute color differences (of the 562 image FOVs) are plotted as a function of the axial defocus distance, as shown in Figure 4(a). These results reveal that framework 2 performs similarly to framework 1 when the input autofluorescence image has a small defocus distance (e.g., <1 μm). On the other hand, when the input autofluorescence images have a large defocus distance of ≥1 μm, framework 2 has significantly better performance than framework 1. We also performed paired upper-tailed t tests (see the Materials and Methods section) between the two frameworks for each color channel and plotted the resulting p values as a function of the axial defocus distance to further illustrate the performance improvement of our defocused image virtual staining framework, as presented in Figure 4(b). The t test results demonstrate that the defocused image virtual staining network has a statistically significant improvement in virtual staining performance over the standard virtual staining network at a defocus distance of ± ~1 μm or larger.

3. Discussion

We demonstrated a deep learning-based framework that decreases the total image acquisition time needed for virtual staining of a label-free WSI by ~32%, also resulting in an ~89% decrease in the autofocusing time per tissue slide. By combining a Deep-R network with a virtual staining model, our framework generated virtual H&E stained images from coarsely-focused whole slide autofluorescence images of label-free tissue sections, matching the standard virtual staining inference that used finely-focused WSIs that were acquired with 4× more focus points and 2× higher focusing precision. For a 1 cm2 label-free tissue section, consisting of 900 image FOVs (with each FOV having pixels), the data acquisition for in-focus autofluorescence images per sample takes ~6,900 seconds using a scanning optical microscope (see the Materials and Methods section). With the help of the presented defocused image virtual staining network, this scanning time can be reduced to ~4,800 seconds by acquiring coarsely-focused WSIs with reduced focus points and focusing precision. After the training is complete, which is a one-time effort, the inference process for each of the Deep-R and virtual staining networks only takes ~18 seconds (i.e., ~36 seconds in total) for a tissue area of 1 cm2; stated differently, the total inference time for virtual staining is negligible compared with the whole slide image acquisition process.

Besides saving significant amounts of image acquisition time, the framework presented here can also act as an add-on module to improve the robustness of the standard virtual staining framework. Even when using high-precision prescan focus maps, parts of the WSI can still be inaccurately focused due to fluctuations of the microscope body and local variations of the specimen’s topology. This can cause either defocused image FOVs in parts of the WSI or an inaccurate focus map. Our fast virtual staining framework can be applied to these defocused FOVs to generate the same high-quality virtually stained images as the other in-focus regions, improving the inference consistency of the virtual staining framework using label-free tissue sections.

The ability of the fast virtual staining framework to generate high-quality stained images using coarsely-focused autofluorescence images stems from the integration of the Deep-R and virtual staining neural networks. In the training of these two networks, the style loss (see Figure 5(b) and the Materials and Methods section) serves as an essential regularization term to optimize the Deep-R network to generate refocused autofluorescence images suitable for the trained virtual staining network. To demonstrate that this style loss plays an indispensable role in our framework, we further trained the Deep-R and the virtual staining networks separately, without the style loss for the collaborative training; for comparison, we term this framework 4. We separately tested our defocused image virtual staining framework (framework 2), and this new framework 4 on the autofluorescence images acquired at different axial defocus distances used in Figure 4 and accordingly computed the absolute YCbCr color differences with respect to the virtual staining results of framework 3 that used in-focus autofluorescence images of the same sample FOVs. The average absolute color differences were plotted as a function of the axial defocus distance (see Supplementary Figure S1), demonstrating that framework 2 has significantly better performance than framework 4 over the entire defocus range, which reveals the significance of the style loss.

Since the output distribution of a neural network usually deviates from that of its target, directly feeding the refocused output images of Deep-R into the virtual staining network (as in framework 4) leads to artifacts and hallucinations in the generated virtually stained images. The style loss, however, helps regularize the Deep-R network such that it learns to recover the “style features” used by the virtual staining network, close to those of the in-focus autofluorescence images. Furthermore, conventional loss terms, such as the mean absolute error (MAE), emphasize low-level image features by enforcing pixel-wise correspondence between the output and the ground truth images. The style loss, in contrast, penalizes differences in high-level image features by comparing multiscale features of the virtual staining network, enabling Deep-R to retrieve the features needed by the connected virtual staining network.

By introducing an additional Deep-R network in the inference process, the fast, defocused image virtual staining framework can be implemented on conventional fluorescence microscopes without hardware modifications or a customized optical setup. This fast virtual staining workflow can also be expanded to many other stains, such as Masson’s Trichrome stain, Jones’ silver stain, and immunohistochemical (IHC) stains [2–4, 12]. In addition to lung tissue, the presented virtual staining workflow can be applied to other types of human tissue, e.g., breast, kidney, salivary gland, liver, and skin [2–4]. Although the virtual staining approach presented here was demonstrated based on the autofluorescence imaging of unlabeled tissue sections, it can also be used to speed up the virtual staining workflow of other label-free microscopy modalities [6, 7].

This fast virtual staining method using defocused autofluorescence images of label-free tissue can be further improved to achieve better image quality and inference generalization. As shown in the zoomed-in regions in Figure 2(c), both the standard and the fast virtual staining output images miss some of the nuclear features in certain areas compared to the histochemically stained ground truth images. Additional autofluorescence channels (e.g., FITC and Cy5 filters in addition to the DAPI and TxRed filters) can be used to further improve the inference accuracy of the virtual staining network, as demonstrated in Ref. [4] for IHC staining.

4. Materials and Methods

4.1. Image Data Acquisition

The neural networks were trained using microscopic images of thin tissue sections from lung needle core biopsies. Unlabeled tissue sections were obtained from existing deidentified specimens from the UCLA Translational Pathology Core Laboratory (TPCL). The human lung tissue blocks were sectioned using a microtome into ~4 μm thick sections, then deparaffinized using xylene, and mounted on a standard glass slide using mounting medium Cytoseal 60 (Thermo-Fisher Scientific). The autofluorescence images were captured using a Leica DMI8 microscope, controlled with Leica LAS X microscopy automation software. The unstained tissue sections were excited near the ultraviolet range and imaged using a DAPI filter cube (Semrock OSFI3-DAPI5060C, EX 377/50 nm, EM 447/60 nm) as well as a TxRed filter cube (Semrock OSFI3-TXRED-4040C, EX 562/40 nm, EM 624/40 nm). The autofluorescence images were acquired with a 40×/0.95 NA objective (Leica HC PL APO 40×/0.95 DRY). Each FOV was captured with a scientific complementary metal-oxide-semiconductor (sCMOS) image sensor (Leica DFC 9000 GTC) with an exposure time of ∼100 ms for the DAPI channel and ∼300 ms for the TxRed channel.

While acquiring the autofluorescence images of the samples used to train the networks, we first built a fine prescan focus map with focus points uniformly distributed over the sample, taking up ~10% of the total image FOVs. Each focus point had a focusing precision of ±0.35 μm. At each FOV, we acquired a z-stack of autofluorescence images ranging from −2 to 2 μm with 0.5 μm axial spacing, where z = 0 μm refers to the in-focus position from the fine prescan focus map. The in-focus autofluorescence images (z = 0 μm) were used as the network input to the virtual staining network (Figure 1(a)) and as the network target for the Deep-R network (Figure 1(b)). The autofluorescence images at different axial depths in the z-stack were randomly fed into the Deep-R network as input.

For each testing tissue sample, we first built the fine prescan focus map similar to the acquisition of the training images, and acquired the finely-focused whole slide autofluorescence images, which were the test inputs for the standard virtual staining framework. Then, we built a coarse prescan focus map and acquired the corresponding coarsely-focused WSI for the same sample. The focus points on the coarse focus map had a precision of ±0.83 μm and were evenly distributed over the sample, taking up ~2% of the total image FOVs. For the blind testing samples that were used to quantify the relationship between the virtual staining performance and the axial defocus distance in Figure 4, we built fine prescan focus maps in the same way as for the training samples. At each image FOV, we then acquired z-stacks ranging from −3 to 3 μm with 0.5 μm axial spacing. To achieve the ±0.35 μm (or ±0.83 μm) focusing precision, the Leica LAS X microscopy automation software performs a two-step search algorithm to find the in-focus position. It first controls the microscope to implement a coarse focus search over a z-range of 50 μm with 23 (or 9) axial steps. Then, a fine focus search over a z-range of 20 μm with 29 (or 13) axial steps finds the optimal focus; the autofocusing process at each focus point takes 33 (or 15) seconds, respectively.
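For illustration only, this two-step search can be sketched as below; the actual search is executed internally by the Leica LAS X software, and the sharpness metric and the acquire_at_z stage/camera interface are hypothetical placeholders.

```python
# Illustrative sketch of a coarse-then-fine focus search over the stated z-ranges.
import numpy as np

def sharpness(image):
    """Simple gradient-energy focus measure (illustrative, not the vendor's metric)."""
    gy, gx = np.gradient(image.astype(np.float64))
    return np.mean(gx ** 2 + gy ** 2)

def two_step_focus_search(acquire_at_z, z_center,
                          coarse_range=50.0, coarse_steps=23,
                          fine_range=20.0, fine_steps=29):
    """Coarse search over `coarse_range` um, then fine search around the best plane."""
    coarse_zs = z_center + np.linspace(-coarse_range / 2, coarse_range / 2, coarse_steps)
    z_best = max(coarse_zs, key=lambda z: sharpness(acquire_at_z(z)))
    fine_zs = z_best + np.linspace(-fine_range / 2, fine_range / 2, fine_steps)
    return max(fine_zs, key=lambda z: sharpness(acquire_at_z(z)))
```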

After the autofluorescence imaging of each tissue section, the H&E histochemical staining was performed by UCLA TPCL. These stained slides were then digitally scanned using a brightfield scanning microscope (Leica Biosystems Aperio AT2), which were used as ground truth images.

4.2. Image Preprocessing and Coregistration

To train the network through supervised learning, matching pairs of images must be obtained before and after the histochemical staining. To do this, the in-focus autofluorescence images of an unlabeled tissue section were coregistered to brightfield images of the same tissue section after it was histochemically stained. This image coregistration was done through a combination of coarse and fine matching steps that were used to progressively improve the alignment until subpixel level accuracy was achieved, following the process reported by Rivenson et al. [2]. In the coarse image registration, a cross-correlation-based method was first used to extract the most similar portions in the stained images matching the autofluorescence images. Next, multimodal image registration [33] between the extracted histochemically stained images and the autofluorescence images resulted in an affine transformation, which was applied to the extracted stained images to correct any changes in size or rotation. To achieve pixel-level coregistration accuracy, a fine matching step using an elastic pyramidal registration algorithm [34, 35] was implemented. Since this step relies upon local-correlation-based matching, an initial rough virtual staining network was first applied to the autofluorescence images. These roughly stained images were then coregistered to the brightfield images of the histochemically stained tissue using the elastic pyramidal registration algorithm.
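As an illustrative stand-in for the cross-correlation-based coarse matching step (not the exact implementation used in the paper), a phase-correlation sketch using scikit-image is shown below; it assumes the stained brightfield image has already been converted to grayscale and resampled to the same grid and size as the autofluorescence image.

```python
# Coarse, cross-correlation-based matching between the autofluorescence image and
# a same-sized grayscale version of the stained brightfield image. The elastic
# pyramidal fine registration is not shown here.
from skimage.registration import phase_cross_correlation

def coarse_match(autofluorescence, stained_gray):
    """Estimate the (row, col) shift that aligns the stained image to the AF image."""
    shift, error, _ = phase_cross_correlation(autofluorescence, stained_gray)
    return shift  # apply, e.g., with scipy.ndimage.shift before affine/elastic refinement
```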

Before feeding the aligned images into the neural networks, several preprocessing steps were applied to the images. For the Deep-R network, each pair of input and target autofluorescence images was normalized to have zero mean and unit variance. The same normalization was also applied to the input autofluorescence images of the virtual staining network. The histochemically stained images (ground truth) were converted to the YCbCr color space before being fed into the virtual staining network as target. For both the Deep-R and virtual staining networks, all image pairs were randomly partitioned into patches of pixels and then augmented eight times by random flipping and rotations during training.
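A minimal preprocessing sketch under these conventions is given below; the patch size used in the paper is not restated here, and the helper names are illustrative.

```python
# Preprocessing sketch: per-channel normalization of the autofluorescence input,
# YCbCr conversion of the H&E ground truth, and the 8x flip/rotation augmentation.
import numpy as np
from skimage.color import rgb2ycbcr

def normalize_af(af):
    """Per-channel zero-mean, unit-variance normalization of an (H, W, 2) AF image."""
    return (af - af.mean(axis=(0, 1))) / (af.std(axis=(0, 1)) + 1e-8)

def eightfold_augment(patch):
    """Return the 8 flip/rotation variants of an (H, W, C) patch."""
    variants = []
    for k in range(4):                       # 0, 90, 180, 270 degree rotations
        rotated = np.rot90(patch, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # mirrored version of each rotation
    return variants

def make_training_pair(af, he_rgb):
    """Network input (normalized AF) and target (YCbCr of the registered H&E image)."""
    return normalize_af(af), rgb2ycbcr(he_rgb)
```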

4.3. Network Architecture, Training, and Validation

To build the virtual staining network, we used a GAN [36] architecture (see Figure 5(a)), which is composed of two deep neural networks: a generator and a discriminator. The generator network follows a U-net [37] structure, consisting of four downsampling blocks with residual connections and four upsampling blocks. Each downsampling block comprises three convolution layers and their activation functions, which double the number of channels. An average pooling layer with a stride and kernel size of two follows these convolution layers. The upsampling blocks first 2× bilinearly resize the tensors and then use three convolution layers with activation functions to reduce the number of channels by a factor of four. Skip connections between the downsampling and the upsampling layers at the same level allow features at various scales to be learned.
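A compact Keras sketch that broadly follows this description is given below; the kernel sizes, activation functions, and the exact form of the residual connections are assumptions rather than details taken from the paper.

```python
# Sketch of a U-net-style generator: 4 down blocks (3 convs + residual, average
# pooling) and 4 up blocks (2x bilinear upsampling + 3 convs), with skip connections.
import tensorflow as tf
from tensorflow.keras import layers

def down_block(x, channels):
    """Three conv layers at the block's channel count, with a simple residual connection."""
    shortcut = layers.Conv2D(channels, 1, padding="same")(x)   # match channels for the residual
    for _ in range(3):
        x = layers.Conv2D(channels, 3, padding="same")(x)
        x = layers.LeakyReLU(0.1)(x)
    x = layers.Add()([x, shortcut])
    return x, layers.AveragePooling2D(pool_size=2, strides=2)(x)   # (skip tensor, downsampled)

def up_block(x, skip, channels):
    """2x bilinear upsampling, skip concatenation, then three conv layers."""
    x = layers.UpSampling2D(size=2, interpolation="bilinear")(x)
    x = layers.Concatenate()([x, skip])
    for _ in range(3):
        x = layers.Conv2D(channels, 3, padding="same")(x)
        x = layers.LeakyReLU(0.1)(x)
    return x

def build_staining_generator(input_channels=2, base_channels=64):
    inp = tf.keras.Input(shape=(None, None, input_channels))   # DAPI + TxRed channels
    skips, x = [], inp
    for level in range(4):                                      # downsampling path
        skip, x = down_block(x, base_channels * 2 ** level)
        skips.append(skip)
    x = layers.Conv2D(base_channels * 16, 3, padding="same")(x) # bottleneck (assumed)
    x = layers.LeakyReLU(0.1)(x)
    for level in reversed(range(4)):                            # upsampling path
        x = up_block(x, skips[level], base_channels * 2 ** level)
    out = layers.Conv2D(3, 1, padding="same")(x)                # 3-channel (YCbCr) output
    return tf.keras.Model(inp, out)
```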

The input of the discriminator network was either the virtually stained images from the generator or the histochemically stained ground truth images. The discriminator contains six convolution blocks, each of which consists of two convolution layers that double the number of channels and has a stride of two. These six blocks were followed by a global pooling layer and two dense layers to generate a scalar after a sigmoid activation function.
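A matching sketch of the discriminator, with the same caveats (kernel sizes, activations, and the dense-layer width are assumptions):

```python
# Sketch of the discriminator: six blocks of two conv layers (the second strided),
# channel doubling per block, global pooling, two dense layers, and a sigmoid output.
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(input_channels=3, base_channels=64):
    inp = tf.keras.Input(shape=(None, None, input_channels))   # stained image (real or generated)
    x, channels = inp, base_channels
    for _ in range(6):
        x = layers.Conv2D(channels, 3, padding="same")(x)
        x = layers.LeakyReLU(0.1)(x)
        x = layers.Conv2D(channels, 3, strides=2, padding="same")(x)  # downsample by 2
        x = layers.LeakyReLU(0.1)(x)
        channels *= 2                                           # double the channels per block
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(128)(x)                                    # width chosen arbitrarily for the sketch
    x = layers.LeakyReLU(0.1)(x)
    out = layers.Dense(1, activation="sigmoid")(x)              # scalar "real vs. fake" score
    return tf.keras.Model(inp, out)
```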

During the training phase, the virtual staining network iteratively minimizes the loss functions of the generator and discriminator networks, defined as

$$\ell_{\mathrm{generator}}^{\mathrm{VS}} = \mathrm{MAE}\{z_{\mathrm{label}}, G_{\mathrm{VS}}(x_{\text{in-focus}})\} + \lambda \cdot \mathrm{TV}\{G_{\mathrm{VS}}(x_{\text{in-focus}})\} + \alpha \cdot \ell_{\mathrm{adv}},$$

$$\ell_{\mathrm{discriminator}}^{\mathrm{VS}} = D_{\mathrm{VS}}\big(G_{\mathrm{VS}}(x_{\text{in-focus}})\big)^2 + \big(1 - D_{\mathrm{VS}}(z_{\mathrm{label}})\big)^2,$$

where $D_{\mathrm{VS}}(\cdot)$ and $G_{\mathrm{VS}}(\cdot)$ refer to the outputs of the discriminator and generator for the virtual staining network, respectively. $x_{\text{in-focus}}$ represents the in-focus autofluorescence images, and $z_{\mathrm{label}}$ denotes the brightfield counterparts of the histochemically stained tissue (ground truth). In these loss functions, the total variation (TV) and MAE loss terms are used as structural regularization terms to ensure that highly accurate virtually stained images are generated. The MAE loss and TV operator are defined as

$$\mathrm{MAE}\{z, G(x)\} = \frac{1}{P \times Q} \sum_{p=1}^{P} \sum_{q=1}^{Q} \big| z_{p,q} - G(x)_{p,q} \big|,$$

$$\mathrm{TV}\{G(x)\} = \sum_{p} \sum_{q} \Big( \big| G(x)_{p+1,q} - G(x)_{p,q} \big| + \big| G(x)_{p,q+1} - G(x)_{p,q} \big| \Big),$$

where $P$ and $Q$ represent the number of vertical and horizontal pixels of the image patch, and $p$ and $q$ represent the pixel locations. The adversarial loss is defined as

$$\ell_{\mathrm{adv}} = \big(1 - D_{\mathrm{VS}}\big(G_{\mathrm{VS}}(x_{\text{in-focus}})\big)\big)^2.$$

The regularization parameters ($\lambda$ and $\alpha$) were empirically set to 2,000 and 0.02, respectively.
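A TensorFlow sketch of these loss terms is given below; the pairing of the two regularization weights with the TV and adversarial terms follows the reconstruction above and should be treated as an assumption, and the model handles are hypothetical.

```python
# Sketch of the virtual staining GAN losses (MAE + weighted TV + weighted adversarial).
import tensorflow as tf

def vs_generator_loss(x_in_focus, z_label, generator, discriminator,
                      reg_tv=2000.0, reg_adv=0.02):
    output = generator(x_in_focus, training=True)
    mae = tf.reduce_mean(tf.abs(z_label - output))                     # pixel-wise MAE
    tv = tf.reduce_mean(tf.image.total_variation(output))              # total variation per image
    adv = tf.reduce_mean((1.0 - discriminator(output, training=True)) ** 2)
    return mae + reg_tv * tv + reg_adv * adv

def vs_discriminator_loss(x_in_focus, z_label, generator, discriminator):
    fake = discriminator(generator(x_in_focus, training=True), training=True)
    real = discriminator(z_label, training=True)
    return tf.reduce_mean(fake ** 2 + (1.0 - real) ** 2)
```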

For the virtual staining network, both the generator and the discriminator were trained using the Adam [38] optimizer, with separate initial learning rates for the two networks. The number of channels for the first downsampling block of the virtual staining generator and discriminator was set to 64. A batch size of 4 was used during the training phase, and the training process took ~24 hours and converged after ~40,000 iterations (equivalent to ~30 epochs). Also, see the computer implementation details listed below.

For the Deep-R network, a GAN structure similar to that of the virtual staining network was used, but several modifications were made to the generator architecture. The Deep-R generator adopts five downsampling and five upsampling blocks, and each downsampling or upsampling block contains two convolution layers in conjunction with a residual connection. In the downsampling path, instead of average pooling layers, the Deep-R generator adopts max-pooling layers. For the objective function of the generator training, we use an adversarial loss from the discriminator, a perceptual loss [39], and a style loss based on high-level image features, in addition to the MAE loss and the multiscale structural similarity (MSSSIM) loss between the Deep-R output and the ground truth in-focus images, as shown in Figure 5(b). The generator and discriminator losses of the Deep-R network are defined as

$$\ell_{\mathrm{generator}}^{\text{Deep-R}} = \beta_1 \cdot \ell_{\mathrm{adv}} + \beta_2 \cdot \mathrm{MAE}\{x_{\text{in-focus}}, G_{\mathrm{DR}}(x_{\mathrm{defocus}})\} + \beta_3 \cdot \ell_{\mathrm{MSSSIM}} + \beta_4 \cdot \ell_{\mathrm{perceptual}} + \beta_5 \cdot \ell_{\mathrm{style}},$$

$$\ell_{\mathrm{discriminator}}^{\text{Deep-R}} = D_{\mathrm{DR}}\big(G_{\mathrm{DR}}(x_{\mathrm{defocus}})\big)^2 + \big(1 - D_{\mathrm{DR}}(x_{\text{in-focus}})\big)^2,$$

where $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, and $\beta_5$ are training coefficients empirically set as 300, 2,000, 500, 100, and 100, respectively. $D_{\mathrm{DR}}(\cdot)$ and $G_{\mathrm{DR}}(\cdot)$ refer to the discriminator and generator outputs for the Deep-R network, respectively. $x_{\mathrm{defocus}}$ denotes the autofluorescence images taken from a z-stack ranging from −2 to 2 μm with an axial step size of 0.5 μm; same as before, $x_{\text{in-focus}}$ represents the in-focus autofluorescence images. The adversarial loss and MAE loss are defined as before. The perceptual loss is defined as

$$\ell_{\mathrm{perceptual}} = \sum_{k} \mathrm{MAE}\big\{ F_k(x_{\text{in-focus}}),\, F_k\big(G_{\mathrm{DR}}(x_{\mathrm{defocus}})\big) \big\},$$

where $F_k(\cdot)$ represents the output feature map at the $k$-th convolutional block of the discriminator. The style loss is defined as

$$\ell_{\mathrm{style}} = \frac{1}{N} \sum_{n=1}^{N} \mathrm{MAE}\big\{ V_n(x_{\text{in-focus}}),\, V_n\big(G_{\mathrm{DR}}(x_{\mathrm{defocus}})\big) \big\},$$

where $V_n(\cdot)$ stands for the output feature map at the $n$-th downsampling block of the trained virtual staining network (see Figure 5(b)). In the training of the Deep-R network, the refocused output images from the Deep-R generator and the in-focus ground truth images were fed into the trained virtual staining network separately. The output feature maps at each downsampling block for these two inputs were used to compute MAE losses at the different feature levels (see Figure 5(b)), which were then summed up and averaged to generate the style loss.
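A sketch of the perceptual and style loss terms, assuming hypothetical helper models (discriminator_features and staining_features) that return the list of intermediate feature maps of the Deep-R discriminator and of the frozen, trained virtual staining generator, respectively:

```python
# Feature-space MAE losses used to train Deep-R; the style loss is the term that
# couples Deep-R to the trained (frozen) virtual staining network.
import tensorflow as tf

def feature_mae(features_a, features_b):
    """Mean absolute error between two lists of feature maps, averaged over levels."""
    per_level = [tf.reduce_mean(tf.abs(a - b)) for a, b in zip(features_a, features_b)]
    return tf.add_n(per_level) / len(per_level)

def perceptual_loss(x_in_focus, refocused, discriminator_features):
    return feature_mae(discriminator_features(x_in_focus),
                       discriminator_features(refocused))

def style_loss(x_in_focus, refocused, staining_features):
    return feature_mae(staining_features(x_in_focus),
                       staining_features(refocused))
```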

The MSSSIM loss, $\ell_{\mathrm{MSSSIM}}$, is based on the multiscale structural similarity index, defined as

$$\mathrm{MSSSIM}(x, y) = \left[ \frac{2\mu_{x_M}\mu_{y_M} + C_1}{\mu_{x_M}^2 + \mu_{y_M}^2 + C_1} \right]^{\alpha_M} \prod_{j=1}^{M} \left[ \frac{2\sigma_{x_j}\sigma_{y_j} + C_2}{\sigma_{x_j}^2 + \sigma_{y_j}^2 + C_2} \right]^{\beta_j} \left[ \frac{\sigma_{x_j y_j} + C_3}{\sigma_{x_j}\sigma_{y_j} + C_3} \right]^{\gamma_j},$$

where $x_j$ and $y_j$ are the distorted (or recovered/inferred) and reference images downsampled $2^{j-1}$ times, respectively; $\mu_{x_j}$ and $\mu_{y_j}$ are the averages of $x_j$ and $y_j$; $\sigma_{x_j}^2$ and $\sigma_{y_j}^2$ are the variances of $x_j$ and $y_j$, respectively; $\sigma_{x_j y_j}$ is the covariance of $x_j$ and $y_j$; $C_1$, $C_2$, and $C_3$ are the constants used to stabilize the division with a small denominator; and $\alpha_M$, $\beta_j$, and $\gamma_j$ are exponents used to adjust the relative importance/weights of the different components. The MSSSIM function is implemented using the TensorFlow function tf.image.ssim_multiscale with its default parameter settings.
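A minimal usage example of the TensorFlow function mentioned above, assuming 4-D image tensors (batch, height, width, channels) scaled to [0, 1]; expressing the loss as 1 − MS-SSIM is a common convention and not necessarily the exact one used in the paper.

```python
# Usage example of tf.image.ssim_multiscale as a training loss term.
import tensorflow as tf

def msssim_loss(in_focus, refocused):
    # tf.image.ssim_multiscale returns a per-image similarity in [0, 1];
    # 1 - MS-SSIM turns it into a quantity to be minimized.
    return 1.0 - tf.reduce_mean(tf.image.ssim_multiscale(in_focus, refocused, max_val=1.0))
```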

The generator and discriminator of the Deep-R network also used Adam optimizers, with separate initial learning rates for the two networks. The number of channels for the first downsampling block of the Deep-R generator and discriminator was set to 32. We used a batch size of 5 in the training phase, and the training process took ~72 hours and converged after ~100,000 iterations (equivalent to ~10 epochs). After the training of the Deep-R and the virtual staining networks, the blind inference process of the cascaded networks on a single input image FOV (including the DAPI and TxRed channels) takes ~0.1 s (see the computer implementation details below).

4.4. Quantitative Image Metrics

PSNR is defined as

$$\mathrm{PSNR} = 10 \cdot \log_{10}\!\left( \frac{\mathrm{MAX}_I^2}{\mathrm{MSE}} \right),$$

where $\mathrm{MAX}_I$ is the maximum possible value of the ground truth image. MSE is the mean squared error between the two images being compared, defined as

$$\mathrm{MSE} = \frac{1}{P \times Q} \sum_{p=1}^{P} \sum_{q=1}^{Q} \big( I_{p,q} - K_{p,q} \big)^2,$$

where $I$ is the target image and $K$ is the image that is compared with the target image.

SSIM is defined as

$$\mathrm{SSIM}(x, y) = \frac{\big(2\mu_x \mu_y + C_1\big)\big(2\sigma_{xy} + C_2\big)}{\big(\mu_x^2 + \mu_y^2 + C_1\big)\big(\sigma_x^2 + \sigma_y^2 + C_2\big)},$$

where $x$ and $y$ are the two images being compared. $\mu_x$ and $\mu_y$ are the mean values of $x$ and $y$, respectively. $\sigma_x$ and $\sigma_y$ are the standard deviations of $x$ and $y$, respectively. $\sigma_{xy}$ is the cross-covariance of $x$ and $y$. $C_1$ and $C_2$ are constants that are used to avoid division by zero.
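Both metrics can be computed with scikit-image's reference implementations (version ≥ 0.19 for the channel_axis argument); a short sketch:

```python
# PSNR and SSIM between a virtually stained image and the framework-3 baseline,
# assuming RGB arrays scaled to [0, 1].
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_staining(virtual_rgb, baseline_rgb):
    psnr = peak_signal_noise_ratio(baseline_rgb, virtual_rgb, data_range=1.0)
    ssim = structural_similarity(baseline_rgb, virtual_rgb, data_range=1.0, channel_axis=-1)
    return psnr, ssim
```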

4.5. Statistical Analysis

Paired upper-tailed t tests were used to determine whether statistically significant improvements were made when using the fast, defocused virtual staining framework. For each YCbCr color channel and each axial defocus distance, the paired upper-tailed t test was performed across the 562 unique FOVs using the absolute color differences between the virtual staining results of framework 1 and framework 3 (termed c1: comparison 1) and the absolute color differences between the virtual staining results of framework 2 and framework 3 (termed c2: comparison 2). The null hypothesis for the paired upper-tailed t test is that c1 and c2 have the same mean. We used a 0.05 significance level to reject the null hypothesis in favor of the alternative upper-tailed hypothesis that c2 has a smaller mean than c1, indicating that framework 2 has a statistically significant improvement over framework 1 (see Figure 4).
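A sketch of this test using SciPy (version ≥ 1.6 for the alternative keyword); c1 and c2 are the paired arrays of absolute color differences across the 562 FOVs for one color channel at one defocus distance.

```python
# Paired one-sided (upper-tailed) t test: H0 mean(c1) = mean(c2), H1 mean(c1) > mean(c2).
from scipy.stats import ttest_rel

def paired_upper_tailed_test(c1, c2, alpha=0.05):
    """Test whether c2 (framework 2 vs. 3) has a smaller mean than c1 (framework 1 vs. 3)."""
    result = ttest_rel(c1, c2, alternative="greater")   # tests mean(c1 - c2) > 0
    return result.pvalue, result.pvalue < alpha         # reject the null hypothesis if p < alpha
```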

4.6. Implementation Details

The image preprocessing was implemented in MATLAB using version R2018b (MathWorks). The neural networks were implemented using Python version 3.9.0 and TensorFlow 2.1.0. The training was performed on a desktop computer with an Intel Xeon W-2265 central processing unit (CPU), 256 GB random-access memory (RAM), and an Nvidia GeForce RTX 2080 TI graphics processing unit (GPU).

Data Availability

Data supporting the results demonstrated by this study are available within the main text and the Supplementary Information. Additional data can be requested from the corresponding author.

Conflicts of Interest

Y.Z., L.H., K.d.H., and A.O. have pending patent applications related to the work reported in the manuscript. A.O. serves as a cofounder of Pictor Labs which works on virtual staining of tissue.

Authors’ Contributions

Y. Zhang, L. Huang, K. Cheng, K. de Haan, Y. Li, B. Bai, and A. Ozcan contributed to the algorithms and analysis. Y. Zhang, L. Huang, and T. Liu performed the microscopic image acquisition experiments. A. Ozcan, Y. Zhang, and L. Huang prepared the manuscript; all the authors contributed to the manuscript editing. A. Ozcan initiated the presented concept and supervised the research. Yijie Zhang and Luzhe Huang contributed equally to this work.

Acknowledgments

The authors acknowledge the support of the NSF Biophotonics Program.

Supplementary Materials

Figure S1: Comparison of the YCbCr color difference as a function of the axial image defocus distance.

References

  1. M. R. Wick, “Histochemistry as a tool in morphological analysis: a historical review,” Annals of Diagnostic Pathology, vol. 16, no. 1, pp. 71–78, 2012.
  2. Y. Rivenson, H. Wang, Z. Wei et al., “Virtual histological staining of unlabelled tissue-autofluorescence images via deep learning,” Nature Biomedical Engineering, vol. 3, no. 6, pp. 466–477, 2019.
  3. Y. Zhang, K. de Haan, Y. Rivenson, J. Li, A. Delis, and A. Ozcan, “Digital synthesis of histological stains using micro-structured and multiplexed virtual staining of label-free tissue,” Light: Science & Applications, vol. 9, no. 1, p. 78, 2020.
  4. B. Bai, H. Wang, Y. Li et al., “Label-free virtual HER2 immunohistochemical staining of breast tissue using deep learning,” 2021, https://arxiv.org/abs/2112.05240.
  5. N. Bayramoglu, M. Kaakinen, L. Eklund, and J. Heikkila, “Towards virtual H&E staining of hyperspectral lung histology images using conditional generative adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 64–71, Venice, Italy, 2017.
  6. Y. Rivenson, T. Liu, Z. Wei, Y. Zhang, K. de Haan, and A. Ozcan, “PhaseStain: the digital staining of label-free quantitative phase microscopy images using deep learning,” Light: Science & Applications, vol. 8, no. 1, p. 23, 2019.
  7. J. Li, J. Garfinkel, X. Zhang et al., “Biopsy-free in vivo virtual histology of skin using deep learning,” Light: Science & Applications, vol. 10, no. 1, p. 233, 2021.
  8. L. Kang, X. Li, Y. Zhang, and T. T. Wong, “Deep learning enables ultraviolet photoacoustic microscopy based histological imaging with near real-time virtual staining,” Photoacoustics, vol. 25, article 100308, 2022.
  9. P. Pradhan, T. Meyer, M. Vieth et al., “Computational tissue staining of non-linear multimodal imaging using supervised and unsupervised deep learning,” Biomedical Optics Express, vol. 12, no. 4, pp. 2280–2298, 2021.
  10. Z. Chen, W. Yu, I. H. Wong, and T. T. Wong, “Deep-learning-assisted microscopy with ultraviolet surface excitation for rapid slide-free histological imaging,” Biomedical Optics Express, vol. 12, no. 9, pp. 5920–5938, 2021.
  11. D. Li, H. Hui, Y. Zhang et al., “Deep learning for virtual histological staining of bright-field microscopic images of unlabeled carotid artery tissue,” Molecular Imaging and Biology, vol. 22, no. 5, pp. 1301–1309, 2020.
  12. Y. Rivenson, K. de Haan, W. D. Wallace, and A. Ozcan, “Emerging advances to transform histopathology using virtual staining,” BME Frontiers, vol. 2020, article 9647163, 11 pages, 2020.
  13. F. Shen, L. Hodgson, and K. Hahn, “Digital autofocus methods for automated microscopy,” Methods in Enzymology, vol. 414, pp. 620–632, 2006.
  14. Z. Bian, C. Guo, S. Jiang et al., “Autofocusing technologies for whole slide imaging and automated microscopy,” Journal of Biophotonics, vol. 13, no. 12, article e202000227, 2020.
  15. R. Redondo, G. Bueno, J. C. Valdiviezo et al., “Autofocus evaluation for brightfield microscopy pathology,” Journal of Biomedical Optics, vol. 17, no. 3, article 036008, 2012.
  16. Y. Sun, S. Duthaler, and B. J. Nelson, “Autofocusing in computer microscopy: selecting the optimal focus algorithm,” Microscopy Research and Technique, vol. 65, no. 3, pp. 139–149, 2004.
  17. L. Firestone, K. Cook, K. Culp, N. Talsania, and K. Preston Jr., “Comparison of autofocus methods for automated microscopy,” Cytometry: The Journal of the International Society for Analytical Cytology, vol. 12, no. 3, pp. 195–206, 1991.
  18. M. A. Bopp, Y. Jia, L. Li, R. J. Cogdell, and R. M. Hochstrasser, “Fluorescence and photobleaching dynamics of single light-harvesting complexes,” Proceedings of the National Academy of Sciences of the United States of America, vol. 94, no. 20, pp. 10630–10635, 1997.
  19. J. Liao, Y. Jiang, Z. Bian et al., “Rapid focus map surveying for whole slide imaging with continuous sample motion,” Optics Letters, vol. 42, no. 17, pp. 3379–3382, 2017.
  20. J. Kang, I. Song, H. Kim et al., “Rapid tissue histology using multichannel confocal fluorescence microscopy with focus tracking,” Quantitative Imaging in Medicine and Surgery, vol. 8, no. 9, pp. 884–893, 2018.
  21. M. Bathe-Peters, P. Annibale, and M. J. Lohse, “All-optical microscope autofocus based on an electrically tunable lens and a totally internally reflected IR laser,” Optics Express, vol. 26, no. 3, pp. 2359–2368, 2018.
  22. L. Silvestri, M. C. Müllenbroich, I. Costantini et al., “Universal autofocus for quantitative volumetric microscopy of whole mouse brains,” Nature Methods, vol. 18, no. 8, pp. 953–958, 2021.
  23. T. R. Dastidar and R. Ethirajan, “Whole slide imaging system using deep learning-based automated focusing,” Biomedical Optics Express, vol. 11, no. 1, pp. 480–491, 2020.
  24. H. Pinkard, Z. Phillips, A. Babakhani, D. A. Fletcher, and L. Waller, “Deep learning for single-shot autofocus microscopy,” Optica, vol. 6, no. 6, pp. 794–797, 2019.
  25. S. Jiang, J. Liao, Z. Bian, K. Guo, Y. Zhang, and G. Zheng, “Transform- and multi-domain deep learning for single-frame rapid autofocusing in whole slide imaging,” Biomedical Optics Express, vol. 9, no. 4, pp. 1601–1612, 2018.
  26. C. Li, A. Moatti, X. Zhang, H. T. Ghashghaei, and A. Greenbaum, “Deep learning-based autofocus method enhances image quality in light-sheet fluorescence microscopy,” Biomedical Optics Express, vol. 12, no. 8, pp. 5214–5226, 2021.
  27. Y. Luo, L. Huang, Y. Rivenson, and A. Ozcan, “Single-shot autofocusing of microscopy images using deep learning,” ACS Photonics, vol. 8, no. 2, pp. 625–638, 2021.
  28. Y. Wu, Y. Rivenson, H. Wang et al., “Three-dimensional virtual refocusing of fluorescence microscopy images using deep learning,” Nature Methods, vol. 16, no. 12, pp. 1323–1331, 2019.
  29. X. Yang, L. Huang, Y. Luo et al., “Deep-learning-based virtual refocusing of images using an engineered point-spread function,” ACS Photonics, vol. 8, no. 7, pp. 2174–2182, 2021.
  30. L. Huang, H. Chen, Y. Luo, Y. Rivenson, and A. Ozcan, “Recurrent neural network-based volumetric fluorescence microscopy,” Light: Science & Applications, vol. 10, no. 1, p. 62, 2021.
  31. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
  32. J. K. Chan, “The wonderful colors of the hematoxylin–eosin stain in diagnostic surgical pathology,” International Journal of Surgical Pathology, vol. 22, no. 1, pp. 12–32, 2014.
  33. “Register multimodal MRI images - MATLAB & Simulink Example,” June 2022, https://www.mathworks.com/help/images/registering-multimodal-mri-images.html.
  34. Y. Rivenson, H. Ceylan Koydemir, H. Wang et al., “Deep learning enhanced mobile-phone microscopy,” ACS Photonics, vol. 5, no. 6, pp. 2354–2364, 2018.
  35. H. Wang, Y. Rivenson, Y. Jin et al., “Deep learning enables cross-modality super-resolution in fluorescence microscopy,” Nature Methods, vol. 16, no. 1, pp. 103–110, 2019.
  36. I. Goodfellow, J. Pouget-Abadie, M. Mirza et al., “Generative adversarial networks,” Communications of the ACM, vol. 63, no. 11, pp. 139–144, 2020.
  37. O. Ronneberger, P. Fischer, and T. Brox, “U-net: convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241, Springer, 2015.
  38. D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” 2014, https://arxiv.org/abs/1412.6980.
  39. J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision, pp. 694–711, Springer, 2016.

Copyright © 2022 Yijie Zhang et al. Exclusive Licensee Zhejiang Lab, China. Distributed under a Creative Commons Attribution License (CC BY 4.0).
