Research Article | Open Access
Talukder Zaki Jubery, Clayton N. Carley, Arti Singh, Soumik Sarkar, Baskar Ganapathysubramanian, Asheesh K. Singh, "Using Machine Learning to Develop a Fully Automated Soybean Nodule Acquisition Pipeline (SNAP)", Plant Phenomics, vol. 2021, Article ID 9834746, 12 pages, 2021. https://doi.org/10.34133/2021/9834746
Using Machine Learning to Develop a Fully Automated Soybean Nodule Acquisition Pipeline (SNAP)
Nodules form on plant roots through the symbiotic relationship between soybean (Glycine max L. Merr.) roots and bacteria (Bradyrhizobium japonicum) and are an important structure where atmospheric nitrogen (N2) is fixed into bioavailable ammonia (NH3) for plant growth and development. Quantifying nodules on soybean roots is laborious and tedious; therefore, assessment is frequently done on a numerical scale that allows rapid phenotyping but is less informative and suffers from subjectivity. We report the Soybean Nodule Acquisition Pipeline (SNAP) for nodule quantification, which combines the RetinaNet and UNet deep learning architectures for object (i.e., nodule) detection and segmentation. SNAP was built using data from 691 unique roots from diverse soybean genotypes, vegetative growth stages, and field locations and has a good model fit. SNAP reduces the human labor and inconsistencies of counting nodules, while acquiring quantifiable traits related to nodule growth, location, and distribution on roots. The ability of SNAP to phenotype nodules on soybean roots at higher throughput enables researchers to assess genetic and environmental factors, and their interactions, on nodulation from an early developmental stage. The application of SNAP in research and breeding pipelines may lead to more nitrogen-use-efficient soybean and other legume cultivars, as well as enhanced insight into the plant-Bradyrhizobium relationship.
The dynamic and symbiotic relationship between soybean (Glycine max L. Merr.) and bacteria (Bradyrhizobium japonicum) is largely considered mutually beneficial. In a specialized root structure known as a nodule, the bacteria fix atmospheric nitrogen (N2) into bioavailable ammonia (NH3), which the host soybean plant uses to help meet its nitrogen needs. The bacteria, in turn, acquire carbon from the host [2, 3]. Nitrogen (N) is critical for building amino acids, for vegetative growth, and for protein accumulation in seed. The number of nodules on soybean roots can range from only a few to several hundred per plant. Figure 1 shows diverse soybean genotypes at the V5 vegetative growth stage with varying numbers of nodules; nodule number varies among genotypes and along the taproot and secondary roots of the plant. Comparisons of nodulating and nonnodulating soybean isolines show that there can be a sixfold increase in nitrogen in nodulating plants at later growth stages, demonstrating the impact of nodulation. However, hypernodulating mutants have been shown to be inefficient, with reduced biomass and yield.
Because of the importance of nodulation to crop health and yield in legume crops, numerous studies have investigated nodule distributions on roots [7–11]. To study these nodule-plant relationships, researchers have traditionally used qualitative ratings of the entire root, only the taproot, the secondary roots, or combinations thereof. For the world's primary row crops, seed yield is one of the major breeding objectives. However, for N-fixing legume crops, much work is still required to develop and exploit an optimum balance among host genotype, nodulation amount, applied N rates, and positioning of nodules on the root [12–15]. Therefore, plant breeders and researchers are motivated to continue exploring germplasm and plant-bacteria interactions at multiple levels (cellular, plant, crop, and ecosystem) [16–18]. The need to understand nodule growth and development has led to numerous nodulation studies focused on the rates and positioning of biological products, fertilizer application patterns, climate, soil types, and even herbicides, while evaluating N management decisions, including runoff and conservation from the previous year's production, to help mitigate environmental damage and hypoxia zones. Plant breeders can address some of these challenges by developing more efficient and environmentally responsive N-fixing varieties that positively impact plant growth and development. For both researchers and producers, a key technological limitation is the inability to count nodules and quantify their amount, shape, and location, because this arduous phenotyping task is very time-consuming and can be technically challenging.
Because studies of nodule count and health are labor-intensive, researchers have traditionally used subjective qualitative ratings based on taproot nodulation or on the entire root, including secondary root branches. Later, researchers developed more descriptive scales to assist with nodule counting and rating. These visual scales include numerical qualitative ratings from 1 to 10, where a rating of “1” represents few nodules on the taproot and a rating of “10” represents much of the taproot nodulated, as in Hiltbold et al. In the next iteration of rating, nodule quantification included all roots (tap and lateral). There have been attempts to use both nodule count and size, as well as simpler numerical scales, to make the rating system more informative [27, 28]. However, the lack of automation has hindered large experiments that use high-throughput phenotyping to assess a more exhaustive number of genotypes. This often forces nodule counts into qualitative categories, which limits the more in-depth studies that are possible with quantitative evaluations.
Few attempts have been made to quantify nodule number and size in semicontrolled and especially field environments, because of the sheer volume of labor required to phenotype root nodules at a reasonable scale and time. This severely limits the number of experimental units that can be examined. There have been some efforts to identify nodulation patterns in legume roots grown in controlled or semicontrolled environments using traditional computer vision techniques [14, 29, 30]. These techniques involve simple thresholding-based segmentation of the root from the background using either color alone or color and texture together, followed by detection of nodules using predefined morphological and color characteristics or a predefined outline/shape of the nodules. These techniques were not robust enough to detect all nodules in an image, and user input is required where automatic detection fails. Although semiautonomous counting methods have been developed, fully automated phenotyping is still unavailable, necessitating a reliance on qualitative ratings.
With the advances in phenotyping methods for plant organs [31–33] and plant stress traits [34–38], machine learning methods are an attractive solution to advance nodule phenotyping. Machine learning (ML) has been used in numerous plant trait phenotyping studies to make trait acquisition more feasible and consistent, for example, in disease identification [24, 40–42], abiotic stress [37, 43], small object detection for soybean cyst nematode (SCN) eggs, and yield-related traits [45, 46]. Furthermore, ML methods have helped develop an end-to-end phenotyping pipeline to study root system architecture traits in soybean [47, 48].
Given the success of ML for trait phenotyping, we explored ML methods to develop an automated pipeline that identifies and characterizes nodules and root nodule distributions while reducing manual processing and counting. The objectives of this work are (1) to develop an open-source analysis pipeline that can automatically detect nodules in soybean root images across diverse genotypes and growth stages and (2) to provide metrics including the total number of nodules, nodule sizes, and nodule distribution along the taproot. We present a novel Soybean Nodule Acquisition Pipeline (SNAP) to achieve these objectives.
2. Materials and Methods
2.1. Plant Materials, Root Excavation, and Imaging Protocols
The dataset for developing SNAP was generated by growing unique soybean genotypes in diverse environments and collecting data at several time points. For the evaluation of SNAP, 691 images were collected from 7 unique genotypes (CL0J095-4-6, PI 80831, PI 437462A, PI 438103, PI 438133B, PI 471899, and IA3023) in three environments: the Muscatine Island Research Station, Fruitland, IA, in 2018 and 2019 (soil type: Fruitland Coarse Sand), and the Horticulture Research Station, Gilbert, IA, in 2018 (soil type: Clarion loam modified in 1967 to a texture of sandy loam).
Three seeds per experimental unit were planted, and after emergence, two seedlings were cut at the soil surface using a sharp blade to leave one standing plant per plot; therefore, each experimental unit consisted of one plant. Each plot was spaced . Images were collected at three vegetative growth stages: V1, V3, and V5. At the designated growth stage, plants were tagged with barcodes and labeled with identification strips. Soybean roots were extracted using trenching spades from an area 50 cm in diameter and 30 cm deep. Extreme care was taken when digging the soil sample to avoid disrupting the plant roots. The roots were then gently freed from the loose soil by hand, ensuring maximum nodule retention on the roots.
After extraction, the root from each plot was placed in a 5-gallon bucket half full of water to rinse the remaining soil from the sample. After 30 minutes, each root was placed on a blue-painted tray for background consistency. The tray measurements were with a 2 cm lip. To obtain a clear 2D image of each plant root, the imaging trays were half-filled with water, and the roots were gently separated from each other to reduce occlusion and clumping of the roots in each image. Each root placement typically took 2-3 minutes. A glass plate sized to fit the tray was then laid on top of the root in the water to hold it in place, and the tray was slid into an imaging platform customized for this project. The platform was built from aluminum T-slot extrusion frames (80/20 Inc., Columbia City, IN) with two softbox photography lights (Neewer; Shenzhen, China), holding four 70-watt CFL bulbs in total, to provide consistent illumination. A Canon T5i digital SLR camera (lens: EF-S 18-55 mm f/3.5-5.6 IS II) (Canon USA, Inc., Melville, NY) was mounted 55 cm above the imaging plane. See Falk et al. for full details. The camera was tethered to a laptop running Smart Shooter 3 photo capture and camera control software to trigger image capture. Smart Shooter enabled automatic image labeling and naming by reading the tag barcode in each image, reducing human error.
After imaging, roots were dried in paper bags at 60°C for two days. Once thoroughly dry, the roots were weighed to obtain dry weight (grams), and the nodules on each root were removed by hand and counted for use in the ground truth analysis. Trained researchers carefully removed each nodule with tweezers. After removal, a second researcher cross-checked the root for any remaining nodules. The two researchers then each independently counted and recorded the number of nodules. If their counts differed, the sample was recounted, ensuring that all nodules were correctly identified and counted. The removed nodules were then weighed and recorded in grams. After completing and validating a sample, the two-person team moved on to the next sample.
2.2. Deep Learning and Image Processing Workflow
The proposed workflow using deep learning (DL) and image processing is shown in Figure 2. It has two phases: (1) training and (2) evaluation. Two DL networks were trained: (a) a nodule detection network and (b) a taproot detection network. To train these networks, we selected a representative subset of the dataset for annotation using submodular optimization. Nodules were annotated by a human root nodule phenotyping expert, who drew bounding boxes around each nodule on image patches using the VGG Image Annotator (Figures S1–S3). The taproot was annotated on the full root image by tracing a line over it with a Surface pen on a Microsoft Surface computer, using the edit and create tool in Microsoft Photos (Microsoft 2020). In the evaluation phase, we used the trained models to obtain the number of nodules, their size distribution, and their spatial distribution along the taproot.
2.2.1. Representative Sample Selection
We deployed an unsupervised framework based on submodular function optimization to select a representative subset $A$ from the whole dataset $V$ for annotation. A key property of such functions is diminishing returns: the incremental gain in utility from adding a new sample to the subset $A$ decreases as the subset grows. Thus, a small subset can capture most of the representativeness of the full dataset. We used the uncapacitated facility location function as the submodular function:

$$F(A) = \sum_{i \in V} \max_{j \in A} s_{ij} \tag{1}$$

where $s_{ij}$ is a measure of similarity between a pair of samples $i$ and $j$, derived from their Euclidean distance $d_{ij}$. The value of $F(A)$ indicates the representativeness of the subset $A$ with respect to the whole set $V$. By maximizing $F(A)$, we selected the most representative subset $A$, subject to $|A| \le k$, where $k$ is a predefined subset size. We used a simple greedy forward-selection algorithm to maximize $F(A)$. The algorithm starts with $A = \emptyset$ and iteratively adds the sample that maximally increases $F(A)$.
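The greedy forward selection described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes the facility location objective $F(A) = \sum_{i \in V} \max_{j \in A} s_{ij}$ and, as a hypothetical choice, defines similarity as $s_{ij} = d_{\max} - d_{ij}$ so that it is nonnegative and decreases with Euclidean distance. The toy 2D points stand in for the reduced image feature vectors.

```python
import math

def facility_location(selected, sim):
    """F(A): sum over all samples i of the best similarity to any selected j."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)

def greedy_select(features, k):
    """Greedily pick k sample indices that maximize the facility location function."""
    n = len(features)
    dist = [[math.dist(a, b) for b in features] for a in features]
    d_max = max(max(row) for row in dist)
    # Hypothetical similarity: larger when samples are closer, never negative.
    sim = [[d_max - d for d in row] for row in dist]
    selected = []
    for _ in range(min(k, n)):
        current = facility_location(selected, sim)
        best, best_gain = None, -1.0
        for j in range(n):
            if j in selected:
                continue
            gain = facility_location(selected + [j], sim) - current
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
    return selected

# Toy example: two tight clusters of hypothetical 2D feature vectors.
points = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (10.1, 10), (10, 10.1)]
chosen = greedy_select(points, 2)
print(chosen)  # one index from each cluster
```

Because of diminishing returns, once one cluster is covered, adding a second point from it gains almost nothing, so the next pick comes from the uncovered cluster.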
In the framework, each sample is input as a vector. In high-dimensional spaces, Euclidean distance becomes less meaningful. Therefore, we reduced the dimensionality of the input vector by combining a nonlinear transformation (an autoencoder), which balances accurate reconstruction against overfitting the training data, with a linear transformation (principal component analysis (PCA)), which minimizes feature redundancy. First, we downsampled the input image using bilinear interpolation, ensuring no visible distortion of the root. Next, we generated a low-dimensional latent representation using a shallow autoencoder; we then flattened the encoded representation and finally mapped it to 400 principal components that capture approximately 98% of the variance (Figure S4).
2.2.2. Nodule Detection Framework
We approached nodule detection as a dense detection problem, because the root images contain many target nodules. We selected RetinaNet, which performs better than other detection frameworks on dense detection problems owing to its use of “focal loss” as the classification loss function. The focal loss uses a scaling factor to focus training on the sparse set of hard examples (foreground/nodules) and down-weights the contribution of easy (and abundant) examples (background) (Figure S5).
We used the same backbone and subnetworks as Lin et al. The backbone network was based on ResNet50-FPN, and the two subnetworks each consisted of four convolution layers with 256 filters, using rectified linear unit (ReLU) and sigmoid activation functions. Multiple anchors were used at each spatial location to detect nodules of various sizes and aspect ratios. The default anchor configurations in Lin et al. were suitable for detecting objects 32 pixels or larger, so we changed the default configurations and optimized them for our case. We explored two selection strategies: (a) maximizing the overlap between selected anchors and bounding boxes, as in Zlocha et al., using a differential evolution search algorithm, and (b) fitting a normal distribution to the bounding box sizes and aspect ratios and selecting values at equidistant percentiles. For strategy (b), we evaluated 3, 5, and 7 scales and aspect ratios, whereas the default anchor configuration consisted of three scales ($2^{0/3}$, $2^{1/3}$, and $2^{2/3}$) and three aspect ratios (1 : 2, 1 : 1, and 2 : 1).
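Strategy (b), the equidistant-percentile anchor selection, can be sketched with Python's standard `statistics.NormalDist`. This is a hedged illustration under stated assumptions: the percentiles are taken as $(i+1)/(n+1)$, box size is measured as the square root of box area, aspect ratio as height/width, and the base anchor size and box measurements are hypothetical values, not data from the study.

```python
from statistics import NormalDist

def percentile_values(samples, n):
    """Values at n equidistant percentiles of a normal distribution fitted to samples.

    Assumption: percentiles are (i + 1) / (n + 1) for i = 0..n-1,
    e.g. 25%, 50%, 75% when n = 3.
    """
    fitted = NormalDist.from_samples(samples)  # estimates mean and std. dev.
    return [fitted.inv_cdf((i + 1) / (n + 1)) for i in range(n)]

# Hypothetical nodule bounding-box measurements in pixels.
widths = [9, 11, 12, 14, 15, 16, 18, 21]
heights = [10, 10, 13, 13, 16, 15, 20, 19]
sizes = [(w * h) ** 0.5 for w, h in zip(widths, heights)]  # square-root area
ratios = [h / w for w, h in zip(widths, heights)]          # aspect ratio h/w

base_size = 16  # hypothetical base anchor size at one pyramid level
scales = [s / base_size for s in percentile_values(sizes, 3)]
aspect_ratios = percentile_values(ratios, 3)
print([round(s, 2) for s in scales], [round(a, 2) for a in aspect_ratios])
```

Fitting a two-parameter distribution and reading off percentiles is cheap compared with an evolutionary search over anchor overlaps, which matches the trade-off discussed in the results.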
The base ResNet-50 model was initialized with weights pretrained on the COCO dataset. The convolution layers in the subnets, except the last one, were initialized with a constant bias and a Gaussian weight fill. The bias of the last layer of the classification subnet was initialized as $b = -\log((1 - \pi)/\pi)$, where $\pi$ specifies the initial confidence of an anchor being foreground.
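The prior-probability bias initialization $b = -\log((1 - \pi)/\pi)$ makes the classifier's initial sigmoid output equal to $\pi$ for every anchor, so the abundant background anchors do not dominate the loss early in training. A minimal check in Python, assuming $\pi = 0.01$ (the value reported by Lin et al.; the value used for SNAP is not restated here):

```python
import math

def prior_bias(pi):
    """Bias b for which sigmoid(b) == pi (RetinaNet-style prior initialization)."""
    return -math.log((1.0 - pi) / pi)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

b = prior_bias(0.01)   # pi = 0.01 is an assumption taken from Lin et al.
print(round(b, 3))     # -4.595
print(sigmoid(b))      # recovers pi, i.e. about 0.01
```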
Data augmentation was performed by first flipping the images horizontally and vertically, each with a 50% probability. Next, random affine transformations were applied, with rotation/shearing of up to 0.1 radians, followed by scaling/translation of up to 10% of the image size. To train and evaluate the networks, we used a standard split of 80% training and 20% testing data. The split was made before data augmentation.
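The model was trained with the Adaptive Moment Estimation (Adam) optimizer with an initial learning rate of $10^{-3}$. The focal loss was used to train the classification subnetwork:

$$\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t) \tag{2}$$

where $\alpha_t$ and $\gamma$ are scaling factors that focus training on the sparse set of hard examples (nodules), and $p_t$ is the predicted probability of the ground-truth class: $p_t = p$ if $y = 1$ and $p_t = 1 - p$ otherwise, with $p$ the predicted foreground probability and $y$ the anchor label.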
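For a single anchor, the focal loss $\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$ can be written in a few lines of Python. This sketch assumes the binary (nodule vs. background) case and uses $\alpha = 0.25$ and $\gamma = 2$, the defaults reported by Lin et al.; these are assumptions, not values confirmed for SNAP.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one anchor.

    p: predicted foreground (nodule) probability; y: 1 for nodule, 0 for background.
    alpha and gamma default to values from Lin et al. (assumed here).
    """
    p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

hard_nodule = focal_loss(0.1, 1)       # foreground anchor scored as unlikely
easy_background = focal_loss(0.01, 0)  # confidently correct background anchor
print(hard_nodule, easy_background)
```

With $\gamma = 0$ and $\alpha_t = 1$ this reduces to ordinary cross-entropy; with $\gamma = 2$ the easy background anchor contributes orders of magnitude less than under cross-entropy, which is what lets the many easy background anchors fade from the gradient.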
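The standard smooth $L_1$ loss was used for box regression:

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5\,x^2/\beta, & |x| < \beta \\ |x| - 0.5\,\beta, & \text{otherwise} \end{cases} \tag{3}$$

where $x$ is the difference between the predicted and target box regression values and $\beta$ is the transition point (near zero) from $L_2$ to $L_1$ loss.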
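A minimal Python sketch of the smooth $L_1$ box-regression loss, assuming the $\beta$-parameterized form (quadratic below the transition point $\beta$, linear above it) and summing over four box offsets. The default `beta=1.0` is a hypothetical choice for illustration; the value used in SNAP is not given here.

```python
def smooth_l1(x, beta=1.0):
    """Smooth L1: quadratic for |x| < beta, linear beyond (beta = transition point)."""
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta

def box_regression_loss(pred, target, beta=1.0):
    """Sum of smooth L1 over the box offsets (e.g. dx, dy, dw, dh)."""
    return sum(smooth_l1(p - t, beta) for p, t in zip(pred, target))

print(box_regression_loss([0.5, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0]))  # 0.125 + 1.5
```

The two branches meet at $|x| = \beta$ with value $0.5\beta$, so the loss is continuous, and the linear branch keeps large localization errors from producing exploding gradients.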
The framework had 36,737,717 trainable parameters. We performed data augmentation and explored the effects of batch size and input image scale on our dataset. The average precision (AP) metric was used to evaluate the models. AP is the area under the precision-recall curve obtained after detections are ordered by confidence:

$$\mathrm{AP} = \int_0^1 p(r)\,dr \tag{4}$$

where $p(r)$ is the precision as a function of recall $r$.
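The integral $\mathrm{AP} = \int_0^1 p(r)\,dr$ is usually approximated from a ranked list of detections. The sketch below uses the non-interpolated approximation: detections are sorted by confidence, and the precision at each true positive is weighted by its recall step. It assumes detections have already been matched to ground-truth nodules (e.g., by an IoU threshold); the exact AP variant used for SNAP is not stated, so this is illustrative only.

```python
def average_precision(scores, labels, num_positives):
    """Non-interpolated AP from ranked detections.

    scores: detection confidences; labels: 1 if the detection matched a
    ground-truth nodule, else 0; num_positives: total ground-truth nodules
    (missed nodules lower recall and therefore AP).
    """
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    tp = fp = 0
    ap = 0.0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
            # precision at this rank times the recall step 1 / num_positives
            ap += (tp / (tp + fp)) / num_positives
        else:
            fp += 1
    return ap

print(average_precision([0.9, 0.8, 0.7], [1, 0, 1], 2))  # (1 + 2/3) / 2
```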
All models were trained for 300 iterations on fixed training data (a representative 20% of the dataset) and tested on fixed test data consisting of 4% of the dataset, selected at random. Model development was completed using a GeForce GTX TITAN X 12 GB GPU. Training time ranged from 10 hours to 3.5 days, depending on the model.
2.2.3. Tap Root Detection Framework
Taproot detection was approached as an image segmentation task. We deployed a UNet with three max pooling (downsampling) operations and 7,708,609 trainable parameters. The network consisted of three encoding/contracting blocks and three decoding/expansive blocks. Each encoding block consisted of two convolutions with 64 feature channels, two rectified linear unit (ReLU) activations, two batch normalizations, and one dropout operation. The decoding blocks were the same as the encoding blocks, except for the dropout operation. Between the encoding and decoding blocks, the output of the encoding block was downsampled using a max pooling operation, followed by two convolutions with 128 feature channels, two ReLU activations, two batch normalizations, and two dropout operations. The output from these operations was upsampled using a transposed convolution, followed by a concatenation operation that combines the encoder output feature channels and the upsampled feature channels. After the decoding blocks, a final convolution layer maps each 64-channel feature vector to the 1-channel binary output image (Figure S6).
We used a standard split of 80% training and 20% testing data, made before data augmentation. Data augmentation consisted of flipping the images horizontally and vertically, zooming to 120%, translating horizontally and vertically by up to 5%, and rotating by up to 15 degrees.
The model was initialized using the Glorot uniform initializer with zero biases and trained with the Adam optimizer with a batch size of 4 and a learning rate of $10^{-3}$, using the Jaccard index as the loss function (Figure S7):

$$J(Y, \hat{Y}) = \frac{|Y \cap \hat{Y}|}{|Y \cup \hat{Y}|} \tag{5}$$

where $Y$ is the annotated ground truth taproot image and $\hat{Y}$ is the detected taproot image.
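To train a segmentation network with the Jaccard index $J(Y, \hat{Y}) = |Y \cap \hat{Y}| / |Y \cup \hat{Y}|$, a differentiable "soft" version is commonly used, with intersection and union computed from predicted probabilities. The sketch below assumes the loss is $1 - J$ and adds a hypothetical smoothing constant to avoid division by zero; neither detail is confirmed for SNAP, and masks are shown as flattened lists for simplicity.

```python
def soft_jaccard(y_true, y_pred, smooth=1.0):
    """Soft Jaccard index between a binary mask and predicted probabilities."""
    inter = sum(t * p for t, p in zip(y_true, y_pred))
    union = sum(y_true) + sum(y_pred) - inter
    return (inter + smooth) / (union + smooth)

def jaccard_loss(y_true, y_pred, smooth=1.0):
    """Loss that decreases as predicted and annotated taproot pixels overlap more."""
    return 1.0 - soft_jaccard(y_true, y_pred, smooth)

ground_truth = [1, 1, 0, 0]          # flattened annotated taproot mask (hypothetical)
prediction = [0.9, 0.7, 0.2, 0.0]    # flattened UNet output probabilities (hypothetical)
print(jaccard_loss(ground_truth, prediction))
```

Unlike per-pixel cross-entropy, this overlap-based loss is insensitive to the large number of background pixels, which suits a thin structure such as a taproot.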
2.3. Hyperparameter Tuning
Of the 20% representative samples used as training data, 4% were randomly selected as test data. During hyperparameter tuning, models were trained on the representative samples and evaluated on the test data.
2.4. Detection, Postprocessing, and Evaluation
For nodule detection, each image was fed to the trained RetinaNet model in 256 × 256 pixel patches. Images were padded so that their width and height were divisible by 256.
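The padding and patching step can be sketched as follows. This is a minimal illustration assuming non-overlapping 256 × 256 tiles with padding added at the bottom and right edges; the image size in the example is hypothetical, and the helper names are introduced here for illustration only.

```python
def pad_to_multiple(size, tile=256):
    """Smallest multiple of `tile` that is >= size."""
    return -(-size // tile) * tile  # ceiling division, then scale back up

def tile_origins(height, width, tile=256):
    """Top-left (row, col) of each non-overlapping tile after padding."""
    ph, pw = pad_to_multiple(height, tile), pad_to_multiple(width, tile)
    return [(r, c) for r in range(0, ph, tile) for c in range(0, pw, tile)]

def to_image_coords(box, origin):
    """Shift a patch-level box (x1, y1, x2, y2) back to full-image coordinates."""
    r, c = origin
    x1, y1, x2, y2 = box
    return (x1 + c, y1 + r, x2 + c, y2 + r)

# Hypothetical full-resolution image size (height, width) in pixels.
origins = tile_origins(3456, 5184)
print(len(origins))  # 14 rows x 21 columns of tiles
```

Mapping each patch's detections back with its tile origin lets nodule boxes from all patches be pooled into one list per root image for counting and spatial analysis.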
To begin elucidating the spatial relationships of nodules in different root zones, we developed a method for predicting the number of nodules in the taproot zone using the trained UNet, as shown in Figure 3. Once the model identified the taproot location in the image, we dilated the taproot. We then identified the centers of the bounding boxes detected around nodules and counted the centers that fell within the dilated taproot zone; we called these taproot zone affiliated nodules. Further, to generate metrics describing the spatial distribution of nodules along the taproot, we skeletonized the taproot and identified the nearest location on the taproot for every detected bounding box. This enabled spatial statistical assessment of nodules along the taproot in relation to their proximity to the soil line and to taproot length.
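The taproot-zone assignment described above can be sketched in plain Python on a small binary mask. This is a hedged illustration, not the SNAP code: it assumes a square structuring element of hypothetical radius for dilation, takes the skeleton as a given list of pixel coordinates, and uses toy detections. For full-size masks, libraries such as scikit-image provide `binary_dilation` and `skeletonize` in `skimage.morphology`.

```python
import math

def dilate(mask, radius):
    """Binary dilation of a 2D 0/1 mask with a (2*radius+1) square element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(h, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                        out[rr][cc] = 1
    return out

def taproot_zone_nodules(boxes, taproot_mask, radius):
    """Boxes (x1, y1, x2, y2) whose centers fall inside the dilated taproot."""
    zone = dilate(taproot_mask, radius)
    hits = []
    for x1, y1, x2, y2 in boxes:
        cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
        if 0 <= cy < len(zone) and 0 <= cx < len(zone[0]) and zone[cy][cx]:
            hits.append((x1, y1, x2, y2))
    return hits

def nearest_skeleton_point(box, skeleton):
    """Closest skeleton pixel (row, col) to the center of a box."""
    x1, y1, x2, y2 = box
    center = ((y1 + y2) / 2, (x1 + x2) / 2)
    return min(skeleton, key=lambda p: math.dist(p, center))

mask = [[1 if c == 5 else 0 for c in range(10)] for _ in range(10)]  # vertical taproot
skeleton = [(r, 5) for r in range(10)]
boxes = [(6, 2, 8, 4), (0, 0, 2, 2)]  # hypothetical detections (x1, y1, x2, y2)
in_zone = taproot_zone_nodules(boxes, mask, radius=2)
print(len(in_zone), nearest_skeleton_point(boxes[0], skeleton))
```

The row of the nearest skeleton pixel gives a position along the taproot, from which distances to the soil line can be derived once the skeleton is ordered from its top end.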
2.5. Avoiding Misclassification of Nodules and SCN Cysts
To avoid confusing early developing nodules with other structures, such as cysts of soybean cyst nematode (SCN; Heterodera glycines), images with SCN cysts on the roots were included in the training dataset, enabling robust and accurate classification of nodules only. When labeling nodules, care was taken not to label a cyst as a nodule. To evaluate the accuracy of SNAP-predicted nodules, a human expert rater examined each predicted nodule to ensure that it was not a cyst.
2.6. SNAP Evaluation
SNAP was further evaluated for the sensitivity and precision of its nodule predictions using the following equations:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{6}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{7}$$

where true positives (TP) are nodules in the image that SNAP correctly identified, false negatives (FN) are nodules present in the image but not identified by SNAP, and false positives (FP) are nodules counted twice or predictions that were not actual nodules.
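Sensitivity, $TP/(TP + FN)$, and precision, $TP/(TP + FP)$, follow directly from per-image counts. The counts below are hypothetical and only illustrate the arithmetic; they are not results from this study.

```python
def sensitivity(tp, fn):
    """Fraction of nodules present in the image that SNAP found."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Fraction of SNAP predictions that were real, singly counted nodules."""
    return tp / (tp + fp)

# Hypothetical counts for one image (not results from the study).
tp, fn, fp = 45, 5, 3
print(f"sensitivity={sensitivity(tp, fn):.2f}, precision={precision(tp, fp):.2f}")
```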
To evaluate the required processing time of SNAP, the pipeline was implemented using a Python 3.6 environment on a Microsoft Surface with 16 GB RAM using an Intel® Core™ i7-8650U CPU @ 1.90 GHz and 2.11 GHz.
To develop a model that best quantifies nodules, we sought a balance between computational resources and accuracy. The optimized anchor method increased average precision (AP; Equation (4)) by 20% compared with the default configuration, and increasing the number of scales and aspect ratios made minimal difference in AP (Supplemental Table S1). The normal percentile method was computationally cheaper than the optimized method, and it imposes the expected normal distribution on naturally occurring objects such as nodules. Increasing the number of scales and aspect ratios did not further improve the normal percentile method. We investigated the effects of the percentage of data annotated, batch size, and input image scale by comparing AP (Table 1).
Bolding shows best AP.
Batch size had no perceivable effect. Increasing the percentage of annotated data yielded minimal improvement; except at 30%, AP increased slightly with input image scale (Table 2). No major improvement was noted when the batch size was increased to 32 and 64 or when the input image scale was increased to 768. At all input image scales, the models tended to overpredict nodules larger than about 15 pixels, which roughly corresponds to the width of the bounding box (Figure S8). Detection of smaller nodules (<15 pixels) improved when the input image scale was increased from 256 to 512, with no further improvement from 512 to 768.
Bolding shows best AP.
3.1. Validation of Whole Root and Tap Root Nodule Counts
To determine SNAP's nodule detection capability, we randomly selected 10% of the root images that had not been used in the ML model training and evaluation sets. Validation consisted of three comparisons: (a) SNAP nodule count versus the count of nodules physically removed by human raters, (b) SNAP nodule count versus a human expert's nodule count from the image, and (c) SNAP nodule count in the taproot zone versus the human nodule count in the taproot zone of the image (Figure 4). Examples of good and poor nodule predictions are shown in Figures S9 and S10.
High correlations were observed for all three comparisons, with R² of 0.91 for SNAP versus physically removed nodules, 0.99 for SNAP versus nodules counted in the image, and 0.71 for SNAP taproot zone counts versus taproot zone nodules counted in the image.
3.2. Time and Labor Requirements
Developing the SNAP pipeline depended on efficient root digging, sample preparation (including root washing), imaging, and generation of ground truth data by manual nodule harvesting and counting. Once the ML model was developed, the time needed to obtain nodule counts through SNAP was dramatically reduced. The time required for manual nodule harvesting (i.e., extraction) and for sample preparation and imaging increased with growth stage. The most time-intensive step was manual quantification (i.e., ground truth nodule counting), and the time required to remove nodules increased dramatically with growth stage. Once the ML model is trained, the most time-intensive step, manual nodule counting, is eliminated, giving SNAP users greater time and resource efficiency and the ability to work with more samples. In this study, manual extraction of roots took on average 240, 360, and 420 seconds for V1, V3, and V5 roots, respectively. Washing and imaging V1, V3, and V5 roots took on average 100, 128, and 150 seconds, respectively. Hand quantification of nodules took our team of multiple trained workers an average of 1500, 2100, and 3000 seconds per V1, V3, and V5 root, respectively, whereas SNAP took 90, 120, and 150 seconds, respectively (Table S2).
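The per-stage speedup of SNAP over hand quantification follows directly from the average times reported above. This small calculation uses only those averages, so it does not capture root-to-root variation.

```python
# Average per-root nodule counting times (seconds) reported above.
manual_count = {"V1": 1500, "V3": 2100, "V5": 3000}
snap_count = {"V1": 90, "V3": 120, "V5": 150}

speedup = {stage: manual_count[stage] / snap_count[stage] for stage in manual_count}
for stage, factor in speedup.items():
    print(f"{stage}: {factor:.1f}x faster")  # V1: 16.7x, V3: 17.5x, V5: 20.0x
```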
Object detection in cluttered and complicated backgrounds is an inherently challenging problem. The complexity and diversity of roots and nodules, combined with root occlusion, color/texture similarities between roots and nodules, and the need for a high-throughput method to routinely screen large numbers of genotypes, necessitate a unique ML architecture to extract and quantify nodule counts and sizes. Our earlier attempts at this problem included segmentation and detection using classical SURF and SIFT methods and a deep learning-based Faster R-CNN approach. However, because these methods performed poorly, we transitioned to RetinaNet, which showed improved accuracy and faster performance in dense object detection owing to its focal loss function.
We combined RetinaNet and UNet to develop SNAP, which produces accurate nodule counts on soybean roots, with R² of 0.91 against manually removed nodule counts and 0.99 against nodules counted in the images, and generates information on the spatial distribution of nodules. SNAP automates phenotyping and significantly reduces the time needed to count nodules on a root. Nodules in each image were counted in about 2-3 minutes, compared with about 20 minutes for similar counting in an existing semiautomated pipeline. The largest time reduction was relative to manual counting, with speedups of 16 to 25 times depending on the growth stage.
SNAP offers multiple avenues for application in research and breeding. There is active interest in learning the spatial distribution and temporal development of nodulation in crops, particularly to optimize plant performance and symbiosis with bacteria [67, 68]. SNAP can estimate the number of nodules in the taproot zone with a precision of over 70%. Upon human validation of SNAP-predicted nodules, no instance was found in which an SCN cyst was misclassified as a nodule. Figure 5 shows a representative example of a complex root architecture with varying nodule sizes and nodule and cyst distribution patterns. Because SNAP can identify even small or newly developing nodules often missed in rater assessments, it is now possible to classify nodule development stages and quantities in relation to vegetative growth stages, or to evaluate the effects of SCN on nodulation on a temporal scale using a fully automated ML cyst detection pipeline. However, field root samples are destructively sampled; therefore, studying nodules over time requires evaluating separate plants of the same genotypes at different time points. SNAP lays the groundwork for future studies that screen large breeding populations, identify and investigate QTL, and determine the relationships between root growth zones, root system architecture, and nodules.
While we used root samples from field experiments, SNAP can be combined with additional technologies, such as mobile platforms for immediate in-field evaluation, or used in nonfield environments, such as greenhouse, growth chamber, X-ray, and CT scan experiments, to address further breeding challenges in nodule phenotyping. SNAP-based nodule counting is compatible with previously used methods, such as a binned rating scale of 1-10, if researchers are interested in comparative studies combining old and new research outcomes. Using the distribution generated by SNAP, more accurate binning and counts are possible, and roots can be rated automatically for comparison with each other and potentially with additional or prior studies.
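One way to map SNAP counts onto a 1-10 rating scale is to bin roots by deciles of the observed count distribution. The rule below is a hypothetical binning scheme for illustration only; it is not one of the published rating scales, which are defined by visual criteria, and a study comparing against a specific prior scale would need that scale's own cut points.

```python
from bisect import bisect_right
from statistics import quantiles

def decile_ratings(counts):
    """Map nodule counts to 1-10 ratings using deciles of the observed counts.

    Hypothetical binning rule for illustration; not a published rating scale.
    """
    cuts = quantiles(counts, n=10)  # 9 cut points between the 10 bins
    return [1 + bisect_right(cuts, c) for c in counts]

# Hypothetical SNAP nodule counts for 100 roots.
counts = list(range(1, 101))
ratings = decile_ratings(counts)
print(ratings[:3], ratings[-3:])
```

Because the cut points come from the data, each rating class holds roughly the same number of roots; fixed, externally defined cut points would instead be needed to compare ratings across separate studies.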
Breeders are often unable to include root architecture and nodulation in their assessments because these are seen as unattainable and unrealistic traits to evaluate in a manageable, high-throughput manner, although improvements have recently been suggested [47, 48, 69, 70]. SNAP empowers breeders to evaluate and select genotypes with a required level of nodulation under various biotic and abiotic conditions while accounting for genotype by environment interactions. Additionally, SNAP increases opportunities to identify and map genes controlling nodule-related traits, for example, size, onset, and nodule growth coinciding with plant growth and development stages. Because SNAP was trained and evaluated on several genotypes, field locations, and vegetative growth stages, it can enable the investigation of nodulation across diverse root types and vegetative time points, as well as of nodule growth between similar roots on a temporal scale, yielding new scientific insights at a larger scale (i.e., more genotypes) than was previously feasible.
Researchers working on N-fixing crops other than soybean need to validate SNAP before using it in their research. While we confirmed that SNAP correctly distinguishes nodules from SCN cysts, other pathogens, for example, root knot nematode, may require additional model training and testing before SNAP is deployed to study root nodules. With advances in higher resolution imaging, a SNAP-type approach will in the future be useful for studying other beneficial plant-microorganism interactions, such as those with arbuscular mycorrhizal fungi, which can positively impact crop production [71–73]. Combining SNAP-based nodule phenotyping with genomic prediction on a germplasm collection is also an attractive approach to identify useful accessions for plant breeding applications spanning various maturity groups.
SNAP functionality could be improved, for example, by implementing a more sophisticated active learning-based strategy for selecting representative samples to improve the performance of the pipeline , by more precisely delineating irregular, nonuniform, and nonspherical nodules to achieve even higher size and shape accuracy, or by evaluating the spatial distribution of nodules along the lateral roots.
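Submodular objectives such as facility location are a standard basis for choosing representative training samples (see Wei, Iyer, and Bilmes and Cornuéjols et al. in the reference list). As a point of reference for the sample-selection improvement discussed above, the following is a minimal, hypothetical sketch of greedy facility-location selection; it is not SNAP's code. It assumes each candidate root image has already been reduced to a nonzero feature vector (for example, by a pretrained network), and the function names `cosine_similarity` and `facility_location_greedy` are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two nonzero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def facility_location_greedy(features, k):
    """Greedily choose k indices maximizing F(S) = sum_i max_{j in S} sim(i, j).

    With nonnegative similarities F is monotone submodular, so this greedy
    loop is within a factor (1 - 1/e) of the best size-k subset.
    """
    n = len(features)
    # Map cosine similarity from [-1, 1] to [0, 1] to keep F nonnegative.
    sim = [[(cosine_similarity(features[i], features[j]) + 1.0) / 2.0
            for j in range(n)] for i in range(n)]
    coverage = [0.0] * n  # coverage[i]: similarity of sample i to its closest pick
    selected = []
    for _ in range(min(k, n)):
        best_gain, best_j = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain: how much picking j raises the total coverage.
            gain = sum(max(coverage[i], sim[i][j]) - coverage[i] for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        coverage = [max(coverage[i], sim[i][best_j]) for i in range(n)]
    return selected


if __name__ == "__main__":
    # Toy 2-D "image embeddings" forming two groups; picking 2 samples
    # should take one representative from each group.
    feats = [(5.0, 0.1 * i) for i in range(5)] + [(0.1 * i, 5.0) for i in range(5)]
    print(facility_location_greedy(feats, 2))
```

Greedy selection is used here because exact maximization of facility location is NP-hard, whereas the greedy solution keeps the (1 - 1/e) guarantee for monotone submodular objectives. An active learning variant might additionally weight candidates by the current model's uncertainty.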
Overall, SNAP will reduce the strain on human labor and help quantify nodules consistently and accurately in N-fixing crop species, advancing the state of the art in nodule phenotyping and its associated applications. SNAP outputs will be useful to researchers and farmers who want to phenotype nodules on roots rapidly and accurately. With continual ML advances in plant phenotyping, we expect complex trait phenotyping to keep improving at a rapid pace [40, 76, 77].
Data are freely available upon request to the corresponding authors, and the pipeline software and code will be available on GitHub: (https://github.com/SoylabSingh/SNAP).
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this article.
C.C., Z.J., B.G., and A.K.S. conceived the project. All authors participated in the project implementation and completion. C.C. conducted experiments, imaging, and data curation. Z.J. developed the machine learning and image analysis pipeline. C.C. annotated the ground truth images and assessed the pipeline output. C.C. and Z.J. wrote the manuscript draft with A.K.S. and B.G. All authors contributed to the final manuscript production. Talukder Zaki Jubery and Clayton N. Carley contributed equally to this work.
The authors thank the many undergraduate and graduate students and staff in the Singh group at Iowa State University who helped with field experiments, data collection, and imaging. Additional thanks are due to Koushik Nagasubramanian for his initial pipeline suggestions, and to Vahid Mirnezami and Kevin Falk for assistance with the imaging system. This project was supported by the Iowa Soybean Research Center (A.K.S.), Iowa Soybean Association (A.K.S.), R.F. Baker Center for Plant Breeding (A.K.S.), Plant Sciences Institute (A.K.S., B.G., and S.S.), Bayer Chair in Soybean Breeding (A.K.S.), and USDA CRIS project IOW04717 (A.K.S. and A.S.). C.N.C. was partially supported by the National Science Foundation under Grant No. DGE-1545453. T.Z.J. was partially supported by USDA-NIFA HIPS award.
Supplementary Materials
Figure S1: width and height of the annotated nodules. Figure S2: width distribution of the annotated bounding boxes for nodules from 30% of the dataset. Figure S3: aspect ratio distribution of the annotated bounding boxes for nodules from 30% of the dataset. Figure S4: informative sample selection workflow. Figure S5: focal (classification) and regression (bounding box detection) losses during training of the nodule detection network, RetinaNet, using input image scale 512, anchor scales 0.48, 0.67, and 0.86, and aspect ratios 0.85, 0.99, and 1.13. Figure S6: the UNet architecture used to develop the taproot detection model. Figure S7: training and validation losses (Jaccard loss) during training of UNet for taproot detection. Figure S8: effect of input image scale on nodule detection in the test data. Figure S9: representative example of good nodule detection on a V5 growth stage soybean root. Figure S10: a rare example of high misclassification of image debris as nodules on a V5 growth stage soybean root. Figure S11: (A) input image, (B) SNAP-detected nodules, (C) segmented image with difficult-to-detect clusters of nodules, (D) 50 strongest SURF points on the original grayscale image, (E) 50 strongest SURF points on the masked grayscale image, and (F) samples of training image patches used for bag-of-features codebook generation. Table S1: sizes, aspect ratios, and range of the anchor configuration. Table S2: comparison of the average times taken to extract, wash, and image roots in this study with the average times required to quantify nodules by hand and with SNAP Quantify.
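Figure S5 lists three anchor scales and three aspect ratios for the RetinaNet nodule detector. To make concrete how these combine into nine anchor shapes per feature-map location, the sketch below expands them. It assumes a convention used in common RetinaNet implementations, in which the ratio is height divided by width and each anchor keeps the area of a square of side `base_size * scale`; the base sizes in the usage loop are hypothetical defaults, not values taken from SNAP (see Table S1 for the configuration SNAP used).

```python
import math
from itertools import product

# Anchor scales and aspect ratios reported for the nodule detector (Figure S5).
SCALES = (0.48, 0.67, 0.86)
RATIOS = (0.85, 0.99, 1.13)


def anchor_shapes(base_size, scales=SCALES, ratios=RATIOS):
    """Return the (width, height) of each anchor at one pyramid level.

    Assumptions (not taken from SNAP's code): ratio means height / width,
    and every anchor keeps the area (base_size * scale) ** 2.
    """
    shapes = []
    for ratio, scale in product(ratios, scales):
        area = (base_size * scale) ** 2
        width = math.sqrt(area / ratio)  # stretch the square horizontally or vertically
        height = width * ratio           # so that height / width equals ratio
        shapes.append((width, height))
    return shapes


if __name__ == "__main__":
    # Hypothetical base anchor sizes for pyramid levels P3-P7,
    # a common RetinaNet default.
    for base in (32, 64, 128, 256, 512):
        sizes = ", ".join(f"{w:.1f}x{h:.1f}" for w, h in anchor_shapes(base))
        print(f"base {base}: {sizes}")
```

Because all three ratios lie close to 1, every anchor is nearly square, which is consistent with the roughly round nodules the detector targets.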
- M. K. Udvardi and M. L. Kahn, “Evolution of the (Brady)rhizobium-legume symbiosis: why do bacteroids fix nitrogen?” Symbiosis, vol. 14, pp. 87–101, 1992.
- J.-P. Nap and T. Bisseling, “Developmental biology of a plant-prokaryote symbiosis: the legume root nodule,” Science, vol. 250, no. 4983, pp. 948–954, 1990.
- P. Mylona, K. Pawlowski, and T. Bisseling, “Symbiotic nitrogen fixation,” Plant Cell, vol. 7, no. 7, pp. 869–885, 1995.
- C. R. Weber, “Nodulating and nonnodulating soybean isolines: II. Response to applied nitrogen and modified soil conditions,” Agronomy Journal, vol. 58, no. 1, pp. 46–49, 1966.
- D. H. Kohl, G. Shearer, and J. E. Harper, “Estimates of N2 fixation based on differences in the natural abundance of 15N in nodulating and nonnodulating isolines of soybeans,” Plant Physiology, vol. 66, no. 1, pp. 61–65, 1980.
- S. Wu and J. E. Harper, “Dinitrogen fixation potential and yield of hypernodulating soybean mutants: a field evaluation,” Crop Science, vol. 31, no. 5, pp. 1233–1240, 1991.
- R. A. Kluson, W. J. Kenworthy, and D. F. Weber, “Soil temperature effects on competitiveness and growth of Rhizobium japonicum and on Rhizobium-induced chlorosis of soybeans,” Plant and Soil, vol. 95, no. 2, pp. 201–207, 1986.
- G. Caetano-Anollés and P. M. Gresshoff, “Nodule distribution on the roots of soybean and a supernodulating mutant in sand-vermiculite,” Plant and Soil, vol. 148, no. 2, pp. 265–270, 1993.
- D. N. Munns, V. W. Fogle, and B. G. Hallock, “Alfalfa root nodule distribution and inhibition of nitrogen fixation by heat,” Agronomy Journal, vol. 69, no. 3, pp. 377–380, 1977.
- W. B. Voorhees, V. A. Carlson, and C. G. Senst, “Soybean nodulation as affected by wheel traffic,” Agronomy Journal, vol. 68, no. 6, pp. 976–979, 1976.
- M. I. Bollman and J. K. Vessey, “Differential effects of nitrate and ammonium supply on nodule initiation, development, and distribution on roots of pea (Pisum sativum),” Botany, vol. 84, no. 6, pp. 893–903, 2006.
- T. R. Sinclair and M. A. Nogueira, “Selection of host-plant genotype: the next step to increase grain legume N2 fixation activity,” Journal of Experimental Botany, vol. 69, no. 15, pp. 3523–3530, 2018.
- K. J. Kunert, B. J. Vorster, B. A. Fenta, T. Kibido, G. Dionisio, and C. H. Foyer, “Drought stress responses in soybean roots and nodules,” Frontiers in Plant Science, vol. 7, article 1015, 2016.
- L. Remmler, L. Clairmont, A.-G. Rolland-Lagan, and F. C. Guinel, “Standardized mapping of nodulation patterns in legume roots,” New Phytologist, vol. 202, no. 3, pp. 1083–1094, 2014.
- A. M. Carter and M. Tegeder, “Increasing nitrogen fixation and seed development in soybean requires complex adjustments of nodule nitrogen metabolism and partitioning processes,” Current Biology, vol. 26, no. 15, pp. 2044–2051, 2016.
- D. Egamberdieva, D. Jabborova, S. J. Wirth, P. Alam, M. N. Alyemeni, and P. Ahmad, “Interactive effects of nutrients and Bradyrhizobium japonicum on the growth and root architecture of soybean (Glycine max L.),” Frontiers in Microbiology, vol. 9, article 1000, 2018.
- S. R. Tracy, K. A. Nagel, J. A. Postma, H. Fassbender, A. Wasson, and M. Watt, “Crop improvement from phenotyping roots: highlights reveal expanding opportunities,” Trends in Plant Science, vol. 25, no. 1, pp. 105–118, 2020.
- M. B. Shine, Q. M. Gao, R. V. Chowda-Reddy, A. K. Singh, P. Kachroo, and A. Kachroo, “Glycerol-3-phosphate mediates rhizobia-induced systemic signaling in soybean,” Nature Communications, vol. 10, no. 1, p. 5303, 2019.
- D. Harris, W. A. Breese, and J. V. D. K. K. Rao, “The improvement of crop yield in marginal environments using ‘on-farm’ seed priming: nodulation, nitrogen fixation and disease resistance,” Australian Journal of Agricultural Research, vol. 56, no. 11, pp. 1211–1218, 2005.
- L. G. Moretti, E. Lazarini, J. W. Bossolani et al., “Can additional inoculations increase soybean nodulation and grain yield?” Agronomy Journal, vol. 110, no. 2, pp. 715–721, 2018.
- I. Aranjuelo, C. Arrese-Igor, and G. Molero, “Nodule performance within a changing environmental context,” Journal of Plant Physiology, vol. 171, no. 12, pp. 1076–1090, 2014.
- J. M. McCoy, G. Kaur, B. R. Golden et al., “Nitrogen fertilization of soybean affects root growth and nodulation on two soil types in Mississippi,” Communications in Soil Science and Plant Analysis, vol. 49, no. 2, pp. 181–187, 2018.
- L. H. S. Zobiole, R. J. Kremer, R. S. de Oliveira Jr, and J. Constantin, “Glyphosate effects on photosynthesis, nutrient accumulation, and nodulation in glyphosate-resistant soybean,” Journal of Plant Nutrition and Soil Science, vol. 175, no. 2, pp. 319–330, 2012.
- I. A. Ciampitti and F. Salvagiotti, “New insights into soybean biological nitrogen fixation,” Agronomy Journal, vol. 110, no. 4, pp. 1185–1196, 2018.
- C. Hou, M. L. Chu, J. A. Guzman, J. S. Acero Triana, D. N. Moriasi, and J. L. Steiner, “Field scale nitrogen load in surface runoff: impacts of management practices and changing climate,” Journal of Environmental Management, vol. 249, article 109327, 2019.
- R. W. Weaver and L. R. Frederick, “Effect of Inoculum Rate on Competitive Nodulation of Glycine max L. Merrill. I. Greenhouse Studies,” Agronomy Journal, vol. 66, no. 2, pp. 229–232, 1974.
- A. E. Hiltbold, D. L. Thurlow, and H. D. Skipper, “Evaluation of Commercial Soybean Inoculants by Various Techniques,” Agronomy Journal, vol. 72, no. 4, pp. 675–681, 1980.
- B. Fenta, S. Beebe, K. Kunert et al., “Field Phenotyping of Soybean Roots for Drought Stress Tolerance,” Agronomy, vol. 4, no. 3, pp. 418–435, 2014.
- S. Han, F. Cointault, C. Salon, and J.-C. Simon, “Automatic Detection of Nodules in Legumes by Imagery in a Phenotyping Context,” in Computer Analysis of Images and Patterns. CAIP 2015, Springer, Cham, 2015.
- J. G. A. Barbedo, “Method for automatic counting root nodules using digital images,” in 2012 12th International Conference on Computational Science and Its Applications, pp. 159–161, Salvador, Brazil, 2012.
- M. Reynolds, S. Chapman, L. Crespo-Herrera et al., “Breeder Friendly Phenotyping,” Plant Science, vol. 295, article 110396, 2020.
- S. Dhondt, N. Wuyts, and D. Inzé, “Cell to whole-plant phenotyping: the best is yet to come,” Trends in Plant Science, vol. 18, no. 8, pp. 428–439, 2013.
- B. Elnashef, S. Filin, and R. N. Lati, “Tensor-based classification and segmentation of three-dimensional point clouds for organ-level plant phenotyping and growth analysis,” Computers and Electronics in Agriculture, vol. 156, pp. 51–61, 2019.
- M. K. Omari, “Digital image-based plant phenotyping: a review,” Korean Journal of Agricultural Science, vol. 47, no. 1, pp. 119–130, 2020.
- R. Pieruschka and U. Schurr, “Plant phenotyping: past, present, and future,” Plant Phenomics, vol. 2019, article 7507131, pp. 1–6, 2019.
- J. A. Atkinson, L. U. Wingen, M. Griffiths et al., “Phenotyping pipeline reveals major seedling root growth QTL in hexaploid wheat,” Journal of Experimental Botany, vol. 66, no. 8, pp. 2283–2292, 2015.
- J. Zhang, “Computer vision and machine learning for robust phenotyping in genome-wide studies,” Scientific Reports, vol. 7, no. 1, article 44048, 2017.
- Z. Zheng, S. Hey, T. Jubery et al., “Shared genetic control of root system architecture between Zea mays and Sorghum bicolor,” Plant Physiology, vol. 182, no. 2, pp. 977–991, 2020.
- Y. Jiang and C. Li, “Convolutional neural networks for image-based high-throughput plant phenotyping: a review,” Plant Phenomics, vol. 2020, article 4152816, pp. 1–22, 2020.
- A. K. Singh, B. Ganapathysubramanian, S. Sarkar, and A. Singh, “Deep learning for plant stress phenotyping: trends and future perspectives,” Trends in Plant Science, vol. 23, no. 10, pp. 883–898, 2018.
- S. Ghosal, D. Blystone, A. K. Singh, B. Ganapathysubramanian, A. Singh, and S. Sarkar, “An explainable deep machine vision framework for plant stress phenotyping,” Proceedings of the National Academy of Sciences, vol. 115, no. 18, pp. 4613–4618, 2018.
- K. Nagasubramanian, S. Jones, S. Sarkar, A. K. Singh, A. Singh, and B. Ganapathysubramanian, “Hyperspectral band selection using genetic algorithm and support vector machines for early identification of charcoal rot disease in soybean stems,” Plant Methods, vol. 14, no. 1, p. 86, 2018.
- H. S. Naik, J. Zhang, A. Lofquist et al., “A real-time phenotyping framework using machine learning for plant stress severity rating in soybean,” Plant Methods, vol. 13, no. 1, 2017.
- A. Akintayo, G. L. Tylka, A. K. Singh, B. Ganapathysubramanian, A. Singh, and S. Sarkar, “A deep learning framework to discern and count microscopic nematode eggs,” Scientific Reports, vol. 8, no. 1, article 9145, 2018.
- K. A. Parmley, R. H. Higgins, B. Ganapathysubramanian, S. Sarkar, and A. K. Singh, “Machine learning approach for prescriptive plant breeding,” Scientific Reports, vol. 9, no. 1, article 17132, 2019.
- K. Parmley, K. Nagasubramanian, S. Sarkar, B. Ganapathysubramanian, and A. K. Singh, “Development of optimized phenomic predictors for efficient plant breeding decisions using phenomic-assisted selection in soybean,” Plant Phenomics, vol. 2019, article 5809404, pp. 1–15, 2019.
- K. G. Falk, T. Z. Jubery, S. V. Mirnezami et al., “Computer vision and machine learning enabled soybean root phenotyping pipeline,” Plant Methods, vol. 16, no. 1, 2020.
- K. G. Falk, T. Z. Jubery, J. A. O’Rourke et al., “Soybean root system architecture trait study through genotypic, phenotypic, and shape-based clusters,” Plant Phenomics, vol. 2020, article 1925495, pp. 1–23, 2020.
- W. R. Fehr, C. E. Caviness, D. T. Burmood, and J. S. Pennington, “Stage of development descriptions for soybeans, Glycine max (L.) Merrill,” Crop Science, vol. 11, no. 6, pp. 929–931, 1971.
- F. Hart, “KUVACODE, Smart Shooter 4 Photography Software,” https://kuvacode.com/download.
- S. Fujishige, Submodular Functions and Optimization, Elsevier, 2005.
- A. Dutta and A. Zisserman, “The VIA annotation software for images, audio and video,” in Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 2019.
- K. Wei, R. Iyer, and J. Bilmes, “Submodularity in Data Subset Selection and Active Learning,” in Proceedings of the 32nd International Conference on Machine Learning, pp. 1954–1963, Lille, France, 2015.
- G. Cornuejols, M. Fisher, and G. L. Nemhauser, “On the uncapacitated location problem,” Annals of Discrete Mathematics, vol. 1, pp. 163–177, 1977.
- T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, MIT Press, 2009.
- T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal Loss for Dense Object Detection,” in 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980–2988, Venice, Italy, 2017.
- M. Zlocha, Q. Dou, and B. Glocker, “Improving RetinaNet for CT lesion detection with dense masks from weak RECIST labels,” in Proceedings of the 22nd International Conference on Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, Shenzhen, China, pp. 402–410, Springer, Cham, Switzerland, 2019.
- R. Storn and K. Price, “Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces,” Journal of Global Optimization, vol. 11, no. 4, pp. 341–359, 1997.
- T.-Y. Lin et al., “Microsoft COCO: Common Objects in Context,” in Computer Vision – ECCV 2014. ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds., vol. 8693 of Lecture Notes in Computer Science, pp. 740–755, Springer, Cham, 2014.
- D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” in Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 2015.
- Y. Yue, T. Finley, F. Radlinski, and T. Joachims, “A support vector method for optimizing average precision,” in Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '07, pp. 271–278, Amsterdam, The Netherlands, 2007.
- O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. MICCAI 2015, N. Navab, J. Hornegger, W. Wells, and A. Frangi, Eds., vol. 9351 of Lecture Notes in Computer Science, pp. 234–241, Springer, Cham, 2015.
- X. Glorot and Y. Bengio, “Understanding the Difficulty of Training Deep Feedforward Neural Networks,” in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, pp. 249–256, Sardinia, Italy, 2010.
- P. Jaccard, “The distribution of the flora in the alpine zone,” New Phytologist, vol. 11, no. 2, pp. 37–50, 1912.
- H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded Up Robust Features,” in Computer Vision – ECCV 2006. ECCV 2006, A. Leonardis, H. Bischof, and A. Pinz, Eds., vol. 3951 of Lecture Notes in Computer Science, pp. 404–417, Springer, Berlin, Heidelberg, 2006.
- S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
- S. Smita, J. Kiehne, S. Adhikari, E. Zeng, Q. Ma, and S. Subramanian, “Gene regulatory networks associated with lateral root and nodule development in soybean,” In Silico Plants, vol. 2, no. 1, 2020.
- S. Roy, W. Liu, R. S. Nandety et al., “Celebrating 20 years of genetic discoveries in legume nodulation and symbiotic nitrogen fixation,” The Plant Cell, vol. 32, no. 1, pp. 15–41, 2020.
- A. Bucksch, J. Burridge, L. M. York et al., “Image-based high-throughput field phenotyping of crop roots,” Plant Physiology, vol. 166, no. 2, pp. 470–486, 2014.
- L. M. York, “Phenotyping crop root crowns: general guidance and specific protocols for maize, wheat, and soybean,” in Root Development, D. Ristova and E. Barbez, Eds., vol. 1761 of Methods in Molecular Biology, pp. 23–32, Humana Press, New York, NY, 2018.
- R. S. Meena, V. Vijayakumar, G. S. Yadav, and T. Mitran, “Response and interaction of Bradyrhizobium japonicum and arbuscular mycorrhizal fungi in the soybean rhizosphere,” Plant Growth Regulation, vol. 84, no. 2, pp. 207–223, 2018.
- W. Ellouze, C. Hamel, R. M. DePauw, R. E. Knox, R. D. Cuthbert, and A. K. Singh, “Potential to breed for mycorrhizal association in durum wheat,” Canadian Journal of Microbiology, vol. 62, no. 3, pp. 263–271, 2016.
- W. Ellouze, C. Hamel, A. K. Singh, V. Mishra, R. M. DePauw, and R. E. Knox, “Abundance of the arbuscular mycorrhizal fungal taxa associated with the roots and rhizosphere soil of different durum wheat cultivars in the Canadian prairies,” Canadian Journal of Microbiology, vol. 64, no. 8, pp. 527–536, 2018.
- L. de Azevedo Peixoto, T. C. Moellers, J. Zhang et al., “Leveraging genomic prediction to scan germplasm collection for crop improvement,” PLoS One, vol. 12, no. 6, article e0179191, 2017.
- K. Nagasubramanian, T. Z. Jubery, F. F. Ardakani et al., “How useful is active learning for image-based plant phenotyping?” The Plant Phenome Journal, vol. 4, no. 1, article e20020, 2021.
- A. Singh, S. Jones, B. Ganapathysubramanian et al., “Challenges and opportunities in machine-augmented plant stress phenotyping,” Trends in Plant Science, vol. 26, no. 1, pp. 53–69, 2021.
- A. Singh, B. Ganapathysubramanian, A. K. Singh, and S. Sarkar, “Machine learning for high-throughput stress phenotyping in plants,” Trends in Plant Science, vol. 21, no. 2, pp. 110–124, 2016.
Copyright © 2021 Talukder Zaki Jubery et al. Exclusive Licensee Nanjing Agricultural University. Distributed under a Creative Commons Attribution License (CC BY 4.0).