Research Article | Open Access
Laura M. Zingaretti, Amparo Monfort, Miguel Pérez-Enciso, "Automatic Fruit Morphology Phenome and Genetic Analysis: An Application in the Octoploid Strawberry", Plant Phenomics, vol. 2021, Article ID 9812910, 14 pages, 2021. https://doi.org/10.34133/2021/9812910
Automatic Fruit Morphology Phenome and Genetic Analysis: An Application in the Octoploid Strawberry
Automatizing phenotype measurement will decisively contribute to increasing plant breeding efficiency. Among phenotypes, morphological traits are relevant in many fruit breeding programs, as appearance influences consumer preference. Often, these traits are manually or semiautomatically obtained. Yet, fruit morphology evaluation can be enhanced using fully automatized procedures, and digital images provide a cost-effective opportunity for this purpose. Here, we present an automatized pipeline for comprehensive phenomic and genetic analysis of morphology traits extracted from internal and external strawberry (Fragaria x ananassa) images. The pipeline segments, classifies, and labels the images and extracts conformation features, including linear (area, perimeter, height, width, circularity, shape descriptor, ratio between height and width) and multivariate (Fourier elliptical components and Generalized Procrustes) statistics. Internal color patterns are obtained using an autoencoder to smooth out the image. In addition, we develop a variational autoencoder to automatically detect the most likely number of underlying shapes. Bayesian modeling is employed to estimate both additive and dominance effects for all traits. As expected, conformational traits are clearly heritable. Interestingly, dominance variance is higher than the additive component for most of the traits. Overall, we show that fruit shape and color can be quickly and automatically evaluated and are moderately heritable. Although we study strawberry images, the algorithm can be applied to other fruits, as shown in the GitHub repository.
1. Introduction
Demographic pressure and climate change are two of the major challenges of the 21st century. The worldwide population continues growing exponentially, and it is expected to reach in 2050 . Climate change generated by greenhouse gas emissions is possibly the greatest threat, as it is leading to extreme weather conditions, increasing areas of drought, and species extinction, among others [2–4]. In this adverse context, food production needs to be increased significantly. Increasing food production is not enough though. Breeding programs should also consider food safety and environmental care among their objectives [5, 6].
Artificial breeding is mainly responsible for the dramatic rise in food production we have witnessed for over a century. The main goal of plant and animal breeding is to utilize genetic variability of complex traits to increase performance and optimize use of resources. A current bottleneck in plant breeding programs is the evaluation of hundreds of lines under different environmental conditions [7, 8]. Plant breeding involves both genomics and phenomics, i.e., the expression of a genome in given environments. While available technologies can routinely and inexpensively scan the genome, high-throughput phenotypic characterization remains a difficult task [9, 10]. Automatizing phenotype measurement is then needed to increase the pace of artificial selection and is, unsurprisingly, one of the main targets of “Precision Agriculture” [11, 12].
The term “phenomics” or “phenometrics” was coined by Schork as an attempt to understand events happening in between full genome and clinical endpoint phenotypes in complex human diseases. The expression quickly spread to animal and plant breeding research as a concept that bridges the gap between genotypes and the “end-phenotypes.” Although the term phenomics was devised in line with “genomics,” that is, to describe the whole phenome of any organism, note that the phenome varies over time and between cells or tissues and can never be fully portrayed.
Although electronics applied to agriculture has a long history, a window of opportunities has emerged in the phenomics field with recent improvements in robotics, electronics, and computer science. The subjective, time-consuming, and often destructive human data collection is being replaced by miniaturized, cheap sensors, digital cameras, cell phones, unmanned aerial vehicles, and mass spectrometry, among others, that allow collecting hundreds of phenotype data objectively and inexpensively [9, 15–17]. The challenge now is to develop new and improved analytical tools, capable of transforming this wealth of data into valuable knowledge. This is a rapidly evolving field, and numerous software tools and pipelines to automatize phenotype collection are already available [18–22]. Many of these tools focus on the analysis of root images and, as far as we know, require more user intervention than the approach we propose, which makes them impractical for analyzing hundreds of images.
Digital images are among the cheapest and most widely available types of data. Imaging allows assessing morphological traits, which are highly relevant in numerous plant breeding schemes, since they can critically affect consumer acceptance especially in fruits [23–25]. Nevertheless, consumer preferences on appearance traits differ around the world and between communities. Like most traits, fruit shape is determined by genetic and environmental factors such as flower morphology or insect-mediated pollination [26, 27]. In all, morphological traits are among those with the highest heritability, which has allowed breeders to rapidly modify shape, size, and color patterns of agricultural products [20, 28–30].
Although numerous works have been developed in the area of fruit morphology, most of them have focused on the inheritance of linear measures, e.g., diameter, perimeter, and circularity [20, 31–33]. By definition, however, morphological traits are highly dimensional. Computing only linear, univariate phenotypes leads to a loss of information by oversimplifying the features of a shape [28, 34]. The use of geometric-morphometric approaches for shape analysis is warranted. Further, fruit shape has been traditionally evaluated subjectively, but its evaluation can be enhanced by resorting to automatized procedures. For instance, hundreds of fruit pictures can be routinely and inexpensively collected, even in the field, with a cell phone camera. Automatized image processing and analysis can then dramatically change the way shape and color traits are collected and characterized.
Here, we present a comprehensive phenomic and genetic analysis pipeline for fruit morphology automatic analysis. Two main issues are addressed: (1) converting the raw data (fruit images) into a processed curated database and (2) designing an efficient analysis workflow to analyze the fruit shape and color phenome. Finally, genetic parameters are automatically inferred from pedigree information. We apply the pipeline to images of cultivated strawberry (Fragaria x ananassa) fruits. Compared with previous similar works in strawberry, e.g., Feldmann et al., we provide a wholly automatized pipeline and new tools to analyze shape and color patterns.
2. Materials and Methods
2.1. Plant Material and Imaging Acquisition
Lines employed are part of the strawberry breeding program of the Planasa company (https://planasa.com/en/) and are routinely used to develop new elite genotypes. The experiment consisted of 24 crosses between 30 parental lines of F. x ananassa. We evaluated 20 randomly chosen lines per cross for all but 2 crosses, for which we chose 19 lines at random. A total of 478 seedlings and 30 parental genotypes were evaluated (Supp Table 1). Shape varied between the cultivars studied, e.g., circular, ellipsoid, or rhomboid, and color ranged from white to dark red.
Strawberries were grown in plastic semitunnels using standard cultivation practices in South West Spain (Huelva, 37°16′59″N, 7°9′18″W). Fruits were collected from two individual plants of each line at the end of April 2018 in only one harvest event. Fruits from both plants were pooled in the photographs. We took images of 1 to 7 sliced fruits per genotype using a Nikon D80 digital camera. Samples were laid on a black surface, with the camera positioned at 35 cm height. The focal length was 18 mm, the manual aperture was f/8, and the exposure time was 1/8 s. Illumination consisted of two white light sources at both sides of the camera. In total, we took 508 images of pixels that contained all external and internal sides of fruits and the label for each genotype in the same image.
2.2. Preprocessing and Segmentation
The first step in the pipeline is to segment and recognize the objects, since each raw image contains internal and external fruits, a ruler, a coin, and a printed genotype (the strawberry line) label. Image segmentation is needed for obtaining meaningful morphometric and color information. However, most available technologies to determine the boundaries of the objects at the pixel level are semiautomatic and time-consuming [37–40]. Our fully automatic Python-based pipeline takes the images of each strawberry line, outputs a curated database of square images (1000 px), and reads the genotype label (Figure 1). https://github.com/lauzingaretti/DeepAFS/blob/main/main.ipynb explains how to apply the most expensive part of this workflow to alternative experiments. Note that after creating a curated database, a standard multivariate analysis for shape evaluation can be easily run using R/Python tools.
For segmentation, the three-channel digital signals (RGB/BGR) are converted into grayscale and blurred using Gaussian filtering of size 5, to remove undesirable noise. The histogram information is used for image binarization, i.e., splitting the background and foreground. Here, we binarized the image using simply the mean value of the pixels as a threshold. The pipeline also allows Otsu thresholding, which is designed to automatically define the threshold by minimizing the “overlap” between two classes. After binarization, we performed erosion and dilation: the former shrinks the edges, and the latter makes the image region grow. Finally, the algorithm extracts the regions of interest (ROIs) and determines whether each one is a strawberry or an image label. The color pattern analysis allows us to classify the internal or external part of a fruit image. We here apply a $k$-means clustering based on the information about the color mean, color standard deviation, and the ratio between them for all the fruits; i.e., we compute these 3 features for all the fruits, and then, we classify these observations into 2 clusters to split the internal and external part of the fruits. For the labels, the Optical Character Recognition (OCR) algorithm from the PyTesseract library (https://pypi.org/project/pytesseract/) is used to read the genotype name and automatically label the image into the database. As a result, the algorithm delivers a curated database of 508 folders labeled with the name of each genotype and subfolders containing either the internal or external strawberry pictures (Figure 1, Algorithm 1 in Suppl. Info). All fruits are stored in square images (1000 px size or user-defined), with the fruits placed in the center and filled with black pixels.
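The two binarization options mentioned above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code from the DeepAFS repository: the function names are hypothetical, the grayscale weights are the usual luma coefficients, and the Gaussian blur, erosion, and dilation steps are left out for brevity.

```python
import numpy as np

def to_gray(rgb):
    """Luma-weighted grayscale conversion of an (H, W, 3) RGB array (weights are an assumption)."""
    return np.asarray(rgb, dtype=float)[..., :3] @ np.array([0.299, 0.587, 0.114])

def mean_threshold(gray):
    """Binarize using the mean pixel value as the threshold."""
    return gray > gray.mean()

def otsu_threshold(gray, bins=256):
    """Return the Otsu threshold, i.e., the cut that maximizes between-class variance."""
    hist, edges = np.histogram(gray, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                      # pixel count at or below each candidate cut
    w1 = w0[-1] - w0                          # pixel count above each candidate cut
    s0 = np.cumsum(hist * centers)
    mu0 = s0 / np.maximum(w0, 1)              # mean intensity of the lower class
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)   # mean intensity of the upper class
    between = w0 * w1 * (mu0 - mu1) ** 2
    return centers[np.argmax(between)]
```

On a dark, homogeneous background both rules tend to give similar masks; Otsu is likely the safer choice when the fruit occupies a very small or very large fraction of the image, since the mean then drifts toward one of the two classes.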
2.3. Automatic Fruit Phenotyping
Once masks for either internal or external fruit images are obtained, an automatic phenotyping procedure is run for inside or outside parts separately (Figure 2). Classical linear descriptors and multivariate and deep learning techniques are combined from a novel perspective to dissect a variety of shape and color patterns. If pedigree or marker information is available, a genetic analysis can be employed to estimate variance components for each of the fruit phenotypes. In the following, we describe the main methods implemented in the pipeline of Figure 2.
2.4. Autoencoder and $k$-Means to Infer Internal Color Patterns
We used an “autoencoder” (AE) network to perform an unsupervised clustering of the internal images. An autoencoder (Figure 3(a)) is an unsupervised machine learning technique that applies backpropagation to train a neural network where the outputs are the same values as the inputs. The AE gives new insight into image analysis by learning the structure of the data; i.e., it is not designed to copy an exact replicate of the input but instead to learn the repeatable and most useful properties.
We used a convolutional AE, as convolutional operations are especially suited for image analysis [42, 43]. These layers create a feature map from the input image, preserving the relationships between pixels in the original space (Figure 3(a)). Each convolution outputs a scored-filtered image, where a high score means a perfect match between the original image and filtered image. The output layer is obtained by applying the Rectified Linear Unit (ReLU) activation function. Finally, as usual, in any convolutional architecture, a max-pooling layer shrinks the output size and achieves a smoother representation, summarizing adjacent neuron outputs by computing their maximum (see accompanying GitHub).
The decoded images from an AE architecture are less noisy than the original ones, making it easier to detect repeatable/consistent color patterns. Our approach consists in taking five colors as reference: a class for the background (black) and four classes for the internal fruit color patterns, including calyx. The four “reference classes” were “orange-like” (198, 99, 35, in RGB coordinates), “quasired” (184, 46, 8), “pale” (194, 144, 78), and “green” (76, 75, 20) for sepals. We then performed a $k$-means clustering after removing the background and assigned each cluster to the nearest reference color using the Euclidean distance between the average color of each cluster in the sample and the reference coordinates. As a result of this step, the surface of each of 1900 strawberry images is split into four categories of colors.
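The assignment of clusters to reference colors can be illustrated as follows. The reference RGB values are those quoted in the text; the function names, and the assumption that the $k$-means pixel labels and cluster mean colors are already available, are hypothetical.

```python
import numpy as np

# Reference colors quoted in the text (RGB coordinates).
REFERENCE_COLORS = {
    "orange-like": (198, 99, 35),
    "quasired": (184, 46, 8),
    "pale": (194, 144, 78),
    "green": (76, 75, 20),
}

def nearest_reference(cluster_means):
    """Map each cluster's mean RGB color to the closest reference color."""
    names = list(REFERENCE_COLORS)
    refs = np.array([REFERENCE_COLORS[n] for n in names], dtype=float)
    means = np.atleast_2d(np.asarray(cluster_means, dtype=float))
    # Pairwise Euclidean distances, shape (n_clusters, n_references).
    dist = np.linalg.norm(means[:, None, :] - refs[None, :, :], axis=2)
    return [names[i] for i in dist.argmin(axis=1)]

def color_percentages(labels, cluster_means):
    """Percentage of fruit pixels per reference color, given k-means pixel labels."""
    assigned = nearest_reference(cluster_means)
    counts = np.bincount(np.asarray(labels), minlength=len(assigned))
    out = {name: 0.0 for name in REFERENCE_COLORS}
    for name, count in zip(assigned, counts):
        out[name] += 100.0 * count / counts.sum()
    return out
```

Two clusters may map to the same reference color, in which case their pixel shares are simply added.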
2.5. Superpixel Algorithm to Remove the Calyx
Some of the fruit pictures contain sepals that interfere with fruit shape quantification and need to be removed prior to estimating shape parameters. For that purpose, we applied the Simple Linear Iterative Clustering (SLIC) algorithm from the Python skimage library. SLIC is based on the “superpixel” concept. Basically, a superpixel is a group of pixels sharing perceptual and semantic information; e.g., the pixels in a superpixel are grouped together because of their color or texture features. The iterative algorithm starts with $k$ regularly spaced centers at a given distance, user defined as $S$, which are then relocated in the direction of the lowest gradient in a neighborhood window to avoid being at the edges of the image. Further, a pixel is assigned to a given cluster if its distance to the cluster’s center is smaller than the distance to the other centers in the search area, as determined by $S$. Finally, the centers are recalculated by averaging all the pixels belonging to the superpixel. The iterative process ends when the residual error (distance between previous centers and recomputed ones) does not exceed a fixed threshold. SLIC outputs a set of meaningful clusters, splitting the background, the calyx, and the fruit. Since all our fruits are centered in the image (the segmentation procedure outlined in Figure 1 ensures this), the superpixel containing the central pixel corresponds to the fruit.
2.6. Univariate Phenotypes: Linear Descriptors
Numerous object shape descriptors exist in the literature. Particularly for fruits, a controlled vocabulary was established in Brewer et al. Here, we implement a custom script to compute some standard linear measures: circularity, solidity, shape aspect (ratio between height and width), ellipse ratio (ratio between the major and minor ellipse axes), fruit perimeter and area, fruit width at 25% height, fruit width at 75% height, fruit width at 50% of height, total height, and maximum width. Circularity is a measure of the degree of roundness of a given object, defined as the ratio between the area of a given object and the area of a circle with the same convex perimeter; i.e., a value near one means a “globe” or “circular” shape. Solidity is the ratio between the area of the object and the area of the convex hull of a given shape. Most of the linear descriptors used here are standard in fruit shape analyses [18, 20, 32, 39, 45].
The external fruit color was measured using the CIELAB space, where $L$ indicates the luminosity and $a$ and $b$ are the chromatic coordinates. The variation in the $a$ index indicates the transition from green to red, where a higher value means a redder object. Variations in $b$ reflect the change between blue and yellow colors; i.e., a higher value refers to a “yellower” object.
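To make the circularity and solidity definitions concrete, the sketch below computes both from an ordered contour given as (x, y) vertices. It is an illustration rather than the authors' custom script: the function names are hypothetical, and the convex hull is obtained with Andrew's monotone chain, chosen here only to keep the example dependency-free.

```python
import numpy as np

def polygon_area(pts):
    """Shoelace formula for a closed polygon given as ordered vertices."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def polygon_perimeter(pts):
    """Sum of edge lengths, including the closing edge."""
    return np.linalg.norm(pts - np.roll(pts, -1, axis=0), axis=1).sum()

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(map(tuple, pts))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(points):
        chain = []
        for p in points:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return np.array(half(pts) + half(pts[::-1]), dtype=float)

def shape_descriptors(contour):
    """Area, perimeter, circularity, and solidity of an ordered contour."""
    contour = np.asarray(contour, dtype=float)
    hull = convex_hull(contour)
    area = polygon_area(contour)
    hull_perimeter = polygon_perimeter(hull)
    return {
        "area": area,
        "perimeter": polygon_perimeter(contour),
        # Area over the area of a circle with the same convex perimeter.
        "circularity": 4 * np.pi * area / hull_perimeter ** 2,
        "solidity": area / polygon_area(hull),
    }
```

A finely sampled circle gives values close to one for both descriptors, whereas a concave outline lowers solidity because the hull encloses area that the object does not.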
2.7. Generalized Procrustes Analysis (GPA)
Shape is usually defined as all the geometric information that remains unchanged after filtering out the location, scale, and rotation effects of a given object. The above shape linear descriptors are standard in the literature but do not provide a whole shape portrayal. Alternatively to linear descriptors, shape variations can be described using “pseudolandmarks,” which identify points around the outline of the object. Here, 50 pseudolandmarks were defined as the intersection between 50 equally spaced conceptual lines starting from the centroid and the fruit contour (Figure 4(a)). Next, we performed a Procrustes analysis. The Procrustes analysis is aimed at finding the transformation $T$ such that, given two matrices $X_1$ and $X_2$, the product $X_2 T$ best matches $X_1$. The Generalized Procrustes Analysis (GPA) is an extension of the method devised to align many matrices simultaneously. In a morphometric analysis, this is done by averaging the distance between all the landmarks on a target shape and the corresponding points on a reference. The pseudolandmarks of the samples can then be analyzed as a multivariate object using, for instance, a principal component analysis (PCA). In addition, the pseudolandmark variability gives insight on the most important regions that determine the differences between shapes. We used the Momocs and geomorph R packages to run these analyses.
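The alignment step can be sketched in NumPy as follows. This is not the Momocs or geomorph implementation: the function names are hypothetical, landmarks are assumed to be in one-to-one correspondence across shapes, and the generalized version uses a plain fixed number of iterations against the running mean shape.

```python
import numpy as np

def normalize(shape):
    """Remove location and scale: center at the origin, unit centroid size."""
    centered = np.asarray(shape, dtype=float)
    centered = centered - centered.mean(axis=0)
    return centered / np.linalg.norm(centered)

def procrustes_align(reference, target):
    """Rotate the normalized target onto the normalized reference (2D landmarks)."""
    ref, tgt = normalize(reference), normalize(target)
    u, _, vt = np.linalg.svd(tgt.T @ ref)
    # Force a proper rotation (determinant +1) so that no reflection is introduced.
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, d]) @ vt
    aligned = tgt @ rotation
    return aligned, np.linalg.norm(aligned - ref)

def generalized_procrustes(shapes, n_iter=10):
    """Align every shape to an iteratively updated mean shape."""
    mean = normalize(shapes[0])
    for _ in range(n_iter):
        aligned = np.array([procrustes_align(mean, s)[0] for s in shapes])
        mean = normalize(aligned.mean(axis=0))
    return aligned, mean
```

The aligned coordinates can then be flattened to one row per fruit and passed to a PCA, which is the multivariate analysis the text describes.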
2.8. Elliptical Fourier Descriptors
An alternative approach to morphometric analysis is elliptical Fourier transformation. This method describes a closed curve as a sum of sine and cosine functions of growing frequencies. As its name suggests, Fourier harmonics are ellipses, and a larger number of harmonics means that more ellipses are fitted to a given contour. The second-order harmonic is simply one ellipse with the values of sine and cosine components for the $x$- and $y$-axis, respectively. As the strawberry fruit is a relatively simple shape, four harmonics were enough to describe all the shapes in the database, giving a total of 16 coefficients. A PCA of the Fourier components can also be employed to quantify morphometric variability, as in the Procrustes analysis. The geomorph R package was employed for this purpose.
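As an illustration of where the 16 coefficients come from, the sketch below computes elliptical Fourier coefficients of a closed polygonal contour with the standard Kuhl and Giardina formulas, four coefficients per harmonic. It is not the R routine used in the study: the function name is hypothetical, no size or rotation normalization is applied, and consecutive contour points are assumed to be distinct.

```python
import numpy as np

def elliptic_fourier(contour, n_harmonics=4):
    """Return an (n_harmonics, 4) array of coefficients (a_n, b_n, c_n, d_n)."""
    contour = np.asarray(contour, dtype=float)
    # Segment vectors, including the one that closes the curve.
    seg = np.diff(np.vstack([contour, contour[:1]]), axis=0)
    dt = np.linalg.norm(seg, axis=1)                  # segment lengths (must be > 0)
    t = np.concatenate([[0.0], np.cumsum(dt)])        # cumulative arc length
    total = t[-1]
    coeffs = np.zeros((n_harmonics, 4))
    for n in range(1, n_harmonics + 1):
        phi = 2 * np.pi * n * t / total
        dcos, dsin = np.diff(np.cos(phi)), np.diff(np.sin(phi))
        scale = total / (2 * n ** 2 * np.pi ** 2)
        coeffs[n - 1] = scale * np.array([
            np.sum(seg[:, 0] / dt * dcos),   # a_n: cosine term of x
            np.sum(seg[:, 0] / dt * dsin),   # b_n: sine term of x
            np.sum(seg[:, 1] / dt * dcos),   # c_n: cosine term of y
            np.sum(seg[:, 1] / dt * dsin),   # d_n: sine term of y
        ])
    return coeffs
```

Flattening the returned array gives one 16-value row per fruit, which is the kind of input a PCA of the Fourier components would take.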
2.9. Conditional Variational Autoencoders (VAE) to Cluster Shapes
Fruit shape can also be addressed from a completely different angle, such as obtaining clusters of shapes to objectively classify fruits in groups of similar morphology. A standard approach consists of flattening the image and grouping the raw data, treating each pixel as a feature. Unfortunately, clustering algorithms are not exempt from the “curse of dimensionality” problem, and they perform poorly as the number of analyzed dimensions increases, especially if noise is high.
A natural way to solve the aforementioned issue is to apply a dimension reduction technique before clustering. Although the classical autoencoders seem to be a good option, as shown above, AEs were conceived to perform a nonlinear and not isometric dimensionality reduction, and thus, they do not preserve the geometrical properties of the original space. Unlike traditional autoencoders, variational autoencoders (VAEs) [52–54] preserve distances and, importantly, are generative models (Figure 3(b)). The main difference between AE and VAE is that the latter encodes the input as a distribution over a latent space. Basically, given an input $x$, the VAE creates a latent distribution $p(z \mid x)$, and the input reconstruction is obtained after sampling $z$ from the latent representation. The VAE not only forces the latent space to be continuous; it can also generate meaningful information, even for images that it has never seen before.
The key aspect in VAE training lies in the loss function, which includes a “reconstruction” and a “regularization” term. The former is the usual loss or the joint log-likelihood between the true input and the VAE output, whereas the second is the entropy corresponding to the Kullback-Leibler divergence between the latent distribution and the standard normal distribution. Without incorporating a regularization, the VAE behaves as an AE, where the latent space is neither complete nor continuous. Regularization forces the latent distribution to be close to the standard normal, generating a continuous space of low variance centered in the origin, which is suitable for data clustering and generation.
Here, we run standard $k$-means clustering of the latent space, with $k$ varying between 2 and 9 groups. We chose a maximum of $k = 9$ given that up to nine strawberry shapes have been proposed in the literature, in particular in the Japanese market. We assessed the cluster robustness using the silhouette index. This index determines how well each object fits into its cluster, taking into account intra- and between-class variations. The index ranges between -1 and 1, and a value close to 1 means that the cluster is compact and homogeneous. Importantly, the combination of VAE and clustering also allows us to use conditional VAE to generate the expected fruit pertaining to a specific group.
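The two terms of the loss can be written down explicitly when the encoder outputs a diagonal Gaussian, which is the usual VAE setup and is assumed here. The sketch below is framework-free NumPy, with hypothetical function names, a squared-error reconstruction term chosen for illustration, and a weight `beta` that is not mentioned in the text.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    mu, log_var = np.asarray(mu, dtype=float), np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Batch mean of reconstruction error plus weighted KL regularization."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    reconstruction = np.sum((x - x_hat) ** 2, axis=-1)   # squared error (an assumption)
    return np.mean(reconstruction + beta * kl_to_standard_normal(mu, log_var))
```

Setting `beta` to zero removes the regularization, which corresponds to the case where the model behaves like a plain autoencoder.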
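For reference, the silhouette index of each point is $(b - a) / \max(a, b)$, where $a$ is the mean distance to the other members of its own cluster and $b$ is the smallest mean distance to the members of any other cluster. A minimal NumPy sketch, with a hypothetical function name and assuming Euclidean distances in the latent space and at least two clusters, is:

```python
import numpy as np

def silhouette_scores(points, labels):
    """Per-point silhouette values for a clustering with at least two clusters."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton clusters keep a score of 0 by convention
        # Mean distance to the rest of the own cluster (self excluded).
        a = dist[i, same].sum() / (same.sum() - 1)
        # Smallest mean distance to any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores
```

Averaging these values for each candidate number of clusters and keeping the one with the highest mean is one common way to select the number of groups.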
2.10. Genetic Parameter Inference
Genetic parameters determine how successful artificial selection will be and are therefore critical in any plant breeding scheme. Heritability ($h^2$) is the proportion of phenotypic variance explained by the genetic variation. To estimate $h^2$, the degree of resemblance between relatives using the pedigree was used (see Supp Table 2). Take the linear model $\mathbf{y} = \mu\mathbf{1} + \mathbf{a} + \mathbf{d} + \mathbf{e}$, where $\mathbf{y}$ represents the phenotype vector, averaged for each genotype; $\mu$ is the intercept; $\mathbf{a} \sim N(\mathbf{0}, \mathbf{A}\sigma^2_a)$ and $\mathbf{d} \sim N(\mathbf{0}, \mathbf{D}\sigma^2_d)$ are the additive and dominance effects, respectively; $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e)$ is the residual component; and $\mathbf{A}$ and $\mathbf{D}$ are the additive and dominance covariance matrices, respectively. Both $\mathbf{A}$ and $\mathbf{D}$ can be computed recursively from the pedigree. In the presence of marker information, $\mathbf{A}$ and $\mathbf{D}$ can be computed as specified in [59, 60], but statistical inference is otherwise identical. Posterior distributions of the genetic parameters were obtained using Reproducing Kernel Hilbert Space (RKHS) regression with the BGLR package. The additive and dominance variance fractions were estimated as $h^2_a = \hat{\sigma}^2_a / (\hat{\sigma}^2_a + \hat{\sigma}^2_d + \hat{\sigma}^2_e)$ and $h^2_d = \hat{\sigma}^2_d / (\hat{\sigma}^2_a + \hat{\sigma}^2_d + \hat{\sigma}^2_e)$, where $\hat{\sigma}^2_i$ is the mean posterior estimate of $\sigma^2_i$.
3. Results
3.1. Shape Descriptors
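The recursive construction of the additive relationship matrix from a pedigree can be illustrated with the classical tabular method. The sketch below is an assumption about how such a matrix may be built, not the code used in the study: it assumes diploid-like inheritance, individuals ordered so that parents precede their offspring, and a hypothetical function name; the dominance matrix is not covered.

```python
import numpy as np

def additive_relationship(pedigree):
    """Tabular method for the additive relationship matrix.

    `pedigree` is a list of (parent1, parent2) index pairs, one per individual,
    ordered so that parents come before offspring; None marks an unknown parent.
    """
    n = len(pedigree)
    A = np.zeros((n, n))
    for i, (p1, p2) in enumerate(pedigree):
        for j in range(i):
            # Relationship with an earlier individual: average over both parents.
            a_j1 = A[j, p1] if p1 is not None else 0.0
            a_j2 = A[j, p2] if p2 is not None else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_j1 + a_j2)
        # Diagonal: one plus the inbreeding coefficient of individual i.
        both_known = p1 is not None and p2 is not None
        A[i, i] = 1.0 + (0.5 * A[p1, p2] if both_known else 0.0)
    return A
```

For two unrelated parents and two of their offspring, this gives 0.5 between each parent and each offspring and 0.5 between the two full sibs.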
Shape linear descriptors, pseudolandmarks, and elliptical Fourier transforms for fruit shape were computed for the 1920 external images output from the pipeline in Figure 1 and Algorithm S1. Figure 4(d) shows the minimum and maximum consensus for shape superimposition, suggesting that shapes vary from a “globose-like” to an “elongated-like” form in these samples. The standard deviation of the first PCA from GPA coordinates (Supp Fig. 1) of the tip, neck, and both sides around the neck is above the mean (Figure 4(c)). This suggests that these regions are responsible for the main shape variations in strawberry, in agreement with Feldmann et al. Supp Fig. 1 shows the fruit shape variations from the Procrustes principal component analysis (Proc-PCA). The first principal component describes the variations from “elongated”- to “globose”-like. Observations with a negative score on that component correspond to elongated fruits, while those with positive scores are “globose”-like fruits. A permutation-based Procrustes analysis of variance was conducted to assess the effect of the crosses on the fruit shape. The $p$ value obtained after 100 permutations shows a significant effect of the lines, i.e., genotypes, on the fruit shape, suggesting that the shape is heritable (Supp Table 3).
We also set a fourth-order elliptical Fourier to describe the main strawberry shape variations (see Supp Figs. 2 and 3). As in the Procrustes analysis, variations in the first principal component of the elliptical analysis show that the strawberry shapes vary from “globose-like” to “elongated-like” (see a few examples in Supp Fig. 7). Similarly, the first component from elliptical PCA can also be used as a “morphological” descriptor. A $k$-means clustering using the two first PCA components of the Fourier transform similarly detects the two previously defined groups of shapes when setting $k = 2$ (Supp Fig. 4).
Alternatively, one can directly identify the number of different shapes from a collection of images. We used a VAE (Figure 3(b)) to automatically discover the optimal number of shapes in our database, which again was two (Supp Fig. 5 and 6). About 35% of the strawberries belong to the “globose-like” shape, whereas the remaining fruits were classified as “elongated-like” (Figures 5(a) and 5(b)).
Figure 4(e) shows a PCA on the linear descriptors, where the color of each sample is proportional to the predicted cluster probability. A dark color corresponds to a fully elongated shape, and a light blue, to a fully round fruit. Note that shape gradient is mainly observed along the second principal component. Interestingly, the most influential variables in this component are the fruit ratio between main and minor ellipse axis, the circularity, and solidity coefficients (Figure 4(f)). All of these are shape-related variables. It is not surprising that solidity and circularity are highly correlated, since the convex hull area increases when a shape deviates from a circle (circularity), and solidity approaches zero. The area, perimeter, and height are quasi-independent of the aforementioned descriptors and are not related to the shape clusters.
3.2. Color Descriptors
For the external side color in our dataset, the $L$ channel ranged between 7.01 and 118.30, with a mean of 75.54; the $a$ channel ranged between 127.9 and 184.8, with a mean of 167.1; and the $b$ channel ranged between 128.8 and 192.6, with a mean of 175.4.
Estimating the color of the internal fruit is more challenging than that of external parts, as it fluctuates in a wider range of patterns. Figure 6 shows the estimated percentages of each reference color for four chosen strawberries. Note that the percentage of “quasired” is zero and most of the fruit is computed as “pale” (~95%) for the first two, whitish fruits. Two colors, “quasired” and “orange-like,” predominate in the third fruit. Finally, the last fruit is almost red, as can be verified from the estimated quasired value (99%).
3.3. Genetic Parameter Estimation
Figure 7 shows the Bayesian estimates of heritability for all automatically extracted traits. We used the pedigree information to compute the additive and dominance relationship matrices, since we did not have genotypes. Like many polyploid species, strawberry is clonally propagated. Inferring the dominance component in these cases is critical, as clonal propagation allows a straightforward utilization of gene interaction. Interestingly, we found that dominance variance was higher than the additive component for most of the traits. The sum of both components ranged between 0.4 and 0.6, indicating that the traits are clearly heritable. The ellipse ratio and the ratio between height and width were the most heritable characters, exhibiting an important additive component. Elliptical Fourier components, as well as the percentage of fruits in each of the two categories obtained from VAE, also have a high heritability, for both additive and dominance components. Regarding the internal color, we find that the pale color has an important dominance component.
Over the last decades, plant and animal breeding programs have benefited from the development and cost reduction in genomic technologies [65, 66]. Breeding nevertheless depends on both genotype and phenotype, and our ability to characterize the latter is much more limited compared to the former [9, 10]. In fact, one of the biggest challenges of “Precision Agriculture” is to transform large-scale datasets collected with sensors into phenotypic measurements that can be used for genetic improvement.
Consumer attitudes are increasingly shaping agricultural practices. In the case of fruits, consumer preferences are based primarily on fruit appearance. However, measuring this trait is not straightforward, as it is a complex mixture of shape and color patterns. A crucial aspect for improving appearance is then to characterize the color and shape of the fruits in an inexpensive and fast way. In this paper, we deliver a fully automatized pipeline that analyzes fruit appearance as complex multivariate data. While this is not the first study characterizing fruit shape variations, our procedure is considerably more automatized than its predecessors, as it requires minimal human intervention [18, 20, 40]. It also incorporates new features such as the use of variational autoencoders (VAE) to automatically detect the most likely number of underlying shapes or to cluster the internal color.
The pipeline presented here and previous efforts to automatize fruit morphology measurement by Feldmann et al. are important steps to increase agriculture efficiency. They are by no means sufficient, and additional developments are warranted. A first limitation is that algorithms need to be trained on the specific dataset that will be used in production and can sometimes be difficult to generalize to different scenarios. A second limitation concerns the phenotypes measured. For instance, uniformity of shape and lack of blemishes (like depressions or creases) significantly impact the value of the product but were not studied here. Uniformity of fruits can be easily quantified, e.g., measuring the dispersion along landmarks (Figure 4(a)), whereas irregularities in color that may mean fruit damage can be more challenging. In the lab, as done here, a suitable color clustering to associate color patterns with fruit damages could perhaps be envisaged. To be really useful, however, fruit damages should be evaluated once the product has been packaged, before or after distribution, which would need distinct code from that employed here. The number of seeds is also important economically, but we found that a very high resolution is needed to quantify them. Finally, 3D approaches have also been evaluated in fruits, including strawberry [67, 68]. Three-D imaging is far more demanding than 2D in terms of sample collection and computation [69, 70]. This hampers using 3D technologies as massively as 2D, although 3D has a number of advantages, mainly a far more realistic and comprehensive fruit representation. For instance, Li et al. utilize 3D imaging to assess fruit uniformity and show that it can be characterized by combining up to six linear parameters.
4. Discussion
Our algorithm requires images being taken on a homogeneous black or white surface, and field images are not allowed. To compare the shapes and colors, all shots must be taken in the same conditions, using the same digital camera placed at the same height and setting the same parameters: focal length, manual aperture, exposure time, and lighting. Scanned images are also allowed but the same scanning conditions must be followed in all images.
Although 2D digital images are among the easiest phenotypes to collect, analyzing them can be challenging, partly because object boundaries must be determined, a process known as feature extraction. Numerous classical [41, 44, 72] and deep learning approaches [73, 74] have been developed in computer vision and image processing to meet this objective. Here, we combined some of these methods to automatically segment fruit snapshots and read the fruit label. The main approach we used is not new, as it is based on an algorithm developed in the late seventies . However, we resort to novel techniques in order to remove undesirable image noise , and we characterize color pattern or classify fruits through a variational autoencoder (Figure 3) .
In this work, we characterize shape and color variations using several complementary methods, from naïve linear descriptors to multivariate and deep learning techniques. It is important to point out that results from all approaches are consistent and suggest that the fruits in our database can be classified into two groups, “globose-like” and “elongated-like” (Supp Figs. 5 and 6). We determine that the most variable regions are the neck, the neck sides, and the tip of the fruit (Figures 4(b) and 4(c)). The “shape” linear descriptor, i.e., the ratio between fruit height and width, is a good morphological descriptor (Figures 4(e) and 4(f)) and is as discriminative as more complex multivariate characterizations. An ANOVA on the Procrustes coordinates shows that genotype is significant (p value < 0.01, Supp Tab 2), another indirect indication that shape is heritable.
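The linear descriptors are simple enough to compute directly from an outline. The sketch below is a minimal example, assuming the outline is a closed polygon given as ordered (x, y) points with the fruit axis aligned to the vertical; the function name and dictionary keys are hypothetical, and circularity is taken as 4πA/P², a common definition that may differ from the one used in the pipeline.

```python
import math


def linear_descriptors(outline):
    """Area, perimeter, height, width, circularity and height/width ratio.

    outline: ordered list of (x, y) points of a closed polygon.
    """
    n = len(outline)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = outline[i]
        x1, y1 = outline[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0  # shoelace formula term
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(twice_area) / 2.0

    xs = [p[0] for p in outline]
    ys = [p[1] for p in outline]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)

    return {
        "area": area,
        "perimeter": perimeter,
        "height": height,
        "width": width,
        "circularity": 4.0 * math.pi * area / perimeter ** 2,
        "height_width_ratio": height / width,
    }


# Toy outline: a 2 x 4 rectangle, i.e., an "elongated-like" shape.
d = linear_descriptors([(0, 0), (2, 0), (2, 4), (0, 4)])
print(d["height_width_ratio"], round(d["circularity"], 3))
```

A ratio well above 1 points to an elongated fruit and a ratio near 1 to a globose one, which is why this single number can separate the two groups reasonably well.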
Shapes can be classified using standard clustering techniques with the number of clusters previously specified, as shown by Feldmann et al. in strawberry. Our results agree with those authors’ in that we also find that shape is heritable and that a few components may be needed to classify shapes (Figure 4(f)). In addition to that approach, we propose here a completely unsupervised approach based on variational autoencoders (Figure 3). The advantage of this analysis is that shape discovery can be automatized and that the model is also capable of generating shapes not seen before. Predicting the shape and appearance of new genotypes can be a powerful tool to design new crosses, as the breeder can evaluate not only the average shape but also its variability. To our knowledge, VAEs have not yet been utilized for these purposes.
Here, we have explored multiple methods to describe fruit morphology. Although somewhat redundant, each metric has its own advantages and limitations. For instance, previous works (e.g., [18, 20]) show that the classical linear descriptor defined as the ratio between height and width is an accurate way to describe shape variation in fruits like strawberries. The advantage of this method lies in the principle of parsimony; i.e., it is the simplest way to characterize a shape. Despite its simplicity, this measure is not complete enough to describe the many variations that can occur, since many vegetables and crops, like tomatoes, melons and other cucurbits, and even strawberries, exhibit enormous morphological variation. In these scenarios, multivariate descriptors (like Fourier and Generalized Procrustes) are more suitable for the analysis. In turn, and as mentioned above, generative methods (like deep variational autoencoders) can describe variation, with the potential to generate new fruit genotypes in silico, which may be useful for applying new breeding strategies. It is important to note that these methods can also be applied to leaves, flowers, and roots, which may have an even greater diversity of shapes than fruits. Therefore, having different complementary analyses available offers an important advantage for better understanding the complexity of shape.
Describing internal color patterns is challenging, mainly because color is a quantitative multichannel character. We addressed this problem by defining three reference colors named “quasired,” “orange-like,” and “pale.” We then automatically determined the percentage of color corresponding to each of these reference colors for each fruit, using an autoencoder for fruit denoising and k-means for segmentation. The algorithm calculates the Euclidean distance between the RGB coordinates of each cluster obtained by k-means and the target color coordinates and assigns the cluster to the target color whose distance is minimal. The color patterns are satisfactorily dissected, as can be seen in some selected images from the database (Figure 6).
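The assignment step described above can be sketched as follows. The example assumes that denoising and k-means have already been run, so that each cluster is summarized by its RGB centroid and its pixel count. The reference RGB values, the function names, and the toy centroids are hypothetical; the source does not give the actual coordinates of the three target colors.

```python
import math

# Hypothetical RGB coordinates for the three reference colors;
# the values used in the published pipeline are not stated here.
REFERENCE_COLORS = {
    "quasired": (200, 30, 40),
    "orange-like": (230, 120, 60),
    "pale": (240, 210, 190),
}


def nearest_reference(rgb, references=REFERENCE_COLORS):
    """Name of the reference color with minimal Euclidean distance."""
    return min(references, key=lambda name: math.dist(rgb, references[name]))


def color_percentages(centroids, counts, references=REFERENCE_COLORS):
    """Percentage of fruit pixels assigned to each reference color.

    centroids: RGB centroid of each cluster (e.g., from k-means).
    counts: number of pixels in each cluster, in the same order.
    """
    total = sum(counts)
    shares = {name: 0.0 for name in references}
    for centroid, n_pixels in zip(centroids, counts):
        shares[nearest_reference(centroid, references)] += 100.0 * n_pixels / total
    return shares


# Toy clusters for one fruit section.
centroids = [(190, 40, 50), (235, 200, 180), (225, 110, 70)]
counts = [600, 300, 100]
print(color_percentages(centroids, counts))
```

Because several clusters may map to the same reference color, their shares are accumulated rather than overwritten.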
The phenotype results from a complex interaction between the genotype and environmental factors. Portraying the phenotypes would not be worthwhile for breeding if the desirable characters could not be transmitted to the progeny. Thus, quantifying the heritability of all of these traits is crucial. Typically, genetic variance is decomposed into additive and nonadditive effects. Clonally propagated species like strawberry allow direct utilization of dominance and epistatic interactions. We used Bayesian modeling to estimate both additive and dominance effects. As can be observed in Figure 7 and Supp. Table 4, most traits are moderately heritable, and a high proportion of the variance is explained by the dominance component. In this scenario, prediction accuracy in genomic selection could possibly increase by including dominance in the model.
Nevertheless, data are from a single sampling season, making it impossible to estimate the variance caused by genotype-by-environment interaction (G × E). Therefore, the heritabilities reported are likely overestimated. Further, the pedigree utilized considered only parents and offspring; the parents themselves are related, but this was ignored except in a subset of parents. The effect in this case should be smaller than that of G × E and should affect the variance of the estimates rather than their bias, since relationships decrease quadratically with generation, and most information is contained in the closest relatives.
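The additive and dominance decomposition discussed above can be summarized as variance ratios. The sketch below assumes a model with additive, dominance, and residual variance components only; the function name and the numeric values are hypothetical. In a Bayesian analysis the same ratios would typically be computed for each posterior sample of the variance components and then summarized.

```python
def variance_ratios(var_additive, var_dominance, var_residual):
    """Proportion of phenotypic variance due to each genetic component.

    Assumes total variance = additive + dominance + residual.
    With a single season, any genotype-by-environment variance
    cannot be separated and would inflate the genetic components.
    """
    total = var_additive + var_dominance + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {
        "h2_narrow": var_additive / total,                   # additive share
        "d2": var_dominance / total,                         # dominance share
        "H2_broad": (var_additive + var_dominance) / total,  # total genetic share
    }


# Hypothetical components where dominance exceeds additive variance.
print(variance_ratios(1.0, 2.0, 1.0))
```

For clonally propagated crops the broad-sense ratio is the relevant one for selecting clones, since the whole genotype, including dominance deviations, is transmitted.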
We estimated heritabilities using pedigree information, but a similar study could be carried out if genetic markers were available. This would have the extra benefit of enabling genome-wide association studies (GWAS) and genomic selection [63, 76]. It is straightforward to implement these features in our pipeline. Association studies in humans, apple, and tomato have revealed genes or markers associated with human craniofacial shape [34, 78, 79], leaf variation, and tomato morphology. To the best of our knowledge, there is no similar study in strawberry, and there is still a long way to go to fully unravel the genetic basis of strawberry shape.
There is a need to develop analysis pipelines for plant high-throughput phenotyping that can automate processes that are often subjective and time-consuming. Our workflow establishes a proof of concept in strawberry morphometrics, which can be transferred to other visual phenotypes and fruits with relatively minor modifications. We developed a Python-based pipeline (https://github.com/lauzingaretti/DeepAFS/blob/main/main.ipynb) that shows how to apply our methodology to other fruits like apples, tomatoes, citrus, and Prunus. This code automatically reads the fruit image, segments it, and computes some linear and color descriptors. It can also save the segmented images into a predefined folder, together with the fruit outline reference points used for subsequent multivariate comparison. Overall, our results show that, although fruit shape is made up of a complex set of traits, it can be quickly and automatically evaluated and is moderately heritable (Figures 1, 2, and 7). Future improvements are still needed since, e.g., image segmentation is not always simple in field conditions and many additional phenotypes are of commercial interest (e.g., uniformity, blemishes). Future improvements should also address additional technological developments such as spectral and MIR images and 3D imaging. Finally, a word of caution: users should be aware that artificial intelligence tools need thorough training in the specific conditions in which they are going to be employed and that optimizing algorithms may not be simple.
Code is available at https://github.com/lauzingaretti/DeepAFS.
Conflicts of Interest
The authors declare no conflict of interest.
LMZ, AM, and MPE conceived the research. AM provided data. LMZ developed methods and code. LMZ and MPE wrote the manuscript with help from AM.
The authors would like to thank Planasa for providing the strawberry fruits under the Planasa-IRTA collaboration contract, headed by AM. LMZ was supported by a PhD grant from the Ministry of Economy and Science (MINECO, Spain). Work was funded by the MINECO grants AGL2016-78709-R and PID2019-108829RB-I00 to MPE and by the CERCA Programme/Generalitat de Catalunya. We acknowledge the financial support from the Spanish Ministry of Science and Innovation-State Research Agency (AEI), through the “Severo Ochoa Programme for Centres of Excellence in R&D” SEV-2015-0533 and CEX2019-000902-S.
Algorithm 1: create a segmented fruit database from raw data. Supp Table 1: scheme of the crosses used in the experiment. Supp Table 2: scheme of the pedigree used in the analysis. Supp Fig. 1: output plot from Procrustes principal component analysis (Proc-PCA). The analysis shows a variation between “elongated”-like and “globose”-like shapes. Supp Table 3: output from Procrustes ANOVA, which evaluates the effect of the crosses on fruit shape. Supp Fig. 2: output from elliptical Fourier analysis. Supp Fig. 3: shape variation derived from PCA on elliptical Fourier analysis. Supp Fig. 4: clustering on the elliptical Fourier components characterizing fruit shapes. Supp Fig. 5: silhouette analysis for number of clusters on shape latent space from variational autoencoder output. Supp Fig. 6: left: silhouette plot for each cluster (0 and 1); right: visualization of the clustered data in the latent space. Supp Fig. 7: the upper panel shows two examples of fruit belonging to the cluster. Supp Table 4: heritability values, with their additive and dominance components, for all the shape- and color-related traits.
- M. C. Hunter, R. G. Smith, M. E. Schipanski, L. W. Atwood, and D. A. Mortensen, “Agriculture in 2050: recalibrating targets for sustainable intensification,” Bioscience, vol. 67, no. 4, pp. 386–391, 2017.
- J. Carnicer, M. Coll, M. Ninyerola, X. Pons, G. Sánchez, and J. Peñuelas, “Widespread crown condition decline, food web disruption, and amplified tree mortality with increased climate change-type drought,” Proceedings of the National Academy of Sciences of the United States of America, vol. 108, no. 4, pp. 1474–1478, 2011.
- T. Wernberg, D. A. Smale, F. Tuya et al., “An extreme climatic event alters marine ecosystem structure in a global biodiversity hotspot,” Nature Climate Change, vol. 3, no. 1, pp. 78–82, 2013.
- G. C. Hegerl, H. Hanlon, and C. Beierkuhnlein, “Elusive extremes,” Nature Geoscience, vol. 4, no. 3, pp. 142-143, 2011.
- L. L. Porfirio, D. Newth, J. J. Finnigan, and Y. Cai, “Economic shifts in agricultural production and trade due to climate change,” Palgrave Communications, vol. 4, no. 1, 2018.
- G. A. Lobos, A. V. Camargo, A. del Pozo, J. L. Araus, R. Ortiz, and J. H. Doonan, “Editorial: plant phenotyping and phenomics for plant breeding,” Frontiers in Plant Science, vol. 8, article 2181, 2017.
- R. Pieruschka and U. Schurr, “Plant phenotyping: past, present, and future,” Plant Phenomics, vol. 2019, pp. 1–6, 2019.
- D. A. Fasoula and V. A. Fasoula, “Gene action and plant breeding,” in Plant Breeding Reviews, vol. 15, pp. 315–374, John Wiley & Sons, Inc., 2010.
- F. Tardieu, L. Cabrera-Bosquet, T. Pridmore, and M. Bennett, “Plant phenomics, from sensors to knowledge,” Current Biology, vol. 27, no. 15, pp. R770–R783, 2017.
- L. Awada, P. W. B. Phillips, and S. J. Smyth, “The adoption of automated phenotyping by plant breeders,” Euphytica, vol. 214, no. 8, 2018.
- A.-K. Mahlein, “Present and future trends in plant disease detection,” Plant Disease, vol. 100, pp. 1–11, 2016.
- J. V. Stafford, “Implementing precision agriculture in the 21st century,” Journal of Agricultural Engineering Research, vol. 76, no. 3, pp. 267–275, 2000.
- N. J. Schork, “Genetics of complex disease approaches, problems, and solutions,” American Journal of Respiratory and Critical Care Medicine, vol. 156, no. 4, pp. S103–S109, 1997.
- D. K. Großkinsky, J. Svensgaard, S. Christensen, and T. Roitsch, “Plant phenomics and the need for physiological phenotyping across scales to narrow the genotype-to-phenotype knowledge gap,” Journal of Experimental Botany, vol. 66, pp. 5429–5440, 2015.
- C. Zhao, Y. Zhang, J. Du et al., “Crop phenomics: current status and perspectives,” Frontiers in Plant Science, vol. 10, 2019.
- A. Gracia-Romero, S. C. Kefauver, J. A. Fernandez-Gallego, O. Vergara-Díaz, M. T. Nieto-Taladriz, and J. L. Araus, “UAV and ground image-based phenotyping: a proof of concept with durum wheat,” Remote Sensing, vol. 11, article 1244, 2019.
- A. Ruckelshausen and L. Busemeyer, “Toward digital and image-based phenotyping,” in Phenomics Crop Plants Trends, Options Limitations, J. Kumar, A. Pratap, and S. Kumar, Eds., Springer, New Delhi, 2015.
- M. J. Feldmann, M. A. Hardigan, R. A. Famula et al., “Multi-dimensional machine learning approaches for fruit shape phenotyping in strawberry,” GigaScience, vol. 9, pp. 1–17, 2020.
- S. D. Turner, S. L. Ellison, D. A. Senalik, P. W. Simon, E. P. Spalding, and N. D. Miller, “An automated image analysis pipeline enables genetic studies of shoot and root morphology in carrot (Daucus carota L.),” Frontiers in Plant Science, vol. 871, 2018.
- M. T. Brewer, L. Lang, K. Fujimura, N. Dujmovic, S. Gray, and E. Van Der Knaap, “Development of a controlled vocabulary and software application to analyze fruit shape variation in tomato and other plant species,” Plant Physiology, vol. 141, pp. 15–25, 2006.
- A. Das, H. Schneider, J. Burridge et al., “Digital imaging of root traits (DIRT): a high-throughput computing and collaboration platform for field-based root phenomics,” Plant Methods, vol. 11, 2015.
- A. Seethepalli, H. Guo, X. Liu et al., “RhizoVision crown: an integrated hardware and software platform for root crown phenotyping,” Plant Phenomics, vol. 2020, pp. 1–15, 2020.
- S. R. Jaeger, L. Machín, J. Aschemann-Witzel, L. Antúnez, F. R. Harker, and G. Ares, “Buy, eat or discard? A case study with apples to explore fruit quality perception and food waste,” Food Quality and Preference, vol. 69, pp. 10–20, 2018.
- J. L. Gilbert, J. W. Olmstead, T. A. Colquhoun, L. A. Levin, D. G. Clark, and H. R. Moskowitz, “Consumer-assisted selection of blueberry fruit quality traits,” HortScience, vol. 49, pp. 864–873, 2014.
- K. S. Lewers, M. J. Newell, E. Park, and Y. Luo, “Consumer preference and physiochemical analyses of fresh strawberries from ten cultivars,” International Journal of Fruit Science, vol. 20, no. sup2, pp. 733–756, 2020.
- M. González, E. Baeza, J. L. Lao, and J. Cuevas, “Pollen load affects fruit set, size, and shape in cherimoya,” Scientia Horticulturae, vol. 110, no. 1, pp. 51–56, 2006.
- B. K. Klatt, A. Holzschuh, C. Westphal et al., “Bee pollination improves crop quality, shelf life and commercial value,” Proceedings of the Royal Society B: Biological Sciences, vol. 281, article 20132440, 2014.
- C. Peter Klingenberg, “Evolution and development of shape: integrating quantitative approaches,” Nature Reviews Genetics, vol. 11, pp. 623–635, 2010.
- G. R. Rodríguez, S. Muños, C. Anderson et al., “Distribution of SUN, OVATE, LC, and FAS in the tomato germplasm and the relationship to fruit shape diversity,” Plant Physiology, vol. 156, no. 1, pp. 275–285, 2011.
- A. J. Monforte, A. Diaz, A. Caño-Delgado, and E. Van Der Knaap, “The genetic basis of fruit morphology in horticultural crops: lessons from tomato and melon,” Journal of Experimental Botany, vol. 65, no. 16, pp. 4625–4637, 2014.
- M. T. Brewer, J. B. Moyseenko, A. J. Monforte, and E. Van Der Knaap, “Morphological variation in tomato: a comprehensive study of quantitative trait loci controlling fruit shape and development,” Journal of Experimental Botany, vol. 58, no. 6, pp. 1339–1349, 2007.
- M. Rashidi and F. Keshavarzpour, “Classification of apple size and shape based on mass and outer dimensions,” Journal of Agriculture and Environmental Sciences, vol. 9, pp. 618–621, 2010.
- N. Mezghani, I. Zaouali, W. B. Amri et al., “Fruit morphological descriptors as a tool for discrimination of Daucus L. germplasm,” Genetic Resources and Crop Evolution, vol. 61, no. 2, pp. 499–510, 2014.
- P. Claes, D. K. Liberton, K. Daniels et al., “Modeling 3D facial shape from DNA,” PLoS Genetics, vol. 10, 2014.
- I. L. Dryden and K. V. Mardia, “Statistical Shape Analysis,” Tech. Rep., Wiley series in probability and statistics, 1998.
- UPOV, Strawberry: guidelines for the conduct of tests for distinctness, uniformity and stability, Upov, 2012.
- J. Schindelin, C. T. Rueden, M. C. Hiner, and K. W. Eliceiri, “The ImageJ ecosystem: an open platform for biomedical image analysis,” Molecular reproduction and development, vol. 82, no. 7-8, pp. 518–529, 2015.
- J. Schindelin, I. Arganda-Carreras, E. Frise et al., “Fiji: an open-source platform for biological-image analysis,” Nature methods, vol. 9, no. 7, pp. 676–682, 2012.
- A. Darrigues, J. Hall, E. Van Der Knaap, D. M. Francis, N. Dujmovic, and S. Gray, “Tomato analyzer-color test: a new tool for efficient digital phenotyping,” Journal of the American Society for Horticultural Science, vol. 133, no. 4, pp. 579–586, 2008.
- M. A. Gehan, N. Fahlgren, A. Abbasi et al., “PlantCV v2: image analysis software for high-throughput plant phenotyping,” PeerJ, vol. 5, article e4088, 2017.
- N. Otsu, “Threshold selection method from gray-level histograms,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-9, pp. 62–66, 1979.
- I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, 2016.
- Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
- R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, SLIC superpixels, 2010.
- L. Diaz-Garcia, G. Covarrubias-Pazaran, B. Schlautman, E. Grygleski, and J. Zalapa, “Image-based phenotyping for identification of QTL determining fruit shape and size in American cranberry (Vaccinium macrocarpon L.),” PeerJ, vol. 2018, pp. 1–19, 2018.
- J. C. Gower, “Generalized procrustes analysis,” Psychometrika, vol. 40, no. 1, pp. 33–51, 1975.
- V. Bonhomme, S. Picq, C. Gaucherel, and J. Claude, “Momocs: outline analysis using R,” Journal of Statistical Software, vol. 56, pp. 1–24, 2014.
- D. C. Adams and E. Otárola-Castillo, “geomorph: an R package for the collection and analysis of geometric morphometric shape data,” Methods in Ecology and Evolution, vol. 4, no. 4, pp. 393–399, 2013.
- F. P. Kuhl and C. R. Giardina, “Elliptic Fourier features of a closed contour,” Computer graphics and image processing, vol. 18, no. 3, pp. 236–258, 1982.
- R. Bellman, Dynamic Programming, Princeton University Press, Princeton, NJ, 1957, https://press.princeton.edu/books/paperback/9780691146683/dynamic-programming.
- A. Gropp, M. Atzmon, and Y. Lipman, “Isometric autoencoders,” 2020, https://arxiv.org/abs/2006.09289.
- D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” in 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, 2013.
- D. P. Kingma, “Fast gradient-based inference with continuous latent variable models in auxiliary form,” 2013, http://arxiv.org/abs/1306.0733.
- D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic back-propagation and variational inference in deep latent Gaussian models,” in International Conference on Machine Learning, vol. 32, pp. 1278–1286, Beijing, China, 2014, http://jmlr.org/proceedings/papers/v32/rezende14.html.
- T. Ishikawa, A. Hayashi, S. Nagamatsu et al., “Classification of strawberry fruit shape by machine learning,” International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences, vol. XLII-2, pp. 463–470, 2018.
- P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, vol. 20, pp. 53–65, 1987.
- D. S. Falconer, T. F. C. Mackay, and R. Frankham, Introduction to quantitative genetics, Trends in Genetics, 4th edition, 1996.
- A. Nazarian and S. A. Gezan, “GenoMatrix: a software package for pedigree-based and genomic prediction analyses on complex traits,” Journal of Heredity, vol. 107, no. 4, pp. 372–379, 2016.
- P. M. VanRaden, “Efficient methods to compute genomic predictions,” Journal of dairy science, vol. 91, no. 11, pp. 4414–4423, 2008.
- Z. G. Vitezica, L. Varona, and A. Legarra, “On the additive and dominant variance and covariance of individuals within the genomic selection scope,” Genetics, vol. 195, no. 4, pp. 1223–1230, 2013.
- R. R. Amadeu, C. Cellon, J. W. Olmstead, A. A. Garcia, M. F. Resende Jr., and P. R. Muñoz, “AGHmatrix: R package to construct relationship matrices for autotetraploid and diploid species: a blueberry example,” The plant genome, vol. 9, no. 3, 2016.
- P. Pérez and G. De Los Campos, “Genome-wide regression and prediction with the BGLR statistical package,” Genetics, vol. 198, no. 2, pp. 483–495, 2014.
- S. A. Gezan, L. F. Osorio, S. Verma, and V. M. Whitaker, “An experimental validation of genomic selection in octoploid strawberry,” Horticulture research, vol. 4, no. 1, article 16070, 2017.
- W. Grüneberg, R. Mwanga, M. Andrade, and J. Espinoza, “Selection methods. Part 5: breeding clonally propagated crops,” Plant breeding and farmer participation, pp. 275–322, 2009, http://www.cabdirect.org/abstracts/20103075062.html.
- C. D. Robertsen, R. L. Hjortshøj, and L. L. Janss, “Genomic selection in cereal breeding,” Agronomy, vol. 9, no. 2, p. 95, 2019.
- J. Crossa, P. Pérez-Rodríguez, J. Cuevas et al., “Genomic selection in plant breeding: methods, models, and perspectives,” Trends in Plant Science, vol. 22, no. 11, pp. 961–975, 2017.
- J. Q. He, R. J. Harrison, and B. Li, “A novel 3D imaging system for strawberry phenotyping,” Plant Methods, vol. 13, pp. 1–8, 2017.
- B. Li, H. M. Cockerton, A. W. Johnson et al., “Defining strawberry shape uniformity using 3D imaging and genetic mapping,” Horticulture research, vol. 7, no. 1, 2020.
- M. Wahabzada, S. Paulus, K. Kersting, and A.-K. Mahlein, “Automated interpretation of 3D laserscanned point clouds for plant organ segmentation,” BMC bioinformatics, vol. 16, p. 248, 2015.
- S. Paulus, “Measuring crops in 3D: using geometry for plant phenotyping,” Plant Methods, vol. 15, no. 1, 2019.
- B. Li, H. M. Cockerton, A. W. Johnson, and A. Karlström, “Defining strawberry uniformity using 3D,” Imaging, vol. 44, 2020.
- N. R. Pal and S. K. Pal, “A review on image segmentation techniques,” Pattern Recognition, vol. 26, no. 9, pp. 1277–1294, 1993.
- K. He, G. Gkioxari, P. Dollár, and R. Girshick, Mask R-CNN, 2017.
- V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: a deep convolutional encoder-decoder architecture for image segmentation,” IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.
- X.-J. Mao, C. Shen, and Y.-B. Yang, Image restoration using convolutional auto-encoders with symmetric skip connections, pp. 1–17, 2016, http://arxiv.org/abs/1606.08921.
- L. M. Zingaretti, S. A. Gezan, L. F. Ferrão et al., “Exploring deep learning for complex trait genomic prediction in polyploid outcrossing species,” Frontiers in plant science, vol. 11, pp. 1–14, 2020.
- D. A. L. Lourenco, I. Misztal, S. Tsuruta et al., “Are evaluations on young genotyped animals benefiting from the past generations?” Journal of Dairy Science, vol. 97, no. 6, pp. 3930–3942, 2014.
- P. Claes, J. Roosenboom, J. D. White et al., “Genome-wide mapping of global-to-local genetic effects on human facial shape,” Nature genetics, vol. 50, no. 3, pp. 414–423, 2018.
- M. Galvánek, K. Furmanová, I. Chalás, and J. Sochor, “Automated facial landmark detection, comparison and visualization,” in Proceedings of the 31st Spring Conference on Computer Graphics, pp. 7–14, Smolenice, Slovakia, 2015.
- Z. Migicovsky, M. Li, D. H. Chitwood, and S. Myles, “Morphometrics reveals complex and heritable apple leaf shapes,” Frontiers in Plant Science, vol. 8, pp. 1–14, 2018.
- A. Gaston, S. Osorio, B. Denoyes, and C. Rothan, “Applying the Solanaceae strategies to strawberry crop improvement,” Trends in Plant Science, vol. 25, no. 2, pp. 130–140, 2020.
Copyright © 2021 Laura M. Zingaretti et al. Exclusive Licensee Nanjing Agricultural University. Distributed under a Creative Commons Attribution License (CC BY 4.0).