Database/Software Article | Open Access
Etienne David, Mario Serouart, Daniel Smith, Simon Madec, Kaaviya Velumani, Shouyang Liu, Xu Wang, Francisco Pinto, Shahameh Shafiee, Izzat S. A. Tahir, Hisashi Tsujimoto, Shuhei Nasuda, Bangyou Zheng, Norbert Kirchgessner, Helge Aasen, Andreas Hund, Pouria Sadhegi-Tehran, Koichi Nagasawa, Goro Ishikawa, Sébastien Dandrifosse, Alexis Carlier, Benjamin Dumont, Benoit Mercatoris, Byron Evers, Ken Kuroki, Haozhou Wang, Masanori Ishii, Minhajul A. Badhon, Curtis Pozniak, David Shaner LeBauer, Morten Lillemo, Jesse Poland, Scott Chapman, Benoit de Solan, Frédéric Baret, Ian Stavness, Wei Guo, "Global Wheat Head Detection 2021: An Improved Dataset for Benchmarking Wheat Head Detection Methods", Plant Phenomics, vol. 2021, Article ID 9846158, 9 pages, 2021. https://doi.org/10.34133/2021/9846158
Global Wheat Head Detection 2021: An Improved Dataset for Benchmarking Wheat Head Detection Methods
The Global Wheat Head Detection (GWHD) dataset was created in 2020 and has assembled 193,634 labelled wheat heads from 4700 RGB images acquired from various acquisition platforms and 7 countries/institutions. With an associated competition hosted in Kaggle, GWHD_2020 has successfully attracted attention from both the computer vision and agricultural science communities. From this first experience, a few avenues for improvements have been identified regarding data size, head diversity, and label reliability. To address these issues, the 2020 dataset has been reexamined, relabeled, and complemented by adding 1722 images from 5 additional countries, allowing for 81,553 additional wheat heads. We now release in 2021 a new version of the Global Wheat Head Detection dataset, which is bigger, more diverse, and less noisy than the GWHD_2020 version.
Quality training data is essential for the deployment of deep learning (DL) techniques to obtain a general model that can scale to all possible cases. Increasing dataset size, diversity, and quality is expected to be more efficient than increasing network complexity and depth. Datasets like ImageNet for classification or MS COCO for instance detection are crucial for researchers to develop and rigorously benchmark new DL methods. Similarly, the importance of plant- or crop-specific datasets is recognized within the plant phenotyping community [4–13]. These datasets allow benchmarking the performance of the algorithms used to estimate phenotyping traits while encouraging computer vision experts to improve them further [14–17]. The emergence of affordable RGB cameras and platforms, including UAVs and smartphones, makes in-field image acquisition easily accessible. These high-throughput methods are progressively replacing manual measurement of important traits such as wheat head density. Wheat is a crop grown worldwide, and the number of heads per unit area is one of the main components of yield potential. Creating a robust deep learning model that performs well in all situations requires a dataset of images covering a wide range of genotypes, sowing densities and patterns, plant states and stages, and acquisition conditions. To answer this need for a large and diverse wheat head dataset with consistent and quality labeling, we developed in 2020 the Global Wheat Head Detection dataset (GWHD_2020), which was used to benchmark methods proposed in the computer vision community and to recommend best practices for acquiring images and keeping track of the metadata.
The GWHD_2020 dataset results from the harmonization of several datasets coming from nine different institutions across seven countries and three continents. There are already 27 publications [19–45] (accessed July 2021) that have reported wheat head detection models using the GWHD_2020 dataset as the standard for training/testing data. A “Global Wheat Detection” competition hosted by Kaggle was also organized, attracting 2245 teams from across the world and leading to improvements in wheat head detection models [23, 25, 31, 41]. However, issues with the GWHD_2020 dataset were detected during the competition, including labeling noise and an unbalanced test dataset.
To provide a better benchmark dataset for the community, the GWHD_2021 dataset was organized with the following improvements: (1) the GWHD_2020 dataset was checked again to eliminate a few poor-quality images, (2) images were relabeled to avoid consistency issues, (3) a wider range of developmental stages from the GWHD_2020 sites was included, and (4) datasets from 5 new countries (the USA, Mexico, Republic of Sudan, Norway, and Belgium) were added. The resulting GWHD_2021 dataset contains 275,187 wheat heads from 16 institutions distributed across 12 countries.
2. Materials and Methods
The first version of GWHD_2020, used for the Kaggle competition, was divided into several subdatasets. Each subdataset represented all images from one location, acquired with one sensor while mixing several stages. However, wheat head detection models may be sensitive to the developmental stage and acquisition conditions: at the beginning of head emergence, part of the head is barely visible because it has not yet fully emerged from the last leaf sheath and may be masked by the awns. Further, during ripening, wheat heads tend to bend and overlap, leading to more erratic labeling. A redefinition of the subdataset was hence necessary to help investigate the effect of the developmental stage on model performance. The new definition of a subdataset was then formulated as “a consistent set of images acquired over the same experimental unit, during the same acquisition session with the same vector and sensor.” A subdataset therefore defines a domain. This new definition required splitting the original GWHD_2020 subdatasets into several smaller ones. UQ_1 was split into 6 much smaller subdatasets, Arvalis_1 into 3 subdatasets, Arvalis_3 into 2 subdatasets, and utokyo_1 into 2 subdatasets. However, in the case of utokyo_2, which was a collection of images taken by farmers at different stages and in different fields, the original subdataset was kept. Overall, the 11 original subdatasets in GWHD_2020 were distributed into 19 subdatasets for GWHD_2021.
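The subdataset definition above amounts to grouping images by a composite key. The following is a minimal sketch of that idea, not the authors' tooling: the record type and its field names (`unit`, `session`, `vector`, `sensor`) are hypothetical and do not correspond to column names in the released files.

```python
from collections import defaultdict
from typing import NamedTuple


class ImageRecord(NamedTuple):
    """Hypothetical per-image metadata; field names are illustrative only."""
    name: str
    unit: str      # experimental unit (e.g. one trial at one site)
    session: str   # acquisition session (e.g. a date)
    vector: str    # platform carrying the camera
    sensor: str    # camera model


def group_into_subdatasets(records):
    """Group images that share the same unit, session, vector and sensor."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec.unit, rec.session, rec.vector, rec.sensor)
        groups[key].append(rec.name)
    return dict(groups)


# Made-up records: one site imaged on two dates, the second date with two setups.
records = [
    ImageRecord("a.png", "site_A", "2019-06-01", "handheld", "cam_1"),
    ImageRecord("b.png", "site_A", "2019-06-01", "handheld", "cam_1"),
    ImageRecord("c.png", "site_A", "2019-06-20", "handheld", "cam_1"),
    ImageRecord("d.png", "site_A", "2019-06-20", "gantry", "cam_2"),
]
subdatasets = group_into_subdatasets(records)
print(len(subdatasets))  # number of distinct domains found
```

Under the earlier, location-based definition these four images would have formed a single subdataset; with the composite key, a change of session or of acquisition setup starts a new domain.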
Almost 2000 new images were added to GWHD_2020, constituting a major improvement. Some of the new images come from institutions that already contributed to GWHD_2020 and were collected during a different year and/or at a different location. This was the case for Arvalis (Arvalis_7 to Arvalis_12), University of Queensland (UQ_7 to UQ_11), Nanjing Agricultural University (NAU_2 and NAU_3), and University of Tokyo (Utokyo_1). In addition, 14 new subdatasets were included, coming from 5 new countries: Norway (NMBU), Belgium (University of Liège), United States of America (Kansas State University, TERRA-REF), Mexico (CIMMYT), and Republic of Sudan (Agricultural Research Council). All these images were acquired at a ground sampling distance between 0.2 and 0.4 mm, i.e., similar to that of the images in GWHD_2020. Because none of them had already been labeled, a sample was selected by taking no more than one image per microplot, which was randomly cropped to a 1024 × 1024 px patch. These patches are called images in the following for the sake of simplicity.
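The sampling step described above (at most one image per microplot, then a random crop) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names, the `(image_name, microplot_id)` input format, the image dimensions, and the patch size passed in the example are all hypothetical.

```python
import random


def sample_one_per_microplot(images, rng):
    """Pick one image per microplot.

    `images` is a list of (image_name, microplot_id) pairs.
    """
    by_plot = {}
    for name, plot in images:
        by_plot.setdefault(plot, []).append(name)
    # Sort the plots so the result is reproducible for a given seed.
    return {plot: rng.choice(names) for plot, names in sorted(by_plot.items())}


def random_crop_box(width, height, patch, rng):
    """Return (x_min, y_min, x_max, y_max) of a random square patch."""
    if patch > width or patch > height:
        raise ValueError("patch is larger than the image")
    x0 = rng.randint(0, width - patch)   # randint is inclusive on both ends
    y0 = rng.randint(0, height - patch)
    return x0, y0, x0 + patch, y0 + patch


rng = random.Random(0)  # fixed seed so the sketch is reproducible
images = [("p1_a.jpg", "plot_1"), ("p1_b.jpg", "plot_1"), ("p2_a.jpg", "plot_2")]
selected = sample_one_per_microplot(images, rng)
# Image size and patch size below are illustrative values.
box = random_crop_box(width=6000, height=4000, patch=1024, rng=rng)
```

Drawing the top-left corner uniformly from the valid range guarantees that the patch always lies fully inside the source image.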
With the addition of 1722 images and 86,000 wheat heads, the GWHD_2021 dataset contains 6500 images and 275,000 wheat heads. The increase in the number of subdatasets from 18 to 47 leads to a larger diversity between them, which can be observed in Figure 1. The subdatasets are described in Table 1. However, the new definition of a subdataset also led to more unbalanced subdatasets: the smallest (Arvalis_8) contains only 20 images, while the largest (ETHZ_1) contains 747 images. This provides the opportunity to take advantage of the data distribution to improve model training. Each subdataset has been visually assigned to one of several development stage classes depending on the respective color of leaves and heads: postflowering, filling, filling-ripening, and ripening. Examples of the different stages are presented in Figure 2. Although approximate, this metadata is expected to improve model training.
VLB: Villiers le Bâcle; VSC: Villers-Saint-Christophe. Utokyo_1 and Utokyo_2 were taken at the same location with different sensors. Utokyo_3 is a special subdataset made from images coming from a large variety of farmers in Hokkaido between 2016 and 2019. Italic: Europe; bold: North America; underline: Asia; bold italic: Oceania; bold underline: Africa.
3. Dataset Diversity Analysis
In comparison to GWHD_2020, the GWHD_2021 dataset puts emphasis on metadata documentation of the different subdatasets, as described in the discussion section of David et al. Alongside the acquisition platform, each subdataset has been reviewed and assigned a development stage, except for Utokyo_3 (formerly utokyo_2), as it is a collection of images from various farmer fields and development stages. Overall, the GWHD_2021 dataset covers all development stages well, ranging from postanthesis to ripening (Figure 2).
The diversity between images within the GWHD_2021 dataset was documented using the method proposed by Tolias et al. Deep learning image features were first extracted with the VGG-16 deep network pretrained on the ImageNet dataset, which is considered to represent well the general features of RGB images. We then selected the output of the last layer and summed it over its spatial dimensions into a single vector of 512 channels, which was then normalized. Then, the UMAP dimensionality reduction algorithm was used to project the representations into a 2D space. UMAP was chosen because it preserves the existing clusters during the projection to a low-dimensional space. This 2D space is expected to capture the main features of the images. Results (Figure 3) demonstrate that the test dataset used for GWHD_2020 was biased in comparison to the training dataset. The subdatasets added in 2021 populate the 2D space more evenly, which is expected to improve the robustness of the models.
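The pooling step in this pipeline (collapsing a feature map into one normalized vector per image) can be illustrated without any deep learning library. The sketch below is an assumption-laden toy, not the authors' implementation: the function name is hypothetical, the feature map is a hand-written nested list rather than a VGG-16 output, and L2 normalization is assumed because the text only says the vector is "normalized". The feature extraction and UMAP projection steps are not shown.

```python
import math


def sum_pooled_descriptor(feature_map):
    """Sum a feature map over its spatial positions, then L2-normalize.

    `feature_map` is a nested list indexed as [row][col][channel].
    L2 normalization is an assumption; the source does not name the norm.
    """
    n_channels = len(feature_map[0][0])
    pooled = [0.0] * n_channels
    for row in feature_map:
        for cell in row:
            for c, value in enumerate(cell):
                pooled[c] += value
    norm = math.sqrt(sum(v * v for v in pooled))
    if norm == 0.0:
        return pooled  # all-zero map: nothing to normalize
    return [v / norm for v in pooled]


# Toy 2 x 2 map with 2 channels (the descriptor in the text has 512 channels).
toy_map = [
    [[1.0, 0.0], [2.0, 0.0]],
    [[0.0, 2.0], [0.0, 2.0]],
]
descriptor = sum_pooled_descriptor(toy_map)
```

Each image then contributes one fixed-length vector regardless of its spatial size, which is what allows all images to be projected into a common 2D space afterwards.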
4. Presentation of Global Wheat Challenge 2021 (GWC 2021)
The results from the Kaggle challenge based on GWHD_2020 have been analyzed by the authors. The findings emphasize that the design of a competition is critical to enable solutions that improve the robustness of the wheat head detection models. The Kaggle competition was based on a metric that was averaged across all test images, without distinction between subdatasets, and it was biased toward a strict match of the labelling. This artificially increases the influence of the largest subdatasets, such as utokyo_1 (now split into Utokyo_1 and Utokyo_2), on the global score. Further, the metrics widely used to score the agreement with labeled objects in big datasets, such as MS COCO, appear to be less effective when some heads are labeled in a more uncertain way, as was the case in several situations depending on the development stage, illumination conditions, and head density. As a result, the weighted domain accuracy is proposed as a new metric. The accuracy computed over image $i$ belonging to domain $d$, $AI_d(i)$, is classically defined as

$$AI_d(i) = \frac{TP}{TP + FN + FP},$$

where TP, FN, and FP are, respectively, the numbers of true positives, false negatives, and false positives found in image $i$. The weighted domain accuracy (WDA) is the weighted average of all domain accuracies:

$$WDA = \frac{1}{D} \sum_{d=1}^{D} \frac{1}{n_d} \sum_{i=1}^{n_d} AI_d(i),$$

where $D$ is the number of domains (subdatasets) and $n_d$ is the number of images in domain $d$. The training, validation, and test datasets used are presented in Section 5.
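To make the difference between the two averaging schemes concrete, the sketch below reads the definitions above as the per-image accuracy TP / (TP + FN + FP), averaged within each domain and then across domains. It is a minimal illustration, not the official challenge scorer: the function names and the toy counts are hypothetical, the matching of predicted to labeled boxes is not shown, and the score of 1 for an image with no heads and no detections is an assumed convention.

```python
def image_accuracy(tp, fn, fp):
    """Accuracy of one image: TP / (TP + FN + FP)."""
    total = tp + fn + fp
    # Assumed convention: an image with no heads and no detections scores 1.
    return 1.0 if total == 0 else tp / total


def weighted_domain_accuracy(counts_by_domain):
    """Mean over domains of the mean per-image accuracy within each domain.

    `counts_by_domain` maps a domain name to a list of (TP, FN, FP) tuples,
    one tuple per image.
    """
    domain_means = []
    for counts in counts_by_domain.values():
        scores = [image_accuracy(*c) for c in counts]
        domain_means.append(sum(scores) / len(scores))
    return sum(domain_means) / len(domain_means)


def image_averaged_accuracy(counts_by_domain):
    """Plain mean over all images, ignoring domains (for comparison)."""
    scores = [
        image_accuracy(*c)
        for counts in counts_by_domain.values()
        for c in counts
    ]
    return sum(scores) / len(scores)


# Made-up counts: a large, easy domain and a small, hard one.
counts = {
    "large_domain": [(9, 1, 0)] * 8,  # 8 images, accuracy 0.9 each
    "small_domain": [(1, 1, 0)] * 2,  # 2 images, accuracy 0.5 each
}
wda = weighted_domain_accuracy(counts)
flat = image_averaged_accuracy(counts)
```

With these toy counts, the image-averaged score is about 0.82 while the WDA is about 0.70: the small, difficult domain weighs as much as the large one, so a model cannot hide poor performance on a small domain behind a large, easy one.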
The results of the Global Wheat Challenge 2021 are summarized in Table 2. The reference method is a Faster R-CNN with the same parameters as in the GWHD_2020 research paper, trained on the GWHD_2021 (Global Wheat Challenge 2021 split) training dataset. The full leaderboard can be found at https://www.aicrowd.com/challenges/global-wheat-challenge-2021/leaderboards.
5. How to Use/FAQ
(i) How to download? The dataset can be downloaded from Zenodo: https://zenodo.org/record/5092309
(ii) What is the license of the dataset? The dataset is under the MIT license, allowing for reuse without restriction
(iii) How to cite the dataset? The present paper can be cited when using the GWHD_2021 dataset. However, cite preferentially  for wheat head detection challenges or when discussing the difficulty of constituting a large dataset
(iv) How to benchmark? Depending on the objectives of the study, we recommend two sets of training, validation, and test data (Table 3):
(a) The Global Wheat Challenge 2021 split when the dataset is used for phenotyping purposes, to allow direct comparison with the winning solutions
(b) The “GlobalWheat-WILDS” split, which is the one used for the WILDS paper. We recommend using the GlobalWheat-WILDS split when working on out-of-domain distribution shift problems
It is further recommended to keep the weighted domain accuracy for comparison with previous works.
The second edition of the Global Wheat Head Detection dataset, GWHD_2021, alongside the organization of a second Global Wheat Challenge, is an important step for illustrating the usefulness of open and shared data across organizations to further improve high-throughput phenotyping methods. In comparison to the GWHD_2020 dataset, it represents five new countries, 22 new subdatasets, 1200 new images, and 120,000 newly labeled wheat heads. Its revised organization and additional diversity are more representative of the type of images researchers and agronomists can acquire across the world. The revised metrics used to evaluate the models during the Global Wheat Challenge 2021 can help researchers to benchmark one-class localization models on a large range of acquisition conditions. GWHD_2021 is expected to accelerate the building of robust solutions. However, progress on the representation of developing countries is still expected, and we are open to new contributions from South America, Africa, and South Asia. We have started to include nadir-view photos from smartphones to get a more comprehensive dataset and train reliable models for such affordable devices. Additional work is required to adapt such an approach to other vectors, such as a camera mounted on an unmanned aerial vehicle, or to other high-resolution cameras working in other spectral domains. Further, given the very large number of boxes that already exist, it is planned to release wheat head masks alongside the bounding boxes and to provide more associated metadata.
The dataset is available on Zenodo (https://zenodo.org/record/5092309).
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this article.
We would like to thank the company “Human in the loop”, which corrected and labeled the new datasets. The help of Frederic Venault (INRAe Avignon) in checking the labelled images was also invaluable. The work received support from ANRT for the CIFRE grant of Etienne David, cofunded by Arvalis for the project management. The labelling work was supported by several companies and projects, including the following. Canada: The Global Institute for Food Security, University of Saskatchewan, supported the organization of the competition. France: This work was supported by the French National Research Agency under the Investments for the Future Program, referred to as ANR-16-CONV-0004 PIA #Digitag. Institut Convergences Agriculture Numérique and Hiphen supported the organization of the competition. Japan: Kubota supported the organization of the competition. Australia: Grains Research and Development Corporation (UOQ2002-008RTX machine learning applied to high-throughput feature extraction from imagery to map spatial variability and UOQ2003-011RTX INVITA - a technology and analytics platform for improving variety selection) supported the competition.
- N. Sambasivan, S. Kapania, H. Highfill, D. Akrong, P. Paritosh, and L. M. Aroyo, Everyone wants to do the model work, not the data work: data cascades in high-stakes AI, New York, NY, USA, 2021.
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: a large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009.
- T.-Y. Lin et al., “Microsoft COCO: common objects in context,” in European Conference on Computer Vision, pp. 740–755, 2014.
- J. A. Cruz, X. Yin, X. Liu et al., “Multi-modality imagery database for plant phenotyping,” Machine Vision and Applications, vol. 27, no. 5, pp. 735–749, 2016.
- W. Guo, B. Zheng, A. B. Potgieter et al., “Aerial imagery analysis – quantifying appearance and number of sorghum heads for applications in breeding and agronomy,” Frontiers in Plant Science, vol. 9, p. 1544, 2018.
- D. P. Hughes and M. Salathé, “An open access repository of images on plant health to enable the development of mobile disease diagnostics through machine learning and crowdsourcing,” Tech. Rep., CoRR, 2015, http://arxiv.org/abs/1511.08060.
- D. LeBauer et al., “Data from: TERRA-REF, an open reference data set from high resolution genomics, phenomics, and imaging sensors,” Dryad, 2020.
- S. Leminen Madsen, S. K. Mathiassen, M. Dyrmann, M. S. Laursen, L. C. Paz, and R. N. Jørgensen, “Open plant phenotype database of common weeds in Denmark,” Remote Sensing, vol. 12, no. 8, p. 1246, 2020.
- H. Lu, Z. Cao, Y. Xiao, B. Zhuang, and C. Shen, “TasselNet: counting maize tassels in the wild via local counts regression network,” Plant Methods, vol. 13, no. 1, p. 79, 2017.
- S. Madec, K. Irfan, E. David et al., The P2S2 segmentation dataset: annotated in-field multi-crop RGB images acquired under various conditions, Lyon, France, 2019, https://hal.inrae.fr/hal-03140124.
- H. Scharr, M. Minervini, A. P. French et al., “Leaf segmentation in plant phenotyping: a collation study,” Machine Vision and Applications, vol. 27, no. 4, pp. 585–606, 2016.
- R. Thapa, K. Zhang, N. Snavely, S. Belongie, and A. Khan, “The Plant Pathology challenge 2020 data set to classify foliar disease of apples,” Applications in Plant Sciences, vol. 8, no. 9, article e11390, 2020.
- T. Wiesner-Hanks, E. L. Stewart, N. Kaczmar et al., “Image set for deep learning: field images of maize annotated with disease symptoms,” BMC Research Notes, vol. 11, no. 1, p. 440, 2018.
- E. David, F. Ogidi, W. Guo, F. Baret, and I. Stavness, Global Wheat Challenge 2020: analysis of the competition design and winning models, 2021.
- N. Hani, P. Roy, and V. Isler, “MinneApple: a benchmark dataset for apple detection and segmentation,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 852–858, 2020.
- M. Minervini, A. Fischbach, H. Scharr, and S. A. Tsaftaris, “Finely-grained annotated datasets for image-based plant phenotyping,” Pattern Recognition Letters, vol. 81, pp. 80–89, 2016.
- S. A. Tsaftaris and H. Scharr, “Sharing the right data right: a symbiosis with machine learning,” Trends in Plant Science, vol. 24, no. 2, pp. 99–102, 2019.
- E. David, S. Madec, P. Sadeghi-Tehran et al., “Global Wheat Head Detection (GWHD) dataset: a large and diverse dataset of high-resolution RGB-labelled images to develop and benchmark wheat head detection methods,” Plant Phenomics, vol. 2020, article 3521852, 12 pages, 2020.
- G. Yu, Y. Wu, J. Xiao, and Y. Cao, “A novel pyramid network with feature fusion and disentanglement for object detection,” Computational Intelligence and Neuroscience, vol. 2021, Article ID 6685954, 13 pages, 2021.
- T. W. Ayalew, J. R. Ubbens, and I. Stavness, “Unsupervised domain adaptation for plant organ counting,” European conference on computer vision, pp. 330–346, 2020.
- M. N. Datta, Y. Rathi, and M. Eliazer, “Wheat heads detection using deep learning algorithms,” Annals of the Romanian Society for Cell Biology, pp. 5641–5654, 2021.
- F. Fourati, W. S. Mseddi, and R. Attia, “Wheat head detection using deep, semi-supervised and ensemble learning,” Canadian Journal of Remote Sensing, vol. 47, no. 2, pp. 198–208, 2021.
- F. Fourati, W. Souidene, and R. Attia, “An original framework for wheat head detection using deep, semi-supervised and ensemble learning within Global Wheat Head Detection (GWHD) dataset,” 2020, https://arxiv.org/abs/2009.11977.
- A. S. Gomez, E. Aptoula, S. Parsons, and P. Bosilj, “Deep regression versus detection for counting in robotic phenotyping,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 2902–2907, 2021.
- B. Gong, D. Ergu, Y. Cai, and B. Ma, “Real-time detection for wheat head applying deep neural network,” Sensors, vol. 21, no. 1, p. 191, 2021.
- M.-X. He, P. Hao, and Y. Z. Xin, “A robust method for wheatear detection using UAV in natural scenes,” IEEE Access, vol. 8, pp. 189043–189053, 2020.
- Y. Jiang, C. Li, R. Xu, S. Sun, J. S. Robertson, and A. H. Paterson, “DeepFlower: a deep learning-based approach to characterize flowering patterns of cotton plants in the field,” Plant Methods, vol. 16, no. 1, p. 156, 2020.
- B. Jiang, J. Xia, and S. Li, “Few training data for objection detection,” in Proceedings of the 2020 4th International Conference on Electronic Information Technology and Computer Engineering, pp. 579–584, November 2020.
- A. Karwande, P. Kulkarni, P. Marathe, T. Kolhe, M. Wyawahare, and P. Kulkarni, “Computer vision-based wheat grading and breed classification system: a design approach,” vol. 1311, p. 403, Springer.
- T. Kattenborn, J. Leitloff, F. Schiefer, and S. Hinz, “Review on convolutional neural networks (CNN) in vegetation remote sensing,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 173, pp. 24–49, 2021.
- S. Khaki, N. Safaei, H. Pham, and L. Wang, “WheatNet: a lightweight convolutional neural network for high-throughput image-based wheat head detection and counting,” 2021, https://arxiv.org/abs/2103.09408.
- S. U. Kolhar and J. Jagtap, Bibliometric Review on Image Based Plant Phenotyping, p. 16.
- J. Li, C. Li, S. Fei et al., “Wheat ear recognition based on RetinaNet and transfer learning,” Sensors, vol. 21, no. 14, p. 4845, 2021.
- L. Lucks, L. Haraké, and L. Klingbeil, Detektion von Weizenähren mithilfe neuronaler Netze und synthetisch erzeugter Trainingsdaten, tm-Technisches Messen, 2021.
- T. Misra, A. Arora, S. Marwaha et al., “Web-SpikeSegNet: deep learning framework for recognition and counting of spikes from visual images of wheat plants,” IEEE Access, vol. 9, pp. 76235–76247, 2021.
- L. G. Riera, M. E. Carroll, Z. Zhang et al., “Deep multiview image fusion for soybean yield estimation in breeding applications,” Plant Phenomics, vol. 2021, article 9846470, 12 pages, 2021.
- D. T. Smith, A. B. Potgieter, and S. C. Chapman, “Scaling up high-throughput phenotyping for abiotic stress selection in the field,” Theoretical and Applied Genetics, vol. 134, no. 6, pp. 1845–1866, 2021.
- Y. Suzuki, D. Kuyoshi, and S. Yamane, “Transfer learning algorithm for object detection,” Bulletin of Networking, Computing, Systems, and Software, vol. 10, no. 1, pp. 1–3, 2021.
- R. Trevisan, O. Pérez, N. Schmitz, B. Diers, and N. Martin, “High-throughput phenotyping of soybean maturity using time series UAV imagery and convolutional neural networks,” Remote Sensing, vol. 12, no. 21, p. 3617, 2020.
- K. Velumani, R. Lopez-Lozano, S. Madec et al., “Estimates of maize plant density from UAV RGB images using Faster-RCNN detection model: impact of the spatial resolution,” 2021, https://arxiv.org/abs/2105.11857.
- Y. Wu, Y. Hu, and L. Li, “BTWD: bag of tricks for wheat detection,” in European Conference on Computer Vision, pp. 450–460, Springer, 2020.
- H. Wang, Y. Duan, Y. Shi, Y. Kato, S. Ninomiya, and W. Guo, “EasyIDP: a Python package for intermediate data processing in UAV-based plant phenotyping,” Remote Sensing, vol. 13, no. 13, p. 2622, 2021.
- Y. Wang, Y. Qin, and J. Cui, “Occlusion robust wheat ear counting algorithm based on deep learning,” Frontiers in Plant Science, vol. 12, p. 1139, 2021.
- B. Yang, Z. Gao, Y. Gao, and Y. Zhu, “Rapid detection and counting of wheat ears in the field using YOLOv4 with attention module,” Agronomy, vol. 11, no. 6, p. 1202, 2021.
- H. Lu, L. Liu, Y. N. Li, X. M. Zhao, X. Q. Wang, and Z. G. Cao, “TasselNetV3: Explainable Plant Counting With Guided Upsampling and Background Suppression,” IEEE Transactions on Geoscience and Remote Sensing, pp. 1–15, 2021.
- S. Dandrifosse, A. Carlier, B. Dumont, and B. Mercatoris, “Registration and fusion of close-range multimodal wheat images in field conditions,” Remote Sensing, vol. 13, no. 7, p. 1380, 2021.
- X. Wang, H. Xuan, B. Evers, S. Shrestha, R. Pless, and J. Poland, “High-throughput phenotyping with deep learning gives insight into the genetic architecture of flowering time in wheat,” GigaScience, vol. 8, no. giz120, 2019.
- G. Tolias, R. Sicre, and H. Jégou, “Particular object retrieval with integral max-pooling of CNN activations,” 2015, https://arxiv.org/abs/1511.05879.
- L. McInnes, J. Healy, and J. Melville, UMAP: uniform manifold approximation and projection for dimension reduction, 2020.
- P. W. Koh, S. Sagawa, H. Marklund et al., “WILDS: a benchmark of in-the-wild distribution shifts,” 2021, https://arxiv.org/abs/2012.07421.
Copyright © 2021 Etienne David et al. Exclusive Licensee Nanjing Agricultural University. Distributed under a Creative Commons Attribution License (CC BY 4.0).