Mapping Crop Types in Southeast India with Smartphone Crowdsourcing and Deep Learning

High resolution satellite imagery and modern machine learning methods hold the potential to fill existing data gaps in where crops are grown around the world at a sub-field level. However, high resolution crop type maps have remained challenging to create in developing regions due to a lack of ground truth labels for model development. In this work, we explore the use of crowdsourced data, Sentinel-2 and DigitalGlobe imagery, and convolutional neural networks (CNNs) for crop type mapping in India. Plantix, a free app that uses image recognition to help farmers diagnose crop diseases, logged 9 million geolocated photos from 2017–2019 in India, 2 million of which are in the states of Andhra Pradesh and Telangana in India. Crop type labels based on farmer-submitted images were added by domain experts and deep CNNs. The resulting dataset of crop type at coordinates is high in volume, but also high in noise due to location inaccuracies, submissions from out-of-field, and labeling errors. We employed a number of steps to clean the dataset, which included training a CNN on very high resolution DigitalGlobe imagery to filter for points that are within a crop field. With this cleaned dataset, we extracted Sentinel time series at each point and trained another CNN to predict the crop type at each pixel. When evaluated on the highest quality subset of crowdsourced data, the CNN distinguishes rice, cotton, and “other” crops with 74% accuracy in a 3-way classification and outperforms a random forest trained on harmonic regression features. Furthermore, model performance remains stable when low quality points are introduced into the training set. Our results illustrate the potential of non-traditional, high-volume/high-noise datasets for crop type mapping, some improvements that neural networks can achieve over random forests, and the robustness of such methods against moderate levels of training set noise. Lastly, we caution that obstacles like the lack of good Sentinel-2 cloud mask, imperfect mobile device location accuracy, and preservation of privacy while improving data access will need to be addressed before crowdsourcing can widely and reliably be used to map crops in smallholder systems.