
| Geoinformatics Engineer

Crop Parameter Estimation Using Machine Learning
The major aim of this study was to derive the polarimetric parameters from SAR datasets and estimate the various plant parameter values for different types of Kharif crops.
Indian Institute of Remote Sensing (IIRS), Indian Space Research Organisation (ISRO) | September 2020
Remote Sensing and GIS Research Intern
Guide: Dr. Dipanwita Haldar
Analysis: Wrote Python code using the Scikit-learn library for prediction; downloaded, cleaned and pre-processed the raster data

Poster of the project
Objectives of the project
​
• Refine the ground data collected from the field, separate it into suitable categories, and convert it from Excel format into a point-layer shapefile using ArcGIS software.
• Download the Sentinel-1 SAR data, perform all pre-processing steps and derive the necessary polarimetric parameter layers (Backscatter, Intensity Ratio, Radar Vegetation Index, Coherence, Anisotropy, Alpha and Entropy) for the corresponding locations using SNAP software.
• Develop a model to derive plant parameters such as plant height, plant canopy cover percentage and plant biomass from the polarimetric parameter data, using multivariate and univariate linear regression coded in Python in the Spyder environment.
• Fill the gaps in the field data using the regression outputs, and plot the data to find correlations and establish relationships between the plant parameters and the SAR parameters.
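Two of the layers named in the objectives, the Intensity Ratio and the Radar Vegetation Index, can be computed directly from the VH and VV backscatter. The sketch below is illustrative only: it assumes backscatter values in decibels and a commonly used dual-pol form of the index, RVI = 4·VH / (VV + VH) in linear power units. The project page does not state which formulation was applied in SNAP, and the function names are hypothetical.

```python
def db_to_linear(value_db):
    """Convert a backscatter value from decibels to linear power."""
    return 10.0 ** (value_db / 10.0)


def intensity_ratio(vh_db, vv_db):
    """VH/VV intensity ratio in linear power units."""
    return db_to_linear(vh_db) / db_to_linear(vv_db)


def dual_pol_rvi(vh_db, vv_db):
    """Assumed dual-pol Radar Vegetation Index: 4*VH / (VV + VH), linear power."""
    vh = db_to_linear(vh_db)
    vv = db_to_linear(vv_db)
    return 4.0 * vh / (vv + vh)


# Made-up backscatter values for a single site (not taken from the study).
site_ratio = intensity_ratio(vh_db=-20.0, vv_db=-10.0)
site_rvi = dual_pol_rvi(vh_db=-20.0, vv_db=-10.0)
```

Working in linear power rather than decibels matters here, because a ratio of decibel values is not the ratio of the underlying intensities.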
Abstract
Knowledge of the diversity and quality of crops on agricultural fields is essential for sustainable and effective field management. In this study, Sentinel-1 SLC SAR imagery was analyzed to predict plant parameters of Kharif crops (height, percentage canopy cover and biomass) for ground sites in the Mysuru District, using field data from August 2017. Classical machine learning algorithms, namely multivariate and univariate linear regression, were used to build models that estimate the plant parameters, implemented with the Scikit-learn library in Python. The goal of the analysis was to find the regression parameters and fill in the missing field data. This made it possible to determine the extent to which factors other than backscatter (VH, VV) affected the plant parameters: the Radar Vegetation Index (RVI), Intensity Ratio (VH/VV), Anisotropy, Alpha, Entropy and Coherence. First, all the SAR parameter layers were generated and their values extracted at the field coordinates using SNAP software. Then the correlation of each SAR parameter with each plant parameter (the dependent variable) was studied, and the most suitable set of independent variables was selected. Regression models were created for four crop categories: millets, vegetables, cotton and sugarcane. The models had more than 85 percent accuracy, with errors below 20 percent over the entire range of the test and training data for all crop categories. The most accurate models were obtained for the homogeneous crop categories of sugarcane and cotton. Using the models, the gaps in the plant parameters of the field data were filled, and the relationships between the three plant parameters were studied; all three were positively correlated with one another, with strengths ranging from low to strong.
The sowing and harvest periods of the crops were predicted from the time-series data collected from July to October, building on the inferences from the August data analysis. This study thus outlines the effectiveness of Sentinel-1 SAR data for crop monitoring, using machine learning regression analysis to predict plant parameters, which can help create newer technologies that change conventional agricultural practices.
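The regression workflow described in the abstract (fit on sites with field measurements, check on held-out sites, then predict where field data are missing) might look roughly like the following with Scikit-learn. This is a minimal sketch, not the project's code: the data are synthetic, the choice of VH, VV and RVI as predictors is an assumption, and R² is used as the score only because the page does not say how accuracy was measured.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the field table (NOT the study's data):
# columns are VH (dB), VV (dB) and RVI at 60 hypothetical sites.
n_sites = 60
X = np.column_stack([
    rng.uniform(-22, -14, n_sites),
    rng.uniform(-14, -7, n_sites),
    rng.uniform(0.2, 1.2, n_sites),
])
# Hypothetical linear relation plus noise for plant height (cm).
height = (300 + 8.0 * X[:, 0] + 2.0 * X[:, 1] + 40.0 * X[:, 2]
          + rng.normal(0, 3, n_sites))

# Pretend the last 10 sites have no field measurement.
measured = np.arange(n_sites) < 50
X_train, X_test, y_train, y_test = train_test_split(
    X[measured], height[measured], test_size=0.3, random_state=0
)

# Multivariate model (all predictors) and univariate model (VH only).
multi = LinearRegression().fit(X_train, y_train)
uni = LinearRegression().fit(X_train[:, [0]], y_train)

r2_multi = r2_score(y_test, multi.predict(X_test))
r2_uni = r2_score(y_test, uni.predict(X_test[:, [0]]))

# Fill the gaps: predict height at the sites without field data.
filled_height = multi.predict(X[~measured])
```

Comparing `r2_multi` with `r2_uni` on the held-out sites is one simple way to judge whether the extra predictors add anything over a single backscatter channel.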
​
​
Study Area
Mysuru (12.1873° N, 76.3637° E) is an administrative district in the southern part of the state of Karnataka, India, adjacent to the Kaveri River. Important agricultural crops grown here include cotton, grams, groundnut, jowar, maize, ragi, rice, sugarcane, sunflower and tur, which contribute largely to the economy. As seen in figure 1, 125 ground sites were selected in the region for the analysis. Their coordinates were converted into a vector point shapefile in ArcGIS. The area was chosen as the study site for its abundance of agricultural land and the availability of several Kharif crops, which made it possible to analyse crops such as sugarcane, millets, vegetables and cotton from July 2017 to October 2017. Kharif crops are also known as the autumn harvest, as they are sown with the first rains in June to July and harvested from the end of October into November.
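The conversion of site coordinates into a point layer was done in ArcGIS. For readers without ArcGIS, a comparable step can be sketched in plain Python by writing the sites to GeoJSON instead of a shapefile. This is a swapped-in alternative, not the project's workflow: the column names (`site_id`, `lat`, `lon`, `crop`) and the sample row are hypothetical, and the table is assumed to have been exported from Excel to CSV first.

```python
import csv
import io
import json


def sites_to_geojson(csv_text):
    """Build a GeoJSON FeatureCollection of points from CSV text."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON orders coordinates as [longitude, latitude].
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": {"site_id": row["site_id"], "crop": row["crop"]},
        })
    return {"type": "FeatureCollection", "features": features}


# One illustrative row; the real table would hold all 125 ground sites.
sample = "site_id,lat,lon,crop\nS001,12.1873,76.3637,sugarcane\n"
layer = sites_to_geojson(sample)
geojson_text = json.dumps(layer, indent=2)
```

The resulting text can be saved with a `.geojson` extension and opened in most GIS packages.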
​
​
​
​
Methodology
Pre-processing of Sentinel-1 SAR data is essential for many earth observation applications, including land cover classification, change detection, vegetation monitoring, urban growth and natural hazards. The polarimetric information can be extracted from the 2x2 covariance matrix [C2] of Sentinel-1 dual-pol (VV-VH) acquisitions. Generating the covariance matrix from Sentinel-1 single look complex (SLC) data requires several pre-processing steps, which can be carried out with the ESA SNAP S-1 toolbox. Reference [4] and the NASA ARSET training series were used as guides for the pre-processing procedure.
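To make the role of the [C2] matrix concrete, the sketch below builds it for a small patch of complex VV and VH pixels and derives Entropy, Anisotropy and Alpha from its eigen-decomposition. It is an illustration under stated assumptions, not a reproduction of SNAP's processing chain: the spatial averaging is a plain mean over the patch, the dual-pol H/A/Alpha definitions used are the common eigenvalue-based ones, the function names are hypothetical, and the input is random noise rather than real SLC data.

```python
import numpy as np


def c2_from_slc(s_vv, s_vh):
    """Spatially averaged 2x2 covariance matrix for a patch of complex pixels."""
    c11 = np.mean(np.abs(s_vv) ** 2)
    c22 = np.mean(np.abs(s_vh) ** 2)
    c12 = np.mean(s_vv * np.conj(s_vh))
    return np.array([[c11, c12], [np.conj(c12), c22]])


def h_a_alpha(c2):
    """Entropy, anisotropy and mean alpha angle (degrees) from a C2 matrix."""
    eigvals, eigvecs = np.linalg.eigh(c2)          # ascending order
    eigvals = np.clip(eigvals[::-1], 0.0, None)    # descending, non-negative
    eigvecs = eigvecs[:, ::-1]
    p = eigvals / eigvals.sum()                    # pseudo-probabilities
    nonzero = p > 0
    # Log base 2 because the dual-pol matrix has two eigenvalues.
    entropy = -np.sum(p[nonzero] * np.log2(p[nonzero]))
    anisotropy = (eigvals[0] - eigvals[1]) / (eigvals[0] + eigvals[1])
    # Alpha angle of each eigenvector from its first component.
    first = np.clip(np.abs(eigvecs[0, :]), 0.0, 1.0)
    alpha = np.sum(p * np.degrees(np.arccos(first)))
    return entropy, anisotropy, alpha


# Random complex noise standing in for a 5x5 SLC patch (not real data).
rng = np.random.default_rng(0)
patch_vv = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
patch_vh = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
entropy, anisotropy, alpha_deg = h_a_alpha(c2_from_slc(patch_vv, patch_vh))
```

With these definitions, entropy ranges from 0 (a single dominant scattering mechanism) to 1 (two equally strong mechanisms), and anisotropy moves in the opposite direction.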
​
Conclusion
In this study, the potential of Sentinel-1 SAR data was examined for estimating plant height, canopy cover percentage and biomass for four crop types (millets, cotton, sugarcane and vegetables). Classical machine learning algorithms, namely multivariate and univariate linear regression, were used to predict these plant parameters for locations in the Mysuru region using only outputs derived from the SAR data, with no external inputs. The regressions achieved high accuracy and low percentage errors, which helped in finding the relationships between the plant parameters and the SAR parameters. In other words, we showed that Sentinel-1 SAR data combined with machine learning methods could be a reliable alternative to direct field measurements for monitoring and estimating crop height, canopy cover percentage and biomass.
​
The regression models had more than 85 percent accuracy, with errors below 20 percent over the entire range of the test and training data for all crop categories. The regression analysis was especially successful for the sugarcane and cotton fields, which produced the best models of all the categories because each consists of a single homogeneous crop type. In the multivariate regression of sugarcane crop height, correlation values above R = 0.80 were found for VH backscatter and for all other SAR parameters except VV backscatter. The use of multiple regression improved the R² values only slightly when estimating biomass, indicating that a single parameter usually explains most of the variation in the field data. Biomass for sugarcane had good negative correlations with the Intensity Ratio (R = -0.68) and RVI (R = -0.64). Biomass for millets had good correlations with the decomposition outputs as well.
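The correlation values quoted above are Pearson R values between a SAR parameter and a plant parameter. A minimal sketch of how such a screening could be done is given here, assuming Pearson correlation is the measure used; the helper names and the toy numbers are hypothetical and unrelated to the study's results.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_predictors(predictors, target):
    """Sort predictor names by |R| against the target, strongest first."""
    scores = {name: pearson_r(values, target)
              for name, values in predictors.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy values for four sites (invented for illustration only).
toy_biomass = [1.0, 2.0, 3.0, 4.0]
toy_predictors = {
    "intensity_ratio": [0.9, 0.7, 0.6, 0.3],
    "rvi": [1.0, 3.0, 2.0, 4.0],
}
ranking = rank_predictors(toy_predictors, toy_biomass)
```

Ranking by the absolute value of R keeps strong negative relationships, such as those reported for sugarcane biomass, at the top of the list alongside strong positive ones.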
​
The VH and VV coherence for all the acquired images also showed low values (0.33 on average) at all locations, which suggests that plant parameters such as height, canopy cover and biomass changed between July and October. However, no strong correlations between coherence and the regression-derived crop parameters could be determined, apart from that between VH coherence and sugarcane biomass (R = -0.54).
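For context on why low coherence points to change on the ground, the sketch below computes the magnitude of the complex coherence between two co-registered acquisitions over a small window. It assumes the standard sample estimator and uses invented pixel values; the project obtained its coherence layers from SNAP, whose exact settings are not given here.

```python
import math


def coherence_magnitude(first, second):
    """Magnitude of the sample complex coherence between two pixel windows."""
    cross = sum(a * b.conjugate() for a, b in zip(first, second))
    power_first = sum(abs(a) ** 2 for a in first)
    power_second = sum(abs(b) ** 2 for b in second)
    return abs(cross) / math.sqrt(power_first * power_second)


# Invented complex pixel values for one window (not real SLC data).
window_july = [1 + 1j, 2 - 1j, -1 + 0.5j, 0.5 + 2j]

# Unchanged scene: the second acquisition differs only by a constant phase.
unchanged = [z * complex(math.cos(0.7), math.sin(0.7)) for z in window_july]

# Changed scene: the pixel values no longer line up with the first date.
changed = [2 - 1j, -1 + 0.5j, 0.5 + 2j, 1 + 1j]

stable_value = coherence_magnitude(window_july, unchanged)
changed_value = coherence_magnitude(window_july, changed)
```

A value near 1 means the scatterers in the window behaved the same way on both dates, while growing or harvested vegetation rearranges them and pulls the value towards 0.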
​
However, the results could be improved if more field data were available, since machine learning algorithms such as Support Vector Machines (SVM) and deep learning algorithms such as neural networks require large amounts of data to train, test and predict in an unbiased manner and so produce higher-quality models with high accuracy and minimal error. Additional input variables such as meteorological and soil data could also be integrated, since machine learning methods can handle multi-dimensional data of many kinds. In further research, Sentinel optical data could be combined with the radar data and with machine learning classification algorithms such as Decision Tree and Random Forest.
​
This preliminary study can therefore serve as a basis for future research that applies machine learning techniques to Sentinel-1 SAR data for crop plant parameter estimation, with the aim of building expert systems for agriculture using artificial intelligence.

