Digital image analysis is increasingly used in taxonomic studies to discriminate between and compare groups of seeds. In such studies, many variables describing different kinds of data (size, texture and shape) are typically fed into statistical algorithms without any pre-processing, which introduces noise and makes the procedure inconsistent when applied to new samples.
We propose an approach in which the variables for each kind of data are pre-processed separately using principal component analysis and Fourier analysis. Furthermore, the discriminating power of each kind of data is assessed by comparing the classification accuracy obtained with several algorithms: k-Nearest Neighbour, Linear Discriminant Analysis, Naive Bayes, Support Vector Machines and Random Forest.
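As a rough illustration of the shape pre-processing step, the sketch below reduces a closed seed outline to a short vector of normalised Fourier descriptors, using only the Python standard library. The abstract does not state which form of Fourier analysis was used, so the complex-coordinate formulation, the function names `fourier_descriptors` and `ellipse_outline`, the default number of harmonics and the synthetic elliptical outline are all assumptions made for illustration and not the authors' implementation.

```python
import cmath
import math


def ellipse_outline(a, b, n, angle=0.0, shift=(0.0, 0.0)):
    """Hypothetical test outline: n points on an ellipse with semi-axes a and b,
    rotated by `angle` (radians) and translated by `shift`."""
    points = []
    for t in range(n):
        theta = 2.0 * math.pi * t / n
        x, y = a * math.cos(theta), b * math.sin(theta)
        xr = x * math.cos(angle) - y * math.sin(angle) + shift[0]
        yr = x * math.sin(angle) + y * math.cos(angle) + shift[1]
        points.append((xr, yr))
    return points


def fourier_descriptors(outline, n_harmonics=8):
    """Return 2 * n_harmonics shape descriptors for a closed outline.

    `outline` is a list of (x, y) points sampled in order along the contour.
    The descriptors are the magnitudes of the Fourier coefficients of the
    complex signal x + iy, ordered as |c(1)|, |c(-1)|, |c(2)|, |c(-2)|, ...
    """
    n = len(outline)
    z = [complex(x, y) for x, y in outline]

    def coeff(k):
        # Direct discrete Fourier transform for a single frequency k.
        return sum(
            z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        ) / n

    # The k = 0 term (the centroid) is skipped, which removes translation.
    # Taking magnitudes discards phase, which removes rotation and the
    # choice of starting point.
    mags = []
    for k in range(1, n_harmonics + 1):
        mags.append(abs(coeff(k)))
        mags.append(abs(coeff(-k)))

    # Dividing by the larger first-harmonic magnitude removes overall size.
    scale = max(mags[0], mags[1])
    if scale == 0.0:
        raise ValueError("degenerate outline: first harmonic is zero")
    return [m / scale for m in mags]


if __name__ == "__main__":
    small = ellipse_outline(2.0, 1.0, 64)
    large = ellipse_outline(6.0, 3.0, 64, angle=0.7, shift=(10.0, -4.0))
    print(fourier_descriptors(small, n_harmonics=4))
    print(fourier_descriptors(large, n_harmonics=4))
```

In this sketch the two outlines differ in size, orientation and position but have the same proportions, so they yield the same descriptor vector up to rounding. For an ellipse with semi-axes a and b traced counterclockwise, the second descriptor equals (a - b)/(a + b). Descriptors of this kind carry shape information only, so size and texture variables would be pre-processed separately before the blocks are combined and passed to a classifier.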
As a case study, we used the seeds of 19 cultivars of Sardinian Prunus domestica L. and four cultivars referable to other Prunus species. Combining size, texture and shape data gave good discrimination between the seeds of Prunus sp.
The present study confirms that image analysis techniques combined with the pre-processing of data are a useful tool for taxonomic investigation in plant biology and for discrimination at the cultivar level.