Abstract
This study addresses end-users’ demand for cassava (Manihot esculenta Crantz) varieties with low cyanogenic potential (hydrogen cyanide potential, HCN) by using near-infrared spectroscopy (NIRS), which offers a fast, accurate, and reliable way to determine sample constituents with minimal sample preparation. The study evaluates how well machine learning (ML) algorithms, namely logistic regression (LR), support vector machine (SVM), and partial least squares discriminant analysis (PLS-DA), distinguish between low- and high-HCN accessions. On a 1–9 categorical scale, low-HCN accessions had mean scores of 1–5.9 and high-HCN accessions had mean scores of 6–9. A total of 1164 root samples were used to test different NIRS prediction models combined with six spectral pretreatments. The wavelengths 961, 1165, 1403–1505, 1913–1981, and 2491 nm were influential in discriminating low- from high-HCN accessions. Using the selected wavelengths, LR achieved 100% classification accuracy and PLS-DA achieved 99%. Using the full spectrum, the best model was PLS-DA combined with standard normal variate (SNV) and second-derivative pretreatment, with an accuracy of 99.6%, whereas SVM and LR achieved only moderate accuracies of 75% and 74%, respectively. This study demonstrates that NIRS coupled with ML algorithms can identify low- and high-HCN accessions, which can help cassava breeding programs select for low-HCN accessions.
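The full-spectrum pipeline summarized above can be sketched as follows, assuming Python with NumPy only. Mean scores are binarized at 6 (low 1–5.9, high 6–9). Spectra are pretreated with SNV followed by a second derivative. A PLS-DA classifier is then fit as a single-response PLS regression on the 0/1 labels, using a 0.5 decision threshold. Several details are assumptions: the spectra are synthetic, the plain finite-difference derivative stands in for whatever derivative filter the study used, the 0.5 threshold is a common PLS-DA convention rather than a documented choice, and the helper names (`to_class`, `snv`, `preprocess`, `pls1_fit`, `plsda_predict`, `make_synthetic`) are hypothetical. This is not the study's implementation, and its accuracy says nothing about the reported results.

```python
import numpy as np

# Assumption: a mean score >= 6 on the 1-9 scale marks a "high HCN" accession,
# matching the low (1-5.9) / high (6-9) split described in the abstract.
HIGH_HCN_THRESHOLD = 6.0


def to_class(scores):
    """Map mean 1-9 HCN scores to 0 (low) or 1 (high)."""
    return (np.asarray(scores, dtype=float) >= HIGH_HCN_THRESHOLD).astype(int)


def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


def second_derivative(X):
    """Second derivative along the wavelength axis by central differences.

    Assumption: plain finite differences stand in for the smoothed
    (e.g. Savitzky-Golay) derivative that NIRS software usually applies.
    """
    return np.gradient(np.gradient(X, axis=1), axis=1)


def preprocess(X):
    """SNV followed by a second derivative (SNV + 2nd-derivative pretreatment)."""
    return second_derivative(snv(X))


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS; returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # unit-length weight vector
        t = Xk @ w                      # sample scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        c = (yk @ t) / tt               # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - c * t                 # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # B = W (P'W)^-1 q
    return x_mean, y_mean, coef


def plsda_predict(model, X):
    """PLS-DA decision: a predicted response >= 0.5 means class 1 (high HCN)."""
    x_mean, y_mean, coef = model
    return ((X - x_mean) @ coef + y_mean >= 0.5).astype(int)


def make_synthetic(n, rng):
    """Hypothetical spectra: high-HCN rows get a larger band near 1450 nm,
    plus a random per-sample scale and offset (scatter) that SNV removes."""
    wl = np.linspace(900, 2500, 200)
    y = to_class(rng.uniform(1, 9, size=n))
    base = np.exp(-((wl - 1950) / 60) ** 2)
    band = np.exp(-((wl - 1450) / 30) ** 2)
    X = base + (0.2 + 0.3 * y)[:, None] * band + rng.normal(0, 0.005, (n, wl.size))
    X = X * rng.uniform(0.8, 1.2, (n, 1)) + rng.uniform(-0.1, 0.1, (n, 1))
    return X, y


rng = np.random.default_rng(0)
X_train, y_train = make_synthetic(200, rng)
X_test, y_test = make_synthetic(100, rng)
model = pls1_fit(preprocess(X_train), y_train.astype(float), n_components=3)
accuracy = float((plsda_predict(model, preprocess(X_test)) == y_test).mean())
print(f"test accuracy: {accuracy:.3f}")
```

SNV removes each spectrum's additive offset and multiplicative scaling, so the scatter added in `make_synthetic` does not reach the classifier. The number of PLS components (3 here) is an arbitrary choice that would normally be tuned by cross-validation.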