Abstract
Wheat (Triticum aestivum L.) is crucial to global food security but is often threatened by diseases, pests, and environmental stresses. Wheat stem sawfly (Cephus cinctus Norton) poses a major threat to food security in the United States, and solid-stem varieties, which carry the stem-solidness locus (Sst1), are the main source of genetic resistance to the sawfly. Marker-assisted selection uses molecular markers to identify lines possessing beneficial haplotypes, such as those at the Sst1 locus. In this study, an R package, “HaploCatcher,” was developed to predict specific haplotypes of interest in genome-wide genotyped lines. A training population of 1056 lines, genotyped with both genome-wide markers and markers for the Sst1 locus, was curated to predict Sst1 haplotypes for 292 lines from the Colorado State University wheat breeding program. Predicted Sst1 haplotypes were compared to marker-derived haplotypes. Our results indicated that the training set was substantially predictive, with kappa scores of 0.83 for k-nearest neighbors and 0.88 for random forest models. Forward validation on newly developed breeding lines demonstrated that a random forest model, trained on all available training data, achieved accuracy in forward validation comparable to that in cross-validation. Estimated group means of lines classified by haplotypes from PCR-derived markers and from predictive modeling did not differ significantly. The HaploCatcher package is freely available and may be used by breeding programs, with their own training populations, to predict haplotypes for whole-genome sequenced early-generation material.