The retention index is an essential tool for reliable analyte identification in gas chromatographic analyses. Many libraries now provide retention indices for a large number of compounds on various stationary phase chemistries.
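For readers less familiar with the quantity being predicted, the sketch below computes a linear (van den Dool and Kratz) retention index for a temperature-programmed run. It interpolates the analyte's retention time between the n-alkanes eluting just before and just after it. The function name and the alkane retention times used with it are hypothetical and illustrative. This shows the standard definition only and is not part of DeepRel.

```python
from bisect import bisect_right

def linear_retention_index(t_x, alkane_times):
    """Linear (van den Dool and Kratz) retention index.

    t_x: retention time of the analyte (same units as alkane_times).
    alkane_times: dict mapping n-alkane carbon number -> retention time.
    Uses the general form 100 * (n + (N - n) * (t_x - t_n) / (t_N - t_n)),
    where n and N are the alkanes bracketing the analyte; for consecutive
    alkanes (N = n + 1) this reduces to the usual textbook formula.
    """
    carbons = sorted(alkane_times)
    times = [alkane_times[c] for c in carbons]
    # Position of the last alkane eluting at or before the analyte.
    i = bisect_right(times, t_x) - 1
    if i < 0 or i >= len(carbons) - 1:
        raise ValueError("analyte is not bracketed by the alkane ladder")
    n, big_n = carbons[i], carbons[i + 1]
    t_n, t_big_n = times[i], times[i + 1]
    return 100 * (n + (big_n - n) * (t_x - t_n) / (t_big_n - t_n))
```

For example, with hypothetical alkane times {10: 5.0, 11: 6.0, 12: 7.2} (minutes), an analyte at 5.5 min falls halfway between decane and undecane and gets a retention index of 1050.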
However, the situation becomes more complicated for "unknown unknowns" that are not present in such libraries. The importance of identifying these compounds has grown together with the rapidly expanding interest in non-targeted analyses over the last decade.
Accurate in silico prediction of retention indices from a proposed molecular structure would therefore be highly valuable in such situations. On this basis, a deep learning-based predictive model was developed and is presented in this paper.
The model is designed for user-friendly and accurate prediction of retention indices of compounds in gas chromatography on a semi-standard non-polar stationary phase. Its input is a Simplified Molecular Input Line Entry System (SMILES) string.
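The abstract states only that SMILES strings are the model's input and that the network uses 2D convolutional layers. It does not describe how a string becomes a 2D array. One common approach is to one-hot encode each SMILES token into a fixed-size matrix, with one row per position and one column per vocabulary token. A minimal sketch under that assumption follows. The vocabulary, the maximum length, and the function names are hypothetical, and DeepRel's actual preprocessing may differ.

```python
import re

# Hypothetical token vocabulary; two-letter halogens are kept as single tokens.
VOCAB = ["C", "c", "N", "n", "O", "o", "S", "s", "F", "Cl", "Br", "I", "P",
         "(", ")", "[", "]", "=", "#", "+", "-", "@", "H",
         "1", "2", "3", "4", "5", "6", "/", "\\"]
INDEX = {tok: i for i, tok in enumerate(VOCAB)}
# At each position try "Cl" and "Br" before falling back to one character.
TOKEN_RE = re.compile(r"Cl|Br|.")

def tokenize(smiles):
    """Split a SMILES string into tokens from the hypothetical vocabulary."""
    return TOKEN_RE.findall(smiles)

def one_hot_smiles(smiles, max_len=80):
    """Return a max_len x len(VOCAB) matrix of 0/1 values (list of lists).

    Row r marks the token at position r; rows past the end of the string
    stay all-zero as padding. Tokens outside VOCAB raise KeyError.
    """
    tokens = tokenize(smiles)
    if len(tokens) > max_len:
        raise ValueError("SMILES has more than max_len tokens")
    matrix = [[0] * len(VOCAB) for _ in range(max_len)]
    for pos, tok in enumerate(tokens):
        matrix[pos][INDEX[tok]] = 1
    return matrix
```

The resulting matrix can be treated as a single-channel image, which is what makes 2D convolutions applicable. Keeping "Cl" and "Br" as single tokens prevents chlorine from being read as carbon followed by a stray "l".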
The model architecture consists of 2D convolutional layers together with batch normalization, max pooling, dropout, and three residual connections. The model achieves a median absolute error of retention index prediction of 16.4 units for the validation set and 16.0 units for the test set.
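The abstract lists the building blocks of the network but not their configuration, such as kernel sizes, channel counts, activation functions, or where the three residual connections are placed. The NumPy sketch below runs one forward pass through a single hypothetical residual block built from those components. It applies a 2D convolution with "same" padding, batch normalization, ReLU, an identity skip connection, 2×2 max pooling, and inverted dropout. The layer order, kernel size, ReLU activation, and all function names are assumptions for illustration, not DeepRel's actual layers.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 2D convolution (cross-correlation, as in deep learning libraries).

    Stride 1, zero padding so the output keeps the input's height and width.
    x: (N, C_in, H, W); w: (C_out, C_in, k, k) with odd k.
    Returns an array of shape (N, C_out, H, W).
    """
    n, _, h, wd = x.shape
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)))
    out = np.zeros((n, c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, :, i:i + k, j:j + k]  # (N, C_in, k, k)
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))
    return out

def batch_norm(x, eps=1e-5):
    """Per-channel normalization over batch and spatial axes (batch statistics).

    Learnable scale/shift and running averages for inference are omitted.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def max_pool2x2(x):
    """2x2 max pooling with stride 2; an odd trailing row/column is dropped."""
    n, c, h, w = x.shape
    x = x[:, :, : h - h % 2, : w - w % 2]
    return x.reshape(n, c, h // 2, 2, w // 2, 2).max(axis=(3, 5))

def dropout(x, rate, rng, training):
    """Inverted dropout: zero a fraction `rate` of units and rescale, training only."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

def residual_block(x, w1, w2, rate=0.2, rng=None, training=False):
    """conv -> BN -> ReLU -> conv -> BN, add identity skip, ReLU, pool, dropout.

    The identity skip requires w1 and w2 to preserve the channel count.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    h = np.maximum(batch_norm(conv2d_same(x, w1)), 0.0)
    h = batch_norm(conv2d_same(h, w2))
    h = np.maximum(h + x, 0.0)  # residual (skip) connection
    return dropout(max_pool2x2(h), rate, rng, training)
```

For example, an input batch of shape (2, 3, 6, 6) passed through the block with two kernels of shape (3, 3, 3, 3) gives an output of shape (2, 3, 3, 3). The skip connection adds the block's input directly to its output, so gradients have a short path through deep stacks of convolutions. This is the usual motivation for residual connections.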
The median percentage error is at most 0.81% for all of the data sets mentioned. Finally, the DeepRel model is provided as an R package, available at https://github.com/TomasVrzal/DeepRel together with a user-friendly graphical user interface.
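The abstract does not give explicit formulas for its error metrics. The sketch below assumes the usual definitions. The median absolute error is the median of |predicted − reference| in retention index units, and the median percentage error is the median of 100·|predicted − reference|/reference. The function names are hypothetical, and the numbers in the usage lines are illustrative, not results from the paper.

```python
from statistics import median

def _check(reference, predicted):
    """Require two non-empty sequences of equal length."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")

def median_absolute_error(reference, predicted):
    """Median of |predicted - reference|, in retention index units."""
    _check(reference, predicted)
    return median(abs(p - r) for r, p in zip(reference, predicted))

def median_percentage_error(reference, predicted):
    """Median of 100 * |predicted - reference| / reference, in percent."""
    _check(reference, predicted)
    return median(100.0 * abs(p - r) / r for r, p in zip(reference, predicted))

# Illustrative values only.
ref = [1000, 1500, 2000]
pred = [1010, 1480, 2000]
print(median_absolute_error(ref, pred))    # -> 10
print(median_percentage_error(ref, pred))  # -> 1.0
```

Medians rather than means are reported here. A few badly mispredicted compounds therefore cannot dominate the summary, which matters for chemically diverse test sets.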