Shola Elijah Adeniji Department of Chemistry, Ahmadu Bello University, Zaria-Nigeria, Email: shola4343@gmail.com
Department of Chemistry, Ahmadu Bello University (ABU) Zaria, Kaduna State, Nigeria.
Department of Chemistry, Ahmadu Bello University (ABU) Zaria, Kaduna State, Nigeria.
Department of Chemistry, Ahmadu Bello University (ABU) Zaria, Kaduna State, Nigeria.
Department of Science Technology, Nigerian Institute of Leather and Science Technology Samaru, Zaria, Kaduna State, Nigeria.
Received: 11-07-2018
Accepted: 16-07-2018
Published: 19-07-2018
Citation: Shola Elijah Adeniji, Kalen Ephraim Audu, Mustapha Abdullahi, Mahmoud A.Y, Danzarami Danlami (2018) Genetic Function Approximation and Multi-Linear Regression Approach for Activity Modeling of Ciprofloxacin Derivatives as Potential Anti-Prostate Cancer Agents: A Theoretical Approach, Ken Jou Phar Hel Car 4:6-16
Copyrights: © 2018 Shola Elijah Adeniji et al
A theoretical approach was applied to analogues of ciprofloxacin as potent anti-prostate cancer agents to investigate the bioactivity of the compounds using Quantitative Structure-Activity Relationship (QSAR) techniques. Genetic Function Algorithm (GFA) and Multiple Linear Regression Analysis (MLRA) were used to select the descriptors and to generate QSAR models that relate the activity values against prostate cancer to the molecular structures of the active molecules. The models were validated, and the best model selected has a squared correlation coefficient (R2) of 0.990531, an adjusted squared correlation coefficient (R2adj) of 0.95962 and a leave-one-out (LOO) cross-validation coefficient (Q2cv) of 0.942963. The external validation set used to confirm the predictive power of the model has an R2pred of 0.8486. The stability and robustness of the model, as shown by the validation tests, indicate that it can be used to design and synthesize other ciprofloxacin derivatives with improved anti-prostate cancer activity.
Keywords: Ciprofloxacin; Descriptors; Genetic Function Algorithm; prostate cancer; QSAR.
Prostate cancer develops when abnormal cells in the prostate gland start to grow more rapidly than normal cells, and in an uncontrolled way. Prostate cancer has become the number one cancer in men, with increasing incidence and morbidity in African men [1]. It is diagnosed primarily in older men, the majority being over age 65, although men in their 30s and 40s have also been diagnosed with the disease. Several studies report that its incidence and prevalence in black men are multiples of those in men of other races [2]. The reason for this is not yet clear; an explanation for the disparity may come from studies of black men from different populations that test whether an enhancing factor is associated with the racial origins of these men.
Ciprofloxacin (CP), an antibiotic, has been shown to have anti-proliferative and apoptotic activities in several cancer cell lines. Moreover, several reports have highlighted the interest in increasing its lipophilicity to improve antitumor efficacy.
Novel compounds are usually developed by a trial-and-error approach, which is time-consuming and expensive. Applying the Quantitative Structure-Activity Relationship (QSAR) technique to this problem has the potential to reduce the effort and time required to discover new compounds or to improve the efficiency of existing ones. QSAR establishes a mathematical relationship between the physical, chemical, biological or environmental activities of interest and measurable or computable parameters such as physicochemical, topological, stereochemical or electronic indices, called molecular descriptors [3]. The aim of this research was to develop QSAR models for predicting the activity of ciprofloxacin derivatives against prostate cancer.
Data Collection
The data set of ciprofloxacin derivatives with potential anti-prostate cancer activity used in this study was obtained from the literature [4].
Biological Activities (pIC50)
The biological activities of the ciprofloxacin derivatives against prostate cancer, measured as IC50 (M), were converted to a logarithmic scale (pIC50) using Equation (1) below in order to increase the linearity of the activity values and bring them closer to a normal distribution. The general structure and the biological activities of these compounds are presented in Figure 1 and Table 1.
pIC50 = -log (IC50) (1)
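The conversion in Equation (1) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name is hypothetical, and it assumes that the IC50 values are tabulated in micromolar (as in Table 1) and must be expressed in mol/L before the base-10 logarithm is taken.

```python
import math

def pic50_from_ic50_um(ic50_um: float) -> float:
    """Return pIC50 = -log10(IC50 in mol/L) for an IC50 given in micromolar.

    Assumption: the input is in micromolar, as in the IC50 column of Table 1.
    """
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    ic50_molar = ic50_um * 1e-6  # micromolar -> mol/L
    return -math.log10(ic50_molar)

# Compound 1 in Table 1 has IC50 = 143 micromolar
print(round(pic50_from_ic50_um(143), 6))
```

With this unit assumption the sketch gives about 3.8447 for compound 1 and about 5.0969 for compound 2, in line with the experimental pIC50 column of Table 1.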
Figure 1: General structure of ciprofloxacin derivatives
Optimization
The 2D structures of the compounds presented in Table 1 were drawn using ChemDraw software [5,6]. The structures were converted from 2D to 3D format using the Spartan 14 V1.1.4 Wavefunction software package. All 3D structures were geometrically optimized by energy minimization. The chemical structures were first minimized with a Molecular Mechanics Force Field (MMFF) calculation to remove strain energy before being subjected to quantum chemical calculations. The Density Functional Theory (DFT) method was then employed, using Becke’s three-parameter exchange functional (B3) combined with the Lee, Yang and Parr correlation functional (LYP), together termed the B3LYP hybrid functional, for complete geometry optimization of the structures. The Spartan files of all the optimized molecules were then saved in SD file format, which is the recommended input format for PaDEL-Descriptor software V2.20 [5].
Molecular Descriptor Calculation
Molecular descriptors are mathematical values that describe the properties of a molecule. Descriptors for all 20 ciprofloxacin derivatives were calculated using PaDEL-Descriptor software V2.20. A total of 1876 molecular descriptors were calculated.
Normalization and Data Pretreatment
The descriptor values were normalized using Equation 2 so that each variable has the same opportunity at the onset to influence the model [7].
$$X = \frac{X_i - X_{min}}{X_{max} - X_{min}} \qquad (2)$$
Where Xi is the value of each descriptor for a given molecule, Xmax and Xmin are the maximum and minimum value for each column of descriptors X. The normalized data were subjected to pretreatment using Data Pretreatment software obtained from Drug Theoretical and Cheminformatics Laboratory (DTC Lab) in order to remove noise and redundant data [5].
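As an illustration of this normalization step, the sketch below rescales each descriptor column to the range 0 to 1, assuming the usual min-max form (Xi - Xmin)/(Xmax - Xmin) implied by the definitions above. The function name and the handling of constant columns are assumptions made for the example and are not taken from the DTC Lab software.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each descriptor column to [0, 1] using (Xi - Xmin) / (Xmax - Xmin).

    Rows are molecules, columns are descriptors.
    Assumption: constant columns (Xmax == Xmin) are mapped to 0 to avoid
    division by zero; in practice such columns would be removed in pretreatment.
    """
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = x_max - x_min
    safe_span = np.where(span == 0, 1.0, span)
    return (X - x_min) / safe_span

# Three hypothetical molecules described by two descriptors
print(min_max_normalize([[1, 10], [2, 20], [3, 30]]))
```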
Data Division
In order to obtain validated QSAR models, the dataset was divided into training and test sets using the Data Division software obtained from the Drug Theoretical and Cheminformatics Laboratory (DTC Lab), which employs the Kennard and Stone algorithm. This algorithm has been applied with great success in many recent QSAR studies and has been highlighted as one of the best ways to build training and test sets [5, 8, 9, 10, 11, 12]. In this algorithm, the two compounds with the largest Euclidean distance between them are first selected for the training set. Each further training compound is then chosen by maximizing the minimum distance between the compounds already selected and the compounds remaining in the dataset. This process continues until the desired number of training compounds has been selected; the remaining compounds in the dataset are then used as the test set.
The algorithm employs the Euclidean distance EDX(p, q) between the x vectors of each pair (p, q) of samples to ensure a uniform distribution of the selected subset over the x data space:
$$ED_X(p, q) = \sqrt{\sum_{j=1}^{N} \left[x_p(j) - x_q(j)\right]^2}, \quad p, q \in [1, M] \qquad (3)$$
where N is the number of variables in x, M is the number of samples, and xp(j) and xq(j) are the jth variable for samples p and q, respectively.
The training set was used to generate the model, while the test set was used for external validation of the model.
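The selection procedure described above can be sketched as follows. This is an independent, minimal illustration of the Kennard and Stone max-min selection using Euclidean distances; it is not the DTC Lab Data Division program, and the function name and return convention are assumptions.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Split sample indices into training and test sets (Kennard-Stone).

    X: array of shape (M samples, N variables).
    Returns (train_indices, test_indices).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all samples
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Start with the two samples that are farthest apart
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # Distance from each remaining sample to its nearest selected sample
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Pick the remaining sample whose nearest selected neighbour is farthest
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Hypothetical one-descriptor data set with four samples
train, test = kennard_stone([[0.0], [1.0], [2.0], [10.0]], n_train=3)
print(train, test)
```

For the 20 compounds in this study, a call with n_train=14 on the normalized descriptor matrix would give a 14/6 split of the kind reported here, although the exact membership depends on the descriptors supplied.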
Validation of Model
Model generation and validation were carried out in Materials Studio software version 8 using the Genetic Function Approximation (GFA) method. The models were scored using the Friedman lack-of-fit (LOF) measure so that the best fitness score could be obtained. In Materials Studio version 8, LOF is measured using a slight variation of the original Friedman formula. The revised formula is:
$$LOF = \frac{SEE}{\left(1 - \frac{c + d \times p}{M}\right)^2} \qquad (4)$$
Where:
SEE is the Standard Error of Estimation, which is equivalent to the model's standard deviation. It is a measure of model quality, and a model with a low SEE value is considered a better model. SEE is defined by the equation below:
$$SEE = \sqrt{\frac{\sum (Y_{exp} - Y_{pred})^2}{M - p - 1}} \qquad (5)$$
c is the number of terms in the model, other than the constant term, d is a user-defined smoothing parameter, p is the total number of descriptors contained in the model and M is the number of data in the training set.
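The LOF score can be illustrated with a short sketch. It assumes that the revised formula has the form LOF = SEE / (1 - (c + d*p)/M)^2 and that SEE is the usual root of the residual sum of squares divided by (M - p - 1); both forms, the function names and the example value of the smoothing parameter d are assumptions made for illustration and are not taken from the Materials Studio documentation.

```python
import math

def standard_error_of_estimate(y_exp, y_pred, p):
    """SEE = sqrt(sum((Yexp - Ypred)^2) / (M - p - 1)); assumed standard form."""
    m = len(y_exp)
    if m - p - 1 <= 0:
        raise ValueError("need more data points than model parameters")
    sse = sum((ye - yp) ** 2 for ye, yp in zip(y_exp, y_pred))
    return math.sqrt(sse / (m - p - 1))

def friedman_lof(see, c, d, p, m):
    """LOF = SEE / (1 - (c + d*p)/M)^2, following the description in the text.

    c: number of model terms excluding the constant, d: smoothing parameter,
    p: number of descriptors in the model, m: number of training compounds.
    """
    denom = 1.0 - (c + d * p) / m
    if denom == 0:
        raise ValueError("(c + d*p) must differ from M")
    return see / denom ** 2

# Hypothetical example: 3-term model, 3 descriptors, 14 training compounds, d = 0.5
print(friedman_lof(see=0.1, c=3, d=0.5, p=3, m=14))
```

The denominator shrinks as terms are added, so a model with more descriptors needs a proportionally lower error to obtain the same LOF score; this is how the measure penalizes over-fitting.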
The square of the correlation coefficient (R2) describes the fraction of the total variation attributed to the model. The closer the value of R2 is to 1.0, the better the regression equation explains the Y variable. R2 is the most commonly used internal validation indicator and is expressed as follows:
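$$R^2 = 1 - \frac{\sum (Y_{exp} - Y_{pred})^2}{\sum (Y_{exp} - \bar{Y}_{training})^2} \qquad (6)$$
Where:
Yexp, Ypred and Ȳtraining are the experimental activity, the predicted activity and the mean experimental activity of the samples in the training set, respectively.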
The R2 value increases with the number of regressors (i.e. descriptors) in the model; thus, R2 alone is not a reliable measure of model stability. Therefore, R2 is adjusted for the number of explanatory variables in the model. The adjusted R2 is defined as:
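$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1} \qquad (7)$$
Where p = number of independent variables in the model and n = number of compounds in the training set.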
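The two fitting statistics can be computed as in the sketch below. It assumes the standard definitions, R2 = 1 - SSres/SStot and R2adj = 1 - (1 - R2)(n - 1)/(n - p - 1), where n (the number of training compounds) is a symbol introduced here for the example; the function names are hypothetical.

```python
import numpy as np

def r_squared(y_exp, y_pred):
    """R2 = 1 - sum((Yexp - Ypred)^2) / sum((Yexp - mean(Yexp))^2)."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_exp - y_pred) ** 2).sum()
    ss_tot = ((y_exp - y_exp.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjust R2 for p descriptors fitted on n training compounds (assumed standard form)."""
    if n - p - 1 <= 0:
        raise ValueError("n must exceed p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical values: R2 = 0.90 for a 3-descriptor model on 14 compounds
print(adjusted_r_squared(0.90, n=14, p=3))
```

Because the penalty factor (n - 1)/(n - p - 1) grows with p, adding a descriptor raises the adjusted value only if it reduces the residual error by more than chance would.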
The capability of the QSAR equation to predict bioactivity of new compounds was determined using the leave-one-out cross validation method. The cross-validation regression coefficient was calculated with the equation below:
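$$Q^2_{cv} = 1 - \frac{\sum (Y_{pred} - Y_{exp})^2}{\sum (Y_{exp} - \bar{Y}_{training})^2} \qquad (8)$$
Where
Ypred, Yexp and Ȳtraining are the predicted, experimental and mean values of experimental activity of the training set, respectively.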
The coefficient of determination for the test set was calculated with the equation below;
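$$R^2_{pred} = 1 - \frac{\sum (Y_{pred(test)} - Y_{exp(test)})^2}{\sum (Y_{exp(test)} - \bar{Y}_{training})^2} \qquad (9)$$
Where Ypred(test) and Yexp(test) are the predicted and experimental activities of the test set, while Ȳtraining is the mean value of the experimental activity of the training set.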
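The leave-one-out and external validation statistics described above can be sketched together. The example assumes an ordinary least-squares multiple linear regression refitted once per left-out compound, and the usual forms Q2 = 1 - PRESS/SStot and R2pred = 1 - sum((Ypred - Yexp)^2)/sum((Yexp - training mean)^2); the function names are hypothetical, and this is not the Materials Studio implementation.

```python
import numpy as np

def loo_q2(X, y):
    """Leave-one-out cross-validated Q2 for an MLR model with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # design matrix with constant term
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        # Refit the model without compound i, then predict compound i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - press / ss_tot

def r2_pred(y_test_exp, y_test_pred, y_train_mean):
    """External R2 for the test set, referenced to the training-set mean."""
    y_test_exp = np.asarray(y_test_exp, dtype=float)
    y_test_pred = np.asarray(y_test_pred, dtype=float)
    press = ((y_test_pred - y_test_exp) ** 2).sum()
    ss_tot = ((y_test_exp - y_train_mean) ** 2).sum()
    return 1.0 - press / ss_tot

# Hypothetical, exactly linear data: Q2 is (numerically) 1
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
print(loo_q2(X, y))
```

Note that both statistics can be negative when the model predicts worse than the training-set mean, which is why thresholds such as Q2 > 0.5 are used as acceptance criteria.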
Y-Randomization Test
To ensure that the QSAR model is robust and not obtained by chance, the Y-randomization test was performed on the training set data as suggested by [13]. Random MLR models are generated by randomly shuffling the dependent variable (activity data) while keeping the independent variables (descriptors) unaltered. The new QSAR models are expected to have significantly low R2 and Q2 values over several trials, which confirms that the developed QSAR model is robust. Another parameter, cR2p, is also calculated; it should be greater than 0.5 to pass this test.
$$cR_p^2 = R \times \sqrt{R^2 - (R_r)^2} \qquad (10)$$
Where,
cR2p = coefficient of determination for Y-randomization, R = correlation coefficient of the original (non-randomized) model and Rr = average ‘R’ of the random models.
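The Y-randomization procedure can be sketched as below. The example assumes ordinary least-squares MLR models and the form cR2p = R * sqrt(R^2 - Rr^2), with Rr taken as the average R of the random models; the function names, the number of trials and the random seed are illustrative assumptions.

```python
import math
import numpy as np

def fit_r(X, y):
    """Correlation coefficient R of an ordinary least-squares MLR fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = ((y - A @ coef) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))

def mean_random_r(X, y, n_trials=10, seed=0):
    """Average R over models refitted on randomly shuffled activities."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    return float(np.mean([fit_r(X, rng.permutation(y)) for _ in range(n_trials)]))

def c_r2_p(r, r_random_mean):
    """cR2p = R * sqrt(R^2 - Rr^2); assumed form, Rr = average R of random models."""
    gap = r ** 2 - r_random_mean ** 2
    if gap < 0:
        raise ValueError("random models fit better than the original model")
    return r * math.sqrt(gap)

# Values for the original model and the average random R as listed in Table 6
print(round(c_r2_p(0.965475, 0.551813), 6))
```

With the original R and the average random R listed in Table 6, this form gives approximately 0.7649, which matches the cR2p value reported there.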
Quality Assurance of the Model
The fitting ability, stability, reliability and predictive ability of the developed models were evaluated by internal and external validation parameters. The validation parameters were compared with the minimum recommended values for a generally acceptable QSAR model [14], shown in Table 2.
A QSAR study was performed to investigate the structure-activity relationship of 20 compounds as potent anti-prostate cancer agents. The quality of a model in a QSAR study is expressed by its fitting and prediction capacity. In order to build a sound QSAR model of anti-prostate cancer activity with good predictive power for the selected test set, the Kennard-Stone algorithm was used to divide the dataset of 20 compounds into a training set of 14 compounds, which was used to develop the model, and a test set of 6 compounds, which was used to assess the predictive ability of the built model.
The experimental and predicted activities of the ciprofloxacin derivatives as potent anti-prostate cancer agents and the residual values are presented in Table 1. The low residual values between experimental and predicted activity indicate that the model has high predictive ability.
The Genetic Algorithm-Multiple Linear Regression (GA-MLR) analysis led to the selection of three descriptors, which were used to build a linear model for predicting activity against prostate cancer. Four QSAR models were built using the Genetic Function Algorithm (GFA); on the basis of statistical significance, Model 1 was selected and reported, and its validation parameters were calculated.
S/N | R | Activity IC50 (μM) | Experimental activity (pIC50) | Predicted activity | Residual
1a | H | 143 | 3.844664 | 3.732084 | 0.11258
2 | COCH2Cl | 8 | 5.09691 | 5.1541 | -0.05719
3 | C(O)OC(CH3)3 | 26 | 4.585027 | 4.681317 | -0.09629
4 | COCH2OCOCH3 | 176 | 3.754487 | 3.763877 | -0.00939
5 | COCH2OCO(CH2)2CH3 | 715 | 3.145694 | 3.137145 | 0.00855
6a | COCH2OCO(CH2)4CH3 | 14 | 4.853872 | 4.917432 | -0.06356
7 | COCH2OCO(CH2)6CH3 | 23 | 4.638272 | 4.592276 | 0.046
8a | COCH3 | 680 | 3.167491 | 3.12956 | 0.03793
9a | COCH2CH3 | 352 | 3.453457 | 3.350235 | 0.10322
10 | CO(CH2)2CH3 | 85 | 4.070581 | 3.957798 | 0.11278
11 | CO(CH2)3CH3 | 73 | 4.136677 | 4.215267 | -0.07859
12 | COC(CH3)3 | 246 | 3.609065 | 3.428617 | 0.18045
13 | CO(CH2)5CH3 | 779 | 3.108463 | 3.063929 | 0.04453
14 | CO(CH2)7CH3 | 7 | 5.154902 | 5.304892 | -0.14999
15 | CO(CH2)8CH3 | 4 | 5.39794 | 5.357801 | 0.04014
16 | CO(CH2)10CH3 | 4 | 5.39794 | 5.360478 | 0.03746
17 | CO(CH2)12CH3 | 94 | 4.026872 | 4.062242 | -0.03537
18a | CO(CH2)14CH3 | 114 | 3.943095 | 3.239815 | 0.70328
19 | COCH2C6H5 | 243 | 3.614394 | 3.58375 | 0.03064
20a | COCH2OH | 433 | 3.363512 | 3.122104 | 0.24141
Superscript a denotes a test set compound.
Table 1: Molecular structures, experimental and predicted activities, and residual values of ciprofloxacin derivatives as potent anti-prostate cancer agents
Symbol | Name | Value
R2 | Coefficient of determination | ≥ 0.6
P (95%) | Confidence interval at 95% confidence level | < 0.05
Q2CV | Cross validation coefficient | > 0.5
R2 - Q2CV | Difference between R2 and Q2CV | ≤ 0.3
N ext. test set | Minimum number of external test set | ≥ 5
cR2p | Coefficient of determination for Y-randomization | > 0.5
Table 2: Minimum recommended value of Validation Parameters for a generally acceptable QSAR model
Model 1
pIC50 = 0.295441891 * AATSC6m + 0.193350923 * MDEC-22 - 1.938081244 * L3v + 7.423458362
Model 2
pIC50 = 0.279119413 * AATSC6m + 0.456910158 * nssCH2 - 1.455230092 * L3v + 7.681216809
Model 3
pIC50 = 0.002899338 * ATSC6v + 0.472513415 * nssCH2 - 1.368491011 * nssCH2 + 8.284970195
Model 4
pIC50 = 0.277548931 * AATSC6m + 0.484912043 * nssCH2 - 1.936444918 * L3v + 6.909123060
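Model 1 can be applied to a new compound by evaluating the linear equation directly, as sketched below. The coefficients and intercept are copied from Model 1 above; the function name and the descriptor values in the example are hypothetical, and the sketch assumes that the descriptors are supplied on the same scale that was used when the model was fitted.

```python
# Coefficients and intercept of Model 1 as reported in the text
MODEL_1_COEFFICIENTS = {
    "AATSC6m": 0.295441891,
    "MDEC-22": 0.193350923,
    "L3v": -1.938081244,
}
MODEL_1_INTERCEPT = 7.423458362

def predict_pic50(descriptors):
    """Evaluate Model 1 for a dict of descriptor values keyed by descriptor symbol."""
    missing = set(MODEL_1_COEFFICIENTS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return MODEL_1_INTERCEPT + sum(
        coef * descriptors[name] for name, coef in MODEL_1_COEFFICIENTS.items()
    )

# Hypothetical descriptor values, for illustration only
example = {"AATSC6m": 1.0, "MDEC-22": 1.0, "L3v": 1.0}
print(predict_pic50(example))
```

The signs of the coefficients indicate the direction of each effect: higher AATSC6m and MDEC-22 values raise the predicted pIC50, while a higher L3v value lowers it.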
The internal and external validation parameters, used to confirm that the built QSAR models are stable and robust, are reported in Table 3. These parameters agree with the threshold values given in Table 2, which confirms the robustness and stability of the model.
S/N | Validation Parameters | Model 1 | Model 2 | Model 3 | Model 4
1 | Friedman LOF | 0.287447 | 0.29417 | 0.319241 | 0.36543
2 | R-squared | 0.990531 | 0.948212 | 0.875503 | 0.82954
3 | Adjusted R-squared | 0.95962 | 0.958676 | 0.955154 | 0.91245
4 | Cross validated R-squared | 0.942963 | 0.935828 | 0.934816 | 0.87353
5 | Significant Regression | Yes | Yes | Yes | Yes
6 | Significance-of-regression F-value | 103.980981 | 101.528362 | 93.293244 | 91.3344
7 | Critical SOR F-value (95%) | 3.871034 | 3.871034 | 3.871034 | 3.871034
8 | Replicate points | 0 | 0 | 0 | 0
9 | Computed experimental error | 0 | 0 | 0 | 0
10 | Lack-of-fit points | 10 | 10 | 10 | 0
11 | Min expt. error for non-significant LOF (95%) | 0.186643 | 0.208814 | 0.266695 | 0.31900
Table 3: Validation parameters from Materials Studio
The names and symbols of the descriptors used in the QSAR model are reported in Table 4. The presence of both 2D and 3D descriptors in the model suggests that these types of descriptors are better able to characterize the anti-prostate cancer activities of the compounds. The Pearson’s correlation matrix and statistics of the three descriptors employed in the QSAR model are reported in Table 5, which shows that the correlation coefficients between each pair of descriptors are low; thus, it can be inferred that there is no significant inter-correlation among the descriptors used in building the model [5]. The estimated Variance Inflation Factor (VIF) values for all the descriptors were less than 4, which implies that the model generated was statistically significant and the descriptors were orthogonal. The p-value is a probability that measures the evidence against the null hypothesis; lower probabilities provide stronger evidence against it. The null hypothesis states that there is no association between the descriptors and the activities of the molecules. The p-values of all the descriptors in the model at the 95% confidence level, shown in Table 5, are less than 0.05. The null hypothesis is therefore rejected in favor of the alternative hypothesis that there is a relationship between the descriptors used in the model and the activities of the molecules [5].
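One common way to obtain VIF values is from the inverse of the descriptor correlation matrix, whose diagonal elements equal 1/(1 - Ri^2), where Ri^2 is obtained by regressing descriptor i on the other descriptors. The sketch below assumes this standard definition; the function name is hypothetical, and it is not known whether the VIF values reported in this study were computed in this way.

```python
import numpy as np

def vif_from_correlation(corr):
    """Return the VIF of each descriptor from a Pearson correlation matrix.

    VIF_i is the i-th diagonal element of the inverse correlation matrix.
    """
    corr = np.asarray(corr, dtype=float)
    if corr.shape[0] != corr.shape[1]:
        raise ValueError("correlation matrix must be square")
    return np.diag(np.linalg.inv(corr))

# Two hypothetical descriptors with a pairwise correlation of 0.5
print(vif_from_correlation([[1.0, 0.5], [0.5, 1.0]]))
```

A VIF of 1 means a descriptor is uncorrelated with the others, and larger values indicate increasing multicollinearity.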
Y-Randomization Parameter Test
The results of the Y-randomization parameter test are reported in Table 6. The low R2 and Q2 values over several trials confirm that the developed QSAR model is robust, while the cR2p value greater than 0.5 affirms that the created model is powerful and not obtained by chance.
Plots of predicted activity against experimental activity for the training and test sets are shown in Figure 2 and Figure 3, respectively. The R2 value of 0.9905 for the training set and the R2 value of 0.8486 for the test set recorded in this study agree with the GFA-derived R2 value reported in Table 3. This confirms the reliability of the model. The plot of standardized residuals versus experimental activity shown in Figure 4 indicates that there was no systematic error in model development, as the residuals were spread on both sides of zero [15].
S/NO | Descriptor symbol | Name of descriptor | Class
1 | AATSC6m | Average centered Broto-Moreau autocorrelation - lag 6 / weighted by mass | 2D
2 | MDEC-22 | Molecular distance edge between all secondary carbons | 2D
3 | L3v | 3rd component size directional WHIM index / weighted by relative van der Waals volumes | 3D
Table 4: List of some descriptors used in the QSAR optimization model
 | Inter-correlation | | | Statistics |
Descriptors | AATSC6m | MDEC-22 | L3v | VIF | P-value
AATSC6m | 1 | | | 2.56436 | 3.34E-05
MDEC-22 | -0.15654 | 1 | | 1.84743 | 4.23E-04
L3v | -0.19444 | 0.45585 | 1 | 2.34556 | 5.34E-07
Table 5: Pearson’s correlation matrix and statistics for descriptor used in the QSAR optimization model.
Model | R | R^2 | Q^2
Original | 0.965475 | 0.932142 | 0.831909
Random 1 | 0.674003 | 0.45428 | -0.31323
Random 2 | 0.61843 | 0.382455 | -0.50841
Random 3 | 0.311542 | 0.097058 | -1.37797
Random 4 | 0.632995 | 0.400683 | -0.27203
Random 5 | 0.665103 | 0.442362 | -0.76461
Random 6 | 0.385191 | 0.148372 | -1.09687
Random 7 | 0.583435 | 0.340396 | -0.68669
Random 8 | 0.446102 | 0.199007 | -1.00243
Random 9 | 0.413199 | 0.170734 | -0.91905
Random 10 | 0.788129 | 0.621147 | 0.008176
Random Models Parameters
Average r | 0.551813
Average r^2 | 0.325649
Average Q^2 | -0.69331
cRp^2 | 0.764888
Table 6: Y- Randomization Parameters test
Figure 2: Plot of predicted activity against experimental activity of training set
Figure 3: Plot of predicted activity against experimental activity of test set
Figure 4: Plot of standardized residual activity versus experimental activity
This work addresses the Quantitative Structure-Activity Relationship (QSAR) between ciprofloxacin derivatives and their activity (pIC50) against prostate cancer. Results from the optimal model showed that the pIC50 of the studied molecules against prostate cancer was affected by the descriptors AATSC6m, MDEC-22 and L3v. The robustness and applicability of the QSAR equation have been established by internal and external validation techniques. The stability and robustness of the model, as shown by the validation tests, indicate that it can be used to design other ciprofloxacin derivatives with improved anti-prostate cancer activity.
1. Delongchamps NB, Singh A, Haas GP (2007) Epidemiology of prostate cancer in Africa: another step in the understanding of the disease. Current Problems in Cancer 31: 226–236.
2. Odedina FT, Ogunbiyi JO, Ukoli FA (2006) Roots of prostate cancer in African-American men. Journal of the National Medical Association 98: 539.
3. Rathod A (2011) Antifungal and Antibacterial activities of Imidazolylpyrimidines derivatives and their QSAR Studies under Conventional and Microwave-assisted. Int J PharmTech Res 3: 1942–1951.
4. Azéma J, Guidetti B, Dewelle J, Le Calve B, Mijatovic T et al. (2009) 7-((4-Substituted) piperazin-1-yl) derivatives of ciprofloxacin: synthesis and in vitro biological evaluation as potential antitumor agents. Bioorganic & Medicinal Chemistry 17: 5396–5407.
5. Shola EA, Uba S, Uzairu A (2018) A Novel QSAR Model for the Evaluation and Prediction of (E)-N’-Benzylideneisonicotinohydrazide Derivatives as the Potent Anti-mycobacterium Tuberculosis Antibodies Using Genetic Function Approach. Physical Chemistry Research 6: 479–492.
6. Li Z, Wan H, Shi Y, Ouyang P (2004) Personal experience with four kinds of chemical structure drawing software: review on ChemDraw, ChemWindow, ISIS/Draw, and ChemSketch. Journal of Chemical Information and Computer Sciences 44: 1886–1890.
7. Singh P (2013) Quantitative Structure-Activity Relationship Study of Substituted-[1, 2, 4] Oxadiazoles as S1P1 Agonists. Journal of Current Chemical and Pharmaceutical Sciences.
8. Afantitis A, Melagraki G, Sarimveis H, Koutentis PA, Markopoulos J (2006) A novel QSAR model for predicting induction of apoptosis by 4-aryl-4H-chromenes. Bioorganic & Medicinal Chemistry 14: 6686–6694.
9. Chakraborti AK, Gopalakrishnan B, Sobhia ME, Malde A (2003) 3D-QSAR studies of indole derivatives as phosphodiesterase IV inhibitors. European Journal of Medicinal Chemistry 38: 975–982.
10. Khaled KF (2011) Modeling corrosion inhibition of iron in acid medium by genetic function approximation method: A QSAR model. Corrosion Science 53: 3457–3465.
11. Melagraki G, Afantitis A, Makridima K, Sarimveis H, Igglessi-Markopoulou O (2006) Prediction of toxicity using a novel RBF neural network training methodology. Journal of Molecular Modeling 12: 297–305.
12. Wu W, Walczak B, Massart D, Heuerding S, Erni F et al. (1996) Artificial neural networks in classification of NIR spectral data: design of the training set. Chemometrics and Intelligent Laboratory Systems 33: 35–46.
13. Tropsha A, Gramatica P, Gombar VK (2003) The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. Molecular Informatics 22: 69–77.
14. Veerasamy R, Raja H, Jain A, Sivadasan S, Varghese CP et al. (2011) Validation of QSAR models-strategies and importance. International Journal of Drug Design & Discovery 3: 511–519.
15. Jalali-Heravi M, Kyani A (2004) Use of computer-assisted methods for the modeling of the retention time of a variety of volatile organic compounds: a PCA-MLR-ANN approach. Journal of Chemical Information and Computer Sciences 44: 1328–1335.