Shola Elijah Adeniji, Department of Chemistry, Ahmadu Bello University, Zaria, Nigeria. Email: shola4343@gmail.com
Department of Chemistry, Ahmadu Bello University (ABU) Zaria, Kaduna State, Nigeria.
Department of Chemistry, Ahmadu Bello University (ABU) Zaria, Kaduna State, Nigeria.
Department of Chemistry, Ahmadu Bello University (ABU) Zaria, Kaduna State, Nigeria.
Department of Science Technology, Nigerian Institute of Leather and Science Technology Samaru, Zaria, Kaduna State, Nigeria.
Received: 11-07-2018
Accepted: 16-07-2018
Published: 19-07-2018
Citation: Shola Elijah Adeniji, Kalen Ephraim Audu, Mustapha Abdullahi, Mahmoud A.Y, Danzarami Danlami (2018) Genetic Function Approximation and MultiLinear Regression Approach for Activity Modeling of Ciprofloxacin Derivatives as Potential AntiProstate Cancer Agents: A Theoretical Approach, Ken Jou Phar Hel Car 4:616
Copyrights: © 2018 Shola Elijah Adeniji et al
A theoretical approach was applied to analogues of ciprofloxacin, which are potent anti-prostate cancer agents, to investigate the bioactivity of the compounds using Quantitative Structure-Activity Relationship (QSAR) techniques. Genetic Function Algorithm (GFA) and Multiple Linear Regression Analysis (MLRA) were used to select the descriptors and to generate QSAR models that relate the activity values against prostate cancer to the molecular structures of the active molecules. The models were validated, and the best model selected has a squared correlation coefficient (R^{2}) of 0.990531, an adjusted squared correlation coefficient (R^{2}_{adj}) of 0.95962 and a leave-one-out (LOO) cross-validation coefficient (Q^{2}_{cv}) of 0.942963. The external validation set used to confirm the predictive power of the model has an R^{2}_{pred} of 0.8486. The stability and robustness of the model, as shown by the validation tests, indicate that it can be used to design and synthesize other ciprofloxacin derivatives with improved anti-prostate cancer activity.
Keywords: Ciprofloxacin; Descriptors; Genetic Function Algorithm; prostate cancer; QSAR.
Prostate cancer develops when abnormal cells in the prostate gland start to grow more rapidly than normal cells, and in an uncontrolled way. Prostate cancer has become the number one cancer in men, with increasing incidence and morbidity in African men [1]. It is diagnosed primarily in older men, the majority being over age 65, although men in their 30s and 40s have also been diagnosed with the disease. Several studies report that its incidence and prevalence in black men are multiples of those in men of other races [2]. The reason for this is not yet clear, and an explanation for the disparity may lie in studies involving black men from different populations to see whether there is an enhancing factor associated with the racial origins of these men.
Ciprofloxacin (CP), an antibiotic, has been shown to have antiproliferative and apoptotic activities in several cancer cell lines. Moreover, several reports have highlighted the value of increasing its lipophilicity to improve antitumor efficacy.
Novel compounds are usually developed by a trial-and-error approach to synthesis, which is time-consuming and expensive. The application of the Quantitative Structure-Activity Relationship (QSAR) technique to this problem has the potential to minimize the effort and time required to discover new compounds or to improve current ones in terms of their efficiency. QSAR establishes a mathematical relationship between physical, chemical, biological or environmental activities of interest and measurable or computable parameters such as physicochemical, topological, stereochemical or electronic indices, called molecular descriptors [3]. The aim of this research was to develop QSAR models for predicting the activity of ciprofloxacin derivatives against prostate cancer.
Data Collection
The data set of ciprofloxacin derivatives with potential anti-prostate cancer activity used in this study was obtained from the literature [4].
Biological Activities (pIC_{50})
The biological activities of the ciprofloxacin derivatives against prostate cancer, measured as IC_{50} (M), were converted to a logarithmic scale (pIC_{50}) using Equation (1) in order to increase the linearity of the activity values and to approach a normal distribution. The structures and the biological activities of these compounds are presented in Figure 1 and Table 1.
pIC_{50} = -log_{10} (IC_{50}) (1)
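As a quick check of the conversion, the following minimal sketch takes IC_{50} values in micromolar (the unit used in Table 1), converts them to mol/L and applies the negative base-10 logarithm. The function name `pic50_from_micromolar` is a hypothetical helper introduced here for illustration; it is not part of any software used in the study.

```python
import math

def pic50_from_micromolar(ic50_um: float) -> float:
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)

# IC50 values (in micromolar) of compounds 1, 2 and 15 from Table 1
for ic50 in (143, 8, 4):
    print(ic50, round(pic50_from_micromolar(ic50), 4))
```

For the three compounds shown, the results agree with the experimental pIC_{50} values listed in Table 1 (3.844664, 5.09691 and 5.39794), which confirms that the conversion carries a negative sign.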
Figure 1: General structure of ciprofloxacin derivatives
Optimization
The 2D structures of the compounds presented in Table 1 were drawn using ChemDraw software [5,6]. The 2D structures were then converted to 3D format using the Spartan 14 V1.1.4 Wavefunction software package. All 3D structures were geometrically optimized by energy minimization. The chemical structures were first minimized with a Molecular Mechanics Force Field (MMFF) calculation to remove strain energy before being subjected to quantum chemical calculations. The Density Functional Theory (DFT) method was then employed for complete geometry optimization of the structures, using Becke's three-parameter exchange functional (B3) combined with the Lee, Yang and Parr correlation functional (LYP), known as the B3LYP hybrid functional. The Spartan files of all the optimized molecules were then saved in SD file format, which is the recommended input format for PaDEL-Descriptor software V2.20 [5].
Molecular Descriptor Calculation
Molecular descriptors are mathematical values that describe the properties of a molecule. The descriptors for all 20 ciprofloxacin derivatives were calculated using PaDEL-Descriptor software V2.20. A total of 1876 molecular descriptors were calculated.
Normalization and Data Pretreatment
The descriptor values were normalized using Equation 2 in order to give each variable the same opportunity at the onset to influence the model [7].
$$X = \frac{X_{i} - X_{min}}{X_{max} - X_{min}} \tag{2}$$
Where X_{i} is the value of each descriptor for a given molecule, and X_{max} and X_{min} are the maximum and minimum values for each column of descriptors X. The normalized data were subjected to pretreatment using the Data Pretreatment software obtained from the Drug Theoretical and Cheminformatics Laboratory (DTC Lab) in order to remove noise and redundant data [5].
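The normalization step can be sketched as follows, assuming Equation 2 is the usual min-max scaling, (X_{i} - X_{min}) / (X_{max} - X_{min}), applied column by column. The function name and the handling of constant columns (mapped to zero) are assumptions made for this illustration; the DTC Lab pretreatment tool itself is not reproduced here.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each descriptor column to [0, 1] using its own min and max."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = x_max - x_min
    # Assumption: a constant column (span == 0) is mapped to 0 instead of NaN.
    safe_span = np.where(span == 0, 1.0, span)
    return (X - x_min) / safe_span

# Hypothetical descriptor matrix: 3 molecules x 3 descriptors
X = np.array([[1.0, 10.0, 5.0],
              [2.0, 20.0, 5.0],
              [3.0, 40.0, 5.0]])
print(min_max_normalize(X))
```

Constant columns such as the third one carry no information and would normally be removed during pretreatment.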
Data Division
In order to obtain validated QSAR models, the dataset was divided into training and test sets using the Data Division software obtained from the Drug Theoretical and Cheminformatics Laboratory (DTC Lab), which employs the Kennard-Stone algorithm. This algorithm has been applied with great success in many recent QSAR studies and has been highlighted as one of the best ways to build training and test sets [5, 8, 9, 10, 11, 12]. In this algorithm, the two compounds with the largest Euclidean distance between them are initially selected for the training set. The remaining compounds for the training set are selected by maximizing the minimum distance between the compounds already selected and the rest of the compounds in the dataset. This process continues until the desired number of compounds for the training set has been selected; the remaining compounds in the dataset are then used as the test set.
The algorithm employs the Euclidean distance ED_{X}(p, q) between the x vectors of each pair (p, q) of samples to ensure a uniform distribution of the subset across the x data space. N is the number of variables in x and M is the number of samples, while x_{p}(j) and x_{q}(j) are the j-th variable for samples p and q, respectively.
$$ED_{X}(p, q) = \sqrt{\sum_{j=1}^{N} \left[x_{p}(j) - x_{q}(j)\right]^{2}}; \quad p, q \in [1, M] \tag{3}$$
The training set was used to generate the model, while the test set was used for the external validation of the model.
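The selection procedure described above can be sketched in a few lines. This is a minimal illustration of the Kennard-Stone idea (start from the most distant pair, then repeatedly add the candidate whose minimum distance to the selected set is largest), not the DTC Lab implementation; the function name `kennard_stone` and the toy data are assumptions.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Return (train_indices, test_indices) chosen by Kennard-Stone selection."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all samples
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Seed the training set with the two most distant samples
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # For every candidate, distance to its nearest already-selected sample
        min_dist = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_dist))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Toy example: four samples described by a single variable
train, test = kennard_stone([[0.0], [1.0], [2.0], [10.0]], n_train=3)
print(train, test)
```

In the toy example the two extreme samples are picked first, then the sample farthest from both, so the sample left for the test set is the one closest to an already selected point.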
Validation of Model
Validation of the model was carried out in Materials Studio software version 8 using the Genetic Function Approximation (GFA) method. The models were scored using the Friedman lack-of-fit (LOF) measure so that the best fitness score could be obtained. In Materials Studio version 8, LOF is measured using a slight variation of the original Friedman formula. The revised formula is:
$$LOF = \frac{SEE}{\left(1 - \frac{c + d \times p}{M}\right)^{2}} \tag{4}$$
Where:
SEE is the Standard Error of Estimation, which is equivalent to the model's standard deviation. It is a measure of model quality, and a model with a lower SEE value is considered better. SEE is defined by the equation below:
$$SEE = \sqrt{\frac{\sum (Y_{exp} - Y_{pred})^{2}}{M - p - 1}} \tag{5}$$
c is the number of terms in the model other than the constant term, d is a user-defined smoothing parameter, p is the total number of descriptors contained in the model and M is the number of data points in the training set.
The square of the correlation coefficient (R^{2}) describes the fraction of the total variation attributed to the model. The closer the value of R^{2} is to 1.0, the better the regression equation explains the Y variable. R^{2} is the most commonly used internal validation indicator and is expressed as follows:
$$R^{2} = 1 - \frac{\sum (Y_{exp} - Y_{pred})^{2}}{\sum (Y_{exp} - \bar{Y}_{training})^{2}} \tag{6}$$
Where:
Y_{exp}, Y_{pred} and \bar{Y}_{training} are the experimental activity, the predicted activity and the mean experimental activity of the samples in the training set, respectively.
The R^{2} value increases with the number of regressors (i.e. descriptors); thus, R^{2} alone cannot be a useful measure of the stability of a model. Therefore, R^{2} is adjusted for the number of explanatory variables in the model. The adjusted R^{2} is defined as:
$$R^{2}_{adj} = 1 - (1 - R^{2}) \frac{n - 1}{n - p - 1} \tag{7}$$
Where p is the number of independent variables in the model and n is the number of compounds in the training set.
The capability of the QSAR equation to predict the bioactivity of new compounds was determined using the leave-one-out cross-validation method. The cross-validation regression coefficient was calculated with the equation below:
$$Q^{2}_{cv} = 1 - \frac{\sum (Y_{pred} - Y_{exp})^{2}}{\sum (Y_{exp} - \bar{Y}_{training})^{2}} \tag{8}$$
Where
Y_{pred}, Y_{exp} and \bar{Y}_{training} are the predicted, experimental and mean experimental activities of the training set.
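The coefficient of determination for the test set was calculated with the equation below:
$$R^{2}_{pred} = 1 - \frac{\sum (Y_{pred_{test}} - Y_{exp_{test}})^{2}}{\sum (Y_{exp_{test}} - \bar{Y}_{training})^{2}} \tag{9}$$
Where Y_{pred_{test}} and Y_{exp_{test}} are the predicted and experimental activities of the test set, while \bar{Y}_{training} is the mean experimental activity of the training set.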
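The internal and external statistics described in this section can be illustrated with a short sketch. It assumes the standard definitions: R^{2} and R^{2}_{pred} as one minus the ratio of the residual sum of squares to the sum of squares about the training-set mean, adjusted R^{2} as 1 - (1 - R^{2})(n - 1)/(n - p - 1), and a leave-one-out Q^{2} obtained by refitting the model with each training compound removed in turn. The data are synthetic and all function names are hypothetical; the study itself used Materials Studio for these calculations.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return np.column_stack([np.ones(X.shape[0]), X]) @ coef

def r2_against_mean(y_exp, y_pred, y_mean_train):
    """1 - SS_res / SS_tot, with SS_tot taken about the training-set mean."""
    ss_res = np.sum((y_exp - y_pred) ** 2)
    ss_tot = np.sum((y_exp - y_mean_train) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r2(r2, n, p):
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def q2_loo(X, y):
    """Leave-one-out cross-validated coefficient for an MLR model."""
    preds = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = fit_mlr(X[keep], y[keep])
        preds[i] = predict_mlr(coef, X[i:i + 1])[0]
    return r2_against_mean(y, preds, y.mean())

# Synthetic data only: 14 training and 6 test samples, 3 descriptors
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(14, 3)), rng.normal(size=(6, 3))
true_coef = np.array([0.3, 0.2, -1.9])
y_train = 7.4 + X_train @ true_coef + rng.normal(scale=0.2, size=14)
y_test = 7.4 + X_test @ true_coef + rng.normal(scale=0.2, size=6)

coef = fit_mlr(X_train, y_train)
r2 = r2_against_mean(y_train, predict_mlr(coef, X_train), y_train.mean())
r2_adj = adjusted_r2(r2, n=len(y_train), p=X_train.shape[1])
q2 = q2_loo(X_train, y_train)
r2_pred = r2_against_mean(y_test, predict_mlr(coef, X_test), y_train.mean())
print(r2, r2_adj, q2, r2_pred)
```

For an ordinary least-squares model, the adjusted R^{2} and the leave-one-out Q^{2} can never exceed R^{2}, which is a useful sanity check on reported statistics.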
Y-Randomization Test
To ensure that the created QSAR model is strong and not inferred by chance, the Y-randomization test was performed on the training set data as suggested by [13]. Random MLR models are generated by randomly shuffling the dependent variable (activity data) while keeping the independent variables (descriptors) unaltered. The new QSAR models are expected to have significantly low R^{2} and Q^{2} values for several trials, which confirms that the developed QSAR models are robust. Another parameter, cR^{2}_{p}, is also calculated; it should be greater than 0.5 to pass this test.
$$cR^{2}_{p} = R \times \sqrt{R^{2} - R_{r}^{2}} \tag{10}$$
Where,
cR^{2}_{p} = coefficient of determination for Y-randomization, R = correlation coefficient of the original (non-randomized) model and R_{r} = average R of the random models.
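As a worked check, the sketch below evaluates cR^{2}_{p} in the commonly used form R × sqrt(R^{2} - R_{r}^{2}), taking R as the correlation coefficient of the original model and R_{r} as the average R of the random models. The input values are those reported in Table 6; the function name is a hypothetical helper.

```python
import math

def c_r2_p(r_original: float, r_random_mean: float) -> float:
    """cR2p = R * sqrt(R^2 - Rr^2) for a Y-randomization test."""
    gap = r_original ** 2 - r_random_mean ** 2
    if gap < 0:
        raise ValueError("random models outperform the original model")
    return r_original * math.sqrt(gap)

# Values from Table 6: R of the original model and average r of 10 random models
print(round(c_r2_p(0.965475, 0.551813), 4))  # → 0.7649
```

This form reproduces the cR^{2}_{p} value of 0.764888 listed in Table 6, which is above the 0.5 threshold.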
Quality Assurance of the Model
The fitting ability, stability, reliability and predictive ability of the developed models were evaluated by internal and external validation parameters. The validation parameters were compared with the minimum recommended values for a generally acceptable QSAR model [14], shown in Table 2.
A QSAR study was performed to investigate the structure-activity relationship of 20 compounds as potent anti-prostate cancer agents. The quality of a model in a QSAR study is expressed by its fitting and prediction ability. In order to build a QSAR model for anti-prostate cancer activity with good predictive power for the selected test set, the Kennard-Stone algorithm was used to divide the dataset of 20 compounds into a training set of 14 compounds, which was used to develop the model, and a test set of 6 compounds, which was used to assess the predictive ability of the built model.
The experimental and predicted activities of the ciprofloxacin derivatives as potent anti-prostate cancer agents, together with the residual values, are presented in Table 1. The low residual values between experimental and predicted activity indicate that the model has high predictive ability.
The Genetic Algorithm Multi-Linear Regression (GA–MLR) analysis led to the selection of three descriptors, which were used to build a linear model for calculating the predicted activity against prostate cancer. Four QSAR models were built using the Genetic Function Algorithm (GFA); on the basis of statistical significance, Model 1 was selected and reported, and its parameters were calculated.
| S/N | R | IC_{50} (μM) | Experimental activity (pIC_{50}) | Predicted activity | Residual |
|---|---|---|---|---|---|
| 1^{a} | H | 143 | 3.844664 | 3.732084 | 0.11258 |
| 2 | COCH_{2}Cl | 8 | 5.09691 | 5.1541 | -0.05719 |
| 3 | C(O)OC(CH_{3})_{3} | 26 | 4.585027 | 4.681317 | -0.09629 |
| 4 | COCH_{2}OCOCH_{3} | 176 | 3.754487 | 3.763877 | -0.00939 |
| 5 | COCH_{2}OCO(CH_{2})_{2}CH_{3} | 715 | 3.145694 | 3.137145 | 0.00855 |
| 6^{a} | COCH_{2}OCO(CH_{2})_{4}CH_{3} | 14 | 4.853872 | 4.917432 | -0.06356 |
| 7 | COCH_{2}OCO(CH_{2})_{6}CH_{3} | 23 | 4.638272 | 4.592276 | 0.046 |
| 8^{a} | COCH_{3} | 680 | 3.167491 | 3.12956 | 0.03793 |
| 9^{a} | COCH_{2}CH_{3} | 352 | 3.453457 | 3.350235 | 0.10322 |
| 10 | CO(CH_{2})_{2}CH_{3} | 85 | 4.070581 | 3.957798 | 0.11278 |
| 11 | CO(CH_{2})_{3}CH_{3} | 73 | 4.136677 | 4.215267 | -0.07859 |
| 12 | COC(CH_{3})_{3} | 246 | 3.609065 | 3.428617 | 0.18045 |
| 13 | CO(CH_{2})_{5}CH_{3} | 779 | 3.108463 | 3.063929 | 0.04453 |
| 14 | CO(CH_{2})_{7}CH_{3} | 7 | 5.154902 | 5.304892 | -0.14999 |
| 15 | CO(CH_{2})_{8}CH_{3} | 4 | 5.39794 | 5.357801 | 0.04014 |
| 16 | CO(CH_{2})_{10}CH_{3} | 4 | 5.39794 | 5.360478 | 0.03746 |
| 17 | CO(CH_{2})_{12}CH_{3} | 94 | 4.026872 | 4.062242 | -0.03537 |
| 18^{a} | CO(CH_{2})_{14}CH_{3} | 114 | 3.943095 | 3.239815 | 0.70328 |
| 19 | COCH_{2}C_{6}H_{5} | 243 | 3.614394 | 3.58375 | 0.03064 |
| 20^{a} | COCH_{2}OH | 433 | 3.363512 | 3.122104 | 0.24141 |

Superscript a denotes the test set. Residual = experimental activity - predicted activity.
Table 1: Molecular structure, experimental, predicted and residual values of ciprofloxacin derivatives as potent anti-prostate cancer agents
| Symbol | Name | Value |
|---|---|---|
| R^{2} | Coefficient of determination | ≥ 0.6 |
| P_{(95%)} | Confidence interval at 95% confidence level | < 0.05 |
| Q^{2}_{CV} | Cross-validation coefficient | > 0.5 |
| R^{2} - Q^{2}_{CV} | Difference between R^{2} and Q^{2}_{CV} | ≤ 0.3 |
| N_{ext. test set} | Minimum number of external test set | ≥ 5 |
| cR^{2}_{p} | Coefficient of determination for Y-randomization | > 0.5 |

Table 2: Minimum recommended values of validation parameters for a generally acceptable QSAR model
Model 1
pIC50 = 0.295441891 * AATSC6m + 0.193350923 * MDEC22 - 1.938081244 * L3v + 7.423458362
Model 2
pIC50 = 0.279119413 * AATSC6m + 0.456910158 * nssCH2 - 1.455230092 * L3v + 7.681216809
Model 3
pIC50 = 0.002899338 * ATSC6v + 0.472513415 * nssCH2 - 1.368491011 * nssCH2 + 8.284970195
Model 4
pIC50 = 0.277548931 * AATSC6m + 0.484912043 * nssCH2 - 1.936444918 * L3v + 6.909123060
The external and internal validation parameters used to confirm that the built QSAR models are stable and robust are reported in Table 3. These parameters were in agreement with the threshold values reported in Table 2, which confirms the robustness and stability of the model.
| S/N | Validation Parameters | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|---|
| 1 | Friedman LOF | 0.287447 | 0.29417 | 0.319241 | 0.36543 |
| 2 | R-squared | 0.990531 | 0.948212 | 0.875503 | 0.82954 |
| 3 | Adjusted R-squared | 0.95962 | 0.958676 | 0.955154 | 0.91245 |
| 4 | Cross-validated R-squared | 0.942963 | 0.935828 | 0.934816 | 0.87353 |
| 5-6 | Significant Regression | Yes | Yes | Yes | Yes |
| 7 | Significance-of-regression F-value | 103.980981 | 101.528362 | 93.293244 | 91.3344 |
| 8 | Critical SOR F-value (95%) | 3.871034 | 3.871034 | 3.871034 | 3.871034 |
| 9 | Replicate points | 0 | 0 | 0 | 0 |
| 10 | Computed experimental error | 0 | 0 | 0 | 0 |
| 11 | Lack-of-fit points | 10 | 10 | 10 | 0 |
| 12 | Min expt. error for non-significant LOF (95%) | 0.186643 | 0.208814 | 0.266695 | 0.31900 |

Table 3: Validation parameters from Materials Studio
The names and symbols of the descriptors used in the QSAR optimization model are reported in Table 4. The presence of both 2D and 3D descriptors in the model suggests that these types of descriptors characterize the anti-prostate cancer activities of the compounds well. Pearson's correlation matrix and statistics for the three descriptors employed in the QSAR model are reported in Table 5, which shows that the correlation coefficient between each pair of descriptors is low; it can therefore be inferred that there is no significant inter-correlation among the descriptors used in building the model [5]. The estimated Variance Inflation Factor (VIF) values for all the descriptors were less than 4, which implies that the model generated was statistically significant and the descriptors were orthogonal. The p-value is a probability that measures the evidence against the null hypothesis; lower probabilities provide stronger evidence against it. The null hypothesis states that there is no association between the descriptors and the activities of the molecules. The p-values of all the descriptors in the model, shown in Table 5, are less than 0.05 at the 95% confidence level. The null hypothesis is therefore rejected in favour of the alternative hypothesis that there is a relationship between the descriptors used in the model and the activities of the molecules [5].
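For readers who want to reproduce a VIF check, a minimal sketch follows. It relies on the standard identity that the VIF of descriptor j, 1 / (1 - R_{j}^{2}), where R_{j}^{2} comes from regressing descriptor j on the other descriptors, equals the j-th diagonal element of the inverse of the descriptor correlation matrix. The correlation values below are hypothetical and are not taken from Table 5; the function name is also an assumption.

```python
import numpy as np

def vif_from_correlation(corr):
    """VIF of each descriptor = diagonal of the inverse correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    return np.diag(np.linalg.inv(corr))

# Hypothetical correlation matrix for three descriptors
corr = np.array([[1.0, 0.2, 0.1],
                 [0.2, 1.0, 0.4],
                 [0.1, 0.4, 1.0]])
print(vif_from_correlation(corr))
```

A VIF of 1 means a descriptor is uncorrelated with the others; values grow as multicollinearity increases.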
Y-Randomization Parameter Test
The results of the Y-randomization parameter test are reported in Table 6. The low R^{2} and Q^{2} values for several trials confirm that the developed QSAR model is robust, while the cR^{2}_{p} value greater than 0.5 affirms that the created model is powerful and not inferred by chance.
Plots of predicted activity against experimental activity for the training and test sets are shown in Figure 2 and Figure 3, respectively. The R^{2} value of 0.9905 for the training set and the R^{2} value of 0.8486 for the test set recorded in this study were in agreement with the GFA-derived R^{2} value reported in Table 3. This confirms the reliability of the model. The plot of standardized residuals versus experimental activity shown in Figure 4 indicates that there was no systematic error in model development, as the residuals were spread on both sides of zero [15].
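The shuffling procedure behind a Y-randomization test can be sketched as follows. The example uses synthetic data with 14 samples and 3 descriptors and an ordinary least-squares fit; it illustrates the idea only and does not reproduce the values reported in this study. The function names and the random seed are assumptions.

```python
import numpy as np

def r2_of_fit(X, y):
    """R^2 of an ordinary least-squares model with intercept fitted to (X, y)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def y_randomization(X, y, n_trials=10, seed=0):
    """Refit the model on shuffled activities, leaving the descriptors untouched."""
    rng = np.random.default_rng(seed)
    return [r2_of_fit(X, rng.permutation(y)) for _ in range(n_trials)]

# Synthetic training set: the activity depends strongly on the descriptors
rng = np.random.default_rng(1)
X = rng.normal(size=(14, 3))
y = 7.4 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=14)

r2_original = r2_of_fit(X, y)
r2_random = y_randomization(X, y)
print(r2_original, float(np.mean(r2_random)))
```

A robust model shows a clear gap between the R^{2} of the original fit and the average R^{2} of the shuffled fits.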
| S/NO | Descriptor symbol | Name of descriptor | Class |
|---|---|---|---|
| 1 | AATSC6m | Average centered Broto-Moreau autocorrelation - lag 6 / weighted by mass | 2D |
| 2 | MDEC22 | Molecular distance edge between all secondary carbons | 2D |
| 3 | L3v | 3rd component size directional WHIM index / weighted by relative van der Waals volumes | 3D |

Table 4: List of descriptors used in the QSAR optimization model
| Descriptors | AATSC6m | MDEC22 | L3v | VIF | P value |
|---|---|---|---|---|---|
| AATSC6m | 1 | | | 2.56436 | 3.34E-05 |
| MDEC22 | 0.15654 | 1 | | 1.84743 | 4.23E-04 |
| L3v | 0.19444 | 0.45585 | 1 | 2.34556 | 5.34E-07 |

Table 5: Pearson's correlation matrix (inter-correlation) and statistics for the descriptors used in the QSAR optimization model.
| Model | R | R^2 | Q^2 |
|---|---|---|---|
| Original | 0.965475 | 0.932142 | 0.831909 |
| Random 1 | 0.674003 | 0.45428 | -0.31323 |
| Random 2 | 0.61843 | 0.382455 | -0.50841 |
| Random 3 | 0.311542 | 0.097058 | -1.37797 |
| Random 4 | 0.632995 | 0.400683 | -0.27203 |
| Random 5 | 0.665103 | 0.442362 | -0.76461 |
| Random 6 | 0.385191 | 0.148372 | -1.09687 |
| Random 7 | 0.583435 | 0.340396 | -0.68669 |
| Random 8 | 0.446102 | 0.199007 | -1.00243 |
| Random 9 | 0.413199 | 0.170734 | -0.91905 |
| Random 10 | 0.788129 | 0.621147 | 0.008176 |

Random Models Parameters

| Parameter | Value |
|---|---|
| Average r | 0.551813 |
| Average r^2 | 0.325649 |
| Average Q^2 | -0.69331 |
| cRp^2 | 0.764888 |

Table 6: Y-Randomization parameters test
Figure 2: Plot of predicted activity against experimental activity of training set
Figure 3: Plot of predicted activity against experimental activity of test set
Figure 4: Plot of standardized residual activity versus experimental activity
This work addresses the Quantitative Structure-Activity Relationship (QSAR) between ciprofloxacin derivatives and their activity (pIC_{50}) against prostate cancer. Results from the optimal model showed that the pIC_{50} of the studied molecules against prostate cancer was affected by the AATSC6m, MDEC22 and L3v descriptors. The robustness and applicability of the QSAR equation have been established by internal and external validation techniques. The stability and robustness of the model, as shown by the validation tests, indicate that it can be used to design other ciprofloxacin derivatives with improved anti-prostate cancer activity.
[1] Delongchamps N.B., Singh A., Haas G.P. (2007). Epidemiology of prostate cancer in Africa: another step in the understanding of the disease. Current Problems in Cancer 31: 226–236.
[2] Odedina F.T., Ogunbiyi J.O., Ukoli F.A. (2006). Roots of prostate cancer in African-American men. Journal of the National Medical Association 98: 539.
[3] Rathod A. (2011). Antifungal and Antibacterial activities of Imidazolylpyrimidines derivatives and their QSAR Studies under Conventional and Microwave-assisted. Int J PharmTech Res 3: 1942–1951.
[4] Azéma J., Guidetti B., Dewelle J., Le Calve B., Mijatovic T. et al. (2009). 7-((4-Substituted)piperazin-1-yl) derivatives of ciprofloxacin: synthesis and in vitro biological evaluation as potential antitumor agents. Bioorganic & Medicinal Chemistry 17: 5396–5407.
[5] Shola E.A., Uba S., Uzairu A. (2018). A Novel QSAR Model for the Evaluation and Prediction of (E)-N'-Benzylideneisonicotinohydrazide Derivatives as the Potent Antimycobacterium Tuberculosis Antibodies Using Genetic Function Approach. Physical Chemistry Research 6: 479–492.
[6] Li Z., Wan H., Shi Y., Ouyang P. (2004). Personal experience with four kinds of chemical structure drawing software: review on ChemDraw, ChemWindow, ISIS/Draw, and ChemSketch. Journal of Chemical Information and Computer Sciences 44: 1886–1890.
[7] Singh P. (2013). Quantitative Structure-Activity Relationship Study of Substituted-[1,2,4] Oxadiazoles as S1P1 Agonists. Journal of Current Chemical and Pharmaceutical Sciences.
[8] Afantitis A., Melagraki G., Sarimveis H., Koutentis P.A., Markopoulos J. (2006). A novel QSAR model for predicting induction of apoptosis by 4-aryl-4H-chromenes. Bioorganic & Medicinal Chemistry 14: 6686–6694.
[9] Chakraborti A.K., Gopalakrishnan B., Sobhia M.E., Malde A. (2003). 3D-QSAR studies of indole derivatives as phosphodiesterase IV inhibitors. European Journal of Medicinal Chemistry 38: 975–982.
[10] Khaled K.F. (2011). Modeling corrosion inhibition of iron in acid medium by genetic function approximation method: A QSAR model. Corrosion Science 53: 3457–3465.
[11] Melagraki G., Afantitis A., Makridima K., Sarimveis H., Igglessi-Markopoulou O. (2006). Prediction of toxicity using a novel RBF neural network training methodology. Journal of Molecular Modeling 12: 297–305.
[12] Wu W., Walczak B., Massart D., Heuerding S., Erni F. et al. (1996). Artificial neural networks in classification of NIR spectral data: design of the training set. Chemometrics and Intelligent Laboratory Systems 33: 35–46.
[13] Tropsha A., Gramatica P., Gombar V.K. (2003). The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. Molecular Informatics 22: 69–77.
[14] Veerasamy R., Raja H., Jain A., Sivadasan S., Varghese C.P. et al. (2011). Validation of QSAR models - strategies and importance. International Journal of Drug Design & Discovery 3: 511–519.
[15] Jalali-Heravi M., Kyani A. (2004). Use of computer-assisted methods for the modeling of the retention time of a variety of volatile organic compounds: a PCA-MLR-ANN approach. Journal of Chemical Information and Computer Sciences 44: 1328–1335.