## Bioactivity - Structure. Interrelation of Electronic and Information Factors of Biologically Activity of Chemical Compounds

*Trends Journal of Sciences Research*, Volume 1, Issue 1, 2018, Pages 38–48.

https://doi.org/10.31586/Biochemistry.0101.06

Received September 29, 2014; Revised November 25, 2014; Accepted December 23, 2014;

Published December 30, 2014

### Abstract

The biological activity of chemical compounds is analyzed using electronic and information factors. We found a linear interrelation between the electronic and information factors of molecules, even though these molecular factors are calculated from different principles: the electronic factor is determined by a quantum-mechanical method from the molecular pseudopotential, whereas the information factor is determined from the information function. These factors are shown to separate bioactive chemical compounds from inactive ones with statistical significance. To determine these factors, it is sufficient to know only the chemical formula of the molecule. We analyzed chemical compounds for toxicity, antiradiation activity, carcinogenicity and antifungal activity. To identify biologically active chemical compounds, we used the statistical method of conjugation of qualitative attributes.

### 1. Introduction

Knowledge of the quantitative stochastic interrelation between the chemical structure of a molecule and its physiological activity has important theoretical and practical significance. Such knowledge is essential for elucidating the mechanism of the biochemical action of molecules, for improving existing products, and for the search for new drugs. However, the task is complicated by the fact that reliable, complete and homogeneous experimental data are not available for different classes of chemical compounds. Here we take into account the comments of Alexander and Bacq ^{ 1} on the importance of the primary chemical structure of a drug for the manifestation of its biological effect. The lack of reliable and complete experimental physicochemical and biochemical information about chemical compounds necessitates a methodology for assessing biological activity based only on knowledge of the chemical formula of the molecule. We offer here a mathematical approach to analyzing the relationship between the structure of a chemical compound and the physiological response. This approach does not require cumbersome calculations.

In this paper, we propose to evaluate the biological activity of chemical compounds by using electronic and information factors. To determine these factors, it is sufficient to know only the chemical formula of the drug. We suppose that the molecule has some effective electrostatic potential ^{ 2, 3}. There is good reason to believe that this potential may influence the processes regulating the vital functions of the biological object and thereby determine the biological activity of chemical compounds. Within the pseudopotential approach, it has been shown that the potential of interaction of the molecule with an external electron may be represented as follows:

Here the functions $f(r)$ and $F(r)$ are corrections to the Coulomb potential that depend on the distance *r* between the molecule and the electron. It is well known that the chemical properties of molecules are determined by the outer-shell electrons. This approximation is called the “frozen-core approximation.” In this approximation, the electrostatic potential is determined by the first term of equation (1). This pseudopotential was applied successfully ^{ 4} to calculate the molecular potentials of purine and pyrimidine molecules.

### 2. Methods and Discussion

Table 1 shows two groups of sulfur-containing chemical compounds, bioactive and inactive. The first group includes chemicals having radioprotective activity (efficiency higher than 50%), and the second group comprises chemicals that have no radioprotective activity even when used at very high doses.

Analysis showed that the parameter (descriptor) *Z* separates, with statistical reliability, the preparations that have a radioprotective effect from the chemical compounds that do not. Indeed, for effective radioprotectors (dose < 1 mmol/kg), the parameter satisfies *Z* ≤ *Z*^{(av)} = 3.0. Here *Z*^{(av)} is the average value over the random sample of Table 1. We accept *Z*^{(av)} = *Z*^{*} as the threshold value. At the same time, for chemical compounds that do not have a protective effect, *Z* > *Z*^{(av)}. To confirm the validity of the statistical separation of chemical compounds into groups, we use the statistical method of comparison of dichotomous attributes. Using the data in Table 2, we define the Pearson coefficient of association $\Phi$ ^{ 7}:

Here *q*_{11} = 29 is the number of effective preparations for which *Z* < *Z*^{(av)}; *q*_{12} = 3 is the number of bioactive chemical compounds for which *Z* ≥ *Z*^{(av)}; *q*_{22} = 23 is the number of ineffective chemical compounds for which *Z* ≥ *Z*^{(av)}; and *q*_{21} = 5 is the number of ineffective chemical compounds for which *Z* < *Z*^{(av)} (Table 2). Obviously, the closer the table is to diagonal form, the better the classification model.

The asymptotic standard error of the coefficient of association is equal to

Checking the significance of the coefficient Φ by the $\chi^{2}$ criterion gives the following inequality ^{ 7}:

Here $n=q_{11}+q_{12}+q_{22}+q_{21}$. This inequality confirms the non-randomness of the interrelation between *Z* and the bioactivity of sulfur-containing chemical compounds.
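The fourfold-table check described above can be sketched numerically. The sketch below assumes the standard Pearson coefficient of association for a 2×2 table and the relation $\chi^{2}=n\Phi^{2}$ that is used later in the text; the function names are illustrative and not part of the original work, and the frequencies are those quoted for Table 2.

```python
import math

def phi_coefficient(q11, q12, q21, q22):
    """Standard Pearson coefficient of association for a fourfold (2x2) table."""
    numerator = q11 * q22 - q12 * q21
    denominator = math.sqrt(
        (q11 + q12) * (q21 + q22) * (q11 + q21) * (q12 + q22)
    )
    return numerator / denominator

def chi_square_from_phi(phi, n):
    """Chi-square statistic with one degree of freedom, chi2 = n * phi**2."""
    return n * phi ** 2

# Frequencies quoted in the text for the descriptor Z (Table 2).
q11, q12, q21, q22 = 29, 3, 5, 23
n = q11 + q12 + q21 + q22

phi = phi_coefficient(q11, q12, q21, q22)
chi2 = chi_square_from_phi(phi, n)

# Critical value quoted in the text for one degree of freedom at the 0.05 level.
CHI2_CRITICAL = 3.84

print(f"n = {n}, phi = {phi:.3f}, chi2 = {chi2:.1f}")
print("association is significant:", chi2 > CHI2_CRITICAL)
```

With these frequencies the sketch gives Φ of about 0.73 and $\chi^{2}$ of about 32, well above the critical value, which is consistent with the conclusion stated above.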

We introduce one more descriptor, for which it is again sufficient to know only the structural formula of a chemical compound. We define this descriptor using the methods of information theory. It is known that a quantitative measure of the information content of a multicomponent system consisting of objects belonging to the same set is given by the Shannon information function ^{ 8, 9}. For a discrete collection of objects, it is defined in *bits* as follows: $H=-\sum_{i=1}^{N}p_{i}\log_{2}p_{i}$, under the additional conditions 0 ≤ *p*_{i} ≤ 1 and $\sum_{i=1}^{N}p_{i}=1$. Here $N$ is the number of objects (atoms) in the set, which determines the space of possible values of $p_{i}=n_{i}/N$, and *n*_{i} is the number of atoms of the *i*th kind. The function *H* is an integral index of the state of the multicomponent system. The values of *p*_{i} determine the share of the *i*th element in the entire collection, i.e., *p*_{i} assigns the number of realizations or possible outcomes. To calculate the specific values of *p*_{i}, we use A.N. Kolmogorov’s combinatorial approach ^{ 9} for a collection of *n*_{i} elements entering this set with mass *p*_{i}. The function *H* is used for a quantitative determination of the measure of organization or diversity of multicomponent systems. The values of *p*_{i} are calculated from the data on the content of atoms in the molecular structure. In this case, the quantity of information in a molecule is a function only of the numbers of the various atoms of a finite set. The smaller the value of the information function, the less diverse the multicomponent system.
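As an illustration of the definition above, the following minimal sketch evaluates the information function from the atom counts of a molecular formula. It assumes that every atom of the formula, including hydrogen, is counted; the function name and the dictionary representation of a formula are choices made for this sketch only.

```python
import math

def information_function(atom_counts):
    """Shannon information function H (in bits) of a molecular formula.

    atom_counts maps an element symbol to the number of atoms n_i of that kind;
    p_i = n_i / N, where N is the total number of atoms in the molecule.
    """
    total = sum(atom_counts.values())
    h = 0.0
    for count in atom_counts.values():
        if count > 0:
            p = count / total
            h -= p * math.log2(p)
    return h

# NH2CH2CH2SH has the gross formula C2H7NS (hydrogen atoms are counted).
cysteamine = {"C": 2, "H": 7, "N": 1, "S": 1}
h_cysteamine = information_function(cysteamine)
print(f"H = {h_cysteamine:.2f} bits")
```

For NH_{2}CH_{2}CH_{2}SH this gives about 1.49 *bits*, the value quoted in the text for this compound.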

From Table 1 it follows that for efficacious radioprotectors the average value of the information function is $H_{1}^{(av)}=1.79$ *bits* (*S*_{1} = 0.16, *N*_{1} = 32). At the same time, for the sulfur-containing chemical compounds without a radioprotective effect the average value of the information function is $H_{2}^{(av)}=1.97$ *bits* (*S*_{2} = 0.16, *N*_{2} = 32). We now check the statistical significance of the difference between these average values. First we test the difference between the variances $S_{1}^{2}$ and $S_{2}^{2}$: $F=S_{1}^{2}/S_{2}^{2}=1.14<F_{31,27;0.05}^{(cr)}=1.8$. Since the variances do not differ significantly, the average values of the information functions can be compared using the following equation ^{ 10}:

This inequality indicates that the difference between the average values $H_{1}^{(av)}$ and $H_{2}^{(av)}$ is statistically significant. Thus, active and inactive chemicals are grouped around these respective average values.

The average value $H^{(av)}$ = *H*^{*} = 1.87 *bits* is the threshold value of the information function. This value of the entropy $H^{(av)}$ was obtained for the random sample of chemical compounds of Table 1. The inequality *H* $\le H^{(av)}$ holds for effective radioprotectors, whereas the inequality *H* $> H^{(av)}$ holds for the ineffective chemical compounds. Again using the method of comparison of qualitative attributes, we obtained the following statistics: $\Phi =0.77$; $\chi^{2}=35.6\gg\chi_{1;0.05}^{2(cr)}=3.84$.

We now check the validity of these classification rules for the following sulfur-containing chemical compounds: NH_{2}CH_{2}CH_{2}CH_{2}SH, NH_{2}CH_{2}CH_{2}SH and NH_{2}CH_{2}CH_{2}SC(=NH)NH_{2}. These chemical compounds have significant antiradiation action and were not included in the initial random sample. The information function (with the electronic factor in parentheses) has the following values: *H* = 1.43 *bits* (*Z* = 2.29), 1.49 *bits* (*Z* = 2.36) and 1.62 *bits* (*Z* = 2.63), respectively. All of these values lie below the thresholds *H*^{*} = 1.87 *bits* and *Z*^{*} = 3.0, so the classification rules are satisfied for these sulfur-containing chemical compounds.
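The check for these three compounds can be reproduced with a short sketch. How *Z* is computed is not spelled out in this section, so the sketch makes an explicit assumption: *Z* is taken as the mean number of valence electrons per atom of the formula. This assumption reproduces the quoted values of *Z* to within rounding, but it should be read as an inference from the numbers, not as the authors' stated definition. The helper names and the valence table are likewise illustrative.

```python
import math

# Assumed valence-electron counts per atom (an assumption of this sketch).
VALENCE_ELECTRONS = {"H": 1, "C": 4, "N": 5, "O": 6, "S": 6}

H_STAR = 1.87  # threshold of the information function quoted in the text, bits
Z_STAR = 3.0   # threshold of the electronic factor quoted in the text

def information_function(atom_counts):
    """Shannon information function H in bits, with p_i = n_i / N."""
    total = sum(atom_counts.values())
    return -sum(
        (n / total) * math.log2(n / total) for n in atom_counts.values() if n > 0
    )

def electronic_factor(atom_counts):
    """Assumed form of Z: mean number of valence electrons per atom."""
    total = sum(atom_counts.values())
    electrons = sum(VALENCE_ELECTRONS[el] * n for el, n in atom_counts.items())
    return electrons / total

# Gross formulas of the three compounds named in the text.
compounds = {
    "NH2CH2CH2CH2SH": {"C": 3, "H": 9, "N": 1, "S": 1},
    "NH2CH2CH2SH": {"C": 2, "H": 7, "N": 1, "S": 1},
    "NH2CH2CH2SC(=NH)NH2": {"C": 3, "H": 9, "N": 3, "S": 1},
}

results = {}
for name, counts in compounds.items():
    h = information_function(counts)
    z = electronic_factor(counts)
    predicted_active = h <= H_STAR and z <= Z_STAR
    results[name] = (h, z, predicted_active)
    print(f"{name}: H = {h:.3f} bits, Z = {z:.3f}, predicted active: {predicted_active}")
```

Under these assumptions all three compounds fall on the active side of both thresholds.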

It is easy to show that the information function and the electronic parameter are interrelated (Figure 1). We obtained the following linear relation and statistics:

$$H = A + B\,Z \qquad (6)$$

Here *A* = 0.258, *B* = 0.527; $t=9.5>t_{58;0.05}^{(cr)}=2.00$, $R=0.78>R_{59;0.05}^{(cr)}=0.22$, $F=99.8\gg F_{1;58;0.05}^{(cr)}=4.0$.
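The quoted statistics can be cross-checked with a few lines of code. The sketch below assumes that the correlating equation has the linear form *H* = *A* + *B*·*Z* and uses the standard relation between a correlation coefficient and its *t* statistic; both the linear form and the function names are assumptions of this sketch. The (*Z*, *H*) pairs are the values quoted in the text for the three compounds outside the initial sample.

```python
import math

A, B = 0.258, 0.527  # regression parameters quoted in the text
R, DF = 0.78, 58     # correlation coefficient and degrees of freedom quoted in the text

def predict_h(z):
    """Assumed linear relation H = A + B * Z (the form is an assumption)."""
    return A + B * z

def t_from_r(r, df):
    """Standard t statistic of a correlation coefficient, with df = n - 2."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)

t_value = t_from_r(R, DF)
print(f"t = {t_value:.2f}")

# (Z, H) pairs quoted in the text for compounds outside the initial sample.
held_out = [(2.29, 1.43), (2.36, 1.49), (2.63, 1.62)]
residuals = [h - predict_h(z) for z, h in held_out]
for (z, h), res in zip(held_out, residuals):
    print(f"Z = {z:.2f}: quoted H = {h:.2f}, predicted H = {predict_h(z):.2f}, residual = {res:+.3f}")
```

The computed *t* agrees with the quoted value of 9.5 to within rounding, and the assumed line predicts the three quoted values of *H* to within 0.05 *bits*.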

Table 3. Information function *H*, electronic factor *Z* and radioprotective efficacy of N-substituted S-2-aminoethylthiosulfates (RNHCH_{2}CH_{2}SSO_{3}H)

Table 1 presents radioprotectors that belong to different chemical classes of sulfur-containing compounds. We now analyze the chemical compounds of the homologous series of N-substituted S-2-aminoethylthiosulfates (Table 3). These chemicals are also used as radioprotectors. We estimated the information function *H* and the electronic factor *Z* for the molecular structures of the compounds of this series. Applying the method of conjugation of qualitative attributes, we can establish the relationship between the magnitude of the therapeutic index *T* and the values of the electronic factor and the information function (Figure 2).

We can determine the contingency coefficient (Figure 2A and Figure 2B) using the threshold values *Z*^{*} = 2.75 and *H*^{*} = 1.7 *bits*. As a result, we obtained Φ = 0.43 ($\chi^{2}=10.3>\chi_{1;0.05}^{2(cr)}=3.84$) for Figure 2A and Φ = 0.52 ($\chi^{2}=15.3>\chi_{1;0.05}^{2(cr)}=3.84$) for Figure 2B. These coefficients indicate a statistically significant association of the bioactivity of the chemical compounds with the factors *Z* and *H*. Figure 3 shows the interrelation of the factors *Z* and *H*. We found that there exists a statistically significant linear relationship between *H* and *Z* both for different classes of chemical compounds (Table 1) and for the homologous series of compounds (Table 3):

Importantly, the parameters *A* and *B* in the correlating equations (6) and (7) are close to each other. Thus, the interrelation between the factors *Z* and *H* is very similar for the homologous series of radioprotectors and for the nonhomologous series of chemical compounds. That is, for highly active sulfur-containing chemical compounds the regions of this interrelation coincide. However, we should note that this technique is not applicable to tryptamine derivatives ^{ 12}. Apparently, this is because the mechanism of the antiradiation action of these compounds is fundamentally different from the mechanism of action of sulfur-containing compounds.

We now examine the linkage between the carcinogenic action of chemical compounds and the factors *Z* and *H*. Statistical analysis of the data in Table 4 shows that the factors *H* and *Z* can be used to separate carcinogenic chemicals from non-carcinogenic ones. We take the average values of these factors (*H*^{(av)} = *H*^{*} = 1.57 *bits* and *Z*^{(av)} = *Z*^{*} = 2.95) as threshold values. Then the method of conjugation of the qualitative factors *H* and *Z* with the carcinogenic action of chemical compounds gives the following coefficients of association: Φ = 0.61 ($\chi^{2}=22.3>\chi^{2(cr)}$) and Φ = 0.86 ($\chi^{2}=44.4>\chi^{2(cr)}$), respectively.

For the chemical compounds of Table 4, the interrelation between the factors *Z* and *H* is linear (Figure 4):

That is, the interrelation is linear.

Thus, we found that the regions of the factor variables *Z* and *H* separate the chemical compounds, with statistical reliability, into two subsets with different biological activity. It is important to note that we used different principles, namely quantum and information-theoretic, in the search for explanatory factors, and nevertheless obtained close results.

We now analyze the interrelation of the antifungal activity of benzo-2,1,3-thia- and selenadiazole derivatives with the electronic and information factors (Figure 5). For this purpose we use the experimental data of ^{ 14}. The antifungal activity of benzo-2,1,3-thiadiazole derivatives ^{ 15} was detected on various test objects. However, there is no common method for testing antifungal drugs, which makes it difficult to identify a quantitative relationship between the chemical structure of compounds and their biological action. Therefore, we apply the methodology used above.

The antifungal activity and toxicity of the test series of compounds are listed in Table 5. We use the following symbols: *Ft* is the inhibition of late blight, *Ve* is the inhibition of the fungal species Venturia inaequalis, *As* is the inhibition of the fungal species Aspergillus niger, and *Fu* is the inhibition of the fungal species Fusarium moniliforme.

We analyze the conjugation of qualitative attributes: the toxicity of the benzo-2,1,3-thia- and selenadiazole derivatives with the factors *Z* and *H*. These attributes exhibit alternative (dichotomous) variation, which allows us to represent the results of observations and the values of the factors *H* and *Z* as a fourfold table. As boundary values we take *Z*^{*} = 3.73, *H*^{*} = 2.1 *bits* and $\lg\overline{LD}_{50}^{*}$ = 2.34, which are the average values obtained for the full sample of compounds. Figure 6A and Figure 6B show the diagrams of the distribution of the chemical compounds of Table 5.
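Building such a fourfold table from a descriptor and a measured response can be sketched as follows. The boundary values are those quoted above, but the data points are hypothetical and serve only to illustrate the counting; the function name, the handling of values exactly on a boundary, and the assignment of the off-diagonal cells are assumptions of this sketch.

```python
def fourfold_table(z_values, lg_ld50_values, z_star, lg_ld50_star):
    """Count compounds in the four quadrants of the (Z, lg LD50) plane.

    q11: lg LD50 > threshold and Z < Z*   (low toxicity, low Z)
    q22: lg LD50 <= threshold and Z >= Z* (high toxicity, high Z)
    q12, q21: the two off-diagonal cells (assignment assumed here).
    """
    q11 = q12 = q21 = q22 = 0
    for z, lg_ld50 in zip(z_values, lg_ld50_values):
        low_toxicity = lg_ld50 > lg_ld50_star
        low_z = z < z_star
        if low_toxicity and low_z:
            q11 += 1
        elif low_toxicity:
            q12 += 1
        elif low_z:
            q21 += 1
        else:
            q22 += 1
    return q11, q12, q21, q22

Z_STAR = 3.73        # boundary value of Z quoted in the text
LG_LD50_STAR = 2.34  # boundary value of lg LD50 quoted in the text

# Hypothetical data points, not taken from Table 5.
toy_z = [3.2, 3.5, 3.4, 4.1, 4.4, 4.0, 3.6, 3.9]
toy_lg_ld50 = [2.9, 2.6, 2.5, 1.8, 1.6, 2.0, 2.1, 2.7]

table = fourfold_table(toy_z, toy_lg_ld50, Z_STAR, LG_LD50_STAR)
print("q11, q12, q21, q22 =", table)
```

The closer the counts concentrate in the diagonal cells *q*_{11} and *q*_{22}, the stronger the association between the descriptor and the response.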

Using the data in Table 5, we can determine the number of chemical compounds *q*_{22} (lower right quadrant) for which the following inequalities hold: $\lg LD_{50}<\lg\overline{LD}_{50}^{*}$ and *Z* > *Z*^{*}. The total number of chemical compounds that fall into this region is *q*_{22} = 8 (Figure 6A). The average toxicity and the average value of the electronic parameter are $\lg\overline{LD}_{50}^{(22)}$ = 1.73 and $\overline{Z}^{(22)}$ = 4.26, respectively. The variances for this group of chemical compounds are $S_{22}^{2(LD)}$ = 0.251 and $S_{22}^{2(Z)}$ = 0.345. For the qualitatively alternative group, where $\lg LD_{50}>\lg\overline{LD}_{50}^{*}$ and *Z* < *Z*^{*} (upper left quadrant of Figure 6A), we have the following statistics: $\lg\overline{LD}_{50}^{(11)}$ = 2.78 and $\overline{Z}^{(11)}$ = 3.39, $S_{11}^{2(LD)}$ = 0.329 and $S_{11}^{2(Z)}$ = 0.254. The total number of chemical compounds in this quadrant is *q*_{11} = 15. We verify the statistical significance of the difference between the average values of the parameters that define the alternative grouping. First we test the statistical significance of the difference between the variances. For this purpose we use Fisher's *F*-distribution:

In both cases, the differences are statistically insignificant. Therefore, to test the difference between the group averages we can use the following relations:

The weighted average dispersion *S*^{2} is determined from the following relationship:

The difference between the average values is statistically significant if it is greater than the parameter *T*. Using equations (10), we find the following differences: $|\overline{Z}_{11}-\overline{Z}_{22}|$ = 0.86 > $T^{(Z)}$ = 0.43 and $|\lg\overline{LD}_{50}^{(11)}-\lg\overline{LD}_{50}^{(22)}|$ = 1.04 > $T^{(LD)}$ = 0.43. Hence, highly toxic chemicals and chemical compounds with low toxicity are grouped in two distinct domains of the plane (*Z*, $\lg LD_{50}$). We next verify the statistical homogeneity of the grouped chemicals. That is, we find out whether there are chemical compounds whose parameters differ significantly from the mean values. We check the maximum (*max*) and minimum (*min*) values of the factors for the chemical compounds. For this purpose we use the τ-distribution ^{ 10}:

Calculations show that the grouped compounds belong to a homogeneous group. For example, for the compounds of *q*_{22} we obtained the following inequality: $|\overline{Z}-Z_{\max}|/\sqrt{S_{22}^{2(Z)}}=0.90<\tau_{7;0.05}^{(cr)}=2.09$. This inequality confirms the statistical homogeneity of the set. Similar inequalities hold for the other groupings of the chemical compounds. Thus, toxic and weakly toxic chemical compounds are reliably separated from each other in the parameter space (*Z*, $\lg LD_{50}$). The coefficient of contingency is Φ = 0.66 ($\chi^{2}=n\Phi^{2}=16.6>\chi_{1;0.05}^{2(cr)}=3.84$). Similarly, we can show that the information function *H* also reliably partitions the chemical compounds into bioactive and inactive drugs. In this case the coefficient of contingency is Φ = 0.58 ($\chi^{2}=n\Phi^{2}=9.8>\chi_{1;0.05}^{2(cr)}=3.84$).

Using this method, we analyze the interrelation of the structural properties of the chemical compounds with their antifungal activity. Figure 7A and Figure 7B represent the fourfold tables for the bioactivity (*Fu*) with the factors *Z* and *H*.

Frequencies (*q*_{ij}) and boundary values of *Z*^{*} and *H*^{*}. In parentheses are the values for the information function.

By analogy, we can create fourfold tables for the bioactivity of the chemical compounds associated with the suppression of late blight and with antifungal activity against Venturia inaequalis (*Ve*) and Aspergillus niger (*As*). The statistical characteristics of the interrelation of bioactivity with the structure of the molecules are shown in Table 6. The active compounds are grouped around the average value $\overline{Z}_{11}$, and the inactive chemicals around the average value $\overline{Z}_{22}$. Thus, this approach allows a mathematical analysis of the source database and opens the possibility of using this technique to search for new drugs in this series of chemical compounds.

The domain of frequencies *q*_{11}: activity < 50%, *H* < *H*^{*}, *Z* < *Z*^{*}; the domain of frequencies *q*_{22}: activity > 50%, *H* > *H*^{*}, *Z* > *Z*^{*}; the domain of frequencies *q*_{21}: activity > 50%, *H* < *H*^{*}, *Z* < *Z*^{*}; the domain of frequencies *q*_{12}: activity < 50%, *H* > *H*^{*}, *Z* > *Z*^{*}. We have revealed the interrelation of each bioactivity with a single parameter *Z* (or the information function *H*). This allows us to suggest an interrelation between the bioactivities of the chemical compounds. The results in Table 6 indicate that the bioactivities of these chemical compounds are interlinked with the factors *Z* and *H*. That is, these factors are statistically alternative.

Using the data in Table 5, it can be shown that the information function *H* is linearly related to the factor *Z* (Figure 8). The correlations are linear and positive for all sulfur-containing radioprotectors, carcinogenic chemical compounds and antifungal drugs. The slope of the correlation line lies in a narrow range from 0.30 to 0.53, indicating that the interrelation is not random but is inherent in the different molecular structures. The magnitude of this slope is a measure of the mutual coupling of the factors *H* and *Z*. Compounds No. 50-52 were not considered in constructing the correlation equation, since the parameters *Z* and *H* for these compounds do not satisfy the condition of homogeneity. Heterogeneity occurs when the number of atoms in the substituent *R*_{i} exceeds the number of atoms of the base molecule.

Inasmuch as $\chi^{2}\gg\chi_{1;0.05}^{2(cr)}=3.84$, we can presume that there exists a statistically significant interrelation of the bioactivities. Threshold processes of biological activity are characterized by the following principle: as long as the descriptor has not reached the threshold value, the bioactivity of the chemical compound is probably negligible. At the same time, if the magnitude of the descriptor lies beyond the threshold value, the biological effect of the drug is close to its maximum value. Thus, without a complete set of experimental physical and chemical data on chemical compounds, we can establish a statistically significant correlation between the molecular structure of chemicals and their bioactivity, as well as predict the bioactivity of new chemical compounds.

### 3. Conclusion

This approach to modeling bioactivity may also be extended to other classes of chemical compounds. The factors *Z* and *H* are not abstractions: they have a physical meaning, which is associated with such properties of molecules as the molecular pseudopotential, the hydrophobicity of molecules ^{ 3} and the total electronic energy of the molecules. If these properties of the chemicals are crucial in the mechanism of bioactivity, we can expect the proposed approach to have some generality in assessing the interrelation of the bioactivity of chemical compounds with their molecular structure. The positive linear interrelation between the factors *Z* and *H* is not random. As is known, the information function determines the diversity of the molecular structure. In turn, the molecular structure is determined by the numbers of different atoms that form the molecule. At the same time, the structure of the molecule is not an arbitrary collection of different atoms; it is determined by the number of valence electrons in the outer shells of the atoms. Apparently, this quantum-chemical property of molecular compounds establishes the linear interrelation between the factors *Z* and *H*.

### References

*Fundamentals of Radiobiology*. Pergamon Press, Oxford, New York, Paris.

*Phys. Lett.* 45A, 59-60.

*J. Chem. Eng. Chem. Res.* 1(1), 54-65.

*Survey of compounds from the antiradiation drug development program*. Washington.

*Methoden der Korrelations- und Regressionsanalyse*. Verlag Die Wirtschaft, Berlin.

*Bell Syst. Techn. Journal* 27, 379-423.

*Information Theory and Theory of Algorithms*. Nauka, Moscow (in Russian).

*Statistical Methods of Analysis and Processing Observations*. Nauka, Moscow (in Russian).

*J. Med. Chem.* 11, 1190-1201.

*Chem. Rapid Commun.* 1(1), 15-20.

*Pharm. Chem. Journal* 25(12), 900-906.