Scoring Function

Scoring functions have three important applications in molecular docking. The first is determining the binding mode and site of a ligand on a protein. Given a protein target, molecular docking generates hundreds of thousands of putative ligand binding orientations/conformations at the active site of the protein. A scoring function ranks these orientations/conformations by evaluating the binding tightness of each putative complex. An ideal scoring function would rank the experimentally determined binding mode highest. Knowing the binding mode of a ligand, scientists can gain a deeper understanding of the molecular mechanism of ligand binding and go on to design an effective drug by modifying the protein or the ligand.

The second application, related to the first, is predicting the absolute binding affinity between protein and ligand. This is particularly important in lead optimization, the process of improving the binding tightness of low-affinity hits or lead compounds that have already been identified. During this process, an accurate scoring function can greatly increase optimization efficiency and save costs by computationally predicting the binding affinities between the protein and modified ligands before the much more expensive steps of ligand synthesis and experimental testing.

The third application, perhaps the most important one in structure-based drug design, is identifying potential drug hits/leads for a given protein target by searching a large ligand database, i.e. virtual database screening. A reliable scoring function should rank known binders highest according to their binding scores during database screening. Given the high cost of experimental screening and the occasional unavailability of high-throughput assays, virtual database screening has played an increasingly important role in drug discovery.

These three applications (ligand binding mode identification, binding affinity prediction, and virtual database screening) are related to each other. Presumably, an accurate scoring function would perform equally well on all of them. Despite over a decade of development, scoring is still an open problem, and many existing scoring functions perform well on only one or two of the three applications. Roughly, scoring functions can be grouped into three basic types according to how they are derived: force field-based, empirical, and knowledge-based.

Force field scoring function

Force field (FF) scoring functions are based on physical atomic interactions, including van der Waals (VDW) interactions, electrostatic interactions, and bond stretching/bending/torsional forces. Force field functional forms and parameters are usually derived from both experimental data and ab initio quantum mechanical calculations according to the principles of physics. Despite their clear physical meaning, a major challenge for force field scoring functions is how to treat the solvent in ligand binding. A typical force field scoring function in molecular docking is that of DOCK, whose energy parameters are taken from the Amber force fields. It is composed of two energy components, a Lennard-Jones VDW term and an electrostatic term:

$$E = \sum_{i} \sum_{j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\varepsilon(r_{ij})\, r_{ij}} \right)$$

where r_ij stands for the distance between protein atom i and ligand atom j, A_ij and B_ij are the VDW parameters, and q_i and q_j are the atomic charges. Here, the effect of solvent is implicitly considered by introducing a simple distance-dependent dielectric constant ε(r_ij) in the Coulombic term.
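As a rough illustration of how such a pairwise sum might be evaluated, here is a minimal Python sketch. It is not DOCK's actual implementation: the atom dictionaries, the `vdw_params` lookup table, the dielectric form ε(r) = 4r, and the unit conversion factor `coulomb_const` are all assumptions made for the example.

```python
import math


def distance_dependent_dielectric(r, scale=4.0):
    """Assumed dielectric form eps(r) = scale * r; the scale is illustrative."""
    return scale * r


def force_field_score(protein_atoms, ligand_atoms, vdw_params, coulomb_const=332.0):
    """Sum Lennard-Jones 12-6 and Coulomb terms over all protein-ligand atom pairs.

    Each atom is a dict with keys "xyz" (3 floats), "type" (str) and "charge" (float).
    vdw_params maps (protein_type, ligand_type) -> (A_ij, B_ij).
    coulomb_const is an assumed unit conversion factor (about 332 gives kcal/mol
    for charges in units of e and distances in angstroms).
    """
    energy = 0.0
    for p in protein_atoms:
        for l in ligand_atoms:
            r = math.dist(p["xyz"], l["xyz"])  # r_ij
            a_ij, b_ij = vdw_params[(p["type"], l["type"])]
            vdw = a_ij / r**12 - b_ij / r**6
            eps = distance_dependent_dielectric(r)
            elec = coulomb_const * p["charge"] * l["charge"] / (eps * r)
            energy += vdw + elec
    return energy
```

The double loop mirrors the double sum over protein atoms i and ligand atoms j. Docking programs typically avoid this cost at run time, for example by precomputing the protein contribution on a grid, which this sketch does not attempt.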

 

Empirical scoring function

The second kind of scoring function is the empirical scoring function, which estimates the binding affinity of a complex on the basis of a set of weighted energy terms:

$$\Delta G = \sum_{i} W_i \cdot \Delta G_i$$

Here, ΔG_i represents different energy terms such as VDW energy, electrostatics, hydrogen bonding, desolvation, entropy, and hydrophobicity. The corresponding coefficients W_i are determined by fitting the binding affinity data of a training set of protein–ligand complexes with known three-dimensional structures. Compared to force field scoring functions, empirical scoring functions are much faster in binding score calculations because of their simple energy terms. By calibrating against a dataset of 45 protein–ligand complexes, Böhm developed an empirical scoring function (SCORE1) consisting of four energy terms: hydrogen bonds, ionic interactions, the lipophilic protein–ligand contact surface, and the number of rotatable bonds in the ligand. This scoring function was further improved by expanding the dataset to 82 protein–ligand complexes with known 3D structures and binding constants and by considering energy parameters for the following terms: the number and geometry of intermolecular hydrogen bonds and ionic interactions, the size of the lipophilic contact surface, the flexibility of the ligand, the electrostatic potential in the binding site, water molecules in the binding site, cavities along the protein–ligand interface, and specific interactions between aromatic rings. An empirical scoring function referred to as ChemScore was introduced by taking into account hydrogen bonds, metal atoms, the lipophilic effects of atoms, and the effective number of rotatable bonds in the ligand. Another empirical scoring function, X-Score, was also introduced; it consists of four energy terms: VDW interactions, hydrogen bonds, hydrophobic effects, and effective rotatable bonds. Empirical scoring functions have been extensively used in many well-known protein–ligand docking programs such as FlexX and Surflex.
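The fitting step described above can be sketched as an ordinary least-squares problem. The Python below is only an illustration under stated assumptions: the function names are hypothetical, each row of `term_matrix` is assumed to hold the precomputed energy terms of one training complex, and plain linear regression via the normal equations stands in for whatever regression procedure a given scoring function actually used.

```python
def empirical_score(terms, weights):
    """Weighted sum of energy terms: dG = sum_i W_i * dG_i."""
    return sum(w * t for w, t in zip(weights, terms))


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(term_matrix, affinities):
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    n_terms = len(term_matrix[0])
    xtx = [[sum(row[i] * row[j] for row in term_matrix) for j in range(n_terms)]
           for i in range(n_terms)]
    xty = [sum(row[i] * y for row, y in zip(term_matrix, affinities))
           for i in range(n_terms)]
    return solve_linear_system(xtx, xty)
```

Once the weights are fitted, scoring a new complex is a single call to `empirical_score`, which is why this class of function is fast. The sketch assumes the energy terms are not linearly dependent; otherwise the normal equations are singular and the solver fails with a division by zero.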

 

Knowledge-based scoring function

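The third kind of scoring function is the knowledge-based scoring function (also referred to as a statistical-potential-based scoring function), which employs energy potentials derived from the structural information embedded in experimentally determined atomic structures. The principle behind knowledge-based scoring functions is simple: pairwise potentials are obtained directly from the occurrence frequency of atom pairs in a database using the inverse Boltzmann relation:

$$w(r) = -k_B T \ln\left[\frac{\rho(r)}{\rho^{*}(r)}\right]$$

where k_B is the Boltzmann constant, T is the absolute temperature, ρ(r) is the number density of the protein–ligand atom pair at distance r, and ρ*(r) is the pair density in a reference state where the interatomic interactions are zero.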

Compared to the force field and empirical scoring functions, knowledge-based scoring functions offer a good balance between accuracy and speed. Most current knowledge-based scoring functions approximate the reference state with an atom-randomized state, ignoring effects such as excluded volume and interatomic connectivity. Gohlke et al. developed a knowledge-based scoring function (DrugScore) based on 17 atom types and 1376 protein–ligand complex structures. The scoring function consists of a distance-dependent pair-potential term and a surface-dependent singlet-potential term, and it was validated using two sets of protein–ligand complexes. A further comparative evaluation of DrugScore and AutoDock showed that DrugScore yields slightly superior results in flexible docking. An improved version (DrugScoreCSD) was also developed based on the Cambridge Structural Database (CSD) of small molecules, which contains low-molecular-weight structures with higher resolution than the macromolecular structures in the Protein Data Bank (PDB). PMF (potential of mean force), developed by Muegge and Martin, was the first knowledge-based scoring function to be extensively tested for affinity predictions.
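To make the inverse Boltzmann idea concrete, the following Python sketch converts distance histograms for one atom-pair type into a pair potential. It is a simplified illustration, not the procedure used by DrugScore or PMF: the function name is hypothetical, the counts are assumed to be already binned by distance, the density ratio is approximated by the ratio of normalized bin frequencies, and the Boltzmann constant is given in molar units so that energies come out in kcal/mol.

```python
import math

K_B = 0.0019872  # Boltzmann constant in molar units, kcal/(mol*K)


def pair_potential(observed_counts, reference_counts, temperature=298.15):
    """Return w(r) = -k_B T ln[rho(r) / rho*(r)] for each distance bin.

    observed_counts: atom-pair counts per distance bin from known complexes.
    reference_counts: counts per bin expected in the reference state.
    Bins with zero observed or reference counts yield None (undefined here).
    """
    obs_total = sum(observed_counts)
    ref_total = sum(reference_counts)
    potential = []
    for obs, ref in zip(observed_counts, reference_counts):
        if obs == 0 or ref == 0:
            potential.append(None)
            continue
        # Ratio of normalized frequencies, standing in for rho(r) / rho*(r).
        g = (obs / obs_total) / (ref / ref_total)
        potential.append(-K_B * temperature * math.log(g))
    return potential
```

Distances that occur more often than in the reference state receive a negative (favorable) potential, and under-represented distances receive a positive one. How sparse bins are handled, and how the reference state is built, are exactly the points where real knowledge-based scoring functions differ from this sketch.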

 

Consensus scoring

 

To combine the advantages and balance the deficiencies of different scoring functions, consensus scoring has been introduced; it improves the probability of finding correct solutions by combining the scores from multiple scoring functions. Commonly used consensus scoring strategies include vote-by-number, number-by-number, rank-by-number, average rank, and linear combination. Examples of consensus scoring are MultiScore, X-Cscore, GFscore, SCS, and SeleX-CS.
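Two of these ideas, averaging ranks and counting votes, might be sketched in Python as follows. This is one plausible reading of the strategies rather than the definition used by any of the programs named above: the function names are hypothetical, each scoring function is assumed to supply a dict of ligand name to score with lower scores being more favorable, and ties are broken arbitrarily by insertion order.

```python
def ranks(scores):
    """Rank ligands by score; rank 1 is the lowest (most favorable) score."""
    order = sorted(scores, key=scores.get)
    return {name: i + 1 for i, name in enumerate(order)}


def average_rank(score_tables):
    """Average each ligand's rank across several scoring functions."""
    rank_tables = [ranks(table) for table in score_tables]
    return {
        name: sum(r[name] for r in rank_tables) / len(rank_tables)
        for name in rank_tables[0]
    }


def vote_count(score_tables, top_fraction=0.5):
    """Count how many scoring functions put each ligand in their top fraction."""
    votes = {name: 0 for name in score_tables[0]}
    for table in score_tables:
        cutoff = max(1, int(len(table) * top_fraction))
        for name, rank in ranks(table).items():
            if rank <= cutoff:
                votes[name] += 1
    return votes
```

Working with ranks rather than raw scores sidesteps the fact that different scoring functions report values on different scales, which is one reason rank-based and vote-based combinations are attractive.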

 

 

 

