Figure 1 Flowchart of the proposed system.


MATERIALS AND METHODS
The proposed algorithm is composed of five major steps (see Figure 1),
a PDB identifier or a PDB file can be imported into the system. Multiple chains or specific range of a protein structure can be assigned . Each step is briefly introduced as follows.
Gridbased Protein Structure Construction
The retrieved file should be stored in the PDB format .
In this proposed system, the coordinates of atoms and corresponding van der Waals radii are transformed into corresponding volume pixels (voxels) within a grid structure.
After discretization processes, the query protein can be represented as a set of discrete voxels which are categorized to inside, outside and surface portions of the query protein respectively.
Solid Angle Computation
For each surface voxel within a protein, the proposed system computes corresponding solid angle by using the following formula:
(1)
In this step, the recommended radius of the sphere by Connolly is defined as 6 A for all surface voxels. The proposed system employed CUDA codes to enhance the computational performance.

Surface Anchor Residues and Clustering
Since we are looking for binding cavities from the query protein, only surface voxels with values of solid angles ranked in the top 20% were clustered into representative groups.
Two surface voxels were clustered into the same group when they were neighboring voxels and located within a threshold distance (8 A) and both voxels possess solid angles at similar level.
A voxel with the largest solid angle within a clustered group was considered as a feature point and claimed as an anchor of the group.
Figure 3 The clustered surface points of the query protein structure (PDB ID: 1TPA) after clustering
based on solid angles.
Geometric Feature Calculation
After the assignment of clustered groups and representative
anchors, the system calculates extra geometric characteristics for each group. The
following sections describe the geometric features in details:
Average Depth of Cavity
To avoid such high variations of neighboring surface residues within a group, an average depth of a potential cavity was calculated and verified.
The average depth was heuristically defined and evaluated
according to the following formula:
Figure 4 A simplified example of average depth indicator for an anchor
cluster with 6 neighboring surface residues.
Volume of a potential cavity
The volume of selected cavities provides powerful discrimination between binding and nonbinding regions. In this study, the volume indicator of a cluster was obtained by taking the anchor surface residue as a center and formulating a virtual sphere with a radius of 10 A.
Those voxels located within the virtual sphere but not inside the query protein were then evaluated by taking 7 directional vectors including the edge and diagonal vectors of a cube.
If extending both directions of one of the directional vectors could intersect with the query protein simultaneously, then this directional vector was assigned as the interior directional vector.
For each voxel under investigation, if it possesses more than or equal to 4 verified interior vectors, this voxel is defined as part of the volume within the cavity.
After examining all voxels in the virtual sphere, total interior voxel counts could provide as the volume value for the cluster..
Figure 5 An example of calculating volume indicator:
When both geometric features of average depth and volume were obtained, a measuring score
combining with linear weighting coefficients was performed for ranking all identified
potential binding regions. The formula is written as the following equation:
Where the RV(p) is the ranking value for anchor residue p; CD(p)avg is the average depth
value for anchor residue p; CDmax is the maximum depth for the query protein; CV(p) is the
volume for anchor residue p; CVmax is the maximum volumn for the query protein; the sum of
both weighting coefficients of w_{1} and w_{2} is equal to 1.

