Identifying Enzyme Active Sites with Genetic Algorithms

Sandro Carvalho Izidoro, Raquel C. de Melo-Minardi, Gisele Lobo Pappa.

Abstract

Motivation: Many cataloged proteins still have no known function, so developing computational methods that predict function efficiently and accurately remains a challenge. This paper focuses on identifying new attributes to support enzyme function prediction. The proposed method searches for the arrangement of amino acids directly involved in the catalytic reaction, called the catalytic or active site, which is responsible for molecular recognition. Because of this role, active-site amino acids are more conserved during evolution than the sequence as a whole, and they can be used successfully for protein function prediction. The objective of this work is to present a new technique, based on genetic algorithms (GA), for finding similar active sites. The method can perform non-exact amino acid matches (taking conservative mutations into account), places no restriction on the number of amino acids in a site, and can find active sites whose residues lie in different protein chains.
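To make the idea of a non-exact match concrete, the following is a minimal sketch of how a candidate site might be scored against a template site. It assumes a hypothetical representation (each residue as a `(residue_name, (x, y, z))` pair), a hypothetical grouping of conservative substitutions in `CONSERVATIVE_GROUPS`, and a simple pairwise-distance fitness. None of these names or choices are taken from the authors' implementation; they only illustrate the general technique of geometric matching with conservative residue substitutions.

```python
import math
from itertools import combinations

# Hypothetical conservative-substitution groups (illustrative assumption,
# not necessarily the grouping used in the paper).
CONSERVATIVE_GROUPS = [
    {"ASP", "GLU"},
    {"LYS", "ARG", "HIS"},
    {"SER", "THR"},
    {"ASN", "GLN"},
    {"LEU", "ILE", "VAL", "MET"},
    {"PHE", "TYR", "TRP"},
]


def residues_compatible(a, b):
    """True if the residue types are identical or share a conservative group."""
    if a == b:
        return True
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def fitness(template, candidate):
    """Lower is better. Sums the absolute differences between the pairwise
    inter-residue distances of the template and of the candidate.

    template, candidate: equal-length lists of (residue_name, (x, y, z)).
    Returns math.inf if any aligned residue pair is not compatible.
    Residues carry no chain ID here, so a candidate may mix chains freely.
    """
    if len(template) != len(candidate):
        raise ValueError("template and candidate must have the same size")
    for (t_name, _), (c_name, _) in zip(template, candidate):
        if not residues_compatible(t_name, c_name):
            return math.inf
    total = 0.0
    for i, j in combinations(range(len(template)), 2):
        d_template = math.dist(template[i][1], template[j][1])
        d_candidate = math.dist(candidate[i][1], candidate[j][1])
        total += abs(d_template - d_candidate)
    return total
```

For example, with a toy SER-HIS-ASP triad as the template, a candidate THR-HIS-GLU with the same geometry (even rigidly translated) scores 0.0, while replacing SER with GLY makes the candidate incompatible (`math.inf`). Using pairwise distances keeps the score invariant to rotation and translation, so no structural superposition is needed.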

Results: Using a GA to search for similar active sites, an approach not previously published, proved promising on data sets with different characteristics. In specific enzyme families, the GA found active sites consistent with the annotations in the CSA (Catalytic Site Atlas). Tests on the serine protease family, comparing the proposed GA with other existing software, showed that it recovers more active sites with better accuracy. Ranking the best individuals (candidate active sites) for each enzyme and adapting the mutation operator to handle conservative mutations made the GA more flexible and robust. Furthermore, the ability to find a site's residues in more than one chain and the absence of any restriction on active-site size make the GA a good tool for protein function prediction.
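The two GA components mentioned above, a mutation operator that respects conservative substitutions and a ranking of the best individuals per enzyme, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: an individual is assumed to be a tuple of indices into the target protein's residue list (one index per template residue), `CONSERVATIVE_GROUPS` is a hypothetical grouping, and `conservative_mutation` and `rank_individuals` are hypothetical helper names.

```python
import random

# Hypothetical conservative-substitution groups (illustrative assumption).
CONSERVATIVE_GROUPS = [
    {"ASP", "GLU"},
    {"LYS", "ARG", "HIS"},
    {"SER", "THR"},
    {"ASN", "GLN"},
    {"LEU", "ILE", "VAL", "MET"},
    {"PHE", "TYR", "TRP"},
]


def compatible(a, b):
    """True if residue types are identical or in the same conservative group."""
    return a == b or any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def conservative_mutation(individual, template_types, protein_types, rng):
    """Replace one residue of the individual with another protein residue whose
    type is compatible with the corresponding template residue.

    individual: tuple of indices into protein_types (one per template residue).
    Returns a new tuple, or the individual unchanged if no valid replacement exists.
    """
    pos = rng.randrange(len(individual))
    used = set(individual)
    options = [
        k for k, residue_type in enumerate(protein_types)
        if k not in used and compatible(template_types[pos], residue_type)
    ]
    if not options:
        return individual
    mutated = list(individual)
    mutated[pos] = rng.choice(options)
    return tuple(mutated)


def rank_individuals(population, fitness, top_k):
    """Return up to top_k distinct individuals, best (lowest fitness) first.
    Ties are broken by the individual itself so the ranking is deterministic."""
    unique = set(population)
    return sorted(unique, key=lambda ind: (fitness(ind), ind))[:top_k]
```

Restricting mutation to compatible residue types keeps every offspring a plausible site, so the search does not waste evaluations on chemically incompatible candidates. Returning several ranked individuals per enzyme, instead of a single best one, reflects the idea of reporting a ranking of possible active sites; `top_k` is a hypothetical parameter.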

Coming soon: GASS Web Server
Source code and example data set: