Projects

visGReMLIN: Interactions between proteins and ligands play an important role in the biological processes of living systems. For this reason, developing computational methods that facilitate the understanding of the ligand-receptor recognition process is of fundamental importance, since this understanding is a major step towards ligand prediction, target identification, and lead discovery, among other tasks. This work presents visGReMLIN, a visual interactive interface to explore protein-ligand interactions and their conserved substructures for a set of similar proteins. To illustrate the potential of our strategy, we used two test datasets, Ricin and human CDK2, whose protein-ligand interfaces are modeled as bipartite graphs, where an edge depicts an interaction between a protein node and a ligand node. These graphs are the input to a search for frequent subgraphs, which are the conserved interaction patterns across the datasets. The input graphs and their patterns can be explored to find general trends and exceptions concerning types of atoms and interactions. A text search is also provided to help users find residues or atoms of interest (for example, atoms from the CDK2 hinge region and from the Ricin A chain active site). Additionally, visGReMLIN provides visualizations of basic statistics on the frequencies of atoms and interactions of specific types in the dataset. Finally, our strategy lets users select an interaction pattern and highlight it in the context of 2D interface graphs and in a 3D molecule viewer.
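The graph modeling described above can be illustrated with a deliberately simplified sketch. It is not the visGReMLIN implementation: it treats each interface as a list of labeled edges and only finds frequent single-edge patterns, whereas the tool searches for frequent subgraphs. The `Interaction` type, the atom-type and interaction labels, and the `min_support` parameter are all hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Interaction(NamedTuple):
    """One edge of a bipartite interface graph (hypothetical labels)."""
    protein_atom_type: str  # label of the protein-side node
    ligand_atom_type: str   # label of the ligand-side node
    kind: str               # type of interaction the edge represents


def frequent_edge_patterns(
    graphs: List[List[Interaction]], min_support: float
) -> Dict[Interaction, float]:
    """Return edge labels present in at least `min_support` of the graphs.

    This is the single-edge special case of frequent subgraph mining:
    each label is counted at most once per graph.
    """
    if not graphs:
        return {}
    counts: Counter = Counter()
    for edges in graphs:
        counts.update(set(edges))  # de-duplicate within one graph
    total = len(graphs)
    return {
        label: count / total
        for label, count in counts.items()
        if count / total >= min_support
    }


# Toy dataset: three made-up protein-ligand interfaces.
hbond = Interaction("donor", "acceptor", "hydrogen_bond")
hydrophobic = Interaction("hydrophobic", "hydrophobic", "hydrophobic")
stacking = Interaction("aromatic", "aromatic", "aromatic_stacking")
toy_graphs = [[hbond, hydrophobic], [hbond], [hbond, stacking]]

conserved = frequent_edge_patterns(toy_graphs, min_support=0.6)
```

In this toy dataset only the hydrogen-bond edge appears in every interface, so it is the only pattern that passes the 0.6 support threshold. Extending the idea to multi-edge patterns requires a proper subgraph mining algorithm.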
nAPOLI: Molecular recognition plays an important role in biological systems and is observed between receptor and ligand, antigen and antibody, DNA and protein, RNA and ribosome, and so on. Molecular recognition is an organizational phenomenon that is very difficult to predict or design, even for small molecules. Due to its importance, molecular recognition has been studied from different perspectives in Bioinformatics. Several studies have focused on seeking patterns of molecular recognition in datasets consisting of a specific receptor and multiple ligands, or vice versa, using varied data mining techniques. The analytical process is laborious, as an expert has to examine each of the patterns carefully and they can be very numerous. Therefore, this paper proposes a different approach that aims to be easier and more intuitive. The use of images to represent information is increasingly appreciated for the benefits it can bring to science, providing a powerful means both to make sense of data and to communicate. Data visualization has in recent years become an established area of academic study and is increasingly being applied to biological data. In this paper, we propose visual and interactive strategies to depict the types of interactions established between a protein and its ligands, and we implement these strategies in a tool called nAPOLI (Analysis of PrOtein Ligand Interactions). To show an example of the tool in use, we focus on a case study of an important family of enzymes: the Cyclin-Dependent Kinases 2 (CDK2).
VERMONT: In this paper, we propose an interactive visualization called VERMONT which tackles the problem of visualizing mutations and infers their possible effects on the conservation of physicochemical and topological properties in protein families. More specifically, we visualize a set of structure-based sequence alignments and integrate several structural parameters that should aid biologists in gaining insight into possible consequences of mutations. VERMONT allowed us to identify patterns of position-specific properties as well as exceptions that may help predict whether specific mutations could damage protein function.
ENZYMAP: The volume and diversity of biological data are increasing at very high rates. Vast amounts of protein sequences and structures, protein and genetic interactions, and phenotype studies have been produced. The majority of data generated by high-throughput devices are automatically annotated because annotating them manually is not feasible. Thus, efficient and precise automatic annotation methods are required to ensure the quality and reliability of both the biological data and the associated annotations. We propose ENZYMatic Annotation Predictor (ENZYMAP), a technique to characterize and predict EC number changes based on annotations from UniProt/Swiss-Prot using a supervised learning approach. We evaluated ENZYMAP experimentally, using test datasets from both UniProt/Swiss-Prot and UniProt/TrEMBL, and showed that predicting EC changes using selected types of annotation is possible. Finally, we compared ENZYMAP and DETECT with respect to their predictions and checked both against the UniProt/Swiss-Prot annotations. ENZYMAP was shown to be more accurate than DETECT, coming closer to the actual changes in UniProt/Swiss-Prot. Our proposal is intended as an automatic complementary method (one that can be used together with other techniques, such as those based on protein sequence and structure) that helps to improve the quality and reliability of enzyme annotations over time by suggesting possible corrections, anticipating annotation changes, and propagating the implicit knowledge across the whole dataset.
ADVISe: In this paper, we propose an interactive visualization called ADVISe (Annotation Dynamics Visualization), which tackles the problem of visualizing changes in enzyme annotations across several releases of the UniProt/Swiss-Prot database. More specifically, we visualize the dynamics of Enzyme Commission numbers (EC numbers), a numerical and hierarchical classification scheme for enzymes based on the chemical reactions they catalyze. An EC number consists of four numbers separated by periods and represents a progressively finer classification of the catalyzed reaction. The proposed interactive visualization gives a macro view of the changes and presents further details on demand, such as frequencies of change types segmented by levels of generalization and specialization as well as by enzyme families. Users can also explore entry metadata. With this tool, we were able to identify trends of specialization, database growth, exceptions in which EC numbers were deleted, divided or created, and revisions of past annotation errors.
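To make the hierarchical structure of EC numbers concrete, here is a minimal sketch of how an annotation change between two releases might be labeled as a generalization or a specialization. It is not the ADVISe code. It assumes the common convention of writing unspecified levels as `-` (as in `3.4.-.-`), and the function names and the `"other"` category are hypothetical.

```python
from typing import Tuple


def parse_ec(ec: str) -> Tuple[str, ...]:
    """Split an EC number into its defined levels, left to right.

    Levels are kept as strings; parsing stops at the first '-' placeholder,
    so '3.4.-.-' yields ('3', '4').
    """
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four period-separated fields: {ec!r}")
    defined = []
    for part in parts:
        if part == "-":
            break
        defined.append(part)
    return tuple(defined)


def classify_change(old: str, new: str) -> str:
    """Label the change from `old` to `new` (labels are illustrative)."""
    before, after = parse_ec(old), parse_ec(new)
    if before == after:
        return "unchanged"
    if len(after) > len(before) and after[: len(before)] == before:
        return "specialization"  # same prefix, more levels defined
    if len(after) < len(before) and before[: len(after)] == after:
        return "generalization"  # same prefix, fewer levels defined
    return "other"  # the defined levels differ at some depth


examples = [
    classify_change("3.4.-.-", "3.4.21.-"),
    classify_change("3.4.21.4", "3.4.21.-"),
    classify_change("1.1.1.1", "1.1.1.2"),
    classify_change("2.7.11.22", "2.7.11.22"),
]
```

The difference in the number of defined levels gives the depth of the generalization or specialization, which is one way the per-level frequencies mentioned above could be computed.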