Academic Positions

  • 2012 - Present

    Associate Professor

    Federal University of Minas Gerais, Department of Computer Science

  • 2010 - 2011

    Postdoctoral Researcher

    University of Campinas, Institute of Computing

Education & Training

  • Ph.D. 2010

    Ph.D. in Computer Science

    University of Maryland, College Park, USA

  • M.Sc. 2005

    M.Sc. in Computer Science

    Federal University of Parana, Brazil

  • B.Sc. 2003

    B.Sc. in Computer Science

    Federal University of Parana, Brazil

Honors, Awards, Fellowships and Grants

  • 2015 - Present
    FAPEMIG Researcher
  • 2013 - Present
    CNPq Productivity Fellowship - level 2
  • 2005 - 2009
    CAPES/Fulbright Ph.D. Fellowship
  • 2003 - 2005
    CNPq M.Sc. Fellowship
  • 2001 - 2003
    SESu/CAPES Special Training Program (PET) Fellowship

Research Topics

  • Anomalous Event Detection

    Computer Vision | Smart Surveillance

    In recent years, automatic event analysis in video sequences has gained increasing attention in the research community. Detecting anomalous behavior is a central task in surveillance systems, and it is currently performed by human operators. Automating it is difficult because a scalable surveillance system must cope with a virtually unlimited number of possible situations. To reduce this complexity, we look for patterns, both normal and anomalous. An anomalous event can be characterized as an event that deviates from the normal or usual, but not necessarily in an undesirable manner; for example, an anomalous event may simply be different from normal without being suspicious from the surveillance standpoint.

    The current project addresses anomalous pattern recognition. Assuming images captured from a single camera, our model uses our proposed feature, called Histograms of Optical Flow Orientation and Magnitude (HOFM). This feature is based on optical flow information to describe the normal patterns in the scene, so that a simple nearest neighbor search can identify whether a given unknown pattern should be classified as an anomalous event. Our descriptor captures spatiotemporal information from cuboids (regions with spatial and temporal support) and encodes the magnitude and orientation of the optical flow separately into histograms.
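
    A minimal sketch of this kind of descriptor is shown below, assuming a pair of consecutive grayscale frames and OpenCV's Farneback optical flow. The block size, bin counts, magnitude cap and the nearest-neighbor anomaly score are illustrative assumptions, and the temporal pooling over cuboids used by HOFM is omitted for brevity.

```python
# Sketch of an HOFM-style descriptor: per-block histograms of optical flow
# orientation and of optical flow magnitude, compared against "normal"
# descriptors by nearest-neighbor distance. Parameters are illustrative.
import cv2
import numpy as np

def hofm_like_descriptor(prev_gray, curr_gray, block=32, ori_bins=8,
                         mag_bins=4, mag_max=10.0):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    h, w = mag.shape
    feats = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            m = mag[y:y + block, x:x + block].ravel()
            a = ang[y:y + block, x:x + block].ravel()
            # Orientation and magnitude are histogrammed separately, then concatenated.
            h_ori, _ = np.histogram(a, bins=ori_bins, range=(0, 2 * np.pi))
            h_mag, _ = np.histogram(np.clip(m, 0, mag_max), bins=mag_bins,
                                    range=(0, mag_max))
            feats.append(h_ori / (h_ori.sum() + 1e-6))
            feats.append(h_mag / (h_mag.sum() + 1e-6))
    return np.concatenate(feats)

def anomaly_score(descriptor, normal_descriptors):
    # Distance to the closest descriptor observed during normal activity;
    # a large minimum distance suggests an anomalous pattern.
    return min(np.linalg.norm(descriptor - n) for n in normal_descriptors)
```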

  • Automatic License Plate Recognition

    Computer Vision | Machine Learning | Smart Surveillance | Image Processing

    ALPR is a field in which many researchers have been working over the years, reflecting the important role it plays in numerous real-life applications such as automatic toll collection, traffic law enforcement, parking lot access control and road traffic monitoring. Automatic license plate identification is composed of several individual tasks that must all work well, for example car and plate detection, character segmentation and optical character recognition. The problem becomes challenging when we have no control over the conditions of the environment, such as car speed and acceleration, plate position, and lighting.

  • Change Detection

    Computer Vision | Machine Learning | Remote Sensing

    Information regarding changes in a scene plays a central role in a myriad of applications. Disaster management, urban growth, security, burned areas, and surveillance are examples of problems for which knowledge of local changes is essential. In general, video monitoring performed by humans is error prone due to the well-known lapses of attention in repetitive tasks that may take many hours. Manual monitoring of changes is very labor intensive and often impractical, resulting in inconsistent and infrequent data sampling as well as human error. The basic task of change detection in images is to find pixels (or regions) of a reference image that differ from other images. Significant changes may include the appearance or disappearance of objects, motion of objects, and shape changes in objects. On the other hand, variations in the image caused by camera motion, noise, illumination changes, nonuniform attenuation, atmospheric absorption, swaying trees, rippling water or flickering monitors must be ignored by the change detection method.
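
    As an illustration of the basic task only (not of the methods investigated in this project), the sketch below detects changed pixels against a reference frame by absolute differencing, thresholding and a morphological opening; the threshold and kernel size are arbitrary illustrative values.

```python
# A minimal change-detection baseline: absolute difference against a reference
# frame, thresholding, and a morphological opening to suppress isolated noise.
import cv2
import numpy as np

def detect_changes(reference_bgr, frame_bgr, thresh=30, kernel_size=5):
    ref = cv2.cvtColor(reference_bgr, cv2.COLOR_BGR2GRAY)
    cur = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(ref, cur)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    # 255 marks pixels considered changed after noise suppression.
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```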

  • Face Recognition

    Computer Vision | Machine Learning | Smart Surveillance

    The three primary face recognition tasks are verification, identification, and watch list. In verification, the task is to accept or deny the identity claimed by a person. In identification, an image of an unknown person is matched to a gallery of known people. In the watch list task, a face recognition system must first detect whether an individual is on the watch list. If the individual is on the watch list, the system must then correctly identify the individual. We address the identification task. Due to the availability of large amounts of data acquired in a variety of conditions, techniques that are robust to uncontrolled acquisition conditions, handle small sample sizes, and scale to large gallery sizes are desirable.

  • Feature Extraction

    Computer Vision | Image Processing

    Visual information contained in images is usually represented by low-level feature descriptors focusing on different types of information, such as color, texture, and shape. An adequate feature descriptor is able to discriminate between regions with different characteristics and it allows similar regions to be grouped together even when captured under noisy conditions. However, it is usually difficult to have a single feature descriptor adequate for many application domains; this has motivated researchers to develop a variety of feature extraction methods.

  • Image Grouping

    Machine Learning | Computer Vision

    Clustering techniques have been widely used in areas that handle massive amounts of data, such as statistics, information retrieval, data mining and image analysis.

    An increasing volume of digital images and videos has become available over the years due to the growth of smartphones, tablets and the Internet of Things in general. Therefore, the development of techniques capable of managing large amounts of data in a fast and accurate way is important to extract valuable information.

    Finding natural groupings is the goal of clustering methods, such as K-means. They can help classify and separate information in order to make data analysis easier. Examples of problems related to data grouping are data indexing, data compression and natural image classification. Image Grouping can also be used in order to find discriminative visual concepts which can be used as mid-level features.
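
    The sketch below illustrates the grouping idea on top of simple color-histogram features with scikit-learn's K-means; the number of clusters and the histogram resolution are illustrative assumptions rather than recommended settings.

```python
# Group images by K-means over normalized color histograms.
import numpy as np
import cv2
from sklearn.cluster import KMeans

def color_histogram(image_bgr, bins=8):
    # 3D BGR histogram flattened into a single feature vector.
    hist = cv2.calcHist([image_bgr], [0, 1, 2], None,
                        [bins, bins, bins], [0, 256] * 3).ravel()
    return hist / (hist.sum() + 1e-6)

def group_images(images_bgr, n_groups=5):
    feats = np.array([color_histogram(img) for img in images_bgr])
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(feats)
    return labels  # labels[i] is the group assigned to images_bgr[i]
```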

  • Pedestrian Detection

    Computer Vision | Machine Learning | Smart Surveillance

    Effective techniques for human detection are of special interest in computer vision since many applications involve people’s locations and movements. Thus, significant research has been devoted to detecting, locating and tracking people in images and videos. Over the last few years the problem of detecting humans in single images has received considerable interest. Variations in illumination, shadows, and pose, as well as frequent inter- and intra-person occlusion render this a challenging task.

  • Person Re-Identification

    Machine Learning | Computer Vision | Smart Surveillance

    Person re-identification deals with automatically tracking individuals across cameras without overlapping fields of view. It is an important task to assist security personnel in monitoring pedestrian behavior over wide areas at reduced cost. In person re-identification, we have a gallery set of individuals and the objective is to match a probe image to one of them. Using image pairs present in a training set, we can learn similarity models. One of the main challenges in person re-identification is that inter-class similarities are often higher than intra-class similarities, caused by similar dressing patterns and acquisition conditions (illumination, viewpoint, etc.).

  • Self-Organizing Pedestrian Traffic Lights

    Computer Vision | Smart Surveillance

    Traffic lights are among the most valuable devices for controlling vehicular and pedestrian traffic. One of their main issues is that many traffic lights are improperly calibrated because they do not consider differences in pedestrian mobility from region to region. Since each region presents pedestrians with different characteristics, there is a need for automatic approaches. In this work, we propose a new approach to automatically adjust the pedestrian traffic light program to provide greater safety for pedestrians and a better flow of traffic. We deal with two challenging cases from the transport engineering literature. The first occurs when pedestrians with reduced speed cannot cross the street within the available time. The second occurs when the traffic light remains open for pedestrians for a long time even when there are no pedestrians waiting to cross.

  • Spoofing Detection

    Machine Learning | Computer Vision | Computer Forensics

    Nowadays, we are experiencing an increasing demand for highly secure identification and personal verification technologies. This demand becomes even more apparent as we become aware of new security breaches and transaction frauds. In this context, biometrics has played a key role in the last decade, providing tools and solutions to verify or recognize the identity of a person based on physiological or behavioral characteristics. Among the features used are face, fingerprints, hand geometry, handwriting, iris, retinal vein, and voice. Such methods, however, can sometimes be fooled (spoofed) by an identity thief, especially the ones based on face recognition, in which the thief can obtain a photo of an authentic user from a significant distance, or even from the Internet.

  • Surveillance Systems

    Computer Vision | Smart Surveillance

    Computer vision problems applied to visual surveillance have been studied for several years, aiming at the accurate and efficient solutions required to run surveillance systems in real environments. The main goal of such systems is to analyze the scene, focusing on the detection and recognition of suspicious activities performed by humans, so that security personnel can pay closer attention to these preselected activities. To accomplish that, several problems have to be solved first, for instance background subtraction, person detection, tracking and re-identification, face recognition, and action recognition. Even though each of these problems has been researched over the past decades, they are rarely considered as a sequence; each one is usually solved individually. However, in a real surveillance scenario, the aforementioned problems have to be solved in sequence, considering only videos as the input.

Research Group

Antonio Carlos Nazare Jr.

Ph.D. student

Carlos Antonio Caetano Junior

Ph.D. student

Marco Tulio Alves Nolasco Rodrigues

Ph.D. student

Raphael Felipe de Carvalho Prates

Ph.D. student

Rensso Victor Hugo Mora Colque

Ph.D. student

Victor Hugo Cunha de Melo

Ph.D. student

Artur Jordao Lima Correia

MSc student

Cristianne Rodrigues Santos Dutra

MSc student

Gabriel Resende Goncalves

MSc student

Jessica Sena de Souza

MSc student

Rafael Henrique Vareto

MSc student

Ricardo Barbosa Kloss

MSc student

Samira Santos da Silva

MSc student

My research group, the Smart Surveillance Interest Group (SSIG), is composed of researchers and graduate and undergraduate students focusing their work on Smart Surveillance, Forensics and Biometrics.

The SSIG maintains a first-class computational infrastructure, composed of several servers and multiple surveillance cameras, that allows large-scale experiments.

Dissertations and Theses

  • 2015
    Cassio Elias dos Santos Jr. (M.Sc.)

    Partial Least Squares for Face Hashing

    Face identification is an important research topic due to its application to areas such as surveillance, forensics and human-computer interaction. In the past few years, a myriad of face identification methods have been proposed in the literature, with just a few among them focusing on scalability. In this work, we propose a simple but efficient approach for scalable face identification based on partial least squares (PLS) and random independent hash functions inspired by locality-sensitive hashing (LSH), resulting in the PLS for hashing (PLSH) approach. The original PLSH approach is further extended using feature selection to reduce the computational cost of evaluating the PLS-based hash functions, resulting in the state-of-the-art extended PLSH approach (ePLSH). The proposed approach is evaluated on the FERET and FRGCv1 datasets. The results show a significant reduction in the number of subjects evaluated during face identification (reduced to 0.3% of the gallery), providing average speedups of up to 233 times compared to evaluating all subjects in the face gallery and 58 times compared to previous works in the literature.
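
    A hedged sketch of the general PLS-for-hashing idea is given below: each hash function is a PLS regression trained on a random binary split of the gallery subjects, and positive responses vote for the subjects on the positive side, producing a short candidate list. This is not the published PLSH/ePLSH implementation; the number of hash functions, PLS components and the voting scheme are simplifying assumptions.

```python
# Candidate-list generation with PLS regressions over random subject splits.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def train_plsh_like(gallery_feats, gallery_subjects, n_hash=50, n_components=10, seed=0):
    rng = np.random.default_rng(seed)
    subjects = np.unique(gallery_subjects)
    models = []
    for _ in range(n_hash):
        # Random half of the subjects forms the positive side of this hash function.
        positive = set(rng.choice(subjects, size=len(subjects) // 2, replace=False))
        y = np.array([1.0 if s in positive else -1.0 for s in gallery_subjects])
        pls = PLSRegression(n_components=n_components).fit(gallery_feats, y)
        models.append((pls, positive))
    return subjects, models

def candidate_list(probe_feat, subjects, models, top_k=10):
    votes = {s: 0.0 for s in subjects}
    for pls, positive in models:
        response = float(pls.predict(probe_feat.reshape(1, -1))[0, 0])
        if response > 0:
            for s in positive:
                votes[s] += response  # positive response votes for the positive side
    return sorted(votes, key=votes.get, reverse=True)[:top_k]
```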

  • 2014
    Antonio Carlos Nazaré Jr. (M.Sc.)

    A Scalable and Versatile Framework for Smart Video Surveillance

    The availability of surveillance cameras placed in public locations has increased vastly in recent years, providing a safe environment for people at the cost of a huge amount of collected visual data. Such data are mostly processed manually, a task which is labor intensive and prone to errors. Therefore, automatic approaches must be employed to enable the processing of the data, so that human operators only need to reason about selected portions.

    Computer vision approaches applied to visual surveillance have been developed for several years, aiming at the accurate and efficient solutions required to run surveillance systems in real environments. The main goal of such systems is to analyze the scene, focusing on the detection and recognition of suspicious activities performed by humans, so that the security staff can pay closer attention to these preselected activities. However, these systems are rarely tackled in a scalable manner.

    Before developing a full surveillance system, several problems have to be solved first, for instance: background subtraction, person detection, tracking and re-identification, face recognition, and action recognition. Even though each of these problems has been researched in the past decades, they are rarely considered as a sequence; each one is usually solved individually. However, in a real surveillance scenario, the aforementioned problems have to be solved in sequence considering only videos as the input.

    Aiming at evaluating approaches in more realistic scenarios, this work proposes a framework called the Smart Surveillance Framework (SSF), which allows researchers to implement their solutions to the above problems as a sequence of processing modules that communicate through a shared memory.

    The SSF is a C++ library built to provide important features for a surveillance system, such as automatic scene understanding, scalability, real-time operation, a multi-sensor environment, usage of low-cost standard components, runtime re-configuration, and communication control.

  • 2014
    Victor Hugo Cunha de Melo (M.Sc.)

    Fast and Robust Optimization Approaches for Pedestrian Detection

    The large number of surveillance cameras available nowadays at strategic points of major cities provides a safe environment. However, the huge amount of data provided by the cameras prevents its manual processing, requiring the application of automated methods. Among such methods, pedestrian detection plays an important role in reducing the amount of data by locating only the regions of interest for further processing regarding activities being performed by agents in the scene. However, the currently available methods are unable to process such a large amount of data in real time. Therefore, there is a need for the development of optimization techniques. Towards accomplishing the goal of reducing the cost of pedestrian detection, we propose in this work two optimization approaches. The first approach consists of a rejection cascade based on Partial Least Squares (PLS) combined with the propagation of latent variables through the stages. Our results show that the method reduces the computational cost by increasing the number of rejected background samples in the earlier stages of the cascade. Our second approach proposes a novel optimization that performs a random filtering in the image to select a small number of detection windows, allowing a reduction in the computational cost. Our results show that accurate detections can be achieved even when a large number of detection windows are discarded.
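
    The random-filtering idea from the second approach can be sketched as follows: only a small random subset of all candidate detection windows is passed to a pluggable window classifier, which is where the savings come from. The window geometry, stride and the fraction of windows kept are illustrative assumptions, and the regression step that refines window locations is omitted.

```python
# Random filtering of sliding windows before running a pedestrian classifier.
import numpy as np

def sample_windows(img_h, img_w, win_w=64, win_h=128, stride=8,
                   keep_ratio=0.02, seed=0):
    xs = np.arange(0, img_w - win_w + 1, stride)
    ys = np.arange(0, img_h - win_h + 1, stride)
    all_windows = [(x, y, win_w, win_h) for y in ys for x in xs]
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(len(all_windows) * keep_ratio))
    idx = rng.choice(len(all_windows), size=n_keep, replace=False)
    return [all_windows[i] for i in idx]

def detect(image, score_window, threshold=0.5):
    """score_window(image, window) -> confidence; any window classifier fits here."""
    h, w = image.shape[:2]
    # Only the sampled windows are ever scored, which is the source of the speedup.
    return [wnd for wnd in sample_windows(h, w) if score_window(image, wnd) > threshold]
```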

Selected Publications

Histograms of Optical Flow Orientation and Magnitude to Detect Anomalous Events in Videos

R. V. H. M. Colque, C. Caetano, W. R. Schwartz
Conference Paper | Conference on Graphics, Patterns and Images (SIBGRAPI), pp. 1-8, 2015

Abstract

Modeling human behavior and activity patterns for recognition or detection of anomalous events has attracted significant research interest in recent years, particularly among the video surveillance community. An anomalous event might be characterized as an event that deviates from the normal or usual, but not necessarily in an undesirable manner, e.g., an anomalous event might just be different from normal but not a suspicious event from the surveillance standpoint. One of the main challenges of detecting such events is the difficulty of creating models for them due to their unpredictability. Therefore, most works instead model the expected patterns of the scene, based on video sequences where anomalous events do not occur. Assuming images captured from a single camera, we propose a novel spatiotemporal feature descriptor, called Histograms of Optical Flow Orientation and Magnitude (HOFM), based on optical flow information to describe the normal patterns on the scene, so that we can employ a simple nearest neighbor search to identify whether a given unknown pattern should be classified as an anomalous event. Our descriptor captures spatiotemporal information from cuboids (regions with spatial and temporal support) and encodes both magnitude and orientation of the optical flow separately into histograms, differently from previous works, which are based only on the orientation. The experimental evaluation demonstrates that our approach is able to detect anomalous events with success, achieving better results than the descriptor based only on optical flow orientation and outperforming several state-of-the-art methods on one scenario (Peds2) of the well-known UCSD anomaly data set, and achieving comparable results in the other scenario (Peds1).

Using Visual Rhythms for Detecting Video-Based Facial Spoof Attacks

A. S. Pinto, H. Pedrini, W. R. Schwartz, A. Rocha
Journal Paper | IEEE Transactions on Information Forensics and Security, 10 (5), pp. 1025-1038, 2015

Abstract

Spoofing attacks or impersonation can be easily accomplished in a facial biometric system wherein users without access privileges attempt to authenticate themselves as valid users, in which an impostor needs only a photograph or a video with facial information of a legitimate user. Even with recent advances in biometrics, information forensics and security, vulnerability of facial biometric systems against spoofing attacks is still an open problem. Even though several methods have been proposed for photo-based spoofing attack detection, attacks performed with videos have been vastly overlooked, which hinders the use of the facial biometric systems in modern applications. In this paper, we present an algorithm for video-based spoofing attack detection through the analysis of global information which is invariant to content, since we discard video contents and analyze content-independent noise signatures present in the video related to the unique acquisition processes. Our approach takes advantage of noise signatures generated by the recaptured video to distinguish between fake and valid access videos. For that, we use the Fourier spectrum followed by the computation of video visual rhythms and the extraction of different characterization methods. For evaluation, we consider the novel Unicamp Video-Attack Database (UVAD) which comprises 17,076 videos composed of real access and spoofing attack videos. In addition, we evaluate the proposed method using the Replay-Attack Database, which contains photo-based and video-based face spoofing attacks.

Classification schemes based on Partial Least Squares for face identification

G. de P. Carlos, H. Pedrini, W. R. Schwartz
Journal Paper | Journal of Visual Communication and Image Representation, 32, pp. 170-179, 2015

Abstract

Approaches based on the construction of highly discriminative models, such as one-against-all classification schemes, have been employed successfully in face identification. However, their main drawback is the reduction in scalability, since the models for each individual depend on the remaining subjects. Therefore, when new subjects are enrolled, it is necessary to rebuild all models to take the new individuals into account. This work addresses different classification schemes based on Partial Least Squares employed for face identification. First, the one-against-all and the one-against-some classification schemes are described and, based on their drawbacks, a classification scheme referred to as one-against-none is proposed. This novel approach considers face samples that do not belong to subjects in the gallery. Experimental results show that it achieves results similar to one-against-all and one-against-some even though it does not depend on the remaining subjects in the gallery to build the models.

Thermal-to-Visible Face Recognition Using Partial Least Squares

S. Hu, J. Choi, A. L. Chan, W. R. Schwartz
Journal Paper | Journal of the Optical Society of America A, 32 (3), pp. 431-442, 2015

Abstract

Although visible face recognition has been an active area of research over the past few decades, cross-modal face recognition is only beginning to be explored by the biometrics community. A thermal-to-visible face recognition system is proposed in this work, consisting of preprocessing, feature extraction, and partial least squares (PLS)-based matching. The preprocessing and feature extraction stages effectively reduce the wide modality gap between the thermal face signature and the visible face signature, facilitating the subsequent one-vs-all PLS-based matching. Thermal cross-examples are introduced into the matching framework to further reduce the modality gap by incorporating cross-modal information into the PLS model building procedure. Performance of the proposed thermal-to-visible face recognition is evaluated on three databases containing visible and thermal imagery acquired under different experimental conditions: time-lapse, physical tasks, mental tasks, and range. The extensive performance evaluation characterizes the impact of the experimental conditions on cross-modal face recognition performance, as well as demonstrating the robustness of the proposed approach across conditions.
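
A generic one-vs-all PLS matching sketch in the spirit of the description above is shown below: one PLS regression per gallery subject, with that subject's samples labeled +1 and all others -1, and the probe assigned to the model with the highest response. The preprocessing, feature extraction and thermal cross-examples from the paper are omitted, and the component count is an illustrative assumption.

```python
# One-vs-all PLS matching: one regression model per gallery subject.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def train_one_vs_all(gallery_feats, gallery_ids, n_components=10):
    models = {}
    for subject in np.unique(gallery_ids):
        y = np.where(gallery_ids == subject, 1.0, -1.0)  # +1 for this subject only
        models[subject] = PLSRegression(n_components=n_components).fit(gallery_feats, y)
    return models

def identify(probe_feat, models):
    # The probe is assigned to the subject whose model responds most strongly.
    scores = {s: float(m.predict(probe_feat.reshape(1, -1))[0, 0])
              for s, m in models.items()}
    return max(scores, key=scores.get)
```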

Learning to Hash Faces Using Large Feature Vectors

C. E. dos Santos Jr., E. Kijak, G. Gravier, W. R. Schwartz
Conference Paper | 13th International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 1-8, 2015

Abstract

Face recognition has been largely studied in past years. However, most of the related work focuses on increasing accuracy and/or speed when testing a single probe-subject pair. In this work, we present a novel method inspired by the success of locality-sensitive hashing (LSH) applied to large general purpose datasets and by the robustness provided by partial least squares (PLS) analysis when applied to large sets of feature vectors for face recognition. The result is a robust hashing method compatible with feature combination for fast computation of a short list of candidates in a large gallery of subjects. We provide theoretical support and practical principles for the proposed method that may be reused in further development of hash functions applied to face galleries. The proposed method is evaluated on the FERET and FRGCv1 datasets and compared to other methods in the literature. Experimental results show that the proposed approach provides a speedup of 16 times compared to scanning all subjects in the face gallery.

CBRA: Color-Based Ranking Aggregation for Person Re-Identification

R. F. C. Prates, W. R. Schwartz
Conference Paper | IEEE International Conference on Image Processing (ICIP), pp. 1-5, 2015

Abstract

The problem of automatically tracking a pedestrian within camera networks with non-overlapping fields of view, known as person re-identification, is a challenging task with still suboptimal results. Different features have been proposed in the literature, especially colors, which achieved the best results when fused into a unique feature representation. Despite being better than considering each feature individually, the fusion still does not explore all of the features' discriminative power. Therefore, we propose the use of rank aggregation to improve the results. In this paper, we address the person re-identification problem using a Color-based Ranking Aggregation (CBRA) method, which explores different feature representations to obtain complementary ranking lists and combines them using the Stuart ranking aggregation method. The obtained experimental results demonstrate a great improvement over the state-of-the-art, reaching top-1 rank recognition rates of 50.0% and 56.9% on the VIPeR and PRID450S data sets, respectively.
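
The paper fuses the complementary ranking lists with the Stuart aggregation method; as a simpler stand-in, the sketch below shows mean-rank (Borda-style) aggregation, only to illustrate how rankings of the same gallery produced by different feature representations can be combined into a single list.

```python
# Fuse several ranking lists over the same gallery by average rank position.
def aggregate_rankings(ranking_lists):
    """Each ranking list is an ordered sequence of gallery identifiers."""
    positions = {}
    for ranking in ranking_lists:
        for position, gallery_id in enumerate(ranking):
            positions.setdefault(gallery_id, []).append(position)
    # Lower mean position means a better aggregated rank.
    return sorted(positions, key=lambda g: sum(positions[g]) / len(positions[g]))

# Example: three feature channels producing different orders for four identities.
fused = aggregate_rankings([["A", "B", "C", "D"],
                            ["B", "A", "D", "C"],
                            ["A", "C", "B", "D"]])
# fused[0] is the top-ranked gallery identity after aggregation.
```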

Appearance-Based Person Re-identification by Intra-Camera Discriminative Models and Rank Aggregation

R. F. C. Prates, W. R. Schwartz
Conference Paper | IAPR International Conference on Biometrics, pp. 1-8, 2015

Abstract

The main challenges in person re-identification are related to different camera acquisition conditions and high inter-class similarities. These aspects motivated us to handle such problems by learning intra-camera discriminative models, based on training samples, to discover representative individuals for a given sample (probe or gallery), referred to as prototypes. These prototypes are used to weight the features according to their discriminative power by using the Partial Least Squares (PLS) method. We also exploit models built from the gallery and probe samples to generate re-identification results that are combined into a single ranking using rank aggregation techniques. According to the experiments, the proposed method achieves state-of-the-art results. They also demonstrate that aggregating the results achieved by our method with results achieved by a distance metric learning method outperforms the state-of-the-art, e.g., the top-1 rank is increased by almost 10 percentage points on the VIPeR and PRID 450S data sets.

An Optimized Sliding Window Approach to Pedestrian Detection

V. H. C. de Melo, S. Leao, D. Menotti, W. R. Schwartz
Conference Paper | IAPR International Conference on Pattern Recognition, pp. 4346-4351, 2014

Abstract

While the large number of surveillance cameras available nowadays provides a safe environment, the huge amount of data generated by them prevents manual processing, requiring the application of automated methods to understand the scene. However, the majority of the currently available methods are still unable to process this amount of data in real time, mainly those focusing on pedestrian detection. To optimize pedestrian detection methods, this work proposes a novel approach that performs a random filtering supported by the Maximum Search Problem theorem to select a very small number from all possible detection windows. Although the random filtering is able to select regions that capture every person in an image, some windows may cover only parts of a person, diminishing the accuracy. To solve that, a regression is applied to adjust the windows to the person's location. The computational cost reduction comes from the fact that the proposed approach does not need to perform any processing while selecting windows, differently from cascades of rejection that must evaluate at least simple features for every window. The experiments performed using a pedestrian detector based on Partial Least Squares show that the approach is effective in both accuracy and computational cost reduction.

Extending Face Identification to Open-Set Face Recognition

C. E. dos Santos Jr., W. R. Schwartz
Conference Paper | Conference on Graphics, Patterns and Images, pp. 188-195, 2014

Abstract

Face identification plays an important role in biometrics and surveillance. However, before applying face identification methods in real scenarios, we have to determine whether the subject in a test sample is known (enrolled in the face gallery). In this work, we focus on approaches to determine whether a given face sample belongs to a subject enrolled in the face gallery. We show how the approaches can be combined with face identification methods so they can perform open-set face recognition. Among the five approaches described in this work, four are based on responses from the face identification, and one is based on comparisons between known samples and samples from an independent background set. The approaches differ in the features explored in the data, scalability and accuracy. We evaluate the proposed approaches on two standard and challenging datasets for face recognition (FRGC and PubFig83). Results considering different numbers of enrolled subjects show which approach can be considered in scenarios where, for instance, one is interested in recognizing a few wanted subjects.

Smart Surveillance Framework: A Versatile Tool for Video Analysis

A. C. Nazare Jr., Cassio E. dos Santos, R. Ferreira, W. R. Schwartz
Conference Paper | IEEE Winter Conference on Applications of Computer Vision, pp. 753-760, 2014

Abstract

Computer vision problems applied to visual surveillance have been studied for several years, aiming at the accurate and efficient solutions required to run surveillance systems in real environments. The main goal of such systems is to analyze the scene, focusing on the detection and recognition of suspicious activities performed by humans, so that security personnel can pay closer attention to these preselected activities. To accomplish that, several problems have to be solved first, for instance background subtraction, person detection, tracking and re-identification, face recognition, and action recognition. Even though each of these problems has been researched in the past decades, they are rarely considered as a sequence; each one is usually solved individually. However, in a real surveillance scenario, the aforementioned problems have to be solved in sequence considering only videos as the input. Aiming at evaluating approaches in more realistic scenarios, this work proposes a framework called the Smart Surveillance Framework (SSF), which allows researchers to implement their solutions to the above problems as a sequence of processing modules that communicate through a shared memory.

An Adaptive Vehicle License Plate Detection at Higher Matching Degree

R. F. C. Prates, G. Camara-Chavez, W. R. Schwartz, D. Menotti
Conference Paper | Iberoamerican Congress on Pattern Recognition, pp. 375-382, 2014

Abstract

In this paper, a novel approach for vehicle license plate detection that improves in both efficiency and quality over the common multiscale search method is proposed. The detection efficiency is improved by employing the result of a single scale sliding window search as a promising guess of the license plate location. The quality is assured by locally refining the initial detection in multiple scales. The main benefit of our method is that we have reached a more precise detection with the analysis of 20 times fewer detection windows with high reliability (96% recall and 70% precision). We also compared our method with an edge-based hybrid approach.

A Data-Driven Detection Optimization Framework

W. R. Schwartz, V. H. C. de Melo, H. Pedrini, L. S. Davis
Journal Paper | Elsevier Neurocomputing, 104, pp. 35-49, 2013

Abstract

Due to the large amount of data to be processed by visual applications aiming at extracting high-level understanding of the scene, low-level methods such as object detection are required to have not only high accuracy but also low computational cost in order to provide fast and reliable information. Training sets containing samples representing multiple scenes are used to learn object detectors that can be reliably used in different scenarios. In general, information extracted from multiple feature channels is combined to capture the large variability present in these different environments. Although this approach provides accurate detection results, it usually leads to a high computational cost. On the other hand, if characteristics of the scene are known beforehand, a set of simple and fast computing features might be sufficient to provide high accuracy at a low computational cost. Therefore, it is valuable to seek a balance between these two extremes such that the detection method not only works well in different scenarios but also is able to extract enough information from a scene. We integrate a set of data-driven regression models with a multi-stage human detection method trained to be used in different environments. The regressions are used to estimate the detector response at each stage and the location of the objects. The use of the regression models allows the method to reject a large number of detection windows quickly. Experimental results based on human detection show that the addition of the regression models reduces the computational cost by as much as ten times with very small or no degradation in detection accuracy.

Multi-Scale Gray Level Co-Occurrence Matrices for Texture Description

F. R. de Siqueira, W. R. Schwartz, H. Pedrini
Journal Paper | Elsevier Neurocomputing, 120, pp. 336-345, 2013

Abstract

Texture information plays an important role in image analysis. Although several descriptors have been proposed to extract and analyze texture, the development of automatic systems for image interpretation and object recognition is a difficult task due to the complex aspects of texture. Scale is important information in texture analysis, since the same texture can be perceived as different texture patterns at distinct scales. Gray level co-occurrence matrices (GLCM) have proved to be an effective texture descriptor. This paper presents a novel strategy for extending the GLCM to multiple scales through two different approaches: a Gaussian scale-space representation, which is constructed by smoothing the image with larger and larger low-pass filters, producing a set of smoothed versions of the original image; and an image pyramid, which is defined by sampling the image both in space and scale. The performance of the proposed approach is evaluated by applying the multi-scale descriptor on five benchmark texture data sets and the results are compared to other well-known texture operators, including the original GLCM, which, even though faster than the proposed method, is significantly outperformed in accuracy.
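
A hedged sketch of the Gaussian scale-space variant is given below: the image is smoothed with increasingly large Gaussians and a co-occurrence matrix is computed at each scale, with the matrices concatenated into a multi-scale descriptor. The gray-level quantization, offset and sigma values are illustrative assumptions, and the image-pyramid variant is not shown.

```python
# Multi-scale GLCM via a Gaussian scale-space of the input grayscale image.
import numpy as np
from scipy.ndimage import gaussian_filter

def glcm(gray, levels=16, dx=1, dy=0):
    # Quantize to `levels` gray levels and count co-occurrences at offset (dx, dy),
    # assuming dx, dy >= 0 for brevity.
    q = np.clip((gray.astype(float) / 256.0 * levels).astype(int), 0, levels - 1)
    src = q[:q.shape[0] - dy, :q.shape[1] - dx]
    dst = q[dy:, dx:]
    mat = np.zeros((levels, levels), dtype=float)
    np.add.at(mat, (src.ravel(), dst.ravel()), 1.0)
    return mat / mat.sum()

def multiscale_glcm(gray, sigmas=(0.0, 1.0, 2.0, 4.0)):
    descriptors = []
    for sigma in sigmas:
        smoothed = gaussian_filter(gray.astype(float), sigma) if sigma > 0 else gray
        descriptors.append(glcm(np.asarray(smoothed)).ravel())
    return np.concatenate(descriptors)
```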

Face Identification Using Large Feature Sets

W. R. Schwartz, H. Guo, J. Choi, L. S. Davis
Journal Paper | IEEE Transactions on Image Processing, 21 (4), pp. 2245-2255, 2012

Abstract

With the goal of matching unknown faces against a gallery of known people, the face identification task has been studied for several decades. There are very accurate techniques to perform face identification in controlled environments, especially when large numbers of samples are available for each face. However, face identification under uncontrolled environments or with a lack of training data is still an unsolved problem. We employ a large and rich set of feature descriptors (with more than 70,000 descriptors) for face identification using Partial Least Squares (PLS) to perform multi-channel feature weighting. Then, we extend the method to a tree-based discriminative structure to reduce the time required to evaluate probe samples. The method is evaluated on FERET and FRGC datasets. Experiments show that our identification method outperforms current state-of-the-art results, especially for identifying faces acquired across varying conditions.

Person-Specific Subspace Analysis for Unconstrained Familiar Face Identification

G. Chiachia, N. Pinto, W. R. Schwartz, A. Rocha, A. X. Falcao, D. Cox
Conference Paper | British Machine Vision Conference, pp. 1-8, 2012

Abstract

While significant strides have been made in the recognition of faces under controlled viewing conditions, face recognition "in the wild" remains a challenging unsolved problem. Interestingly, while humans are generally excellent at identifying familiar individuals under such conditions, their performance is significantly worse with unfamiliar individuals and groups, leading to the idea that the brain may have enhanced or specialized representations of familiar individuals. Inspired by these observations, we explored the use of different subspace analysis techniques applied to a number of underlying visual representations, to generate person-specific subspaces of "familiar" individuals for face identification. In particular, we introduce a person-specific application of partial least squares (PS-PLS) to generate per-individual subspaces, and show that operating in these subspaces yields state-of-the-art performance on the challenging PubFig83 familiar face identification benchmark. The results underscore the potential importance of incorporating a notion of familiarity into face recognition systems.

A Novel Feature Descriptor Based on the Shearlet Transform

W. R. Schwartz, R. D. da Silva, L. S. Davis, H. Pedrini
Conference Paper | IEEE International Conference on Image Processing, pp. 1033-1036, 2011

Abstract

Problems such as image classification, object detection and recognition rely on low-level feature descriptors to represent visual information. Several feature extraction methods have been proposed, including the Histograms of Oriented Gradients (HOG), which captures edge information by analyzing the distribution of intensity gradients and their directions. In addition to directions, the analysis of edges at different scales provides valuable information. Shearlet transforms provide a general framework for analyzing and representing data with anisotropic information at multiple scales. As a consequence, signal singularities, such as edges, can be precisely detected and located in images. Based on the idea of employing histograms to estimate the distribution of edge orientations and on the accurate multi-scale analysis provided by shearlet transforms, we propose a feature descriptor called Histograms of Shearlet Coefficients (HSC). Experimental results comparing HOG with HSC show that HSC provides significantly better results for the problems of texture classification and face identification.

Face Verification using Large Feature Sets and One Shot Similarity

H. Guo, W. R. Schwartz, L. S. Davis
Conference Paper | International Joint Conference on Biometrics, pp. 1-8, 2011

Abstract

We present a method for face verification that combines Partial Least Squares (PLS) and the One-Shot similarity model. First, a large feature set combining shape, texture and color information is used to describe a face. Then PLS is applied to reduce the dimensionality of the feature set with multi-channel feature weighting. This provides a discriminative facial descriptor. PLS regression is used to compute the similarity score of an image pair by One-Shot learning. Given two feature vectors representing face images, the One-Shot algorithm learns discriminative models exclusively for the vectors being compared. A small set of unlabeled images, not containing images belonging to the people being compared, is used as a reference (negative) set. The approach is evaluated on the Labeled Faces in the Wild (LFW) benchmark and shows results very comparable to state-of-the-art methods (achieving 86.12% classification accuracy) while maintaining simplicity and good generalization ability.

Human Detection Using Partial Least Squares Analysis

W. R. Schwartz, A. Kembhavi, D. Harwood, L. S. Davis
Conference Paper | IEEE International Conference on Computer Vision, pp. 24-31, 2009

Abstract

Significant research has been devoted to detecting people in images and videos. In this paper we describe a human detection method that augments widely used edge-based features with texture and color information, providing us with a much richer descriptor set. This augmentation results in an extremely high-dimensional feature space (more than 170,000 dimensions). In such high-dimensional spaces, classical machine learning algorithms such as SVMs are nearly intractable with respect to training. Furthermore, the number of training samples is much smaller than the dimensionality of the feature space, by at least an order of magnitude. Finally, the extraction of features from a densely sampled grid structure leads to a high degree of multicollinearity. To circumvent these data characteristics, we employ Partial Least Squares (PLS) analysis, an efficient dimensionality reduction technique, one which preserves significant discriminative information, to project the data onto a much lower dimensional subspace (20 dimensions, reduced from the original 170,000). Our human detection system, employing PLS analysis over the enriched descriptor set, is shown to outperform state-of-the-art techniques on three varied datasets including the popular INRIA pedestrian dataset, the low-resolution gray-scale DaimlerChrysler pedestrian dataset, and the ETHZ pedestrian dataset consisting of full-length videos of crowded scenes.
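
The dimensionality-reduction step can be sketched as below: window descriptors are projected onto a small number of PLS latent dimensions learned from labeled human and background windows, and a simple classifier operates in that low-dimensional space. The stand-in feature dimensionality, the component count and the linear SVM used downstream are illustrative assumptions, not the paper's exact configuration.

```python
# PLS as supervised dimensionality reduction for window classification.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.svm import LinearSVC

def train_pls_detector(features, labels, n_components=20):
    # labels: +1 for human windows, -1 for background windows.
    pls = PLSRegression(n_components=n_components).fit(features, labels.astype(float))
    low_dim = pls.transform(features)          # project onto the PLS latent space
    clf = LinearSVC().fit(low_dim, labels)     # stand-in classifier in low dimensions
    return pls, clf

def score_windows(pls, clf, features):
    return clf.decision_function(pls.transform(features))

# Usage with random stand-in data (real descriptors would be far higher dimensional):
X = np.random.randn(200, 1000)
y = np.where(np.arange(200) < 100, 1, -1)
pls, clf = train_pls_detector(X, y)
scores = score_windows(pls, clf, X[:5])
```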

Learning Discriminative Appearance-Based Models Using Partial Least Squares

W. R. Schwartz, L. S. Davis
Conference Paper | Brazilian Symposium on Computer Graphics and Image Processing, pp. 1-8, 2009

Abstract

Appearance information is essential for applications such as tracking and people recognition. One of the main problems of using appearance-based discriminative models is the ambiguities among classes when the number of persons being considered increases. To reduce the amount of ambiguity, we propose the use of a rich set of feature descriptors based on color, textures and edges. Another issue regarding appearance modeling is the limited number of training samples available for each appearance. The discriminative models are created using a powerful statistical tool called Partial Least Squares (PLS), responsible for weighting the features according to their discriminative power for each different appearance. The experimental results, based on appearance-based person recognition, demonstrate that the use of an enriched feature set analyzed by PLS reduces the ambiguity among different appearances and provides higher recognition rates when compared to other machine learning techniques.

Análise de Imagens Digitais: Princípios, Algoritmos e Aplicações

Hélio Pedrini, William Robson Schwartz
Book | In Portuguese | Thomson Learning | 2008 | ISBN-13: 978-85-221-0595-3

Selected as one of the 10 finalists for the 2008 Prêmio Jabuti in the category Exact Sciences, Technology and Informatics

This book comprehensively covers the main topics related to image processing and analysis, grounding them from a mathematical point of view and illustrating them with numerous examples to aid the understanding of their theoretical and practical aspects. The selection of topics aims to cover the subjects most relevant to the areas under investigation, highlighting concepts and techniques related to image acquisition, enhancement, segmentation, representation, compression, classification and registration.

All chapters contain exercises, a large number of images illustrating the described techniques, examples, algorithms and bibliographic references to facilitate the understanding of the topics covered. The algorithms are described in pseudocode, making them more readable for readers with little programming experience.



Complete List of Publications