Building 3D models of objects, faces and humans from single and multiple cameras is a challenging problem. Over the years, we have developed methods based on shape from shading, stereo and structure from motion. One application we have worked on is markerless motion capture, in which several calibrated and synchronized cameras are used to build accurate 3D models of humans. More recently, we have used a generative adversarial model to build 3D mesh models of humans. These models can be used for AR/VR applications, diagnosis of movement-related disorders, and face recognition under illumination and pose variations.

A. Ghosh and R. Chellappa, “Single-Shot 3D Mesh Estimation via Adversarial Domain Adaptation – Learning Directly from Synthetic Data”, SN Computer Science, 1(1): 25:1-25:21, 2020.

Read More

Despite the power of deep architectures, localizing small activities of variable duration in large video frames remains an ongoing challenge. The underlying activities are defined by movements of people and vehicles across video frames, or by their interactions with objects. The major challenges in this domain are the scarcity of activities in long videos, detecting and tracking small objects, and identifying activity types with limited supervision. In addition, high-performance systems require large amounts of computation and therefore create performance bottlenecks on many hardware platforms.


  1. J. Gleason, R. Ranjan, S. Schwarcz, C. D. Castillo, J. C. Chen and R. Chellappa, “A proposal-based solution to spatio-temporal action detection in untrimmed videos”, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 2019.
  2. J. Gleason, C. D. Castillo, R. Chellappa, “Real-time detection of activities in untrimmed videos”, Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), 2020.


Read More

This multicenter collaboration focuses on the use of artificial intelligence to improve long-term health and independence for older people.

Read More

Recent years have witnessed the great success of deep neural networks (DNNs) in various fields of artificial intelligence (AI), including computer vision, speech, and robot control. Despite their superior performance, DNNs have been shown to be vulnerable to adversarial attacks that add often imperceptible manipulations to inputs in order to mislead the models. This poses a serious challenge in security-critical applications such as autonomous driving and medicine. We develop principled algorithms for defending AI systems against adversarial attacks. Leveraging the representational power of Generative Adversarial Networks (GANs), we proposed Defense-GAN and InvGAN, which project input images onto the generator manifold to remove adversarial perturbations. We continue to develop novel defense algorithms by exploiting manifold information, and we study the characteristics of adversarial attacks as well as their effects in real-world applications.

Relevant Publications:

W. Lin, Y. Balaji, P. Samangouei, and R. Chellappa, “Invert and Defend: Model-based Approximate Inversion of Generative Adversarial Networks for Secure Inference”, arXiv preprint arXiv:1911.10291.

P. Samangouei, M. Kabkab and R. Chellappa, “Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models”, ICLR, Vancouver, Canada, April 2018.
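The projection step behind Defense-GAN can be illustrated with a toy, fully linear "generator" (our stand-in for a trained GAN): given a possibly perturbed input x, search the latent space for the code z* whose generation G(z*) is closest to x, and use G(z*) in place of x. All names and shapes here are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 3))          # toy "generator": G(z) = W @ z

def G(z):
    return W @ z

def project_to_manifold(x, steps=1000, lr=0.02):
    """Gradient descent on z to minimize ||G(z) - x||^2 (the Defense-GAN idea)."""
    z = np.zeros(W.shape[1])
    for _ in range(steps):
        grad = 2 * W.T @ (G(z) - x)      # d/dz ||Wz - x||^2
        z -= lr * grad
    return G(z)

x_clean = G(rng.standard_normal(3))                 # a point on the generator manifold
x_adv = x_clean + 0.3 * rng.standard_normal(8)      # "adversarial" perturbation

x_purified = project_to_manifold(x_adv)
# The purified input lies on the manifold and is closer to the clean point.
print(np.linalg.norm(x_purified - x_clean), np.linalg.norm(x_adv - x_clean))
```

Because the off-manifold component of the perturbation is discarded by the projection, the purified input is closer to the clean sample than the attacked one; the classifier then runs on the purified input.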

Read More

Pose has been shown to be a discriminative cue for inferring activities. This project aims to improve action recognition using pose as input. Instead of relying on 3D skeletons from a depth camera, our approach takes heatmaps from an off-the-shelf pose extractor, enabling application to standard action datasets. Our current model improves over state-of-the-art pose-based models on JHMDB, HMDB and Charades through enhanced temporal features, effective data augmentations, and improved modeling of joint motion information that focuses on the most discriminative joints.
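A common recipe for the heatmap input (our assumption of the standard formulation, not this project's exact code) is to render one Gaussian bump per joint from the 2D coordinates returned by the pose extractor:

```python
import numpy as np

def joints_to_heatmaps(joints, height, width, sigma=2.0):
    """joints: (J, 2) array of (x, y) pixel coordinates -> (J, H, W) heatmaps."""
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.empty((len(joints), height, width))
    for j, (x, y) in enumerate(joints):
        # Gaussian centered on the joint; sigma controls spatial uncertainty.
        maps[j] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps

heatmaps = joints_to_heatmaps(np.array([[10.0, 12.0], [30.0, 5.0]]), 48, 64)
print(heatmaps.shape)                                             # (2, 48, 64)
print(np.unravel_index(heatmaps[0].argmax(), heatmaps[0].shape))  # peaks at (12, 10)
```

Stacking such per-joint maps over time gives a video-like tensor that standard action-recognition backbones can consume directly.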

Read More

We have developed an end-to-end system for unconstrained face verification and search. Our system builds robust, succinct face representations from many faces of the same identity that are amenable to accurate and efficient verification, search, clustering and indexing. The system has been tested extensively on both public and sequestered datasets and exhibits state-of-the-art performance. In building this system we have made advances in face detection, keypoint detection and loss functions for training deep convolutional neural networks.


  1. R. Ranjan, S. Sankaranarayanan, A. Bansal, N. Bodla, J. C. Chen, V. M. Patel, C. D. Castillo and R. Chellappa, “Deep learning for understanding faces: Machines may be just as good, or better, than humans”, IEEE Signal Processing Magazine, 2018.
  2. R. Ranjan, S. Sankaranarayanan, C. D. Castillo and R. Chellappa, “An all-in-one convolutional neural network for face analysis”, 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG), 2017.
  3. A. Kumar, A. Alavi and R. Chellappa, “KEPLER: Keypoint and pose estimation of unconstrained faces by learning efficient H-CNN regressors”, 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG), 2017.
  4. R. Ranjan, V. M. Patel and R. Chellappa, “A deep pyramid deformable part model for face detection”, 7th IEEE International Conference on Biometrics Theory, Applications and Systems (BTAS), 2015.
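The final stage of such a pipeline can be sketched as follows (a hedged illustration with a stubbed-out embedding model; real systems use a deep CNN): many embeddings of the same identity are pooled into one compact template, and a pair is verified by thresholding the cosine similarity between templates.

```python
import numpy as np

def make_template(embeddings):
    """Average L2-normalized embeddings, then re-normalize -> one template vector."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    t = e.mean(axis=0)
    return t / np.linalg.norm(t)

def verify(t1, t2, threshold=0.5):
    """Cosine similarity of unit-norm templates, thresholded."""
    return float(t1 @ t2) >= threshold

rng = np.random.default_rng(1)
id_a = rng.standard_normal(128)                       # stand-in identity directions
id_b = rng.standard_normal(128)
faces_a = id_a + 0.2 * rng.standard_normal((5, 128))  # noisy "images" of identity A
faces_b = id_b + 0.2 * rng.standard_normal((5, 128))

ta, tb = make_template(faces_a), make_template(faces_b)
same_id = verify(ta, make_template(id_a + 0.2 * rng.standard_normal((3, 128))))
diff_id = verify(ta, tb)
print(same_id, diff_id)                               # same identity accepted, different rejected
```

Averaging normalized embeddings makes the template's size independent of how many images an identity has, which is what makes large-scale search and indexing tractable.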

Read More

In this project we aim to extract highly discriminative features from a vehicle image that can naturally facilitate tracking vehicles across a network of traffic cameras. Unlike prior methods that mainly use the license plate to distinguish vehicles, we rely only on visual cues for re-identification. The extracted feature vector has much lower dimensionality than the original vehicle image and serves as a representation that preserves the critical attributes of a vehicle identity, such as the vehicle’s model, trim, color, and the design of its head/tail lights, grille, bumpers and wheels. However, this task is challenging due to a number of factors: variations in a vehicle’s orientation, illumination and occlusion can significantly degrade the quality of feature extraction.

In this project we design Deep Convolutional Neural Network (DCNN) models that not only consider the entire image of the given vehicle, but also place attention on local regions that carry discriminative information about the vehicle’s identity. This particularly alleviates the problems of orientation and occlusion, as these models learn to associate vehicles not only by focusing on the entire vehicle image but also through local regions that may remain visible under partial occlusion and across overlapping orientations.
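The global-plus-local idea can be sketched as follows (names, shapes and the softmax pooling are our own illustration): each region feature is weighted by an attention score, so occluded regions contribute less, and the pooled local part is concatenated with the global descriptor.

```python
import numpy as np

def fuse(global_feat, region_feats, attention):
    """global_feat: (Dg,), region_feats: (R, Dl), attention: (R,) scores."""
    w = np.exp(attention) / np.exp(attention).sum()   # softmax over regions
    local = (w[:, None] * region_feats).sum(axis=0)   # attention-pooled local feature
    fused = np.concatenate([global_feat, local])
    return fused / np.linalg.norm(fused)              # unit-norm re-ID descriptor

g = np.ones(4)                                        # toy global descriptor
regions = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
attn = np.array([2.0, 2.0, -10.0])                    # third region occluded -> low score
v = fuse(g, regions, attn)
print(v.shape)                                        # (6,): global (4) + local (2)
```

Because the occluded region's weight is driven toward zero by the softmax, its (unreliable) feature barely affects the final descriptor.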

  • Adaptive Attention for Vehicle Re-Identification

Adaptive Attention for Vehicle Re-Identification (AAVER) localizes the vehicle’s key-points and adaptively selects the most prominent ones based on the vehicle’s orientation. AAVER then extracts both global information and local information in the vicinity of the selected key-points and encodes them into a single feature vector.

  1. Khorramshahi, P., Kumar, A., Peri, N., Rambhatla, S.S., Chen, J.C. and Chellappa, R., 2019. A dual-path model with adaptive attention for vehicle re-identification. In Proceedings of the IEEE International Conference on Computer Vision (pp. 6132-6141).
  2. Khorramshahi, P., Peri, N., Kumar, A., Shah, A., & Chellappa, R. (2019, June). Attention Driven Vehicle Re-identification and Unsupervised Anomaly Detection for Traffic Understanding. In CVPR Workshops (pp. 239-246).
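The orientation-conditioned selection can be illustrated as below; the keypoint sets, map sizes and pooling are made up for the sketch and are not AAVER's actual configuration:

```python
import numpy as np

VISIBLE = {                        # hypothetical orientation -> visible keypoint indices
    "front": [0, 1, 2],            # e.g. headlights, grille, windshield corners
    "rear": [3, 4, 5],             # e.g. tail lights, plate area, rear bumper
}

def local_features(feature_map, keypoints, orientation):
    """feature_map: (C, H, W); keypoints: (K, 2) integer (row, col) positions.

    Keeps only the keypoints typically visible from the given orientation and
    reads the backbone features out at those locations."""
    selected = VISIBLE[orientation]
    feats = [feature_map[:, r, c] for r, c in keypoints[selected]]
    return np.stack(feats).mean(axis=0)        # pooled local descriptor, (C,)

fmap = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
kps = np.array([[0, 0], [0, 1], [1, 1], [2, 2], [3, 3], [3, 0]])
print(local_features(fmap, kps, "front").shape)   # (2,)
```

Conditioning the selection on orientation is what makes the attention "adaptive": a rear-view image never tries to pool features at (invisible) headlight locations.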


  • Self-Supervised Attention for Vehicle Re-Identification

Self-Supervised Attention for Vehicle Re-Identification (SAVER) bypasses the requirement of extra annotations for vehicle key-points and part bounding boxes to train specialized detectors. Instead, SAVER generates a coarse version of the vehicle image that is free of the fine details needed for successful re-identification. From it, SAVER produces a sparse residual image that mainly contains these details, which is used to enhance and highlight them in the original image via a trainable convex combination. By adopting the latest practices in deep feature extraction, SAVER yields highly discriminative features.

  1. P. Khorramshahi, N. Peri, J. C. Chen and R. Chellappa, “The Devil is in the Details: Self-Supervised Attention for Vehicle Re-Identification”, Proceedings of the European Conference on Computer Vision (ECCV), August 2020.
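The residual idea can be demonstrated with a crude stand-in for SAVER's coarse generator (here a box blur instead of a learned reconstruction; the mixing weight and shapes are illustrative): details removed by the coarse pass reappear, concentrated, in the residual.

```python
import numpy as np

def box_blur(img, k=3):
    """Crude coarse reconstruction: k x k box filter with edge padding."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dr in range(k):
        for dc in range(k):
            out += p[dr:dr + img.shape[0], dc:dc + img.shape[1]]
    return out / (k * k)

def saver_input(img, alpha=0.8):
    """Convex combination of the image and its residual detail map."""
    residual = img - box_blur(img)           # sparse: large only near fine details
    return alpha * img + (1 - alpha) * residual

img = np.zeros((8, 8))
img[4, 4] = 1.0                              # a single sharp "detail"
res = img - box_blur(img)
print(np.abs(res).max(), np.abs(res).mean()) # residual energy is highly peaked
print(saver_input(img).shape)
```

Because the residual is near zero everywhere except at discriminative details, mixing it back in acts as an annotation-free attention map over exactly the regions that matter for re-identification.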


  • Excited Vehicle Re-Identification

Excited Vehicle Re-Identification (EVER) builds on the residual-generation idea developed in SAVER to excite the intermediate feature maps of a single-path DCNN, but only during the training phase. This helps the DCNN learn to focus on the critical regions of the vehicle that aid re-identification. Since computing residuals and exciting intermediate feature maps occur only during training, EVER’s inference is very fast, which makes it appealing for real-time applications.

  1. Peri, N., Khorramshahi, P., Saketh Rambhatla, S., Shenoy, V., Rawat, S., Chen, J. C., & Chellappa, R. (2020). Towards real-time systems for vehicle re-identification, multi-camera tracking, and anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 622-623).
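The training-only excitation can be sketched as follows (the mask construction and shapes are our own illustration): a residual map marks detail-rich locations, the intermediate feature map is amplified there during training, and the branch is skipped entirely at inference.

```python
import numpy as np

def excite(feature_map, residual, training):
    """feature_map, residual: (H, W). Excite detail-rich regions in training mode only."""
    if not training:
        return feature_map                   # inference: the branch costs nothing
    mask = np.abs(residual) / (np.abs(residual).max() + 1e-8)
    return feature_map * (1.0 + mask)        # amplify where the residual is large

fm = np.ones((4, 4))
res = np.zeros((4, 4))
res[1, 2] = 2.0                              # one detail-rich location
train_out = excite(fm, res, training=True)
infer_out = excite(fm, res, training=False)
print(train_out[1, 2], infer_out[1, 2])      # amplified during training, untouched at inference
```

Because the excitation only reshapes gradients during training, the deployed network is the plain single-path backbone, which is where EVER's inference speed comes from.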


Read More