|Research at the Statistical Visual Computing Lab covers a wide range of subjects in the areas of computer vision, image processing, machine learning, and multimedia. Below are descriptions of some ongoing and past projects. Please check the publications page too, as we currently do not have a web page for some of the projects.|
|Architecture tuning for multi-domain recognition
Real-world applications of object recognition often require efficiently solving multiple tasks in a single platform. To address this problem, we propose a transfer learning procedure in which layers of a pre-trained CNN are used as universal blocks that can be combined with small task-specific layers to generate new architectures for each task.
|High-Quality Object Detection
Cascade R-CNN is a novel and very simple architecture for high-quality object detection. Its detections tightly cover the ground-truth objects. Cascade R-CNN has achieved state-of-the-art performance on many popular object detection datasets, including COCO, and it is widely used by winning teams in many object detection challenges.
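"Tight coverage" of a ground-truth object is commonly quantified with the intersection over union (IoU) between the predicted box and the ground-truth box. The sketch below is a generic illustration of that measure, not code from Cascade R-CNN; the `(x1, y1, x2, y2)` box format, the function name `iou`, and the example boxes are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: a loose detection and a tight detection of one object.
ground_truth = (10, 10, 110, 110)
loose = (30, 30, 130, 130)
tight = (12, 12, 112, 112)

print("loose IoU:", iou(ground_truth, loose))
print("tight IoU:", iou(ground_truth, tight))
```

A detection that hugs the object more closely scores a higher IoU, which is the sense in which a detector can be called high-quality.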
|Universal Object Detection
We introduced a new universal object detector that works across a large number of different domains, from everyday objects and faces to medical lesions. Computations and parameters are shared across domains. It can sometimes outperform domain-specific object detectors.
|Low-precision Neural Networks
HWGQ is a new quantization technique for low-precision neural networks, in which both weights and activations are quantized to low bit-width. This reduces model size and computation by about 32 times, enabling neural networks to run on non-GPU devices, e.g. CPUs or FPGAs. HWGQ-Net also achieves performance very close to the full-precision baselines for almost all popular networks, including AlexNet, VGG-Net, ResNet, and GoogLeNet.
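As a rough illustration of what quantizing both weights and activations means, the sketch below binarizes weights by sign and uniformly quantizes non-negative activations. It is a generic example under stated assumptions, not the HWGQ quantizer itself: the function names, the 1-bit weight and 2-bit activation widths, and the uniform step size are all chosen for illustration. The final helper shows the arithmetic behind a 32-fold size reduction when 32-bit floating-point weights are replaced by 1-bit weights.

```python
import math


def binarize_weights(weights):
    """Map each weight to +alpha or -alpha, with alpha the mean absolute value.

    Assumption: simple sign binarization with one scaling factor per list.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]


def quantize_activations(activations, bits=2, step=1.0):
    """Uniformly quantize activations to non-negative levels.

    Negative inputs are clipped to zero (as after a ReLU); positive inputs
    are rounded to the nearest multiple of `step`, capped at (2**bits - 1) * step.
    """
    max_level = 2 ** bits - 1
    quantized = []
    for a in activations:
        level = math.floor(a / step + 0.5)       # round half up
        level = min(max_level, max(0, level))    # clip to the valid levels
        quantized.append(level * step)
    return quantized


def compression_ratio(full_bits=32, low_bits=1):
    """Storage reduction when each value shrinks from full_bits to low_bits."""
    return full_bits / low_bits


print(binarize_weights([0.5, -1.5, 1.0, -1.0]))
print(quantize_activations([-1.0, 0.2, 1.4, 2.6, 9.0]))
print(compression_ratio())
```

With these assumed settings, every weight needs a single bit plus one shared scale, and every activation takes one of four levels.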
|Multi-Scale Object Detection
MS-CNN is an efficient and effective multi-scale object detection architecture, which has achieved state-of-the-art results on multiple detection tasks, including vehicle, pedestrian, and face detection, at real-time running speeds.
|Objects Obtained With fLight (OOWL)
The OOWL dataset is a real-world multiview dataset collected with drones, enabling flexibility and scalability. Currently, OOWL contains 120,000 images of 500 objects and is the largest "in the lab" multiview image dataset available when both the number of classes and the number of objects per class are considered. OOWL is designed to have large class overlap with ImageNet and ModelNet, and has multiple domains.
|Self-Supervised Generation of Spatial Audio for 360° Video
We introduce an approach to convert mono audio recorded by a 360° video camera into spatial audio, a representation of the distribution of sound over the full viewing sphere. Our system consists of end-to-end trainable neural networks that separate individual sound sources and localize them on the viewing sphere, conditioned on multi-modal analysis of audio and 360° video frames. We introduce several datasets consisting of 360° videos with spatial audio.
|CNN based Semantic Transfer
We investigate the feasibility of cross-domain transfer in vision, specifically in the case where the source domain is that of objects and the target domain is holistic scenes. We approach the problem using the description of a scene as a Bag-of-Semantics (BoS). Sophisticated non-linear embeddings of a scene BoS are proposed for holistic inference.
|Attribute Guided Data Augmentation
We propose a new approach to data synthesis called attribute guided augmentation. The underlying principle is to generate new examples from a given object image by hallucinating changes in its attributes, such as 3D pose and scene depth. The ability to synthesize non-trivial variations of data is found to be beneficial, especially in few-shot learning scenarios.
|Complex Video Event Understanding
We develop algorithms for complex event understanding that exploit temporal structure modeling of video and semantic attribute representations. The aim is to enable intelligent analysis, encoding, and retrieval of informative semantic content from open-source video data (e.g., typical sequences on YouTube).
We investigate the design of vision-based control algorithms for unmanned aerial vehicles (UAVs), so as to enable a UAV to autonomously follow a person. A new vision-based control architecture is proposed with the goals of 1) robustly following the user and 2) implementing following behaviors programmed by manipulation of visual patterns.
|Regularization on (Content-Based) Image Retrieval
Representation of images in semantic spaces is at the very core of many problems in Computer Vision. We provide an approach to transfer knowledge from texts associated with images in order to improve the accuracy of image semantic representations. Results are shown for the task of content-based image retrieval on three different datasets.
|Cross-Modal Multimedia Retrieval
The problem of jointly modeling the text and image components of multimedia documents is studied. Two hypotheses are investigated: that 1) there is a benefit to explicitly modeling correlations between the two components, and 2) this modeling is more effective in feature spaces with higher levels of abstraction.
|Compressed-based Saliency Measure: In this work, we propose a simple and effective saliency estimation method for video based on a new compressed-domain feature.|
|Training Detector Cascade: In this work, the problem of automatic and optimal design of embedded object detector cascades is considered.|
|Pedestrian Detection: The goal of this project is to build pedestrian detectors with low false-positive and high detection rates, which can operate in real-time.|
|Scaling Rapid Object Detection: The goal of this project is to design the algorithms needed to scale real-time object detection to thousands of objects.|
|TaylorBoost: First and Second Order Boosting Algorithms: In this project, a new family of boosting algorithms, denoted TaylorBoost, is proposed.|
|Multiclass Boosting: MCBoost: In this project, the problem of multi-class boosting is considered. A new framework, based on multi-dimensional codewords and predictors, is introduced.|
|Multi-Resolution Cascade: In this project, the problem of designing a multiclass detector cascade is considered.|
|Boosting Algorithms for Simultaneous Feature Extraction and Selection: In this project the problem of simultaneous feature extraction and selection, for classifier design, is considered.|
|Automated and Distributed Crowd Analytics: In this work, we developed a common experimental platform, which connects a distributed camera network with SVCL Analytics, specifically Crowd Counting.|
|Holistic Context Models for Visual Recognition: In this work, we investigate an approach to context modeling based on the probability of co-occurrence of objects and scenes. This modeling is quite simple, and builds upon the availability of robust appearance classifiers.|
|Feedforward saliency network with a trainable neuron model: We investigate the biological plausibility of statistical inference and learning, tuned to the statistics of natural images. It is shown that a rich family of statistical decision rules, confidence measures and risk estimates can be implemented with the computations of the standard neurophysiological model of V1. This is used to augment object recognition networks with top-down saliency.|
|Amorphous object detection in the wild: A discriminant saliency network is applied to the problem of amorphous object detection. Amorphous objects are defined as objects without distinctive edge or shape structure.|
|Panda detection: In this project, we use a deep learning framework, Caffe, to detect pandas. The reference Caffe model is fine-tuned with images collected from the San Diego Zoo panda cam.|
|Anomaly Detection: anomaly detection in crowded scenes using a mixture of dynamic textures representation.|
|Biological Plausibility of Discriminant Tracking: Psychophysics experiments and neurophysiological evidence to demonstrate the biological plausibility of the connections between discriminant center surround saliency and tracking.|
|Discriminant Tracking: a biologically inspired framework for tracking based on discriminant center surround saliency.|
|Semantic Image Representation: A novel image representation is proposed where a semantic space is defined and images are represented on the semantic space as a posterior probability distribution for a given vocabulary of concepts. Benefits of semantic image representation are illustrated through the design of two visual recognition systems: Query by Semantic Example for the task of image retrieval and Low-dimensional Semantic Space Classification for scene classification.|
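The idea of representing an image as a posterior distribution over a concept vocabulary, and then retrieving images by comparing those distributions, can be sketched as follows. This is a minimal illustration under assumptions, not the project's implementation: the per-concept log-likelihoods are made-up numbers, the function names are hypothetical, and the use of KL divergence as the similarity measure is an assumption made for the example.

```python
import math


def semantic_posterior(concept_log_likelihoods, priors=None):
    """Bayes rule: P(concept | image) is proportional to P(image | concept) * P(concept)."""
    n = len(concept_log_likelihoods)
    if priors is None:
        priors = [1.0 / n] * n  # assumption: uniform prior over concepts
    log_joint = [ll + math.log(p) for ll, p in zip(concept_log_likelihoods, priors)]
    peak = max(log_joint)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(v - peak) for v in log_joint]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def kl_divergence(p, q, eps=1e-12):
    """KL divergence between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def query_by_semantic_example(query, database):
    """Return database image names sorted from most to least similar to the query."""
    return sorted(database, key=lambda name: kl_divergence(query, database[name]))


# Hypothetical vocabulary and made-up log-likelihoods, for illustration only.
vocabulary = ["sky", "water", "building"]
query = semantic_posterior([-1.0, -1.2, -5.0])
database = {
    "street": semantic_posterior([-4.0, -6.0, -0.5]),
    "beach": semantic_posterior([-1.1, -1.0, -6.0]),
}

print(dict(zip(vocabulary, query)))
print(query_by_semantic_example(query, database))
```

Because every image lives in the same low-dimensional space of concept probabilities, the same vectors could also be fed to a classifier for scene classification.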
|Discriminant Hypothesis for Visual Saliency: a decision-theoretic formulation of visual saliency, its biological plausibility, and applications to computer vision.|
|Bottom-up Saliency and Its Biological Plausibility: biological plausibility of bottom-up saliency by combination of the discriminant hypothesis and center-surround operators.|
|Top-down Discriminant Saliency: learning discriminant salient features for visual recognition.|
|Understanding Video of Crowded Environments: motion segmentation and motion classification in video of crowded environments, such as pedestrian scenes and highway traffic.|
|Dynamic Textures: A family of generative stochastic dynamic texture models for analyzing motion.|
|Background Subtraction: Background subtraction in dynamic scenes.|
|Semantic Image Annotation and Retrieval: automatically labeling images with content-based keywords, and image retrieval via automatic annotations.|
|Pedestrian Crowd Counting: estimate the size of moving crowds in a privacy preserving manner, i.e. without people models or tracking.|
|Classification and Retrieval of Traffic Video: classification of traffic video using a generative probabilistic motion model and probabilistic kernel classifiers.|
|Motion Segmentation: robust segmentation of motion in video.|
|Classifier Loss Function Design: The design and theory of Bayes consistent loss functions and classifiers with applications.|
|Real-Time Object Detection Cascades: Real-time face, car, pedestrian and logo detection in images.|
|Real-Time EEG Surprise Signal Detection: Cost sensitive boosting and real-time detection cascades for EEG surprise signal detection.|
|Cost Sensitive Learning: The design and theory of cost sensitive classifiers with applications.|
|Image compression using Object-based Regions of Interest: learning ROI masks for image and video coding at very low bit-rates|
|Optimal Features for Large-scale Visual Recognition: learning algorithms for feature design that are optimal, in the minimum probability of error sense, and scalable in the number of visual classes.|
|Probabilistic Kernel Classifiers: design of kernel functions between probability densities.|
|Minimum Probability of Error Image Retrieval: optimal search of large image collections with content-based queries.|
|Semantic Image Classification: augmenting retrieval systems with understanding of image semantics.|
|Learning Mixture Hierarchies: learning hierarchical mixture models for efficient classifier design, image indexing, and semantic classification hierarchies.|
|Measuring Image Manifold Distances: image similarity measures that are invariant to spatial transformations.|
|Motion Analysis: motion models and estimation algorithms for segmentation, mosaicking, and layered representations.|
|Modeling the Structure of Video: statistical models of video structure and Bayesian inference procedures for improved parsing and semantic classification.|
Copyright @ 2007 www.svcl.ucsd.edu