Structural Sparse Tracking

Robust Structural Sparse Tracking

Tianzhu Zhang, Changsheng Xu, Ming-Hsuan Yang

Abstract

Sparse representations have been applied to visual tracking by finding the best candidate region with minimal reconstruction error based on a set of target templates. However, most existing sparse trackers only consider holistic or local representations and do not make full use of the intrinsic structure among and inside target candidate regions, which makes them less effective when similar objects appear in close proximity or under occlusion. In this paper, we propose a novel structural sparse representation, which not only exploits the intrinsic relationships among target candidate regions and local patches to learn their representations jointly, but also preserves the spatial structure among the local patches inside each target candidate region. For robust visual tracking, we take outliers resulting from occlusion and noise into account when searching for the best target region. We show that the proposed algorithm, constructed within a Bayesian filtering framework, accommodates most existing sparse trackers with their respective merits. The formulated problem can be efficiently solved using an accelerated proximal gradient method that yields a sequence of closed-form updates. Qualitative and quantitative evaluations on challenging benchmark datasets demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods.

Overview

Global sparse appearance model: Figure 1(a) shows the global sparse appearance model [23, 25, 26, 27, 28, 33]. These trackers adopt a holistic representation of the target as the appearance model, and tracking is carried out by solving $\ell_{1}$ minimization problems. Each target candidate $x_{i}$ is represented as a sparse linear combination of the templates in $T$. Because each candidate is encoded as a whole, these methods are less effective in handling heavy occlusion.
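As a rough illustration of the global model, the sketch below codes each holistic candidate against $T$ with a plain lasso solved by ISTA and keeps the candidate with the smallest reconstruction error. It is a minimal sketch, assuming vectorized candidates and templates stored as NumPy columns; the function names and the value of `lam` are hypothetical, and it omits details of the cited trackers such as trivial templates or non-negativity constraints.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding: the proximal operator of tau * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_ista(T, x, lam, n_iter=500):
    """Approximately solve min_c 0.5 * ||x - T c||_2^2 + lam * ||c||_1 with ISTA."""
    L = np.linalg.norm(T, 2) ** 2          # Lipschitz constant of the gradient
    c = np.zeros(T.shape[1])
    for _ in range(n_iter):
        grad = T.T @ (T @ c - x)           # gradient of the quadratic term
        c = soft_threshold(c - grad / L, lam / L)
    return c

def best_candidate_global(T, X, lam=0.01):
    """Code each column of X (a holistic candidate) with the templates T and
    return the index of the candidate with the smallest reconstruction error."""
    errors = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        c = lasso_ista(T, X[:, i], lam)
        errors[i] = np.sum((X[:, i] - T @ c) ** 2)
    return int(np.argmin(errors)), errors

# Toy usage: candidate 1 is an exact mix of two templates, the others are noise.
rng = np.random.default_rng(0)
T = rng.standard_normal((64, 10))
T /= np.linalg.norm(T, axis=0)             # unit-norm templates
X = np.column_stack([rng.standard_normal(64),
                     0.7 * T[:, 2] + 0.3 * T[:, 5],
                     rng.standard_normal(64)])
best, errors = best_candidate_global(T, X)
print(best)                                # 1
```

Note that nothing in this sketch ties the candidates together: each one is coded in isolation, which is the limitation the joint and structural models address.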

Local sparse appearance model: Figure 1(b) shows the local sparse appearance model [24, 30]. These trackers represent each local patch inside a target candidate $x_{i}$ as a sparse linear combination of the local patches in $T$. Note that the local patches inside the target candidate $x_{i}$ may be sparsely represented by the corresponding local patches inside different dictionary templates. Although this model addresses some issues of global sparse appearance models, such tracking algorithms [24, 30] consider neither the spatial layout structure among the local patches inside each target candidate nor the correlations among the local patches from all target candidates.
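The local model can be sketched by splitting every region into $K$ patches and coding patch $k$ of a candidate only against patch $k$ of each template, independently of the other patches. This is a minimal sketch rather than the exact method of [24, 30]; the helper names and the assumption that a vectorized region splits into $K$ equal, contiguous blocks are hypothetical. Because each patch is solved on its own, different patches of the same candidate can select different templates, as described above.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_ista(D, y, lam, n_iter=500):
    """Approximately solve min_c 0.5 * ||y - D c||_2^2 + lam * ||c||_1 with ISTA."""
    L = np.linalg.norm(D, 2) ** 2
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        c = soft_threshold(c - D.T @ (D @ c - y) / L, lam / L)
    return c

def local_codes(T, x, K, lam=0.01):
    """Code patch k of candidate x with patch k of every template, one patch at a time.

    Hypothetical layout: a vectorized region is K equal, contiguous blocks, so
    row block k of T is the patch-k dictionary D_k (shape d/K x m)."""
    T_patches = np.split(T, K, axis=0)
    x_patches = np.split(x, K)
    codes = [lasso_ista(D_k, x_k, lam) for D_k, x_k in zip(T_patches, x_patches)]
    return np.column_stack(codes)          # shape (m, K): column k codes patch k

# Toy usage: each patch of x copies the same patch of a *different* template.
rng = np.random.default_rng(1)
T = rng.standard_normal((64, 6))
picks = [1, 3, 0, 5]
x = np.concatenate([np.split(T, 4, axis=0)[k][:, j] for k, j in enumerate(picks)])
C = local_codes(T, x, K=4)
print(np.argmax(np.abs(C), axis=0))        # [1 3 0 5]: dominant template per patch
```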

Joint sparse appearance model: Figure 1(c) shows the joint sparse appearance model [29, 31, 32]. These trackers exploit the intrinsic relationship among particles $X$ to learn their sparse representations jointly. The joint sparsity constraints encourage all particle representations to be jointly sparse and to share the same (few) dictionary templates that reliably represent them. However, such models still use holistic representations to describe object appearance.
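One common way to express joint sparsity is an $\ell_{2,1}$ penalty on the coefficient matrix $C$ (one row per template, one column per particle), which zeroes out whole rows so that every particle uses the same few templates. The sketch below solves this with proximal gradient and row-wise group soft-thresholding; it illustrates that assumption and is not necessarily the exact objective of [29, 31, 32]. The weight `lam` and the function names are hypothetical.

```python
import numpy as np

def row_shrink(Z, tau):
    """Proximal operator of tau * ||Z||_{2,1}: shrink the l2 norm of every row by tau."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def joint_sparse_codes(T, X, lam, n_iter=500):
    """Approximately solve min_C 0.5 * ||X - T C||_F^2 + lam * ||C||_{2,1}.

    Rows of C index templates and columns index particles, so a zero row
    means that no particle uses that template."""
    L = np.linalg.norm(T, 2) ** 2
    C = np.zeros((T.shape[1], X.shape[1]))
    for _ in range(n_iter):
        C = row_shrink(C - T.T @ (T @ C - X) / L, lam / L)
    return C

# Toy usage: 8 particles that all mix templates 2 and 7.
rng = np.random.default_rng(2)
T = rng.standard_normal((256, 12))
T /= np.linalg.norm(T, axis=0)
X = T[:, [2, 7]] @ rng.uniform(0.5, 1.5, size=(2, 8))
C = joint_sparse_codes(T, X, lam=0.1)
print(np.flatnonzero(np.linalg.norm(C, axis=1) > 1e-6))   # templates in use
```

Each column of $X$ here is still a whole candidate, so the sketch shares templates across particles but has no notion of local patches.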

Structural sparse appearance model: Figure 1(d) shows the proposed structural sparse appearance model, which incorporates the above three models. Our model exploits the intrinsic relationship among particles $X$ and their local patches to learn their sparse representations jointly. In addition, our method preserves the spatial layout structure among the local patches inside each target candidate, which is ignored by the above three models [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]. Using our model, all particles $X$ and their local patches are represented with joint sparsity, i.e., only a few (but the same) dictionary templates are used to represent all the particles and their local patches at each frame. Note that the local patches inside all particles $X$ are represented with joint sparsity by the corresponding local patches inside the same dictionary templates used to represent $X$.
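The structural model can be sketched by stacking the coefficients of all particles and all patches into one matrix $Z = [Z_1, \ldots, Z_K]$, where $Z_k$ codes the $k$-th patches $X_k$ of all particles with the $k$-th patches $D_k$ of the templates, and by placing an $\ell_{2,1}$ penalty on the rows of $Z$. Since each row corresponds to one template, a zero row removes that template for every particle and every patch at once, and patch $k$ is only ever coded by patch $k$ of a template, which keeps the spatial layout. The solver is a FISTA-style accelerated proximal gradient loop whose prox step is a closed-form row shrinkage, in the spirit of the method mentioned in the abstract. The objective as written, the weight `lam`, and all names are assumptions for illustration and may differ from the paper's formulation.

```python
import numpy as np

def row_shrink(Z, tau):
    """Proximal operator of tau * ||Z||_{2,1} (row-wise group soft-thresholding)."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def structural_sparse_codes(D_list, X_list, lam, n_iter=500):
    """FISTA for min_Z sum_k 0.5*||X_k - D_k Z_k||_F^2 + lam*||Z||_{2,1},
    with Z = [Z_1, ..., Z_K] of shape m x (n*K): one row per template, so the
    penalty makes all particles and all patches share the same templates."""
    m = D_list[0].shape[1]                 # number of templates
    n = X_list[0].shape[1]                 # number of particles
    L = max(np.linalg.norm(D, 2) ** 2 for D in D_list)   # Lipschitz constant
    Z = np.zeros((m, n * len(D_list)))
    Y, t = Z.copy(), 1.0
    for _ in range(n_iter):
        # Gradient of the smooth term; block k only touches the columns of Z_k.
        G = np.hstack([D.T @ (D @ Y[:, k * n:(k + 1) * n] - X)
                       for k, (D, X) in enumerate(zip(D_list, X_list))])
        Z_new = row_shrink(Y - G / L, lam / L)           # closed-form prox step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Y = Z_new + ((t - 1.0) / t_new) * (Z_new - Z)    # Nesterov extrapolation
        Z, t = Z_new, t_new
    return Z

def patch_errors(D_list, X_list, Z):
    """Per-particle reconstruction error summed over all K patches."""
    n = X_list[0].shape[1]
    err = np.zeros(n)
    for k, (D, X) in enumerate(zip(D_list, X_list)):
        R = X - D @ Z[:, k * n:(k + 1) * n]
        err += np.sum(R ** 2, axis=0)
    return err

# Toy usage: 8 templates split into 4 patches of 128 pixels, 5 particles
# that all mix templates 1 and 4.
rng = np.random.default_rng(3)
T = rng.standard_normal((512, 8))
D_list = np.split(T, 4, axis=0)
X_list = np.split(T[:, [1, 4]] @ rng.uniform(0.5, 1.5, size=(2, 5)), 4, axis=0)
Z = structural_sparse_codes(D_list, X_list, lam=0.1)
print(np.flatnonzero(np.linalg.norm(Z, axis=1) > 1e-6))   # shared templates
print(patch_errors(D_list, X_list, Z))
```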

Figure 1 - Sparse representation based trackers [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]. Given an image with $n$ sampled particles $X=[x_1,\ldots,x_i,\ldots,x_n]$ and dictionary templates $T$, sparse representations of the $n$ sampled particles are learned for visual tracking. These tracking methods are grouped by their sparse appearance models.

Our Approach

Structural Sparse Tracking: Based on the structural sparse appearance model, we propose a computationally efficient structural sparse tracking (SST) algorithm within the particle filter framework. All particles and their local patches are represented via the proposed structural sparse appearance model, and the next target state is the particle that, together with its local patches, has the smallest reconstruction error with respect to the target dictionary templates and their corresponding patches. Unlike previous methods, the proposed SST algorithm not only exploits the intrinsic relationships among particles and their local patches to learn their sparse representations jointly, but also preserves the spatial layout structure among the local patches inside each target candidate region.
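Within the particle filter, the selection rule above can be sketched as follows. Candidate states are drawn around the previous state with a Gaussian random walk, each particle's reconstruction error (assumed here to be already summed over the particle and its local patches) is turned into a likelihood, and the most likely particle becomes the new target state. The two-parameter state, `sigma`, `beta`, and the exponential likelihood are hypothetical choices for illustration; the paper's motion model and observation likelihood may differ.

```python
import numpy as np

def sample_particles(prev_state, n, sigma, rng):
    """Random-walk motion model: perturb the previous state with Gaussian noise.
    The state layout (here just a 2-D position) is a hypothetical simplification."""
    return prev_state + rng.standard_normal((n, prev_state.size)) * sigma

def select_target(particles, errors, beta=1.0):
    """Map reconstruction errors to likelihoods exp(-beta * error), normalize
    them into particle weights, and return the most likely particle."""
    w = np.exp(-beta * (errors - errors.min()))   # shift by the minimum for stability
    w /= w.sum()
    return particles[int(np.argmax(w))], w

# Toy usage with illustrative errors; in a tracker they would come from sparse coding.
rng = np.random.default_rng(4)
particles = sample_particles(np.array([100.0, 50.0]), 4, np.array([2.0, 2.0]), rng)
errors = np.array([3.0, 0.5, 2.0, 1.0])
state, weights = select_target(particles, errors)
print(np.allclose(state, particles[1]))           # True: the lowest-error particle wins
```

Because the exponential is monotone, picking the highest weight is the same as picking the smallest reconstruction error; the weights only matter if they are reused, for example for resampling.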

Robust Structural Sparse Tracking: The SST algorithm assumes that the local patches at the same position in all particles are similar, and that the local patches of a particle are represented by the local patches of the same target templates. This assumption does not always hold in visual tracking applications, since outlier patches often exist. For example, a small number of particles sampled far away from the majority of particles are likely to have little overlap with the other particles and are thus considered outliers. Furthermore, due to occlusion or noise, some local patches of a particle may select different target templates for their representation. Since most particles are relevant but outliers often exist, we improve SST and introduce a robust structural sparse tracking (RSST) algorithm that captures both the underlying relationships shared by all local patches and the outliers caused by occlusion and noise.
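One way to model the outliers described above, borrowed from robust multi-task learning, is to split the coefficients into $P + Q$: $P$ carries a row-wise $\ell_{2,1}$ penalty (templates shared by the inlier particles and patches), while $Q$ carries a column-wise $\ell_{2,1}$ penalty so that only a few columns (outlier particles or patches) can deviate. The sketch below uses a single dictionary $D$, treats the columns of $X$ as particle patches, and runs plain proximal gradient. The decomposition, the weights `lam_p` and `lam_q`, and the names are assumptions for illustration, not necessarily the exact RSST objective or solver.

```python
import numpy as np

def row_shrink(Z, tau):
    """Proximal operator of tau * (sum of row l2 norms)."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def col_shrink(Z, tau):
    """Proximal operator of tau * (sum of column l2 norms)."""
    return row_shrink(Z.T, tau).T

def robust_codes(D, X, lam_p, lam_q, n_iter=2000):
    """Proximal gradient for
        min_{P,Q} 0.5*||X - D(P + Q)||_F^2 + lam_p*sum_rows||P||_2 + lam_q*sum_cols||Q||_2.
    P: templates shared by the inlier columns; Q: a few outlier columns."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2     # Lipschitz constant w.r.t. (P, Q) jointly
    P = np.zeros((D.shape[1], X.shape[1]))
    Q = np.zeros_like(P)
    for _ in range(n_iter):
        G = D.T @ (D @ (P + Q) - X)         # the gradient is the same for P and Q
        P = row_shrink(P - G / L, lam_p / L)
        Q = col_shrink(Q - G / L, lam_q / L)
    return P, Q

# Toy usage: 30 inlier columns mix templates 0 and 3; column 30 uses template 6 instead.
rng = np.random.default_rng(5)
D = rng.standard_normal((128, 10))
D /= np.linalg.norm(D, axis=0)
X = np.hstack([D[:, [0, 3]] @ rng.uniform(0.8, 1.2, size=(2, 30)), D[:, [6]]])
P, Q = robust_codes(D, X, lam_p=0.2, lam_q=0.12)
print(int(np.argmax(np.linalg.norm(Q, axis=0))))   # column flagged as an outlier
```

Choosing `lam_q` below `lam_p` makes a lone deviating column cheaper to explain in $Q$ than by opening a new template row in $P$, while columns that agree with the majority still share $P$'s rows.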

Related Publications

"Robust Structural Sparse Tracking"

Tianzhu Zhang, Changsheng Xu, Ming-Hsuan Yang.
TPAMI
[All Files] [Code] [PCA Model] [VGG Model]

Video Tracking Results

We show tracking results on 25 challenging videos.

bicycle	biker	car4	carchase	car11
david indoor	faceocc	faceocc2	fernando	girl
jumping	OneLeaveShopReenter1cor	OneLeaveShopReenter2cor	OneShopOneWait2cor	PETS01D1Human1
shaking	singer1	sphere	sunshade	surfer
surfing	trellis70	tud crossing	tunnel	football