Persons-In-Places: a Deep Features Based Approach for Searching a Specific Person in a Specific Location
Abstract
Video retrieval is a challenging task in computer vision, especially with complex queries. In this paper, we consider a new type of complex query that simultaneously covers person and location information: the aim is to search for a specific person in a specific location. Bag-of-Visual-Words (BOW) is widely known as an effective model for representing rich-textured objects and scenes of places, while deep features are powerful for faces. Building on these state-of-the-art approaches, we introduce a framework that leverages the BOW model and deep features for person-place video retrieval. First, we propose to use a linear kernel classifier instead of the $L_2$ distance to estimate the similarity of faces, given that faces are represented by deep features. Second, scene tracking is employed to handle cases where the face of the query person is not detected. Third, we evaluate several strategies for fusing the individual person-search and location-search results. Experiments were conducted on a standard benchmark dataset (TRECVID Instance Search 2016) comprising more than 300 GB of storage and 464 hours of video.
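The first contribution above, replacing $L_2$ distance with a linear kernel classifier over deep face embeddings, can be sketched as follows. This is an illustrative outline only, not the authors' exact pipeline: the embedding dimension, the synthetic "embeddings", and the hand-rolled hinge-loss training loop (a stand-in for an off-the-shelf linear SVM) are all assumptions for the sake of a self-contained example.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 128  # assumed embedding size; real descriptors come from a face CNN

# Hypothetical data: a few embeddings of the query person (positives)
# and of background faces (negatives).
pos = rng.normal(loc=1.0, size=(5, dim))
neg = rng.normal(loc=-1.0, size=(50, dim))
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])

# Minimal linear classifier trained by gradient descent on the hinge loss.
w = np.zeros(dim)
b = 0.0
lr, lam = 0.01, 0.001
for _ in range(200):
    margins = y * (X @ w + b)
    mask = margins < 1  # samples violating the margin
    if mask.any():
        w -= lr * (lam * w - (y[mask, None] * X[mask]).mean(axis=0))
        b -= lr * (-y[mask].mean())
    else:
        w -= lr * lam * w

def face_score(x):
    """Signed distance to the hyperplane; higher = more likely the query person."""
    return float(x @ w + b)
```

Unlike ranking by $L_2$ distance to a single query embedding, the classifier aggregates several example faces of the query person and discounts directions of the embedding space that do not discriminate that person from the background.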
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.