
OBJECT TRACKING IN VIDEO

Master Thesis Project of Andrea Ferri


Supervised by Jordi Torres and Xavier Giro I Nieto
20th October 2016, UPC, Barcelona

Summary
1. Project Overview;
2. Methodology;
3. Project Development;
4. Solved Problems;
5. Running Example;
6. Evaluations;
7. Conclusions;
8. References.

Project Overview: Goals
Build a working Model for Object Tracking in Video

Object Detection from Video

Can TensorFlow
be the fastest adaptable environment
for Machine Learning,
to implement Research and Development
on a different infrastructure architecture,
with great prospects for improvement?

Goals: TensorFlow
TensorFlow is an Open Source Software
Library for Machine Intelligence
Based on:
Work in:
Powered by:

Goals: Environment
New (released November 9, 2015)
Less than 1 year old
Great potential for improvement
Strongly supportive community
Still few available components
I exploited the usable projects as best I could!

Goals: ImageNet
Image Database organized according
to the WordNet hierarchy
Used for the ILSVRC

VID Challenge:
30 Moving Object Classes;
Specific Datasets Provided:
- Train 3862 Snippets;
- Validation 555 Snippets;
- Test 973 Snippets.

Model Neural Network
System of programs and data structures that approximates the operation of the human brain.
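As a minimal sketch of the layered structure described above, the snippet below feeds an input vector through a small fully connected network in plain Python. The layer sizes, weights and the sigmoid activation are illustrative assumptions, not values taken from the thesis models.

```python
import math


def sigmoid(x):
    """Squash a real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations


# Hypothetical network: 3 inputs, one hidden layer of 2 neurons, 1 output neuron.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
output = forward([1.0, 0.5, -1.0], layers)
```

Each entry of `layers` plays the role of one neural layer; adding more entries gives a deeper network without changing `forward`.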

Model Neural Network
[Figure: INPUT -> Input Layer -> Neural Layer 1 ... Neural Layer N -> Last Neural Layer -> OUTPUT]

Back Propagation
Method for training artificial neural networks, used together with an optimization method such as gradient descent.
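The idea can be sketched on a toy network with one hidden neuron and one output neuron. This is only an illustration under stated assumptions: the sigmoid activation, the squared-error loss, the learning rate and the starting weights are all made up for the example, and plain gradient descent stands in for the optimization method.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_step(x, target, w1, w2, lr=0.5):
    """One forward pass, one backward pass and one gradient descent update."""
    # Forward pass: input -> hidden neuron -> output neuron.
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    # Backward pass: apply the chain rule from the output error to each weight.
    d_y = (y - target) * y * (1.0 - y)  # gradient at the output pre-activation
    d_h = d_y * w2 * h * (1.0 - h)      # gradient at the hidden pre-activation
    # Gradient descent: move each weight against its gradient.
    w2 -= lr * d_y * h
    w1 -= lr * d_h * x
    return w1, w2, 0.5 * (y - target) ** 2


w1, w2 = 0.1, -0.3  # hypothetical starting weights
errors = []
for _ in range(200):
    w1, w2, err = train_step(1.0, 0.9, w1, w2)
    errors.append(err)
```

The list `errors` records the squared error at each step, so comparing its first and last entries shows whether training reduced the error.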

Back Propagation
[Figure: error back propagation. The INPUT values pass through the INPUT LAYER and the HIDDEN LAYER/S via the weights; the ERROR is propagated back through the same weights.]

Goals: Model for Object Tracking in Video

Frame  Detection
t=0    Class: Dog, conf 0.78
t=1    Class: Dog, conf 0.59
t=2    Class: Dog, conf 0.34
t=3    No Objects

For each frame:
I. Detect possible objects;
II. Identify possible detections.
Track them in time and space.

Project Overview: Architecture
Model for Object Tracking in Video

Still-Image Approach (per-frame analysis):
I. Detect possible objects;
II. Identify possible detections.

Post-Processing Approach:
Analysis in time and space.

Modular Structure
- GENERAL OBJECT DETECTOR
- POST PROCESSING TRACKER
- IMAGE CLASSIFIER (OBJECT CLC)

AIRPLANES
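One way to picture the modular structure is as three interchangeable callables chained together. The sketch below assumes the modules run in the order they are listed on the slide (detector, then tracker, then classifier). The function name `run_pipeline`, its arguments and the stub modules are hypothetical; they are not taken from the thesis code.

```python
def run_pipeline(frames, detect, track, classify):
    """Chain the three modules; each one is passed in as a plain callable."""
    # 1. General object detector: candidate boxes for every frame.
    detections = [detect(frame) for frame in frames]
    # 2. Post-processing tracker: group the boxes over time into tracks.
    tracks = track(detections)
    # 3. Image classifier: assign a class to each tracked box.
    labelled = []
    for track_id, boxes in tracks.items():
        for frame_idx, box in boxes:
            label = classify(frames[frame_idx], box)
            labelled.append((track_id, frame_idx, box, label))
    return labelled


# Stub modules, standing in for the real detector, tracker and classifier.
def fake_detect(frame):
    return [(10, 10, 50, 50)]


def fake_track(detections):
    return {0: [(i, boxes[0]) for i, boxes in enumerate(detections) if boxes]}


def fake_classify(frame, box):
    return "airplane"


result = run_pipeline(["frame0", "frame1"], fake_detect, fake_track, fake_classify)
```

Because each module is only a callable, any of the three can be swapped or reordered without touching the other two.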

Methodology
Time constraint of 5 months
Starting from scratch

Fast Learn
Fast Develop
The Power of the Community

Project Development
Still-Image Approach:
- GENERAL OBJECT DETECTOR: TensorBox (GitHub Repo)
- IMAGE CLASSIFIER (OBJECT CLC): Inception (GitHub Repo)
Post Processing Approach:
- POST PROCESSING TRACKER: Python Implementation

Still Image Analysis
GENERAL OBJECT DETECTOR
TensorBox (GitHub Repo)
OverFeat Model (Pierre Sermanet et al.)

Unbalanced

Trained as Single Class on the 30 VID Classes

Lots of Peaks

Trained as Single Class on the 30 VID Classes

Regular Curve

IMAGE CLASSIFIER (OBJECT CLC)
Inception (GitHub Repo)
Inception V3 Model (Christian Szegedy et al.)

Well Balanced

Trained as Multi Class on the 30 VID Classes

Really Smooth

Post Processing Analysis
POST PROCESSING TRACKER
Python Implementation
Not a Trainable Model
Based on a simplification of the Slow and Steady Feature Analysis:
- Bounding Boxes
- Object movement
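A non-trainable tracker that relies only on bounding boxes and object movement could look like the sketch below. It is not the thesis implementation: it assumes boxes are given as (x1, y1, x2, y2) tuples and links each box to the nearest box of the previous frame when their centres moved less than a hypothetical `max_move` threshold.

```python
import math


def centre(box):
    """Centre point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def link_boxes(frames, max_move=30.0):
    """Greedily link boxes of consecutive frames into tracks.

    frames is a list (one entry per frame) of lists of boxes.
    Returns a list of tracks, each a list of (frame_index, box).
    """
    tracks = []
    for t, boxes in enumerate(frames):
        unmatched = list(boxes)
        for track in tracks:
            last_t, last_box = track[-1]
            if last_t != t - 1 or not unmatched:
                continue  # track ended earlier, or no box left to match
            best = min(
                unmatched,
                key=lambda b: math.dist(centre(b), centre(last_box)),
            )
            if math.dist(centre(best), centre(last_box)) <= max_move:
                track.append((t, best))
                unmatched.remove(best)
        # Every box that was not linked starts a new track.
        tracks.extend([[(t, b)] for b in unmatched])
    return tracks


frames = [[(0, 0, 10, 10)], [(5, 0, 15, 10)], [(200, 200, 210, 210)]]
tracks = link_boxes(frames)
```

In this toy input the box moves 5 pixels between the first two frames and is linked, while the distant box in the third frame starts a new track.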

Solved Problems
Environment Installation;
Libraries Setting;
Components Training;
Components Combination;
Post Processing Implementation;
Dataset usage.

Results: VID ImageNET Challenge
Team name       Entry description                                        Number of object categories won  mAP
NUIST           cascaded region regression + tracking                    10                               0.808292
NUIST           cascaded region regression + tracking                    10                               0.803154
CUVideo         4-model ensemble (Multi-Context .. & Motion-Guided .. )  9                                0.767981
Trimps-Soushen  Ensemble 2                                               1                                0.709651

With Provided Data

Results: VID ImageNET Challenge
Team name       Entry description                              Number of object categories won  mAP
NUIST           cascaded region regression + tracking          17                               0.79593
NUIST           cascaded region regression + tracking          5                                0.781144
Trimps-Soushen  Ensemble 6                                     5                                0.720704
ITLab-Inha      An ensemble for detection, MCMOT for tracking  3                                0.731471

With Additional Data

Results: VID ImageNET Challenge
Team name  Entry description  mAP
CUVideo    4-model ensemble   0.558557

Tracking + With Provided Data

Team name  Entry description                      Description of outside data used         mAP
NUIST      cascaded region regression + tracking  proposal network is fine-tuned from COCO  0.583898

Tracking + With Additional Data

Results: Validation Developed Model
0.002263 mAP

Class         mAP      Class        mAP      Class       mAP
airplane      0        elephant     0        red panda   0
antelope      0        fox          0        sheep       0.0329
bear          0        giant panda  0        snake       0
bicycle       0        hamster      0        squirrel    0
bird          0        horse        0        tiger       0
bus           0        lion         0        train       0
car           0.0002   lizard       0        turtle      0.0615
cattle        0        monkey       0        watercraft  0.0001
dog           0.0006   motorcycle   0.0219   whale       0
domestic cat  0.1492   rabbit       0        zebra       0

Evaluations: Developed Model
Modular Structure
- GENERAL OBJECT DETECTOR
- POST PROCESSING TRACKER
- IMAGE CLASSIFIER (OBJECT CLC)

LOW STARTING ACCURACY
NOT ENOUGH TO COMPENSATE

LOC Validation Results for the G.O.D.
Class         mAP      Class        mAP      Class       mAP
airplane      0        elephant     -0.0021  red panda   0
antelope      0        fox          +0.0843  sheep       -0
bear          0        giant panda  0        snake       +0.0214
bicycle       0        hamster      0        squirrel    0
bird          0        horse        0        tiger       0
bus           +0.0003  lion         0        train       +0.0011
car           +0.0019  lizard       +0.0001  turtle      +0.0991
cattle        0        monkey       0        watercraft  +0.0003
dog           -0       motorcycle   -0       whale       +0.0002
domestic cat  -0.0006  rabbit       +0.0003  zebra       0

Best Overlap

LOC Validation Results for the G.O.D.
Class         mAP      Class        mAP      Class       mAP
airplane      0        elephant     +0.4077  red panda   0
antelope      0        fox          0        sheep       +0.0789
bear          0        giant panda  0        snake       -0
bicycle       0        hamster      0        squirrel    0
bird          0        horse        0        tiger       0
bus           +0.0014  lion         +0.0007  train       +0.0056
car           +0.0091  lizard       0        turtle      +0.4935
cattle        +0.0002  monkey       0        watercraft  +0.0013
dog           -0.0001  motorcycle   -0       whale       +0.0010
domestic cat  -0.0103  rabbit       0        zebra       0

Best Intersection Over Union
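For reference, Intersection Over Union between two axis-aligned boxes is commonly computed as in the sketch below. The (x1, y1, x2, y2) box layout is an assumption for the example, and the slides do not state how the best box was selected, so this shows only the measure itself.

```python
def iou(box_a, box_b):
    """Intersection Over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


same = iou((0, 0, 10, 10), (0, 0, 10, 10))
half_shifted = iou((0, 0, 10, 10), (5, 0, 15, 10))
disjoint = iou((0, 0, 10, 10), (20, 20, 30, 30))
```

Identical boxes score 1, disjoint boxes score 0, and a box shifted by half its width overlaps 50 of the 150 units covered by the union.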


Evaluations: Possible Improvements
Change Modules Order

GENERAL OBJECT DETECTOR  POST PROCESSING TRACKER  IMAGE CLASSIFIER (OBJECT CLC)
IDENTIFY OBJECTS         30 SPECIFIC MODELS       TRAINABLE MODEL

Initial Question
Can TensorFlow
be the fastest adaptable environment
for Machine Learning,
to implement Research and Development
on a different infrastructure architecture,
with great prospects for improvement?

Conclusions
I started without any knowledge of
Deep Learning
and Visual Recognition.
I finished by implementing
a working model
for Object Tracking in Video.

Conclusions

Yes! I think TensorFlow
has proven to be adaptable,
with great prospects for improvement.

Thanks!

References
Thesis Project GitHub;
TensorBox GitHub;
YOLO GitHub;
Inception GitHub;
TensorFlow.

Andrea Ferri: hause.blackwarhol@gmail.com


Questions & Answers
