Action Recognition using Neural Networks Presentation

using HOF-LBP Flow patterns

Binu M Nair, Vijayan K Asari

distance.

Which is invariant to sequence length normalization

Can classify human actions from 10-15 frames (for real time operation)

To account for variation in speed of an action

Define and extract suitable motion descriptors based on the optical flow at each frame

Using the extracted motion descriptors, define action manifolds for each class.

Contains variations of motion with respect to the sequence

Proposed Methodology

1. Motion representation using Histogram of Oriented Flow and Local Binary Flow Patterns (HOF-LBP).

Motion descriptor computed from optical flow for each frame of the video sequence

2. Computing an action manifold for each action class using Principal Component Analysis

3. Learning the mapping from the motion descriptors to the action manifold with a Generalized Regression Neural Network (GRNN)

Flow Patterns


Gives information about the extent of motion on a local scale and the direction of motion

Algorithm

Compute the optical flow $(u, v)$ between consecutive frames at each pixel location $(x, y)$

Compute the magnitude and direction images from the optical flow.

Divide them into blocks

Histogram of flow: weighted histogram of the flow direction, with the weights being the corresponding magnitudes.

These are local distributions which change during the course of an action sequence.
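The histogram-of-flow steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the flow components `u` and `v` are already available as 2-D arrays, reads "HOF (5,5)" as a 5x5 grid of blocks, and uses 8 direction bins. The function name `hof_descriptor` and the bin count are assumptions.

```python
import numpy as np

def hof_descriptor(u, v, grid=(5, 5), n_bins=8):
    """Block-wise histogram of flow direction, weighted by flow magnitude.

    grid and n_bins are illustrative choices (assumptions), not values
    confirmed by the slides.
    """
    mag = np.hypot(u, v)                       # flow magnitude image
    ang = np.mod(np.arctan2(v, u), 2 * np.pi)  # flow direction in [0, 2*pi)
    rows = np.array_split(np.arange(u.shape[0]), grid[0])
    cols = np.array_split(np.arange(u.shape[1]), grid[1])
    feats = []
    for r in rows:
        for c in cols:
            block = np.ix_(r, c)               # index one block of the image
            hist, _ = np.histogram(ang[block], bins=n_bins,
                                   range=(0.0, 2 * np.pi),
                                   weights=mag[block])
            feats.append(hist)
    return np.concatenate(feats)               # length grid[0]*grid[1]*n_bins
```

For a 20x20 flow field this returns a vector of 5 x 5 x 8 = 200 values, one magnitude-weighted direction histogram per block.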

Patterns

To extract relationship between the flow vectors in different regions of the body

This textural context can be extracted by applying the Local Binary Pattern encoding to the optical flow magnitude and direction.

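$$\mathrm{LBP}_{P,R} = \sum_{p=1}^{P} 2^{p-1}\, s(g_p - g_c), \qquad s(z) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases}$$

where $g_c$ is the value at the center pixel and $g_p$ is the value at the $p$-th neighbor.

Neighbor ordering around the center pixel (c) in a 3x3 neighborhood:

5 6 7
4 c 0
3 2 1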

A sampling grid of (P, R) = (16, 2) is used, where P is the number of neighbors and R is the radius of the neighborhood.
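As a rough illustration of the LBP encoding with a (P, R) sampling grid, the sketch below computes one code per pixel. It is a simplification under stated assumptions: neighbors are sampled at the nearest pixel rather than by interpolation, image borders are handled by edge padding, and the function name `lbp_image` is hypothetical. It can be applied to either the flow magnitude image or the flow direction image.

```python
import numpy as np

def lbp_image(img, P=16, R=2):
    """Per-pixel LBP code from P neighbors on a circle of integer radius R.

    Nearest-pixel sampling and edge padding are simplifying assumptions.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    padded = np.pad(img, R, mode="edge")
    codes = np.zeros((h, w), dtype=np.int64)
    for p in range(P):
        theta = 2 * np.pi * p / P
        dy = int(round(-R * np.sin(theta)))    # row offset of neighbor p
        dx = int(round(R * np.cos(theta)))     # column offset of neighbor p
        neighbor = padded[R + dy:R + dy + h, R + dx:R + dx + w]
        # Set bit p wherever the neighbor is at least as large as the center.
        codes += (neighbor >= img).astype(np.int64) << p
    return codes
```

A constant image yields the all-ones code at every pixel, and a pixel that is strictly larger than all of its sampled neighbors yields code 0.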

The concatenation of HOF and LBP constitutes the action feature set:

HOF(5,5) + LBP(16,2) + LBP(16,2) = Action Feature

PCA

Aim is to perform regression analysis on the set of action features

Action features will be considered as the regressors/input variables to a regression function.

Selection of the response/output variable should

Be invariant to time: selecting time itself will not be the solution

Correspond to an action manifold (reduced posture space).

The frames of an action sequence are then considered as points on a particular manifold.

[Figure: Frame 1 through Frame k of a sequence shown as points in the reduced space, axes dim 1 and dim 2]

Principal Component Analysis or Empirical Orthogonal Function Analysis

Time series data is represented as a linear combination of time-independent orthogonal basis functions (eigenvectors) with time-varying amplitudes (eigen coefficients).

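PCA for action class

EOF Analysis

Let $X = [x_1, x_2, \ldots, x_T]$ be a time series. EOF analysis writes each sample as $x(t) = \sum_{i=1}^{d} y_i(t)\, v_i$, where the $v_i$ are time-independent eigenvectors and the $y_i(t)$ are time-varying coefficients.

[Diagram: feature matrix $X$ of size $K(m) \times D$ -> PCA -> eigenvectors ($V$) and coefficients ($Y$)]

Extending this to our motion feature set $X = [x_1, x_2, \ldots, x_{K(m)}]$ of the action class $m$:

We get time-independent basis functions, which are the eigenvectors $V = [v_1, v_2, \ldots, v_d]$

We get time-dependent coefficients $Y = [y_1, y_2, \ldots, y_{K(m)}]$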
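The per-class PCA step can be sketched with a singular value decomposition. This is a minimal example under assumptions: the feature matrix `X` has one row per frame (K rows, D columns), the data are mean-centered before decomposition, and the function name `action_manifold` and the default `d=2` are hypothetical.

```python
import numpy as np

def action_manifold(X, d=2):
    """PCA of a (K x D) feature matrix for one action class.

    Returns the feature mean, the d leading eigenvectors V (D x d) and the
    per-frame coefficients Y (K x d). Mean-centering is an assumption.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the eigenvectors of the covariance matrix of X.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:d].T          # time-independent basis
    Y = Xc @ V            # time-dependent coefficients, one row per frame
    return mean, V, Y
```

Each row of `Y` is the position of one frame on the class manifold, and `mean + Y @ V.T` reconstructs the features up to the variance discarded by keeping only `d` components.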

GRNN

Generalized Regression Neural Networks

Used to learn the functional mapping between the motion features $X$ and the PCA coefficients $Y$ for an action class $m$.

Based on the radial basis function network

Faster training scheme: a one-pass algorithm

The number of pattern nodes depends on the number of training samples.

K-means clustering is used before training to reduce the training sample size

Training set for action class $m$: $X = \{x_k : 1 \le k \le K(m)\}$, $Y = \{y_k : 1 \le k \le K(m)\}$
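The K-means reduction mentioned above could look like the sketch below. The slides do not say how the clustering is applied, so two choices here are assumptions: clustering is done on the input features only, and each cluster's target is the mean of the targets assigned to it. The function name `kmeans_reduce` and its parameters are hypothetical.

```python
import numpy as np

def kmeans_reduce(X, Y, n_clusters, n_iter=20, seed=0):
    """Reduce training pairs (X, Y) to cluster centers and mean targets.

    Plain Lloyd iterations on the inputs; empty clusters are dropped.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)             # nearest center per sample
        for j in np.unique(labels):
            centers[j] = X[labels == j].mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    keep = np.unique(labels)                   # ignore clusters left empty
    new_X = np.stack([X[labels == j].mean(axis=0) for j in keep])
    new_Y = np.stack([Y[labels == j].mean(axis=0) for j in keep])
    return new_X, new_Y
```

The returned pairs can then stand in for the full training set, so the network needs one pattern node per cluster instead of one per frame.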

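If there are $N(m)$ clusters from the training pairs $(x_i, y_i)$, the GRNN estimate for an input $x$ is

$$\hat{y}(x) = \frac{\sum_{i=1}^{N(m)} y_i \exp\left(-\frac{D_i^2}{2\sigma^2}\right)}{\sum_{i=1}^{N(m)} \exp\left(-\frac{D_i^2}{2\sigma^2}\right)}, \qquad D_i^2 = (x - x_i)^T (x - x_i)$$

[Figure: GRNN architecture. Pattern units compute $\exp(-D_i^2 / 2\sigma^2)$ for the cluster centers $x_1, x_2, \ldots, x_{N(m)}$, followed by summation and output units.]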
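A GRNN estimate is a Gaussian-weighted average of the stored targets, which the short sketch below illustrates. It assumes a single shared smoothing parameter `sigma` and squared Euclidean distance; the function name `grnn_predict` and the default `sigma=1.0` are hypothetical. Subtracting the smallest distance before exponentiating leaves the ratio unchanged and only guards against numerical underflow.

```python
import numpy as np

def grnn_predict(x, centers, targets, sigma=1.0):
    """GRNN output for inputs x (n x D), given centers (N x D), targets (N x d)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    centers = np.asarray(centers, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Squared Euclidean distance from every input to every stored center.
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Shift by the row minimum: the weights' ratio is unchanged, but the
    # denominator can no longer underflow to zero.
    d2 = d2 - d2.min(axis=1, keepdims=True)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ targets) / w.sum(axis=1, keepdims=True)
```

There is no iterative training: the centers and targets are stored once, which is why the scheme is described as one-pass. A small `sigma` makes the estimate follow the nearest center, while a large `sigma` pulls it towards the mean of all targets.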

Algorithm (Testing)

Compute the HOF-LBP motion feature for each frame of the test sequence (partial, 15 frames, or full, 60-80 frames)

Project the test features onto the eigen basis of each action class

Estimate the projections for each action by applying the feature set to the trained GRNN model

The model which gives the smallest difference between the eigen space projections and the GRNN estimations is the correct class.
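The testing steps above can be sketched as a loop over per-class models. This is an illustration under assumptions: each model is a tuple of the class feature mean, the eigen basis `V`, and a callable that returns the GRNN estimate of the coefficients, and the "difference" is taken as the mean Euclidean distance over frames. The slides do not confirm that exact distance, and `classify_sequence` is a hypothetical name.

```python
import numpy as np

def classify_sequence(features, models):
    """Pick the action class whose GRNN estimate best matches the projection.

    features: (K x D) array of per-frame motion features.
    models: dict mapping class name -> (mean, V, predict), where predict
            maps a (K x D) array to estimated (K x d) coefficients.
    """
    features = np.asarray(features, dtype=float)
    scores = {}
    for name, (mean, V, predict) in models.items():
        proj = (features - mean) @ V           # projection on the class basis
        est = predict(features)                # GRNN estimate of the projection
        # Mean per-frame Euclidean distance (assumed distance measure).
        scores[name] = float(np.mean(np.linalg.norm(proj - est, axis=1)))
    best = min(scores, key=scores.get)
    return best, scores
```

Because the score is averaged over frames, the same function applies to a partial sequence of about 15 frames and to a full sequence.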

individuals)

Testing strategy: leave 9 sequences out of training

Action labels: a1-bend, a2-jump p, a3-jjack, a4-jump f, a5-run, a6-side, a7-wave1, a8-skip, a9-wave2, a10-walk

[Table: confusion matrix (%) over actions a1 to a10. Entries as extracted, in order: 100, 75, 22, 100, 88, 12, 93, 5, 78, 21, 100, 100, 1, 99, 100]

[Figure: test sequence panels: With bag, Legs Occluded, With dog, Normal Walk, Knees Up, With Briefcase, Limping, With Pole, Moonwalk, With Skirt]

Test Seq               1st Best   2nd Best
Swinging a bag         3.094      3.9390
Carrying a briefcase   2.170      3.6418
a dog                  2.338      3.8249
Knees Up               3.270      4.0910
(unlabeled)            2.922      3.8217
(unlabeled)            2.132      3.6633
Occluded Legs          2.594      2.6249
(unlabeled)            2.624      3.6338
a pole                 2.945      3.8801
skirt                  2.159      3.5401

Test Seq   1st Best        2nd Best        Median to all actions
Dir. 0     Walk  1.7606    Skip  2.3435    3.6550
Dir. 9     Walk  1.6975    Skip  2.3138    3.6286
Dir. 18    Walk  1.7342    Skip  2.2600    3.6066
Dir. 27    Walk  1.7314    Skip  2.3225    3.5359
Dir. 36    Walk  1.7721    Skip  2.3296    3.5050
Dir. 45    Walk  1.7750    Skip  2.2099    3.4217
Dir. 54    Walk  1.7796    Skip  2.1169    3.3996
Dir. 63    Walk  1.9683    Skip  2.3181    3.2095
Dir. 72    Walk  2.2900    Skip  2.4930    3.3460
Dir. 81    Side  2.6917    Side  2.8095    3.7771

Conclusions/Inferences

Motion Information is used.

Confusion occurs between at most two actions.

Only an approximate mask is required

Can be used in a higher-level activity recognition system where the scores for the primitive actions are available.

Thank You

Questions?

