Region based body joint tracking


Binu M Nair¹, Kimberly D Kendricks², Vijayan K Asari¹, and Ronald F Tuttle³

¹ {nairb1,vasari1}@udayton.edu

² CHPSA Lab, Central State University, Wilberforce, OH, USA
kkendricks@centralstate.edu

³ Air Force Institute of Technology, 2950 Hobson Way, OH, USA
ronald.tuttle@afit.edu

track and estimate continuous joint locations in low-resolution imagery, where the estimated trajectories can be analyzed for specific gait signatures. The true transition between joint states is continuous and follows a sinusoidal trajectory. Recent state-of-the-art techniques enable us to estimate the pose at each frame, from which joint locations can be deduced. However, at low resolution these pose estimates are often noisy and discrete, and hence not suitable for further gait analysis. Our proposed 2-level region-based tracking scheme closely approximates the true trajectory and obtains finer estimates. Initial joint locations are deduced from a human pose estimation algorithm, and subsequent finer locations are estimated and tracked by a Kalman filter. We test the algorithm on sequences of individuals walking outdoors and evaluate their gait using the estimated joint trajectories.

of oriented gradients, low resolution

Introduction

Most research on tracking in surveillance videos has been restricted to detecting and tracking large objects in the scene, such as people in shopping malls, players on a soccer or basketball court, and cars. Tracking body joints in a scene, however, borders on human pose estimation in images and videos. The research community currently tackles human body pose estimation in two different scenarios: one that uses depth information and one that uses only images. The former uses depth information from the Kinect (Shotton et al. [14]) and is mainly suited to indoor applications such as gaming consoles and human interactive systems. The latter is used in

surveillance applications, which use video feeds from multiple CCTV cameras monitoring a parking lot or a shopping mall. Early research on tracking motion and pose in surveillance videos tracked interest points detected on the human body and then modeled the resulting trajectories to differentiate between human actions [8]. More recently, Huang et al. [7] proposed an approach in which human body pose is estimated and tracked across the scene using information acquired by a multi-camera system. The pose estimates obtained from such algorithms give continuous, smooth, sinusoid-like trajectories and are therefore useful for gait analysis. However, one limitation is that they require high-resolution imagery for accurate estimation of joint trajectories. Applying such algorithms to low-resolution videos therefore does not guarantee joint location estimates suitable for gait analysis, and a pre-processing mechanism should be applied to these noisy, discrete estimates. Pose estimates obtained by proprietary point-light software and by articulated part-based models are illustrated in Figures 1a and 1b, respectively.

Nair et al.

Fig. 1: (a) Manual annotation provided by point-light software [10]. (b) Human pose estimation using articulated models [15].

Related Work

One of the earlier popular works that tracks human motion using only a single video camera, without depth information, is by Markus Kohler [9]. There, a Kalman filter is designed to track non-linear human motion by treating it as motion with constant velocity, with the changing acceleration modeled as white noise. In our proposed algorithm, we use a modification of this Kalman filter and of its process noise covariance design to track the body joints across the video sequence. Kaaniche et al. [8] used the extended Kalman filter to track specific points or corners detected at every frame of the video sequence for the purpose of gesture recognition.

In recent years, the problem of human body pose estimation has not been limited to tracking points or corners or to using depth information. One of the state-of-the-art methods for human pose estimation on static images is the flexible mixture-of-parts model proposed by Yang and Ramanan [15]. Instead of explicitly using a variety of oriented body part templates (parameterized by pixel location and orientation) in a search-based template matching scheme, a family of affine-warped templates is modeled, each template containing a mixture of

aware algorithm which tracks human body pose in a sequence, where the human body is modeled as a combination of single parts, such as the head and neck, and symmetric part pairs, such as the shoulders, knees and feet. An important aspect of this algorithm is that it can differentiate between similar-looking parts, such as the left and right leg or arm, thereby giving a suitable estimate of the human pose. Although these methods show increased accuracy on datasets such as the Buffy dataset [5] and the Image Parse dataset [13], their performance on very low-resolution imagery has not yet been evaluated. Further processing of the human pose estimates can provide coarse joint locations, which can form the basis of many tracking schemes. One such work is by Burgos-Artizzu et al. [3], who propose a generalization of non-maximum suppression post-processing schemes to merge multiple pose estimates, either in a single frame or across consecutive frames of a video sequence. We focus on a different problem, where we require smooth trajectories of individual joints in low-resolution video for realistic, online analysis of gait signatures. The work proposed in this paper is an alternative to, and more accurate than, our preliminary model [10] for body joint tracking, in which a combination of optical flow and LBP-HOG descriptors with a Kalman filter was evaluated.

Theory

In this section, we explain the modules of the proposed framework: region-based feature matching and the tracking scheme using the Kalman filter.

3.1

The Histogram of Oriented Gradients (HOG) [4] and Local Binary Patterns (LBP) [11] are used to describe the edge information and the textural content of a local region, respectively. Both can be very effective descriptors for region-based image matching at low resolution. The HOG descriptor [4] is a histogram of the pixels over edge orientation, where each pixel is weighted by its edge magnitude. With horizontal and vertical gradients $G_x$ and $G_y$, the gradient magnitude and direction are given by $\sqrt{G_x^2 + G_y^2}$ and $\tan^{-1}(G_y / G_x)$, respectively.
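The magnitude-weighted orientation histogram described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the central-difference gradients, the unsigned [0°, 180°) orientation range, the 9 bins and the absence of the cell/block normalization used in the full HOG of [4] are all assumptions, and `hog_histogram` is a hypothetical helper name. `math.atan2` stands in for $\tan^{-1}(G_y/G_x)$ so that $G_x = 0$ is handled.

```python
import math


def hog_histogram(patch, n_bins=9):
    """Magnitude-weighted histogram of gradient orientations over a 2-D patch.

    Assumptions (not from the paper): central-difference gradients, unsigned
    orientations in [0, 180) degrees, n_bins equal-width bins, no block
    normalization. `patch` is a list of rows of gray values.
    """
    h, w = len(patch), len(patch[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(1, h - 1):          # skip the border so all neighbours exist
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # horizontal gradient G_x
            gy = patch[y + 1][x] - patch[y - 1][x]   # vertical gradient G_y
            mag = math.hypot(gx, gy)                 # sqrt(G_x^2 + G_y^2)
            if mag == 0.0:
                continue
            theta = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned direction
            hist[int(theta // bin_width) % n_bins] += mag      # vote weighted by magnitude
    return hist


# A horizontal intensity ramp has purely horizontal gradients (direction 0 degrees).
ramp = [[0, 1, 2, 3, 4] for _ in range(5)]
print(hog_histogram(ramp))  # [18.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Each of the nine interior pixels of the ramp has $G_x = 2$, $G_y = 0$, so all of the weight lands in the first orientation bin.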

The local binary pattern is an image coding scheme that brings out the textural features in a region. For representing a joint region and associating a joint across successive frames, the texture of the region plays a vital part in addition to the edge information. The LBP considers a local neighborhood of 8×8 or 16×16 in a joint region and generates a coded value that represents the underlying texture in its local region. The LBP operator is defined as

$$LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p, \qquad s(z) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases} \tag{1}$$


where $P$ is the number of sampling points on a circle of radius $R$ around the center pixel, $g_c$ is the gray value of the center pixel, and $g_p$ is the gray value of the $p$-th neighbor. The textural representation of the joint region is then the histogram of these LBP-coded values. For our purpose, we use $P = 8$ with $R = 1$, which reduces to a local region of size 8×8. The matching between two joint regions, represented either by HOG or by LBP, is done using the Chi-squared metric [11] in Equation 2, where $f_1, f_2$ are feature vectors corresponding to a certain joint in successive frames.

$$\chi^2(f_1, f_2) = \sum_b \frac{\left(f_1(b) - f_2(b)\right)^2}{f_1(b) + f_2(b)} \tag{2}$$
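Equations 1 and 2 can be sketched together as an LBP histogram plus a Chi-squared matcher. This is a minimal sketch under assumptions not stated in the paper: the 8 neighbors at $R = 1$ are taken on the 3×3 square ring without interpolation, ordered clockwise from the top-left corner; the histogram is normalized to sum to 1; and empty bins are skipped in Equation 2 to avoid division by zero. `lbp_code`, `lbp_histogram`, `chi_squared` and `best_match` are hypothetical names.

```python
# 8 neighbours at radius 1, taken on the 3x3 square ring (no interpolation),
# ordered clockwise from the top-left corner; the ordering is an assumption.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, y, x):
    """LBP_{8,1} code of pixel (y, x): sum over p of s(g_p - g_c) * 2^p."""
    gc = img[y][x]
    code = 0
    for p, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] - gc >= 0:   # s(z) = 1 for z >= 0
            code += 1 << p                  # weight 2^p
    return code


def lbp_histogram(region):
    """Normalized 256-bin histogram of LBP codes over the interior of a region."""
    hist = [0.0] * 256
    for y in range(1, len(region) - 1):
        for x in range(1, len(region[0]) - 1):
            hist[lbp_code(region, y, x)] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def chi_squared(f1, f2):
    """Chi-squared distance of Equation 2; empty bins (f1 + f2 = 0) are skipped."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(f1, f2) if a + b > 0)


def best_match(f_joint, candidates):
    """Return the pixel p (a key of `candidates`) minimizing chi^2(f_j, f_p)."""
    return min(candidates, key=lambda p: chi_squared(f_joint, candidates[p]))
```

`best_match` mirrors the $\arg\min$ over a search region used later in the framework, with `candidates` mapping each pixel to its precomputed descriptor.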

3.2 Kalman Filter

The recursive Kalman filter has been widely applied in the literature to track points in video sequences. In the proposed algorithm, we use a Kalman filter to track each specific body joint across the scene. The state of the process (here, the human body movement) is set to the $(x, y)$ coordinates of the joint together with its velocity $(v_x, v_y)$, giving a state vector $x_k \in \mathbb{R}^4$. The measurement vector $z_k = [x_o, y_o] \in \mathbb{R}^2$ is provided either by the coarse joint location estimates or by the region-based estimate. By approximating the motion of a joint over a small time interval by a linear function, we can design the transition matrix $A$ so that the next state is a linear function of the previous state. As done by Kohler [9], to account for the non-constant velocity often associated with accelerating image structures, we use the process noise covariance matrix $Q$ defined in Equation 3, where $a$ is the acceleration and $\Delta t$ is the time step determined by the frame rate of the camera.

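$$Q = \frac{a^2 \Delta t}{6} \begin{bmatrix} 2(\Delta t)^2 & 0 & 3\Delta t & 0 \\ 0 & 2(\Delta t)^2 & 0 & 3\Delta t \\ 3\Delta t & 0 & 6 & 0 \\ 0 & 3\Delta t & 0 & 6 \end{bmatrix} \tag{3}$$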
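The per-joint filter can be sketched as a small constant-velocity Kalman filter in plain Python. This is an illustrative sketch, not the authors' code: the transition matrix $A$ (position advanced by velocity times $\Delta t$), the measurement matrix observing only $(x, y)$, the measurement noise `r` and the initial covariance `p0` are assumptions; only $Q$ follows Equation 3. `JointKalman` and its helpers are hypothetical names.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def add(A, B, sign=1.0):
    """Element-wise A + sign * B."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inv2(S):
    """Inverse of a 2x2 matrix (the innovation covariance)."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


class JointKalman:
    """Constant-velocity Kalman filter for one joint, state [x, y, vx, vy]."""

    def __init__(self, x, y, vx, vy, dt=1.0, a=0.1, r=1.0, p0=1.0):
        self.x = [[x], [y], [vx], [vy]]
        # Transition matrix A: position advances by velocity * dt (assumed form).
        self.A = [[1.0, 0.0, dt, 0.0], [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
        # Process noise Q of Equation 3, factored as (a^2 dt / 6) * matrix.
        k = a * a * dt / 6.0
        self.Q = [[2 * dt * dt * k, 0.0, 3 * dt * k, 0.0],
                  [0.0, 2 * dt * dt * k, 0.0, 3 * dt * k],
                  [3 * dt * k, 0.0, 6 * k, 0.0],
                  [0.0, 3 * dt * k, 0.0, 6 * k]]
        self.H = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # observe (x, y) only
        self.R = [[r, 0.0], [0.0, r]]                          # measurement noise (assumed)
        self.P = [[p0 if i == j else 0.0 for j in range(4)] for i in range(4)]

    def set_velocity(self, vx, vy):
        """Overwrite the velocity with the global (median-flow) estimate."""
        self.x[2][0], self.x[3][0] = vx, vy

    def predict(self):
        self.x = matmul(self.A, self.x)
        self.P = add(matmul(matmul(self.A, self.P), transpose(self.A)), self.Q)
        return self.x[0][0], self.x[1][0]

    def correct(self, zx, zy):
        Ht = transpose(self.H)
        S = add(matmul(matmul(self.H, self.P), Ht), self.R)   # innovation covariance
        K = matmul(matmul(self.P, Ht), inv2(S))                # Kalman gain (4x2)
        innovation = add([[zx], [zy]], matmul(self.H, self.x), sign=-1.0)
        self.x = add(self.x, matmul(K, innovation))
        eye = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
        self.P = matmul(add(eye, matmul(K, self.H), sign=-1.0), self.P)
        return self.x[0][0], self.x[1][0]


kf = JointKalman(x=100.0, y=50.0, vx=1.5, vy=0.0)  # coarse location + global velocity
print(kf.predict())  # (101.5, 50.0)
print(kf.correct(102.0, 50.5))
```

Expanding Equation 3 per axis gives position variance $a^2\Delta t^3/3$, position-velocity covariance $a^2\Delta t^2/2$ and velocity variance $a^2\Delta t$, which is the familiar white-noise-acceleration form.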

Proposed Framework

The proposed framework consists of two main stages: a) 2-level region-based matching using LBP/HOG and b) tracking of the region-based estimates using a Kalman filter. The steps of the proposed algorithm are as follows:

1: Extract the first frame (time instant $t = 1$) of the sub-trajectory. Compute dense optical flow within the foreground region to get the global velocity estimate (median flow).

2: Initialize the Kalman filter with the coarse joint location and the global velocity. The state of the tracker for each body joint is then $x_t = [x, y, v_x, v_y]$, where $(v_x, v_y)$ is the joint velocity, set to the global flow velocity estimate. This is considered the corrected state $\hat{x}_{t-1}$ at time $t = 1$.

3: Predict the state $\hat{x}^-_t$ of the Kalman filter. Using the predicted state $\hat{x}^-_t$, the posterior state $\hat{x}_{t-1}$ and the error covariance $P_t$, estimate the elliptical region $S_{reg1}(t)$ in which the joint location is likely to fall.

4: Extract the next frame. Find the region-based matching estimate of each joint between instances $t$ and $t-1$, formulated as $\arg\min_{p \in S_{reg1}(t)} \chi^2(f_j, f_p)$, where $f_j$ is the joint descriptor updated at the previous time instant and $f_p$ is the region descriptor computed at pixel $p$ within the elliptical search region $S_{reg1}(t)$. Also compute the dense optical flow and the global velocity of the foreground region.

5: Using this estimate and the coarse joint location estimate, predict the new elliptical search region $S_{reg2}(t)$. The constraint $S_{reg2}(t) \subseteq S_{reg1}(t)$ is enforced to prevent the growth of $S_{reg2}(t)$. If the constraint is satisfied, go to Step 6; otherwise, go to Step 8.

6: Compute the region-based estimate given by $\arg\min_{p \in S_{reg2}(t)} \chi^2(f_j, f_p)$. Use this finer estimate of the joint location as the measurement vector $z = [z_x, z_y]$ to correct the Kalman tracker associated with that particular joint.

7: Update $t \leftarrow t + 1$. Set the joint velocity to the global velocity and predict $\hat{x}^-_t$ and the elliptical search region $S_{reg1}(t)$. Go to Step 4.

8: Using the coarse joint location estimate as the measurement vector, perform the correction phase of the filter.

9: Update $t \leftarrow t + 1$. Set the joint velocity to the global velocity and predict $\hat{x}^-_t$ and the elliptical search region $S_{reg1}(t)$. Go to Step 4.

10: Continue until all the frames of the sequence have been processed.
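The bookkeeping of Steps 1 to 8 can be sketched as three small helpers. This is a hypothetical sketch: the paper does not specify how an elliptical region is derived from a predicted position and covariance, so `elliptical_region` assumes a Mahalanobis gate of `k` standard deviations under a 2×2 position covariance (for $S_{reg1}$, the position block of $P_t$), and the "no growth" constraint of Step 5 is read as set inclusion. Optical flow and the descriptors are assumed to be computed elsewhere; all names are illustrative.

```python
import math
import statistics


def global_velocity(flow_vectors):
    """Median of the dense-flow vectors (vx, vy) inside the foreground (Steps 1 and 4)."""
    return (statistics.median([v[0] for v in flow_vectors]),
            statistics.median([v[1] for v in flow_vectors]))


def elliptical_region(center, cov, k=2.0):
    """Integer pixels within Mahalanobis distance k of `center` under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det   # inverse covariance
    cx, cy = center
    rx = int(math.ceil(k * math.sqrt(a))) + 1   # bounding box of the ellipse, plus margin
    ry = int(math.ceil(k * math.sqrt(d))) + 1
    x0, y0 = int(round(cx)), int(round(cy))
    region = set()
    for y in range(y0 - ry, y0 + ry + 1):
        for x in range(x0 - rx, x0 + rx + 1):
            dx, dy = x - cx, y - cy
            if dx * (ia * dx + ib * dy) + dy * (ic * dx + id_ * dy) <= k * k:
                region.add((x, y))
    return region


def choose_measurement(s_reg1, s_reg2, score, coarse):
    """Steps 5, 6 and 8: pick the measurement used to correct the joint's filter.

    `score(p)` returns chi^2(f_j, f_p) for pixel p. If S_reg2 stays inside S_reg1,
    the region-based estimate argmin_{p in S_reg2} score(p) is used (Step 6);
    otherwise the coarse pose-estimate location is used (Step 8).
    """
    if s_reg2 and s_reg2 <= s_reg1:   # set inclusion as the no-growth constraint
        return min(s_reg2, key=score), "region"
    return coarse, "coarse"
```

The returned measurement would then be passed to the joint's Kalman correction step, after which the velocity is reset to the global flow estimate as in Steps 7 and 9.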

tracking scheme using LBP descriptors. The numbering of points in the MOTP/MOTA scores refers to the names of the sequences shown in the left figure.

The proposed tracking scheme has been tested on a private dataset provided by the Air Force Institute of Technology, Dayton, OH. It consists of 12 subjects walking along an outdoor track across the face of a building; each subject performs the walk twice, once wearing a loaded vest and once without a vest, giving a total of 24 video sequences. We focus on the segment in which the subject walks clockwise around the track and climbs a ramp. We set equal neighborhood sizes of 17×17 for each joint region and a constant acceleration $a = 0.1$ pixels/frame² for the corresponding Kalman filter. Figure 4 shows sample illustrations of the proposed scheme in certain frames of the sequence. Sample joint trajectories are also shown in Figure 5, where four different schemes are compared. The joint trajectories estimated by all schemes are smoothed using a regression-based neural network (GRNN). The smoothed trajectories obtained by the proposed scheme using LBP or HOG give the closest approximation to the sinusoidal trajectory, with subtle variations.
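The paper does not detail its GRNN smoother. A general regression neural network (GRNN) outputs a Gaussian-kernel weighted average of its training targets, which is equivalent to Nadaraya-Watson kernel regression, so a minimal sketch of smoothing one joint coordinate over frame indices looks like the following. The kernel width `sigma` is an assumed parameter, not a value from the paper, and `grnn_smooth` is a hypothetical name.

```python
import math


def grnn_smooth(t, y, sigma=2.0, t_query=None):
    """GRNN (Nadaraya-Watson) estimate of y at each query time.

    t, y    : frame indices and one joint coordinate per frame
    sigma   : Gaussian kernel width in frames (assumed; controls smoothness)
    t_query : times at which to evaluate; defaults to the input times
    """
    if t_query is None:
        t_query = t
    smoothed = []
    for tq in t_query:
        # pattern layer: Gaussian similarity of the query to every training sample
        w = [math.exp(-((tq - ti) ** 2) / (2.0 * sigma ** 2)) for ti in t]
        # summation/output layers: normalized weighted average of the targets
        smoothed.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    return smoothed


frames = list(range(20))
noisy = [math.sin(2 * math.pi * f / 10) + (0.3 if f % 2 else -0.3) for f in frames]
print([round(v, 2) for v in grnn_smooth(frames, noisy, sigma=1.5)])
```

A larger `sigma` suppresses more frame-to-frame jitter but also flattens the sinusoidal peaks, so in practice it would be tuned against the expected gait period.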

5.1

It is a statistical measure of how close the tracked joint locations are to the coarse joint location estimates for each sequence associated with a particular subject. This metric [6] is given by

$$d^2(K, K_m) = \sum_{i=1}^{n} \ln^2 \lambda_i(K, K_m),$$

where $K \in \mathbb{R}^{3 \times 3}$ is the covariance matrix of the tracked points, $K_m \in \mathbb{R}^{3 \times 3}$ is the covariance matrix of the coarse joint locations, $\lambda_i$ is the $i$-th eigenvalue satisfying $|\lambda K - K_m| = 0$, and $n$ is the number of eigenvalues. The lower the value, the closer the tracked points are to the coarse joint locations. This measure does not give the precision of the tracking scheme, but it indicates whether the tracked joint trajectories lie within the spatio-temporal neighborhood of the coarse joint trajectory. Most of the joint trajectories obtained from the proposed scheme have very low values, which shows that the proposed scheme produces tracked estimates close to the pose estimates obtained from a pose detector.
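The distance can be evaluated without a linear-algebra library by reducing the generalized eigenproblem $|\lambda K - K_m| = 0$ to an ordinary symmetric one: with the Cholesky factor $K = LL^T$, the $\lambda_i$ are the eigenvalues of $L^{-1} K_m L^{-T}$, found here with cyclic Jacobi rotations. This is a sketch assuming both covariance matrices are symmetric positive definite; the function names are illustrative and not from the paper.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def lower_inverse(L):
    """Inverse of a lower-triangular matrix by forward substitution."""
    n = len(L)
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        inv[i][i] = 1.0 / L[i][i]
        for j in range(i):
            inv[i][j] = -sum(L[i][k] * inv[k][j] for k in range(j, i)) / L[i][i]
    return inv


def jacobi_eigenvalues(M, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    A = [row[:] for row in M]
    n = len(A)
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):              # A <- A J   (columns p, q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):              # A <- J^T A (rows p, q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [A[i][i] for i in range(n)]


def forstner_d2(K, Km):
    """d^2(K, K_m) = sum_i ln^2(lambda_i), with |lambda K - K_m| = 0."""
    Linv = lower_inverse(cholesky(K))
    M = matmul(matmul(Linv, Km), [list(r) for r in zip(*Linv)])
    n = len(M)
    M = [[0.5 * (M[i][j] + M[j][i]) for j in range(n)] for i in range(n)]  # enforce symmetry
    return sum(math.log(lam) ** 2 for lam in jacobi_eigenvalues(M))
```

Because swapping $K$ and $K_m$ inverts every $\lambda_i$ and $\ln^2$ is unchanged by inversion, the distance is symmetric; identical covariances give exactly zero.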

5.2

The MOTP/MOTA [2] metrics are widely used evaluation measures for multiple-object tracking, where MOTP and MOTA give the precision and accuracy of the tracker by considering all the detected and tracked objects. We use an implementation of CLEAR-MOT [1] to obtain statistics such as the false positive rate, false negative rate, MOTA and MOTP scores. Multiple Object Tracking Precision (MOTP) refers to the closeness of a tracked point location to its true location (given as ground truth). Here, we measure closeness by the overlap between the neighborhood regions occupied by the tracked point location and by the ground truth: the higher the overlap, the more precise the estimated location of the point. Multiple Object Tracking Accuracy (MOTA) gives the accumulated accuracy in terms of the fraction of tracked joints matched correctly without any misses or mismatches. We computed the MOTP, MOTA, false positive rate and false negative rate for each sequence with threshold $T = 0.5$, the same acceleration parameter $a = 0.1$, and a neighborhood size of 17×17 for each body joint. We use the coarse joint location estimates as the ground truth, since no appropriate ground truth was provided with this dataset. Figure 3b shows that all of the sequences have a moderately high precision of around 75% and a high accuracy of around 90%. This shows that the proposed tracking scheme produces less noisy estimates, and that the reduction in precision is due to slight variations of the estimated joint locations with respect to the coarse joint location estimates.
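The sketch below only illustrates the bookkeeping behind these scores; it is not the CLEAR-MOT implementation of [1]. It assumes that the neighborhood overlap is the intersection-over-union of the two 17×17 squares, that each joint is compared only with its own ground-truth location (so identity switches are taken as zero), and that the rates are normalized by the number of ground-truth joint instances. `box_iou` and `clear_mot` are hypothetical names.

```python
def box_iou(p, q, size=17):
    """IoU of two size x size square neighbourhoods centred on points p and q."""
    ix = max(0.0, size - abs(p[0] - q[0]))   # overlap width
    iy = max(0.0, size - abs(p[1] - q[1]))   # overlap height
    inter = ix * iy
    return inter / (2.0 * size * size - inter)


def clear_mot(frames, size=17, T=0.5):
    """MOTP, MOTA and FP/FN rates over a sequence.

    frames: list of dicts mapping joint name -> (tracked_xy or None, ground_truth_xy).
    Each joint keeps its own tracker, so identity switches are assumed to be zero.
    """
    matches = fn = fp = n_gt = 0
    overlap_sum = 0.0
    for frame in frames:
        for tracked, gt in frame.values():
            n_gt += 1
            if tracked is None:                 # joint not tracked: a miss
                fn += 1
                continue
            iou = box_iou(tracked, gt, size)
            if iou >= T:                        # counted as a correct match
                matches += 1
                overlap_sum += iou
            else:                               # too far: a miss plus a false positive
                fn += 1
                fp += 1
    return {
        "MOTP": overlap_sum / matches if matches else 0.0,
        "MOTA": 1.0 - (fn + fp) / n_gt if n_gt else 0.0,
        "FP rate": fp / n_gt if n_gt else 0.0,
        "FN rate": fn / n_gt if n_gt else 0.0,
    }
```

With 17×17 neighborhoods, a one-pixel offset already lowers the overlap to 272/306 ≈ 0.89, which illustrates why MOTP can stay well below 100% even when every joint is matched.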

Conclusion

scheme incorporated along with a Kalman filter for use in conjunction with state-of-the-art human pose estimation algorithms in low-resolution outdoor scenarios. The algorithm combines effective region-based point tracking using HOG or LBP with the predictive capability of the Kalman filter. After applying a post-processing GRNN-based smoothing scheme, qualitative evaluation shows that the proposed scheme approximates the true sinusoidal trajectory better than schemes using only the pose estimates. Quantitatively, the joint tracks obtained from the proposed scheme show higher precision and accuracy. Future work will involve analyzing the estimated joint trajectories to determine characteristics suitable for gait signature analysis, for person re-identification, or for human action and activity analysis.


(a) Elliptical search region in frame 1 for frame 2.

(b) Fine estimates of joint location in frame 2 obtained from tracking scheme (LBP).

(c) Elliptical search region computed in frame 3 for frame 4. Here, the shoulder and the ankle joint trackers are corrected with the coarse location, while the other joint trackers are corrected with the region-based estimate.

(d) Finer estimates of joint locations in frame 4 obtained from tracking scheme (LBP).

(e) Elliptical search region computed at frame 7 for frame 8.

(f) Tracked joint locations at frame 9 based on the elliptical search regions.

Fig. 4: Illustration of elliptical search regions before tracking and joint location

estimates after tracking. The coarse pose estimates are represented by purple

color in each frame. The search regions and the finer joint estimates are given

as shoulder (blue), elbow (green), wrist (red), waist (cyan), knee (yellow) and

ankle (pink).

Fig. 5: Estimated fine joint trajectories by different schemes for subject 11 wearing a coat in phase A. Color key: Blue - coarse joint locations from human pose estimation; Purple - coarse joint locations filtered by Kalman filter; Green - HOG region-based tracking; Red - LBP region-based tracking.


University and is supported by the National Science Foundation under grant No. 1240734.

We would like to thank the National Signature Program and the Air Force Institute of Technology for the dataset used in this research.

References

1. Bagdanov, A., Del Bimbo, A., Dini, F., Lisanti, G., Masi, I.: Posterity logging of face imagery for video surveillance. MultiMedia, IEEE 19(4), 48-59 (Oct 2012)

2. Bernardin, K., Stiefelhagen, R.: Evaluating multiple object tracking performance: The CLEAR MOT metrics. J. Image Video Process. 2008, 1:1-1:10 (Jan 2008)

3. Burgos-Artizzu, X., Hall, D., Perona, P., Dollár, P.: Merging pose estimates across space and time. In: Proceedings of the British Machine Vision Conference. BMVA Press (2013)

4. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. vol. 1, pp. 886-893 (2005)

5. Ferrari, V., Marin-Jimenez, M., Zisserman, A.: Progressive search space reduction for human pose estimation. In: Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. pp. 1-8 (June 2008)

6. Förstner, W., Moonen, B.: A metric for covariance matrices (1999)

7. Huang, C.H., Boyer, E., Ilic, S.: Robust human body shape and pose tracking. In: 3DV-Conference, 2013 International Conference on. pp. 287-294 (2013)

8. Kaaniche, M., Bremond, F.: Tracking HOG descriptors for gesture recognition. In: Advanced Video and Signal Based Surveillance, 2009. AVSS '09. Sixth IEEE International Conference on. pp. 140-145 (2009)

9. Kohler, M.: Using the Kalman Filter to Track Human Interactive Motion: Modelling and Initialization of the Kalman Filter for Translational Motion. Forschungsberichte des Fachbereichs Informatik der Universität Dortmund, Dekanat Informatik, Univ. (1997)

10. Nair, B.M., Kendricks, K.D., Asari, V.K., Tuttle, R.F.: Optical flow based Kalman filter for body joint prediction and tracking using HOG-LBP matching. vol. 9026, pp. 90260H-90260H-14 (2014)

11. Ojala, T., Pietikäinen, M., Mäenpää, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. Pattern Analysis and Machine Intelligence, IEEE Transactions on 24(7), 971-987 (2002)

12. Ramakrishna, V., Kanade, T., Sheikh, Y.: Tracking human pose by tracking symmetric parts. In: Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. pp. 3728-3735 (2013)

13. Ramanan, D.: Learning to parse images of articulated bodies. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems 19, pp. 1129-1136. MIT Press (2007)

14. Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. In: Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. pp. 1297-1304 (2011)

15. Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. pp. 1385-1392 (June 2011)
