
2010 International Conference on Pattern Recognition

Detecting Dominant Motion Flows In Unstructured/structured Crowd Scenes


Ovgu Ozturk, Toshihiko Yamasaki, Kiyoharu Aizawa
The University of Tokyo
{ovgu, yamasaki, aizawa}@hal.t.u-tokyo.ac.jp

Abstract
Detecting dominant motion flows in crowd scenes is one of the major problems in video surveillance. This is particularly difficult in unstructured crowd scenes, where the participants move randomly in various directions. This paper presents a novel method which utilizes the flow vectors of SIFT features to calculate the dominant motion flows in both unstructured and structured crowd scenes. SIFT features can represent the characteristic parts of objects, allowing robust tracking under non-rigid motion. First, flow vectors of SIFT features are calculated at certain intervals to form a motion flow map of the video. Next, this map is divided into equally sized square regions, and in each region dominant motion flows are estimated by clustering the flow vectors. Then, local dominant motion flows are combined to obtain the global dominant motion flows. Experimental results demonstrate the successful application of the proposed method to challenging real-world scenes.

1051-4651/10 $26.00 © 2010 IEEE. DOI 10.1109/ICPR.2010.862

Figure 1. Unstructured/structured crowds. (a) Structured crowd scenes. (b) Unstructured crowd scenes.

1. Introduction
Dominant motion patterns in videos provide significant information with a wide range of applications. Since motion patterns are formed by the individual or interacting motions of crowds, they help to analyze social behavior in the environment captured by the video. They are also useful for public space design and for activity analysis in security applications. Over the years, many studies have tried to find motion patterns by using individual object tracking and trajectory classification methods. However, in real-world situations, high-density crowds are the most common case, and it is not always possible to track individual objects.

Crowd scenes can be divided into two groups, unstructured and structured scenes, as in Figure 1. Structured crowds are those where the main motion tracks are defined by environmental conditions, such as elevators, crosswalks, etc. Unstructured crowds are those where objects can move freely in any direction, following any path. So far, very few researchers have attempted to solve the complexity of the crowd scenes that are structured. Detecting dominant motion flows in unstructured crowds still remains a challenging task.

To solve the problem of calculating the dominant motion flows in both unstructured and structured crowds, we propose a new approach with two distinctive contributions. First, our approach utilizes the motion flows of the SIFT features in a scene. Unlike the corner-based features commonly used in other studies, SIFT features can represent characteristic parts of the objects. Therefore, their tracking consistency and accuracy are higher during complex motions. Second, we propose a hierarchical clustering framework to deal with the complexity of unstructured motion flows.

The entire scene is divided into equally sized local regions. In each local region, flow vectors are classified into groups based on their orientation. Then, location-based classification is applied to find the spatial accumulation of the vectors. Finally, local dominant motion flows are connected to obtain global dominant motion flows.
Figure 2. SIFT motion flow vectors.

1.1. Related Work
Tracking individual objects and constructing their trajectories is a common approach to finding the global motion flows, as in [1, 6]. However, for crowd videos, continuous tracking of individual objects is not possible because of occlusion or tracking failures. Another approach is to employ instantaneous flow vectors of image features over the entire image [3-5, 11]. These methods use corner-based features, which are not reliable under non-rigid motion, affine transformation, or noise. Hence, these studies consider only structured motions and do not work for unstructured crowds. The method in [4] uses neighborhood information, but it fails when a region contains flows in multiple directions that cancel each other. The work in [7] proposes floor fields, which are applicable to structured crowds. Only the work in [2] considers unstructured crowd scenes, where it tries to track individual targets.

Figure 3. (a) Unstructured crowd scene. (b) SIFT flows in the marked region for 400 frames.

2. Generating SIFT Feature Flows


In this paper, SIFT features are used to calculate the motion flows. SIFT features are known to be among the most robust features under various transformations. They can be used to continuously track foreground objects over many frames. Thus, instead of calculating the motion flows at every frame, we track the features at certain intervals. This provides two advantages. First, it reduces the noise coming from the background and from unstable points. Second, the computed motion flow vectors can be used directly without any pre- or post-processing. Each video is segmented into intervals of length d. SIFT features extracted in a frame are matched to the corresponding features in the frame that follows after the interval d. The displacement vectors of the features that exceed a certain threshold are defined as flow vectors. Figure 2 depicts the flow vectors. A flow vector is represented by F(x, y, θ, t, L), where:

x, y : center of mass
θ : orientation
L : length
t : frame number
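The construction of flow vectors from matched features can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the SIFT matching between frame t and frame t + d has already been done by some external matcher, and the names `FlowVector`, `flow_vectors`, and `min_length` are hypothetical. The threshold value and the reading of "center of mass" as the midpoint of the displacement are also assumptions.

```python
import math
from typing import NamedTuple


class FlowVector(NamedTuple):
    x: float       # assumed: midpoint of the displacement segment
    y: float
    theta: float   # orientation in radians, from atan2
    t: int         # frame number at the start of the interval
    length: float  # displacement length in pixels


def flow_vectors(matches, t, min_length=2.0):
    """Turn matched feature positions into flow vectors.

    matches: iterable of ((x0, y0), (x1, y1)), the position of one matched
    feature at frame t and at frame t + d.
    min_length: hypothetical threshold; shorter displacements are dropped.
    """
    flows = []
    for (x0, y0), (x1, y1) in matches:
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length <= min_length:
            continue  # treated as background or an unstable point
        flows.append(
            FlowVector((x0 + x1) / 2, (y0 + y1) / 2, math.atan2(dy, dx), t, length)
        )
    return flows
```

Dropping short displacements is what lets the interval-based tracking suppress background features, since a static point barely moves over d frames.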

Figure 3(a) shows an unstructured crowd scene. The motion flow map of the region in the white square is depicted in Figure 3(b). Motion flows are calculated for 400 frames with an interval length of 3. Flow vectors can be seen to accumulate in certain orientations. However, as the variety of orientations in the region increases, the flow map becomes very complicated. When the entire scene is considered, the amount of data and the complexity are higher. In this case, common clustering methods in the literature [3] will not work effectively. We introduce a hierarchical clustering method to detect the dominant motion flows in the region, which is explained in the next section.

3. Calculating Dominant Motion Flows


Detecting dominant motion flows is defined as finding the orientation and spatial distribution of the most frequently followed paths in a scene during a given period. If the motion of the objects in a video is organized, then one orientation type can be assigned to each location. However, in crowd videos, especially of unstructured crowds, participants move in various directions at different times, so each spatial location holds more than one orientation type depending on the time. Existing methods [3, 4, 11] cannot find the dominant flows in this case. In this work, the entire scene is divided into smaller regions, in which flow vectors are easier to separate into meaningful groups. Then, the flow vectors in each region are clustered with a two-step hierarchical approach to find the local dominant motion flows. Figure 4 shows the hierarchical clustering steps.

Figure 4. Hierarchical Clustering.

Figure 5 panels: (b) orientation-based clustering, (c) spatial clustering, (d) local motion flows.

Finally, local dominant motion flows are connected to compute the global dominant motion flows.
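The division of the scene into equally sized square regions can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, the 60-pixel cell size is the value reported in the experiments, and the 360x480 image is assumed to be 360 pixels high and 480 pixels wide.

```python
from collections import defaultdict


def grid_shape(height, width, cell_size=60):
    """Number of (rows, cols) of square cells that fit in the image."""
    return height // cell_size, width // cell_size


def assign_to_regions(flows, cell_size=60):
    """Group flow vectors by the cell that contains their (x, y) position.

    flows: iterable of tuples whose first two entries are x and y in pixels.
    Returns a dict mapping (row, col) to the list of flows in that cell.
    """
    regions = defaultdict(list)
    for flow in flows:
        x, y = flow[0], flow[1]
        regions[(int(y // cell_size), int(x // cell_size))].append(flow)
    return regions
```

With these assumptions, `grid_shape(360, 480)` gives 6 rows and 8 columns, which matches the 48 local regions reported in the experiments.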

3.1. Hierarchical Clustering of Flow Vectors


Orientation is the most significant information for classifying the flow vectors. In each local region, flow vectors are first classified into one of four main orientation groups. Figure 4 shows the grouping of orientations. To achieve this, an orientation histogram is calculated and the major groups are chosen to represent the region. For example, in Figure 5(b), there are two orientation groups, depicted in blue and green. The second step is spatial clustering. Flow vectors in each orientation group are clustered based on their location, so that accumulations of vectors in the region are detected, as in Figure 5(c). For this, the Self-Tuning Spectral Clustering method [12] is applied, considering the evaluation results in [3]. After clustering, local dominant motion flows are calculated by computing the average location, average orientation, and total number of flow vectors in each group. The local dominant motion flow for each group is thus described by L(x, y, w, θ), where w is the number of vectors and is depicted by the width of the flow vector. Figure 5(d) shows three dominant motion flows calculated in the region.
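The two-step clustering in one local region can be sketched as follows, under several loudly stated assumptions. The angular boundaries of the four orientation groups are defined in Figure 4 and are not recoverable from the text, so the 90-degree sectors below are hypothetical. The spatial step uses a simple distance-threshold grouping as a stand-in for Self-Tuning Spectral Clustering, which the paper actually applies. The parameters `radius` and `min_share` and the use of a circular mean for the average orientation are also assumptions.

```python
import math
from collections import defaultdict


def orientation_group(theta):
    """Map an angle in radians to an assumed group label (sectors are hypothetical)."""
    deg = math.degrees(theta) % 360.0
    if 45.0 <= deg < 135.0:
        return "I"    # assumed vertical, +y direction
    if 135.0 <= deg < 225.0:
        return "III"  # assumed horizontal, -x direction
    if 225.0 <= deg < 315.0:
        return "IV"   # assumed vertical, -y direction
    return "II"       # assumed horizontal, +x direction


def spatial_clusters(points, radius):
    """Stand-in for spectral clustering: link points closer than `radius`."""
    labels = [None] * len(points)
    n_clusters = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        labels[start] = n_clusters
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and math.dist(points[i], points[j]) <= radius:
                    labels[j] = n_clusters
                    stack.append(j)
        n_clusters += 1
    return labels


def local_dominant_flows(flows, radius=15.0, min_share=0.2):
    """flows: list of (x, y, theta) in one region. Returns a list of (x, y, w, theta)."""
    by_group = defaultdict(list)
    for x, y, theta in flows:
        by_group[orientation_group(theta)].append((x, y, theta))
    result = []
    for _, members in sorted(by_group.items()):
        if len(members) < min_share * len(flows):
            continue  # keep only the major bins of the orientation histogram
        labels = spatial_clusters([(x, y) for x, y, _ in members], radius)
        for c in range(max(labels) + 1):
            cluster = [m for m, lab in zip(members, labels) if lab == c]
            w = len(cluster)
            mean_x = sum(m[0] for m in cluster) / w
            mean_y = sum(m[1] for m in cluster) / w
            # Circular mean, so that angles near the +/-pi wrap average sensibly.
            mean_theta = math.atan2(
                sum(math.sin(m[2]) for m in cluster),
                sum(math.cos(m[2]) for m in cluster),
            )
            result.append((mean_x, mean_y, w, mean_theta))
    return result
```

Splitting by orientation first keeps opposing flows that overlap spatially from being merged, which is the failure case noted for neighborhood-based methods.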

Figure 5. Hierarchical clustering.

3.2. Combining Local Dominant Flows


Once the main flows in the local regions are detected, the next question is how to combine them and obtain the global motion flows. The basic logic is to start from one side of the scene, follow the local flows, and connect them to the most probable neighbor flows until the end of the scene. In other words, the entire scene is first scanned horizontally to connect the horizontal flows. After this, it is scanned vertically to connect the vertical flows. Orientation groups II and III are treated as horizontal flows, whereas groups I and IV are vertical flows. The algorithm is as follows. While scanning, for each local motion flow:

1. Determine the neighbor cells, Ns.
2. In each N, search for the motion flows that are in the same orientation group.
3. Choose the closest one in the neighborhood and connect it with the current flow.
4. If there are no motion flows of the same orientation group in the neighbor cells and next neighbor cells, choose the motion flow that is the closest

Figure 6. Connecting the local flows. Panels (a) and (b).

Neighbor cells are defined as the two regions that are in the direction of the current flow. For example, in Figure 6(a), for the horizontal vector, the neighborhood cells are c, e and next neighbor cells are c, e. In Figure 6, the vectors shown with A are in orientation group II. A1 is connected to A3, and A2 and A3 are connected to A4. Hence, they form the global flow shown with the bold gray line. If there are no vectors of the same group in the neighbor and next neighbor cells, the flow is connected to the closest vector to keep the continuity. This case means there is a dominant abrupt change of motion orientation in that region. For example, if A4 did not exist, A3 would be connected to B1.
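The connection steps above can be sketched as follows. This is a simplified illustration rather than the paper's procedure. The exact neighbor geometry is given in Figure 6 and is not recoverable from the text, so the sketch assumes that the neighbor cell is the single cell directly ahead and the next neighbor cell is two steps ahead. The `STEP` table, the dictionary layout of a flow, and restricting the fallback to the closest flow of any group within those cells are all assumptions.

```python
import math

# Assumed scan step (d_row, d_col) for each orientation group (hypothetical).
STEP = {"I": (-1, 0), "II": (0, 1), "III": (0, -1), "IV": (1, 0)}


def connect_flows(flows):
    """Link each local flow to its most probable successor.

    flows: list of dicts with keys 'cell' (row, col), 'pos' (x, y), 'group'.
    Returns a list of (i, j) index pairs: flow i is connected to flow j.
    """
    links = []
    for i, cur in enumerate(flows):
        d_row, d_col = STEP[cur["group"]]
        row, col = cur["cell"]
        # Step 1: neighbor cell and next neighbor cell in the flow direction.
        ahead = {(row + d_row, col + d_col), (row + 2 * d_row, col + 2 * d_col)}
        in_ahead = [j for j, f in enumerate(flows) if j != i and f["cell"] in ahead]
        # Step 2: candidates in the same orientation group.
        same_group = [j for j in in_ahead if flows[j]["group"] == cur["group"]]
        # Step 4 fallback: no same-group flow ahead, so accept any group.
        candidates = same_group or in_ahead
        if candidates:
            # Step 3: connect to the closest candidate.
            j = min(candidates, key=lambda k: math.dist(cur["pos"], flows[k]["pos"]))
            links.append((i, j))
    return links
```

In the fallback case a link joins flows of different orientation groups, which corresponds to the abrupt change of dominant orientation described in the text.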


Figure 7. Experimental results. Panels (a) to (d).

4. Experimental Results and Discussion


In our experiments, crowd data sets are taken from the datasets of the University of Central Florida [4] to allow comparison with related work. In Figure 7, (a) shows the input scenes and SIFT flows, (b) shows the results of our method with detailed lines, (c) shows the results in thick lines after combining the groups and generating one group for each global flow, and (d) shows the ground truth, which is drawn from the average result of a user study. The image size for both sets is 360x480 pixels. Local regions are 60x60 pixels in size, so there are 48 (6x8) local regions in total.

The set at the top is from an escalator neighborhood, which is a structured crowd example. The video is analyzed between frames 100 and 460 with an interval of three. Most of the people move on the escalators, and the people at the far end of the escalators walk freely. The proposed method successfully detects the global motion flows in the free-motion regions as well as the flows along the escalators.

The set at the bottom is from a street, which is an unstructured crowd example with high complexity. The video is analyzed between frames 140 and 460 with an interval length of three. In Figure 7(b), the local regions and the connections of the local motion flows can be seen. For the street scene, our system captures the parallelism in the upper half of the scene, and the crossing of the motion flows is also detected in the lower part. In addition, three main flows of vertical motion are detected, shown in purple in Figure 7(b).

With the proposed approach, dominant motion flows can be detected at various levels. General dominant flow maps can be provided as in (c), or, if necessary, a local analysis of the flows can be obtained as in (b).

5. Conclusions
In this work, we have presented a new approach to the problem of calculating dominant motion flows in various crowd scenes. By using SIFT feature flows and a hierarchical clustering approach, it becomes possible to analyze the motion flows in both structured and unstructured crowds. The proposed approach can detect global motion flows and, at the same time, give information about the local characteristics of the motion flows.

References
[1] F. M. Porikli, Trajectory Pattern Detection by HMM Parameter Space Features and Eigenvector Clustering, ECCV, 2004.
[2] M. Rodriguez, S. Ali and T. Kanade, Tracking In Unstructured Crowded Scenes, ICCV, 2009.
[3] G. Eibl, N. Brändle, Evaluation of Clustering Methods for Finding Dominant Optical Flow Fields in Crowded Scenes, ICPR, 2008.
[4] M. Hu, S. Ali and M. Shah, Detecting Global Motion Patterns in Complex Videos, ICPR, 2008.
[5] G. Brostow, R. Cipolla, Unsupervised Bayesian Detection of Independent Motion in Crowds, CVPR, 2006.
[6] X. Wang et al., Learning Semantic Scene Models by Trajectory Analysis, ECCV, 2006.
[7] S. Ali, M. Shah, Floor Fields for Tracking in High Density Crowd Scenes, ECCV, 2008.
[8] B. D. Lucas and T. Kanade, An Iterative Image Registration Technique with an Application to Stereo Vision, IJCAI, 1981.
[9] D. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, Intl. J. of Computer Vision, 60(2):91-110, 2004.
[10] Y. Tsuduki, H. Fujiyoshi, A Method for Visualizing Pedestrian Traffic Flow using SIFT, PSIVT, 2009.
[11] N. Ihaddadene, C. Djeraba, Real-time Crowd Motion Analysis, ICPR, 2008.
[12] L. Zelnik-Manor, P. Perona, Self-Tuning Spectral Clustering, In Adv. Neur. Inf. Proc. Sys.: 1601-1608, 2004.
