Professional Documents
Culture Documents
Alina Bialkowski1,2 , Patrick Lucey1 , Peter Carr1 , Yisong Yue1 , Sridha Sridharan2 and Iain Matthews1
1
Disney Research, Pittsburgh, PA, USA, 2 Queensland University of Technology, Brisbane, Australia
a.bialkowski@connect.qut.edu.au, {patrick.lucey,peter.carr,yisong.yue,iainm}@disneyresearch.com, s.sridharan@qut.edu.au
Strategically, a team needs to spread out its players so that As there is no way to solve this problem efficiently,
the entire field is adequately covered. As a result, the prob- we achieve an approximate solution using the expectation
ability density functions should exhibit minimal overlap (See maximization (EM) algorithm [34]. Our approach is similar
Fig. 4 right). Equivalently, each role probability density func- to k-means clustering. However, instead of assigning each
tion should exhibit minimal overlap with the teams probability data point to its closest cluster, we solve a linear assignment
density function. Following the ideas of minimum entropy data problem between identities and roles using the Hungarian
partitioning [32], [33], we employ Kullback-Lieber divergence algorithm [35]. Our process is the following. We arbitrarily
to measure the overlap between two probability functions P (x) assign each player a unique role label and assume it remains
and Q(x) constant for the entire duration of the tracking data. As a result,
Z P (x) each role probability density function is initialized using the
KL(P (x)kQ(x)) = P (x) log dx. (2) tracking data of an individual player. The initial occupancy
Q(x) maps for each role resemble something similar to Fig. 4
(initial roles, left). We then iterate through each frame of
Since divergence is a strictly positive quantity (and com- the tracking data and assign role labels to player positions
pletely overlapping probability density functions have zero by formulating a cost matrix based on the log probability of
divergence), we employ a penalty Vn based on the negative each position being assigned a particular role label. We use the
divergence value between the heat map Pn (x) of an individual Hungarian algorithm [35] to compute the optimal assignment
role and that of the team P (x) of role labels. Once role labels have been assigned to all
frames of the tracking data, we recompute the probability
Vn = KL Pn (x)kP (x) . (3)
density functions of each role. The process is repeated until
Computing the optimal formation F is equivalent to convergence, resulting in well separated probability density
determining the optimal set F = {P1 (x), . . . , PN (x)} of functions similar to Fig. 4 (final, right). We normalize the
per-role probability density functions Pn (x) that minimize the tracking data in each frame to have zero mean in order to
total overlap negate the effects of translation.
F = arg max V. (4)
F
Substituting the expressions for KL divergence into the IV. D ISCOVERING AND V ISUALIZING ROLE
total overlap cost illustrates the dependence on each role- A. Spatiotemporal Tracking Data
specific 2D probability density function
For this work, we utilized an entire season of soccer player
tracking data from Prozone. The data consisted of 20 teams
N Z
X who played home and away, totaling 38 games for each team
V = P (n) P (x|n) log P (x|n)dx or 380 games overall. Five of these games were omitted due to
n=1
errors in the data files. We refer to the 20 teams using arbitrary
N
labels {A, B, . . . , T }. Each game consists of two halves,
X Z
+ P (n) P (x|n) log P (x)dx. (5) with each half containing the (x, y) position of every player
n=1 at 10 frames-per-second. This results in over 1 million data-
The expression for V is drastically simplified when put in points per game, in addition to the ball events that occurred
terms of entropy throughout the match, consisting of 43 possible events (e.g.
Z + passes, shots, crosses, tackles etc.). Each of these ball events
contains the time-stamp, location and players involved. An
H(x) = P (x) log(P (x))dx. (6)
inventory of the data is given in Table I.
Initial
G490 T1, rolesPositions
Initial Role Iteration
G490 T1, Iteration 1 = 22.44m
1, diff Iteration2, 2diff = 4.91m
G490 T1, Iteration ... Final11, diff = 0.01m
G490 T1, Iteration
1 7 1 7 1 7
7
2 65 2 6 9 2 6 9 2 6 9
9
8 4
10
3 3 5 10 3 5 3 5
10 10
1
4 8 4 8 4 8
G718 T1, Initial Role Positions G718 T1, Iteration 1, diff = 39.80m G718 T1, Iteration 2, diff = 11.49m G718 T1, Iteration 26, diff = 0.00m
1 1 1 1
5 5 7
5 8 7 5
9 7 9 9
3 29 10 3 8 3 8 3 8
6 2 2 2
6 10 6 6
7 10
4 4 4 4 10
Fig. 4. Example of how our approach works at each iteration, with the role distributions drawn as Gaussians, with the mean and covariance shown.
The resulting clusters are shown in Fig. 5, with the mean To evaluate the clustering results, we compare against
role positions of each formation overlaid over one another. It ground truth formation labels, where a soccer expert annotated
can be seen that clustering resulted in the discovery of distinct the most frequently observed formation for each half and team
according to the arrangement of players (4-4-2, 4-2-3-1, 4-3-3,
3-4-3, 4-1-4-1, or other where the team either did not display
Statistic Frequency a dominant formation or was not one of the given labels). To
Teams 20
evaluate the results, we estimated the label of each cluster as
Games 375 the most frequent ground truth label within the cluster. The
Data Points 480M results are presented as a confusion matrix in Fig. 6.
Ball Events 981K
TABLE I. I NVENTORY OF DATASET USED FOR THIS WORK . It can be seen from Fig. 6 that the discovered formation
clusters match the ground truth annotations quite well, with
[G700 M350_2]
WEHAM vs SHAMP
41
1
Confusion matrix
Shot on target
Shot on target
Shot on target
Shot on target
0.5 1
1
8 7 1 7
4231 84 10 0 0 5 1 1 4 8 7 4 8
90 7 4 1
8
2
4
2
5
10
2 5 9 6 3 69
5 3 6 10 3
2 6 9 9 3 5
5 10
0 10
102 10
96
39 6 2 3
6 7
95102
96 5
80 5 2
10 81
Shot on target
Yellow card
Yellow card
Shot on target
3 3 47 4
8 1
2
8
442 12 83 1 1 1 2 4 7 1
4
7 1
8
0.5
70
Cluster Number
60
3
343 0 0 97 1 0 1 1
0 0.25 0.5 0.75 1 1.25 1.5 1.75 2 2.25 2.5 2.75
4
x 10
50
4
433 0 0 17 67 8 8
40
Fig. 7. By aligning the data based on role, we can quickly visualize and
digest the flow of the match based on the formation of the team. Here we
30 show snapshots of the match at different moments and highlight key events
5
4141 17 2 1 1 73 7
(circels=goals) with the home-team (red) attacking left-to-right. The xs denote
20
the mean role position across a small window of time (5 mins) and the ellipses
6
Interesting 10 24 2 17 0 48 10 show the variance of motion of each role.
0
Interesting
4231
442
343
433
4141
4-4-2
3-4-3
4-3-3
Other
4-2-3-1
4-1-4-1
LCB
1 RCB
7
RB
2
5 9 LCM
10 RCM
3 6 LW
RW
4 8
LCF
RCF
LW
RW
LB
LCB
RCM
LCF
RCB
RCF
RB
LCM
LB
LCB
1 7 RCB
RB
2 5 9 LCM
6 10 RCM
3 LW
4 8 RW
LCF
RCF
LW
RW
LB
LCB
RCM
LCF
RCB
RCF
RB
LCM
Fig. 9. The behavior of two different teams over half a match, demonstrating: (a) Their overall formation calculated using our minimum entropy data partitioning
LB
method (with roles represented as 2D Gaussians). (b) A timeline showing the role assigned to each player at each frame, colored by role. (c) A 5 min smoothed
LCB
7 temporary role swaps), (d) The role swaps across the half {1=left-back(LB), 2 = left-center-back(LCB), 3 = right-center-back(RCB),
version of the role assignments1(ignores RCB
4=right-back(RB), 5=left-centre-midfield (LCM), 6=right-centre-midfield(RCM), 7=left wing(LW), 8=right-wing(RW), 9=left forward(LF), 10=right forward(RF)}
RB
2
5 9 LCM
6 RCM
3 10 LW
4R EFERENCES
8 [20] X. Wei, P. Lucey, S. Morgan, and S. Sridharan, Sweet-Spot: Using
RW
[1] J. Yuan, Y. Zheng, C. Zhang, W. Xie, X. Xie, G. Sun, and Y. Huang, MITSSAC, 2013. RCF
LW
RW
LCF
LB
LCB
RCM
RCB
RCF
RB
LCM
T-Drive: Driving Directions based on Taxi Trajectories, in GIS, 2010. [21] , Predicting Shot Locations in Tennis using Spatiotemporal Data,
[2] L. Tang, X. Yu, S. Kim,(a) J. Han, C. Hung, and W. Peng, (b) Tru- in DICTA, (c)2013. (d)
Alarm: Trustworthiness Analysis of Sensor Networks in Cyber-Physical [22] S. Intille and A. Bobick, Recognizing Planned, Multi-Person Action,
Systems, in ICDM, 2010. Computer Vision and Image Understanding, vol. 81, pp. 414445, 2001.
[3] Y. Zheng, X. Xie, and W. Ma, GeoLife: A Collaborative Social [23] G. Zhu, Q. Huang, C. Xu, Y. Rui, S. Jiang, W. Gao, and H. Yao,
Networking Service Among User, Service , IEEE Data Engineering Trajectory based event tactics analysis in broadcast sports video, in
Bulletin, 2010. ACM Multimedia, 2007.
[4] J. Gudmundsson and M. Krevald, Computing Longest Duration Flocks [24] M. Perse, M. Kristan, S. Kovacic, and J. Pers, A Trajectory-Based
in Trajectory Data, in GIS, 2006. Analysis of Coordinated Team Activity in Basketball Game, CVIU,
2008.
[5] J. Lee, J. Han, and K. Whang, Trajectory Clustering: A Partition-and-
Group Framework, in Int. Conf. Mgmt. of Data, 2007. [25] J.-C. Bricola, Classification of multi-agent trajectories, Masters the-
sis, EPFL, 2012.
[6] STATS SportsVU, www.sportvu.com.
[26] D. Stracuzzi, A. Fern, K. Ali, R. Hess, J. Pinto, N. Li, T. Konik, and
[7] Prozone, www.prozonesports.com. D. Shapiro, An Application of Transfer to American Football: From
[8] SportsVision, www.sportsvision.com. Observation of Raw Video to Control in a Simulated Environment, AI
[9] Hawk-Eye, www.hawkeyeinnovations.co.uk. Magazine, vol. 32, no. 2, 2011.
[27] K. Kim, M. Grundmann, A. Shamir, I. Matthews, J. Hodgins, and
[10] E. GameCast, http://www.espnfc.com/gamecast/392450/gamecast.html.
I. Essa, Motion Fields to Predict Play Evolution in Dynamic Sports
[11] P. Lucey, A. Bialkowski, P. Carr, S. Morgan, I. Matthews, and Y. Sheikh, Scenes, in CVPR, 2010.
Representing and Discovering Adversarial Team Behaviors using [28] P. Carr, M. Mistry, and I. Matthews, Hybrid Robotic/Virtual Pan-Tilt-
Player Roles, in CVPR, 2013. Zoom Cameras for Autonomous Event Recording, in ACM Multimedia,
[12] K. Goldsberry, CourtVision: New Visual and Spatial Analytics for the 2013.
NBA, in MITSSAC, 2012. [29] P. Lucey, A. Bialkowski, P. Carr, Y. Yue, and I. Matthews, How to Get
[13] R. Masheswaran, Y. Chang, J. Su, S. Kwok, T. Levy, A. Wexler, an Open Shot: Analyzing Team Movement in Basketball using Tracking
and N. Hollingsworth, The Three Dimensions of Rebounding, in Data, in MITSSAC, 2014.
MITSSAC, 2014. [30] A. Bialkowski, P. Lucey, P. Carr, Y. Yue, and I. Matthews, Win at
[14] D. Cervone, A. DAmour, L. Bornn, and K. Goldsberry, POINTWISE: home and draw away: Automatic formation analysis highlighting the
Predicting Points and Valuing Decisions in Real Time with NBA Optical differences in home and away team behaviors, in MITSSAC, 2014.
Tracking Data, in MITSSAC, 2014. [31] X. Wei, L. Sha, P. Lucey, S. Morgan, and S. Sridharan, Large-Scale
[15] A. Miller, L. Bornn, R. Adams, and K. Goldsberry, Factorized Point Analysis of Formations in Soccer, in DICTA, 2013.
Process Intensities: A Spatial Analysis of Professional Basketball, in [32] S. Roberts, R. Everson, and I. Rezek, Minimum Entropy Data Parti-
ICML, 2014. tioning, IET, pp. 844849, 1999.
[16] P. Lucey, A. Bialkowski, P. Carr, E. Foote, and I. Matthews, Character- [33] Y. Lee and S. Choi, Minimum Entropy, K-Means, Spectral Clustering,
izing Multi-Agent Team Behavior from Partial Team Tracings: Evidence in International Joint Conference on Neural Networks, 2004.
from the English Premier League, in AAAI, 2012. [34] A. Dempster, N. Laird, and D. Rubin, Maximum Likelihood from
[17] P. Lucey, D. Oliver, P. Carr, J. Roth, and I. Matthews, Assessing team Incomplete Data via the EM Algorithm, Journal of the Royal Statistical
strategy using spatiotemporal data, in ACM SIGKDD, 2013. Society, vol. 39, no. 1, pp. 138, 1977.
[18] J. Gudmundsson and T. Wolle, Football Analysis using Spatiotemporal [35] H. W. Kuhn, The hungarian method for the assignment problem,
Tools, Computers, Environment and Urban Systems, 2013. Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 8397, 1955.
[19] J. Pena and H. Touchette, A Network Theory Analysis of Football [36] Y. Rubner, C. Tomasi, and L. Guibas, A Metric for Distributions with
Strategies, arXiv preprint arXiv:1206.6904, 2012. Applications to Image Databases, in ICCV, 1998.