

Counting Crowds and Lines


19 Nov 2017

Updated with video footage of the CUHK Mall Dataset:


The ML and site for this post can be found at countingcompany.com.

In Union Square, NYC, there’s an untoppable burger joint named Shake Shack that’s always crowded. A group of us would obsessively check the Shake Cam around lunch to figure out if the trip was worth it.

14-person line, not bad


Rather than do this manually (come on, it’s nearly 2018), it would be great if this could be done for us. Then, taking that idea further, imagine being able to measure foot traffic on a month-to-month basis, or to measure the impact of a new promotional campaign.

Object detection has received a lot of attention in the deep learning space, but it’s ill-suited for highly congested scenes like crowds. In this post, I’ll talk about how I implemented a multi-scale convolutional neural network (CNN) for crowd and line counting.

Why not object detection?


Region-based CNNs (R-CNNs) slide a window across the image to find objects. High-density crowds are ill-suited to sliding windows due to heavy occlusion:

Failed attempt with an off-the-shelf (no retraining) TensorFlow R-CNN
Further exploration of this approach led me to TensorBox, but it too had issues with high congestion and large crowd counts.

Density Maps to the rescue


Rather than sliding a window, density maps (aka heat maps) estimate the likelihood of a head being at each location:
Crowd photo from the UCF Dataset
3406 vs 3408? Pretty close!

What’s happening here?

Based on the paper “Multi-scale Convolutional Neural Network for Crowd Counting”, the ground truth is generated by taking the head annotations, setting those pixel values to one, and then Gaussian blurring the image. The model is then trained to output these blurred images, or density maps. The sum of all the pixels in the map then gives the crowd count prediction. Read the paper for more insight.
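Here’s a minimal sketch of that ground-truth generation, assuming head annotations come as (x, y) pixel coordinates; the fixed blur radius is my assumption (the exact kernel width is an implementation detail):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(head_points, height, width, sigma=4.0):
    """Turn head annotations into a ground-truth density map.

    head_points: (x, y) pixel coordinates, one per head.
    sigma: blur radius in pixels; a fixed value is an assumption here.
    """
    canvas = np.zeros((height, width), dtype=np.float32)
    for x, y in head_points:
        canvas[int(y), int(x)] = 1.0  # one unit of mass per head
    # Gaussian blurring spreads each head out but preserves total mass,
    # so the map still sums (approximately) to the crowd count.
    return gaussian_filter(canvas, sigma=sigma)


heads = [(120, 80), (130, 85), (300, 200)]  # toy annotations
dmap = density_map(heads, height=240, width=320)
print(dmap.sum())  # ~3.0
```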

Let’s look at density maps applied to the Shake Cam. Don’t worry about the color switch from blue to white for the density maps.

The sum of the pixel values is the size of the crowd


As you can see above, we have:

1. The annotated image, courtesy of AWS Mechanical Turk.
2. The calculated ground truth, made by setting head locations to one and then Gaussian blurring.
3. The model’s prediction after being trained on the ground truths.

How to get the images?


From your neighborhood Shake Shack Cam, of course.
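Under the hood that’s just an HTTP fetch of the latest camera frame. A sketch, with a placeholder URL since the real endpoint isn’t given here:

```python
from io import BytesIO

import requests
from PIL import Image

SHAKE_CAM_URL = "https://example.com/shakecam.jpg"  # placeholder, not the real endpoint

def fetch_frame():
    """Download the latest Shake Cam frame as an RGB image."""
    resp = requests.get(SHAKE_CAM_URL, timeout=10)
    resp.raise_for_status()
    return Image.open(BytesIO(resp.content)).convert("RGB")
```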

How to annotate the data?


The tried-and-true AWS Mechanical Turk, with a twist: a single mouse click annotates a head, as shown below.

I went ahead and modified the bbox-annotator to be a single-click head annotator.

How to count the line?


Lines aren’t merely people in a certain space; they are people standing next to each other to form a contiguous collection. As of now, I simply feed the density map into a three-layer fully connected (FC) network that outputs a single number: the line count.
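The post doesn’t give the exact layer sizes, so the ones below are made up; this is only a sketch of the idea in Keras: flatten the density map and regress a single scalar.

```python
import tensorflow as tf

# Hypothetical input size: density maps resized to 60x80 before this head.
line_counter = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(60, 80)),  # density map in
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),                       # line count out
])
line_counter.compile(optimizer="adam", loss="mse")
# line_counter.fit(density_maps, line_counts, ...) with the Turk-labeled data
```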

Gathering data for that also ended up being a task on AWS Mechanical Turk.

Here are some examples where lines aren’t immediately obvious:

Making a product out of data science
This is all good fun on your development box, but how do you host it? That will be a topic for another blog post, but the short story is:

1. Make sure it doesn’t look bad! Thanks to the design work done by Steve @ thoughtmerchants.com.
2. Use Vue.js and d3 to visualize the line count.
3. Create a Docker image with your static assets and Conda dependencies.
4. Deploy to GCP with Kubernetes on Google Container Engine.
5. Periodically run a background job to scrape the Shake Cam image and run a prediction (sketched below).
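A bare-bones version of that background job might look like this; the scheduling loop and the predict stub are assumptions (the real thing runs as a job on Container Engine and calls the trained model), and it reuses fetch_frame from the earlier snippet:

```python
import time

def predict_line_count(frame):
    """Stand-in for the real pipeline: density-map CNN, then the FC line counter."""
    return 0  # replace with actual model inference

def run_forever(interval_seconds=60):
    while True:
        frame = fetch_frame()  # from the fetch sketch above
        count = predict_line_count(frame)
        print("predicted line count:", count)
        time.sleep(interval_seconds)
```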
I did the extra credit step of having a Rails application
interact with the ML service via gRPC, while integration
testing with PyCall. Not necessary, but I’m very happy
with the setup.

Unexpected Challenges
The following challenges have contributed to erroneous line predictions:

1. Umbrellas. Not a head, but still a person.
2. Shadows. Around noon there can be some strong shadows resembling people.
3. Winter Darkness. It gets much darker much sooner in November and December, yet the model was trained predominantly on images of people in daylight.
4. Winter Snow. The training data never had snow, and now we have mistakes like this:

As I discover more of these scenarios, I’ll know what data to gather for retraining the model.
