Interview: Habana Labs targets AI processors

https://www.convergedigest.com/2018/09/interview-habana-labs-targets-ai.html

Monday, September 17, 2018 Habana, OND

Habana Labs, a start-up based in Israel with offices in Silicon Valley, emerged from
stealth to unveil its first AI processor.
Habana's deep learning inference processor, named Goya, is more than two orders of
magnitude better in throughput and power than commonly deployed CPUs, according to
the company. The company will offer a PCIe 4.0 card that incorporates a single Goya
HL-1000 processor and is designed to accelerate various AI inferencing workloads,
such as image recognition, neural machine translation, sentiment analysis,
recommender systems, etc. A PCIe card based on its Goya HL-1000 processor
delivers 15,000 images/second throughput on the ResNet-50 inference benchmark,
with 1.3 milliseconds latency, while consuming only 100 watts of power.

Habana is also developing an inference software toolkit to simplify the development
and deployment of deep learning models (topologies) for mass-market use. The idea
is to provide an inference network model compilation and runtime that eliminates
low-level programming of the processor.

I recently sat down with Eitan Medina, Habana Labs' Chief Business Officer, to
discuss the development of this new class of AI processors and what it means for the
cloud business.

Jim Carroll: Who is Habana Labs and how did you guys get started?

Eitan Medina: Habana was founded in 2016 with the goal of building AI processors
for inference and training. Currently, we have about 120 people on board, mostly in
R&D and based in Israel. We have a business headquarters here in Silicon Valley. In
terms of the background of the management team, most of us have deep expertise in
processors, DSPs, and communications semiconductors. I previously was the CTO
for Galileo Technology (acquired by Marvell), and now I am on the business side. I
would say we have a very strong and multidisciplinary team for machine learning. We
certainly have the expertise in the processing, software and networking to architect a
complete hardware and software solution for deep learning.

In building this company, we identified the AI space as one that deserves its own
class of processors. We believe that the existing CPUs and GPUs are not good
enough.

The first wave of these AI processors is arriving or being announced now.
Habana decided that unlike other semiconductor companies, we would only emerge
from stealth once we have an actual product. We have production samples now and
that is why we are officially launching the company.

Jim Carroll: Who are the founders and what motivated them to enter this market
segment?

Eitan Medina: The two co-founders are David Dahan (CEO) and Ran Halutz (VP
R&D), who worked together at PrimeSense, a company that was acquired by Apple.
We also have onboard Shlomo Raikin (CTO), who was the Chief SoC Architect at
Mellanox and who has 45 patents. We've also been able to recruit top talent from
across the R&D ecosystem in Israel. The lead investors are Avigdor Willenz
(Chairman), Bessemer, and WALDEN (Lip-Bu Tan).

Jim Carroll: What does the name "Habana" refer to?

Eitan Medina: In Hebrew, Habana means "understanding" -- a good name for an AI
company.

Jim Carroll: The market for AI processors, obviously, is in its infancy. How do
you see it developing?

Eitan Medina: Well, some analysts are already projecting a market for a new class of
chipsets for deep learning. Tractica, for instance, divides the emerging market into
CPUs, GPUs, FPGAs, ASICs, SoC accelerators, and other devices. We see the need
for a different type of processor because of the huge gap between the computational
requirements for AI and the incremental improvements that vendors have delivered
over the past few years, which right now are just small improvements to CPUs and GPUs.
Look at the best-in-class deep learning models and then calculate how much
computing power is needed to train them. Look at how these requirements have
grown over the past few years. Try graphing this progression and you will see a log
scale graph with a doubling time of three and a half months. That's 10x every year.
Initially, people were running machine learning on CPUs, and then they adopted
Nvidia's GPUs. What we see in the market today is that training is dominated by
GPUs, while inference is dominated by CPUs.
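
As a quick check of the arithmetic in that answer, a doubling time of three and a half months does compound to roughly 10x over twelve months. The sketch below assumes pure exponential growth with a fixed doubling time; the function name is illustrative and not something from the interview.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Growth multiple over 12 months, assuming a fixed doubling time."""
    return 2.0 ** (12.0 / doubling_months)


# Doubling time quoted in the interview: three and a half months.
factor = annual_growth_factor(3.5)
print(f"Growth over one year: about {factor:.1f}x")
```

Under that assumption the yearly multiple comes out slightly above 10, which matches the "10x every year" figure as a round number.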

Jim Carroll: So what is Habana's approach?

Eitan Medina: When we looked at the overall deep learning space, we began with the
workflows. It is important to understand that there's a training workflow, and there's an
inference workflow. What we are introducing today is our "Goya" inference processor.
Our "Gaudi" training processor will be introduced in the second quarter of 2019. It will
feature a 2Tbps interface per device and its training performance scales linearly to
thousands of processors. We intend to sell line cards equipped with these processors,
which you can then plug into your existing servers.

The inference processor offloads this workload completely from the CPU. Therefore,
you will not need to replace your existing servers with more advanced CPUs. What
can this do for you? This is where our story gets really interesting. We're talking about
more than an order of magnitude of improvement.
Look at this graph showing our ResNet-50 inference throughput and latency
performance. On the left side is the best performance Intel has shown to date on a
dual-socket Xeon Platinum. Latency is not reported, which could be a critical issue.
In the middle is Nvidia's V100 Tensor GPU, which shows 6ms of latency -- not bad, but
we can do better. Our performance, shown on the right, exceeds 15,000 images per
second with just 1.3ms of latency. Our card is just 100 watts, whereas we estimate at
least 400 watts for the other guys.
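
The Goya figures quoted above can be turned into efficiency numbers with simple arithmetic. This is a minimal sketch using only the three values stated in the interview (15,000 images per second, 1.3 ms latency, 100 watts). The in-flight estimate applies Little's law and assumes steady-state operation with 1.3 ms as the average time an image spends in the system, which the interview does not confirm. All variable names are illustrative.

```python
# Figures quoted in the interview for a single Goya card on ResNet-50.
GOYA_IMAGES_PER_SEC = 15_000
GOYA_LATENCY_SEC = 1.3e-3
GOYA_WATTS = 100

# Throughput per watt, and its reciprocal, energy per image.
images_per_sec_per_watt = GOYA_IMAGES_PER_SEC / GOYA_WATTS
joules_per_image = GOYA_WATTS / GOYA_IMAGES_PER_SEC

# Little's law (assumption: steady state, latency = mean time in system):
# average items in flight = arrival rate * time in system.
images_in_flight = GOYA_IMAGES_PER_SEC * GOYA_LATENCY_SEC

print(f"{images_per_sec_per_watt:.0f} images/s per watt")
print(f"{joules_per_image * 1000:.2f} mJ per image")
print(f"about {images_in_flight:.1f} images in flight")
```

On these numbers the card works out to about 150 images per second per watt, with roughly 20 images being processed at any instant. No comparable calculation is possible for the other two platforms from the text alone, since their throughput is shown only in the graph.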

Jim Carroll: Where are you getting these gains? Are you processing the images
in a different way?

Eitan Medina: Well, I can say that we are not changing the topology. If you are an AI
researcher with a ResNet-50 topology, we will take your topology and ingest it into our
compiler. We're not forcing you to change anything in your model.

Jim Carroll: So, if we try to understand the magic inside a GPU, Nvidia will talk
about their ability to process polygons in parallel with large numbers of cores.
Where is the magic for Habana?

Eitan Medina: Yeah, Nvidia will say they are really good at figuring out polygons, and
may tell you about the massive memory bandwidth they can provide to the many
cores. But, at the end of the day, if you are interested in doing image recognition, you
only really care about application performance, not the stories of how wonderful the
technology is.

Let's assume for a second that there's a guy with a very inefficient image processing
architecture, ok? What would this guy do to give you better performance from
generation to generation? He would just pack more of the same stuff each time --
more memory, more bandwidth, and more power. And then he would tell you to
"buy more to save more". Sound familiar? This guy can show you improvements, but
if he's carrying that inefficiency throughout the stack, it is just going to be more of the
same. If a new guy comes to market, what you want to see is application performance.
What's your latency? What's your throughput? What's your accuracy? What's your
power? What's your cost? If we can show all of that, then we don't have to have a
debate about architecture.

Jim Carroll: So, are you guys using the same "magic" to deliver inference
performance?

Eitan Medina: No, but for now, I want to show you what we can do. The lion's share of
inference processors used by cloud operators today are CPUs -- an estimated 91% of
these workloads are running on CPUs. Nvidia so far has not come up with a solution
to move this market to GPUs. The market is using their GPUs mainly for training.

Our line card, installed in this server, can ingest and process 15,000 frames per
second through the PCI bus. Because our chip is so efficient, we don't need crazy
memory technologies or specialized manufacturing techniques. In fact, this chip is
built with 16 nanometer technology, which is quite mature and well-understood. As
soon as we got the first device back from TSMC, we had ResNet up and running
immediately.

In a cloud data center, three of our line cards could deliver the inference processing
equivalent of 169 Intel-powered servers or eight of Nvidia's latest Tesla V100 GPUs.
Habana Labs is showcasing a Goya inference processor card in a live server, running
multiple neural-network topologies, at the AI Hardware Summit on September 18 – 19,
2018, in Mountain View, CA.
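
The three-card equivalence claim implies a per-unit throughput for the Intel servers and the V100 GPUs. The sketch below backs those numbers out, assuming the comparison was made on the same ResNet-50 throughput figure of 15,000 images per second per card; the interview does not say which benchmark the equivalence is based on, so treat the results as illustrative only.

```python
# Assumption: the equivalence uses the 15,000 images/s ResNet-50 figure.
GOYA_CARD_IMAGES_PER_SEC = 15_000
GOYA_CARDS = 3
INTEL_SERVERS = 169   # servers said to match three Goya cards
V100_GPUS = 8         # GPUs said to match three Goya cards

total_images_per_sec = GOYA_CARDS * GOYA_CARD_IMAGES_PER_SEC

# Throughput each unit would need for the stated equivalence to hold.
implied_per_server = total_images_per_sec / INTEL_SERVERS
implied_per_v100 = total_images_per_sec / V100_GPUS

print(f"Three cards: {total_images_per_sec} images/s")
print(f"Implied per Intel server: about {implied_per_server:.0f} images/s")
print(f"Implied per V100: about {implied_per_v100:.0f} images/s")
```

Under that assumption, the claim corresponds to roughly 266 images per second per Intel server and 5,625 images per second per V100.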
