
LECTURE 1: INTRODUCTION TO CLOUD COMPUTING
Essential Characteristics:
Service Models:
Deployment Models:
LECTURE 2: Introducing Windows Azure
Azure Overview
Is Your Application a Good Fit for Windows Azure?
Understand the Benefits of Windows Azure
Target Scenarios that Leverage the Strengths of Windows Azure
Scenarios that Do Not Require the Capabilities of Windows Azure
Evaluate Architecture and Development
Summary
LECTURE 3: Main Components of Windows Azure
Table of Contents
The Components of Windows Azure
Execution Models
Data Management
Networking
Business Analytics
Messaging
Caching
Identity
High-Performance Computing
Media
Commerce
SDKs
Getting Started
Lecture 4: WINDOWS AZURE COMPUTE
Web Sites vs Cloud Services vs Virtual Machines
WINDOWS AZURE CLOUD SERVICES:
WEB ROLE AND WORKER ROLE
THE THREE RULES OF THE WINDOWS AZURE PROGRAMMING MODEL
A WINDOWS AZURE APPLICATION IS BUILT FROM ONE OR MORE ROLES
A WINDOWS AZURE APPLICATION RUNS MULTIPLE INSTANCES OF EACH ROLE
A WINDOWS AZURE APPLICATION BEHAVES CORRECTLY WHEN ANY ROLE INSTANCE FAILS
WHAT THE WINDOWS AZURE PROGRAMMING MODEL PROVIDES
IMPLICATIONS OF THE WINDOWS AZURE PROGRAMMING MODEL: WHAT ELSE CHANGES?
MOVING WINDOWS SERVER APPLICATIONS TO WINDOWS AZURE
CONCLUSION
Cloud service concept
Concepts
Data Management and Business Analytics
Table of Contents
Blob Storage
Running a DBMS in a Virtual Machine
SQL Database
Table Storage
Hadoop
Lecture 6: Windows Azure SQL Database
Similarities and Differences
Compare SQL Server with Windows Azure SQL Database (en-US)
Table of Contents
Similarities and Differences
Logical Administration vs. Physical Administration
Provisioning
Transact-SQL Support
Features and Types
Key Benefits of the Service
o Self-Managing
o High Availability
o Scalability
o Familiar Development Model
o Relational Data Model
See Also
Other Languages
Federations in Windows Azure SQL Database (formerly SQL Azure)
Federation Architecture
Design Considerations
LECTURE 7: NETWORKING, CACHING AND ACCESS CONTROL IN WINDOWS AZURE
Windows Azure Networking
Table of Contents
Windows Azure Virtual Network
Windows Azure Connect
Windows Azure Traffic Manager
Caching in Windows Azure
Caching (Preview) on Roles
Shared Caching
In This Section
Lecture 8: WINDOWS AZURE SERVICE BUS
Software, Services, Clouds, and Devices
Fulfilling the Potential
Feature Overview
Relayed and Brokered Messaging
Relayed Messaging
Brokered Messaging
What are Service Bus Queues
What are Service Bus Topics and Subscriptions
What is the Service Bus Relay


LECTURE 1: INTRODUCTION TO CLOUD COMPUTING
(From The NIST Definition of Cloud Computing)
Cloud computing is a model for enabling ubiquitous, convenient, on-demand network
access to a shared pool of configurable computing resources (e.g., networks, servers,
storage, applications, and services) that can be rapidly provisioned and released with
minimal management effort or service provider interaction.
This cloud model is composed of five essential characteristics, three service models, and
four deployment models.
Essential Characteristics:
On-demand self-service: A consumer can unilaterally provision computing
capabilities, such as server time and network storage, as needed automatically
without requiring human interaction with each service provider.
Broad network access: Capabilities are available over the network and accessed
through standard mechanisms that promote use by heterogeneous thin or thick
client platforms (e.g., mobile phones, tablets, laptops, and workstations).
Resource pooling: The provider's computing resources are pooled to serve multiple
consumers using a multi-tenant model, with different physical and virtual resources
dynamically assigned and reassigned according to consumer demand. There is a
sense of location independence in that the customer generally has no control or
knowledge over the exact location of the provided resources but may be able to specify
location at a higher level of abstraction (e.g., country, state, or datacenter). Examples
of resources include storage, processing, memory, and network bandwidth.
Rapid elasticity: Capabilities can be elastically provisioned and released, in some
cases automatically, to scale rapidly outward and inward commensurate with demand.
To the consumer, the capabilities available for provisioning often appear to be unlimited
and can be appropriated in any quantity at any time.
Measured service: Cloud systems automatically control and optimize resource use by
leveraging a metering capability at some level of abstraction appropriate to the type of
service (e.g., storage, processing, bandwidth, and active user accounts). Typically this
is done on a pay-per-use or charge-per-use basis. Resource usage can be monitored,
controlled, and reported, providing transparency for both the provider and
consumer of the utilized service.
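The measured-service characteristic can be illustrated with a small sketch. The Python below totals metered usage per resource type and prices it on a pay-per-use basis. The resource names, unit prices, and usage records are invented for illustration; they do not come from the NIST definition or from any provider's price list.

```python
from collections import defaultdict

# Hypothetical unit prices (assumption); real providers publish their own rates.
UNIT_PRICES = {
    "compute_hours": 0.12,     # per instance-hour
    "storage_gb_month": 0.10,  # per GB stored for a month
    "bandwidth_gb": 0.05,      # per GB transferred out
}


def summarize_usage(records):
    """Total the metered quantity for each resource type."""
    totals = defaultdict(float)
    for resource, quantity in records:
        totals[resource] += quantity
    return dict(totals)


def compute_bill(totals, prices=UNIT_PRICES):
    """Charge only for what was metered: quantity times unit price."""
    return {resource: round(quantity * prices[resource], 2)
            for resource, quantity in totals.items()}


# Hypothetical usage records: (resource type, metered quantity).
records = [
    ("compute_hours", 48),
    ("compute_hours", 24),
    ("storage_gb_month", 50),
    ("bandwidth_gb", 10),
]

totals = summarize_usage(records)
bill = compute_bill(totals)
for resource, amount in bill.items():
    print(f"{resource}: {totals[resource]} units, charge {amount}")
```

Because both the totals and the charges are derived from the same usage records, the provider and the consumer can inspect the same figures, which is the transparency the definition describes.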
A cloud infrastructure is the collection of hardware and software that enables
the five essential characteristics of cloud computing. The cloud infrastructure can be
viewed as containing both a physical layer and an abstraction layer. The physical layer
consists of the hardware resources that are necessary to support the cloud services being
provided, and typically includes server, storage and network components. The
abstraction layer consists of the software deployed across the physical layer, which
manifests the essential cloud characteristics. Conceptually the abstraction layer sits
above the physical layer.


Service Models:
Software as a Service (SaaS): The capability provided to the consumer is to use
the provider's applications running on a cloud infrastructure.
The applications are accessible from various client devices through either a thin client
interface, such as a web browser (e.g., web-based email), or a program interface. The
consumer does not manage or control the underlying cloud infrastructure including
network, servers, operating systems, storage, or even individual application capabilities,
with the possible exception of limited user-specific application configuration settings.
Platform as a Service (PaaS): The capability provided to the consumer is to deploy onto
the cloud infrastructure consumer-created or acquired applications created using
programming languages, libraries, services, and tools supported by the provider.
The consumer does not manage or control the underlying cloud infrastructure
including network, servers, operating systems, or storage, but has control over the
deployed applications and possibly configuration settings for the application-hosting
environment.
Infrastructure as a Service (IaaS): The capability provided to the consumer is to
provision processing, storage, networks, and other fundamental computing
resources where the consumer is able to deploy and run arbitrary software, which
can include operating systems and applications. The consumer does not manage or
control the underlying cloud infrastructure but has control over operating systems,
storage, and deployed applications; and possibly limited control of select networking
components (e.g., host firewalls).

Deployment Models:
Private cloud. The cloud infrastructure is provisioned for exclusive use by a single
organization comprising multiple consumers (e.g., business units). It may be owned,
managed, and operated by the organization, a third party, or some combination of
them, and it may exist on or off premises.
Community cloud. The cloud infrastructure is provisioned for exclusive use by a
specific community of consumers from organizations that have shared concerns (e.g.,
mission, security requirements, policy, and compliance considerations). It may be
owned, managed, and operated by one or more of the organizations in the
community, a third party, or some combination of them, and it may exist on or off
premises.
Public cloud. The cloud infrastructure is provisioned for open use by the general public.
It may be owned, managed, and operated by a business, academic, or government
organization, or some combination of them. It exists on the premises of the cloud
provider.
Hybrid cloud. The cloud infrastructure is a composition of two or more distinct
cloud infrastructures (private, community, or public) that remain unique entities, but are
bound together by standardized or proprietary technology that enables data and
application portability (e.g., cloud bursting for load balancing between clouds).
LECTURE 2: Introducing Windows Azure

Azure Overview
Windows Azure is an open and flexible cloud platform that enables you to quickly build,
deploy and manage applications across a global network of Microsoft-managed datacenters.
You can build applications using any language, tool or framework. And you can integrate
your public cloud applications with your existing IT environment.
Always up. Always on.
Windows Azure delivers a 99.95% monthly SLA and enables you to build and run highly
available applications without focusing on the infrastructure. It provides automatic OS and
service patching, built-in network load balancing and resiliency to hardware failure. It
supports a deployment model that enables you to upgrade your application without
downtime.
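To make the 99.95% figure concrete, the sketch below converts a monthly SLA percentage into the downtime it permits. It assumes a 30-day month, and the function name is illustrative. The actual SLA terms define how downtime is measured, so treat this as arithmetic only.

```python
def allowed_downtime_minutes(sla_percent, days_in_month=30):
    """Maximum downtime per month that still meets the given SLA percentage."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


downtime = allowed_downtime_minutes(99.95)
print(f"Allowed downtime: {downtime:.1f} minutes per month")
```

Under these assumptions, a 99.95% monthly SLA leaves roughly 21.6 minutes of permitted downtime in a 30-day month.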
Open
Windows Azure enables you to use any language, framework, or tool to build applications.
Features and services are exposed using open REST protocols. The Windows Azure client
libraries are available for multiple programming languages, and are released under an open
source license and hosted on GitHub.
Unlimited servers. Unlimited storage.
Windows Azure enables you to easily scale your applications to any size. It is a fully
automated self-service platform that allows you to provision resources within minutes.
Elastically grow or shrink your resource usage based on your needs. You only pay for the
resources your application uses. Windows Azure is available in multiple datacenters around
the world, enabling you to deploy your applications close to your customers.
Powerful Capabilities
Windows Azure delivers a flexible cloud platform that can satisfy any application need. It
enables you to reliably host and scale out your application code within compute roles. You
can store data using relational SQL databases, NoSQL table stores, and unstructured blob
stores, and optionally use Hadoop and business intelligence services to data-mine it. You can
take advantage of Windows Azure's robust messaging capabilities to enable scalable
distributed applications, as well as deliver hybrid solutions that run across a cloud and
on-premises enterprise environment. Windows Azure's distributed caching and CDN services
allow you to reduce latency and deliver great application performance anywhere in the
world.

Is Your Application a Good Fit for Windows Azure?
If you're considering using Windows Azure to host an application, you might wonder
if your application or business requirements are best served by the platform. This
topic attempts to answer this question by:
Looking at the benefits Windows Azure provides to your application
Applying the strengths of the platform to common scenarios
Rejecting scenarios that do not leverage the strengths of the platform
Examining some common architecture and development considerations
The intent is to provide a framework for thinking about your application and how it
relates to the capabilities of Windows Azure. In many cases, links to additional
resources are provided to improve your ability to analyze your application and make a
decision on how to move to the cloud.
Understand the Benefits of Windows Azure
Before you can determine if your application is well-suited for Windows Azure, you
must first understand some of the main benefits of the platform. A complete list of
benefits can be found in the Windows Azure documentation and many articles and
videos about Windows Azure. One excellent paper on this subject is Cloud
Optimization - Expanding Capabilities, while Aligning Computing and Business
Needs.
There are several benefits to having hardware and infrastructure resources managed
for you. Let's look at a few of these benefits at a high level before we discuss
scenarios that take advantage of these features.
Resource Management
When you deploy your application and services to the cloud, Windows Azure provides
the necessary virtual machines, network bandwidth, and other infrastructure
resources. If machines go down for hardware updates or due to unexpected failures,
new virtual machines are automatically located for your application.
Because you only pay for what you use, you can start off with a smaller investment
rather than incurring the typical upfront costs required for an on-premises
deployment. This can be especially useful for small companies. In an on-premises
scenario, these organizations might not have the data center space, IT skills, or
hardware skills necessary to successfully deploy their applications. The automatic
infrastructure services provided by Windows Azure offer a low barrier to entry for
application deployment and management.
Dynamic Scaling
Dynamic scaling refers to the capability to both scale out and scale back your
application depending on resource requirements. This is also referred to as elastic
scale. Before describing how this works, you should understand the basic architecture
of a Windows Azure application. In Windows Azure, you create roles that work
together to implement your application logic. For example, one web role could host
the ASP.NET front-end of your application, and one or more worker roles could
perform necessary background tasks. Each role is hosted on one or more virtual
machines, called role instances, in the Windows Azure data center. Requests are load
balanced across these instances. For more information about roles, see the paper The
Windows Azure Programming Model.
If resource demands increase, new role instances running your application code can
be provisioned to handle the load. When demand decreases, these instances can be
removed so that you don't have to pay for unnecessary computing power. This is
very different from an on-premises deployment, where hardware must be over-provisioned
to anticipate peak demands. This scaling does not happen automatically,
but it is easily achieved through either the web portal or the Service Management API.
The paper Dynamically Scaling an Application demonstrates one way to automatically
scale Windows Azure applications. There is also an Autoscaling Application
Block created by the Microsoft Patterns and Practices team.
If your application has fluctuating or unpredictable demands for computing
resources, Windows Azure allows you to easily adjust your resource utilization to
match the load.
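As a rough illustration of a scaling decision, the sketch below picks a role instance count from the average CPU load. It is a hypothetical rule, not the algorithm used by the Autoscaling Application Block or the Service Management API. The function name, the 60% target, and the instance limits are all assumptions. Applying the result would still require a change through the web portal or the Service Management API, which is not shown.

```python
import math


def target_instance_count(current, avg_cpu_percent,
                          target_cpu_percent=60,
                          min_instances=2, max_instances=20):
    """Return the instance count that would bring average CPU near the target.

    All thresholds and limits are illustrative assumptions.
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    # Total load in "instance-percent", spread over enough instances
    # to land at or below the target utilization.
    needed = math.ceil(current * avg_cpu_percent / target_cpu_percent)
    # Clamp to the configured floor and ceiling.
    return max(min_instances, min(max_instances, needed))


print(target_instance_count(current=4, avg_cpu_percent=90))  # scale out
print(target_instance_count(current=4, avg_cpu_percent=15))  # scale back
```

The floor of two instances is a design choice that keeps at least two instances of the role running even when demand is low.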
High Availability and Durability
Windows Azure provides a platform for highly available applications that can reliably
store and access backend data through storage services or Windows Azure SQL
Database.
First, Windows Azure ensures high availability of your compute resources when you
have multiple instances of each role. Role instances are monitored automatically, so the
platform can respond quickly to hardware restarts or failures by automatically deploying
the role to a new instance.
Second, Windows Azure ensures high availability and durability for data stored
through one of the storage services. Windows Azure storage services replicate all data
to at least three different servers. Similarly, SQL Database replicates all data to
guarantee availability and durability.
Other Windows Azure services provide similar high availability guarantees. For more
information, see the Windows Azure SLA.
Target Scenarios that Leverage the Strengths of Windows Azure
With an understanding of the strengths of the Windows Azure platform, you can
begin to look at the scenarios that are best suited for the cloud. The following
sections discuss several of these patterns and how Windows Azure is ideally suited for
certain workloads and goals. The video Windows Azure Design Patterns explains
many of the scenarios below and provides a good overview of the Windows Azure
platform.
Tip
Although there is a focus on application scenarios here, understand that you can choose to
use individual services of Windows Azure. For example, if you find that using blob storage
solves an application problem, it is possible that the rest of your application remains outside
of the cloud. This is called a hybrid application and is discussed later in this topic.
Highly Available Services
Windows Azure is well-suited to hosting highly available services. Consider an online
store deployed in Windows Azure. Because an online store is a revenue generator, it is
critical that it stay running. This is accomplished by the service monitoring and
automatic instance management performed in the Windows Azure data center. The
online store must also stay responsive to customer demand. This is accomplished by
the elastic scaling ability of Windows Azure. During peak shopping times, new
instances can come online to handle the increased usage. In addition, the online store
must not lose orders or fail to completely process placed orders. Windows Azure
storage and SQL Database both provide highly available and durable storage options
to hold the order details and state throughout the order lifecycle.
Periodic Workloads
Another good fit for Windows Azure is some form of an "on and off" workload. Some
applications do not need to run continuously. One simple example of this is a demo
or utility application that you want to make available only for several days or weeks.
Windows Azure allows you to easily create, deploy, and share that application with
the world. But once its purpose is accomplished, you can remove the application and
you are only charged for the time it was deployed.
Note
You must remove the deployment, not just suspend the application, to avoid charges
for compute time.
Also consider a large company that runs complex data analysis of sales numbers at
the end of each month. Although processing-intensive, the total time required to
complete the analysis is at most two days. In an on-premises scenario, the servers
required for this work would be underutilized for the majority of the month. In
Windows Azure, the business would only pay for the time that the analysis application
is running in the cloud. And assuming the architecture of the application is designed
for parallel processing, the scale-out features of Windows Azure could enable the
company to create large numbers of worker role instances to complete more complex
work in less time. In this example, you should use code or scripting to automatically
deploy the application at the appropriate time each month.
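The trade-off in the month-end analysis example can be sketched numerically. The Python below estimates wall-clock time and billed instance-hours for a job that would take 48 hours on one instance. It assumes the work divides evenly across instances and that each instance is billed per started hour. Both are simplifications, and the function name is illustrative.

```python
import math


def scale_out_estimate(total_work_hours, instances):
    """Estimate wall-clock time and billed instance-hours for a parallel job.

    Assumes perfect parallelism and billing per started hour (assumptions).
    """
    wall_clock_hours = total_work_hours / instances
    billed_hours = instances * math.ceil(wall_clock_hours)
    return wall_clock_hours, billed_hours


for n in (1, 8, 48):
    hours, billed = scale_out_estimate(48, n)
    print(f"{n} instances: {hours:.1f} h elapsed, {billed} instance-hours billed")
```

Under these assumptions the billed instance-hours stay the same while the elapsed time drops, which is why scaling out a parallel job can finish the work sooner without raising the compute charge.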
Unpredictable Growth
All businesses have a goal of rapid and sustainable growth. But growth is very hard to
handle in the traditional on-premises model. If the expected growth does not
materialize, you've spent money maintaining underutilized hardware and
infrastructure. But if growth happens more quickly than expected, you might be
unable to handle the load, resulting in lost business and poor customer experience.
For small companies, there might not even be enough initial capital to prepare for or
keep up with rapid growth.
Windows Azure is ideal for handling this situation. Consider a small sports news web
site that makes money from advertising. The amount of revenue is directly
proportional to the amount of traffic that the site generates. In this example, initial
capital for the venture is limited, and they do not have the money required to set up
and run their own data center. By designing the web site to run on Windows Azure,
they can easily deploy their solution as an ASP.NET application that uses a backend
SQL Database for relational data and blob storage for pictures and videos. If the
popularity of the web site grows dramatically, they can increase the number of web
role instances for their front-end or increase the size of the SQL Database. The blob
storage has built-in scalability features within Windows Azure. If business decreases,
they can remove any unnecessary instances. Because their revenue is proportional to
the traffic on the site, Windows Azure helps them to start small, grow fast, and reduce
risk.
With Windows Azure, you have complete control to determine how aggressively to
manage your computing costs. You could decide to use the Service Management
API or the Autoscaling Application Block to create an automatic scaling engine that
creates and removes instances based on custom rules. You could choose to vary the
number of instances based on a predetermined amount, such as four instances
during business hours versus two instances during non-business hours. Or you could
keep the number of instances constant and only increase them manually through the
web portal as demand increases over time. Windows Azure gives you the flexibility to
make the decisions that are right for your business.
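The schedule-based option above (four instances during business hours, two otherwise) could be expressed as a simple rule. The sketch below is hypothetical. The 9:00 to 17:00 weekday window and the function name are assumptions, and the code only computes the desired count. It does not call the Service Management API or the Autoscaling Application Block.

```python
from datetime import datetime


def scheduled_instance_count(now, business_start=9, business_end=17,
                             business_instances=4, off_hours_instances=2):
    """Return the planned instance count for the given local time."""
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    in_hours = business_start <= now.hour < business_end
    if is_weekday and in_hours:
        return business_instances
    return off_hours_instances


print(scheduled_instance_count(datetime(2024, 1, 8, 10, 0)))  # Monday morning
print(scheduled_instance_count(datetime(2024, 1, 8, 22, 0)))  # Monday night
```

A scheduler could evaluate this rule periodically and request a change only when the returned count differs from the number of instances currently deployed.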
Workload Spikes
This is another workload pattern that requires elastic scale. Consider the previous
example of a sports news web site. Even as their business is steadily growing, there is
still the possibility of temporary spikes or bursts of activity. For example, if they are
referenced by another popular news outlet, the number of visitors to their site could
dramatically increase in a single day. In a more predictable scenario, major sporting
events and sports championships will result in more activity on their site.
An alternative example is a service that processes daily reports at the end of the day.
When the business day closes, each office sends in a report which is processed at the
company headquarters. Since the process needs to be active only a few hours each
day, it is also a candidate for elastic scaling and deployment.
Windows Azure is well suited for temporarily scaling out an application to handle
spikes in load and then scaling back again after the event has passed.
Infrastructure Offloading
As demonstrated in the previous examples, many of the most common cloud
scenarios take advantage of the elastic scale of Windows Azure. However, even
applications with steady workload patterns can realize cost savings in Windows Azure.
It is expensive to manage your own data center, especially when you consider the
cost of energy, people-skills, hardware, software licensing, and facilities. It is also hard
to understand how costs are tied to individual applications. In Windows Azure, the
goal is to reduce total costs as well as to make those costs more transparent. The
paper, Cloud Optimization Expanding Capabilities, while Aligning Computing and
Business Needs, does a great job of explaining typical on-premises hosting costs and
how these can be reduced with Windows Azure. Windows Azure also provides a
pricing calculator for understanding specific costs and a TCO (Total Cost of
Ownership) calculator for estimating the overall cost reduction that could occur by
adopting Windows Azure. For links to these calculator tools and other pricing
information, see the Windows Azure web site.
Scenarios that Do Not Require the Capabilities of Windows Azure
Not all applications should be moved to the cloud. Only applications that benefit
from Windows Azure features should be moved to the cloud.
A good example of this would be a personal blog website intended for friends and
family. A site like this might contain articles and photographs. Although you could use
Windows Azure for this project, there are several reasons why Windows Azure is not
the best choice. First, even though the site might only receive a few hits a day, one
role instance would have to be continuously running to handle those few requests
(note that two instances would be required to achieve the Windows Azure SLA for
compute). In Windows Azure, the cost is based on the amount of time that each role
instance has been deployed (this is known in Windows Azure nomenclature
as compute time); suspending an application does not suspend the consumption of
(and charge for) compute time. Even if this site responded to only one hit during the
day, it would still be charged for 24 hours of compute time. In a sense, this is rented
space on the virtual machine that is running your code. So, at the time of writing this
topic, even one extra small instance of a web role would cost $30 a month. And if 20
GB of pictures were stored in blob storage, that storage plus transactions and
bandwidth could add another $6 to the cost. The monthly cost of hosting this type of
site on Windows Azure is higher than the cost of a simple web hosting solution from
a third party. Most importantly, this type of web site does not require resource
management, dynamic scaling, high availability, or durability.
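The arithmetic behind this example can be sketched in a few lines of Python. The $30 and $6 figures come from the text above; the 30-day billing month and the function names are assumptions, and current prices will differ.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a 30-day billing month


def billed_compute_hours(instances: int, days: int = DAYS_PER_MONTH) -> int:
    """Compute time accrues for every deployed hour, whatever the traffic."""
    return instances * HOURS_PER_DAY * days


def monthly_cost(compute_per_instance: float, instances: int, storage: float) -> float:
    """Monthly total: per-instance compute charge plus storage-related charges."""
    return compute_per_instance * instances + storage


# One extra small web role instance plus about 20 GB of blob storage,
# using the figures quoted in the text.
hours = billed_compute_hours(instances=1)
one_instance = monthly_cost(compute_per_instance=30.0, instances=1, storage=6.0)

# Two instances, as required to qualify for the compute SLA.
two_instances = monthly_cost(compute_per_instance=30.0, instances=2, storage=6.0)

if __name__ == "__main__":
    print(hours, one_instance, two_instances)
```

The point of the calculation is that the number of hits never appears in it: a site with one visitor a day is billed for the same compute hours as a busy one.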
Windows Azure allows you to choose only the options that are suited to your
business needs. For example, you might find instances in which certain data cannot
be hosted in the cloud for legal or regulatory reasons. In these cases, you might
consider a hybrid solution, where only certain data or specific parts of your
application that are not as sensitive and need to be highly available are hosted in
Windows Azure.
There are other scenarios that are not well-suited to Windows Azure. By
understanding the strengths of Windows Azure, you can recognize applications or
parts of an application that will not leverage these strengths. You can then more
successfully develop the overall solution that most effectively utilizes Windows Azure
capabilities.
Evaluate Architecture and Development
Of course, evaluating a move to Windows Azure involves more than just knowing that
your application or business goals are well-suited for the cloud. It is also important to
evaluate architectural and development characteristics of your existing or new
application. A quick way to start this analysis is to use the Microsoft Assessment
Tool (MAT) for Windows Azure. This tool asks questions to determine the types of
issues you might have in moving to Windows Azure. Next to each question is a link
called "See full consideration", which provides additional information about that
specific area in Windows Azure. These questions and the additional information can
help to identify potential changes to the design of your existing or new application in
the cloud.
In addition to using the MAT, you should have a solid understanding of the basics of
the Windows Azure platform. This includes an understanding of common design
patterns for the platform. Start by reviewing the Windows Azure videos or reading
some of the introductory white papers, such as The Windows Azure Programming
Model. Then review the different services available in Windows Azure and consider
how they could factor into your solution. For an overview of the Windows Azure
services, see the MSDN documentation.
It is beyond the scope of this paper to cover all of the possible considerations and
mitigations for Windows Azure solutions. However, the following table lists four
common design considerations along with links to additional resources.
Area Description
Hybrid Solutions
It can be difficult to move complex legacy applications to Windows Azure.
There are also sometimes regulatory concerns with storing certain types of
data in the cloud. However, it is possible to create hybrid solutions that
connect services hosted by Windows Azure with on-premises applications
and data.
There are multiple Windows Azure technologies that support this
capability, including Service Bus, Access Control Service, and Windows
Azure Connect. For a good video on this subject from October 2010,
see Connecting Cloud & On-Premises Apps with the Windows Azure
Platform. For hybrid architecture guidance based on real-world customer
implementations, see Hybrid Reference Implementation Using BizTalk
Server, Windows Azure, Service Bus and Windows Azure SQL Database.
State Management
If you are moving an existing application to Windows Azure, one of the
biggest considerations is state management. Many on-premises
applications store state locally on the hard drive. Other features, such as
the default ASP.NET session state, use the memory of the local machine for
state management. Although your roles have access to their virtual
machine's local drive space and memory, Windows Azure load balances all
requests across all role instances. In addition, your role instance could be
taken down and moved at any time (for example, when the machine
running the role instance requires an update).
This dynamic management of running role instances is important for the
scalability and availability features of Windows Azure. Consequently,
application code in the cloud must be designed to store data and state
remotely using services such as Windows Azure storage or SQL Database.
For more information about storage options, see the resources in the Store
and Access Data section of the Windows Azure web site.
Storage Requirements
SQL Database is the relational database solution in Windows Azure. If you
currently use SQL Server, the transition to SQL Database should be easier.
If you are migrating from another type of database system, there are SQL
Server Migration Assistants that can help with this process. For more
information on migrating data to SQL Database, see Data Migration to
Windows Azure SQL Database: Tools and Techniques.
Also consider Windows Azure storage for durable, highly available, and
scalable data storage. One good design pattern is to effectively combine
the use of SQL Database and Windows Azure storage tables, queues, and
blobs. A common example is to use SQL Database to store a pointer to a
blob in Windows Azure storage rather than storing the large binary object
in the database itself. This is both efficient and cost-effective. For a
discussion of storage options, see the article on Data Storage Offerings on
the Windows Azure Platform.
Interoperability
The easiest application to design or port to Windows Azure is a .NET
application. The Windows Azure SDK and tools for Visual Studio greatly
simplify the process of creating Windows Azure applications.
But what if you are using open source software or third-party development
languages and tools? The Windows Azure SDK uses a REST API that is
interoperable with many other languages. Of course, there are some
challenges to address depending on your technology. For some
technologies, you can choose to use a stub .NET project in Visual Studio
and overload the Run method for your role. Microsoft provides Windows
Azure SDKs for Java and Node.js that you can use to develop and deploy
applications. There are also community-created SDKs that interact with
Windows Azure. A great resource in this area is the Interoperability Bridges
and Labs Center.
Deploying projects that use open source software can also be a challenge.
For example, the following blog post discusses options for deploying Ruby
applications on Windows Azure:
http://blogs.msdn.com/b/silverlining/archive/2011/08/29/deploying-ruby-java-python-and-node-js-applications-to-windows-azure.aspx.
The important point is that Windows Azure is accessible from a variety of
languages, so you should look into the options for your particular
language of choice before determining whether the application is a good
candidate for Windows Azure.
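The Storage Requirements entry in the table above describes keeping a pointer to a blob in the relational database rather than the binary object itself. The Python sketch below illustrates that pattern only. It uses an in-memory SQLite database as a stand-in for SQL Database and a dictionary as a stand-in for blob storage; the table name, column names, and pointer format are hypothetical, and no Windows Azure API is called.

```python
import sqlite3

# Stand-in for blob storage: maps a pointer string to the stored bytes.
blob_store: dict[str, bytes] = {}


def upload_blob(container: str, name: str, data: bytes) -> str:
    """Store the bytes and return a pointer (hypothetical 'container/name' format)."""
    pointer = f"{container}/{name}"
    blob_store[pointer] = data
    return pointer


# Stand-in for the relational database: it holds metadata and the pointer only.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE photos (id INTEGER PRIMARY KEY, title TEXT, blob_pointer TEXT)"
)


def save_photo(title: str, data: bytes) -> int:
    """Put the large object in blob storage and a small row in the database."""
    pointer = upload_blob("photos", f"{title}.jpg", data)
    cursor = db.execute(
        "INSERT INTO photos (title, blob_pointer) VALUES (?, ?)", (title, pointer)
    )
    db.commit()
    return cursor.lastrowid


def load_photo(photo_id: int) -> bytes:
    """Look up the pointer in the database, then fetch the bytes from blob storage."""
    row = db.execute(
        "SELECT blob_pointer FROM photos WHERE id = ?", (photo_id,)
    ).fetchone()
    return blob_store[row[0]]
```

The database rows stay small and queryable, while the large objects live in the storage that is cheaper per gigabyte.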
Beyond these issues, you can learn a lot about potential development challenges and
solutions by reviewing content on migrating applications to Windows Azure. The
Patterns and Practices group at Microsoft published the following guidance on
migration: Moving Applications to the Cloud on the Microsoft Windows Azure
Platform. You can find additional resources on migration from the Windows Azure
web site: Migrate Services and Data.
Summary
Windows Azure offers a platform for creating and managing highly scalable and
available services. You pay only for the resources that you require and then scale
them up and down at any time. And you don't have to own the hardware or
supporting infrastructure to do this. If your business can leverage the platform to
increase agility, lower costs, or lower risk, then Windows Azure is a good fit for your
application. After making this determination, you can then look at specific
architecture and development options for using the platform. This includes decisions
about new development, migration, or hybrid scenarios. At the end of this analysis,
you should have the necessary information to make an informed decision about how
to most effectively use Windows Azure to reach your business goals.

LECTURE 3: Main Components of Windows Azure
Windows Azure is Microsoft's application platform for the public cloud. You can use this
platform in many different ways. For instance, you can use Windows Azure to build a web
application that runs and stores its data in Microsoft datacenters. You can use Windows
Azure just to store data, with the applications that use this data running on-premises (that is,
outside the public cloud). You can use Windows Azure to create virtual machines for
development and test or to run SharePoint and other applications. You can use Windows
Azure to build massively scalable applications with lots and lots of users. Because the
platform offers a wide range of services, all of these things, and more, are possible.
To do any of them, though, you need to understand the basics. Even if you don't know
anything about cloud computing, this article will walk you through the fundamentals of
Windows Azure. The goal is to give you a foundation for understanding and using this cloud
platform.
Table of Contents
The Components of Windows Azure
Execution Models
Data Management
Networking
Business Analytics
Messaging
Caching
Identity
High-Performance Computing (HPC)
Media
Commerce
SDKs
Getting Started
The Components of Windows Azure
To understand what Windows Azure offers, it's useful to group its services into distinct
categories. Figure 1 shows one way to do this.

Figure 1: Windows Azure provides Internet-accessible application services running in
Microsoft datacenters.
To get started with Windows Azure, you need to know at least the basics about each of its
components. The rest of this article walks through the technologies shown in the figure,
describing what each one offers and when you might use it.
Execution Models
One of the most basic things a cloud platform does is execute applications. Windows Azure
provides three options for doing this, as Figure 2 shows.

Figure 2: Windows Azure provides Infrastructure as a Service (IaaS), web hosting, and
Platform as a Service (PaaS).
Each of these three approaches (Virtual Machines, Web Sites, and Cloud Services) can be used
separately. You can also combine them to create an application that uses two or more of
these options together.
Virtual Machines
The ability to create a virtual machine on demand, whether from a standard image or from
one you supply, can be very useful. Add the ability to pay for this VM by the hour, and it's
even more useful. This approach, commonly known as Infrastructure as a Service (IaaS), is
what Windows Azure Virtual Machines provides.
To create a VM, you specify which VHD to use and the VM's size. You then pay for each hour
the VM is running. As Figure 2 shows, Windows Azure Virtual Machines offers a gallery of
standard VHDs. These include Microsoft-provided options, such as Windows Server 2008 R2,
Windows Server 2012, and Windows Server 2008 R2 with SQL Server, along with Linux
images provided by Microsoft partners. You're free to upload and create VMs from your own
VHDs as well.
Wherever the image comes from, you can persistently store any changes made while a VM is
running. The next time you create a VM from that VHD, things pick up where you left off. It's
also possible to copy the changed VHD out of Windows Azure, then run it locally.
Windows Azure VMs can be used in many different ways. You might use them to create an
inexpensive development and test platform that you can shut down when you've finished
using it. You might also create and run applications that use whatever languages and
libraries you like. Those applications can use any of the data management options that
Windows Azure provides, and you can also choose to use SQL Server or another DBMS
running in one or more virtual machines. Another option is to use Windows Azure VMs as an
extension of your on-premises datacenter, running SharePoint or other applications. To
support this, it's possible to create Windows domains in the cloud by running Active
Directory in Windows Azure VMs. This quite general approach to cloud computing can be
used to address many different problems. What you do is up to you.
Web Sites
One of the most common things that people do in the cloud is run websites and web
applications. Windows Azure Virtual Machines allows this, but it still leaves you with the
responsibility of administering one or more VMs. What if you just want a website where
somebody else takes care of the administrative work for you?
This is exactly what Windows Azure Web Sites provides. This execution model offers a
managed web environment using Internet Information Services (IIS). You can move an
existing IIS website into Windows Azure Web Sites unchanged, or you can create a new one
directly in the cloud. Once a website is running, you can add or remove instances
dynamically, relying on Web Sites to load balance requests across them. And as Figure 2
shows, Windows Azure Web Sites offers both a shared option, where your website runs in a
virtual machine with other sites, and a way for a site to run in its own VM.
Windows Azure Web Sites is intended to be useful for both developers and web design
agencies. For development, it supports .NET, PHP, and Node.js, along with SQL Database and
(from ClearDB, a Microsoft partner) MySQL for relational storage. It also provides built-in
support for several popular applications, including WordPress, Joomla, and Drupal. The goal
is to provide a low-cost, scalable, and broadly useful platform for creating websites and web
applications in the public cloud.
Cloud Services
Suppose you want to build a cloud application that can support lots of simultaneous users,
doesn't require much administration, and never goes down. You might be an established
software vendor, for example, that's decided to embrace Software as a Service (SaaS) by
building a version of one of your applications in the cloud. Or you might be a start-up
creating a consumer application that you expect will grow fast. If you're building on Windows
Azure, which execution model should you use?
Windows Azure Web Sites allows creating this kind of web application, but there are some
constraints. You don't have administrative access, for example, which means that you can't
install arbitrary software. Windows Azure Virtual Machines gives you lots of flexibility,
including administrative access, and you certainly can use it to build a very scalable
application, but you'll have to handle many aspects of reliability and administration yourself.
What you'd like is an option that gives you the control you need but also handles most of the
work required for reliability and administration.
This is exactly what's provided by Windows Azure Cloud Services. This technology is
designed expressly to support scalable, reliable, and low-admin applications, and it's an
example of what's commonly called Platform as a Service (PaaS). To use it, you create an
application using the technology you choose, such as C#, Java, PHP, Python, Node.js, or
something else. Your code then executes in virtual machines (referred to as instances)
running a version of Windows Server.
But these VMs are distinct from the ones you create with Windows Azure Virtual Machines.
For one thing, Windows Azure itself manages them, doing things like installing operating
system patches and automatically rolling out new patched images. (This implies that your
application shouldn't maintain state in web or worker role instances; it should instead be
kept in one of the Windows Azure data management options described in the next section.)
Windows Azure also monitors the VMs, restarting any that fail.
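To see why state should not live inside a role instance, consider the small Python simulation below. It is only a sketch under stated assumptions: two instances, a round-robin balancer standing in for the platform's load balancing, and a plain dictionary standing in for an external store such as Windows Azure storage or SQL Database. All class and function names are hypothetical.

```python
import itertools


class Instance:
    """A role instance that can count a user's requests in local memory."""

    def __init__(self) -> None:
        self.local_state: dict[str, int] = {}

    def handle_local(self, user: str) -> int:
        # State kept in this instance's memory: other instances never see it.
        self.local_state[user] = self.local_state.get(user, 0) + 1
        return self.local_state[user]

    def handle_shared(self, user: str, store: dict[str, int]) -> int:
        # State kept in a store that every instance reads and writes.
        store[user] = store.get(user, 0) + 1
        return store[user]


def run(requests: int, use_shared: bool) -> list[int]:
    """Send `requests` requests from one user through a round-robin balancer."""
    instances = [Instance(), Instance()]
    balancer = itertools.cycle(instances)  # assumption: simple round-robin
    shared_store: dict[str, int] = {}
    counts = []
    for _ in range(requests):
        instance = next(balancer)
        if use_shared:
            counts.append(instance.handle_shared("alice", shared_store))
        else:
            counts.append(instance.handle_local("alice"))
    return counts
```

With local state, four requests from the same user yield the counts 1, 1, 2, 2, because each instance sees only half the traffic; with the shared store they yield 1, 2, 3, 4. Local state would also be lost whenever an instance is replaced.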
As Figure 2 shows, you have two roles to choose from when you create an instance, both
based on Windows Server. The main difference between the two is that an instance of a web
role runs IIS, while an instance of a worker role does not. Both are managed in the same way,
however, and it's common for an application to use both. For example, a web role instance
might accept requests from users, then pass them to a worker role instance for processing.
To scale your application up or down, you can request that Windows Azure create more
instances of either role or shut down existing instances. And just like Windows Azure Virtual
Machines, you're charged by the hour for each web or worker role instance.
Each of the three Windows Azure execution models has its own role to play. Windows Azure
Virtual Machines provides a general-purpose computing environment, Windows Azure Web
Sites offers low-cost web hosting, and Windows Azure Cloud Services is the best choice for
creating scalable, reliable applications with low administration costs. And as mentioned
earlier, you can use these technologies separately or combine them as needed to create the
right foundation for your application. The approach you choose depends on what problems
you're trying to solve.
Data Management
Applications need data, and different kinds of applications need different kinds of data.
Because of this, Windows Azure provides several different ways to store and manage data.
One of these has already been mentioned: the ability to run SQL Server or another DBMS in a
VM created with Windows Azure Virtual Machines. (It's important to realize that this option
isn't limited to relational systems; you're also free to run NoSQL technologies such as
MongoDB and Cassandra.) Running your own database system is straightforward (it replicates
what we're used to in our own datacenters), but it also requires handling the administration of
that DBMS. To make life easier, Windows Azure provides three data management options
that are largely managed for you. Figure 3 shows the choices.

Figure 3: For data management, Windows Azure provides relational storage, scalable
NoSQL tables, and unstructured binary storage.
Each of the three options addresses a different need: relational storage, fast access to
potentially large amounts of simple typed data, and unstructured binary storage. In all three
cases, data is automatically replicated across three different computers in a Windows Azure
datacenter to provide high availability. It's also worth pointing out that all three options can
be accessed either by Windows Azure applications or by applications running elsewhere,
such as your on-premises datacenter, your laptop, or your phone. And however you apply
them, you pay for all Windows Azure data management services based on usage, including a
gigabyte-per-month charge for stored data.
SQL Database
For relational storage, Windows Azure provides SQL Database. Formerly called SQL Azure,
SQL Database provides all of the key features of a relational database management system,
including atomic transactions, concurrent data access by multiple users with data integrity,
ANSI SQL queries, and a familiar programming model. Like SQL Server, SQL Database can be
accessed using Entity Framework, ADO.NET, JDBC, and other familiar data access
technologies. It also supports most of the T-SQL language, along with SQL Server tools such
as SQL Server Management Studio. For anybody familiar with SQL Server (or another
relational database), using SQL Database is straightforward.
But SQL Database isn't just a DBMS in the cloud; it's a PaaS service. You still control your data
and who can access it, but SQL Database takes care of the administrative grunt work, such as
managing the hardware infrastructure and automatically keeping the database and operating
system software up to date. SQL Database also provides a federation option that distributes
data across multiple servers. This is useful for applications that work with large amounts of
data or need to spread data access requests across multiple servers for better performance.
If you're creating a Windows Azure application (using any of the three execution models) that
needs relational storage, SQL Database can be a good option. Applications running outside
the cloud can also use this service, though, so there are plenty of other scenarios. For
instance, data stored in SQL Database can be accessed from different client systems,
including desktops, laptops, tablets, and phones. And because it provides built-in high
availability through replication, using SQL Database can help minimize downtime.
Tables
Suppose you want to create a Windows Azure application that needs fast access to typed
data, maybe lots of it, but doesn't need to perform complex SQL queries on this data. For
example, imagine you're creating a consumer application that needs to store customer
profile information for each user. Your app is going to be very popular, so you need to allow
for lots of data, but you won't do much with this data beyond storing it, then retrieving it in
simple ways. This is exactly the kind of scenario where Windows Azure Tables makes sense.
Don't be confused by the name: this technology doesn't provide relational storage. (In fact,
it's an example of a NoSQL approach called a key/value store.) Instead, Windows Azure
Tables let an application store properties of various types, such as strings, integers, and
dates. An application can then retrieve a group of properties by providing a unique key for
that group. While complex operations like joins aren't supported, tables offer fast access to
typed data. They're also very scalable, with a single table able to hold as much as a terabyte
of data. And matching their simplicity, tables are usually less expensive to use than SQL
Database's relational storage.
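The key/value idea can be sketched in Python as follows. This is a toy in-memory model meant only to show the access pattern (store a group of typed properties under a key, fetch the group by that key); the class name, method names, and sample profile are hypothetical and do not reflect the Windows Azure Tables API.

```python
from datetime import date


class SimpleTable:
    """Toy key/value table: each key maps to a group of typed properties."""

    def __init__(self) -> None:
        self._entities: dict[str, dict] = {}

    def insert(self, key: str, **properties) -> None:
        """Store a group of properties (strings, integers, dates, ...) under a key."""
        self._entities[key] = dict(properties)

    def get(self, key: str) -> dict:
        """Retrieve the whole group of properties by its unique key."""
        return dict(self._entities[key])


profiles = SimpleTable()
profiles.insert("user-1001", name="Ana", visits=12, joined=date(2012, 6, 1))
profile = profiles.get("user-1001")
```

Note what is missing: there is no query language and no join. Every lookup goes through the key, which is what makes this kind of store simple to scale.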
Blobs
The third option for data management, Windows Azure Blobs, is designed to store
unstructured binary data. Like Tables, Blobs provides inexpensive storage, and a single blob
can be as large as one terabyte. An application that stores video, for example, or backup data
or other binary information can use blobs for simple, cheap storage. Windows Azure
applications can also use Windows Azure drives, which let blobs provide persistent storage
for a Windows file system mounted in a Windows Azure instance. The application sees
ordinary Windows files, but the contents are actually stored in a blob.
Networking
Windows Azure runs today in several datacenters spread across the United States, Europe,
and Asia. When you run an application or store data, you can select one or more of these
datacenters to use. You can also connect to these datacenters in various ways:
You can use Windows Azure Virtual Network to connect your own on-premises local network
to a defined set of Windows Azure VMs.
You can use Windows Azure Connect to link one or more on-premises Windows servers to a
specific Windows Azure application.
If your Windows Azure application is running in multiple datacenters, you can use Windows
Azure Traffic Manager to route requests from users intelligently across instances of the
application.
Figure 4 illustrates these three options.

Figure 4: Windows Azure allows creating a cloud VPN, connecting a Windows Azure
application to on-premises machines, and intelligently distributing user requests across
different datacenters.
Virtual Network
One useful way to use a public cloud is to treat it as an extension of your own datacenter.
Because you can create VMs on demand, then remove them (and stop paying) when they're
no longer needed, you can have computing power only when you want it. And since
Windows Azure Virtual Machines lets you create VMs running SharePoint, Active
Directory, and other familiar on-premises software, this approach can work with the
applications you already have.
To make this really useful, though, your users ought to be able to treat these applications as
if they were running in your own datacenter. This is exactly what Windows Azure Virtual
Network allows. Using a VPN gateway device, an administrator can set up a virtual private
network (VPN) between your local network and a defined group of VMs running in Windows
Azure. Because you assign your own IPv4 addresses to the cloud VMs, they appear to be on
your own network. Users in your organization can access the applications those VMs contain
as if they were running locally.
Connect
Creating a VPN between your local network and a group of VMs in the cloud is useful, but it
also requires VPN gateway hardware and the services of a network administrator. Suppose
you're a developer who just wants to connect a single Windows Azure application to a
specific group of Windows machines within your organization. Perhaps you've built a Cloud
Services application that needs to access a database on one of those servers, for example,
and you don't want to go to the trouble of configuring a VPN gateway.
Windows Azure Connect is designed for this situation. Connect provides a simple way to
establish a secure connection between a Windows Azure application and a group of
computers running Windows. A developer just installs the Connect software on the
on-premises machines (there's no need to involve a network administrator) and configures the
Windows Azure application. Once this is done, the application can communicate with the on-
premises computers as if they were on the same local network.
Traffic Manager
A Windows Azure application with users in just a single part of the world might run in only
one Windows Azure datacenter. An application with users scattered around the world,
however, is more likely to run in multiple datacenters, maybe even all of them. In this second
situation, you face a problem: How do you intelligently assign users to application instances?
Most of the time, you probably want each user to access the datacenter closest to her, since
it will likely give her the best response time. But what if that copy of the application is
overloaded or unavailable? In this case, it would be nice to route her request automatically to
another datacenter. This is exactly what's done by Windows Azure Traffic Manager.
The owner of an application defines rules that specify how requests from users should be
routed to datacenters, then relies on Traffic Manager to carry out these rules. For example,
users might normally be routed to the closest Windows Azure datacenter, but get sent to
another one when the response time from their default datacenter exceeds a certain
threshold. For globally distributed applications with many users, having a built-in service to
handle problems like these is useful.
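The example rule above (prefer the closest datacenter, but fall back when its response time exceeds a threshold) might be sketched like this in Python. It is an illustration of the rule only, not of how Traffic Manager is configured or implemented; the function name, the dictionary fields, and the datacenter names are assumptions.

```python
from typing import Optional


def choose_datacenter(candidates: list[dict], threshold_ms: float) -> Optional[str]:
    """Pick a datacenter for one user.

    Each candidate is a dict with 'name', 'distance_km' (from the user) and
    'response_ms' (None when the datacenter is unavailable).
    """
    available = [c for c in candidates if c["response_ms"] is not None]
    if not available:
        return None
    # Prefer the closest datacenter whose response time is within the threshold.
    for candidate in sorted(available, key=lambda c: c["distance_km"]):
        if candidate["response_ms"] <= threshold_ms:
            return candidate["name"]
    # Every available datacenter is slow: fall back to the fastest one.
    return min(available, key=lambda c: c["response_ms"])["name"]


# Hypothetical measurements for a user located in Europe.
datacenters = [
    {"name": "west-europe", "distance_km": 300, "response_ms": 900},
    {"name": "east-us", "distance_km": 6000, "response_ms": 180},
]
selected = choose_datacenter(datacenters, threshold_ms=500)
```

Here the closest datacenter is overloaded, so the user is sent to the more distant one; if its response time dropped back under the threshold, the same rule would route the user locally again.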
Business Analytics
Analyzing data is a fundamental part of how businesses use information technology. A cloud
platform provides a pool of on-demand, pay-per-use resources, which makes it a good
foundation for this kind of computing. Accordingly, Windows Azure provides two options for
business analytics. Figure 5 illustrates the choices.

Figure 5: For business analytics, Windows Azure provides reporting and support for big
data.
Analyzing data can take many forms, and so these two options are quite different. It's worth
looking at each one separately.
SQL Reporting
One of the most common ways to use stored data is to create reports based on that data. To
let you do this with data in SQL Database, Windows Azure provides SQL Reporting. A subset
of the reporting services included with SQL Server, SQL Reporting lets you build reporting
into applications running on Windows Azure or on premises. The reports you create can be in
various formats, including HTML, XML, PDF, Excel, and others, and they can be embedded in
applications or viewed via a web browser.
Another option for doing analytics with SQL Database data is to use on-premises business
intelligence tools. To a client, SQL Database looks like SQL Server, and so the same
technologies can work with both. For example, you're free to use on-premises SQL Server
Reporting Services to create reports from SQL Database data.
Hadoop
For many years, the bulk of data analysis has been done on relational data stored in a data
warehouse built with a relational DBMS. This kind of business analytics is still important, and
it will be for a long time to come. But what if the data you want to analyze is so big that
relational databases just can't handle it? And suppose the data isn't relational? It might be
server logs in a datacenter, for example, or historical event data from sensors, or something
else. In cases like this, you have what's known as a big data problem. You need another
approach.
The dominant technology today for analyzing big data is Hadoop. An Apache open source
project, this technology stores data using the Hadoop Distributed File System (HDFS), then
lets developers create MapReduce jobs to analyze that data. HDFS spreads data across
multiple servers, and Hadoop then runs chunks of the MapReduce job on each one, letting the
big data be processed in parallel.
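The shape of a MapReduce job can be sketched in plain Python with the classic word-count example. This is a single-process illustration of the map, shuffle, and reduce steps, not Hadoop code: on a real cluster each chunk would be stored and mapped on a different server, and the function names here are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of the data."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks: list[str]) -> dict[str, int]:
    mapped: list[tuple[str, int]] = []
    for chunk in chunks:  # on a cluster, each chunk is mapped where it is stored
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle(mapped))


# Two "chunks" standing in for log data spread across two servers.
counts = word_count(["error ok error", "ok ok"])
```

Because each map call looks only at its own chunk, the map step can run on many machines at once; only the grouped intermediate results need to travel between them.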
As Figure 5 suggests, the Apache Hadoop-based Service for Windows Azure lets HDFS
distribute data across multiple virtual machines, then spreads the logic of a MapReduce job
across those VMs. Just as with on-premises Hadoop, data is processed locally (the logic and
the data it works on are in the same VM) and in parallel for better performance. The Apache
Hadoop-based Service for Windows Azure supports other components of the technology as
well, including Hive and Pig, and Microsoft has also created an Excel plug-in for issuing Hive
queries.
Messaging
No matter what it's doing, code frequently needs to interact with other code. In some
situations, all that's needed is basic queued messaging. In other cases, more complex
interactions are required. Windows Azure provides a few different ways to solve these
problems. Figure 6 illustrates the choices.

Figure 6: For connecting applications, Windows Azure provides queues,
publish/subscribe, and synchronous connections via the cloud.
Queues
Queuing is a simple idea: One application places a message in a queue, and that message is
eventually read by another application. If your application needs just this straightforward
service, Windows Azure Queues might be the best choice.
One common use of Queues today is to let a web role instance communicate with a worker
role instance within the same Cloud Services application. For example, suppose you create a
Windows Azure application for video sharing. The application consists of PHP code running
in a web role that lets users upload and watch videos, together with a worker role
implemented in C# that translates uploaded video into various formats. When a web role
instance gets a new video from a user, it can store the video in a blob, then send a message
to a worker role via a queue telling it where to find this new video. A worker role instance (it
doesn't matter which one) will then read the message from the queue and carry out the
required video translations in the background. Structuring an application in this way allows
asynchronous processing, and it also makes the application easier to scale, since the number
of web role instances and worker role instances can be varied independently.
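The video-sharing pattern above might be sketched as follows. This is a local simulation only: Python's standard queue module stands in for Windows Azure Queues, a dictionary stands in for blob storage, and the names web_role_upload and worker_role are invented for the example.

```python
import queue
import threading

video_queue = queue.Queue()   # stand-in for a Windows Azure queue
blob_store = {}               # stand-in for blob storage
results = {}                  # where workers record finished work
results_lock = threading.Lock()


def web_role_upload(name, data):
    """Store the upload, then enqueue a message saying where to find it."""
    blob_store[name] = data
    video_queue.put(name)


def worker_role():
    """Read messages until a None sentinel arrives, converting each video."""
    while True:
        name = video_queue.get()
        if name is None:
            video_queue.task_done()
            break
        converted = blob_store[name].upper()  # placeholder for real transcoding
        with results_lock:
            results[name] = converted
        video_queue.task_done()


workers = [threading.Thread(target=worker_role) for _ in range(2)]
for w in workers:
    w.start()

web_role_upload("cat.mp4", "raw-bytes-1")
web_role_upload("dog.mp4", "raw-bytes-2")

for _ in workers:
    video_queue.put(None)  # one stop signal per worker
video_queue.join()
for w in workers:
    w.join()
```

The uploader never waits for a conversion to finish, and adding more worker threads requires no change to the upload code, which mirrors the independent scaling described above.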
Service Bus
Whether they run in the cloud, in your data center, on a mobile device, or somewhere else,
applications need to interact. The goal of Windows Azure Service Bus is to let applications
running pretty much anywhere exchange data.
As Figure 6 shows, Service Bus provides a queuing service. This service isn't identical to the
Queues just described, however. Unlike Windows Azure Queues, for example, Service Bus
provides a publish-and-subscribe mechanism. An application can send messages to a topic,
while other applications can create subscriptions to this topic. This allows one-to-many
communication among a set of applications, letting the same message be read by multiple
recipients. And queuing isn't the only option: Service Bus also allows direct communication
through its relay service, providing a secure way to interact through firewalls.
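To make the difference between a queue and a topic concrete, here is a small in-memory sketch of publish-and-subscribe. It does not use the Service Bus SDK; the Topic class, its method names, and the subscription names are hypothetical and exist only to show that each subscription receives its own copy of a message.

```python
from collections import deque


class Topic:
    """Toy topic: every subscription receives its own copy of each message."""

    def __init__(self):
        self._subscriptions = {}

    def subscribe(self, name):
        self._subscriptions[name] = deque()

    def send(self, message):
        for pending in self._subscriptions.values():
            pending.append(message)

    def receive(self, name):
        pending = self._subscriptions[name]
        return pending.popleft() if pending else None


flights = Topic()
flights.subscribe("kiosk")
flights.subscribe("mobile-app")
flights.send("Flight 12 delayed")
```

With a plain queue, the first reader would consume the message and the second would see nothing; here both subscriptions can read it independently.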
Applications that communicate through Service Bus might be Windows Azure applications or
software running on some other cloud platform. They can also be applications running
outside the cloud, however. For example, think of an airline that implements reservation
services in computers inside its own datacenter. The airline needs to expose these services to
many clients, including check-in kiosks in airports, reservation agent terminals, and maybe
even customers' phones. It might use Service Bus to do this, creating loosely coupled
interactions among the various applications.
Caching
Applications tend to access the same data over and over. One way to improve performance is
to keep a copy of that data closer to the application, minimizing the time needed to retrieve
it. Windows Azure provides two different services for doing this: in-memory caching of data
used by Windows Azure applications and a content delivery network (CDN) that caches blob
data on disk closer to its users. Figure 7 shows both.

Figure 7: A Windows Azure application can cache data in memory, and copies of a blob
can be cached at sites around the world.
Caching
Accessing data stored in any of Windows Azure's data management services (SQL Database,
Tables, or Blobs) is quite fast. Yet accessing data stored in memory is even faster. Because of
this, keeping an in-memory copy of frequently accessed data can improve application
performance. You can use Windows Azure's in-memory Caching to do this.
A Cloud Services application can store data in this cache, then retrieve it directly without
needing to access persistent storage. As Figure 7 shows, the cache can be maintained inside
your application's VMs or be provided by VMs dedicated solely to caching. In either case, the
cache can be distributed, with the data it contains spread across multiple VMs in a Windows
Azure datacenter.
An application that repeatedly reads a product catalog might benefit from using this kind of
caching, for example, since the data it needs will be available more quickly. The technology
also supports locking, letting it be used with read/write as well as read-only data. And
ASP.NET applications can use the service to store session data with just a configuration
change.
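The product catalog case can be sketched with the common cache-aside pattern: look in memory first and go to persistent storage only on a miss. The example below is an assumption-laden illustration, not the Windows Azure Caching API; CatalogReader and the sample catalog are invented, and a plain dictionary stands in for the distributed cache.

```python
class CatalogReader:
    """Cache-aside: check memory first, fall back to persistent storage."""

    def __init__(self, load_from_storage):
        self._load = load_from_storage
        self._cache = {}
        self.storage_reads = 0

    def get(self, product_id):
        if product_id in self._cache:
            return self._cache[product_id]
        self.storage_reads += 1
        value = self._load(product_id)
        self._cache[product_id] = value
        return value


# Hypothetical catalog standing in for SQL Database, Tables, or Blobs.
catalog = {"sku-1": "Kettle", "sku-2": "Toaster"}
reader = CatalogReader(catalog.__getitem__)
first = reader.get("sku-1")
second = reader.get("sku-1")
```

The second lookup is answered from memory, so storage is read only once for that product.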
CDN
Suppose you need to store blob data that will be accessed by users around the world. Maybe
it's a video of the latest World Cup match, for instance, or driver updates, or a popular
e-book. Storing a copy of the data in multiple Windows Azure datacenters will help, but if there
are lots of users, it's probably not enough. For even better performance, you can use the
Windows Azure CDN.
The CDN has dozens of sites around the world, each capable of storing copies of Windows
Azure blobs. The first time a user in some part of the world accesses a particular blob, the
information it contains is copied from a Windows Azure datacenter into local CDN storage in
that geography. After this, accesses from that part of the world will use the blob copy cached
in the CDN; they won't need to go all the way to the nearest Windows Azure datacenter. The
result is faster access to frequently accessed data by users anywhere in the world.
Identity
Working with identity is part of most applications. For example, knowing who a user is lets an
application decide how it should interact with that user. To help you do this, Microsoft
provides Windows Azure Active Directory.
Like most directory services, Windows Azure Active Directory stores information about users
and the organizations they belong to. It lets users log in, then supplies them with tokens they
can present to applications to prove their identity. It also allows synchronizing user
information with Windows Server Active Directory running on premises in your local network.
While the mechanisms and data formats used by Windows Azure Active Directory aren't
identical with those used in Windows Server Active Directory, the functions it performs are
quite similar.
It's important to understand that Windows Azure Active Directory is designed primarily for
use by cloud applications. It can be used by applications running on Windows Azure, for
example, or on other cloud platforms. It's also used by Microsoft's own cloud applications,
such as those in Office 365. If you want to extend your datacenter into the cloud using
Windows Azure Virtual Machines and Windows Azure Virtual Network, however, Windows
Azure Active Directory isn't the right choice. Instead, you'll want to run Windows Server
Active Directory in cloud VMs, as described earlier.
To let applications access the information it contains, Windows Azure Active Directory
provides a RESTful API called Windows Azure Active Directory Graph. This API lets
applications running on any platform access directory objects and the relationships among
them. For example, an authorized application might use this API to learn about a user, the
groups he belongs to, and other information. Applications can also see relationships between
users (their social graph), letting them work more intelligently with the connections among
people.
Another capability of this service, Windows Azure Active Directory Access Control, makes it
easier for an application to accept identity information from Facebook, Google, Windows Live
ID, and other popular identity providers. Rather than requiring the application to understand
the diverse data formats and protocols used by each of these providers, Access Control
translates all of them into a single common format. It also lets an application accept logins
from one or more Active Directory domains. For example, a vendor providing a SaaS
application might use Windows Azure Active Directory Access Control to give users in each
of its customers single sign-on to the application.
Directory services are a core underpinning of on-premises computing. It shouldn't be
surprising that they're also important in the cloud.
High-Performance Computing
One of the most attractive ways to use a cloud platform is for high-performance computing
(HPC). The essence of HPC is executing code on many machines at the same time. On
Windows Azure, this means running many virtual machines simultaneously, all working in
parallel to solve some problem. Doing this requires some way to schedule applications, i.e.,
to distribute their work across these instances. To allow this, Windows Azure provides the
HPC Scheduler.
This component can work with HPC applications built to use the industry-standard Message
Passing Interface (MPI). Software that does finite element analysis, such as car crash
simulations, is one example of this type of application, and there are many others. The HPC
Scheduler can also be used with so-called embarrassingly parallel applications, such as
Monte Carlo simulations. Whatever problem is addressed, the value it provides is the same:
The HPC Scheduler handles the complex problem of scheduling parallel computing work
across many Windows Azure virtual machines. The goal is to make it easier to build HPC
applications running in the cloud.
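An embarrassingly parallel job is easy to picture with a small Monte Carlo estimate of pi. The sketch below does not use the HPC Scheduler; a thread pool from Python's standard library stands in for a set of virtual machines, and the function names and sample counts are chosen only for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed, samples):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(workers=4, samples_per_worker=50_000):
    # Each task is independent; a scheduler only has to hand out seeds
    # and add up the partial counts at the end.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_hits, range(workers), [samples_per_worker] * workers)
        total_hits = sum(partials)
    return 4.0 * total_hits / (workers * samples_per_worker)


pi_estimate = estimate_pi()
```

No task needs data from any other task, which is what makes this kind of workload simple to distribute; MPI applications, by contrast, exchange messages while they run.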
Media
Video makes up a large part of Internet traffic today, and that percentage will be even larger
tomorrow. Yet providing video on the web isn't simple. There are lots of variables, such as
the encoding algorithm and the display resolution of the user's screen. Video also tends to
have bursts in demand, like a Saturday night spike when lots of people decide they'd like to
watch an online movie.
Given its popularity, it's a safe bet that many new applications will be created that use video.
Yet all of them will need to solve some of the same problems, and making each one solve
those problems on its own makes no sense. A better approach is to create a platform that
provides common solutions for many applications to use. And building this platform in the
cloud has some clear advantages. It can be broadly available on a pay-as-you-go basis, and it
can also handle the variability in demand that video applications often face.
Windows Azure Media Services addresses this problem. It provides a set of cloud
components that make life easier for people creating and running applications using video
and other media. Figure 8 illustrates the technology.

Figure 8: Media Services is a platform for applications that provide video and other
media to clients around the world.
As the figure shows, Media Services provides a set of components for applications that work
with video and other media. For example, it includes a media ingest component to upload
video into Media Services (where it's stored in Windows Azure Blobs), an encoding
component that supports various video and audio formats, a content protection component
that provides digital rights management, a component for inserting ads into a video stream,
components for streaming, and more. Microsoft partners can also provide components for
the platform, then have Microsoft distribute those components and bill on their behalf.
Applications that use this platform can run on Windows Azure or elsewhere. For example, a
desktop application for a video production house might let its users upload video to Media
Services, then process it in various ways. Alternatively, a cloud-based content management
service running on Windows Azure might rely on Media Services to process and distribute
video. Wherever it runs and whatever it does, each application chooses which components it
needs to use, accessing them through RESTful interfaces.
To distribute what it produces, an application can use the Windows Azure CDN, another
CDN, or just send bits directly to users. However it gets there, video created using Media
Services can be consumed by various client systems, including Windows, Macintosh, HTML 5,
iOS, Android, Windows Phone, Flash, and Silverlight. The goal is to make it easier to create
modern media applications.
Commerce
The rise of Software as a Service is transforming how we create applications. It's also
transforming how we sell applications. Since a SaaS application lives in the cloud, it makes
sense that its potential customers should look for solutions online. And this change applies
to data as well as to applications. Why shouldn't people look to the cloud for commercially
available datasets? Microsoft addresses both of these concerns with Windows Azure
Marketplace, illustrated in Figure 9.

Figure 9: Windows Azure Marketplace lets you find and buy Windows Azure
applications and commercial datasets.
Potential customers can search the Marketplace to find Windows Azure applications that
meet their needs, then sign up to use them either through the application's creator or
directly through the Marketplace. Customers can search the Marketplace for commercial
datasets as well, including demographic data, financial data, geographic data, and more.
When they find something they like, they can access it either from the vendor or directly
through the Marketplace. Applications can also use the Bing Search API through the
Marketplace, giving them access to the results of web searches.
SDKs
Back in 2008, the very first pre-release version of Windows Azure supported only .NET
development. Today, however, you can create Windows Azure applications in pretty much
any language. Microsoft currently provides language-specific SDKs for .NET, Java, PHP,
Node.js, and Python. There's also a general Windows Azure SDK that provides basic support
for any language, such as C++.
These SDKs help you build, deploy, and manage Windows Azure applications. They're
available either from www.windowsazure.com or GitHub, and they can be used with Visual
Studio and Eclipse. Windows Azure also offers command line tools that developers can use
with any editor or development environment, including tools for deploying applications to
Windows Azure from Linux and Macintosh systems.
Along with helping you build Windows Azure applications, these SDKs also provide client
libraries that help you create software running outside the cloud that uses Windows Azure
services. For example, you might build an application running at a hoster that relies on
Windows Azure blobs, or create a tool that deploys Windows Azure applications through the
Windows Azure management interface.
Getting Started
Now that you have the big picture, the next step is to write your first Windows Azure
application. Choose your language, get the appropriate SDK, and go for it. Cloud computing
is the new default. Get started now.
Failsafe: Guidance for Resilient Cloud
Architectures
Authors: Marc Mercuri, Ulrich Homann, and Andrew Townhill
Publish date: November 2012
Introduction
Fail-safe noun. Something designed to work or function automatically to prevent breakdown of a
mechanism, system, or the like.
Individuals, whether in the context of employee, citizen, or consumer, demand instant access to
application, compute and data services. The number of people connected and the devices they use to
connect to these services are ever growing. In this world of always-on services, the systems that support
them must be designed to be both available and resilient.
The Fail-Safe initiative within Microsoft is intended to deliver general guidance for building resilient cloud
architectures, guidance for implementing those architectures on Microsoft technologies, and recipes for
implementing these architectures for specific scenarios. The authors of this document are part of Microsoft's
Services organization, with contributions from and collaboration with members of the Customer Advisory
Team (CAT) within the product group.
This document focuses on the architectural considerations for designing scalable and resilient systems.
This paper is organized into the following sections:
Decompose the Application by Workload: Defining how a workload-centric approach provides better
controls over costs, more flexibility in choosing technologies best suited to the workload, and enables a
more finely tuned approach to availability and resiliency.
Establish a Lifecycle Model: Establishing an application lifecycle model helps define the expected behavior of
an application in production and will provide requirements and insight for the overall architecture.
Establish an Availability Model and Plan: The availability model identifies the level of availability that is
expected for your workload. It is critical as it will inform many of the decisions you'll make when establishing
your service.
Identify Failure Points and Failure Modes: To create a resilient architecture, it's important to understand and
identify failure points and modes. Specifically, making a proactive effort to understand and document what
can cause an outage will establish an outline that can be used in analysis and planning.
Resiliency Patterns and Considerations: This section represents the majority of the document, and contains
key considerations across compute, storage, and platform services. These considerations focus on proven
practices to deliver a healthy application.
Design for Operations: In a world that expects services to be always on, it's important that services be
designed for operations. This section looks at proven practices for designing for operations that span the
lifecycle, from establishing a health model to implementing telemetry to visualizing that telemetry
information for the operations and developer audiences.
Decompose the Application by Workload
Applications are typically composed of multiple workloads. When looking at Microsoft products, you can see
that products such as SharePoint and Windows Server are designed with this principle in mind.
Different workloads can, and often do, have different requirements, different levels of criticality to the
business, and different levels of financial consideration associated with them. By decomposing an application
into workloads, an organization provides itself with valuable flexibility. A workload-centric approach provides
better controls over costs, more flexibility in choosing technologies best suited to the workload, workload
specific approaches to availability and security, flexibility and agility in adding and deploying new
capabilities, etc.
Scenarios
When thinking about resiliency, it's sometimes helpful to do so in the context of scenarios. The following are
examples of typical scenarios:
Scenario 1 - Sports Data Service

A customer provides a data service that provides sports information. The service has two primary workloads.
The first provides statistics for the player and teams. The second provides scores and commentary for games
that are currently in progress.
Scenario 2 - E-Commerce Web Site

An online retailer sells goods via a website in a well-established model. The application has a number of
workloads, with the most popular being search and browse and checkout.
Scenario 3 - Social

A high profile social site allows members of a community to engage in shared experiences around forums,
user generated content, and casual gaming. The application has a number of workloads, including
registration, search and browse, social interaction, gaming, email, etc.
Scenario 4 - Web

An organization wishes to provide an experience to customers via its web site. The application needs to
deliver experiences on both PC-based browsers and popular mobile device types (phone, tablet). The
application has a number of workloads including registration, search and browse, content publishing, social
commenting, moderation, gaming, etc.
Example of Decomposing by Workload
Let's take a closer look at one of the scenarios and decompose it into its child workloads. Scenario #2, an
e-commerce web site, could have a number of workloads: browse & search, checkout & management, user
registration, user-generated content (reviews and ratings), personalization, etc.
Example definitions of two of the core workloads for the scenario would be:
Browse & Search enables customers to navigate through a product catalog, search for specific items, and
perhaps manage baskets or wish lists. This workload can have attributes such as anonymous user access,
sub-second response times, and caching. Performance degradation may occur in the form of increased
response times with unexpected user load or application-tolerant interrupts for product inventory refreshes.
In those cases, the application may choose to continue to serve information from the cache.
Checkout & Management helps customers place, track, and cancel orders; select delivery methods and
payment options; and manage profiles. This workload can have attributes such as secure access, queued
processing, access to third-party payment gateways, and connectivity to back-end on-premises systems.
While the application may tolerate increased response time, it may not tolerate loss of orders; therefore, it is
designed to guarantee that customer orders are always accepted and captured, regardless of whether the
application can process the payment or arrange delivery.
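The Checkout & Management requirement, accepting every order even when payment cannot be processed, might look like this in outline. The sketch is hypothetical throughout: an in-memory deque stands in for a durable queue, and accept_order, process_pending, and the two gateway functions are invented names, not part of any payment or Windows Azure API.

```python
from collections import deque

pending_orders = deque()   # stand-in for a durable queue
completed = []


def accept_order(order):
    """Always capture the order; payment happens later."""
    pending_orders.append(order)
    return "accepted"


def process_pending(charge):
    """Try to charge each captured order; keep it queued if the gateway fails."""
    for _ in range(len(pending_orders)):
        order = pending_orders.popleft()
        try:
            charge(order)
            completed.append(order)
        except ConnectionError:
            pending_orders.append(order)  # retry on a later pass


def gateway_down(order):
    raise ConnectionError("payment gateway unavailable")


def gateway_up(order):
    return True


accept_order({"id": 1, "total": 20})
accept_order({"id": 2, "total": 35})
process_pending(gateway_down)   # outage: nothing is lost
still_queued = len(pending_orders)
process_pending(gateway_up)     # gateway recovers
```

Separating acceptance from processing is what lets this workload tolerate slow or unavailable back-end systems without losing orders.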
Establish a Lifecycle Model
An application lifecycle model defines the expected behavior of an application when operational. At different
phases and times, an application will put different demands on the system, whether at a functional or scale
level. The lifecycle model(s) will reflect this.
Workloads should have defined lifecycle models for all relevant and applicable scenarios. Services may have
hourly, daily, weekly, or seasonal lifecycle differences that, when modeled, identify specific capacity,
availability, performance, and scalability requirements over time.
Many services will have a minimum of two applicable models, particularly if service demand bursts in a
predictable fashion. Whether it's a spike related to peak demand during a holiday period, increased filing of
tax returns just before their due date, morning and afternoon commuter time windows, or end-of-year filing
of employee performance reviews, many organizations have an understanding of predictable spikes in
demand for a service that should be modeled.

Figure 1. A view of the lifecycle model on a month by month basis

Figure 2. A look at the lifecycle model more granularly, at the daily level
Establish an Availability Model and Plan
Once a lifecycle model is identified, the next step is to establish an availability model and plan. An availability
model for your application identifies the level of availability that is expected for your workload. It is critical as
it will inform many of the decisions you'll make when establishing your service.
There are a number of things to consider and a number of potential actions that can be taken.
SLA Identification
When developing your availability plan, it's important to understand what the desired availability is for your
application, the workloads within that application, and the services that are utilized in the delivery of those
workloads.
Defining the Desired SLA for Your Workload
Understanding the lifecycle of your workload will help you understand the desired Service Level Agreement
that you'd like to deliver. Even if an SLA is not provided for your service publicly, this is the baseline that
you'll aspire to meet in terms of availability.
There are a number of options that can provide scalability and resiliency. These come at
varying costs and can involve multiple layers. At an application level, utilizing all of these is unfeasible for most
projects due to cost and implementation time. By decomposing your application to the workload level, you
can make these investments at a more targeted level: the workload.
Even at the workload level, you may not choose to implement every option. What you choose to implement
or not is determined by your requirements. Regardless of the options you do choose, you should make a
conscious choice that's informed and considerate of all of the options.
Autonomy
Autonomy is about independence and reducing dependency between the parts which make up the service
as a whole. Dependency on components, data, and external entities must be examined when designing
services, with an eye toward building related functionality into autonomous units within the service. Doing so
provides the agility to update versions of distinct autonomous units, finer tuned control of scaling these
autonomous units, etc.
Workload architectures are often composed of autonomous components that do not rely on manual
intervention, and do not fail when the entities they depend upon are not available. Applications composed
of autonomous parts are:
available and operational
resilient and easily fault-recoverable
lower-risk for unhealthy failure states
easy to scale through replication
less likely to require manual interventions
These autonomous units will often leverage asynchronous communication, pull-based data processing, and
automation to ensure continuous service.
Looking forward, the market will evolve to a point where there are standardized interfaces for certain types
of functionality for both vertical and horizontal scenarios. When this future vision is realized, a service
provider will be able to engage with different providers and potentially different implementations that solve
the designated work of the autonomous unit. For continuous services, this will be done autonomously and
be based on policies.
As much as autonomy is an aspiration, most services will take a dependency on a third party service if only
for hosting. It's imperative to understand the SLAs of these dependent services and incorporate them into
your availability plan.
Understanding the SLAs and Resiliency Options for Service Dependencies
This section identifies the different types of SLAs that can be relevant to your service. For each of these
service types, there are key considerations and approaches, as well as questions that should be asked.
Public Cloud Platform Services
Services provided by a commercial cloud computing platform, such as compute or storage, have service level
agreements that are designed to accommodate a multitude of customers at significant scale. As such, the
SLAs for these services are non-negotiable. A provider may provide tiered levels of service with different
SLAs, but these tiers will be non-negotiable.
Questions to consider for this type of service:
Does this service allow only a certain number of calls to the Service API?
Does this service place limits on the call frequency to the Service API?
Does the service limit the number of servers that can call the Service API?
What is the publicly available information on how the service delivers on its availability promise?
How does this service communicate its health status?
What is the stated Service Level Agreement (SLA)?
What are the equivalent platform services provided by other 3rd parties?
3rd Party Free Services
Many third parties provide free services to the community. For private sector organizations, this is largely
done to help generate an ecosystem of applications around their core product or service. For the public sector,
this is done to provide data to the citizenry and businesses that have ostensibly paid for its collection
by funding the government through taxes.
Most of these services will not come with service level agreements, so availability is not guaranteed. When
SLAs are provided, they typically focus on restrictions that are placed on consuming applications and
mechanisms that will be used to enforce them. Examples of restrictions can include throttling or blacklisting
your solution if it exceeds a certain number of service calls, exceeds a certain number of calls in a given time
period (x per minute), or exceeds the number of allowable servers that are calling the service.
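One way a consuming application might stay under an "x calls per minute" restriction is to track its own recent calls and refuse to send more than the allowed number in any window. The following is a minimal sketch under that assumption; the CallBudget class and the limit of three calls per 60 seconds are made up for illustration and do not describe any particular provider's policy.

```python
import time
from collections import deque


class CallBudget:
    """Client-side guard that keeps calls under a per-window limit."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self._clock = clock
        self._calls = deque()

    def try_acquire(self):
        now = self._clock()
        # Forget calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False
        self._calls.append(now)
        return True


# A fake clock makes the behaviour easy to follow without sleeping.
fake_now = [0.0]
budget = CallBudget(max_calls=3, window_seconds=60, clock=lambda: fake_now[0])
allowed = [budget.try_acquire() for _ in range(4)]
fake_now[0] = 61.0
allowed_after_window = budget.try_acquire()
```

A caller that receives False can delay or drop the request locally instead of risking throttling or blacklisting by the service.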
Questions to consider for this type of service:
Does this service allow only a certain number of calls to the Service API?
Does this service place limits on the call frequency to the Service API?
Does the service limit the number of servers that can call the Service API?
What is the publicly available information on how the service delivers on its availability promise?
How does this service communicate its health status?
What is the stated Service Level Agreement (SLA)?
Is this a commodity service where the required functionality and/or data are available from multiple service
providers?
If a commodity service, is the interface interoperable across other service providers (directly or through an
available abstraction layer)?
What are the equivalent platform services provided by other 3rd parties?
3rd Party Commercial Services
Commercial services provided by third parties have service level agreements that are designed to
accommodate the needs of paying customers. A provider may provide tiered levels of SLAs with different
levels of availability, but these SLAs will be non-negotiable.
Questions to consider for this type of service:
Does this service allow only a certain number of calls to the Service API?
Does this service place limits on the call frequency to the Service API?
Does the service limit the number of servers that can call the Service API?
What is the publicly available information on how the service delivers on its availability promise?
How does this service communicate its health status?
What is the stated Service Level Agreement (SLA)?
Is this a commodity service where the required functionality and/or data are available from multiple service
providers?
If a commodity service, is the interface interoperable across other service providers (directly or through an
available abstraction layer)?
What are the equivalent platform services provided by other 3rd parties?
Community Cloud Services
A community of organizations, such as a supply chain, may make services available to member
organizations.
Questions to consider for this type of service:
Does this service allow only a certain number of calls to the Service API?
Does this service place limits on the call frequency to the Service API?
Does the service limit the number of servers that can call the Service API?
What is the publicly available information on how the service delivers on its availability promise?
How does this service communicate its health status?
What is the stated Service Level Agreement (SLA)?
As a member of the community, is there a possibility of negotiating a different SLA?
Is this a commodity service where the required functionality and/or data are available from multiple service
providers?
If a commodity service, is the interface interoperable across other service providers (directly or through an
available abstraction layer)?
What are the equivalent platform services provided by other 3rd parties?
1st Party Internal Enterprise Wide Cloud Services
An enterprise may make core services, such as stock price data or product metadata, available to its divisions
and departments.
Questions to consider for this type of service:
Does this service allow only a certain number of calls to the Service API?
Does this service place limits on the call frequency to the Service API?
Does the service limit the number of servers that can call the Service API?
What is the publicly available information on how the service delivers on its availability promise?
How does this service communicate its health status?
What is the stated Service Level Agreement (SLA)?
As a member of the organization, is there a possibility of negotiating a different SLA?
Is this a commodity service where the required functionality and/or data are available from multiple service
providers?
If a commodity service, is the interface interoperable across other service providers (directly or through an
available abstraction layer)?
What are the equivalent platform services provided by other 3rd parties?
1st Party Internal Divisional or Departmental Cloud Services
An enterprise division or department may make services available to other members of their immediate
organization.
Questions to consider for this type of service:
Does this service allow only a certain number of calls to the Service API?
Does this service place limits on the call frequency to the Service API?
Does the service limit the number of servers that can call the Service API?
What is the publicly available information on how the service delivers on its availability promise?
How does this service communicate its health status?
What is the stated Service Level Agreement (SLA)?
As a member of the division, is there a possibility of negotiating a different SLA?
Is this a commodity service where the required functionality and/or data are available from multiple service
providers?
If a commodity service, is the interface interoperable across other service providers (directly or through an
available abstraction layer)?
What are the equivalent platform services provided by other 3rd parties?
The True 9s of Composite Service Availability
Taking advantage of existing services can provide significant agility in delivering solutions for your
organization or for commercial sale. While this is attractive, it is important to truly understand the impact these
dependencies have on the overall SLA for the workload.
Availability is typically expressed as a percentage of uptime in a given year. This availability
percentage is often described by its number of 9s. For example, 99.9% represents a service with three nines and
99.999% represents a service with five nines.

Availability %               Downtime per year   Downtime per month   Downtime per week
90% ("one nine")             36.5 days           72 hours             16.8 hours
95%                          18.25 days          36 hours             8.4 hours
97%                          10.96 days          21.6 hours           5.04 hours
98%                          7.30 days           14.4 hours           3.36 hours
99% ("two nines")            3.65 days           7.20 hours           1.68 hours
99.5%                        1.83 days           3.60 hours           50.4 minutes
99.8%                        17.52 hours         86.23 minutes        20.16 minutes
99.9% ("three nines")        8.76 hours          43.2 minutes         10.1 minutes
99.95%                       4.38 hours          21.56 minutes        5.04 minutes
99.99% ("four nines")        52.56 minutes       4.32 minutes         1.01 minutes
99.999% ("five nines")       5.26 minutes        25.9 seconds         6.05 seconds
99.9999% ("six nines")       31.5 seconds        2.59 seconds         0.605 seconds
99.99999% ("seven nines")    3.15 seconds        0.259 seconds        0.0605 seconds
Figure 3. Downtime related to the more common 9s
One common misconception concerns the number of 9s that a composite service provides. Specifically, it
is often assumed that if a given service is composed of 5 services, each with a promised 99.999% uptime in
their SLAs, the resulting composite service has an availability of 99.999%. This is not the case.
The percentage is actually a calculation that considers the amount of downtime per year. A service with an
SLA of four 9s (99.99%) can be offline up to 52.56 minutes per year. Incorporating 5 of these services into a
composite introduces an identified SLA risk of 262.8 minutes or 4.38 hours per year. This reduces the availability to
99.95% before a single line of code is written! You generally can't change the availability of a third-party
service; however, when writing your code, you can increase the overall availability of your application using
concepts laid out in this document.
When leveraging external services, the importance of understanding SLAs, both individually and in terms of their
impact on the composite, cannot be stressed enough.
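The arithmetic above can be checked with a short Python sketch. The function names are illustrative, and the composite calculation assumes that the composite needs every dependency to be up and that the dependencies fail independently, so their uptime fractions multiply.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Maximum yearly downtime permitted by an availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


def composite_availability(availabilities_pct: list[float]) -> float:
    """Availability of a composite that needs every dependency to be up.

    Assumes independent failures, so the uptime fractions are multiplied.
    """
    result = 1.0
    for pct in availabilities_pct:
        result *= pct / 100
    return result * 100


five_services = [99.99] * 5
print(round(downtime_minutes_per_year(99.99), 2))       # 52.56
print(round(composite_availability(five_services), 2))  # 99.95
```

Multiplying the uptime fractions gives essentially the same answer as adding the downtime budgets (5 x 52.56 = 262.8 minutes) because the individual downtimes are small.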
Identify Failure Points and Failure Modes
To create a resilient architecture, it's important to understand it. Specifically, this means making a proactive effort to
understand and document what can cause an outage.
Understanding the failure points and failure modes for an application and its related workload services can
enable you to make informed, targeted decisions on strategies for resiliency and availability.
Failure Points
A failure point is a design element that can cause an outage. An important focus is on design elements that
are subject to external change.
Examples of failure points include
Database connections
Website connections
Configuration files
Registry keys
Categories of common failure points include
ACLs
Database access
External web site/service access
Transactions
Configuration
Capacity
Network
Failure Modes
While failure points define the areas that can result in an outage, failure modes identify the root cause of an
outage at those failure points.
Examples of failure modes include
A missing configuration file
Significant traffic exceeding resource capacity
A database reaching maximum capacity
Resiliency Patterns and Considerations
This document will look at key considerations across compute, storage, and platform services. Before
covering these topics, it is important to recap several basic topics that affect resiliency and are often
misunderstood or not implemented.
Default to Asynchronous
As mentioned previously, a resilient architecture should optimize for autonomy. One of the ways to achieve
autonomy is by making communication asynchronous. A resilient architecture should default to
asynchronous interaction, with synchronous interactions being the exception rather than the rule.
Stateless web tiers or web tiers with a distributed cache can provide this on the front end of a solution.
Queues can provide this capability for communication between workload services or between
services within a workload service.
With queues, messages are placed on a queue and secondary services retrieve them. Retrieval can be
driven by business logic, time, or volume. In addition to making the process asynchronous, this
also allows the tiers pushing to or pulling from the queues to be scaled as appropriate.
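The queue-based interaction described above can be sketched in Python using only the standard library. The in-process queue stands in for a cloud queue service, and the message names are hypothetical; a real deployment would use the queue offering of the chosen platform.

```python
import queue
import threading

work_queue = queue.Queue()   # stands in for a cloud queue service
processed = []


def worker() -> None:
    """Secondary service: pulls messages whenever it has capacity."""
    while True:
        message = work_queue.get()
        if message is None:          # sentinel value: no more work
            break
        processed.append(message.upper())


consumer = threading.Thread(target=worker)
consumer.start()

# Front-end tier: enqueue and move on instead of waiting for the result.
for order in ["order-1", "order-2", "order-3"]:
    work_queue.put(order)

work_queue.put(None)   # tell the worker to stop
consumer.join()
print(processed)       # ['ORDER-1', 'ORDER-2', 'ORDER-3']
```

Because the producer and the consumer only share the queue, either side can be scaled out by adding more producers or more workers.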
Timeouts
A common area where transient faults will occur is where your architecture connects to a service or a
resource such as a database. When consuming these services, it's a common practice to implement logic that
introduces the concept of a timeout. This logic identifies an acceptable time frame in which a response is
expected and generates an identifiable error when that time frame is exceeded. When a timeout error
occurs, appropriate steps are taken based on the context in which the error occurs. Context
can include the number of times this error has occurred, the potential impact of the unavailable resource,
SLA guarantees for the current time period for the given customer, etc.
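As a minimal sketch of this idea, the Python helper below waits a bounded time for a call and raises an identifiable error if no response arrives. The names ServiceTimeout, call_with_timeout, and slow_lookup are hypothetical; many client libraries accept a timeout parameter directly, which is usually preferable when available.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class ServiceTimeout(Exception):
    """Identifiable error raised when a dependency is too slow."""


def call_with_timeout(func, timeout_seconds: float):
    """Run func in a worker thread and wait at most timeout_seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        raise ServiceTimeout(f"no response within {timeout_seconds}s") from None
    finally:
        executor.shutdown(wait=False)   # do not block on the slow call


def slow_lookup() -> str:
    time.sleep(0.3)   # stands in for a slow database or service call
    return "row"


try:
    call_with_timeout(slow_lookup, timeout_seconds=0.05)
except ServiceTimeout as error:
    print(f"timed out: {error}")
```

Note that the slow call keeps running in its worker thread after the timeout; the caller simply stops waiting for it. The caller can then decide, based on context, whether to retry, fall back, or report the failure.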
Handle Transient Faults
When designing the service(s) that will deliver your workload, you must accept and embrace that failures will
occur and take the appropriate steps to address them.
One of the common areas to address is transient faults. As no service has 100% uptime, it's realistic to expect
that you may not be able to connect to a service that a workload has taken a dependency on. The inability to
connect to or faults seen from one of these services may be fleeting (less than a second) or permanent (a
provider shuts down).
Degrade Gracefully
Your workload service should aspire to handle these transient faults gracefully. Netflix, for example, during
an outage at their cloud provider utilized an older video queue for customers when the primary data store
was not available. Another example would be an ecommerce site continuing to collect orders if its payment
gateway is unavailable. This provides the ability to process orders when the payment gateway is once again
available or after failing over to a secondary payment gateway.
When doing this, the ideal scenario is to minimize the impact to the overall system. In both cases, the service
issues are largely invisible to end users of these systems.
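The ecommerce example can be sketched as follows. All names here (charge, place_order, pending_payments, GatewayUnavailable) are hypothetical, and the gateway_up flag merely simulates the health of the payment gateway.

```python
class GatewayUnavailable(Exception):
    """Raised when the payment gateway cannot be reached."""


pending_payments = []   # orders to charge once the gateway is back


def charge(order: dict, gateway_up: bool) -> str:
    """Hypothetical payment call; gateway_up simulates gateway health."""
    if not gateway_up:
        raise GatewayUnavailable("payment gateway is not responding")
    return "charged"


def place_order(order: dict, gateway_up: bool) -> str:
    """Accept the order even if payment has to be deferred."""
    try:
        return charge(order, gateway_up)
    except GatewayUnavailable:
        pending_payments.append(order)      # process later or via a secondary gateway
        return "accepted, payment pending"


print(place_order({"id": 1}, gateway_up=True))    # charged
print(place_order({"id": 2}, gateway_up=False))   # accepted, payment pending
```

The customer still receives a confirmation in both cases, which is what keeps the gateway issue largely invisible.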
Transient Fault Handling Considerations
There are several key considerations for the implementation of transient fault handling, as detailed in the
following sections.
Retry logic

The simplest form of transient fault handling is to retry the operation that failed. If using a commercial third
party service, implementing retry logic will often resolve this issue.

It should be noted that designs should typically limit the number of times the logic will be retried. The logic
will typically attempt to execute the action(s) a certain number of times, registering an error and/or utilizing
a secondary service or workflow if the fault continues.
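A bounded retry can be sketched in a few lines of Python. TransientError, retry, and flaky are illustrative names, and the limit of three attempts is an arbitrary assumption.

```python
class TransientError(Exception):
    """Hypothetical marker for faults that are worth retrying."""


def retry(operation, max_attempts: int = 3, fallback=None):
    """Try operation up to max_attempts times, then use the fallback."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError as error:
            last_error = error
            print(f"attempt {attempt} failed: {error}")   # register the error
    if fallback is not None:
        return fallback()       # secondary service or workflow
    raise last_error


calls = {"count": 0}


def flaky() -> str:
    """Fails twice, then succeeds, to simulate a fleeting fault."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporarily unavailable")
    return "ok"


print(retry(flaky))   # ok
```

Only errors known to be transient are retried here; other exceptions propagate immediately so that permanent failures are not masked.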
Exponential Backoff

If the transient fault is caused by the service throttling requests under heavy load, repeated attempts to
call the service will only extend the throttling and impact overall availability.

It is often desirable to reduce the volume of the calls to the service to help avoid or reduce throttling. This is
typically done algorithmically, such as immediately retrying after the first failure, waiting 1 second after the
second failure, 5 seconds after the 3rd failure, etc. until ultimately succeeding or hitting an application-defined
threshold for failures.

This approach is referred to as exponential backoff.
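One possible schedule is sketched below in Python. It assumes delays that double after each failure and are capped at a maximum, which differs slightly from the 0, 1, 5 second example in the text; the random jitter, the ThrottledError type, and all parameter names are also assumptions made for illustration.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error a service returns while throttling a caller."""


def backoff_delays(max_retries: int, base_seconds: float = 1.0,
                   cap_seconds: float = 30.0) -> list:
    """Delays that double after each failure, capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * 2 ** attempt)
            for attempt in range(max_retries)]


def call_with_backoff(operation, max_retries: int = 5,
                      base_seconds: float = 1.0, sleep=time.sleep):
    """Call operation, waiting longer after each throttled attempt."""
    for delay in backoff_delays(max_retries, base_seconds):
        try:
            return operation()
        except ThrottledError:
            # Jitter spreads out clients that were throttled together.
            sleep(delay * random.uniform(0.5, 1.0))
    return operation()   # final attempt; any error reaches the caller


print(backoff_delays(5))   # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Passing the sleep function as a parameter keeps the helper easy to test without real waiting.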
Idempotency

A core assumption with connected services is that they will not be 100% available and that transient fault
handling with retry logic is a core implementation approach. In cases where retry logic is implemented, there
is the potential for the same message to be sent more than once, for messages to be sent out of sequence,
etc.

Operations should be designed to be idempotent, ensuring that sending the same message multiple times
does not result in an unexpected or polluted data store.

For example, inserting data from all requests may result in multiple records being added if the service
operation is called multiple times. An alternate approach would be to implement the code as an intelligent
upsert. A timestamp or global identifier could be used to distinguish new messages from previously processed ones,
inserting only new ones into the database and updating an existing record only if the message is newer than what
was received in the past.
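A minimal sketch of such an upsert is shown below. An in-memory dictionary stands in for the database, and the message fields id and timestamp are assumed for illustration.

```python
store = {}   # in-memory stand-in for the database, keyed by message id


def upsert(message: dict) -> bool:
    """Apply a message only if it is new or newer than what is stored.

    Returns True if the store changed, False for duplicates or stale data.
    """
    current = store.get(message["id"])
    if current is not None and message["timestamp"] <= current["timestamp"]:
        return False            # duplicate or out-of-order message: ignore
    store[message["id"]] = message
    return True


first = {"id": "order-7", "timestamp": 2, "quantity": 5}
print(upsert(first))    # True: new record inserted
print(upsert(first))    # False: the retry of the same message changes nothing
print(len(store))       # 1
```

Sending the same message any number of times leaves the store in the same state, which is what makes retries safe.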
Compensating Behavior

In addition to idempotency, another area for consideration is the concept of compensating behavior. In a world
of an ever-growing set of connected systems and the emergence of composite services,
understanding how to handle compensating behavior is important.

For many developers of line-of-business applications, the concept of transactions is not new, but the frame
of reference is often tied to the transactional functionality exposed by local data technologies and related
code libraries. When looking at the concept in terms of the cloud, this mindset needs to take into account new
considerations related to orchestration of distributed services.

A service orchestration can span multiple distributed systems and be long running and stateful. The
orchestration itself is rarely synchronous and can last from seconds to years
based on the business scenario.

In a supply chain scenario that could tie together 25 organizations in the same workload activity, for
example, there may be a set of 25 or more systems that are interconnected in one or more service
orchestrations.

If the activity succeeds, the 25 systems must be made aware that it was successful. For each connection
point in the activity, a participant system can provide a correlation ID for messages it receives from other
systems. Depending on the type of activity, the receipt of that correlation ID may satisfy the party that the
transaction is notionally complete. In other cases, upon the completion of the interactions of all 25 parties,
a confirmation message may be sent to all parties (either directly from a single service or via the specific
orchestration interaction points for each system).

To handle failures in composite and/or distributed activities, each service would expose a service interface
and operation(s) to receive requests to cancel a given transaction by a unique identifier. Behind the service
façade, workflows would be in place to compensate for the cancellation of this activity. Ideally these would
be automated procedures, but they can be as simple as routing to a person in the organization to remediate
manually.
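The cancel-by-identifier idea can be sketched as follows. InventoryService and run_activity are hypothetical names; in a real orchestration each participant would be a remote service and the cancellation workflow could be far more involved.

```python
class InventoryService:
    """Hypothetical participant that can undo work by correlation ID."""

    def __init__(self, stock: int) -> None:
        self.stock = stock
        self.reservations = {}

    def reserve(self, correlation_id: str, quantity: int) -> None:
        if quantity > self.stock:
            raise ValueError("insufficient stock")
        self.stock -= quantity
        self.reservations[correlation_id] = quantity

    def cancel(self, correlation_id: str) -> bool:
        """Compensate for a reservation; safe to call more than once."""
        quantity = self.reservations.pop(correlation_id, None)
        if quantity is None:
            return False
        self.stock += quantity
        return True


def run_activity(participants, correlation_id: str, quantity: int) -> bool:
    """Reserve with every participant, compensating if any of them fails."""
    completed = []
    for service in participants:
        try:
            service.reserve(correlation_id, quantity)
            completed.append(service)
        except ValueError:
            for done in reversed(completed):
                done.cancel(correlation_id)   # undo the earlier steps
            return False
    return True


warehouse_a, warehouse_b = InventoryService(10), InventoryService(1)
print(run_activity([warehouse_a, warehouse_b], "tx-1", 5))   # False
print(warehouse_a.stock)                                     # 10
```

Unlike a local database transaction, nothing is rolled back automatically; each participant restores its own state when asked to cancel.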
Circuit Breaker Pattern
A circuit breaker is a switch that automatically interrupts the flow of electric current if the current exceeds a
preset limit. Circuit breakers are used most often as a safety precaution where excessive current through a
circuit could be hazardous. Unlike a fuse, a circuit breaker can be reset and re-used.
The same pattern is applicable to software design, and particularly applicable for services where availability
and resiliency are a key consideration.
When a resource is unavailable, a software circuit breaker can detect the condition and respond with
appropriate action.
A common implementation of this pattern is related to accessing of databases or data services. Once an
established type and level of activity fails, the circuit breaker would react. With data, this is typically caused
by the inability to connect to a database or a data service in front of that database.
If a call to a database resource failed after 100 consecutive attempts to connect, there is likely little value in
continuing to call the database. A circuit breaker could be triggered at that threshold and the appropriate
actions can be taken.
In some cases, particularly when connecting to data services, this could be the result of throttling based on a
client exceeding the number of allowed calls within a given time period. The circuit breaker may inject delays
between calls until such time that connections are successfully established and meet the tolerance levels.
In other cases, the data store may not be available. If a redundant copy of the data is available, the system
may fail over to that replica. If a true replica is unavailable or if the database service is down broadly across
all data centers within a provider, a secondary approach may be taken. This could include sourcing
a version of the requested data from an alternate data service provider. This alternate source could be a
cache, an alternate persistent data store type on the current cloud provider, a separate cloud provider, or an
on-premises data center. When such an alternate is not available, the service could also return a recognizable
error that could be handled appropriately by the client.
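A minimal circuit breaker might look like the Python sketch below. The class and parameter names, the consecutive-failure threshold, and the time-based reset are assumptions chosen for illustration rather than a prescribed design.

```python
import time


class CircuitOpenError(Exception):
    """Recognizable error returned while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3,
                 reset_seconds: float = 30.0, clock=time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds
        self.clock = clock
        self.failures = 0        # consecutive failures seen so far
        self.opened_at = None    # time the breaker tripped, if open

    def call(self, operation, fallback=None):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_seconds:
                # Open: skip the dependency and serve an alternate source.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("dependency unavailable")
            self.opened_at = None   # reset window elapsed: allow a trial call
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # trip the breaker
            raise
        self.failures = 0
        return result
```

A caller would wrap each database call, for example breaker.call(query, fallback=read_from_cache), where query and read_from_cache are placeholders for the primary and alternate data sources.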
Circuit Breaker Example: Netflix
Netflix, a media streaming company, is often held up as a great example of a resilient architecture. When
discussing the circuit breaker pattern on the Netflix Tech Blog, the Netflix team calls out several criteria that are included in their
circuit breaker. These include:
1. A request to the remote service times out.
2. The thread pool and bounded task queue used to interact with a service dependency are at 100% capacity.
3. The client library used to interact with a service dependency throws an exception.
All of these contribute to the overall error rate. When that error rate exceeds their defined thresholds, the
circuit breaker is tripped and the circuit for that service immediately serves fallbacks without even
attempting to connect to the remote service.
In that same blog entry, the Netflix team states that the circuit breaker for each of their services implements
a fallback using one of the following three approaches:
1. Custom fallback: a service client library provides an invokable fallback method or locally available data on
an API server (e.g., a cookie or local cache) is used to generate a fallback response.
2. Fail silent: a method returns a null value to the requesting client, which works well when the data being
requested is optional.
3. Fail fast: when data is required or no good fallback is available, a 5xx response is returned to the client. This
approach focuses on keeping API servers healthy and enabling a quick recovery when impacted services
come back online, but does so at the expense of negatively impacting the client UX.
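The three fallback approaches can be illustrated with a small Python sketch. This is not Netflix code; the enum, the function name, and the choice of status codes 200 and 503 are assumptions used only to make the distinction concrete.

```python
from enum import Enum


class Fallback(Enum):
    CUSTOM = "custom"
    FAIL_SILENT = "fail_silent"
    FAIL_FAST = "fail_fast"


def respond_when_open(strategy: Fallback, cached_value=None):
    """Return (http_status, body) for a request whose circuit is open."""
    if strategy is Fallback.CUSTOM:
        return 200, cached_value        # locally available data
    if strategy is Fallback.FAIL_SILENT:
        return 200, None                # optional data is simply omitted
    return 503, "service unavailable"   # fail fast with a 5xx response


print(respond_when_open(Fallback.CUSTOM, cached_value=["recently watched"]))
print(respond_when_open(Fallback.FAIL_SILENT))
print(respond_when_open(Fallback.FAIL_FAST))
```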
Handling SLA Outliers: Trusted Parties and Bad Actors
To enforce an SLA, an organization should address how its data service will deal with two categories of
outliers: trusted parties and bad actors.
Trusted Parties and White Listing
Trusted parties are organizations with whom the organization could have special arrangements, and for
whom certain exceptions to standard SLAs might be made.
Third Parties with Custom Agreements

There may be some users of a service that want to negotiate special pricing terms or policies. In some cases,
a high volume of calls to the data service might warrant special pricing. In other cases, demand for a given
data service could exceed the volume specified in standard usage tiers. Such customers should be defined as
trusted parties to avoid inadvertently being flagged as bad actors.
White Listing

The typical approach to handling trusted parties is to establish a white list. A white list, which identifies a list
of trusted parties, is used by the service when it determines which business rules to apply when processing
customer usage. White listing is typically done by authorizing either an IP address range or an API key.

When establishing a consumption policy, an organization should identify if white listing is supported; how a
customer would apply to be on the white list; how to add a customer to the white list; and under what
circumstances a customer is removed from the white list.
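A white list check by API key or IP address range can be sketched with Python's standard ipaddress module. The key value and the network below are placeholders (203.0.113.0/24 is a range reserved for documentation).

```python
import ipaddress

# Hypothetical white list entries; real values would come from configuration.
TRUSTED_API_KEYS = {"key-partner-a"}
TRUSTED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_trusted(api_key: str, client_ip: str) -> bool:
    """True if the caller matches a trusted API key or IP address range."""
    if api_key in TRUSTED_API_KEYS:
        return True
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in TRUSTED_NETWORKS)


print(is_trusted("key-partner-a", "198.51.100.7"))   # True (trusted key)
print(is_trusted("unknown", "203.0.113.42"))         # True (trusted range)
print(is_trusted("unknown", "198.51.100.7"))         # False
```

The service would consult this check before deciding which business rules, such as usage limits, apply to a request.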
Handling Bad Actors

If trusted parties stand at one end of the customer spectrum, the group at the opposite end is what is
referred to as bad actors. Bad actors place a burden on the service, typically from attempted
overconsumption. In some cases bad behavior is genuinely accidental. In other cases it is intentional, and,
in a few situations, it is malicious. These actors are labeled "bad" because their actions, intentional or otherwise,
have the ability to impact the availability of one or more services.

The burden of bad actors can introduce unnecessary costs to the data service provider and compromise
access by consumers who faithfully follow the terms of use and have a reasonable expectation of service, as
spelled out in an SLA. Bad actors must therefore be dealt with in a prescribed, consistent way. The typical
responses to bad actors are throttling and black listing.
Throttling

Organizations should define a strategy for dealing with spikes in usage by data service consumers.
Significant bursts of traffic from any consumer can put an unexpected load on the data service. When such
spikes occur, the organization might want to throttle access for that consumer for a certain period of time. In
this case the service refuses all requests from the consumer for a certain period of time, such as one minute,
five minutes, or ten minutes. During this period, service requests from the targeted consumer result in an
error message advising that they are being throttled for overuse.

The consumer making the requests can respond accordingly, such as by altering its behavior.

The organization should determine whether it wants to implement throttling and set the related business
rules. If it determines that consumers can be throttled, the organization will also need to decide what
behaviors should trigger the throttling response.
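One way to express such business rules is sketched below: a consumer that exceeds a call limit within a time window is refused for a penalty period. The Throttle class, its parameters, and the fixed-window counting are assumptions; production services often use other algorithms such as token buckets.

```python
import time


class Throttle:
    """Refuse a consumer for penalty_seconds after it exceeds max_calls
    within window_seconds (a simple fixed-window counter)."""

    def __init__(self, max_calls: int, window_seconds: float,
                 penalty_seconds: float, clock=time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.penalty_seconds = penalty_seconds
        self.clock = clock
        self.window_start = {}
        self.counts = {}
        self.blocked_until = {}

    def allow(self, consumer: str) -> bool:
        now = self.clock()
        if now < self.blocked_until.get(consumer, 0.0):
            return False                        # still being throttled
        start = self.window_start.get(consumer)
        if start is None or now - start >= self.window_seconds:
            self.window_start[consumer] = now   # begin a new window
            self.counts[consumer] = 0
        self.counts[consumer] += 1
        if self.counts[consumer] > self.max_calls:
            self.blocked_until[consumer] = now + self.penalty_seconds
            return False
        return True


throttle = Throttle(max_calls=2, window_seconds=1.0, penalty_seconds=60.0)
print([throttle.allow("consumer-a") for _ in range(3)])   # [True, True, False]
```

When allow returns False, the service would respond with an error message advising the consumer that it is being throttled.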
Black listing

Although throttling should correct the behavior of bad actors, it might not always be successful. In cases in
which it does not work, the organization might want to ban a consumer. The opposite of a white list, a black
list identifies consumers that are barred from access to the service. The service will respond to access
requests from black-listed customers appropriately, and in a fashion that minimizes the use of data service
resources.

Black listing, as with white listing, is typically done by using either an API key or an IP address range.

When establishing a consumption policy, the organization should specify what behaviors will place a
consumer on the black list; how black listing can be appealed; and how a consumer can be removed from
the black list.
Automate All the Things
People make mistakes. Whether it's a developer making a code change that could have unexpected
consequences, a DBA accidentally dropping a table in a database, or an operations person who makes a
change but doesn't document it, there are multiple opportunities for a person to inadvertently make a
service less resilient.
To reduce human error, a logical approach is to reduce the number of humans in the process. Through the
introduction of automation, you limit the ability for ad hoc, inadvertent deltas from expected behavior to
jeopardize your service.
There is a meme in the DevOps community with a cartoon character saying "Automate All the Things." In the
cloud, most services are exposed with an API. From development tools to virtualized infrastructure to
platform services to solutions delivered as Software as a Service, most things are scriptable.
Scripting is highly recommended. Scripting makes deployment and management consistent and predictable
and pays significant dividends for the investment.
Automating Deployment
One of the key areas of automation is in the building and deployment of a solution. Automation can make it
easy for a developer team to test and deploy to multiple environments. Development, test, staging, beta,
and production can all be deployed readily and consistently through automated builds. The ability to deploy
consistently across environments works toward ensuring that what's in production is representative of what's
been tested.
Establishing and Automating a Test Harness
Testing is another area that can be automated. Like automated deployment, establishing automated testing
is valuable in ensuring that your system is resilient and stays resilient over time. As the code and usage of your
service evolve, it's important to ensure that all appropriate testing is done, both functionally and at scale.
Automating Data Archiving and Purging
One of the areas that gets little attention is that of data archiving and purging. Data is growing and
continues to grow in greater volume and variety than at any time in history. Depending on the
database technology and the types of queries required, unnecessary data can reduce the response time of
your system and increase costs unnecessarily. For resiliency plans that include one or more replicas of a data
store, removing all but the necessary data can expedite management activities such as backing up and
restoring data.
Identify the requirements for your solution related to data needed for core functionality, data that is needed for
compliance purposes but can be archived, and data that is no longer necessary and can be purged.
Utilize the APIs available from the related products and services to automate the implementation of these
requirements.
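Once the requirements are identified, the classification itself is easy to automate. The sketch below assumes purely age-based rules with made-up thresholds (90 days of active data, seven years of retention); actual rules depend on the solution and its compliance obligations.

```python
from datetime import date


def classify(record_date: date, today: date,
             active_days: int = 90, retention_days: int = 365 * 7) -> str:
    """Decide what to do with a record based on its age.

    The thresholds are illustrative assumptions, not recommendations.
    """
    age_days = (today - record_date).days
    if age_days <= active_days:
        return "keep"        # needed for core functionality
    if age_days <= retention_days:
        return "archive"     # needed for compliance only
    return "purge"           # no longer necessary


today = date(2024, 1, 1)
print(classify(date(2023, 12, 1), today))   # keep
print(classify(date(2022, 1, 1), today))    # archive
print(classify(date(2010, 1, 1), today))    # purge
```

A scheduled job could apply this classification and then call the archive and delete APIs of the data store in use.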
Understand Fault Domains and Upgrade Domains
When building a resilient architecture, it's also important to understand the concepts of fault domains and
upgrade domains.
Fault Domains
Fault domains constrain the placement of services based on known hardware boundaries and the likelihood
that a particular type of outage will affect a set of machines. A fault domain is defined as a set of
machines that can fail simultaneously, and is usually defined by physical properties (a particular rack of
machines, a series of machines sharing the same power source, etc.).
Upgrade Domains
Upgrade domains are similar to fault domains. Upgrade domains define a physical set of services that are
updated by the system at the same time. The load balancer at the cloud provider must be aware of upgrade
domains in order to ensure that, while a particular domain is being updated, the overall system remains
balanced and services remain available.
Depending on the cloud provider and platform services utilized, fault domains and upgrade domains may be
provided automatically, be something your service can opt-in to via APIs, or require a 1st or 3rd party
solution.
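The effect of spreading instances across fault domains can be illustrated with a short sketch. The round-robin placement and the rack names are assumptions; in practice the cloud platform or a cluster manager makes the placement decision.

```python
def place_instances(instance_count: int, fault_domains: list) -> dict:
    """Spread instances across fault domains in round-robin order."""
    placement = {domain: [] for domain in fault_domains}
    for index in range(instance_count):
        domain = fault_domains[index % len(fault_domains)]
        placement[domain].append(f"instance-{index}")
    return placement


def surviving_instances(placement: dict, failed_domain: str) -> list:
    """Instances still running after one fault domain goes down."""
    return [instance
            for domain, instances in placement.items()
            if domain != failed_domain
            for instance in instances]


placement = place_instances(4, ["rack-a", "rack-b"])
print(placement)
# {'rack-a': ['instance-0', 'instance-2'], 'rack-b': ['instance-1', 'instance-3']}
print(surviving_instances(placement, "rack-a"))   # ['instance-1', 'instance-3']
```

The same reasoning applies to upgrade domains: updating one domain at a time leaves the instances in the other domains available to serve traffic.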
Identify Compute Redundancy Strategies
On-premises solutions have often relied on redundancy to help them with availability and scalability. From
an availability standpoint, redundant data centers provided the ability to increase the likelihood of business
continuity in the face of infrastructure failures in a given data center or part of a data center.
For applications with geo-distributed consumers, traffic management and redundant implementations
routed users to local resources, often with reduced latency.
Note
Data resiliency, which includes redundancy, is covered
as a separate topic in the section titled Establishing a
Data Resiliency Approach.
Redundancy and the Cloud
On-premises, redundancy has historically been achieved through duplicate sets of hardware, software, and
networking. Sometimes this is implemented in a cluster in a single location or distributed across multiple
data centers.
When devising a strategy for the cloud, it is important to rationalize the need for redundancy across three
vectors. These vectors include deployed code within a cloud provider's environment, redundancy of
providers themselves, and redundancy between the cloud and on premises.
Deployment Redundancy
When an organization has selected a cloud provider, it is important to establish a redundancy strategy for
the deployment within the provider.
If deployed to Platform as a Service (PaaS), much of this may be handled by the underlying platform. In an
Infrastructure as a Service (IaaS) model, much of this is not.
Deploy n Roles within a Data Center

The simplest form of redundancy is deploying your solution to multiple compute nodes within a single cloud
provider. By deploying to multiple nodes, the solution can limit downtime that would occur when only a
single node is deployed.

In many Platform as a Service environments, the state of the virtual machine hosting the code is monitored
and virtual machines detected to be unhealthy can be automatically replaced with a healthy node.
Deploy Across Multiple Data Centers

While deploying multiple nodes in a single data center will provide benefits, architectures must consider that
an entire data center could potentially be unavailable. While not a common occurrence, events such as
natural disasters, war, etc. could result in a service disruption in a particular geo-location.

To achieve your SLA, it may be appropriate for you to deploy your solution to multiple data centers for your
selected cloud provider. There are several approaches to achieving this, as identified below.
1. Fully Redundant Deployments in Multiple Data Centers

The first option is a fully redundant solution in multiple data centers done in conjunction with a traffic
management provider. A key consideration for this approach will be impact to the compute-related costs for
this type of redundancy, which will increase 100% for each additional data center deployment.
2. Partial Deployment in Secondary Data Center(s) for Failover

Another approach is to deploy a partial deployment to a secondary data center of reduced size. For example,
if the standard configuration utilized 12 compute nodes, the secondary data center would contain a
deployment containing 6 compute nodes.

This approach, done in conjunction with traffic management, would allow for business continuity with
degraded service after an incident that solely impacted the primary center.

Given the limited number of times a data center goes offline entirely, this is often seen as a cost-effective
approach for compute, particularly if a platform allows the organization to readily onboard new instances in
the second data center.
3. Divided Deployments across Multiple Data Centers with Backup Nodes

For certain workloads, particularly those in the financial services vertical, there is a significant amount of data
that must be processed within a short, immovable time window. In these circumstances, work is done in
shorter bursts and the costs of redundancy are warranted to deliver results within that window.

In these cases, code is deployed to multiple data centers. Work is divided and distributed across the nodes
for processing. In the instance that a data center becomes unavailable, the work intended for that node is
delivered to the backup node which will complete the task.
4. Multiple Data Center Deployments with Geography Appropriate Sizing per Data Center

This approach utilizes redundant deployments that exist in multiple data centers but are sized appropriately
for the scale of a geo-relevant audience.
Provider Redundancy
While data-center-centric redundancy is good, Service Level Agreements are defined at the service level rather than the data
center level. There is the possibility that the services delivered by a provider could become unavailable across
multiple or all data centers.
Based on the SLAs for a solution, it may be desirable to also incorporate provider redundancy. To realize this,
cloud-deployable products or cloud services that will work across multiple cloud platforms must be
identified. Microsoft SQL Server, for example, can be deployed in a Virtual Machine inside of Infrastructure as
a Service offerings from most vendors.
For cloud provided services, this is more challenging as there are no standard interfaces in place, even for
core services such as compute, storage, queues, etc. If provider redundancy is desired for these services, it is
often achievable only through an abstraction layer. An abstraction layer may provide enough functionality
for a solution, but it will not be innovated as fast as the underlying services and may inhibit an organization
from being able to readily adopt new features delivered by a provider.
If redundant provider services are warranted, this can be at one of several levels: an entire application, a
workload, or an aspect of a workload. At the appropriate level, evaluate the need for compute, data, and
platform services and determine what must truly be redundant and what can be handled via approaches to
provide graceful degradation.
On-Premises Redundancy
While taking a dependency on a cloud provider may make fiscal sense, there may be certain business
considerations that require on-premises redundancy for compliance and/or business continuity.
Based on the SLAs for a solution, it may be desirable to also incorporate on-premises redundancy. To realize
this, private cloud-deployable products or cloud services that will work across multiple cloud types must be
identified. As with the case of provider redundancy, Microsoft SQL Server is a good example of a product
that can be deployed on-premises or in an IaaS offering.
For cloud provided services, this is more challenging as there are often no on-premises equivalents with
interface and capability symmetry.
If redundant provider services are required on premises, this can be at one of several levels--an entire
application, a workload, or an aspect of a workload. At the appropriate level, evaluate the need for compute,
data, and platform services and determine what must truly be redundant and what can be handled via
approaches to provide graceful degradation.
Redundancy Configuration Approaches
When identifying your redundancy configuration approaches, classifications that existed pre-cloud also
apply. Depending on the types of services utilized in your solution, some of this may be handled by the
underlying platform automatically. In other cases, this capability is handled through technologies like
Windows Fabric.
1. Active/active - Traffic intended for a failed node is either passed onto an existing node or load balanced
across the remaining nodes. This is usually only possible when the nodes utilize a homogeneous software
configuration.
2. Active/passive - Provides a fully redundant instance of each node, which is only brought online when its
associated primary node fails. This configuration typically requires the most extra hardware.
3. N+1 - Provides a single extra node that is brought online to take over the role of the node that has failed.
In the case of heterogeneous software configuration on each primary node, the extra node must be
universally capable of assuming any of the roles of the primary nodes it is responsible for. This normally
refers to clusters which have multiple services running simultaneously; in the single service case, this
degenerates to active/passive.
4. N+M - In cases where a single cluster is managing many services, having only one dedicated failover node
may not offer sufficient redundancy. In such cases, more than one (M) standby servers are included and
available. The number of standby servers is a tradeoff between cost and reliability requirements.
5. N-to-1 - Allows the failover standby node to become the active one temporarily, until the original node can
be restored or brought back online, at which point the services or instances must be failed-back to it in order
to restore high availability.
6. N-to-N - A combination of active/active and N+M, N to N redistributes the services, instances or
connections from the failed node among the remaining active nodes, thus eliminating (as with active/active)
the need for a 'standby' node, but introducing a need for extra capacity on all active nodes.
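As a rough illustration of the N-to-N approach, the sketch below redistributes the services of a failed node across the surviving nodes, always choosing the least-loaded survivor. The function name, the node names, and the "fewest services first" placement rule are assumptions made for this example; a real cluster manager would also weigh the spare capacity of each node.

```python
def redistribute(assignments, failed_node):
    """Return a new node -> services mapping with the failed node's
    services spread across the surviving nodes (N-to-N failover)."""
    # Copy the survivors so the caller's mapping is left untouched.
    survivors = {
        node: list(services)
        for node, services in assignments.items()
        if node != failed_node
    }
    if not survivors:
        raise RuntimeError("no surviving nodes to fail over to")

    for service in assignments.get(failed_node, []):
        # Hypothetical placement rule: pick the node currently running the
        # fewest services; ties are broken by node name for determinism.
        target = min(survivors, key=lambda node: (len(survivors[node]), node))
        survivors[target].append(service)
    return survivors
```

Because no standby node exists in this configuration, every surviving node must have been provisioned with enough headroom to absorb the extra services.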
Traffic Management
Whether traffic is always geo-distributed or routed to different data centers to satisfy business continuity
scenarios, traffic management functionality is important to ensure that requests to your solution are being
routed to the appropriate instance(s).
It is important to note that taking a dependence on a traffic management service introduces a single point of
failure. It is important to investigate the SLA of your application's primary traffic management service and
determine if alternate traffic management functionality is warranted by your requirements.
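One simple routing policy a traffic management layer might apply is priority-based failover: send requests to the most preferred endpoint that is currently healthy. The sketch below is only an illustration of that idea; the `choose_endpoint` function, the endpoint dictionaries, and the `is_healthy` callback are hypothetical and do not correspond to any particular traffic management product.

```python
def choose_endpoint(endpoints, is_healthy):
    """Return the name of the healthy endpoint with the lowest priority
    number, or None when no endpoint is healthy."""
    for endpoint in sorted(endpoints, key=lambda e: e["priority"]):
        # is_healthy is an assumed callback, e.g. the result of a probe.
        if is_healthy(endpoint["name"]):
            return endpoint["name"]
    return None
```

A `None` result is the case the surrounding text warns about: if the routing layer itself, or every endpoint behind it, is unavailable, the application needs an alternate path.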
Establish a Data Partitioning Strategy
While many high scale cloud applications have done a fine job of partitioning their web tier, they are less
successful in scaling their data tier in the cloud. With an ever growing diversity of connected devices, the
level of data generated and queried is growing at levels not seen before in history. The need to be able to
support 500,000 new users per day, for example, is now considered reasonable.
Having a partitioning strategy is critically important across multiple dimensions, including storing, querying,
or maintaining that data.
Decomposition and Partitioning
Because of the benefits and tradeoffs of different technologies, it is common to leverage technologies that
are most optimal for the given workload.
Having a solution that is decomposed by workloads provides you with the ability to choose data
technologies that are optimal for a given workload. For example, a website may utilize table storage for
content for an individual, utilizing partitions at the user level for a responsive experience. Those table rows
may be aggregated periodically into a relational database for reporting and analytics.
Partitioning strategies may, and often will, vary based on the technologies chosen.
Understanding the 3 Vs
To properly devise a partitioning strategy, an organization must first understand its data.
The 3 Vs, made popular by Gartner, look at three different aspects of data. Understanding how the 3 Vs
relate to your data will assist you in making an informed decision on partitioning strategies.
Volume

Volume refers to the size of the data. Volume has very real impacts on the partitioning strategy. Volume
limitations on a particular data technology may force partitioning due to size limitations, query speeds at
volume, etc.
Velocity

Velocity refers to the rate at which your data is growing. You will likely devise a different partitioning strategy
for a slow growing data store vs. one that needs to accommodate 500,000 new users per day.
Variety

Variety refers to the different types of data that are relevant to the workload. Whether it's relational data,
key-value pairs, social media profiles, images, audio files, videos, or other types of data, it's important to
understand it. This is both to choose the right data technology and make informed decisions for your
partitioning strategy.
Horizontal Partitioning
Likely the most popular approach to partitioning data is to partition it horizontally. When partitioning
horizontally, a decision is made on criteria to partition a data store into multiple shards. Each shard contains
the entire schema, with the criteria driving the placement of data into the appropriate shards.
Based on the type of data and the data usage, this can be done in different ways. For example, an
organization could choose to partition their data based on a customer last name. In another case, the
partition could be date centric, partitioning on the relevant calendar interval of hour, day, week, or month.

Figure 4. An example of horizontal partitioning by last name
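The two partitioning criteria mentioned above, customer last name and calendar interval, can be sketched as small routing functions. The shard names, the three alphabetical ranges, and the monthly interval are assumptions chosen for illustration only.

```python
from datetime import date

# Hypothetical alphabetical ranges; every shard holds the full schema.
LAST_NAME_RANGES = [
    ("A", "H", "shard-0"),
    ("I", "P", "shard-1"),
    ("Q", "Z", "shard-2"),
]


def shard_for_last_name(last_name):
    """Route a customer record to a shard by the initial of the last name."""
    initial = last_name.strip()[:1].upper()
    for low, high, shard in LAST_NAME_RANGES:
        if low <= initial <= high:
            return shard
    # Empty names and initials outside A-Z land in a catch-all shard.
    return "shard-other"


def shard_for_date(day):
    """Route a record to a date-centric shard, here one shard per month."""
    return f"events-{day.year:04d}-{day.month:02d}"
```

Range-based criteria like these are easy to reason about but can produce uneven shards (some initials are far more common than others), which is one reason the criteria should be chosen from the actual data and its usage.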
Vertical Partitioning
Another approach is vertical partitioning. This optimizes the placement of data in different stores, often tied
to the variety of the data. Figure 5 shows an example where metadata about a customer is placed in one
store while thumbnails and photos are placed in separate stores.
Vertical partitioning can result in optimized storage and delivery of data. In Figure 5, for example, if the
photo is rarely displayed for a customer, returning 3 megabytes per record can add unnecessary costs in a
pay as you go model.

Figure 5. An example of vertical partitioning.
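A minimal sketch of the arrangement in Figure 5 follows, with in-memory dictionaries standing in for the three separate stores. The class and method names are hypothetical, and the `bytes_read` counter exists only to make the cost difference visible.

```python
class PartitionedCustomerStore:
    """Keeps customer metadata, thumbnails, and photos in separate stores."""

    def __init__(self):
        self.metadata = {}    # stand-in for a table or relational store
        self.thumbnails = {}  # stand-in for a small-object store
        self.photos = {}      # stand-in for a large blob store
        self.bytes_read = 0   # binary payload bytes returned to callers

    def save(self, customer_id, metadata, thumbnail, photo):
        self.metadata[customer_id] = dict(metadata)
        self.thumbnails[customer_id] = bytes(thumbnail)
        self.photos[customer_id] = bytes(photo)

    def get_profile(self, customer_id):
        """Common case: metadata plus thumbnail, without touching the photo."""
        thumbnail = self.thumbnails[customer_id]
        self.bytes_read += len(thumbnail)
        return dict(self.metadata[customer_id]), thumbnail

    def get_photo(self, customer_id):
        """Rare case: fetch the large photo only when it is displayed."""
        photo = self.photos[customer_id]
        self.bytes_read += len(photo)
        return photo
```

Since the full photo is read only through `get_photo`, the frequent profile lookups never pay for the large payload.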
Hybrid Partitioning
In many cases it will be appropriate to establish a hybrid partitioning strategy. This approach provides the
efficiencies of both approaches in a single solution.
Figure 6 shows an example of this, where the vertical partitioning seen earlier is now augmented to take
advantage of horizontal partitioning of the customer metadata.

Figure 6. An example of hybrid partitioning.
Cloud computing == network computing
At the heart of cloud computing is the network. The network is crucial as it provides the fabric or backbone
for devices to connect to services as well as services connecting to other services. There are three network
boundaries to consider in any FailSafe application.
Those network boundaries are detailed below with Windows Azure used as an example to provide context:
1. Role boundaries are traditionally referred to as tiers. Common examples are a web tier or a business logic
tier. If we look at Windows Azure as an example, it formally introduced roles as part of its core design to
provide infrastructure support for the multi-tier nature of modern, distributed applications. Windows Azure
guarantees that role instances belonging to the same service are hosted within the scope of a single network
environment and managed by a single fabric controller.
2. Service boundaries represent dependencies on functionality provided by other services. Common examples
are a SQL environment for relational database access and a Service Bus for pub/sub messaging support.
Within Windows Azure, for example, service boundaries are enforced through the network: no guarantee will
be given that a service dependency will be part of the same network or fabric controller environment. That
might happen, but the design assumption for any responsible application has to be that any service
dependency is on a different network managed by a different fabric controller.


3. Endpoint boundaries are external to the cloud. They include any consuming endpoint, generally assumed to
be a device, connecting to the cloud in order to consume services. You must make special considerations in
this part of the design due to the variable and unreliable nature of the network. Role boundaries and service
boundaries are within the boundaries of the cloud environment and one can assume a certain level of
reliability and bandwidth. For the external dependencies, no such assumptions can be made and extra care
has to be given to the ability of the device to consume services, meaning data and interactions.

The network by its very nature introduces latency as it passes information from one point of the network to
another. In order to provide a great experience for both users and dependent services or roles, the
application architecture and design should look for ways to reduce latency as much as sensible and manage
unavoidable latency explicitly. One of the most common ways to reduce latency is to avoid service calls that
involve the network--local access to data and services is a key approach to reduce latency and introduce
higher responsiveness. Using local data and services also provides another layer of failure security; as long as
the requests of the user or application can be served from the local environment, there is no need to interact
with other roles or services, removing the possibility of dependent component unavailability as a failure
point.
Introducing caching
Caching is a technique that can be used to improve data access speeds when it's not possible to store data
locally. Caching is utilized to great effect in most cloud services operating at scale today. As the definition
provided by Wikipedia outlines, a cache provides local access to data that is repeatedly asked for by
applications. Caching relies on two things:
o Usage patterns for the data by users and dependent applications are predominantly read-only. In certain
scenarios such as ecommerce websites, the percentage of read-only access (sometimes referred to as
browsing) is up to 95% of all user interactions with the site.
o The application's information model provides an additional layer of semantic information that supports the
identification of stable, singular data that is optimal for caching.
Device caching
While not the focus for the FailSafe initiative, device caching is one of the most effective ways to increase the
usability and robustness of any devices + services application. Numerous ways exist to provide caching
services on the device or client tier, ranging from the HTML5 specification providing native caching
capabilities implemented in all the standard browsers to local database instances such as SQL Server
Compact Edition or similar.
Distributed caching
Distributed caching is a powerful set of capabilities, but its purpose is not to replace a relational database or
other persistent store; rather, its purpose is to increase the responsiveness of distributed applications that by
nature are network centric and thus latency sensitive. A side benefit of introducing caching is the reduction
of traffic to the persistent data store, which drives the interactions with your data service to a minimum.
Information models optimized for caching
Cached data by its very nature is stale data, i.e. data that is not necessarily up-to-date anymore. A great
example of cached data although from a very different domain is a product catalog that is being sent to
thousands of households. The data used to produce the product catalog was up-to-date when the catalog
was created. Once the printing presses were going, the data, by the very nature of time passing during the
catalog production process, went stale. Due to cached data being stale, the attributes of data with respect to
stability and singularity are critical to the caching design:
o Stability - Data that has an unambiguous interpretation across space and time. This often means data values
that do not change. For example, most enterprises never recycle customer identifications or SKU numbers.
Another technique to create stable data is the addition of expiration dates to the existing data. The printed
product catalog example above is a great example. Generally retailers accept orders from any given catalog
for 2 periods of publication. If one publishes a catalog four times a year, the product catalog data is stable
for 6 months and can be utilized for information processing such as placing and fulfilling orders.

Stable data is often referenced as master or reference data. In the FailSafe initiative we will utilize the term
reference data as it is a semantically more inclusive term than master data. In a lot of enterprises, master
data has a very specific meaning and is narrower than reference data.
o Singularity - Data that can be isolated through association with uniquely identifiable instances with no or
low concurrent updates. Take the example of a shopping basket. While the shopping basket will clearly be
updated, the updates occur relatively infrequently and can be completely isolated from the storage as well as
the processing perspective.

Isolatable data as described above is referenced as activity data or session data.

With these two attributes in mind, the following schema emerges:


Managing the cache

Caching the right information at the right time is a key part of a successful caching strategy. Numerous
techniques exist for loading the cache: a good overview is described here. In addition, the sections below
outline a few considerations for FailSafe application design that is dependent upon distributed caching.
o Reference data - If the hosting environment (fabric controller or datacenter) encounters a disaster, your
application will be moved to another environment. In the case where another instance of your application is
already active (active-active design), the likelihood that your cache already contains a lot of the relevant
information (especially reference data) is high. In the case that a new instance of your application gets spun
up, no information will be in the cache nodes. You should design your application so that on a cache miss, it
automatically loads the desired data. In the case of a new instance, you can have a startup routine that bulk
loads reference data into the cache. A combination of the two is desirable as users might be active as soon
as the application is being served by the cloud infrastructure.
o Activity data - The basic techniques described for reference data hold true for activity data as well. However,
there is a specific twist to activity data. Reference data is assumed to be available in any persistent store of
the application. As it will change on a less frequent basis, synchronization ought not to be a problem,
although it needs to be considered. However, activity data, albeit being updated in isolation and with low
frequency, will be more volatile than reference data. Ideally the distributed cache persists activity data on a
frequent basis and replicates the data between the various instances of the application. Take care to ensure
that the persistence and synchronization intervals are spaced far enough to avoid contention but close
enough to keep possible data loss at a minimum.
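The two loading techniques described for reference data, loading on a cache miss and bulk loading at startup, can be combined as in the sketch below. The class, its parameters, and the fixed expiration interval are assumptions for illustration; `load_one` and `load_all` stand in for whatever calls read from the persistent store, and a local dictionary stands in for the distributed cache.

```python
import time


class ReferenceDataCache:
    """Cache with load-on-miss and an optional bulk warm-up at startup."""

    def __init__(self, load_one, load_all, ttl_seconds=300.0, clock=time.monotonic):
        self._load_one = load_one    # assumed: key -> value from the store
        self._load_all = load_all    # assumed: () -> dict of all reference data
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # key -> (value, expires_at)

    def warm_up(self):
        """Startup routine for a new instance: bulk load reference data."""
        expires_at = self._clock() + self._ttl
        for key, value in self._load_all().items():
            self._entries[key] = (value, expires_at)

    def get(self, key):
        """Serve from the cache; on a miss or expiry, reload from the store."""
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]
        value = self._load_one(key)
        self._entries[key] = (value, self._clock() + self._ttl)
        return value
```

Calling `warm_up()` when a new instance starts keeps early users from paying the miss penalty, while `get()` still covers any key requested before or after the bulk load. The expiration interval reflects the idea that reference data stays stable only for a known period.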
Establishing a Data Resiliency Approach
A common misunderstanding is the relationship, specifically the areas of responsibility, between platform
and application. One of the areas where this is most troublesome is in respect to data.
While a platform such as Windows Azure will deliver on promises of storing multiple copies of the data (and
in some services even going so far as to provide geo-redundancy), the data that is stored is driven by the
application, workload, and its component services. If the application takes an action that corrupts its
application data, the platform stores multiple copies of it.
When establishing your failure modes and failure points, it's important to identify areas of the application
that could potentially cause data corruption. While the point of origin could vary from bad code or poison
messages to your service, it's important to identify the related failure modes and failure points.
Application Level Remediation
Idempotency

A core assumption with connected services is that they will not be 100% available and that transient fault
handling with retry logic is a core implementation approach. In cases where retry logic is implemented, there
is the potential for the same message to be sent more than once, for messages to be sent out of sequence,
etc.

Operations should be designed to be idempotent, ensuring that sending the same message multiple times
does not result in an unexpected or polluted data store.

For example, inserting data from all requests may result in multiple records being added if the service
operation is called multiple times. An alternate approach would be to implement the code as an intelligent
upsert which performs an update if a record exists or an insert if it does not. A timestamp or global
identifier could be used to identify new vs. previously processed messages, inserting only newer ones into
the database and updating existing records if the message is newer than what was received in the past.
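The "intelligent upsert" described above might look like the following sketch, in which a dictionary stands in for the database. The message fields (`id`, `timestamp`, `payload`) and the function name are assumptions made for this example.

```python
def apply_message(store, message):
    """Idempotently apply a message; return True if the store changed."""
    key = message["id"]
    existing = store.get(key)
    # A duplicate delivery or an older, out-of-sequence message is ignored.
    if existing is not None and existing["timestamp"] >= message["timestamp"]:
        return False
    # Insert when the record is new, update when the message is newer.
    store[key] = {
        "timestamp": message["timestamp"],
        "payload": message["payload"],
    }
    return True
```

Applying the same message any number of times leaves exactly one record with the same contents, so retry logic upstream can resend without polluting the data store.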
Workload Activities and Compensating Behavior

In addition to idempotency, another area for consideration is the concept of compensating behavior.

A real world example of compensating behavior is seen when returning a product which was paid for with
a credit card. In this scenario, a consumer visits a retailer, provides a credit card and a charge is applied to
the consumer's credit card account. If the consumer returns the product to the retailer, a policy is evaluated
and if the return conforms to the policy, the retailer issues a credit for the amount of the purchase to the
consumer's credit card account.

In a world of an ever-growing set of connected systems and the emergence of composite services,
understanding how to handle compensating behavior is important.

For many developers of line-of-business applications, the concepts of transactions are not new, but the
frame of reference is often tied to the transactional functionality exposed by local data technologies and
related code libraries. When looking at the concept in terms of the cloud, this mindset needs to take into
account new considerations related to orchestration of distributed services.

A service orchestration can span multiple distributed systems and be long running and stateful. The
orchestration itself is rarely synchronous and it can span from seconds to years based on the business
scenario.

In a supply chain scenario, that could tie together 25 organizations in the same workload activity. For
example, there may be a set of 25 or more systems that are interconnected in one or more service
orchestrations.

If success occurs, the 25 systems must be made aware that the activity was successful. For each connection
point in the activity, participant systems can provide a correlation ID for messages they receive from other
systems. Depending on the type of activity, the receipt of that correlation ID may satisfy the party that the
transaction is notionally complete. In other cases, upon the completion of the interactions of all 25 parties, a
confirmation message may be sent to all parties (either directly from a single service or via the specific
orchestration interaction points for each system).

To handle failures in composite and/or distributed activities, each service would expose a service interface
and operation(s) to receive requests to cancel a given transaction by a unique identifier. Behind the service
façade, workflows would be in place to compensate for the cancellation of this activity. Ideally these would
be automated procedures, but they can be as simple as routing to a person in the organization to remediate
manually.
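The cancel-by-identifier idea can be sketched as follows. Each participant exposes an operation to cancel a transaction by its unique identifier, and the orchestration invokes it on every participant that had already completed its step when a later step fails. The `Participant` class and `run_activity` function are hypothetical stand-ins; real participants would be remote services, and the compensation workflow might involve a person.

```python
class Participant:
    """Stand-in for a remote service that takes part in an activity."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.completed = set()
        self.cancelled = set()

    def execute(self, transaction_id):
        if self.fail:
            raise RuntimeError(f"{self.name} could not process {transaction_id}")
        self.completed.add(transaction_id)

    def cancel(self, transaction_id):
        """Compensate for earlier work; safe to call more than once."""
        if transaction_id in self.completed:
            self.completed.discard(transaction_id)
            self.cancelled.add(transaction_id)


def run_activity(transaction_id, participants):
    """Run each step in order; on failure, compensate completed steps."""
    done = []
    for participant in participants:
        try:
            participant.execute(transaction_id)
        except Exception:
            # Undo in reverse order so later effects are reversed first.
            for finished in reversed(done):
                finished.cancel(transaction_id)
            return False
        done.append(participant)
    return True
```

Unlike a local database rollback, compensation here is a new action (comparable to the credit issued on a product return) rather than an erasure of the original one.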
Backups
In addition to application-level remediation to avoid data corruption, there is also remediation that is put in
place to provide options if application remediation is not successful.
Processes for both creating and restoring backup copies of your data store, either in whole or in part,
should be part of your resiliency plan. While the concepts of backing up and restoring data are not new,
there are new twists to this in the cloud.
Your backup strategy should be defined with a conscious understanding of the business requirements for
restoring data. If a data store is corrupted or taken offline due to a disaster scenario, you need to know what
type of data must be restored, what volume must be restored, and what pace is required for the business.
This will impact your overall availability plan and should drive your backup and restore planning.
Relational Databases

Backing up of relational databases is nothing new. Many organizations have tools, approaches, and processes
in place for the backing up of data to either satisfy disaster recovery or compliance needs. In many cases
traditional backup tools, approaches, and processes may work with little or no modification. In addition,
there are new or variant alternatives, such as backing up data and storing a copy in cloud-based blob
storage, that can be considered.

When evaluating existing processes and tools, it's important to evaluate which approach is appropriate for
the cloud based solution. In many cases, one or more of the approaches listed below will be applied to
remediate different failure modes.
1. Total Backup - This is a backup of a data store in its entirety. Total backups should occur based on a
schedule dictated by your data volume and velocity. A total backup is the complete data set needed to
deliver on the service level agreement for your service. Mechanisms for this type of backup are generally
available either by the database / database service provider or its vendor ecosystem.
2. Point in Time - A point in time backup is a backup that reflects a given point in time in the database's
existence. If an error were to occur in the afternoon that corrupted the data store, for example, a point in
time backup done at noon could be restored to minimize business impact.

Given the ever-growing level of connectivity of individuals, the expectation to engage with your service at
any time of day makes the ability to quickly restore to a recent point in time a necessity.
3. Synchronization - In addition to traditional backups, another option is synchronization of data. Data could
be stored in multiple data centers, for example, with a periodic synchronization of data from one datacenter
to another. In addition to providing synchronized data in solutions that utilize traffic management as part of
a normal availability plan, this can also be used to fail over to a second data center if there is a business
continuity issue.

Given the constant connectivity of individuals consuming services, downtime becomes less and less
acceptable for a number of scenarios and synchronization can be a desirable approach.

Patterns for synchronization can include:

- data center to data center of a given cloud provider

- data center to data center across cloud providers

- data center to data center from on premise to a given cloud provider

- data center to device synchronization for consumer specific data slices
Sharded Relational Databases

For many, the move to the cloud is driven by a need to facilitate large numbers of users and high traffic
scenarios such as those related to mobile or social applications. In these scenarios, the application pattern
often involves moving away from a single database model to a number of database shards that contain a
portion of the overall data set and are optimized for large scale engagement. One recent social networking
project built on Windows Azure launched with a total of 400 database shards.

Each shard is a standalone database and your architecture and management should facilitate total backups,
point in time backups, and restoration of backups for both individual shards and a complete data set
including all shards.
NoSQL Data Stores

In addition to relational data stores, backup policies should be considered for "Not only SQL" or NoSQL data
stores as well. The most popular form of NoSQL databases provided by major cloud providers would be a
form of high availability key-value pair store, often referred to as a table store.

NoSQL stores may be highly available. In some cases they will also be geo-redundant, which can help
prevent loss in the case of a catastrophic failure in a specific data center. These stores typically do not
provide protections from applications overwriting or deleting content unintentionally. Application or user
errors are not handled automatically by platform services such as blob storage and a backup strategy should
be evaluated.

While relational databases typically have existing and well-established tools for performing backups, many
NoSQL stores do not. A popular architectural approach is to create a duplicate copy of the data in a replica
NoSQL store and use a lookup table of some kind to identify which rows from the source store have been
placed in the replica store. To restore data, this same table would be utilized, reading from the table to
identify content in the replica store available to be restored.
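A minimal sketch of the replica-plus-lookup-table approach follows, with dictionaries standing in for the source store, the replica store, and the lookup table. The per-row `version` field used to detect changes is an assumption of this example; an ETag or last-modified timestamp could serve the same purpose.

```python
def backup_changed_rows(source, replica, lookup):
    """Copy new or changed rows to the replica and record them in the
    lookup table. Returns the keys that were copied."""
    copied = []
    for key, row in source.items():
        version = row["version"]  # assumed change marker on every row
        if lookup.get(key) != version:
            replica[key] = dict(row)
            lookup[key] = version
            copied.append(key)
    return copied


def restore(replica, lookup, target, keys=None):
    """Restore rows listed in the lookup table (all of them by default)."""
    wanted = list(lookup) if keys is None else keys
    for key in wanted:
        if key in lookup:
            target[key] = dict(replica[key])
```

Note that a row deleted from the source by mistake remains in the replica and in the lookup table, which is exactly what makes it recoverable.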

Depending on the business continuity concerns, the placement of this replica could be hosted with the same
cloud provider, in the same data center, and/or the same NoSQL data store. It could also reside in a
different data center, a different cloud provider, and/or a different variant of NoSQL data store. The driver for
placement will be largely influenced by the desired SLA of your workload service and any related regulatory
compliance considerations.

A factor to consider when making this determination is cost, specifically as it relates to data ingress and
egress. Cloud providers may provide free movement of data within their data center(s) and allow free
passage of data into their environment. Data egress, however, is typically charged, and the cost of moving
data to a secondary cloud platform provider could introduce significant costs at scale.
Blob Storage

As with relational and NoSQL data stores, a common misconception is that the availability features implemented
for a blob storage offering will remove the need to consider implementing a backup policy.

Blob storage also may be geo-redundant, but, as discussed earlier, this does not guard against application
errors. Application or user errors are not handled automatically by platform services such as blob storage
and a backup strategy should be evaluated.

Backup strategies could be very similar to those used for NoSQL stores. Due to the potentially large size of
blobs, cost and time to move data will be important parts of a backup and restore strategy.
Restoring Backups

By now, most people have heard the cautionary tale of the organization that established and diligently
followed backup policies but never tested restoring the data. On that fateful day when a disaster did occur,
they went to restore their database backup only to discover they had configured their backup incorrectly and
the tapes they'd been sending offsite for years didn't have the information they needed on them.

Whatever backup processes are put into place, it's important to establish testing to verify that data can be
restored correctly and to ensure that restoration occurs in a timely fashion and with minimal business
impact.
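One way to make restore testing routine is to record a fingerprint of the data when the backup is taken and to check it after every test restore. The sketch below assumes the data set is a dictionary with string keys that can be serialized as JSON; the function names and the backup layout are hypothetical.

```python
import hashlib
import json


def fingerprint(rows):
    """Stable SHA-256 digest of a JSON-serializable data set."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_backup(rows):
    """Serialize the data and record what a correct restore must match."""
    return {
        "data": json.dumps(rows, sort_keys=True),
        "sha256": fingerprint(rows),
        "row_count": len(rows),
    }


def verify_restore(backup):
    """Restore the backup and check it against the recorded manifest.
    Returns (ok, restored_rows)."""
    restored = json.loads(backup["data"])
    ok = (
        len(restored) == backup["row_count"]
        and fingerprint(restored) == backup["sha256"]
    )
    return ok, restored
```

Running a check like this on a schedule, and timing it, gives early warning of a misconfigured backup as well as a measured restore duration to compare with the business requirement.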
Content Delivery Networks
Content Delivery Networks (CDNs) are a popular way to provide availability and enhanced user experience
for frequently requested files. Content in a CDN is copied to a local node on its first use, and then served up
from that local node for subsequent requests. The content will expire after a designated time period, after
which content must be re-copied to the local node upon the next request.
Utilizing a CDN provides a number of benefits but it also adds a dependency. As is the case with any
dependency, remediation of a service failure should be proactively reviewed.
Appropriate Use of a CDN
A common misconception is that CDNs are a cure-all for scale. In one scenario, a customer was confident it
was the right solution for an online ebook store. It was not. Why? In a catalog of a million books, there is a
small subset of books that would be frequently requested (the "hits") and a very long tail of books that
would be requested with very little predictability. Frequently requested titles would be copied to the local
node on the first request and provide cost effective local scale and a pleasant user experience. For the long
tail, almost every request is copied twice, once to the local node and then to the customer, as the infrequent
requests result in content regularly expiring. This is evidence that a CDN used improperly will have the
opposite of the intended effect: a slower, more costly solution.
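The difference between the hits and the long tail can be made concrete with a small simulation of a single CDN node. The function and the request traces are assumptions for illustration; the model only captures the behavior described above, in which content is copied to the node on first use and expires a fixed period later.

```python
def simulate_cdn(requests, ttl):
    """Replay (time, title) requests against one CDN node.

    Returns (origin_fetches, edge_hits). Every origin fetch means the
    content is copied twice: origin to node, then node to customer.
    """
    expires = {}  # title -> time at which the node's copy expires
    origin_fetches = 0
    edge_hits = 0
    for when, title in requests:
        if title in expires and when < expires[title]:
            edge_hits += 1
        else:
            origin_fetches += 1
            expires[title] = when + ttl
    return origin_fetches, edge_hits
```

A title requested every time unit with an expiration period of 5 is served mostly from the node, whereas a trace of distinct, rarely requested titles produces an origin fetch on every request and no edge hits at all.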
Design for Operations
In many cases, operations of a solution may not be planned until further along in the lifecycle. To build truly
resilient applications, they should be designed for operations. Designing for operations typically will include
key activities such as establishing a health model, capturing telemetry information, incorporating health
monitoring services and workflows, and making this data actionable by both operations and developers.

Lecture 4: WINDOWS AZURE COMPUTE
Web Sites vs Cloud Services vs Virtual Machines

With its recent expansion, Windows Azure (Microsoft's public cloud platform) now
supports 3 modes of cloud computing. In this article, we'll explain the rules of the
road using a highway metaphor. Which lane of this 3-way highway should you drive
in, and what kind of vehicles are permitted in each lane?





Traditionally, there was one way to use Windows Azure: Platform-as-a-Service (now
called Cloud Services). Recently, the platform was greatly expanded to also provide
Infrastructure-as-a-Service capability (Virtual Machines) and a special mode for web
sites (Windows Azure Web Sites). Let's point out here and now that although one
area of the platform has "web sites" in its name, you can in fact host web sites at all
three levels. The table below will give you a quick idea of what kind of apps belong
where, after which we'll take a look at each of these ways to use Windows Azure and
compare them.


Windows Azure Web Sites: 2-Tier Simple Web Sites
- Web sites using open source frameworks
- Web sites using SQL DB or MySQL
- Web sites that run on IIS: ASP.NET, PHP, node.js

Cloud Services: Multi-Tier Modern Solutions
- Stateless VM farms
- Run on Windows Server
- Where automated management is required
- Require advanced functionality such as service bus, identity federation, CDN, traffic management

Virtual Machines: Legacy Solutions
- Stateful single VM or VM farms
- Run on Windows Server or Linux
- Legacy LOB applications
- Enterprise server products
- Where portability between cloud and on-premises is required



Windows Azure Web Sites: The Fast Lane
Windows Azure Web Sites provides an accelerated way to work in the cloud for
modern web sites. It has the automated management benefits normally associated
with Platform-as-a-Service and the portability of workload normally associated with
Infrastructure-as-a-Service. Unlike the other two modes for using Windows Azure,
which support a great diversity of solution types, WAWS is limited to simple 2-tier
web sites that can run on a standard IIS configuration. At the time of this writing,
WAWS is in preview and does not yet come with an SLA. A certain level of use is free
before charges apply.

Why do we call WAWS the fast lane? First of all, provisioning is lightning-quick: you
can provision a web site and accompanying SQL or MySQL database in well under a
minute, far less than the 10-15 minutes of provisioning time other segments of the
platform require. Second, you have high productivity from the get-go because your
web developers don't have to learn anything new or change the way they work. Your
web site can run as-is in the cloud. Deployment is a beautiful thing in WAWS: you
can use Web Deploy, FTP, Git or TFS. Moreover, you only have to deploy to a single
server regardless of how many VMs your site runs on: Windows Azure takes care of
distributing deployments out to all instances. Lastly, one other nice speedy aspect of
WAWS is that you can provision web sites ready to go with your favorite web
framework by choosing from a gallery that includes DotNetNuke, Drupal, Joomla,
Orchard, Umbraco, and WordPress.

The reason WAWS deployment is so fast is that it uses a pre-allocated pool of VMs
for web serving. By default, you are working in shared mode, which means you are
using this pool and sharing your VMs with other tenants. That might sound a little
dangerous, but virtual directory isolation keeps your work and that of other tenants
walled off from each other. At a higher cost, you can choose to switch over
to reserved mode, in which your VMs are dedicated to you alone. In either mode, you
can scale the number of instances using the Windows Azure management portal.

What kind of web technologies can you run in WAWS? Anything compatible with IIS,
including ASP.NET, classic ASP, PHP, and node.js. WAWS does come with some
limitations. Architecturally, you are limited to simple 2-tier web sites that run on IIS
with a standard configuration. In addition, you cannot change the IIS configuration,
remote desktop to the VM, or otherwise make changes that might affect other
tenants. If you need more than this, such as a third tier for web services, you should
look at Cloud Services or Virtual Machines instead.

From a VM persistence standpoint, disk file changes are persistently saved in
Windows Azure Storage. However, all VM instances for a site share common
storage, so you need to consider file overwrites.

The bottom line: Windows Azure Web Sites provide a fast lane for web sites in the
cloud, offering the best attributes of PaaS (automated management) and IaaS
(portability) without requiring web developers to change the way they work, but they
can only be used with simple 2-tier web sites that can run on IIS in a default
configuration.

See my post Reintroducing Windows Azure: Web Sites for a detailed walk-through of
this feature.




Cloud Services: Leave the Driving to Us
Cloud Services are the Platform-as-a-Service (PaaS) way to use Windows Azure.
PaaS gives you automated management and a large toolbox of valuable services to
compose solutions from.

PaaS can be controversial: some see it as the modern way to design applications that
leverage the differences and advantages of cloud computing environments; others
view PaaS with concerns about vendor/platform lock-in. Keep in mind, the expense
of preserving full cloud/on-prem portability is limiting yourself to the least common
denominator of both; that is, not taking advantage of the special functionality
available in the cloud.

The cornerstone Windows Azure Compute service which hosts applications uses a
declarative model to specify the shape and size of your solution. Although many
kinds of software can run in the cloud at this level, including complex multi-tier
solutions, this model must be adhered to. Your solution consists of one or
more roles, where each role is a VM farm. There are several kinds of roles,
including web roles for Internet-facing software and worker roles for everything else.
You can connect your roles to each other or to the outside world using load-
balanced endpoints or alternative methods such as queues. The VM instances that
make up a role are not persistent, meaning you cannot rely on VM state changes
(such as disk file updates) to stay around; for persistence you must use a storage or
database service. You'll need at least 2 VM instances per role in order for the 99.95%
(3.5 9s) Compute SLA to apply. Most of the other cloud services give you a 99.9% (3
9s) SLA.
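
To make the two SLA figures concrete, the short sketch below converts an availability percentage into allowed downtime. It assumes a 30-day month purely for illustration; the function name is hypothetical and an actual SLA defines its own measurement window.

```python
# Allowed downtime implied by an availability percentage.
# Assumption: a 30-day month; a real SLA defines its own measurement window.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes per 30-day month a service may be unavailable."""
    return MINUTES_PER_MONTH * (1 - availability_percent / 100)


for sla in (99.9, 99.95):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} minutes/month")
# 99.9%  -> 43.2 minutes/month
# 99.95% -> 21.6 minutes/month
```

Under that assumption, moving from 99.9% to 99.95% halves the downtime budget, from about 43 minutes to about 22 minutes per month.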

There are many other cloud services to choose from that provide storage, relational
data, identity, communication, caching, traffic management, and more. Many of
these services provide extremely valuable functionality for pennies, such as federating
identity or brokering messages, and they're managed for you. Most PaaS solutions
leverage a combination of cloud services.

The automated management in Cloud Services is very attractive. For the Compute
service, this includes automated patching and orchestrated software updates across
your VMs. Storage and database services store data with triple-redundancy and have
automatic fail-over. The traffic management service can check the health of your
applications and fail over from one data center to another when necessary. VM
instances and storage units are spread across fault domains in the data center to
maintain availability in the event of a hardware failure within that data center.

Cloud Services do require you to adhere to their model, and sometimes that means
designing for the cloud or adapting existing applications to fit the cloud. Sometimes
this is a small effort and sometimes it is not. Modern SOA applications based on tiers
of stateless VM farms are generally very straightforward to move to Cloud Services.
Solutions that have a single-server model, or depend on local VM state, usually
require moderate-to-significant changes. These changes can sacrifice portability of
your application, especially if you are highly dependent on cloud service functionality
not available elsewhere.

Cloud Services provide an SDK and a simulation environment that allow you to
develop and test locally before deploying to a cloud data center. To deploy, you
must package your solution and publish it to the cloud to a Staging slot or a
Production slot. You can promote a cloud deployment from Staging to Production in
a fast, one-click operation.

The bottom line: Cloud Services provide automated management, valuable
functionality, and architectural versatility, but apps may need to be adapted to fit their
model, and strong dependence on platform-specific cloud services can result in apps
that are no longer portable to other environments.

See my post Reintroducing Windows Azure: Cloud Services for highlights of what's
new in this area.




Virtual Machines: Self-Service
Virtual Machines are the Infrastructure-as-a-Service (IaaS) level of Windows Azure.
You stand up virtual machines, configure them yourself, and manage them yourself.
If you value portability of workload (the ability to move your IT assets between the
cloud and on-premises seamlessly) and are fine managing your systems yourself, this
lane might be for you. At the time of this writing, Virtual Machines are in preview;
however, they do come with an SLA and are not free despite the pre-release status.

Although it comes at the cost of self-management, Virtual Machines provide great
versatility. You provision Linux or Windows Server VMs and either compose the VM
images in the cloud or upload a VHD you've previously created using Hyper-V. You
can capture a VM and add it to your image gallery for easy reuse.

In Virtual Machines, individual VMs are persistent. It's fine to run just one instance of
a VM, you don't have to worry about losing VM state, and you get a 99.9% SLA. This
makes virtual machines the right choice for single-server solutions and server
products that use local disk files. You can't run a product like Active Directory or SQL
Server or SharePoint Server successfully in Cloud Services, but you can in Virtual
Machines. Virtual machines are also often the best fit for legacy applications.

You can group your virtual machines in a common availability set, which will spread
instances across fault domains in the data center for high availability in the event of a
hardware failure within that data center. You can provision load-balanced endpoints
to direct Internet traffic to your availability set.

The bottom line: Virtual Machines are for the do-it-yourself IT person (like a driving
enthusiast who also likes to work on their car and do their own tuning and
maintenance). It's also the only way to run certain kinds of applications, such as
single-server stateful solutions and some server products.

See my post Reintroducing Windows Azure: Virtual Machines for a detailed walk-through
of this feature.


WAWS, Cloud Services, & VMs Compared
The table below contrasts Windows Azure Web Sites, Cloud Services, and Virtual
Machines.

                    | WAWS                              | Cloud Services                                   | Virtual Machines
Level (PaaS/IaaS)   | PaaS with the portability of IaaS | PaaS                                             | IaaS
Portability         | Fully portable                    | Design or adapt for cloud                        | Fully portable
Management          | Automated                         | Automated                                        | Customer Responsibility
Architecture        | 2-tier IIS web sites only         | Versatile                                        | Versatile
Unit of Management  | Web Site                          | Cloud Service                                    | Virtual Machine
Persistence         | VMs share persistence             | VMs are not persistent                           | Each VM is persistent
Provisioning        | Under a minute                    | 10-15 minutes                                    | 10-15 minutes
Technology platform | Windows Server / IIS              | Windows Server                                   | Windows Server, Linux
Deployment          | Web Deploy, FTP, Git, TFS         | Package and Publish                              | Compose VMs or upload VHDs
Gallery             | Common web frameworks             | Microsoft Guest OS images                        | Microsoft-provided and user-saved VM images
SLA                 | None during preview               | 3 9s / 3.5 9s Compute (requires 2+ VMs per role) | 3 9s (single VM)

Let's contrast three characteristics that you might care about when using the cloud:
automated management, portability of workload between cloud and on-premises,
and architectural versatility. As you can see from the magic triangle below, each
mode will give you two of the three.



Choice is good, but it comes with the responsibility to choose well. I hope the above has
helped to characterize the 3-lane highway that is Windows Azure. Drive safely!

INTRODUCTION TO WINDOWS AZURE CLOUD SERVICES


Millions of developers around the world know how to create applications using the Windows Server
programming model. Yet applications written for Windows Azure, Microsoft's cloud platform, don't
exactly use this familiar model. While most of a Windows developer's skills still apply, Windows Azure
provides its own programming model.

Why? Why not just exactly replicate the familiar world of Windows Server in the cloud? Many vendors'
cloud platforms do just this, providing virtual machines (VMs) that act like on-premises VMs. This
approach, commonly called Infrastructure as a Service (IaaS), certainly has value, and it's the right
choice for some applications. Yet cloud platforms are a new world, offering the potential for solving
today's problems in new ways. Instead of IaaS, Windows Azure offers a higher-level abstraction that's
typically categorized as Platform as a Service (PaaS). While it's similar in many ways to the on-premises
Windows world, this abstraction has its own programming model meant to help developers build better
applications.

The Windows Azure programming model focuses on improving applications in three areas:

Administration: In PaaS technologies, the platform itself handles the lion's share of administrative
tasks. With Windows Azure, this means that the platform automatically takes care of things such as
applying Windows patches and installing new versions of system software. The goal is to reduce the
effort, and the cost, of administering the application environment.

Availability: Whether it's planned or not, today's applications usually have down time for Windows
patches, application upgrades, hardware failures, and other reasons. Yet given the redundancy that
cloud platforms make possible, there's no longer any reason to accept this. The Windows Azure
programming model is designed to let applications be continuously available, even in the face of
software upgrades and hardware failures.

Scalability: The kinds of applications that people want to write for the cloud are often meant to
handle lots of users. Yet the traditional Windows Server programming model wasn't explicitly
designed to support Internet-scale applications. The Windows Azure programming model, however,
was intended from the start to do this. Created for the cloud era, it's designed to let developers build
the scalable applications that massive cloud data centers can support. Just as important, it also allows
applications to scale down when necessary, letting them use just the resources they need.

Whether a developer uses an IaaS technology or a PaaS offering such as Windows Azure, building
applications on cloud platforms has some inherent benefits. Both approaches let you pay only for the
computing resources you use, for example, and both let you avoid waiting for your IT department to
deploy servers. Yet important as they are, these benefits aren't the topic here. Instead, the focus is
entirely on making clear what the Windows Azure programming model is and what it offers.

THE THREE RULES OF THE WINDOWS AZURE PROGRAMMING MODEL

To get the benefits it promises, the Windows Azure programming model imposes three rules on
applications:

A Windows Azure application is built from one or more roles.

A Windows Azure application runs multiple instances of each role.

A Windows Azure application behaves correctly when any role instance fails.

It's worth pointing out that Windows Azure can run applications that don't follow all of these rules; it
doesn't actually enforce them. Instead, the platform simply assumes that every application obeys all
three. Still, while you might choose to run an application on Windows Azure that violates one or more
of the rules, be aware that this application isn't actually using the Windows Azure programming model.
Unless you understand and follow the model's rules, the application might not run as you expect it to.

A WINDOWS AZURE APPLICATION IS BUILT FROM ONE OR MORE ROLES

Whether an application runs in the cloud or in your data center, it can almost certainly be divided into
logical parts. Windows Azure formalizes these divisions into roles. A role includes a specific set of code,
such as a .NET assembly, and it defines the environment in which that code runs. Windows Azure today
lets developers create three different kinds of roles:

Web role: As the name suggests, Web roles are largely intended for logic that interacts with the
outside world via HTTP. Code written as a Web role typically gets its input through Internet
Information Services (IIS), and it can be created using various technologies, including ASP.NET,
Windows Communication Foundation (WCF), PHP, and Java.

Worker role: Logic written as a Worker role can interact with the outside world in various ways; it's
not limited to HTTP. For example, a Worker role might contain code that converts videos into a
standard format or calculates the risk of an investment portfolio or performs some kind of data
analysis.

Virtual Machine (VM) role: A VM role runs an image, a virtual hard disk (VHD), of a Windows Server
2008 R2 virtual machine. This VHD is created using an on-premises Windows Server machine, then
uploaded to Windows Azure. Once it's stored in the cloud, the VHD can be loaded on demand into a
VM role and executed. From January 2012 onwards the Virtual Machine (VM) role is replaced by
Windows Azure Virtual Machines.

All three roles are useful. The VM role was made available quite recently, however, and so it's fair to say
that the most frequently used options today are Web and Worker roles. Figure 1 shows a simple
Windows Azure application built with one Web role and one Worker role.




This application might use a Web role to accept HTTP requests from users, then hand off the work these
users request, such as reformatting a video file and making it available for viewing, to a Worker role. A
primary reason for this two-part breakdown is that dividing tasks in this way can make an application
easier to scale.

It's also fine for a Windows Azure application to consist of just a single Web role or a single Worker role;
you don't have to use both. A single application can even contain different kinds of Web and Worker
roles. For example, an application might have one Web role that implements a browser interface, perhaps
built using ASP.NET, and another Web role that exposes a Web services interface implemented using
WCF. Similarly, a Windows Azure application that performed two different kinds of data analysis might
define a distinct Worker role for each one. To keep things simple, though, we'll assume that the example
application described here has just one Web role and one Worker role.

As part of building a Windows Azure application, a developer creates a service definition file that names
and describes the application's roles. This file can also specify other information, such as the ports each
role can listen on. Windows Azure uses this information to build the correct environment for running
the application.

A WINDOWS AZURE APPLICATION RUNS MULTIPLE INSTANCES OF EACH ROLE

Every Windows Azure application consists of one or more roles. When it executes, an application
that conforms to the Windows Azure programming model must run at least two copies (two distinct
instances) of each role it contains. Each instance runs as its own VM, as Figure 2 shows.




























Figure 2: A Windows Azure application runs multiple instances of each role.

As described earlier, the example application shown here has just one Web role and one Worker role. A
developer can tell Windows Azure how many instances of each role to run through a service
configuration file (which is distinct from the service definition file mentioned in the previous section).
Here, the developer has requested four instances of the application's Web role and three instances of its
Worker role.
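
The split between the two files can be pictured with a small model: the definition says what the roles are, and the configuration says how many instances of each to run. The sketch below is only an illustration of that split. The real files are XML documents, and the class names, the instance naming scheme, and the default count of two are all hypothetical choices made here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RoleDefinition:
    """What a role is: its name, its kind, and the ports it may listen on."""
    name: str
    kind: str          # "web" or "worker"
    ports: tuple = ()


@dataclass
class ServiceConfiguration:
    """How the service runs: the instance count requested for each role."""
    instance_counts: dict = field(default_factory=dict)


def plan_instances(definition, configuration):
    """List the instance names the platform would need to start."""
    plan = []
    for role in definition:
        # Hypothetical default of 2, mirroring the "multiple instances" rule.
        count = configuration.instance_counts.get(role.name, 2)
        plan.extend(f"{role.name}_IN_{i}" for i in range(count))
    return plan


definition = [
    RoleDefinition("WebRole", "web", ports=(80,)),
    RoleDefinition("WorkerRole", "worker"),
]
configuration = ServiceConfiguration({"WebRole": 4, "WorkerRole": 3})
instances = plan_instances(definition, configuration)
print(len(instances))  # 7: four Web role instances plus three Worker role instances
```

Keeping instance counts out of the definition means the number of running instances can change without touching the description of the roles themselves.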

Every instance of a particular role runs the exact same code. In fact, with most Windows Azure
applications, each instance is just like all of the other instances of that role; they're interchangeable. For
example, Windows Azure automatically load balances HTTP requests across an application's Web role
instances. This load balancing doesn't support sticky sessions, so there's no way to direct all of a client's
requests to the same Web role instance. Storing client-specific state, such as a shopping cart, in a
particular Web role instance won't work, because Windows Azure provides no way to guarantee that all
of a client's requests will be handled by that instance. Instead, this kind of state must be stored
externally, as described later.
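
A minimal sketch of the idea follows. It assumes a round-robin load balancer and uses a plain dictionary as a stand-in for an external storage service; both are illustrative assumptions, not a description of how Windows Azure distributes requests or stores data.

```python
import itertools


class WebRoleInstance:
    """A stateless Web role instance; session state lives in a shared store."""

    def __init__(self, name, session_store):
        self.name = name
        self.session_store = session_store  # external, shared by all instances

    def add_to_cart(self, client_id, item):
        cart = self.session_store.setdefault(client_id, [])
        cart.append(item)
        return list(cart)


# Hypothetical stand-in for an external storage service.
shared_store = {}
instances = [WebRoleInstance(f"web_{i}", shared_store) for i in range(4)]
load_balancer = itertools.cycle(instances)  # assumed round-robin, no sticky sessions

for item in ("book", "pen", "lamp"):
    handler = next(load_balancer)  # each request lands on a different instance
    cart = handler.add_to_cart("client-42", item)

print(cart)  # ['book', 'pen', 'lamp']
```

Because no instance keeps the cart in its own memory, it does not matter which instance serves a given request, or whether that instance still exists for the next one.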

A WINDOWS AZURE APPLICATION BEHAVES CORRECTLY WHEN ANY ROLE INSTANCE FAILS

An application that follows the Windows Azure programming model must be built using roles, and it
must run two or more instances of each of those roles. It must also behave correctly when any of those
role instances fails. Figure 3 illustrates this idea.




























Figure 3: A Windows Azure application behaves correctly even when a role instance fails.

Here, the application shown in Figure 2 has lost two of its Web role instances and one of its Worker role
instances. Perhaps the computers they were running on failed, or maybe the physical network
connection to these machines has gone down. Whatever the reason, the application's performance is
likely to suffer, since there are fewer instances to carry out its work. Still, the application remains up and
functioning correctly.

If all instances of a particular role fail, an application will stop behaving as it should; this can't be
helped. Yet the requirement to work correctly during partial failures is fundamental to the Windows
Azure programming model. In fact, the service level agreement (SLA) for Windows Azure requires
running at least two instances of each role. Applications that run only one instance of any role can't get
the guarantees this SLA provides.

The most common way to achieve this is by making every role instance equivalent, as with load-balanced
Web roles accepting user requests. This isn't strictly required, however, as long as the failure of a single
role instance doesn't break the application. For example, an application might use a group of Worker
role instances to cache data for Web role instances, with each Worker role instance holding different
data. If any Worker role instance fails, a Web role instance trying to access the cached data it contained
behaves just as it would if the data wasn't found in the cache (e.g., it accesses persistent storage to
locate that data). The failure might cause the application to run more slowly, but as seen by a user, it still
behaves correctly.
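
The caching example can be sketched as follows. The classes, the hash-based partitioning, and the dictionary standing in for persistent storage are all hypothetical; the point is only that a failed cache instance is treated exactly like a cache miss.

```python
import zlib


class CacheUnavailable(Exception):
    """Raised when the Worker instance holding a cache partition is down."""


class CacheWorker:
    def __init__(self):
        self.data = {}
        self.alive = True

    def get(self, key):
        if not self.alive:
            raise CacheUnavailable(key)
        return self.data.get(key)  # None means an ordinary cache miss


def partition_for(key, workers):
    # Stable hash so every Web role instance maps a key to the same worker.
    return workers[zlib.crc32(key.encode()) % len(workers)]


def read(key, workers, persistent_storage):
    """Return the value for key, treating a failed cache worker as a miss."""
    try:
        value = partition_for(key, workers).get(key)
    except CacheUnavailable:
        value = None
    if value is None:
        value = persistent_storage[key]  # slower path, but still correct
    return value


persistent_storage = {"user:1": "Ada", "user:2": "Grace"}
workers = [CacheWorker() for _ in range(3)]
for key, value in persistent_storage.items():
    partition_for(key, workers).data[key] = value

partition_for("user:1", workers).alive = False  # simulate an instance failure
print(read("user:1", workers, persistent_storage))  # Ada
```

The read is slower when the cache partition is gone, but the caller cannot tell the difference in the result.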

One more important point to keep in mind is that even though the sample application described so far
contains only Web and Worker roles, all of these rules also apply to applications that use VM roles. Just
like the others, every VM role must run at least two instances to qualify for the Windows Azure SLA,
and the application must continue to work correctly if one of these instances fails. Even with VM roles,
Windows Azure still provides a form of PaaS; it's not traditional IaaS.

WHAT THE WINDOWS AZURE PROGRAMMING MODEL PROVIDES

The Windows Azure programming model is based on Windows, and the bulk of a Windows developer's
skills are applicable to this new environment. Still, it's not the same as the conventional Windows Server
programming model. So why bother to understand it? How does it help create better applications? To
answer these questions, it's first worth explaining a little more about how Windows Azure works. Once
this is clear, understanding how the Windows Azure programming model can help create better software
is simple.
SOME BACKGROUND: THE FABRIC CONTROLLER

Windows Azure is designed to run in data centers containing lots of computers. Accordingly, every
Windows Azure application runs on multiple machines simultaneously. Figure 4 shows a simple example
of how this looks.






























Figure 4: The Windows Azure fabric controller creates instances of an application's roles on different
machines, then monitors their execution.

As Figure 4 shows, all of the computers in a particular Windows Azure data center are managed by
an application called the fabric controller. The fabric controller is itself a distributed application that
runs across multiple computers.

When a developer gives Windows Azure an application to run, he provides the code for the application's
roles together with the service definition and service configuration files for this application. Among
other things, this information tells the fabric controller how many instances of each role it should create.
The fabric controller chooses a physical machine for each instance, then creates a VM on that machine
and starts the instance running. As the figure suggests, the role instances for a single application are
spread across different machines within this data center.

Once it's created these instances, the fabric controller continues to monitor them. If an instance fails for
any reason, hardware or software, the fabric controller will start a new instance for that role. While
failures might cause an application's instance count to temporarily drop below what the developer
requested, the fabric controller will always start new instances as needed to maintain the target number
for each of the application's roles. And even though Figure 4 shows only Web and Worker roles, VM roles
are handled in the same way, with each of the role's instances running on a different physical machine.
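
The monitoring behavior described above amounts to comparing a target instance count with what is actually running and starting replacements for the difference. The sketch below illustrates that comparison only; the function, the instance naming, and the data shapes are hypothetical, and the real fabric controller also chooses machines and creates VMs, which is not modeled here.

```python
def reconcile(target_counts, running):
    """Start replacement instances until each role is back at its target count.

    target_counts: {role name: desired number of instances}
    running:       {role name: list of live instance ids}
    Returns the ids of the instances that were started.
    """
    started = []
    for role, target in target_counts.items():
        live = running.setdefault(role, [])
        next_id = 0
        while len(live) < target:
            candidate = f"{role}-{next_id}"
            if candidate not in live:  # reuse the first free instance id
                live.append(candidate)
                started.append(candidate)
            next_id += 1
    return started


# The situation in Figure 3: two Web role instances and one Worker role instance lost.
target = {"web": 4, "worker": 3}
running = {"web": ["web-0", "web-3"], "worker": ["worker-0", "worker-1"]}
print(reconcile(target, running))  # ['web-1', 'web-2', 'worker-2']
```

Running the same check again once the counts match starts nothing, which is what makes it safe to repeat continuously.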
THE BENEFITS: IMPROVED ADMINISTRATION, AVAILABILITY, AND SCALABILITY

Applications built using the Windows Azure programming model can be easier to administer, more
available, and more scalable than those built on traditional Windows servers. These three attributes are
worth looking at separately.

The administrative benefits of Windows Azure flow largely from the fabric controller. Like every operating
system, Windows must be patched, as must other system software. In on-premises environments, doing
this typically requires some human effort. In Windows Azure, however, the process is entirely automated:
The fabric controller handles updates for Web and Worker role instances (although not for VM role
instances). When necessary, it also updates the underlying Windows servers those VMs run on. The result
is lower costs, since administrators aren't needed to handle this function.

Lowering costs by requiring less administration is good. Helping applications be more available is also
good, and so the Windows Azure programming model helps improve application availability in
several ways. They are the following:

Protection against hardware failures. Because every application is made up of multiple instances of
each role, hardware failures (a disk crash, a network fault, or the death of a server machine) won't
take down the application. To help with this, the fabric controller doesn't choose machines for an
application's instances at random. Instead, different instances of the same role are placed in different
fault domains. A fault domain is a set of hardware (computers, switches, and more) that shares a
single point of failure. (For example, all of the computers in a single fault domain might rely on the
same switch to connect to the network.) Because of this, a single hardware failure can't take down an
entire application. The application might temporarily lose some instances, but it will continue to
behave correctly.

Protection against software failures. Along with hardware failures, the fabric controller can also
detect failures caused by software. If the code in an instance crashes or the VM in which it's running
goes down, the fabric controller will start either just the code or, if necessary, a new VM for that role.
While any work the instance was doing when it failed will be lost, the new instance will become part
of the application as soon as it starts running.

The ability to update applications with no application downtime. Whether for routine maintenance or
to install a whole new version, every application needs to be updated. An application built using the
Windows Azure programming model can be updated while it's running; there's no need to take it
down. To allow this, different instances for each of an application's roles are placed in different
update domains (which aren't the same as the fault domains described earlier). When a new version
of the application needs to be deployed, the fabric controller can shut down the instances in just one
update domain, update the code for these, then create new instances from that new code. Once
those instances are running, it can do the same thing to instances in the next update domain, and so
on. While users might see different versions of the application during this process, depending on
which instance they happen to interact with, the application as a whole remains continuously
available.

The ability to update Windows and other supporting software with no application downtime. The
fabric controller assumes that every Windows Azure application follows the three rules listed earlier,
and so it knows that it can shut down some of an application's instances whenever it likes, update the
underlying system software, then start new instances. By doing this in chunks, never shutting down
all of a role's instances at the same time, Windows and other software can be updated beneath a
continuously running application.
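
The update-domain walk described in the list above can be sketched as a loop that takes one update domain offline at a time. The function and data layout below are hypothetical, and the assignment of instances to two update domains is an assumption made for the example.

```python
def rolling_update(instances, new_version):
    """Update one update domain at a time.

    instances: list of dicts with keys "name", "update_domain", "version".
    Returns the smallest number of instances left in service at any step.
    """
    domains = sorted({inst["update_domain"] for inst in instances})
    min_in_service = len(instances)
    for domain in domains:
        offline = [i for i in instances if i["update_domain"] == domain]
        min_in_service = min(min_in_service, len(instances) - len(offline))
        for inst in offline:  # shut down, update, restart
            inst["version"] = new_version
    return min_in_service


# Four instances of one role spread over two update domains (assumed layout).
web = [{"name": f"web_{n}", "update_domain": n % 2, "version": "v1"} for n in range(4)]
print(rolling_update(web, "v2"))  # 2: half the instances stay in service throughout
```

With a single instance in a single update domain the same function returns 0, which is the downtime a one-instance application experiences during updates.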

Availability is important for most applications (software isn't useful if it's not running when you need it),
but scalability can also matter. The Windows Azure programming model helps developers build more
scalable applications in two main ways:

Automatically creating and maintaining a specified number of role instances. As already described, a
developer tells Windows Azure how many instances of each role to run, and the fabric controller
creates and monitors the requested instances. This makes application scalability quite
straightforward: Just tell Windows Azure what you need. Because this cloud platform runs in very
large data centers, getting whatever level of scalability an application needs isn't generally a problem.

Providing a way to modify the number of executing role instances for a running application: For
applications whose load varies, scalability is more complicated. Setting the number of instances just
once isn't a good solution, since different loads can make the ideal instance count go up or down
significantly. To handle this situation, Windows Azure provides both a Web portal for people and an
API for applications to allow changing the desired number of instances for each role while an
application is running.
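
An application that adjusts its own instance count needs some rule for choosing the number to request. The sketch below shows one simple possibility, a hypothetical helper that sizes a role from the current request rate. The capacity figure and the upper limit are invented for illustration, and the call that would actually submit the new count to the platform is deliberately left out.

```python
import math


def desired_instances(requests_per_second, capacity_per_instance, minimum=2, maximum=20):
    """Pick an instance count for the current load.

    minimum=2 keeps the role within the multiple-instances rule;
    maximum caps cost. All numbers here are illustrative assumptions.
    """
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(minimum, min(maximum, needed))


for load in (50, 900, 10_000):
    print(load, "->", desired_instances(load, capacity_per_instance=100))
# 50 -> 2
# 900 -> 9
# 10000 -> 20
```

Clamping to a minimum of two means that scaling down for quiet periods never drops a role below the instance count the programming model expects.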

Making applications simpler to administer, more available, and more scalable is useful, and so using the
Windows Azure programming model generally makes sense. But as mentioned earlier, it's possible to run
applications on Windows Azure that don't follow this model. Suppose, for example, that you build an
application using a single role (which is permitted) but then run only one instance of that role (violating
the second and third rules). You might do this to save money, since Windows Azure charges separately
for each running instance. Anybody who chooses this option should understand, however, that the fabric
controller won't know that his application doesn't follow all three rules. It will shut down this single
instance at unpredictable times to patch the underlying software, then restart a new one. To users, this
means that the application will go down from time to time, since there's no other instance to take over.
This isn't a bug in Windows Azure; it's a fundamental aspect of how the technology works.

Getting all of the benefits that Windows Azure offers requires conforming to the rules of its programming
model. Moving existing applications from Windows Server to Windows Azure can require some work, a topic
addressed in more detail later in this paper. For new applications, however, the argument for using the
Windows Azure model is clear. Why not build an application that costs less to administer? Why not build an
application that need never go down? Why not build an application that can easily scale up and down? Over
time, it's reasonable to expect more and more applications to be created using the Windows
Azure programming model.

IMPLICATIONS OF THE WINDOWS AZURE PROGRAMMING MODEL: WHAT ELSE CHANGES?

Building applications for Windows Azure means following the three rules of its programming model.
Following these rules isn't enough, though; other parts of a developer's world must also adjust. The
changes the Windows Azure programming model brings to the broader development environment can be
grouped into three areas:

How role instances interact with the operating system.

How role instances interact with persistent storage.
How role instances interact with other role instances.

This section looks at all three.
INTERACTIONS WITH THE OPERATING SYSTEM

For an application running on a typical Windows Server machine, the administrator of that machine is in
control. She can reboot VMs or the machine they run on, install Windows patches, and do whatever
else is required to keep that computer available. In Windows Azure, however, all of the servers are
owned by the fabric controller. It decides when VMs or machines should be rebooted, and for Web and
Worker roles (although not for VM roles), the fabric controller also installs patches and other updates to
the system software in every instance.

This approach has real benefits, as already described. It also creates restrictions, however. Because the
fabric controller owns the physical and virtual machines that Windows Azure applications use, it's free to
do whatever it likes with them. This implies that letting a Windows Azure application modify the system
it runs on (letting it run in administrator mode rather than user mode) presents some challenges. Since
the fabric controller can modify the operating system at will, there's no guarantee that changes a role
instance makes to the system it's running on won't be overwritten. Besides, the specific virtual (and
physical) machines an application runs in change over time. This implies that any changes made to the
default local environment must be made each time a role instance starts running.

In its first release, Windows Azure simply didn't allow applications to modify the systems they ran on;
applications only ran in user mode. This restriction has been relaxed (both Web and Worker roles now
give developers the option to run applications in admin mode), but the overall programming model hasn't
changed. Anybody creating a Windows Azure application needs to understand what the fabric controller
is doing, then design applications accordingly.
INTERACTIONS WITH PERSISTENT STORAGE

Applications aren't just code; they also use data. And just as the programming model must change to
make applications more available and more scalable, the way data is stored and accessed must also
change. The big changes are these:

Storage must be external to role instances. Even though each instance is its own VM with its own file
system, data stored in those file systems isn't automatically made persistent. If an instance fails, any
data it contains may be lost. This implies that for applications to work correctly in the face of failures,
data must be stored persistently outside role instances. Another role instance can now access data
that otherwise would have been lost if that data had been stored locally on a failed instance.

Storage must be replicated. Just as a Windows Azure application runs multiple role instances to allow
for failures, Windows Azure storage must provide multiple copies of data. Without this, a single
failure would make data unavailable, something that's not acceptable for highly available
applications.

Storage must be able to handle very large amounts of data. Traditional relational systems aren't
necessarily the best choice for very large data sets. Since Windows Azure is designed in part for
massively scalable applications, it must provide storage mechanisms for handling data at this scale.

To allow this, the platform offers blobs for storing binary data along with a non-SQL approach called
tables for storing large structured data sets.

Figure 5 illustrates these three characteristics, showing how Windows Azure storage looks to an
application.

Figure 5: While applications see a single copy, Windows Azure storage replicates all blobs and tables
three times.

In this example, a Windows Azure application is using two blobs and one table from Windows Azure
storage. The application sees each blob and table as a single entity, but under the covers, Windows Azure
storage actually maintains three instances of each one. These copies are spread across different physical
machines, and as with role instances, those machines are in different fault domains. This improves the
application's availability, since data is still accessible even when some copies are unavailable. And because
persistent data is stored outside any of the application's role instances, an instance failure loses only
whatever data it was using at the moment it failed.

The Windows Azure programming model requires an application to behave correctly when a role instance
fails. To do this, every instance in an application must store all persistent data in Windows Azure storage
or another external storage mechanism (such as SQL Azure, Microsoft's cloud-based service for relational
data). There's one more option worth mentioning, however: Windows Azure drives. As already described,
any data an application writes to the local file system of its own VM can be lost when that VM stops
running. Windows Azure drives change this, using a blob to provide persistent storage for the file system
of a particular instance. These drives have some limitations (only one instance at a time is allowed to
both read from and write to a particular Windows Azure drive, for example, with all other instances in this
application allowed only read access), but they can be useful in some situations.
INTERACTIONS AMONG ROLE INSTANCES

When an application is divided into multiple parts, those parts commonly need to interact with one
another. In a Windows Azure application, this is expressed as communication between role instances.
For example, a Web role instance might accept requests from users, then pass those requests to a
Worker role instance for further processing.

The way this interaction happens isn't identical to how it's done with ordinary Windows applications.
Once again, a key fact to keep in mind is that, most often, all instances of a particular role are
equivalent: they're interchangeable. This means that when, say, a Web role instance passes work to a
Worker role instance, it shouldn't care which particular instance gets the work. In fact, the Web role
instance shouldn't rely on instance-specific things like a Worker role instance's IP address to
communicate with that instance. More generic mechanisms are required.

The most common way for role instances to communicate in Windows Azure applications is
through Windows Azure queues. Figure 6 illustrates the idea.

Figure 6: Role instances can communicate through queues, each of which replicates the messages it
holds three times.

In the example shown here, a Web role instance gets work from a user of the application, such as a person
making a request from a browser (step 1). This instance then creates a message containing this work and
writes it into a Windows Azure queue (step 2). These queues are implemented as part of Windows Azure
storage, and so like blobs and tables, each queue is replicated three times, as the figure
shows. As usual, this provides fault-tolerance, ensuring that the queue's messages are still available if
a failure occurs.

Next, a Worker role instance reads the message from the queue (step 3). Notice that the Web role
instance that created this message doesn't care which Worker role instance gets it; in this application,
they're all equivalent. That Worker role instance does whatever work the message requires (step 4),
then deletes the message from the queue (step 5).

This last step, explicitly removing the message from the queue, is different from what on-premises
queuing technologies typically do. In Microsoft Message Queuing (MSMQ), for example, an application can
do a read inside an atomic transaction. If the application fails before completing its work, the transaction
aborts, and the message automatically reappears on the queue. This approach guarantees that every
message sent to an MSMQ queue is delivered exactly once in the order in which it was sent.

Windows Azure queues don't support transactional reads, and so they don't guarantee exactly-once,
in-order delivery. In the example shown in Figure 6, for instance, the Worker role instance might finish
processing the message, then crash just before it deletes this message from the queue. If this happens,
the message will automatically reappear after a configurable timeout period, and another Worker role
instance will process it. Unlike MSMQ, Windows Azure queues provide at-least-once semantics: A
message might be read and processed one or more times.
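
The at-least-once behavior described above can be illustrated with a small in-memory model. This is only a sketch, not the Windows Azure queue API: the class VisibilityQueue, its methods, the run_worker helper, and the explicit "now" clock are all hypothetical names chosen for illustration. A message that has been read stays hidden for a timeout; if it is not deleted before the timeout expires, it becomes readable again. Remembering which message IDs have already been handled is one common way to make repeated delivery harmless.

```python
import itertools


class VisibilityQueue:
    """Toy model of a queue with a visibility timeout (hypothetical, in-memory)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._ids = itertools.count(1)
        self._messages = {}  # message id -> (body, time at which it is visible again)

    def put(self, body, now):
        msg_id = next(self._ids)
        self._messages[msg_id] = (body, now)
        return msg_id

    def get(self, now):
        """Return (id, body) of a visible message and hide it, or None."""
        for msg_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                self._messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def delete(self, msg_id):
        self._messages.pop(msg_id, None)


def run_worker(queue, now, processed, results, crash_before_delete=False):
    """Read one message, process it idempotently, then delete it."""
    item = queue.get(now)
    if item is None:
        return False
    msg_id, body = item
    if msg_id not in processed:      # skip work already done by an earlier delivery
        results.append(body.upper())
        processed.add(msg_id)
    if crash_before_delete:
        return True                  # simulated crash: the message is never deleted
    queue.delete(msg_id)
    return True


queue = VisibilityQueue(visibility_timeout=30)
processed, results = set(), []
queue.put("resize photo", now=0)

run_worker(queue, 0, processed, results, crash_before_delete=True)  # first delivery, then crash
hidden = queue.get(now=10)                                          # message is still invisible
run_worker(queue, 31, processed, results)                           # redelivered after the timeout
```

In this sketch the second worker receives the same message again, notices that its ID was already processed, and simply deletes it. A real application would need to keep the "processed" record in shared persistent storage rather than in local memory.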

This raises an obvious question: Why don't Windows Azure queues support transactional reads? The
answer is that transactions require locking, and so they necessarily slow things down (especially with the
message replication provided by Windows Azure queues). Given the primary goals of the platform, its
designers opted for the fastest, most scalable approach.

Most of the time, queues are the best way for role instances within an application to communicate. It's
also possible for instances to interact directly, however, without going through a queue. To allow this,
Windows Azure provides an API that lets an instance discover all other instances in the same application
that meet specific requirements, then send a request directly to one of those instances. In the most
common case, where all instances of a particular role are equivalent, the caller should choose a target
instance randomly from the set the API returns. This isn't always true: maybe a Worker role
implements an in-memory cache with each role instance holding specific data, and so the caller must
access a particular one. Most often, though, the right approach is to treat all instances of a role as
interchangeable.
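
Choosing a target at random can be sketched as follows. The list of endpoints stands in for whatever set the discovery API returns; the function name pick_instance and the address strings are hypothetical and are not part of any Windows Azure library.

```python
import random


def pick_instance(endpoints, rng=random):
    """Pick one endpoint uniformly at random from the discovered set."""
    if not endpoints:
        raise ValueError("no instances available")
    return rng.choice(endpoints)


# Hypothetical endpoints, standing in for the set returned by the discovery API.
workers = ["worker-0:8080", "worker-1:8080", "worker-2:8080"]
target = pick_instance(workers, random.Random(42))
```

Passing a seeded random.Random instance, as done here, only makes the choice repeatable for testing; production code would typically use the default generator.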
MOVING WINDOWS SERVER APPLICATIONS TO WINDOWS AZURE

Anybody building a new Windows Azure application should follow the rules of the Windows Azure
programming model. To move an existing application from Windows Server to Windows Azure,
however, that application should also be made to follow the same rules. In addition, the application
might need to change how it interacts with the operating system, how it uses persistent storage, and the
way its components interact with each other.

How easy it is to make these changes depends on the application. Here are a few
representative examples:

An ASP.NET application with multiple load-balanced instances that share state stored in SQL Server.
This kind of application typically ports easily to Windows Azure, with each instance of the original
application becoming an instance of a Web or Worker role. Applications like this don't use sticky
sessions, which helps make them a good fit for Windows Azure. (Using ASP.NET session state is
acceptable, however, since Windows Azure provides an option to store session state persistently in
Windows Azure Storage tables.) And moving an on-premises SQL Server database to SQL Azure is
usually straightforward.

An ASP.NET application with multiple instances that maintains per-instance state and relies on sticky
sessions. Because it maintains client-specific state in each instance between requests, this application
will need some changes. Windows Azure doesn't support sticky sessions, and so making the
application run on this cloud platform will require redesigning how it handles state.

A Silverlight or Windows Presentation Foundation (WPF) client that accesses WCF services running in
a middle tier. If the services don't maintain per-client state between calls, moving them to Windows
Azure is straightforward. The client will continue to run on user desktops, as always, but it will now
call services running on Windows Azure. If the current services do maintain per-client state, however,
they'll need to be redesigned.

An application with a single instance running on Windows Server that maintains state on its own
machine. Whether the clients are browsers or something else, many enterprise applications are built
this way today, and they won't work well on Windows Azure without some redesign. It might be
possible to run this application unchanged in a single VM role instance, but its users probably won't
be happy with the results. For one thing, the Windows Azure SLA doesn't apply to applications with
only a single instance. Also, recall that the fabric controller can at any time reboot the machine on
which this instance runs to update that machine's software. The application has no control over when
this happens; it might be smack in the middle of a workday. Since there's no second instance to take
over (the application wasn't built to follow the rules of the Windows Azure programming model), it
will be unavailable for some period of time, and so anybody using the application will have their work
interrupted while the machine reboots. Even though the VM role makes it easy to move a Windows
Server binary to Windows Azure, this doesn't guarantee that the application will run successfully in its
new home. The application must also conform to the rules of the Windows Azure programming
model.

A Visual Basic 6 application that directly accesses a SQL Server database, i.e., a traditional
client/server application. Making this application run on Windows Azure will most likely require
rewriting at least the client business logic. While it might be possible to move the database (including
any stored procedures) to SQL Azure, then redirect the clients to this new location, the application's
desktop component won't run as is on Windows Azure. Windows Azure doesn't provide a local user
interface, and it also doesn't support using Remote Desktop Services (formerly Terminal Services) to
provide remote user interfaces.

Windows Azure can help developers create better applications. Yet the improvements it offers require
change, and so moving existing software to this new platform can take some effort. Making good
decisions requires understanding both the potential business value and any technical challenges that
moving an application to Windows Azure might bring.


CONCLUSION

Cloud platforms are a new world, and they open new possibilities. Reflecting this, the Windows Azure
programming model helps developers create applications that are easier to administer, more available, and
more scalable than those built in the traditional Windows Server environment. Doing this requires following
three rules:

A Windows Azure application is built from one or more roles.

A Windows Azure application runs multiple instances of each role.

A Windows Azure application behaves correctly when any role instance fails.

Using this programming model successfully also requires understanding the changes it brings to how applications
interact with the operating system, use persistent storage, and communicate between role instances. For
developers willing to do this, however, the value is clear. While it's not right for every scenario, the Windows
Azure programming model can be useful for anybody who wants to create easier to administer, more available,
and more scalable applications.
FOR FURTHER READING

Introducing Windows Azure: http://go.microsoft.com/?linkid=9682907

Introducing the Windows Azure Platform: http://go.microsoft.com/?linkid=9752185

Cloud service concept
When you create an application and run it in Windows Azure, the code and configuration together
are called a Windows Azure cloud service (known as a hosted service in earlier Windows Azure
releases).
By creating a cloud service, you can deploy a multi-tier application in Windows Azure, defining
multiple roles to distribute processing and allow flexible scaling of your application. A cloud service
consists of one or more web roles and/or worker roles, each with its own application files and
configuration.
For a cloud service, Windows Azure maintains the infrastructure for you, performing routine
maintenance, patching the operating systems, and attempting to recover from service and hardware
failures. If you define at least two instances of every role, most maintenance, as well as your own
service upgrades, can be performed without any interruption in service. A cloud service must have at
least two instances of every role to qualify for the Windows Azure Service Level Agreement, which
guarantees external connectivity to your Internet-facing roles at least 99.95 percent of the time.
Each cloud service has two environments to which you can deploy your service package and
configuration. You can deploy a cloud service to the staging environment to test it before you
promote it to production. Promoting a staged cloud service to production is a simple matter of
swapping the virtual IP addresses (VIPs) that are associated with the two environments.
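
The effect of a VIP swap can be pictured with a minimal sketch. The dictionary below is only a stand-in for the two environments, and the function name and the deployment labels "v1" and "v2" are hypothetical; the real operation is performed by the platform, not by application code.

```python
def swap_deployments(environments):
    """Return a new mapping with the staging and production deployments exchanged."""
    return {
        "production": environments["staging"],
        "staging": environments["production"],
    }


# Hypothetical deployment labels for illustration.
before = {"production": "v1", "staging": "v2"}
after = swap_deployments(before)
```

Because the swap is symmetric, applying it a second time restores the previous arrangement, which is what makes it convenient for rolling back a bad release.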
cloud service role: A cloud service role is comprised of application files and a configuration. A cloud
service can have two types of role:
web role: A web role provides a dedicated Internet Information Services (IIS) web-server used
for hosting front-end web applications.
worker role: Applications hosted within worker roles can run asynchronous, long-running or
perpetual tasks independent of user interaction or input.
role instance: A role instance is a virtual machine on which the application code and role
configuration run. A role can have multiple instances, defined in the service configuration file.
guest operating system: The guest operating system for a cloud service is the operating system
installed on the role instances (virtual machines) on which your application code runs.
cloud service components: Three components are required in order to deploy an application as a
cloud service in Windows Azure:
service definition file: The cloud service definition file (.csdef) defines the service model,
including the number of roles.
service configuration file: The cloud service configuration file (.cscfg) provides configuration
settings for the cloud service and individual roles, including the number of role instances.
service package: The service package (.cspkg) contains the application code and the service
definition file.
cloud service deployment: A cloud service deployment is an instance of a cloud service deployed to
the Windows Azure staging or production environment. You can maintain deployments in both
staging and production.
deployment environments: Windows Azure offers two deployment environments for cloud services:
a staging environment in which you can test your deployment before you promote it to
the production environment. The two environments are distinguished only by the virtual IP addresses
(VIPs) by which the cloud service is accessed. In the staging environment, the cloud service's globally
unique identifier (GUID) identifies it in URLs (GUID.cloudapp.net). In the production environment, the
URL is based on the friendlier DNS prefix assigned to the cloud service (for
example, myservice.cloudapp.net).
swap deployments: To promote a deployment in the Windows Azure staging environment to the
production environment, you can "swap" the deployments by switching the VIPs by which the two
deployments are accessed. After the deployment, the DNS name for the cloud service points to the
deployment that had been in the staging environment.
minimal vs. verbose monitoring: Minimal monitoring, which is configured by default for a cloud
service, uses performance counters gathered from the host operating systems for role instances
(virtual machines). Verbose monitoring gathers additional metrics based on performance data within
the role instances to enable closer analysis of issues that occur during application processing. For
more information, see How to Monitor Cloud Services.
Windows Azure Diagnostics: Windows Azure Diagnostics is the API that enables you to collect
diagnostic data from applications running in Windows Azure. Windows Azure Diagnostics must be
enabled for cloud service roles in order for verbose monitoring to be turned on.
link a resource: To show your cloud service's dependencies on other resources, such as a Windows
Azure SQL Database instance, you can "link" the resource to the cloud service. In the Preview
Management Portal, you can view linked resources on the Linked Resources page, view their status
on the dashboard, and scale a linked SQL Database instance along with the service roles on
the Scale page. Linking a resource in this sense does not connect the resource to the application;
you must configure the connections in the application code.
scale a cloud service: A cloud service is scaled out by increasing the number of role instances
(virtual machines) deployed for a role. A cloud service is scaled in by decreasing role instances. In the
Preview Management Portal, you can also scale a linked SQL Database instance, by changing the SQL
Database edition and the maximum database size, when you scale your service roles.
Windows Azure Service Level Agreement (SLA): The Windows Azure Compute SLA guarantees
that, when you deploy two or more role instances for every role, access to your cloud service will be
maintained at least 99.95 percent of the time. Also, detection and corrective action will be initiated
99.9 percent of the time when a role instance's process is not running. For more information,
see Service Level Agreements.
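
To get a feel for what a 99.95 percent figure means in practice, the permitted downtime can be worked out with simple arithmetic. The sketch below assumes a 30-day month as the measurement window, which is an assumption made here for illustration; the actual SLA defines its own measurement period and terms.

```python
def allowed_downtime_minutes(availability_percent, days=30):
    """Minutes of downtime permitted over `days` at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


downtime = allowed_downtime_minutes(99.95)
```

Under that assumption, 99.95 percent availability leaves roughly 21.6 minutes of downtime in a 30-day month.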

LECTURE 5: Data Management and Business Analytics
Managing and analyzing data in the cloud is just as important as it is anywhere else. To let you do
this, Windows Azure provides a range of technologies for working with relational and non-relational
data. This article introduces each of the options.
Table of Contents
Blob Storage
Running a DBMS in a Virtual Machine
SQL Database
o SQL Data Sync
o SQL Data Reporting
Table Storage
Hadoop
Blob Storage
The word "blob" is short for "Binary Large OBject", and it describes exactly what a blob is: a collection
of binary information. Yet even though they're simple, blobs are quite useful. Figure 1 illustrates the
basics of Windows Azure Blob Storage.

Figure 1: Windows Azure Blob Storage stores binary data (blobs) in containers.
To use blobs, you first create a Windows Azure storage account. As part of this, you specify the
Windows Azure datacenter that will store the objects you create using this account. Wherever it lives,
each blob you create belongs to some container in your storage account. To access a blob, an
application provides a URL with the form:
http://<StorageAccount>.blob.core.windows.net/<Container>/<BlobName>
<StorageAccount> is a unique identifier assigned when a new storage account is created, while
<Container> and <BlobName> are the names of a specific container and a blob within that
container.
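
As a minimal sketch, the URL form above can be assembled as follows. The helper name blob_url and the sample account, container, and blob names are hypothetical; percent-encoding the blob name is a precaution added in this sketch for names that contain spaces or other reserved characters.

```python
from urllib.parse import quote


def blob_url(storage_account, container, blob_name):
    """Build a blob URL of the form shown above, percent-encoding the blob name."""
    return "http://{}.blob.core.windows.net/{}/{}".format(
        storage_account, container, quote(blob_name)
    )


# "myaccount", "photos" and the blob name are hypothetical values.
url = blob_url("myaccount", "photos", "2012/beach trip.jpg")
```

Slashes in the blob name are left as they are, so a name such as "2012/beach trip.jpg" still reads like a path inside the container.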
Windows Azure provides two different kinds of blobs. The choices are:
Block blobs, each of which can contain up to 200 gigabytes of data. As its name suggests, a block
blob is subdivided into some number of blocks. If a failure occurs while transferring a block blob,
retransmission can resume with the most recent block rather than sending the entire blob again.
Block blobs are a quite general approach to storage, and they're the most commonly used blob type
today.
Page blobs, which can be as large as one terabyte each. Page blobs are designed for random access,
and so each one is divided into some number of pages. An application is free to read and write
individual pages at random in the blob. In Windows Azure Virtual Machines, for example, VMs you
create use page blobs as persistent storage for both OS disks and data disks.
Whether you choose block blobs or page blobs, applications can access blob data in several different
ways. The options include the following:
Directly through a RESTful (i.e., HTTP-based) access protocol. Both Windows Azure applications and
external applications, including apps running on premises, can use this option.
Using the Windows Azure Storage Client library, which provides a more developer-friendly interface
on top of the raw RESTful blob access protocol. Once again, both Windows Azure applications and
external applications can access blobs using this library.
Using Windows Azure drives, an option that lets a Windows Azure application treat a page blob as a
local drive with an NTFS file system. To the application, the page blob looks like an ordinary
Windows file system accessed using standard file I/O. In fact, reads and writes are sent to the
underlying page blob that implements the Windows Azure Drive.
To guard against hardware failures and improve availability, every blob is replicated across three
computers in a Windows Azure datacenter. Writing to a blob updates all three copies, so later reads
won't see inconsistent results. You can also specify that a blob's data should be copied to another
Windows Azure datacenter in the same region but at least 500 miles away. This copying, called
geo-replication, happens within a few minutes of an update to the blob, and it's useful for disaster
recovery.
Data in blobs can also be made available via the Windows Azure Content Delivery Network (CDN). By
caching copies of blob data at dozens of servers around the world, the CDN can speed up access to
information that's accessed repeatedly.
Simple as they are, blobs are the right choice in many situations. Storing and streaming video and
audio are obvious examples, as are backups and other kinds of data archiving. Developers can also
use blobs to hold any kind of unstructured data they like. Having a straightforward way to store and
access binary data can be surprisingly useful.
Running a DBMS in a Virtual Machine
Many applications today rely on some kind of database management system (DBMS). Relational
systems such as SQL Server are the most frequently used choice, but non-relational approaches,
commonly known as NoSQL technologies, get more popular every day. To let cloud applications use
these data management options, Windows Azure Virtual Machines allows you to run a DBMS
(relational or NoSQL) in a VM. Figure 2 shows how this looks with SQL Server.

Figure 2: Windows Azure Virtual Machines allows running a DBMS in a VM, with persistence
provided by blobs.
To both developers and database administrators, this scenario looks much like running the same
software in their own datacenter. In the example shown here, for instance, nearly all of SQL Server's
capabilities can be used, and you have full administrative access to the system. You also have the
responsibility of managing the database server, of course, just as if it were running locally.
As Figure 2 shows, your databases appear to be stored on the local disk of the VM the server runs in.
Under the covers, however, each of those disks is written to a Windows Azure blob. (It's similar to
using a SAN in your own datacenter, with a blob acting much like a LUN.) As with any Windows
Azure blob, the data it contains is replicated three times within a datacenter and, if you request it,
geo-replicated to another datacenter in the same region. It's also possible to use options such as
SQL Server database mirroring for improved reliability.
Another way to use SQL Server in a VM is to create a hybrid application, where the data lives on
Windows Azure while the application logic runs on-premises. For example, this might make sense
when applications running in multiple locations or on various mobile devices must share the same
data. To make communication between the cloud database and on-premises logic simpler, an
organization can use Windows Azure Virtual Network to create a virtual private network (VPN)
connection between a Windows Azure datacenter and its own on-premises datacenter.
SQL Database
For many people, running a DBMS in a VM is the first option that comes to mind for managing
structured data in the cloud. It's not the only choice, though, nor is it always the best choice. In some
cases, managing data using a Platform as a Service (PaaS) approach makes more sense. Windows
Azure provides a PaaS technology called SQL Database that lets you do this for relational
data. Figure 3 illustrates this option.

Figure 3: SQL Database provides a shared PaaS relational storage service.
SQL Database doesn't give each customer its own physical instance of SQL Server. Instead, it
provides a multi-tenant service, with a logical SQL Database server for each customer. All customers
share the compute and storage capacity that the service provides. And as with Blob Storage, all data
in SQL Database is stored on three separate computers within a Windows Azure datacenter, giving
your databases built-in high availability (HA).
To an application, SQL Database looks much like SQL Server. Applications can issue SQL queries
against relational tables, use T-SQL stored procedures, and execute transactions across multiple
tables. And because applications access SQL Database using the Tabular Data Stream (TDS) protocol,
the same protocol used to access SQL Server, they can work with data using Entity Framework,
ADO.NET, JDBC, and other familiar data access interfaces.
But because SQL Database is a cloud service running in Windows Azure data centers, you don't need
to manage any of the system's physical aspects, such as disk usage. You also don't need to worry
about updating software or handling other low-level administrative tasks. Each customer
organization still controls its own databases, of course, including their schemas and user logins, but
many of the mundane administrative tasks are done for you.
While SQL Database looks much like SQL Server to applications, it doesn't behave exactly the same
as a DBMS running on a physical or virtual machine. Because it runs on shared hardware, its
performance will vary with the load placed on that hardware by all of its customers. This means that
the performance of, say, a stored procedure in SQL Database might vary from one day to the next.
Today, SQL Database lets you create a database holding up to 150 gigabytes. If you need to work
with larger databases, the service provides an option called Federation. To do this, a database
administrator creates two or more federation members, each of which is a separate database with its
own schema. Data is spread across these members, something that's often referred to as sharding,
with each member assigned a unique federation key. An application issues SQL queries against this
data by specifying the federation key that identifies the federation member the query should target.
This allows using a traditional relational approach with large amounts of data. As always, there are
trade-offs; neither queries nor transactions can span federation members, for instance. But when a
relational PaaS service is the best choice and these trade-offs are acceptable, using SQL Federation
can be a good solution.
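
The routing idea behind sharding can be sketched in a few lines. This is not the SQL Database Federation API: the Federation class, the member names, and the split points are hypothetical, and the sketch assumes that each member holds one contiguous range of federation key values.

```python
import bisect


class Federation:
    """Toy router: each member database holds a contiguous range of key values."""

    def __init__(self, split_points, members):
        # split_points[i] is the first key value held by members[i + 1].
        if len(members) != len(split_points) + 1:
            raise ValueError("need exactly one more member than split points")
        self.split_points = sorted(split_points)
        self.members = members

    def member_for(self, key):
        """Return the member database that holds rows with this federation key."""
        return self.members[bisect.bisect_right(self.split_points, key)]


# Hypothetical customer-ID ranges: below 1000, 1000 to 4999, 5000 and above.
federation = Federation([1000, 5000], ["orders_db_0", "orders_db_1", "orders_db_2"])
```

A query for one customer is sent only to the member returned by member_for. This also shows why a single query or transaction cannot span members in this model: each key value resolves to exactly one database.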
SQL Database can be used by applications running on Windows Azure or elsewhere, such as in your
on-premises datacenter. This makes it useful for cloud applications that need relational data, as well
as on-premises applications that can benefit from storing data in the cloud. A mobile application
might rely on SQL Database to manage shared relational data, for instance, as might an inventory
application that runs at multiple dealers around the world.
Thinking about SQL Database raises an obvious (and important) issue: When should you run SQL
Server in a VM, and when is SQL Database a better choice? As usual, there are trade-offs, and so
which approach is better depends on your requirements.
One simple way to think about it is to view SQL Database as being for new applications, while SQL
Server in a VM is a better choice when you're moving an existing on-premises application to the
cloud. It can also be useful to look at this decision in a more fine-grained way, however. For example,
SQL Database is easier to use, since there's minimal setup and administration. But running SQL
Server in a VM can have more predictable performance (it's not a shared service), and it also
supports larger non-federated databases than SQL Database. Still, SQL Database provides built-in
replication of both data and processing, effectively giving you a high-availability DBMS with very
little work. While SQL Server gives you more control and a somewhat broader set of options, SQL
Database is simpler to set up and significantly less work to manage.
Finally, it's important to point out that SQL Database isn't the only PaaS data service available on
Windows Azure. Microsoft partners provide other options as well. For example, ClearDB offers a
MySQL PaaS offering, while Cloudant sells a NoSQL option. PaaS data services are the right solution
in many situations, and so this approach to data management is an important part of Windows
Azure.
SQL Data Sync
While SQL Database does maintain three copies of each database within a single Windows Azure
datacenter, it doesn't automatically replicate data between Windows Azure datacenters. Instead, it
provides SQL Data Sync, a service that you can use to do this. Figure 4 shows how this looks.

Figure 4: SQL Data Sync synchronizes data in SQL Database with data in other Windows Azure
and on-premises datacenters.
As the diagram shows, SQL Data Sync can synchronize data across different locations. Suppose
you're running an application in multiple Windows Azure datacenters, for instance, with data stored
in SQL Database. You can use SQL Data Sync to keep that data synchronized. SQL Data Sync can also
synchronize data between a Windows Azure datacenter and an instance of SQL Server running in an
on-premises datacenter. This might be useful for maintaining both a local copy of data used by
on-premises applications and a cloud copy used by applications running on Windows Azure. And
although it's not shown in the figure, SQL Data Sync can also be used to synchronize data between
SQL Database and SQL Server running in a VM on Windows Azure or elsewhere.
Synchronization can be bi-directional, and you determine exactly what data is synchronized and how
frequently it's done. (Synchronization between databases isn't atomic, however; there's always at
least some delay.) And however it's used, setting up synchronization with SQL Data Sync is entirely
configuration-driven; there's no code to write.
SQL Data Reporting
Once a database contains data, somebody will probably want to create reports using that data. To let
you do this with data stored in SQL Database, Windows Azure provides SQL Reporting. This cloud
service provides a subset of the functionality in SQL Server Reporting Services (SSRS), the reporting
technology included with SQL Server. In its initial incarnation, SQL Reporting is aimed primarily at
independent software vendors (ISVs) who need to embed reports in their applications. Figure
5 shows how the process works.

Figure 5: Windows Azure SQL Reporting provides reporting services for data in SQL Database.
Before a user can see a report, someone defines what that report should look like (step 1). With SQL
Reporting, this can be done using either of two tools: SQL Server Data Tools, part of SQL Server
2012, or its predecessor, Business Intelligence (BI) Development Studio. As with SSRS, these report
definitions are expressed in the Report Definition Language (RDL). After the RDL files for a report
have been created, they are uploaded to SQL Reporting in the cloud (step 2). The report definition is
now ready to use.
Next, a user of the application accesses the report (step 3). The application passes this request to
SQL Reporting (step 4), which contacts SQL Database to get the data it needs (step 5). SQL Reporting
uses this data and the relevant RDL files to render the report (step 6), then returns the report to the
application (step 7), which displays it to the user (step 8).
Embedding a report in an application, the scenario shown here, isn't the only option. It's also
possible to view reports in a SQL Reporting portal or in other ways. Reports can also be combined,
with one report containing a link to another.
Like SQL Database, SQL Reporting is a multi-tenant PaaS service. You can use it immediately, since there's
nothing to install, and it requires minimal management. Microsoft monitors the service, provides
patches, handles scaling, and does the other work needed to keep the service available. While it's
possible to run reports on SQL Database tables using the on-premises version of SSRS, SQL
Reporting is typically a better alternative for adding reporting to Windows Azure applications.
Table Storage
Relational data is useful in many situations, but it's not always the right choice. If your application
needs fast, simple access to very large amounts of loosely structured data, for instance, a relational
database might not work well. A NoSQL technology is likely to be a better option.
Windows Azure Table Storage is an example of this kind of NoSQL approach. Despite its name, Table
Storage doesn't support standard relational tables. Instead, it provides what's known as a key/value
store, associating a set of data with a particular key, then letting an application access that data by
providing the key. Figure 6 illustrates the basics.

Figure 6: Windows Azure Table Storage is a key/value store that provides fast, simple access to
large amounts of data.
Like blobs, each table is associated with a Windows Azure storage account. Tables are also named
much like blobs, with a URL of the form
http://<StorageAccount>.table.core.windows.net/<TableName>
As the figure shows, each table is divided into some number of partitions, each of which can be
stored on a separate machine. (This is a form of sharding, as with SQL Federation.) Both Windows
Azure applications and applications running elsewhere can access a table using either the RESTful
OData protocol or the Windows Azure Storage Client library.
Each partition in a table holds some number of entities, each containing as many as 255 properties.
Every property has a name, a type (such as Binary, Bool, DateTime, Int, or String), and a value. Unlike
relational storage, these tables have no fixed schema, and so different entities in the same table can
contain properties with different types. One entity might have just a String property containing a
name, for example, while another entity in the same table has two Int properties containing a
customer ID number and a credit rating.
To identify a particular entity within a table, an application provides that entity's key. The key has two
parts: a partition key that identifies a specific partition and a row key that identifies an entity within
that partition. In Figure 6, for example, the client requests the entity with partition key A and row key
3, and Table Storage returns that entity, including all of the properties it contains.
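The two-part key can be modeled in a few lines of Python. This is only an in-memory sketch of the data model described above, not the Windows Azure Storage Client library; the `Table` class, the account name `myaccount`, and the table name `Customers` are hypothetical.

```python
from collections import defaultdict


class Table:
    """Toy key/value table: partition key -> row key -> entity (a dict)."""

    MAX_PROPERTIES = 255  # per-entity limit quoted in the text

    def __init__(self, storage_account: str, name: str):
        # URL form given in the text for addressing a table.
        self.url = f"http://{storage_account}.table.core.windows.net/{name}"
        self._partitions = defaultdict(dict)

    def insert(self, partition_key: str, row_key: str, **properties):
        if len(properties) > self.MAX_PROPERTIES:
            raise ValueError("an entity can hold at most 255 properties")
        # No fixed schema: each entity stores whatever properties it is given.
        self._partitions[partition_key][row_key] = dict(properties)

    def get(self, partition_key: str, row_key: str) -> dict:
        # Both parts of the key are needed to locate an entity.
        return self._partitions[partition_key][row_key]


customers = Table("myaccount", "Customers")
customers.insert("A", "1", Name="Contoso")
customers.insert("A", "3", CustomerId=42, CreditRating=7)

entity = customers.get("A", "3")
```

The two entities in partition A carry different properties, mirroring the schema-free behavior described earlier.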
This structure lets tables be big (a single table can contain up to 100 terabytes of data), and it
allows fast access to the data they contain. It also brings limitations, however. For example, there's no
support for transactional updates that span tables or even partitions in a single table. A set of
updates to a table can only be grouped into an atomic transaction if all of the entities involved are in
the same partition. There's also no efficient way to query a table based on the value of its properties,
since only the partition key and row key are indexed, nor is there support for joins across multiple
tables. And unlike relational databases, tables have no support for stored procedures.
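The single-partition rule for atomic updates can be expressed as a simple check. The sketch below is a hypothetical in-memory illustration of that rule (the `apply_batch` function and its data layout are assumptions), not the service's actual batch API.

```python
def apply_batch(partitions: dict, updates: list) -> None:
    """Apply all updates together, or raise without changing anything.

    `partitions` maps partition key -> {row key -> entity dict}.
    `updates` is a list of (partition_key, row_key, entity) tuples.
    """
    partition_keys = {pk for pk, _, _ in updates}
    if len(partition_keys) > 1:
        raise ValueError(
            f"batch spans partitions {sorted(partition_keys)}; "
            "an atomic batch must target a single partition"
        )
    # Validation passed, so every update in the batch can be applied.
    for pk, rk, entity in updates:
        partitions.setdefault(pk, {})[rk] = entity


store = {}
apply_batch(store, [("A", "1", {"Name": "Contoso"}),
                    ("A", "2", {"Name": "Fabrikam"})])

try:
    apply_batch(store, [("A", "3", {"Name": "X"}), ("B", "1", {"Name": "Y"})])
    rejected = False
except ValueError:
    rejected = True
```

Because the check runs before any write, the rejected batch leaves the store untouched. This is why entities that must change together are usually given the same partition key.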
Windows Azure Table Storage is a good choice for applications that need fast, cheap access to large
amounts of loosely structured data. For example, an Internet application that stores profile
information for lots of users might use tables. Fast access is important in this situation, and the
application probably doesn't need the full power of SQL. Giving up this functionality to gain speed
and size can sometimes make sense, and so Table Storage is just the right solution for some
problems.
Hadoop
Organizations have been building data warehouses for decades. These collections of information,
most often stored in relational tables, let people work with and learn from data in many different
ways. With SQL Server, for instance, it's common to use tools such as SQL Server Analysis Services to
do this.
But suppose you want to do analysis on non-relational data. Your data might take many forms:
information from sensors or RFID tags, log files in server farms, clickstream data produced by web
applications, images from medical diagnostic devices, and more. This data might also be really big,
too big to be used effectively with a traditional data warehouse. Big data problems like this, rare just
a few years ago, have now become quite common.
To analyze this kind of big data, our industry has largely converged on a single solution: the open-
source technology Hadoop. Hadoop runs on a cluster of physical or virtual machines, spreading the
data it works on across those machines and processing it in parallel. The more machines Hadoop has
to use, the faster it can complete whatever work it's doing.
This kind of problem is a natural fit for the public cloud. Rather than maintaining an army of on-
premises servers that might sit idle much of the time, running Hadoop in the cloud lets you create
(and pay for) VMs only when you need them. Even better, more and more of the big data that you
want to analyze with Hadoop is created in the cloud, saving you the trouble of moving it around. To
help you exploit these synergies, Microsoft provides a Hadoop service on Windows Azure. Figure
7 shows the most important components of this service.

Figure 7: Hadoop on Windows Azure runs MapReduce jobs that process data in parallel using
multiple virtual machines.
To use Hadoop on Windows Azure, you first ask this cloud platform to create a Hadoop cluster,
specifying the number of VMs you need. Setting up a Hadoop cluster yourself is a non-trivial task,
and so letting Windows Azure do it for you makes sense. When you're done using the cluster, you
shut it down. There's no need to pay for compute resources that you aren't using.
A Hadoop application, commonly called a job, uses a programming model known as MapReduce. As
the figure shows, the logic for a MapReduce job runs simultaneously across many VMs. By
processing data in parallel, Hadoop can analyze data much more rapidly than single-machine
solutions.
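The MapReduce model itself is small enough to sketch in plain Python. The example below runs in a single process and only mimics the three phases (map, shuffle, reduce); it does not use Hadoop, and the function names and sample log lines are made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    # In a real cluster each map call could run on a different VM;
    # here they run one after another in a single process.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


log_lines = ["GET /home", "GET /cart", "POST /cart"]
counts = run_job(log_lines)
```

Each call to `map_phase` depends only on its own record, which is what allows a cluster to spread the work across many machines.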
On Windows Azure, the data a MapReduce job works on is typically kept in blob storage. In Hadoop,
however, MapReduce jobs expect data to be stored in the Hadoop Distributed File System (HDFS).
HDFS is similar to Blob Storage in some ways; it replicates data across multiple physical servers, for
example. Rather than duplicate this functionality, Hadoop on Windows Azure instead exposes Blob
Storage through the HDFS API, as the figure shows. While the logic in a MapReduce job thinks it's
accessing ordinary HDFS files, the job is in fact working with data streamed to it from blobs. And to
support the case where multiple jobs are run over the same data, Hadoop on Windows Azure also
allows copying data from blobs into full HDFS running in the VMs.
MapReduce jobs are commonly written in Java today, an approach that Hadoop on Windows Azure
supports. Microsoft has also added support for creating MapReduce jobs in other languages,
including C#, F#, and JavaScript. The goal is to make this big data technology more easily accessible
to a larger group of developers.
Along with HDFS and MapReduce, Hadoop includes other technologies that let people analyze data
without writing a MapReduce job themselves. For example, Pig is a high-level language designed for
analyzing big data, while Hive offers a SQL-like language called HiveQL. Both Pig and Hive actually
generate MapReduce jobs that process HDFS data, but they hide this complexity from their users.
Both are provided with Hadoop on Windows Azure.
Microsoft also provides a HiveQL driver for Excel. Using an Excel add-in, business analysts can create
HiveQL queries (and thus MapReduce jobs) directly from Excel, then process and visualize the results
using PowerPivot and other Excel tools. Hadoop on Windows Azure includes other technologies as
well, such as the machine learning libraries Mahout, the graph mining system Pegasus, and more.
Big data analysis is important, and so Hadoop is also important. By providing Hadoop as a managed
service on Windows Azure, along with links to familiar tools such as Excel, Microsoft aims at making
this technology accessible to a broader set of users.
More broadly, data of all kinds is important. This is why Windows Azure includes a range of options
for data management and business analytics. Whatever application you're trying to create, it's likely
that you'll find something in this cloud platform that will work for you.

Lecture 6:
Windows Azure SQL Database
Microsoft Windows Azure SQL Database is a cloud-based relational database service that is built on SQL
Server technologies and runs in Microsoft data centers on hardware that is owned, hosted, and
maintained by Microsoft. This topic provides an overview of Windows Azure SQL Database and describes
some ways in which it is different from SQL Server.
Similarities and Differences
Similar to an instance of SQL Server on your premises, Windows Azure SQL Database exposes a tabular
data stream (TDS) interface for Transact-SQL-based database access. This allows your database
applications to use Windows Azure SQL Database in the same way that they use SQL Server. Because
Windows Azure SQL Database is a service, administration in Windows Azure SQL Database is slightly
different.
Unlike administration for an on-premise instance of SQL Server, Windows Azure SQL Database abstracts
the logical administration from the physical administration; you continue to administer databases, logins,
users, and roles, but Microsoft administers the physical hardware such as hard drives, servers, and storage.
This approach helps Windows Azure SQL Database provide a large-scale multi-tenant database service
that offers enterprise-class availability, scalability, security, and self-healing.
Because Microsoft handles all of the physical administration, there are some differences between
Windows Azure SQL Database and an on-premise instance of SQL Server in terms of administration,
provisioning, Transact-SQL support, programming model, and features. For more information,
see Transact-SQL Support (Windows Azure SQL Database) and Tools and Utilities Support (Windows Azure
SQL Database).
Logical Administration and Physical Administration
Although Windows Azure SQL Database plays an active role in managing the physical resources of the
database, the DBA plays a very important role in administering SQL Database-based database
applications. Using Windows Azure SQL Database, DBAs manage schema creation, statistics management,
index tuning, query optimization, and security administration (logins, users, roles, and so on). For more
information about security administration in Windows Azure SQL Database, see Managing Databases and
Logins in Windows Azure SQL Database.
Database administration in Windows Azure SQL Database differs most from SQL Server in terms of
physical administration. Windows Azure SQL Database automatically replicates all data to provide high
availability. Windows Azure SQL Database also manages load balancing and, in case of a server failure,
transparent fail-over.
Because Windows Azure SQL Database provides this level of physical administration, you cannot control its
physical resources. For example, you cannot specify the physical hard drive or file group where a
database or index will reside. Because the computer file system is not accessible and all data is
automatically replicated, SQL Server backup and restore commands are not applicable to Windows Azure
SQL Database.
Note
SQL Database allows you to back up your database by copying it to a new database in SQL Database.
For more information, see Copying Databases in Windows Azure SQL Database.
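As a rough illustration of the copy-based backup described in this note, the following Python sketch only builds the Transact-SQL text; it does not connect to any server. The helper names and the database names are hypothetical, and the `CREATE DATABASE ... AS COPY OF` form should be checked against the Copying Databases topic before use.

```python
def quote_identifier(name: str) -> str:
    """Bracket-quote a T-SQL identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def copy_database_statement(source: str, destination: str) -> str:
    """Build the statement that creates `destination` as a copy of `source`."""
    return (
        f"CREATE DATABASE {quote_identifier(destination)} "
        f"AS COPY OF {quote_identifier(source)};"
    )


statement = copy_database_statement("Sales", "Sales_backup_20120101")
```

The resulting text can be sent over any ordinary Transact-SQL connection, for example with SQLCMD.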
Although backup and restore commands are not available, you can also use SQL Server Integration
Services and the SQLCMD utility to bulk copy data. For more information about using SQLCMD with
Windows Azure SQL Database, see How to: Connect to Windows Azure SQL Database Using sqlcmd.
Provisioning
When preparing an on-premises SQL Server deployment, it may be the role of the DBA or IT department
to prepare and configure the required hardware and software. When using Windows Azure SQL Database,
these tasks are performed by the SQL Database provisioning process.
You can begin provisioning your SQL Databases after you create a Windows Azure platform account. This
account allows you to access all the services, such as Windows Azure, Windows Azure AppFabric, and
Windows Azure SQL Database, and is used to set up and manage your subscriptions.
Each SQL Database subscription may be bound to one or more SQL Database servers at the Microsoft
data center. Your SQL Database server is an abstraction that defines a grouping of databases. To enable
load balancing and high availability, databases associated with your SQL Database server may reside on
separate physical computers at the Microsoft data center.
For more information about provisioning, see Windows Azure SQL Database Provisioning Model.
Transact-SQL Support
Many SQL Server Transact-SQL statements have parameters that allow you to specify file groups or
physical file paths. These types of parameters are not supported in Windows Azure SQL Database because
they have dependencies on the physical configuration. In such cases, the command is considered partially
supported. For more information about Transact-SQL support, see Transact-SQL Support (Windows Azure
SQL Database).
Features and Types
Windows Azure SQL Database does not support all of the features and data types found in SQL Server.
Analysis Services, Replication, and Service Broker are not currently provided as services on the Windows
Azure platform.
Because Windows Azure SQL Database performs the physical administration, any statements and options
that attempt to directly manipulate physical resources will be blocked, such as Resource Governor, file
group references, and some physical server DDL statements. It is also not possible to set server options
and SQL trace flags or use the SQL Server Profiler or the Database Tuning Advisor utilities.
Windows Azure SQL Database supports many SQL Server 2008 data types; it does not support data types
that have been deprecated from SQL Server 2008. For more information about data type support in
Windows Azure SQL Database, see Data Types (Windows Azure SQL Database). For more information
about SQL Server 2008 deprecated types, see Deprecated Database Engine Features in SQL Server 2008.

Compare SQL Server with Windows Azure SQL Database
Windows Azure SQL Database is a cloud-based relational database service from Microsoft. SQL Database provides
relational database functionality as a utility service. Cloud-based database solutions such as SQL Database can
provide many benefits, including rapid provisioning, cost-effective scalability, high availability, and reduced
management overhead. This paper provides an architectural overview of SQL Database, and describes how you can
use SQL Database to augment your existing on-premises data infrastructure or as your complete database solution.
Last Reviewed: 8/26/2011
Table of Contents

Similarities and Differences
Logical Administration vs. Physical Administration
Provisioning
Transact-SQL Support
Features and Types
Key Benefits of the Service
o Self-Managing
o High Availability
o Scalability
o Familiar Development Model
o Relational Data Model

Similarities and Differences
Similar to an instance of SQL Server on your premises, SQL Database exposes a tabular data stream (TDS) interface for
Transact-SQL-based database access. This allows your database applications to use SQL Database in the same way
that they use SQL Server. Since SQL Database is a service, administration in SQL Database is slightly different.
Unlike administration for an on-premise instance of SQL Server, SQL Database abstracts the logical administration
from the physical administration; you continue to administer databases, logins, users, and roles, but Microsoft
administers and configures the physical hardware such as hard drives, servers, and storage. This approach helps SQL
Database provide a large-scale multi-tenant database service that offers enterprise-class availability, scalability,
security, and self-healing.
Since Microsoft handles all of the physical administration, there are some differences between SQL Database and an
on-premise instance of SQL Server in terms of administration, provisioning, Transact-SQL support, programming
model, and features. For more information, see Guidelines and Limitations (Windows Azure SQL Database) .

Logical Administration vs. Physical Administration
Although SQL Database plays an active role in managing the physical resources of the database, the DBA plays a very
important role in administering SQL Database-based database applications. Using SQL Database, DBAs manage
schema creation, statistics management, index tuning, query optimization, and security administration (logins, users,
roles, etc.). For more information about security administration in SQL Database, see Managing Databases and Logins
in Windows Azure SQL Database .
Database administration in SQL Database differs most from SQL Server in terms of physical administration. SQL
Database automatically replicates all data to provide high availability. SQL Database also manages load balancing
and, in case of a server failure, transparent fail-over to a healthy machine hosting one of the backup copies of your
database.
Because SQL Database provides this level of physical administration, you cannot control its physical resources. For
example, you cannot specify the physical hard drive or file group where a database or index will reside. Because the
computer file system is not accessible and all data is automatically replicated, SQL Server backup and restore
commands are not applicable to SQL Database. The SQL Database service still backs up all databases; however, these
backups are not accessible to regular users. Access to them is a feature that may be offered in the future.
Starting with SQL Database Service Update 4, SQL Database allows you to back up your database by copying it to a
new database in SQL Database. For more information, see Copying Databases in Windows Azure SQL Database .
For more information on the available options to transfer data to SQL Database, see Migrating Databases to Windows
Azure SQL Database .

Provisioning
When preparing an on-premises SQL Server deployment, it may be the role of the DBA or IT department to prepare
and configure the required hardware and software. When using SQL Database, these tasks are performed by the SQL
Database provisioning process.
You can begin provisioning your SQL Databases after you create a Windows Azure Platform account. This account
allows you to access all the services, such as Windows Azure, AppFabric, and SQL Database, and is used to set up and
manage your subscriptions.
Each SQL Database subscription is bound to one SQL Database server within one of the Microsoft data centers. Your
SQL Database server is an abstraction that defines a grouping of databases. To enable load-balancing and high
availability, databases associated with your SQL Database server may reside on separate physical computers within
the Microsoft data center.

For more information about provisioning, see Windows Azure SQL Database Provisioning Model .

Transact-SQL Support
Transact-SQL is a language that contains commands used to administer instances of SQL Server including creating
and managing all objects in an instance of SQL Server, and inserting, retrieving, modifying, and deleting all data in
tables. Applications can communicate with an instance of SQL Server by sending Transact-SQL statements to the
server. Windows Azure SQL Database supports a subset of Transact-SQL for SQL Server. For more information about
Transact-SQL support, see Transact-SQL Support (Windows Azure SQL Database) .

Features and Types
SQL Database does not support all of the features and data types found in SQL Server. Analysis Services, Replication,
and Service Broker are not currently provided as services on SQL Database. You can connect from an on-premises
Analysis Services instance to SQL Database, and SQL Database can be used as either a data source or a destination. At the
time this article was updated, the Customer Technology Preview of Windows Azure SQL Reporting was also available. SQL
Reporting is a cloud-based reporting service built on SQL Database, SQL Server, and SQL Server Reporting Services
technologies. You can publish, view, and manage reports that display data from SQL Database data sources.
Because SQL Database performs the physical administration, any statements and options that attempt to directly
manipulate physical resources will be blocked, such as Resource Governor, file group references, and some physical
server DDL statements. It is also not possible to set server options and SQL trace flags or use the SQL Server Profiler
or the Database Tuning Advisor utilities.

Key Benefits of the Service
The benefits of using SQL Database include manageability, high availability, scalability, a familiar development model,
and a relational data model.
Self-Managing
SQL Database offers the scale and functionality of an enterprise data center without the administrative overhead that
is associated with on-premise instances of SQL Server. This self-managing capability enables organizations to
provision data services for applications throughout the enterprise without adding to the support burden of the central
IT department or distracting technology-savvy employees from their core tasks in order to maintain a departmental
database application.
With SQL Database, you can provision your data storage in minutes. This reduces the initial costs of data services by
enabling you to provision only what you need. When your needs change, you can easily extend your cloud-based
data storage to meet those needs.
High Availability
SQL Database is built on proven Windows Server and SQL Server technologies, and is flexible enough to cope with
any variations in usage and load. The service replicates multiple redundant copies of your data to multiple physical
servers to maintain data availability and business continuity. In the case of a hardware failure, SQL Database provides
automatic failover to ensure availability for your application.
Scalability
A key advantage of SQL Database is the ease with which you can scale your solution. As data grows, databases need
to either scale up or scale out. Scaling up always has a ceiling, whereas scaling out has virtually no limit. A common scale
out technique is data-partitioning. After partitioning your data, the service scales as your data grows. A pay-as-you-
grow pricing model makes sure that you only pay for the storage that you use, so that you can also scale down the
service when you do not need it.
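Data partitioning moves some work into the application, which must route each row to a database and combine results itself. The sketch below illustrates this with in-memory SQLite databases standing in for separate SQL Databases; the hash-based routing rule, the `orders` table, and all function names are assumptions chosen for the example.

```python
import sqlite3
import zlib


def shard_for(customer_id: str, shard_count: int) -> int:
    """Pick a shard with a stable hash (Python's built-in hash() is salted)."""
    return zlib.crc32(customer_id.encode("utf-8")) % shard_count


# Each in-memory SQLite database stands in for one separate SQL Database.
shards = [sqlite3.connect(":memory:") for _ in range(3)]
for db in shards:
    db.execute("CREATE TABLE orders (customer_id TEXT, amount INTEGER)")


def insert_order(customer_id: str, amount: int) -> None:
    db = shards[shard_for(customer_id, len(shards))]
    db.execute("INSERT INTO orders VALUES (?, ?)", (customer_id, amount))


def total_amount() -> int:
    # No single query spans all databases, so the application asks each
    # shard for its partial result and combines the answers itself.
    return sum(
        db.execute("SELECT COALESCE(SUM(amount), 0) FROM orders").fetchone()[0]
        for db in shards
    )


for customer, amount in [("alice", 30), ("bob", 20), ("carol", 50)]:
    insert_order(customer, amount)

grand_total = total_amount()
```

Once the routing function and the combine step exist, adding capacity is mostly a matter of changing the number of shards, although existing rows would then need to be redistributed.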
Familiar Development Model
When developers create on-premise applications that use SQL Server, they use client libraries such as ADO.NET and ODBC,
which use the tabular data stream (TDS) protocol to communicate between client and server. SQL Database provides
the same TDS interface as SQL Server so that you can use the same tools and libraries to build client applications for
data that is stored in SQL Database. For more about TDS, see Network Protocols and TDS Endpoints .
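Because the protocol is the same, pointing an existing client at SQL Database is largely a matter of changing the connection string. The helper below is a sketch that only assembles an ODBC-style string; the server, database, and user names are placeholders, and the driver name, host suffix, and `user@server` login form are assumptions that should be verified against the current connection guidance.

```python
def odbc_connection_string(server: str, database: str,
                           user: str, password: str) -> str:
    """Assemble an ODBC-style connection string for a SQL Database server.

    `server` is the short server name; values containing ';' are not
    escaped in this sketch.
    """
    parts = {
        "Driver": "{SQL Server Native Client 10.0}",
        "Server": f"tcp:{server}.database.windows.net,1433",
        "Database": database,
        "Uid": f"{user}@{server}",
        "Pwd": password,
        "Encrypt": "yes",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


conn_str = odbc_connection_string("myserver", "Sales", "appuser", "<password>")
```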
Relational Data Model
SQL Database will seem very familiar to developers and administrators because data is stored in SQL Database just
like it is stored in SQL Server, by using Transact-SQL. Conceptually similar to an on-premise instance of SQL Server, a
SQL Database server is a logical group of databases that acts as an authorization boundary.
Within each SQL Database server, you can create multiple databases that have tables, views, stored procedures,
indices, and other familiar database objects. This data model makes good use of your existing relational database
design and Transact-SQL programming skills, and simplifies the process of migrating existing on-premise database
applications to SQL Database. For more about Transact-SQL and its relationship to SQL Database, see Transact-SQL
Support (Windows Azure SQL Database) .
SQL Database servers and databases are virtual objects that do not correspond to physical servers and databases. By
insulating you from the physical implementation, SQL Database enables you to spend time on your database design
and adding additional value to the business.
The following table provides a high-level comparison between SQL Database and SQL Server.
Feature: Data Storage
SQL Server (On-premise): No size limits as such
SQL Database:
* The Web Edition Database is best suited for small Web applications and workgroup or departmental
applications. This edition supports a database with a maximum size of 1 or 5 GB of data.
* The Business Edition Database is best suited for independent software vendors (ISVs), line-of-business
(LOB) applications, and enterprise applications. This edition supports a database of up to 150 GB of data,
in increments of 10 GB.
Exact size and pricing information can be obtained at Pricing Overview .
Mitigation: An archival process can be created where older data can be migrated to another database in SQL
Database or on premise. Because of the above size constraints, one of the recommendations is to partition
the data across databases. Creating multiple databases will allow you to take maximum advantage of the
computing power of multiple nodes. The biggest value in the Azure model is the elasticity of being able to
create as many databases as you need when your demand peaks, and to delete/drop the databases as your
demand subsides. The biggest challenge is writing the application to scale across multiple databases. Once
this is achieved, the logic can be extended to scale across N number of databases.

Feature: Edition
SQL Server (On-premise): Express, Workgroup, Standard, Enterprise
SQL Database:
* Web Edition
* Business Edition
For more information, see Accounts and Billing in Windows Azure SQL Database .

Feature: Connectivity
SQL Server (On-premise): SQL Server Management Studio, SQLCMD
SQL Database:
* The SQL Server Management Studio from SQL Server 2008 R2 and SQL Server 2008 R2 Express can be used to
access, configure, manage and administer SQL Database. Previous versions of SQL Server Management Studio
are not supported.
* The Management portal for Windows Azure SQL Database
* SQLCMD
For more information, see Tools and Utilities Support .

Feature: Data Migration
SQL Database: For more information, see Migrating Databases to Windows Azure SQL Database .

Feature: Authentication
SQL Server (On-premise):
* SQL Authentication
* Windows Authentication
SQL Database: SQL Server Authentication only
Mitigation: Use SQL Server authentication

Feature: Schema
SQL Server (On-premise): No such limitation
SQL Database: SQL Database does not support heaps. ALL tables must have a clustered index before data can
be inserted.
Mitigation: Check all scripts to make sure all table creation scripts include a clustered index. If a table
is created without a clustered constraint, a clustered index must be created before an insert operation is
allowed on the table.

Feature: TSQL Supportability
SQL Database: Certain Transact-SQL commands are fully supported; some are partially supported while others
are unsupported.
* Supported Transact-SQL: http://msdn.microsoft.com/en-us/library/ee336270.aspx
* Partially Supported Transact-SQL: http://msdn.microsoft.com/en-us/library/ee336267.aspx
* Unsupported Transact-SQL: http://msdn.microsoft.com/en-us/library/ee336253.aspx

Feature: USE command
SQL Server (On-premise): Supported
SQL Database: In SQL Database, the USE statement does not switch between databases. To change databases,
you must directly connect to the database.
Mitigation: In SQL Database, each of the databases created by the user may not be on the same physical
server. So the application has to retrieve data separately from multiple databases and consolidate at the
application level.

Feature: Transactional Replication
SQL Server (On-premise): Supported
SQL Database: Not supported
Mitigation: You can use BCP or SSIS to get the data out on demand into an on-premise SQL Server. At the
time this article was updated, the Customer Technology Preview of SQL Data Sync was also available. You can
use it to keep on-premise SQL Server and SQL Database in sync, as well as two or more SQL Database servers.
For more information on available migration options, see Migrating Databases to Windows Azure SQL
Database .

Feature: Log Shipping
SQL Server (On-premise): Supported
SQL Database: Not supported

Feature: Database Mirroring
SQL Server (On-premise): Supported
SQL Database: Not supported

Feature: SQL Agent
SQL Server (On-premise): Supported
SQL Database: Cannot run SQL Server Agent or jobs on SQL Database
Mitigation: You can run SQL Server Agent on your on-premise SQL Server and connect to SQL Database.

Feature: Server options
SQL Server (On-premise): Supported
SQL Database: Some system views are supported. For more information, see System Views (Windows Azure SQL
Database) on MSDN.
Mitigation: The idea is that most system-level metadata is disabled, because it does not make sense in a
cloud model to expose server-level information.

Feature: Connection Limitations
SQL Server (On-premise): N/A
SQL Database: In order to provide a good experience to all SQL Database customers, your connection to the
service may be closed. For more information, see General Guidelines and Limitations on MSDN and SQL
Database Connection Management.

Feature: SQL Server Integration Services (SSIS)
SQL Server (On-premise): Can run SSIS on-premise
SQL Database: SSIS service not available on Azure platform
Mitigation: Run SSIS on site and connect to SQL Database with ADO.NET provider
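
The Schema mitigation in this comparison (checking that every table creation script includes a clustered index) can be partly automated. The following Python sketch is a heuristic, not a Transact-SQL parser: it assumes statements are separated by semicolons, treats a PRIMARY KEY as clustered unless it is declared NONCLUSTERED, and uses hypothetical table and index names.

```python
import re

IDENT = r"([\w.\[\]]+)"  # simple or bracketed, optionally schema-qualified


def tables_without_clustered_index(script: str) -> list:
    """Return names of tables created in `script` with no clustered index."""
    created, clustered = [], set()
    for statement in script.split(";"):
        text = " ".join(statement.split())  # collapse whitespace
        table = re.match(r"CREATE TABLE " + IDENT, text, re.IGNORECASE)
        index = re.match(
            r"CREATE (?:UNIQUE )?CLUSTERED INDEX \S+ ON " + IDENT,
            text, re.IGNORECASE,
        )
        if table:
            name = table.group(1)
            created.append(name)
            # Inline PRIMARY KEY or CLUSTERED constraint in the definition.
            if re.search(r"PRIMARY KEY(?! NONCLUSTERED)|(?<!NON)CLUSTERED",
                         text, re.IGNORECASE):
                clustered.add(name)
        elif index:
            # A separate CREATE CLUSTERED INDEX statement for the table.
            clustered.add(index.group(1))
    return [name for name in created if name not in clustered]


script = """
CREATE TABLE Orders (Id INT PRIMARY KEY, Amount INT);
CREATE TABLE AuditLog (Id INT, Message NVARCHAR(200));
CREATE TABLE Staging (Id INT, Payload NVARCHAR(200));
CREATE CLUSTERED INDEX IX_Staging_Id ON Staging (Id);
"""
heaps = tables_without_clustered_index(script)
```

Here only `AuditLog` is reported, because `Orders` has a primary key and `Staging` receives a clustered index in a later statement.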


Federations in Windows Azure SQL
Database (formerly SQL Azure)
Federations in SQL Database are a way to achieve greater scalability and performance from the database
tier of your application through horizontal partitioning. One or more tables within a database are split by
row and partitioned across multiple databases (federation members). This type of horizontal partitioning is
often referred to as sharding. It is primarily useful when you need to achieve scale, improve performance, or
manage capacity.
SQL Database can deliver scale, performance, and additional capacity through federation, and can do so
dynamically with no downtime; client applications can continue accessing data during repartitioning
operations with no interruption in service.
Federation Architecture
A federation is a collection of database partitions that are defined by a federation distribution scheme,
known as the federation scheme. The federation scheme defines a federation distribution key, which
determines the distribution of data to partitions within the federation. The federation distribution key
must be an INT, BIGINT, UNIQUEIDENTIFIER, or VARBINARY (up to 900 bytes) and specifies a range value.
There can only be one federation scheme and one federation distribution key for a federation.
The database partitions within a federation are known as federation members, and each member covers a
part, or all, of the range of values covered by the data type of the federation distribution key. Federated
tables are tables which are spread across federation members. Each federation member has its own
schema, and contains the federated table rows that correspond to the federation member's range. The
collection of all rows in a federation member that match a specific federation key value is called
a federation atomic unit. Each federation member contains many federation atomic units. A federation
member may also contain reference tables, which are tables that are not federation aware. Reference
tables are fully contained within a member, and often contain reference information that is retrieved in
combination with federated data.
A federation member provides physical separation between the data it contains and data stored in other
members. Each federation member has its own schema, which may temporarily diverge from the schema
of other members due to member specific processing such as performing a rolling schema upgrade
across all members.
While federation members are physically implemented as databases, they are logically referenced at the
application layer as a range of federation key values. For example, a federation member database that
contains rows associated with federation key values 50-100 would be logically accessed by specifying a
key value within that range rather than specifying the database name.
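The idea of addressing a member by key value rather than by database name can be sketched as a simple range lookup. This is only an illustration of the concept, not the SQL Database routing mechanism: the member names, the range boundaries, and the choice of half-open ranges (each member covers its low bound up to, but not including, the next low bound) are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical federation layout: each member covers [low, next_low).
# Boundaries and member names are illustrative only.
MEMBER_LOW_BOUNDS = [0, 50, 100, 200]
MEMBER_DATABASES = ["member_0", "member_1", "member_2", "member_3"]


def member_for_key(key: int) -> str:
    """Return the federation member whose range contains the key value."""
    if key < MEMBER_LOW_BOUNDS[0]:
        raise ValueError(f"key {key} is below the lowest range of the federation")
    # bisect_right finds the first low bound greater than the key;
    # the member just before that position owns the key.
    index = bisect_right(MEMBER_LOW_BOUNDS, key) - 1
    return MEMBER_DATABASES[index]


if __name__ == "__main__":
    for tenant_id in (7, 50, 99, 100, 5000):
        print(tenant_id, "->", member_for_key(tenant_id))
```

The application only ever supplies a key value; which physical database answers is decided by the lookup, so members can be split or renamed without changing the calling code.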
Federations are accessed through a federation root database, which represents the application boundary
of the federation. It functions as the logical endpoint for applications to connect to a federation by
routing connections to the appropriate federation member based on the specified federation key value.
Each root database may contain multiple federations, each with its own federation schema. It may also
contain global data, such as users, passwords, roles, or other application specific data.
The following diagrams illustrate the logical and physical model for federations:

Design Considerations
When designing a federation, one of the most important design decisions is what value to federate on.
Ideally you want to select a key that allows you to federate data from multiple, related tables so related
rows are stored together. For example, in the case of a multi-tenant application you might select the
tenant_id. The rows within each federated table that specify the same tenant_id value would be stored in
the same federation atomic unit.
You must also consider how to insert new records in such a way that all federation members are equally
utilized, instead of storing all new records in one member. Determining how to distribute new data
among federation members must be handled at the application layer.
Since there is a physical separation of data contained in different federation members and SQL Database
doesn't support join operations across databases, your application must implement the logic for joining
data from multiple federation members or multiple federations. For example, a query that needs to join
data from two federations would need to perform separate queries against each and join the data within
the application. The same holds true for aggregating data across multiple shards within a single
federation, such as obtaining a count of all rows contained within the federation.
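The application-side aggregation and join described above might look roughly like the following sketch. It uses three in-memory SQLite databases as stand-ins for federation members, purely so the example is runnable; the table name, column names, and tenant names are hypothetical, and a real application would open connections to SQL Database members instead.

```python
import sqlite3


def make_member(rows):
    """Create one stand-in federation member holding some order rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (tenant_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


# Three separate databases play the role of three federation members.
members = [
    make_member([(1, 10.0), (2, 5.0)]),
    make_member([(60, 7.5)]),
    make_member([(150, 1.0), (150, 2.0), (170, 3.0)]),
]


def count_all_rows(connections):
    """Run the same COUNT on every member and add the results in the application."""
    total = 0
    for conn in connections:
        (count,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
        total += count
    return total


def join_with_tenants(connections, tenant_names):
    """Join member rows with tenant data that was fetched from somewhere else."""
    joined = []
    query = "SELECT tenant_id, amount FROM orders ORDER BY tenant_id, amount"
    for conn in connections:
        for tenant_id, amount in conn.execute(query):
            if tenant_id in tenant_names:
                joined.append((tenant_names[tenant_id], amount))
    return joined


if __name__ == "__main__":
    print(count_all_rows(members))
    print(join_with_tenants(members, {1: "Contoso", 150: "Fabrikam"}))
```

Each member is queried separately and the combining step happens in application code, which is exactly the work the database would otherwise do in a cross-database join.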

Identity: Windows Azure Active Directory
Windows Azure Active Directory (Windows Azure AD) is a modern, REST-based service that provides
identity management and access control capabilities for your cloud applications. Now you have one
identity service across Windows Azure, Microsoft Office 365, Dynamics CRM Online, Windows Intune
and other third-party cloud services. Windows Azure Active Directory provides a cloud-based identity
provider that integrates easily with your on-premises AD deployments and fully supports third-party
identity providers.
Use Windows Azure AD to:
Integrate with your on-premises Active Directory
Quickly extend your existing on-premises Active Directory to apply policy and control and
authenticate users with their existing corporate credentials to Windows Azure and other cloud
services.
Offer access control for your applications
Easily manage access to your applications based on centralized policy and rules. Ensure consistent
and appropriate access to your organization's applications is maintained to meet critical internal
security and compliance needs. Windows Azure AD Access Control provides developers centralized
authentication and authorization for applications in Windows Azure using either consumer identity
providers or your on-premises Windows Server Active Directory.
Build social connections across the enterprise
Windows Azure AD Graph is an innovative social enterprise graph providing an easy RESTful
interface for accessing objects such as Users, Groups, and Roles with an explorer view for easily
discovering information and relationships.
Provide single sign-on across your cloud applications
Provide your users with a seamless, single sign-on experience across Microsoft Online Services, third
party cloud services and applications built on Windows Azure with popular web identity providers
like Windows Live ID, Google, Yahoo!, and Facebook.

Windows Azure AppFabric Access Control

Access Control is part of the AppFabric set of middleware services in Windows Azure. It provides
a central point for handling federated security and access control to your services and
applications, running in the cloud or on premise. AppFabric Access Control has built in support
for federating against AD FS 2.0 or any custom identity provider that supports the WS-
Federation protocol.

In addition to supporting web based and SOAP based federation scenarios, Access Control also
supports federation using the OAuth protocol and REST based services. It has a web based
management portal as well as OData based management services for configuring and managing
the service.

The Access Control service has a rule engine that can be used to transform the incoming claims,
thus being able to translate from a certain set of claims to another set of claims. It can also
translate token formats to/from SAML 1.1, SAML 2.0 and Simple Web Token (SWT) formats.
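To make the idea of a claims-transforming rule engine concrete, here is a minimal sketch. It is not the Access Control rule engine or its API: the Claim and Rule shapes, the wildcard convention (None matches anything), and the issuer name "access-control" are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    issuer: str
    claim_type: str
    value: str


@dataclass(frozen=True)
class Rule:
    # None acts as a wildcard for the corresponding input field (an assumption).
    in_issuer: Optional[str]
    in_type: Optional[str]
    in_value: Optional[str]
    out_type: str
    out_value: Optional[str] = None  # None means "pass the input value through"

    def matches(self, claim: Claim) -> bool:
        return (
            self.in_issuer in (None, claim.issuer)
            and self.in_type in (None, claim.claim_type)
            and self.in_value in (None, claim.value)
        )


def transform(claims, rules, out_issuer="access-control"):
    """Emit one output claim for every (incoming claim, matching rule) pair."""
    output = []
    for claim in claims:
        for rule in rules:
            if rule.matches(claim):
                value = claim.value if rule.out_value is None else rule.out_value
                output.append(Claim(out_issuer, rule.out_type, value))
    return output


if __name__ == "__main__":
    rules = [
        Rule("adfs", "group", "Sales", "role", "OrderManager"),
        Rule(None, "email", None, "email"),
    ]
    incoming = [
        Claim("adfs", "group", "Sales"),
        Claim("adfs", "email", "kim@example.com"),
        Claim("adfs", "group", "HR"),
    ]
    for claim in transform(incoming, rules):
        print(claim)
```

In this sketch a claim that matches no rule is dropped, so the relying party sees only the claims the rules explicitly produce.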

Using the Access Control service you set up trust between the service and every STS that you
want to federate with, including your own AD FS 2.0 STS:

Federated Identity through Windows Azure AppFabric Access Control

In this case your application running in Windows Azure still has only one point-to-point trust
relationship, to the Access Control service, and all the dependencies on the business partners are
managed by this service.

Below is a sequence diagram showing the generic flow for single sign-on using the Access
Control service. The terms used in the diagram map to our example as follows:

Client: the client in our domain or our business partner's domain.

Identity Provider: our Active Directory + AD FS 2.0 or our business partner's STS.

Relying party: our application running in Windows Azure.



The above diagram shows the flow when the application is a web application and the client is a
web browser client (also known as a passive client). If the application instead exposed web
services consumed by a locally installed smart client (an active client), for example WCF
services consumed by a WPF client, the first step in the sequence would be to log in directly to
the identity provider instead of being redirected to a login page.

Federation with public Identity Providers

Windows Azure Access Control also lets you use some of the largest public Internet services as
Identity Providers. Out of the box there is built-in support for using Windows Live ID, Facebook,
Yahoo and Google as identity providers. The Access Control service handles all protocol
transitions between the different providers, including OpenID 2.0 for Google and Yahoo,
Facebook Graph for Facebook, and WS-Federation for Windows Live ID. The service then
delivers a single SAML 1.1, SAML 2.0, or SWT token to your web application using the WS-
Federation protocol once a user is signed in.

This is of course a really interesting feature when you are building public facing consumer
oriented services, but it could also be useful in business-to-business scenarios. If, for example,
you are building a SaaS solution, using public identity providers like Facebook and Windows
Live ID could be the option for smaller businesses, while your enterprise customers could use
their own STS for federation.
Lecture 8: WINDOWS AZURE SERVICE BUS
The Service Bus securely relays messages to and from any Web service regardless of the device or
computer on which they are hosted, or whether that device is behind a firewall or NAT router. It provides
direct one-way or request/response (relayed) messaging as well as brokered, or asynchronous, messaging
patterns.
The Service Bus and Access Control together make hybrid, connected applications (applications that
communicate from behind firewalls, across the Internet, from hosted cloud servers, between rich desktops
and smart devices) easier to build, secure, and manage. Although you can build hybrid, connected
applications today, doing this often means you have to build important infrastructure components before
you can build the applications themselves. The Service Bus and Access Control provide several important
infrastructure elements so that you can more easily begin making your connected applications work now.
Software, Services, Clouds, and Devices
Today's business infrastructure is more feature-rich, connected, and interoperable than ever. People
access applications and data through stunning graphical programs running on traditional operating
systems; powerful Web applications that are running in browsers; and very small, intelligent computers
such as cell phones, netbooks, and other smart devices. Applications run locally, often on powerful servers
and server farms, and critical data is stored in performant databases, on desktops, and in the cloud.
The Internet connects people, applications, devices, and data across the world. Clouds of computing
power, such as Windows Azure, can help us reduce costs and increase scalability and manageability.
Web services can expose functionality to any caller safely and securely.
With these technologies, platforms, and devices, you can build significantly distributed, interactive
applications that can reach almost anyone, use almost any useful data, and do both securely and robustly
regardless of where the user is at the moment. Such hybrid, connected programs, including those often
referred to as Software plus Services, could use proprietary or private data behind a firewall and return
only the appropriate results to a calling device, or notify that device when a particular event occurs.
Fulfilling the Potential
However, building these distributed applications is currently very hard, and it should not be. There
are many reasons why, without a platform that solves these problems for you, it remains difficult to take
advantage of these wonderful technologies that could make a business more efficient, more productive,
and your customers happier.
Operating systems are still located (trapped is often a better word) on a local computer, typically behind
a firewall and perhaps network address translation (NAT) of some sort. This problem is true of smart
devices and phones, too.
As ubiquitous as Web browsers are, their reach into data is limited to an interactive exchange in a format
they understand.
Heterogeneous platforms, such as server applications, desktop or portable computers, smart devices, and
advanced cell phones, often interoperate at a rudimentary level, if at all. They can rarely share the same
code base or benefit from feature or component reuse.
Much of the most valuable data is stored in servers and embedded in applications that will not be
replaced immediately (sometimes referred to as legacy systems). The data in these systems are trapped
by technical limitations, security concerns, or privacy restrictions.
The Internet is not always the network being used. Private networks are an important part of the
application environment, and their insulation from the Internet is a simple fact of information technology
(IT) life.
Service Bus and Access Control are built to overcome these kinds of obstacles; they provide the fabric
that you can use to build, deploy, and manage the distributed applications that can help make the
promise of Software + Services become real. The Service Bus and Access Control services together are
highly-scalable services that are running in Microsoft data centers that can be used by applications
anywhere to securely bridge the gap between local applications behind a firewall, applications that are
running in the cloud, and client applications of any kind around the world. Another way of saying this is
that the Service Bus and Access Control are the glue that makes Software and Services work together.
Feature Overview
The Service Bus connects local, firewalled applications and data with applications in the cloud, rich
desktop applications, and smart, Web-enabled devices anywhere in the world.
Access Control Service is a claims-based access control service that can be used on most Web-enabled
devices to build interoperable, federated authentication and authorization into any connected application.
The following diagram illustrates this architecture.

Service Bus Features
Securely exposes to external callers Windows Communication Foundation (WCF)-based Web services that
are running behind firewalls and NAT routers -- without requiring you to open any inbound ports or
otherwise change firewall and router configurations.
Enables secure inbound communication from devices outside the firewall.
Provides a global namespace system that is location-independent: the name of a service in the Service
Bus provides no information about the final destination of the communication.
Provides a service registry for publishing and discovering service endpoint references in a service
namespace.
Provides relayed messaging capabilities: the relay service supports direct one-way messaging,
request/response messaging, and peer-to-peer messaging.
Provides brokered (or asynchronous) messaging capabilities: Senders and receivers do not have to be
online at the same time. The messaging infrastructure reliably stores messages until the receiving party is
ready to receive them. The core components of the brokered messaging infrastructure are Queues, Topics,
and Subscriptions.
Builds and hosts service endpoints that support:
o Exposing a Web service to remote users. Expose and secure a local Web service in the cloud without
managing any firewall or NAT settings.
o Eventing behavior. Listen for notifications on any device, anywhere in the world.
o Tunneling between any two endpoints to enable bidirectional streams.
The following diagram illustrates the capabilities of the Service Bus.


Relayed and Brokered Messaging
The messaging pattern associated with the initial releases of the Windows Azure Service Bus is referred to
as relayed messaging. The latest version of the Service Bus adds another type of messaging option known
as brokered messaging. The brokered messaging scheme can also be thought of as asynchronous
messaging.
Relayed Messaging
The central component of the Service Bus is a centralized (but highly load-balanced) relay service that
supports a variety of different transport protocols and Web services standards. This includes SOAP, WS-*,
and even REST. The relay service provides a variety of different relay connectivity options and can even
help negotiate direct peer-to-peer connections when it is possible. The Service Bus is optimized for .NET
developers who use the Windows Communication Foundation (WCF), both with regard to performance
and usability, and provides full access to its relay service through SOAP and REST interfaces. This makes it
possible for any SOAP or REST programming environment to integrate with it.
The relay service supports traditional one-way messaging, request/response messaging, and peer-to-peer
messaging. It also supports event distribution at Internet-scope to enable publish/subscribe scenarios and
bi-directional socket communication for increased point-to-point efficiency. In the relayed messaging
pattern, an on-premises service connects to the relay service through an outbound port and creates a
bi-directional socket for communication tied to a particular rendezvous address. The client can then
communicate with the on-premises service by sending messages to the relay service targeting the
rendezvous address. The relay service will then relay messages to the on-premises service through the
bi-directional socket already in place. The client does not need a direct connection to the on-premises
service nor is it required to know where the service resides, and the on-premises service does not need
any inbound ports open on the firewall.
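The rendezvous pattern described above can be sketched in a few lines. This is an in-process stand-in for illustration only, not the Service Bus or WCF relay binding API; the RelayService class, its method names, and the address string are all hypothetical.

```python
class RelayService:
    """In-process stand-in for the cloud relay (hypothetical, not the real API)."""

    def __init__(self):
        self._listeners = {}  # rendezvous address -> handler

    def register(self, rendezvous_address, handler):
        # Models the outbound connection the on-premises service opens to the relay.
        self._listeners[rendezvous_address] = handler

    def unregister(self, rendezvous_address):
        self._listeners.pop(rendezvous_address, None)

    def send(self, rendezvous_address, message):
        # Models a client request: the relay forwards it over the link
        # that the on-premises service already established.
        handler = self._listeners.get(rendezvous_address)
        if handler is None:
            raise ConnectionError(f"no listener at {rendezvous_address}")
        return handler(message)


if __name__ == "__main__":
    relay = RelayService()
    address = "sb://example/inventory"  # hypothetical rendezvous address

    # The on-premises service registers itself; no inbound port is involved.
    relay.register(address, lambda sku: f"stock level for {sku}: 42")

    # The client knows only the rendezvous address, not where the service runs.
    print(relay.send(address, "SKU-1001"))
```

Note that a send to an address with no registered listener fails immediately in this sketch, which mirrors the fact that relayed messaging needs both parties to be connected at the same time.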
You must initiate the connection between your on-premises service and the relay service, using a suite of
WCF relay bindings. Behind the scenes, the relay bindings map to new transport binding elements
designed to create WCF channel components that integrate with the Service Bus in the cloud.
Relayed messaging provides many benefits, but requires the server and client to both be online at the
same time in order to send and receive messages. This is not optimal for HTTP-style communication, in
which requests are typically not long-lived, nor for clients that connect only occasionally, such as
browsers, mobile applications, and so on. Brokered messaging supports decoupled communication, and
has its own advantages; clients and servers can connect when needed and perform their operations in an
asynchronous manner.
Brokered Messaging
In contrast to the relayed messaging scheme, brokered messaging can be thought of as asynchronous, or
temporally decoupled. Producers (senders) and consumers (receivers) do not have to be online at the
same time. The messaging infrastructure reliably stores messages until the consuming party is ready to
receive them. This allows the components of the distributed application to be disconnected, either
voluntarily (for example, for maintenance) or due to a component crash, without affecting the whole
system. Furthermore, the receiving application may only have to come online during certain times of the
day, such as an inventory management system that only is required to run at the end of the business day.
The core components of the Service Bus brokered messaging infrastructure are Queues, Topics, and
Subscriptions. These components enable new asynchronous messaging scenarios, such as temporal
decoupling, publish/subscribe, and load balancing. For more information about these structures, see the
next section.
As with the relayed messaging infrastructure, the brokered messaging capability is provided for WCF and
.NET Framework programmers and also via REST.
What are Service Bus Queues
Service Bus Queues support a brokered messaging communication model. When using queues,
components of a distributed application do not communicate directly with each other; instead, they
exchange messages via a queue, which acts as an intermediary. A message producer (sender) hands
off a message to the queue and then continues its processing. Asynchronously, a message consumer
(receiver) pulls the message from the queue and processes it. The producer does not have to wait for
a reply from the consumer in order to continue to process and send further messages. Queues
offer First In, First Out (FIFO) message delivery to one or more competing consumers. That is,
messages are typically received and processed by the receivers in the order in which they were
added to the queue, and each message is received and processed by only one message consumer.
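The competing-consumers behavior can be illustrated with a small in-process sketch. It uses Python's standard queue and threading modules as a stand-in for a Service Bus queue, so it shows the pattern only, not the Service Bus API; the function name and the sentinel-based shutdown are choices made for this example.

```python
import queue
import threading


def run_competing_consumers(messages, consumer_count=3):
    """Send messages through one FIFO queue to several competing consumers."""
    q = queue.Queue()
    processed = []           # (consumer_id, message) pairs
    lock = threading.Lock()
    stop = object()          # sentinel that tells a consumer to exit

    def consumer(consumer_id):
        while True:
            message = q.get()  # blocks until a message is available
            if message is stop:
                break
            with lock:
                processed.append((consumer_id, message))

    workers = [
        threading.Thread(target=consumer, args=(i,)) for i in range(consumer_count)
    ]
    for worker in workers:
        worker.start()

    # The producer hands off each message and continues without waiting for a reply.
    for message in messages:
        q.put(message)
    for _ in workers:
        q.put(stop)
    for worker in workers:
        worker.join()
    return processed


if __name__ == "__main__":
    results = run_competing_consumers([f"order-{n}" for n in range(6)])
    for consumer_id, message in results:
        print(f"consumer {consumer_id} processed {message}")
```

Messages leave the queue in the order they were added, and each one is handed to exactly one consumer; which consumer gets which message depends on scheduling and will vary between runs.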

Service Bus queues are a general-purpose technology that can be used for a wide variety of
scenarios:
Communication between web and worker roles in a multi-tier Windows Azure application
Communication between on-premises apps and Windows Azure hosted apps in a hybrid solution
Communication between components of a distributed application running on-premises in different
organizations or departments of an organization
Using queues can enable you to scale out your applications better, and enable more resiliency to
your architecture.
What are Service Bus Topics and Subscriptions
Service Bus topics and subscriptions support a publish/subscribe messaging
communication model. When using topics and subscriptions, components of a distributed
application do not communicate directly with each other; instead, they exchange messages via a
topic, which acts as an intermediary.

In contrast to Service Bus queues, where each message is processed by a single consumer, topics and
subscriptions provide a one-to-many form of communication, using a publish/subscribe pattern. It is
possible to register multiple subscriptions to a topic. When a message is sent to a topic, it is then
made available to each subscription to handle/process independently.
A topic subscription resembles a virtual queue that receives copies of the messages that were sent to
the topic. You can optionally register filter rules for a topic on a per-subscription basis, which allows
you to filter/restrict which messages to a topic are received by which topic subscriptions.
Service Bus topics and subscriptions enable you to scale to process a very large number of messages
across a very large number of users and applications.
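The one-to-many behavior and per-subscription filter rules can be sketched as follows. This is an in-memory stand-in for illustration, not the Service Bus API: the Topic class, its methods, and the use of plain Python predicates as filter rules are assumptions made for the example.

```python
from collections import deque


class Topic:
    """In-memory stand-in for a topic (hypothetical, not the Service Bus API)."""

    def __init__(self):
        self._subscriptions = {}  # name -> (filter rule, pending messages)

    def subscribe(self, name, rule=None):
        # rule is an optional predicate; None means "accept every message".
        self._subscriptions[name] = (rule, deque())

    def send(self, message):
        # Every subscription whose rule matches receives its own copy.
        for rule, pending in self._subscriptions.values():
            if rule is None or rule(message):
                pending.append(dict(message))

    def receive(self, name):
        # Each subscription is drained independently, like a virtual queue.
        _, pending = self._subscriptions[name]
        return pending.popleft() if pending else None


if __name__ == "__main__":
    orders = Topic()
    orders.subscribe("audit")  # sees everything
    orders.subscribe("large-orders", rule=lambda m: m["amount"] >= 1000)

    orders.send({"id": 1, "amount": 250})
    orders.send({"id": 2, "amount": 5000})

    print(orders.receive("audit"))
    print(orders.receive("audit"))
    print(orders.receive("large-orders"))
    print(orders.receive("large-orders"))
```

Receiving from one subscription does not remove the message from any other subscription, which is the key difference from the competing-consumers behavior of a queue.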
What is the Service Bus Relay
The Service Bus Relay service enables you to build hybrid applications that run in both a Windows
Azure datacenter and your own on-premises enterprise environment. The Service Bus relay facilitates
this by enabling you to securely expose Windows Communication Foundation (WCF) services that
reside within a corporate enterprise network to the public cloud, without having to open a firewall
connection or make intrusive changes to a corporate network infrastructure.

The Service Bus relay allows you to host WCF services within your existing enterprise environment.
You can then delegate listening for incoming sessions and requests to these WCF services to the
Service Bus running within Windows Azure. This enables you to expose these services to application
code running in Windows Azure, or to mobile workers or extranet partner environments. The Service
Bus allows you to securely control who can access these services at a fine-grain level. It provides a
powerful and secure way to expose application functionality and data from your existing enterprise
solutions and take advantage of it from the cloud.

LECTURE 7: NETWORKING AND CACHING IN WINDOWS AZURE
Windows Azure Networking
The easiest way to connect to Windows Azure applications and data is through an ordinary Internet
connection. But this simple solution isn't always the best approach. Windows Azure also provides
three more technologies for connecting users to Windows Azure datacenters:
Virtual Network
Connect
Traffic Manager
This article takes a look at each of these.
Table of Contents
Windows Azure Virtual Network
Windows Azure Connect
Windows Azure Traffic Manager

Windows Azure Virtual Network
Windows Azure lets you create virtual machines (VMs) that run in Microsoft datacenters. Suppose
your organization wants to use those VMs to run enterprise applications or other software that will
be used by your firm's employees. Maybe you want to create a SharePoint farm in the cloud, for
example, or run an inventory management application. To make life as easy as possible for your
users, you'd like these applications to be accessible just as if they were running in your own
datacenter.
There's a standard solution to this kind of problem: create a virtual private network (VPN).
Organizations of all sizes do this today to link, say, branch office computers to the main company
datacenter. This same approach can work with Windows Azure VMs, as Figure 1 shows.

Figure 1: Windows Azure Virtual Network allows creating a virtual network in the cloud that's
connected to your on-premises datacenter.
As the figure shows, Windows Azure Virtual Network lets you create a logical boundary around a
group of VMs, called a virtual network or VNET, in a Windows Azure datacenter. It then lets you
establish an IPsec connection between this VNET and your local network. The VMs in a VNET can be
created using Windows Azure Virtual Machines, Windows Azure Cloud Services, or both. In other
words, they can be VMs created using either Windows Azure's Infrastructure as a Service (IaaS)
technology or its Platform as a Service (PaaS) technology. Whatever choice you make, creating the
IPsec connection requires a VPN gateway device, specialized hardware that's attached to your local
network, and it also requires the services of your network administrator. Once this connection is in
place, the Windows Azure VMs running in your VNET look like just another part of your
organization's network.
As Figure 1 suggests, you allocate IP addresses for the Windows Azure VMs from the same IP
address space used in your own network. In the scenario shown here, which uses private IP
addresses, the VMs in the cloud are just another IP subnet. Software running on your local network
will see these VMs as if they were local, just as they do with traditional VPNs. And it's important to
note that because this connection happens at the IP level, the virtual and physical machines on both
sides can be running any operating system. Windows Azure VMs running Windows Server or Linux
can interact with on-premises machines running Windows, Linux, or other systems. It's also possible
to use mainstream management tools, including System Center and others, to manage the cloud
VMs and the applications they contain.
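Because the cloud VMs become just another subnet of your own address space, it can help to check a proposed VNET range before configuring it. The sketch below uses Python's standard ipaddress module; the function name and all address ranges are hypothetical examples, not values taken from Windows Azure.

```python
import ipaddress


def validate_cloud_subnet(corporate_space, existing_subnets, cloud_subnet):
    """Return True if the proposed subnet lies inside the corporate address
    space and does not overlap any subnet that is already in use."""
    space = ipaddress.ip_network(corporate_space)
    proposed = ipaddress.ip_network(cloud_subnet)
    if not proposed.subnet_of(space):
        return False
    return not any(
        proposed.overlaps(ipaddress.ip_network(subnet)) for subnet in existing_subnets
    )


if __name__ == "__main__":
    corporate = "10.0.0.0/8"                    # hypothetical private address space
    on_premises = ["10.1.0.0/16", "10.2.0.0/16"]  # hypothetical existing subnets

    for candidate in ("10.3.0.0/16", "10.2.128.0/17", "192.168.0.0/24"):
        ok = validate_cloud_subnet(corporate, on_premises, candidate)
        print(candidate, "usable" if ok else "rejected")
```

A range that passes both checks can be routed like any other internal subnet once the IPsec connection is in place.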
Using Windows Azure Virtual Network makes sense in many situations. As already mentioned, this
approach lets enterprise users more easily access cloud applications. An important aspect of this
ease of use is the ability to make the Windows Azure VMs part of an existing on-premises Active
Directory domain to give users single sign-on to the applications they run. You can also create an
Active Directory domain in the cloud if you prefer, then connect this domain to your on-premises
network.
Creating a VNET in a Windows Azure datacenter effectively gives you access to a large pool of on-
demand resources. You can create VMs on demand, pay for them while they're running, then remove
them (and stop paying) when you no longer need them. This can be useful for scenarios that need
fast access to a preconfigured machine, such as development teams building new software. Rather
than wait for a local administrator to set up the resources they need, they can create these resources
themselves in the public cloud.
And just as Virtual Network makes Windows Azure VMs appear local to on-premises resources, the
reverse is also true: Software running in your local network now appears to be local to applications
running in your Windows Azure VNET. Suppose you'd like to move an existing on-premises
application to Windows Azure, for example, because you've determined that it will be less expensive
to operate in the cloud. But what if the data that application uses is required by law to be stored on
premises? In a situation like this, using Virtual Network lets the cloud application see an on-premises
database system as if it were local, so accessing it becomes straightforward. Whatever scenario you
choose, the result is the same: Windows Azure becomes an extension of your own datacenter.

Windows Azure Connect
Sometimes, connecting your entire on-premises network to a group of Windows Azure VMs is the
right thing to do. Windows Azure Virtual Network is designed to solve this problem. But what if you
don't need a solution that's this general? Suppose instead that all you'd like to do is connect a single
Windows Azure application, or even a single VM, to a specific group of computers on your local
network. Addressing this problem is the goal of Windows Azure Connect, as Figure 2 shows.

Figure 2: Windows Azure Connect links one or more VMs in Windows Azure with a group of
on-premises machines running Windows.
Unlike Virtual Network, Connect doesn't require using a VPN gateway device, nor does it require the
services (or approval) of a network administrator. Instead, anybody with administrative access to a
Windows machine in the local network can install the required Windows Azure Connect software on
that machine. Once this is done, the software can create an IPsec link with designated Windows
Azure VMs.
As the figure shows, Connect doesn't link two networks together; the Windows Azure VMs retain
whatever IP addresses they already have. Instead, it creates direct IPsec connections between specific
on-premises Windows computers and specific Windows Azure VMs. (To work with existing firewall
settings, Connect actually sends IPsec on top of an SSL connection.) For Cloud Services applications,
you can choose one or more roles to connect to, and Windows Azure will make it possible to
communicate with each instance in those roles. For VMs created using Windows Azure Virtual
Machines, you can install the same Windows Azure Connect software used for on-premises
computers.
Windows Azure Connect is useful in a variety of situations. An application running on Windows Azure
might use Connect to link to an on-premises database system, for example, or a developer on the
local network might use Connect to domain-join a cloud VM to an on-premises environment. While
Connect isn't as general a solution as Virtual Network, it is significantly easier to set up. Developers
can do it without bothering their network admins and with no extra hardware. Which approach is
right for you depends on exactly what problems you need to solve.

Windows Azure Traffic Manager
Imagine that you've built a successful Windows Azure application. Your app is used by many people
in many countries around the world. This is a great thing, but as is so often the case, success brings
new problems. Here, for instance, your application most likely runs in multiple Windows Azure
datacenters in different parts of the world. How can you intelligently route traffic across these
datacenters so that your users always get the best experience?
Windows Azure Traffic Manager is designed to solve this problem. Figure 3 shows how.

Figure 3: Windows Azure Traffic Manager intelligently directs requests from users across
instances of an application running in different Windows Azure datacenters.
In this example, your application is running in VMs spread across four datacenters: two in the US,
one in Europe, and one in Asia. Suppose a user in Berlin wishes to access the application. If you're
using Traffic Manager, here's what happens.
As usual, the user's system looks up the DNS name of the application (step 1). This query is
redirected to the Windows Azure DNS system (step 2), which then looks up the Traffic Manager
policy for this application. Each policy is created by the owner of a particular Windows Azure
application, either through a graphical interface or a REST API. However it's created, the policy
specifies one of three options:
Performance: All requests are sent to the closest datacenter.
Failover: All requests are sent to the datacenter specified by the creator of this policy, unless that
datacenter is unavailable. In this case, requests are routed to other datacenters in the priority order
defined by the policy's creator.
Round Robin: All requests are spread equally across all datacenters in which the application is
running.
Once it has the right policy, Traffic Manager figures out which datacenter this request should go to
based on which of the three options is specified (step 3). It then returns the location of the chosen
datacenter to the user (step 4), who accesses that instance of the application (step 5).
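The three policy options can be sketched as a small selection function. This is a minimal illustration, not Traffic Manager's actual implementation: the datacenter names, the latency figures, the priority order, and the `choose_datacenter` helper are all hypothetical, and "closest" is approximated here by the lowest latency.

```python
from itertools import count

# Hypothetical datacenters with illustrative latencies (ms) seen from a user in Berlin.
LATENCY_MS = {"us-north": 120, "us-south": 130, "europe": 20, "asia": 250}

# Hypothetical failover priority order defined by the policy's creator.
FAILOVER_ORDER = ["us-north", "us-south", "europe", "asia"]

_round_robin_counter = count()


def choose_datacenter(policy, available):
    """Return the datacenter a request should go to under the given policy.

    `available` is the set of datacenters currently considered up.
    """
    if not available:
        raise RuntimeError("no datacenter is available")
    if policy == "performance":
        # Closest datacenter, approximated here by the lowest latency.
        return min(available, key=lambda dc: LATENCY_MS[dc])
    if policy == "failover":
        # First datacenter in priority order that is still available.
        return next(dc for dc in FAILOVER_ORDER if dc in available)
    if policy == "round_robin":
        # Spread successive requests evenly across the available datacenters.
        ordered = sorted(available)
        return ordered[next(_round_robin_counter) % len(ordered)]
    raise ValueError(f"unknown policy: {policy}")
```

Under these assumed numbers, the user in Berlin is sent to "europe" by the performance policy, while the failover policy keeps returning "us-north" for as long as it is available.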
For this to work, Traffic Manager must have a current picture of which instances of the application
are up and running in each datacenter. To make this possible, Traffic Manager periodically pings
each copy of the application via an HTTP GET, then records whether it receives a response. If an
application instance stops responding, Traffic Manager will stop sending traffic to that instance until
it resumes responding to pings.
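The monitoring step can be sketched as follows. This is only an illustration under stated assumptions: the `probe` and `healthy_endpoints` names are hypothetical, and the real service's probe interval, timeout, and retry rules are not described here.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return True if an HTTP GET to `url` yields a successful response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection errors, timeouts, and HTTP error statuses count as "down".
        return False


def healthy_endpoints(urls, probe_fn=probe):
    """Return the endpoints that answered the latest probe.

    Traffic is routed only to the returned endpoints; an endpoint that was
    dropped reappears as soon as it answers a later probe again.
    """
    return [url for url in urls if probe_fn(url)]
```

Passing the probe as a parameter keeps the routing logic testable without network access; a caller would rerun `healthy_endpoints` on a timer.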
Not every application is big enough or global enough to need Traffic Manager. For those that do,
however, this can be a quite useful service.

Caching in Windows Azure
Windows Azure Caching enables you to provision a cache in the cloud, to be used from any applications
or services that could benefit from Caching. ASP.NET applications can use Caching for the common
scenario of session state and output caching. Caching increases performance by temporarily storing
information from other backend sources. High performance is achieved by maintaining this cache in-
memory across multiple cache servers. For a Windows Azure solution, Caching can reduce the costs and
increase the scalability of other storage services such as SQL Database or Azure storage.
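A common way to apply this idea is the cache-aside pattern: check the cache first and go to the backend only on a miss. The sketch below uses a plain in-process dictionary as a stand-in for the cache. It does not use the Windows Azure Caching client API, and `TtlCache` and the loader function are hypothetical names.

```python
import time


class TtlCache:
    """In-memory stand-in for a cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}  # key -> (expiry_time, value)

    def get_or_load(self, key, loader):
        """Return the cached value for `key`, calling `loader(key)` on a miss."""
        entry = self._items.get(key)
        now = self._clock()
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the backend is not touched
        value = loader(key)  # miss or expired entry: read from the backend
        self._items[key] = (now + self._ttl, value)
        return value
```

Every hit is a request that never reaches SQL Database or storage, which is where the cost and scalability benefit comes from.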
There are two main ways to use Caching:
Caching (Preview) where Caching is deployed on web/worker roles of your application
Shared Caching where Caching is consumed as a managed service
Caching (Preview) on Roles
Windows Azure Caching (Preview) allows you to host Caching within your Azure roles. This capability is
also referred to as role-based Caching. There are two main deployment topologies for this type of
Caching: dedicated and co-located. In the dedicated topology, you define a worker role that is dedicated
to Caching. This means that all of the worker role's available memory is used for the Caching and
operating overhead. In a co-located topology, you use a percentage of available memory on application
roles for Caching. For example, you could assign 20% of the physical memory for Caching on each web
role instance. In both cases, you only pay for the Compute services required for the running role instances.
For more information, see Windows Azure Caching (Preview) FAQ.
Note
Caching (Preview) role-based Caching is not supported in production at this time.
Shared Caching
Windows Azure Shared Caching enables you to register a cache through the Windows Azure
Management Portal. These caches do not reside on your own roles. Instead, they reside on a group of
servers in a multitenant environment. You can access your cache with a Service URL and Authentication
token from the Management Portal. In this model, you pay for one of several cache offerings that vary in
memory, bandwidth, transactions, and client connections. For more information, see Windows Azure
Shared Caching FAQ.
Note
Windows Azure Caching features are a subset of the features of the on-premises caching solution of
Windows Server AppFabric. For more information, see Differences Between Caching On-Premises
and in the Cloud.
Important
Windows Azure Caching is designed for Windows Azure applications hosted in the cloud. This
architecture achieves the best throughput at the lowest latency. With Shared Caching, it is possible
to test on-premises code that accesses a Windows Azure cache, but this design is not supported for
production. On-premises applications can instead rely on an on-premises cache cluster that uses
Windows Server AppFabric.


LECTURE 8:
OTHER TOPICS
MEDIA SERVICES
What are Media Services?
Windows Azure Media Services form an extensible media platform that integrates the best of the Microsoft Media
Platform and third-party media components in Windows Azure. Media Services provide a media pipeline in the cloud
that enables industry partners to extend or replace component technologies. ISVs and media providers can use Media
Services to build end-to-end media solutions. This overview describes the general architecture and common
development scenarios for Media Services.
The following diagram illustrates the basic Media Services architecture.

Media Services Feature Support
The current release of Media Services provides the following feature set for developing media applications in the
cloud. For information on future releases, see Media Services Upcoming Releases: Planned Feature Support.
Ingest. Ingest operations bring assets into the system, for example by uploading them and encrypting them before
they are placed into Windows Azure Storage. By the RTM release, Media Services will offer integration with partner
components to provide fast UDP (User Datagram Protocol) upload solutions.
Encode. Encode operations include encoding, transforming and converting media assets. You can run encoding tasks
in the cloud using the Media Encoder that is included in Media Services. Encoding options include the following:
o Use the Windows Azure Media Encoder and work with a range of standard codecs and formats, including
industry-leading IIS Smooth Streaming, MP4, and conversion to Apple HTTP Live Streaming.
o Convert entire libraries or individual files with total control over input and output.
o A large set of supported file types, formats, and codecs (see Supported File Types for Media Services).
o Supported format conversions. Media Services enable you to convert ISO MP4 (.mp4) to Smooth Streaming
File Format (PIFF 1.3) (.ismv; .isma). You can also convert Smooth Streaming File Format (PIFF) to Apple HTTP
Live Streaming (.m3u8, .ts).
Protect. Protecting content means encrypting live streaming or on demand content for secure transport, storage, and
delivery. Media Services provide a DRM technology-agnostic solution for protecting content. Currently supported
DRM technologies are Microsoft PlayReady Protection and MPEG Common Encryption. Support for additional DRM
technologies will be available.
Stream. Streaming content involves sending it live or on demand to clients, or you can retrieve or download specific
media files from the cloud. Media Services provide a format-agnostic solution for streaming content. Media Services
provide streaming origin support for Smooth Streaming, Apple HTTP Live Streaming, and MP4 formats. Support for
additional formats will be available. You can also seamlessly deliver streaming content by using Windows Azure CDN
or a third-party CDN, which enables the option to scale to millions of users.
Media Services Development Scenarios
Media Services support several common media development scenarios as described in the following table.
Scenario: Description
Building end-to-end workflows: Build comprehensive media workflows entirely in the cloud. From
uploading media to distributing content, Media Services provide a range of components that can be
combined to handle specific application workflows. Current capabilities include upload, storage,
encoding, format conversion, content protection, and on-demand streaming delivery.
Building hybrid workflows: You can integrate Media Services with existing tools and processes. For
example, encode content on-site, then upload it to Media Services for transcoding into multiple formats
and deliver it through Windows Azure CDN or a third-party CDN. Media Services can be called
individually via standard REST APIs for integration with external applications and services.
Providing cloud support for media players: You can create, manage, and deliver media across multiple
devices (including iOS, Android, and Windows devices) and platforms.
Media Services Client Development
Extend the reach of your Media Services solution by using SDKs and player frameworks to build media client
applications. These clients are for developers who want to build Media Services applications that offer compelling
user experiences across a range of devices and platforms. Depending on the devices that you want to build client
applications for, there are options for SDKs and player frameworks available from Microsoft and other third-party
partners.
Setting up a Windows Azure account for Media Services
To set up your Media Services account, use the Windows Azure Management Portal (recommended). See the
topic How to Create a Media Services Account. After creating your account in the Management Portal, you are ready
to set up your computer for Media Services development.




CONTENT DELIVERY NETWORK (CDN)
The Windows Azure Content Delivery Network (CDN) caches Windows Azure blobs and the static content output of
compute instances at strategically placed locations to provide maximum bandwidth for delivering content to users.
You can enable CDN delivery for your content providers using the Windows Azure Platform Management Portal. CDN
is an add-on feature to your subscription and has a separate billing plan.
The CDN offers developers a global solution for delivering high-bandwidth content by caching the content at physical
nodes in the United States, Europe, Asia, Australia and South America. For a current list of CDN node locations,
see Windows Azure CDN Node Locations.
The benefits of using CDN to cache Windows Azure data include:
Better performance and user experience for end users who are far from a content source, and are using applications
where many internet trips are required to load content
Large distributed scale to better handle instantaneous high load, say, at the start of an event such as a product launch
To use the Windows Azure CDN you must have a Windows Azure subscription and enable the feature on the storage
account or hosted service in the Windows Azure Management Portal.
Note
Enabling the CDN may take up to 60 minutes to propagate worldwide.


When a request for an object is first made to the CDN, the object is retrieved directly from the Blob service or
from the hosted service. When a request is made using the CDN syntax, the request is redirected to the CDN
endpoint closest to the location from which the request was made to provide access to the object. If the object is not
found at that endpoint, then it is retrieved from the service and cached at the endpoint, where a time-to-live setting is
maintained for the cached object.
Caching content from Windows Azure blobs

Once the CDN is enabled on a Windows Azure storage account, any blobs that are in public containers and are
available for anonymous access will be cached via the CDN. Only blobs that are publicly available can be cached
with the Windows Azure CDN. To make a blob publicly available for anonymous access, you must denote its
container as public. Once you do so, all blobs within that container will be available for anonymous read access. You
have the option of making container data public as well, or restricting access only to the blobs within it. See Setting
Access Control for Containers for information on managing access control for containers and blobs.
For best performance, use CDN edge caching for delivering blobs less than 10 GB in size.
When you enable CDN access for a storage account, the Management Portal provides you with a CDN domain name
in the following format: http://<identifier>.vo.msecnd.net/. This domain name can be used to access blobs in a public
container. For example, given a public container named music in a storage account named myaccount, users can
access the blobs in that container using either of the following two URLs:
Windows Azure Blob service URL: http://myaccount.blob.core.windows.net/music/
Windows Azure CDN URL: http://<identifier>.vo.msecnd.net/music/
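The relationship between the two URL forms can be sketched as a small rewrite function. The function name is hypothetical, and the identifier used in any call is a placeholder for whatever value the Management Portal assigns in place of <identifier>.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_url_to_cdn_url(blob_url, cdn_identifier):
    """Rewrite a Blob service URL to the equivalent CDN URL.

    Only the host changes; the container and blob path stay the same.
    """
    parts = urlsplit(blob_url)
    if not parts.netloc.endswith(".blob.core.windows.net"):
        raise ValueError("not a Blob service URL")
    cdn_host = f"{cdn_identifier}.vo.msecnd.net"
    return urlunsplit((parts.scheme, cdn_host, parts.path, parts.query, parts.fragment))
```

For instance, with the made-up identifier "az1234", the blob URL http://myaccount.blob.core.windows.net/music/song.mp3 maps to http://az1234.vo.msecnd.net/music/song.mp3.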
Caching content from hosted services

You can cache objects to the CDN that are provided by a Windows Azure hosted service.
CDN for hosted services has the following constraints:
Should be used to cache static content.
Warning
Caching of highly volatile or truly dynamic content may adversely affect
your performance or cause content problems, all at increased cost.
The hosted service must be deployed in a production deployment.
The hosted service must provide the object on port 80 using HTTP.
The hosted service must place the content to be cached in, or delivered from, the /cdn folder on the hosted
service.
When you enable CDN access for a hosted service, the Management Portal provides you with a CDN domain name
in the following format: http://<identifier>.vo.msecnd.net/. This domain name can be used to retrieve objects from a
hosted service. For example, given a hosted service named myHostedService and an ASP.NET web page called
music.aspx that delivers content, users can access the object using either of the following two URLs:
Windows Azure hosted service URL: http://myHostedService.cloudapp.net/cdn/music.aspx
Windows Azure CDN URL: http://<identifier>.vo.msecnd.net/music.aspx
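Note that in the two URLs above the /cdn folder appears in the hosted service path but not in the CDN path. A minimal sketch of that mapping follows; the function name is hypothetical and the identifier passed in is a placeholder for the portal-assigned value.

```python
from urllib.parse import urlsplit, urlunsplit

CDN_FOLDER = "/cdn"


def hosted_service_url_to_cdn_url(service_url, cdn_identifier):
    """Map a hosted service URL under /cdn to the equivalent CDN URL."""
    parts = urlsplit(service_url)
    if not parts.path.startswith(CDN_FOLDER + "/"):
        raise ValueError("only content under /cdn is delivered through the CDN")
    # Drop the leading /cdn segment; the rest of the path is unchanged.
    cdn_path = parts.path[len(CDN_FOLDER):]
    cdn_host = f"{cdn_identifier}.vo.msecnd.net"
    return urlunsplit((parts.scheme, cdn_host, cdn_path, parts.query, parts.fragment))
```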
Accessing cached content over HTTPS
Windows Azure allows you to retrieve content from the CDN using HTTPS calls. This allows you to incorporate content
cached in the CDN into secure web pages without receiving warnings about mixed security content types.
Accessing CDN content using HTTPS has the following constraints:
You must use the certificate provided by the CDN. Third party certificates are not supported.
You must use the CDN domain to access content. HTTPS support is not available for custom
domain names (CNAMEs) since the CDN does not support custom certificates at this time.
HTTPS is from the CDN to the client only. Requests from the CDN to the content provider (Storage
Account or hosted service) are still made using HTTP.
Even when HTTPS is enabled, content from the CDN can be retrieved using both HTTP and HTTPS.
For more information on enabling HTTPS for CDN content, see How to Enable CDN for Windows Azure.
Accessing cached content with custom domains
You can map the CDN HTTP endpoint to a custom domain name and use that name to request objects from the CDN.
For more information on mapping a custom domain, see How to Map CDN Content to a Custom Domain.

Introduction to Hadoop on Windows Azure
Overview
Apache Hadoop-based Services for Windows Azure is a service that deploys and provisions
clusters in the cloud, providing a software framework designed to manage, analyze and report on big
data.
Data is described as "big data" to indicate that it is being collected in ever-escalating volumes, at
increasingly high velocities, and in a widening variety of unstructured formats and variable semantic
contexts. Collecting big data does not by itself provide value to an enterprise. For big data to provide value in
the form of actionable intelligence or insight, it must be accessible, cleaned, analyzed, and then
presented in a useful way, often in combination with data from various other sources.
Apache Hadoop is a software framework that facilitates big data management and analysis. Apache
Hadoop core provides reliable data storage with the Hadoop Distributed File System (HDFS), and a
simple MapReduce programming model to process and analyze in parallel the data stored in this
distributed system. HDFS uses data replication to address hardware failure issues that arise when
deploying such highly distributed systems.
To simplify the complexities of analyzing unstructured data from various sources, the MapReduce
programming model provides a core abstraction that provides closure for map and reduce
operations. The MapReduce programming model views all of its jobs as computations over key-value
pair datasets. So both input and output files must contain such key-value pair datasets. Other
Hadoop-related projects such as Pig and Hive are built on top of HDFS and the MapReduce
framework, providing higher abstraction levels such as data flow control and querying, as well as
additional functionality such as warehousing and mining, required to integrate big data analysis and
end-to-end management.
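The key-value view of a MapReduce job can be illustrated with the classic word count, written here as ordinary single-process Python rather than against the Hadoop API. The function names are illustrative; in a real cluster the map and reduce calls run in parallel on many nodes and the framework performs the grouping step.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single (key, total) pair."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Both the intermediate data (word, 1) and the final output (word, total) are key-value pairs, which is what lets jobs be chained.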
Implementing Hadoop on Windows Azure as a service in the cloud makes the HDFS/MapReduce
software framework and related projects available in a simpler, more scalable, and cost efficient
environment. To simplify configuring and running Hadoop jobs and interacting with the deployed
clusters, Microsoft provides JavaScript and Hive consoles. This simplified JavaScript approach enables
IT professionals and a wider group of developers to deal with big data management and analysis by
providing an accessible path into the Hadoop framework.
In addition to the available Hadoop-related ecosystem projects, it also provides Open Database
Connectivity (ODBC) drivers to integrate Business Intelligence (BI) tools such as Excel, SQL Server
Analysis Services, and Reporting Services, facilitating and simplifying end-to-end data analysis.
This topic describes the Hadoop ecosystem on Windows Azure and the main scenarios for Hadoop on
Windows Azure, and provides a tour of the Hadoop on Windows Azure portal. It contains
the following sections:
Big Data: Volume, Velocity, Variety and Variability - The qualities of big data that render it best
managed by NoSQL systems like Hadoop, rather than by a conventional Relational Database
Management System (RDBMS).
The Hadoop Ecosystem on Windows Azure - Hadoop on Windows Azure provides Pig, Hive, Mahout,
Pegasus, Sqoop, and Flume implementations, and supports other BI tools such as Excel, SQL Server
Analysis Services and Reporting Services that are integrated with HDFS and the MapReduce
framework.
Big Data Scenarios for Hadoop on Windows Azure - The types of jobs appropriate for using Hadoop
on Windows Azure.
Getting Started with Microsoft Hadoop on Windows Azure - Get Community Technical Preview (CTP)
access and an introduction to the Apache Hadoop-based Services for Windows Azure Portal.
Tour of the Portal - Deploying clusters, managing your account, running samples, and the interactive
JavaScript console.
Resources for Hadoop on Windows Azure - Where to find resources with additional information.
Big data: volume, velocity, variety, and variability
You cannot manage or process big data with a conventional RDBMS because big data volumes are too
large, because the data arrives at too high a velocity, or because the variety of data structures and
their semantic variability do not fit relational database architectures.
Volume
The Hadoop big data solution is a response to two divergent trends. On the one hand, because the
capacity of hard drives has continued to increase dramatically over the last 20 years, the vast amounts of
new data generated by web sites and by new generations of devices and instruments connected to
the Internet can be stored. In addition, there is automated tracking of everyone's online behavior.
On the other hand, data access speeds on these larger capacity drives have not kept pace, so reading
from and writing to very large disks is too slow.
The solution to this bottleneck has two principal features. First, HDFS provides a
distributed architecture that stores data on multiple disks, so that the disks can be read in parallel.
Second, data processing is moved to the node that stores the data, so that
access to the data is as local as possible. The enhanced MapReduce performance depends on this
design principle, known as data locality. The idea saves bandwidth by moving programs to the data,
rather than data to programs, so that the MapReduce programming model scales linearly with
the data set size. If the cluster size grows in proportion to the volume of data processed,
the job executes in more or less the same amount of time.
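The scaling claim can be checked with a deliberately idealized model in which every node scans only its local share of the data. The per-node throughput below is an illustrative assumption, not a measured figure, and the model ignores shuffle traffic, job startup, and stragglers.

```python
def estimated_job_seconds(data_gb, nodes, gb_per_second_per_node=0.1):
    """Idealized scan time when the data is spread evenly across `nodes`.

    Each node reads data_gb / nodes from its own disks in parallel.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return data_gb / (nodes * gb_per_second_per_node)
```

In this model, 1,000 GB on 10 nodes and 2,000 GB on 20 nodes both take about 1,000 seconds, while 2,000 GB on the original 10 nodes takes about twice as long.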
Velocity
The rate at which data is becoming available to organizations has followed a trend very similar to the
previously described escalating volume of data, and is being driven by increased logging of ecommerce
clickstream consumer behavior and by data associated with social networking sites such as Facebook
and Twitter. The proliferation of smartphones and tablets has dramatically increased the rate at which
data is generated online. Online gaming and scientific instrumentation are also generating streams of data at
velocities with which traditional RDBMS are not able to cope. Ensuring a competitive advantage in
commercial and gaming activities requires quick responses as well as quick data analysis results.
These high-velocity data streams with tight feedback loops require a NoSQL approach like Hadoop's,
optimized for fast storage and retrieval.
Variety
Most generated data is messy. Diverse data sources do not provide a static structure that would let a
traditional RDBMS manage the data in a timely way. Social networking data, for example, is typically text-based
and takes a wide variety of forms that may not remain fixed over time. Data from images and sensor
feeds presents similar challenges. This sort of unstructured data requires a flexible NoSQL system like
Hadoop that can impose sufficient structure on incoming data and store it without requiring an
exact schema. Cleaning up unstructured data is a significant part of the processing required to prepare
it for use in an application. To make clean, high-quality data more readily available,
data marketplaces are competing and specializing in providing this service.
Variability
Larger issues in the interpretation of big data can also arise. The term variability when applied to big
data tends to refer specifically to the wide possible variance in meaning that can be encountered.
Finding the most appropriate semantic context within which to interpret unstructured data can
introduce significant complexities into the analysis.
The Hadoop ecosystem on Windows Azure

Introduction
Hadoop on Windows Azure offers a framework that implements Microsoft's cloud-based solution for
handling big data. This federated ecosystem manages and analyzes large amounts of data while
exploiting parallel processing capabilities, other HDFS architecture optimizations, and the
MapReduce programming model. Technologies such as Sqoop and Flume integrate HDFS with
relational data stores and log files. Hive and Pig integrate data processing and warehousing
capabilities. Pegasus provides graph-mining capabilities. The Microsoft Big Data solution integrates with
Microsoft BI tools, including SQL Server Analysis Services, Reporting Services, PowerPivot and Excel.
Microsoft BI tools enable you to perform straightforward BI on data stored and managed by the
Hadoop on Windows Azure ecosystem. The Apache-compatible technologies and sister technologies
that are part of this ecosystem, and that are built to run on top of Hadoop clusters, are itemized and
briefly described in this section.
Pig
Pig is a high-level platform for processing big data on Hadoop clusters. Pig consists of a data flow
language, called Pig Latin, for writing queries on large datasets, and an execution
environment that runs programs from a console. Pig Latin programs consist of a series of dataset
transformations that are converted, under the covers, to a series of MapReduce programs. Pig Latin
abstractions provide richer data structures than MapReduce, and do for Hadoop what SQL
does for RDBMS systems. Pig Latin is fully extensible. User Defined Functions (UDFs), written in
Java, Python, C#, or JavaScript, can be called to customize each stage of the processing path when
composing the analysis. For more information, see Welcome to Apache Pig!
Hive
Hive is a distributed data warehouse that manages data stored in HDFS. It is the Hadoop query
engine. Hive is for analysts with strong SQL skills, providing an SQL-like interface and a relational data
model. Hive uses a language called HiveQL, a dialect of SQL. Hive, like Pig, is an abstraction on top of
MapReduce; when run, Hive translates queries into a series of MapReduce jobs. Scenarios for
Hive are closer in concept to those for RDBMS, and so are appropriate for use with more structured
data. For unstructured data, Pig is a better choice. Hadoop on Windows Azure includes an ODBC driver
for Hive, which provides direct real-time querying from business intelligence tools such as Excel into
Hadoop. For more information, see Welcome to Apache Hive!
Mahout
Mahout is an open source machine-learning project that facilitates building scalable machine learning
libraries. Using the map/reduce paradigm, algorithms for clustering, classification, and batch-based
collaborative filtering developed for Mahout are implemented on top of Apache Hadoop. For more
information, see What is Apache Mahout.
Pegasus
Pegasus is a peta-scale graph mining system running on Hadoop. Graph mining is data mining used
to find the patterns, rules, and anomalies characterizing graphs. A graph in this context is a set of
objects with links that exist between any two objects in the set. This structure type characterizes
networks everywhere, including pages linked on the Web, computer and social networks (Facebook,
Twitter), and many biological and physical systems. Before Pegasus, the maximum graph size that
could be mined incorporated millions of objects. By using algorithms that run in parallel on top
of a Hadoop cluster, Pegasus can mine graphs containing billions of objects. For
more information, see the Project Pegasus Web site.
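To make "graph mining" concrete, the sketch below finds connected components by repeatedly propagating the smallest label along every edge. It is a single-machine illustration of one typical graph-mining task, not Pegasus code; the function name and the edge-list representation are assumptions made for this example.

```python
def connected_components(edges):
    """Label every node with the smallest node id in its connected component.

    `edges` is a list of (a, b) pairs; links are treated as undirected.
    """
    labels = {}
    for a, b in edges:
        labels.setdefault(a, a)
        labels.setdefault(b, b)
    changed = True
    while changed:
        changed = False
        # Each full pass over the edges plays the role of one parallel iteration.
        for a, b in edges:
            smallest = min(labels[a], labels[b])
            if labels[a] != smallest or labels[b] != smallest:
                labels[a] = labels[b] = smallest
                changed = True
    return labels
```

Nodes that end up with the same label belong to the same component. On a cluster, each pass over the edges could be expressed as a MapReduce job, which is what makes this style of algorithm a fit for Hadoop.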
Sqoop
Sqoop is a tool that transfers bulk data between Hadoop and relational databases such as SQL Server, or other
structured data stores, as efficiently as possible. Use Sqoop to import data from external structured
data stores into the HDFS or related systems like Hive. Sqoop can also extract data from Hadoop and
export the extracted data to external relational databases, enterprise data warehouses, or any other
structured data store type. For more information, see the Apache Sqoop Web site.
Flume
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and
moving large amounts of log data to HDFS. Flume's architecture is based on streaming data flows. It is
robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery
mechanisms. It has a simple extensible data model that enables online analytical applications. For more
information, see the Flume incubation site.
Business intelligence tools
Familiar Business Intelligence (BI) tools such as Excel, PowerPivot, SQL Server Analysis Services and
Reporting Services retrieve, analyze, and report on data integrated with Hadoop on Windows Azure
using ODBC drivers. The Hive ODBC driver and Hive Add-in for Excel are available for download on
the Hadoop on Windows Azure portal. See How To Connect Excel to Hadoop on Windows Azure via
HiveODBC.
* For information on Analysis Services, see SQL Server 2012 Analysis Services.
* For information on Reporting Services, see SQL Server 2012 Reporting.
Big data scenarios for Hadoop on Windows Azure
An exemplary scenario that provides a case for a Hadoop on Windows Azure application is an ad
hoc analysis, in batch fashion, of an entire unstructured dataset that is stored on Windows Azure nodes
and does not require frequent updates.
These conditions apply to a wide variety of activities in business, science, and governance. Examples
include monitoring supply chains in retail, suspicious trading patterns in
finance, demand patterns for public utilities and services, air and water quality from arrays of
environmental sensors, or crime patterns in metropolitan areas.
Hadoop is most suitable for handling a large amount of logged or archived data that does not
require frequent updating once it is written, and that is read often, typically to do a full analysis. This
scenario is complementary to data more suitably handled by an RDBMS, which involves smaller amounts of
data (gigabytes instead of petabytes) that must be continually updated or queried for specific
data points within the full dataset. An RDBMS works best with structured data organized and stored
according to a fixed schema. MapReduce works well with unstructured data with no predefined
schema because it interprets the data at processing time.
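Interpreting data at processing time, often called schema-on-read, can be sketched as follows. The log format, the field names, and both helper functions are hypothetical; the point is only that raw lines are stored as-is and a structure is imposed when a job reads them.

```python
import re

# Hypothetical log format: "<timestamp> <LEVEL> <free-text message>".
LOG_PATTERN = re.compile(r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")


def parse_record(raw_line):
    """Interpret one raw line at read time; return None if it does not fit."""
    match = LOG_PATTERN.match(raw_line.strip())
    return match.groupdict() if match else None


def count_by_level(raw_lines):
    """Count records per level, skipping lines that cannot be interpreted."""
    counts = {}
    for line in raw_lines:
        record = parse_record(line)
        if record is not None:
            counts[record["level"]] = counts.get(record["level"], 0) + 1
    return counts
```

A different job could read the same stored lines with a different pattern, which a fixed relational schema would not allow without reloading the data.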
Getting started with Microsoft Hadoop on Windows Azure
The Hadoop on Windows Azure CTP
The Hadoop on Windows Azure service is available by invitation only during this Community
Technical Preview (CTP). The purpose of the CTP is for you to test the Hadoop-based service on Windows
Azure, become more familiar with it, and provide feedback. The process for gaining access is outlined
below.
The portal used by Apache Hadoop-based services for Windows Azure
The Microsoft implementation of Hadoop on Windows Azure uses a Portal to provision new Apache
Hadoop clusters. Clusters provisioned on the portal are temporary and expire after several days.
When there is less than six hours left on the clock, an expiration time extension is allowed. These
clusters run jobs that process data either on the cluster or located elsewhere. For example, the data
could reside in a Windows Azure account or be transferred to the cluster over FTP.
The advantage of using a temporary cluster is that there is no cost to maintain the hardware needed
for the MapReduce parallel processing jobs. You use the cluster and then release it or allow it to
expire. Apache Hadoop deployment solutions are also available for deploying Apache Hadoop to a
Windows Azure account, or to on-premises hardware that you manage.