
Characterization of Distributed Systems

Ruixuan Li
School of Computer Science and Technology
Huazhong University of Science and Technology
http://idc.hust.edu.cn/~rxli/
Sep. 10, 2013

Outline

Introduction

motivation, definition, and characteristics

Examples of distributed systems

Challenges

Summary


1$

(USA, 2003)

1 CPU 4 GB 1 1 GB 1 GB 3 10 M 10 TB 10 TB

2GH CPU, 2GB RAM: $2,000 200 GB, 100 50MB: $200 1 Mbps: $100/

10 KWhrs 14
3 1000.


(Views of Jim Gray, 2003)

1 10 100 000CPU 100,000 10 1M 10,000CPU1

Google g 10PB. HotmailE-mail. Amazon.com. . . . . 360. .

(Distributed Computing)

U. C. Berkeley

SETI@Home
305

SETI@Home67(67 Teraflops), 2005 SETI@home

1:

2:

Distributed System Motivation

Resource sharing

The term "resource" characterizes the range of things that can usefully be shared in a networked computer system.
It extends from hardware components to software-defined entities.
It includes the stream of video frames and the audio connection.

Collaborative computing

Parallel vs. Distributed Computing


Distributed System Architecture

A distributed system is one in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages.

Figure: layered view, top to bottom: distributed applications; middleware; machines passing messages; networks.
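The definition above says that components coordinate only by passing messages. A minimal sketch of that idea follows, assuming two components in one process that share no variables and talk only through queues. The names `doubling_service` and `run_demo` are hypothetical and are not part of the slides.

```python
import queue
import threading


def doubling_service(inbox, outbox):
    """Hypothetical component: replies to each request with value * 2."""
    while True:
        msg = inbox.get()          # block until a message arrives
        if msg is None:            # None is this sketch's shutdown signal
            break
        outbox.put({"reply_to": msg["id"], "result": msg["value"] * 2})


def run_demo(values):
    """Send each value as a request message and collect the replies."""
    requests, replies = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=doubling_service, args=(requests, replies))
    worker.start()
    for i, value in enumerate(values):
        requests.put({"id": i, "value": value})
    results = [replies.get()["result"] for _ in values]
    requests.put(None)             # ask the service to stop
    worker.join()
    return results


if __name__ == "__main__":
    print(run_demo([1, 2, 3]))
```

In a real distributed system the queues would be network connections, and messages could be lost or delayed, which this in-process sketch does not model.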



Distributed System Architecture (contd.)

Applications

P2P systems, search engines, online games, Gmail

Middleware

CORBA, DCOM, EJB, etc.

Networks

Internet, mobile phone networks, wireless sensor networks, corporate networks, factory networks, campus networks, home networks

Characteristics of Distributed System

Concurrency

programs execute concurrently and share resources

No global clock

programs coordinate actions by exchanging messages

Independent failures

when some systems fail, others may not know

Outline

Introduction

Examples of distributed systems

Challenges

Summary

Examples of distributed systems

Large distributed systems

The Internet, The Intranet DNS service Distributed file system P2P Applications (BitTorrent, eMule) Mobile and ubiquitous computing Search engine, Sensor network, Cloud computing

Typical distributed system


New fields in distributed systems


Distributed systems nearby

The Internet

Figure: a typical portion of the Internet, showing intranets, ISPs, a backbone, and a satellite link. The legend distinguishes desktop computers, servers, and network links.

The Internet Is Enormous

Image from http://www.nature.com/nature/webmatters/tomog/tomfigs/fig1.html


Internet: Past, Present, Future

Figure: number of hosts (millions, 0 to 140) by year, 1965 to 2010, with TCP/IP, HTML, Mosaic, and XML marked on the curve.

PHASE
1. Packet Switching Networks
2. The Internet is Born
TCP/IP becomes core protocol
Domain Name System created
IETF created (1986)
3. The World Wide Web
HTML hypertext system created
CERN launch World Wide Web
NCSA launch Mosaic interface
4. with XML
5. Semantic Web

The 'Network Effect' kicks in, and the web goes critical

1969: 4 US universities linked to form ARPANET
1972: First e-mail program created
1976: Robert Metcalfe develops Ethernet

Web

1980: Tim Berners-Lee, Enquire (Enquire Within Upon Everything)
1990-11: Web server nxoc01.cern.ch; Tim Berners-Lee's Web browser, WorldWideWeb
1991: CERN (European Particle Physics Laboratory), the Web
W3C: World Wide Web Consortium

Web

HTML URI HTTP Tim Berners-Lee"Web " Internet Web


Web

Load on the first Web server (info.cern.ch): 1000 times what it had been 3 years earlier

Web

Number of web sites (Netcraft)
1993-1996: from 130 to 600,000 sites
2010-2013: from 200,000,000 to 780,000,000 sites

How does Netcraft do this?



CNNIC20086 2.53 2012 5.64

CNNIC201212 4.20

1993: Marc Andreessen, Mosaic

The great thing about the Internet--the thing that catalyzed it in the first place and renews it every day--is that there are so many people able to use it, able to do a million different things. It's an open platform that anybody can develop and create applications for. A lot of people are able to apply their energy, and see it bear fruit.

1994: Marc Andreessen, Netscape
1995: Microsoft, Internet Explorer 1.0 and 2.0
1997: IE 4.0, DHTML, "Winner"
1998: Netscape
2004: Mozilla.org, Netscape, Firefox
2008: Google, Chrome
2010: UC
2012: HTML5

Why? Web Browser

DOTCOM Bubble

Free publishing and instant worldwide information
Direct Web-based commerce

1997-2001: internet-based

The technology-heavy NASDAQ Composite index peaked in March 2000, reflecting the high point of the dot-com bubble.

WEB2.0

Web 1.0 -> Web 2.0
DoubleClick -> Google AdSense
Ofoto -> Flickr
Akamai -> BitTorrent
mp3.com -> Napster
Britannica Online -> Wikipedia
personal websites -> blogging
evite -> upcoming.org and EVDB
domain name speculation -> SEO
page views -> cost per click
screen scraping -> web services
publishing -> participation
content management -> wikis
directories -> tagging, folksonomy
stickiness -> syndication

Web2.0 Buzzwords

Web

AdSense, Facebook, Twitter, Mash-up, Wikipedia, Yahoo, eBay, Amazon, Flickr, del.icio.us, the perpetual beta

Web

www.eBay.com
www.wikipedia.com
www.napster.com
www.youtube.com
www.blogger.com
www.friendsreunited.com
www.drudgereport.com

Web

www.myspace.com
www.amazon.com
www.slashdot.org
www.salon.com
www.craigslist.org
www.google.com
www.yahoo.com
www.easyjet.com

Examples of distributed systems

Large distributed systems

The Internet, The Intranet DNS service Distributed file system P2P Applications (BT, eMule) Mobile and ubiquitous computing Search engine, Sensor network, Cloud computing

Typical distributed system


New fields in distributed systems


Distributed systems nearby

A typical intranet
Figure: a typical intranet. Desktop computers, a Web server, email servers, a file server, and print and other servers are attached to local area networks, which connect through a router/firewall to the rest of the Internet.

Issues in intranet

File services
Firewall
The cost of software installation and support

Examples of distributed systems

Large distributed systems

The Internet, The Intranet, Mobile computing DNS service Distributed file system P2P Applications (BT, eMule) Mobile and ubiquitous computing Search engine, Sensor network, Cloud computing

Typical distributed system


New fields in distributed systems


Distributed systems nearby

/C/S


Grid Computing
Peer-to-Peer Computing
Services Computing
Autonomous Computing
Edge Computing
Mobile Computing
Sensor Network
Pervasive/Ubiquitous Computing
Cloud Computing

Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations



Web Services
Service-Oriented Architecture (SOA)


(MEMS)

Mote (Berkeley)
Cricket (MIT)
Mantis (UC Boulder)
SmartLocus (HP Labs)
Smart Dust (Berkeley)


Mobile and ubiquitous computing

Mobile devices

Laptop computers
Handheld devices: PDA, mobile phone, pager, video camera, digital camera
Wearable devices, e.g. smart watches, digital glasses
Devices embedded in appliances, e.g. washing machines, hi-fi systems, cars and refrigerators

Mobile and ubiquitous computing (contd.)

Mobile computing (nomadic computing)

Access resources while on the move or in an unusual environment

Location-aware computing: utilize resources that are conveniently nearby

Ubiquitous computing (pervasive computing)

The harnessing of many small, cheap computational devices


Portable and handheld devices

Figure: the Internet connecting a host intranet, a home intranet, and a WAP gateway. Devices shown include a mobile phone, a printer, a camera, and a laptop on a wireless LAN at the host site.

Issues in Mobile and ubiquitous computing


Discovery of resources
Eliminating the need for users to reconfigure their mobile devices
Coping with limited connectivity as users travel
Providing privacy and other security guarantees

Examples of distributed systems

Large distributed systems

The Internet, The Intranet, Mobile computing DNS service Distributed file system P2P Applications (BT, eMule) Mobile and ubiquitous computing Search engine, Sensor network, Cloud computing

Typical distributed system


New fields in distributed systems


Distributed systems nearby

Search Engine Architecture


Figure: search engine data flow.
Crawler machines crawl the web.
Check for duplicates, store the documents (DocIds).
Create an inverted index.
Search engine servers take the user query, consult the inverted index, and show results to the user.
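The pipeline on this slide (crawl, check for duplicates, assign DocIds, build an inverted index, answer queries) can be sketched in a few lines. This is a single-machine illustration under simplifying assumptions, not how any production search engine works: duplicates are detected by an exact content hash, terms are split on whitespace, and a query matches documents containing all of its terms. The function names `build_index` and `search` are hypothetical.

```python
import hashlib
from collections import defaultdict


def build_index(pages):
    """pages: iterable of (url, text). Returns (docs, index).

    docs maps DocId -> url; index maps term -> set of DocIds.
    """
    docs = {}
    seen_digests = set()
    index = defaultdict(set)
    for url, text in pages:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_digests:      # exact duplicate: skip it
            continue
        seen_digests.add(digest)
        doc_id = len(docs)              # next free DocId
        docs[doc_id] = url
        for term in text.lower().split():
            index[term].add(doc_id)
    return docs, index


def search(index, docs, query):
    """Return urls of documents containing every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return [docs[d] for d in sorted(hits)]
```

A usage example: `docs, index = build_index([("a", "distributed systems"), ("b", "search engines")])` followed by `search(index, docs, "distributed")`.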

Clusters for Search Engine


Issues in Search Engine



A Low Cost Resource Sharing Model

Cloud Computing:

Computing service is a standard utility
Users and corporations contract the services by units
Significantly reduces IT personnel and infrastructure costs
Makes good use of rich computing, storage, and Internet resources

Principles of cloud computing

Cost-effectiveness is the basis for computing, storage, and communication models in cloud computing
Targeting a standard computing model in a wide range
Exploiting locality and load sharing with low overhead

Cloud Computing: New challenges

New challenges (CS@Berkeley, 2009)
(1) availability of service
(2) sharing data in different platforms
(3) data security
(4) minimizing communication cost
(5) unpredictable performance
(6) scalability of storage
(7) reliability of large scale distributed systems
(8) service scalability
(9) trust to the cloud service
(10) software licensing

Outline

Introduction

motivation, definition, and characteristics

Examples of distributed systems

Challenges

Summary



Eight Fallacies of Distributed Computing


(Peter Deutsch, 1994)

Network is reliable

Not reliable: failures of switches, power, and others; security attacks
Systems must be duplicated

Latency is zero

Latency improvement significantly lags behind that of bandwidth
Latency reduction anywhere is most important

Bandwidth is infinite

Network bandwidth is expensive and does not follow Moore's Law

Network is secure

Topology does not change

Network topology is out of users' control and subject to change all the time

There is one administrator

Networking and system administration rules differ between organizations

Transport cost is zero

Two costs are involved: software overhead (e.g. TCP/IP, others) and monthly maintenance fees

The network is homogeneous

Types of networks, computers, and software systems are very diverse



Challenges

Heterogeneity
Openness
Security
Scalability
Failure handling
Concurrency
Transparency

Heterogeneity

Networks

Ethernet, token ring, etc.

Computer hardware

big endian / little endian

Operating systems

different APIs of Unix and Windows

Programming languages

different representations for data structures

Implementations from different developers

no application standards
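The big endian / little endian item can be made concrete with Python's standard `struct` module. The sketch below packs the same 32-bit integer in both byte orders and shows what happens when a receiver assumes the wrong one. The helper names `encode_u32` and `decode_u32` are illustrative only.

```python
import struct


def encode_u32(value, byte_order):
    """Pack an unsigned 32-bit integer; byte_order is '>' (big) or '<' (little)."""
    return struct.pack(byte_order + "I", value)


def decode_u32(data, byte_order):
    """Unpack 4 bytes as an unsigned 32-bit integer in the given byte order."""
    return struct.unpack(byte_order + "I", data)[0]


big = encode_u32(0x12345678, ">")      # most significant byte first
little = encode_u32(0x12345678, "<")   # least significant byte first

if __name__ == "__main__":
    print(big.hex(), little.hex())
    # Decoding big-endian bytes as little-endian yields a different number.
    print(hex(decode_u32(big, "<")))
```

This is why protocols usually fix one byte order on the wire, so that each machine converts to and from its own representation.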

Heterogeneity (contd.)

Middleware

applies to a software layer that provides a programming abstraction as well as masking the heterogeneity of the underlying networks, hardware, OSs and programming languages

Mobile code

is used to refer to code that can be sent from one computer to another and run at the destination

Openness

Openness of a computer system

is the characteristic that determines whether the system can be extended and re-implemented in various ways.

e.g. Unix

Openness of distributed systems

is determined by the degree to which new resource sharing services can be added and be made available for use by a variety of client programs.

e.g. Web

How to deal with openness?

key interfaces are published.

e.g. RFC

Openness (contd.)

Open APIs

Openness (contd.)

(e.g. XML-RPC)
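XML-RPC illustrates what a published interface buys: because the message format is public, any client can build a call that any conforming server can parse. The sketch below uses Python's standard `xmlrpc.client` module to encode and decode a call without touching the network. The remote method name `add` and the helper names `encode_call` and `decode_call` are hypothetical.

```python
import xmlrpc.client


def encode_call(method, *args):
    """Serialize a method call into an XML-RPC request document."""
    return xmlrpc.client.dumps(args, methodname=method)


def decode_call(xml_text):
    """Parse an XML-RPC request back into (method name, argument tuple)."""
    params, method = xmlrpc.client.loads(xml_text)
    return method, params


if __name__ == "__main__":
    request = encode_call("add", 2, 3)
    print(request)               # the XML document a server would receive
    print(decode_call(request))
```

Both sides depend only on the published format, not on each other's implementation language or platform.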


Security

Confidentiality

protection against disclosure to unauthorized individuals

e.g. ACL in Unix File System

Integrity

protection against alteration or corruption

e.g. checksum

Availability

protection against interference with the means to access the resources

e.g. Denial of service
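The checksum example given under Integrity can be sketched with the standard `hashlib` module: the sender publishes a digest of the data, and the receiver recomputes it to detect alteration or corruption. The choice of SHA-256 and the function names are assumptions made for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a hex digest that changes if the data changes."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected: str) -> bool:
    """Recompute the digest and compare it with the one that was sent."""
    return checksum(data) == expected


if __name__ == "__main__":
    message = b"bid: 100"
    tag = checksum(message)
    print(is_intact(message, tag))
    print(is_intact(b"bid: 900", tag))
```

An unkeyed digest like this only helps when the digest itself reaches the receiver unmodified. An attacker who can change both data and digest defeats it, which is why a keyed construction such as an HMAC is used in that setting.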



Scalability

A system is described as scalable

if it will remain effective when there is a significant increase in the number of resources and the number of users

An example of a scalable system: the Internet

Scalability (contd.)

Design challenges

The cost of physical resources

e.g., servers needed to support n users: at most O(n)

The performance loss

e.g., DNS lookup: no worse than O(log n)

Prevent software resources running out

e.g., IP addresses

Avoid performance bottlenecks

e.g., partitioning the DNS name table, caching and replication
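The last item, partitioning a name table and caching results, can be illustrated with a toy resolver. This is a sketch of the idea only, not the DNS protocol: the table is split by top-level suffix so that no single partition holds every name, and a local cache avoids repeated remote lookups. The class name `NameService` and the addresses (taken from a range reserved for documentation) are assumptions.

```python
class NameService:
    """Toy name resolver with a partitioned table and a local cache."""

    def __init__(self, partitions):
        # partitions maps a suffix such as "org" to its own {name: address} table
        self.partitions = partitions
        self.cache = {}
        self.remote_lookups = 0      # how often a partition had to be consulted

    def resolve(self, name):
        if name in self.cache:       # cache hit: no partition is contacted
            return self.cache[name]
        suffix = name.rsplit(".", 1)[-1]
        self.remote_lookups += 1
        address = self.partitions[suffix][name]
        self.cache[name] = address
        return address


if __name__ == "__main__":
    service = NameService({
        "org": {"example.org": "192.0.2.1"},
        "com": {"example.com": "192.0.2.2"},
    })
    service.resolve("example.org")
    service.resolve("example.org")
    print(service.remote_lookups)
```

Partitioning spreads the load across servers, and caching keeps popular names from becoming a bottleneck at any one of them.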

Failure handling

Detecting

e.g. checksum for corrupted data
Sometimes impossible, so suspect, e.g. a remote crashed server in the Internet

Masking

e.g. retransmit message, standby server

Tolerating

e.g. a web browser cannot contact a web server

Recovery

e.g. roll back

Redundancy

e.g. IP route, replicated name table of DNS
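Masking a failure by retransmitting a message can be sketched as a retry loop. The code assumes a hypothetical channel that raises `ConnectionError` when a message is lost and returns an acknowledgement otherwise. `FlakyChannel` and `send_with_retry` are illustrative names, not a real networking API.

```python
class FlakyChannel:
    """Hypothetical channel that loses the first `drops` messages."""

    def __init__(self, drops):
        self.drops = drops

    def send(self, message):
        if self.drops > 0:
            self.drops -= 1
            raise ConnectionError("message lost")
        return "ack:" + message


def send_with_retry(send, message, max_attempts=3):
    """Retransmit until acknowledged; return (ack, attempts used)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(message), attempt
        except ConnectionError:
            continue                 # mask the failure by retransmitting
    raise ConnectionError(f"no acknowledgement after {max_attempts} attempts")


if __name__ == "__main__":
    print(send_with_retry(FlakyChannel(drops=2).send, "hello"))
```

When every attempt fails the error is reported to the caller, which corresponds to tolerating rather than masking the failure. Retransmission is also only safe if the receiver can cope with receiving the same message more than once.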

Concurrency

Correctness

ensure that operations on shared resources are correct in a concurrent environment

e.g. records bids for an auction

Performance

ensure the high performance of concurrent operations
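The auction example under Correctness can be sketched with a lock that makes each bid an atomic update. Threads stand in for concurrent clients here, and the `Auction` class and `run_auction` helper are hypothetical names used only for illustration.

```python
import threading


class Auction:
    """Shared resource: records bids and tracks the highest one."""

    def __init__(self):
        self._lock = threading.Lock()
        self.bids = []
        self.highest = 0

    def place_bid(self, amount):
        # Record the bid and update the maximum as one indivisible step.
        with self._lock:
            self.bids.append(amount)
            if amount > self.highest:
                self.highest = amount


def run_auction(amounts):
    """Place every bid from its own thread and return the finished auction."""
    auction = Auction()
    threads = [threading.Thread(target=auction.place_bid, args=(a,))
               for a in amounts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return auction


if __name__ == "__main__":
    result = run_auction(range(1, 51))
    print(len(result.bids), result.highest)
```

The lock serializes every bid, which is the tension the Performance item points at: correctness is simplest with coarse locking, while throughput favors holding locks as briefly as possible.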

Transparency

Access transparency

using identical operations to access local and remote resources
e.g. a graphical user interface with folders

Location transparency

resources to be accessed without knowledge of their location
e.g. URL

Concurrency transparency

several processes operate concurrently using shared resources without interference between them

Replication transparency

multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or application programmers

Transparency (contd.)

Failure transparency

users and applications to complete their tasks despite the failure of hardware and software components, e.g., email

Mobility transparency

movement of resources and clients within a system without affecting the operation of users and programs, e.g., mobile phone

Performance transparency

allows the system to be reconfigured to improve performance as loads vary

Scaling transparency

allows the system and applications to expand in scale without change to the system structure or the application algorithms

Summary

Distributed systems are pervasive
Distributed computing and resource sharing are primary motivations for constructing distributed systems
Characterization of Distributed System

Concurrency
No global clock
Independent failures

Summary (contd.)

Challenges to construct distributed system


Heterogeneity
Openness
Security
Scalability
Failure handling
Concurrency
Transparency

Q&A