Increases in frequency increase the amount of power used in a processor. Increasing processor power consumption led ultimately to Intel's May 8, 2004 cancellation of its Tejas and Jayhawk processors, which is generally cited as the end of frequency scaling as the dominant computer architecture paradigm.[11]

Moore's law is the empirical observation that the number of transistors in a microprocessor doubles every 18 to 24 months.[12] Despite power consumption issues, and repeated predictions of its end, Moore's law is still in effect. With the end of frequency scaling, these additional transistors (which are no longer used for frequency scaling) can be used to add extra hardware for parallel computing.

1.1 Amdahl's law and Gustafson's law

[Figure: Amdahl's Law: speedup versus number of processing elements for parallel portions of 50%, 75%, 90%, and 95%.]

Optimally, the speedup from parallelization would be linear—doubling the number of processing elements should halve the runtime, and doubling it a second time should again halve the runtime. However, very few parallel algorithms achieve optimal speedup. Most of them have a near-linear speedup for small numbers of processing elements, which flattens out into a constant value for large numbers of processing elements.

The potential speedup of an algorithm on a parallel computing platform is given by Amdahl's law[13]

    S_latency(s) = 1 / (1 − p + p/s),

where

    S_latency is the potential speedup in latency of the execution of the whole task;
    s is the speedup in latency of the execution of the parallelizable part of the task;
    p is the percentage of the execution time of the whole task concerning the parallelizable part of the task before parallelization.

Since S_latency < 1/(1 − p), it shows that a small part of the program which cannot be parallelized will limit the overall speedup available from parallelization. A program solving a large mathematical or engineering problem will typically consist of several parallelizable parts and several non-parallelizable (serial) parts.

[Figure: runtime of an original process compared with making part B 5× faster or making part A 2× faster.]

Amdahl's law only applies to cases where the problem size is fixed. In practice, as more computing resources become available, they tend to get used on larger problems (larger datasets), and the time spent in the parallelizable part often grows much faster than the inherently serial work.[15] In this case, Gustafson's law gives a less pessimistic and more realistic assessment of parallel performance:[16]

    S_latency(s) = 1 − p + sp.

Both Amdahl's law and Gustafson's law assume that the running time of the serial part of the program is independent of the number of processors. Amdahl's law assumes that the entire problem is of fixed size so that the total amount of work to be done in parallel is also independent of the number of processors, whereas Gustafson's law assumes that the total amount of work to be done in parallel varies linearly with the number of processors.
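As a concrete check of the two formulas, the short C sketch below evaluates both laws for an assumed program whose parallelizable fraction is p = 0.95 and whose parallel part is sped up by a factor of s = 16. The numbers and function names are illustrative only, not taken from the article.

    #include <stdio.h>

    /* Amdahl's law: fixed problem size. */
    static double amdahl_speedup(double p, double s) {
        return 1.0 / ((1.0 - p) + p / s);
    }

    /* Gustafson's law: work in the parallel part grows with the processor count. */
    static double gustafson_speedup(double p, double s) {
        return (1.0 - p) + s * p;
    }

    int main(void) {
        double p = 0.95;   /* assumed parallelizable fraction */
        double s = 16.0;   /* assumed speedup of the parallelizable part */
        printf("Amdahl:    %.2f\n", amdahl_speedup(p, s));    /* about 9.14 */
        printf("Gustafson: %.2f\n", gustafson_speedup(p, s)); /* 15.25      */
        return 0;
    }

Even with 95% of the work parallelized, Amdahl's fixed-size bound never exceeds 20, while Gustafson's scaled-size view grows almost linearly with s.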
In this example, instruction 3 cannot be executed before (or even in parallel with) instruction 2, because instruction 3 uses a result from instruction 2. It violates condition 1, and thus introduces a flow dependency.

1: function NoDep(a, b)
2:    c := a * b
3:    d := 3 * b
4:    e := a + b
5: end function

In this example, there are no dependencies between the instructions, so they can all be run in parallel.

Bernstein's conditions do not allow memory to be shared between different processes. For that, some means of enforcing an ordering between accesses is necessary, such as semaphores, barriers or some other synchronization method.

1.3 Race conditions, mutual exclusion, synchronization, and parallel slowdown
Many parallel programs require that their subtasks act in synchrony. This requires the use of a barrier. Barriers are typically implemented using a software lock. One class of algorithms, known as lock-free and wait-free algorithms, altogether avoids the use of locks and barriers. However, this approach is generally difficult to implement and requires correctly designed data structures.
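The role of a lock in enforcing mutual exclusion can be sketched with POSIX Threads, one of the shared-memory APIs discussed later in this article. The thread count, iteration count, and names below are illustrative only; without the mutex, the two unsynchronized read-modify-write sequences would form a race condition and the final count would depend on scheduling.

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Each thread repeatedly increments the shared counter; the mutex
       serializes the increments so that none of them is lost. */
    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&lock);
            counter++;                 /* critical section */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%ld\n", counter);      /* always 2000000 with the lock in place */
        return 0;
    }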
Not all parallelization results in speed-up. Generally, as a task is split up into more and more threads, those threads spend an ever-increasing portion of their time communicating with each other. Eventually, the overhead from communication dominates the time spent solving the problem, and further parallelization (that is, splitting the workload over even more threads) increases rather than decreases the amount of time required to finish. This is known as parallel slowdown.
See also: Relaxed sequential

Consistency models can be defined in several ways. Petri nets, which were introduced in Carl Adam Petri's 1962 doctoral thesis, were an early attempt to codify the rules of consistency models. Dataflow theory later built upon these, and dataflow architectures were created to physically implement the ideas of dataflow theory. Beginning in the late 1970s, process calculi such as Calculus of Communicating Systems and Communicating Sequential Processes were developed to permit algebraic reasoning about systems composed of interacting components. More recent additions to the process calculus family, such as the π-calculus, have added the capability for reasoning about dynamic topologies. Logics such as Lamport's TLA+, and mathematical models such as traces and Actor event diagrams, have also been developed to describe the behavior of concurrent systems.
Until about 1986, speed-up in computer architecture was driven by doubling computer word size—the amount of information the processor can manipulate per cycle.[21] Increasing the word size reduces the number of instructions the processor must execute to perform an operation on variables whose sizes are greater than the length of the word. For example, where an 8-bit processor must add two 16-bit integers, the processor must first add the 8 lower-order bits from each integer using the standard addition instruction, then add the 8 higher-order bits using an add-with-carry instruction and the carry bit from the lower-order addition; thus, an 8-bit processor requires two instructions to complete a single operation, where a 16-bit processor would be able to complete the operation with a single instruction.
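The two-instruction sequence described above can be imitated in C. The sketch below, with a hypothetical helper name, adds two 16-bit values using only 8-bit intermediate quantities, an explicit carry standing in for the add-with-carry instruction.

    #include <stdint.h>
    #include <stdio.h>

    static uint16_t add16_using_8bit_ops(uint16_t a, uint16_t b) {
        uint8_t a_lo = (uint8_t)a,  a_hi = (uint8_t)(a >> 8);
        uint8_t b_lo = (uint8_t)b,  b_hi = (uint8_t)(b >> 8);
        uint8_t lo    = (uint8_t)(a_lo + b_lo);          /* standard ADD on the low bytes      */
        uint8_t carry = (uint8_t)(lo < a_lo);            /* carry out of the low-byte addition */
        uint8_t hi    = (uint8_t)(a_hi + b_hi + carry);  /* ADC: high bytes plus the carry     */
        return (uint16_t)((hi << 8) | lo);
    }

    int main(void) {
        printf("%u\n", (unsigned)add16_using_8bit_ops(300, 1000)); /* prints 1300 */
        return 0;
    }

A 16-bit (or wider) processor performs the same addition with a single instruction.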
Historically, 4-bit microprocessors were replaced with 8-bit, then 16-bit, then 32-bit microprocessors. This trend generally came to an end with the introduction of 32-bit processors, which were the standard in general-purpose computing for two decades. Not until the early 2000s, with the advent of x86-64 architectures, did 64-bit processors become commonplace.

All modern processors have multi-stage instruction pipelines. Each stage in the pipeline corresponds to a different action the processor performs on that instruction in that stage; a processor with an N-stage pipeline can have up to N different instructions at different stages of completion and thus can issue one instruction per clock cycle (IPC = 1). These processors are known as scalar processors. The canonical example of a pipelined processor is a RISC processor, with five stages: instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM), and register write back (WB). The Pentium 4 processor had a 35-stage pipeline.[23]
Symmetric multiprocessors share memory and connect via a bus.[27] Bus contention prevents bus architectures from scaling. As a result, SMPs generally do not comprise more than 32 processors.[28] Because of the small size of the processors and the significant reduction in the requirements for bus bandwidth achieved by large caches, such symmetric multiprocessors are extremely cost-effective, provided that a sufficient amount of memory bandwidth exists.[27]

3.2.3 Distributed computing

Main article: Distributed computing

A distributed computer (also known as a distributed memory multiprocessor) is a distributed memory computer system in which the processing elements are connected by a network. Distributed computers are highly scalable.

Cluster computing
Main article: Computer cluster

[Figure: A Beowulf cluster.]

A cluster is a group of loosely coupled computers that work together closely, so that in some respects they can be regarded as a single computer.[29] Clusters are composed of multiple standalone machines connected by a network. While machines in a cluster do not have to be symmetric, load balancing is more difficult if they are not. The most common type of cluster is the Beowulf cluster, which is a cluster implemented on multiple identical commercial off-the-shelf computers connected with a TCP/IP Ethernet local area network.[30] Beowulf technology was originally developed by Thomas Sterling and Donald Becker. The vast majority of the TOP500 supercomputers are clusters.[31]

Because grid computing systems (described below) can easily handle embarrassingly parallel problems, modern clusters are typically designed to handle more difficult problems—problems that require nodes to share intermediate results with each other more often. This requires a high bandwidth and, more importantly, a low-latency interconnection network. Many historic and current supercomputers use customized high-performance network hardware specifically designed for cluster computing, such as the Cray Gemini network.[32] As of 2014, most current supercomputers use some off-the-shelf standard network hardware, often Myrinet, InfiniBand, or Gigabit Ethernet.

Massively parallel computing
Main article: Massively parallel (computing)

[Figure: A cabinet from IBM's Blue Gene/L massively parallel supercomputer.]

A massively parallel processor (MPP) is a single computer with many networked processors. MPPs have many of the same characteristics as clusters, but MPPs have specialized interconnect networks (whereas clusters use commodity hardware for networking). MPPs also tend to be larger than clusters, typically having "far more" than 100 processors.[33] In an MPP, each CPU contains its own memory and copy of the operating system and application. Each subsystem communicates with the others via a high-speed interconnect.[34]

IBM's Blue Gene/L, the fifth fastest supercomputer in the world according to the June 2009 TOP500 ranking, is an MPP.

"When we first walked into AMD, they called us 'the socket stealers.' Now they call us their partners."[36]

General-purpose computing on graphics processing units (GPGPU)
Main article: GPGPU
Moore's-law-driven general-purpose computing has rendered ASICs unfeasible for most parallel computing applications. However, some have been built. One example is the PFLOPS RIKEN MDGRAPE-3 machine which uses custom ASICs for molecular dynamics simulation.

Vector processors
Main article: Vector processor

[Figure: The Cray-1 is a vector processor.]

A vector processor is a CPU or computer system that can execute the same instruction on large sets of data. Vector processors have high-level operations that work on linear arrays of numbers or vectors. An example vector operation is A = B × C, where A, B, and C are each 64-element vectors of 64-bit floating-point numbers.[42] They are closely related to Flynn's SIMD classification.[42]

Cray computers became famous for their vector-processing computers in the 1970s and 1980s. However, vector processors—both as CPUs and as full computer systems—have generally disappeared. Modern processor instruction sets do include some vector processing instructions, such as with Freescale Semiconductor's AltiVec and Intel's Streaming SIMD Extensions (SSE).
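For comparison with the A = B × C example above, the plain C loop below computes the same element-wise product one element at a time; a vector processor, or a compiler emitting SIMD instructions such as SSE or AltiVec, could apply the multiplication to many elements per instruction. The array length of 64 follows the example in the text, and the function name is made up.

    #include <stddef.h>

    /* Element-wise product of two 64-element vectors: A = B * C.
       A scalar processor issues one multiply per element. */
    void vector_multiply(double A[64], const double B[64], const double C[64]) {
        for (size_t i = 0; i < 64; i++) {
            A[i] = B[i] * C[i];
        }
    }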
4 Software

Concurrent programming languages, libraries, APIs, and parallel programming models (such as algorithmic skeletons) have been created for programming parallel computers. These can generally be divided into classes based on the assumptions they make about the underlying memory architecture—shared memory, distributed memory, or shared distributed memory. Shared memory programming languages communicate by manipulating shared memory variables. Distributed memory uses message passing. POSIX Threads and OpenMP are two of the most widely used shared memory APIs, whereas Message Passing Interface (MPI) is the most widely used message-passing system API.[43] One concept used in programming parallel programs is the future concept, where one part of a program promises to deliver a required datum to another part of a program at some future time.
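As a minimal illustration of the message-passing style that MPI embodies, the sketch below sends a single integer from rank 0 to rank 1. It is a toy example, not drawn from the article, and assumes an MPI implementation is installed (compile with mpicc and run with at least two processes, e.g. mpirun -np 2).

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* Process 0 sends one int to process 1 (message tag 0). */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Process 1 blocks until the message arrives. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }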
CAPS entreprise and Pathscale are also coordinating their effort to make hybrid multi-core parallel programming (HMPP) directives an open standard called OpenHMPP. The OpenHMPP directive-based programming model offers a syntax to efficiently offload computations on hardware accelerators and to optimize data movement to/from the hardware memory. OpenHMPP directives describe remote procedure call (RPC) on an accelerator device (e.g. GPU) or more generally a set of cores. The directives annotate C or Fortran codes to describe two sets of functionalities: the offloading of procedures (denoted codelets) onto a remote device and the optimization of data transfers between the CPU main memory and the accelerator memory.

The rise of consumer GPUs has led to support for compute kernels, either in graphics APIs (referred to as compute shaders), in dedicated APIs (such as OpenCL), or in other language extensions.

4.2 Automatic parallelization

Main article: Automatic parallelization

Automatic parallelization of a sequential program by a compiler is the holy grail of parallel computing. Despite decades of work by compiler researchers, automatic parallelization has had only limited success.[44]

Mainstream parallel programming languages remain either explicitly parallel or (at best) partially implicit, in which a programmer gives the compiler directives for parallelization. A few fully implicit parallel programming languages exist—SISAL, Parallel Haskell, SequenceL, System C (for FPGAs), Mitrion-C, VHDL, and Verilog.
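As an example of the directive style just described, the OpenMP sketch below marks a loop whose iterations are independent, letting the compiler and runtime divide them among threads. The loop body is an illustrative placeholder rather than code from the article; build with an OpenMP-enabled compiler (e.g. gcc -fopenmp).

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        double a[1000], b[1000], sum = 0.0;

        /* The pragma is the directive: without it this is ordinary serial C;
           with it, the iterations are split across the available threads. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < 1000; i++) {
            a[i] = 0.5 * i;
            b[i] = 2.0 * i;
            sum += a[i] * b[i];
        }

        printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }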
As a computer system grows in complexity, the mean time between failures usually decreases. Application checkpointing is a technique whereby the computer system takes a snapshot of the application—a record of all current resource allocations and variable states, akin to a core dump; this information can be used to restore the program if the computer should fail. Application checkpointing means that the program has to restart from only its last checkpoint rather than the beginning. While checkpointing provides benefits in a variety of situations, it is especially useful in highly parallel systems with a large number of processors used in high-performance computing.
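A minimal sketch of the idea, assuming for illustration that the application's entire state is a single loop counter and that writing a small file is an acceptable snapshot mechanism (the file name and format are made up):

    #include <stdio.h>

    #define CHECKPOINT_FILE "state.ckpt"   /* hypothetical checkpoint file */

    /* Persist the application state (here, just an iteration counter). */
    static void save_checkpoint(long iteration) {
        FILE *f = fopen(CHECKPOINT_FILE, "w");
        if (f) {
            fprintf(f, "%ld\n", iteration);
            fclose(f);
        }
    }

    /* Resume from the last snapshot if one exists, otherwise start from zero. */
    static long load_checkpoint(void) {
        long iteration = 0;
        FILE *f = fopen(CHECKPOINT_FILE, "r");
        if (f) {
            if (fscanf(f, "%ld", &iteration) != 1)
                iteration = 0;
            fclose(f);
        }
        return iteration;
    }

    int main(void) {
        for (long i = load_checkpoint(); i < 1000000; i++) {
            /* ... one unit of work ... */
            if (i % 10000 == 0)
                save_checkpoint(i);   /* after a failure, the run resumes near i rather than at 0 */
        }
        return 0;
    }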
unstructured grid problems (such as found in finite element analysis);
Monte Carlo method;
combinational logic (such as brute-force cryptographic techniques);
graph traversal (such as sorting algorithms);
dynamic programming;
branch and bound methods;
graphical models (such as detecting hidden Markov models and constructing Bayesian networks);
finite-state machine simulation.

6 Fault-tolerance

Further information: Fault-tolerant computer system

Parallel computing can also be applied to the design of fault-tolerant computer systems, particularly via lockstep systems performing the same operation in parallel. This provides redundancy in case one component should fail, and also allows automatic error detection and error correction if the results differ. These methods can be used to help prevent single-event upsets caused by transient errors.[47] Although additional measures may be required in embedded or specialized systems, this method can provide a cost-effective approach to achieve n-modular redundancy in commercial off-the-shelf systems.

7 History

In April 1958, S. Gill (Ferranti) discussed parallel programming and the need for branching and waiting.[52] Also in 1958, IBM researchers John Cocke and Daniel Slotnick discussed the use of parallelism in numerical calculations for the first time.[53] Burroughs Corporation introduced the D825 in 1962, a four-processor computer that accessed up to 16 memory modules through a crossbar switch.[54] In 1967, Amdahl and Slotnick published a debate about the feasibility of parallel processing at the American Federation of Information Processing Societies Conference.[53] It was during this debate that Amdahl's law was coined to define the limit of speed-up due to parallelism.

In 1969, Honeywell introduced its first Multics system, a symmetric multiprocessor system capable of running up to eight processors in parallel.[53] C.mmp, a 1970s multi-processor project at Carnegie Mellon University, was among the first multiprocessors with more than a few processors.[50] The first bus-connected multiprocessor with snooping caches was the Synapse N+1 in 1984.[50]

SIMD parallel computers can be traced back to the 1970s. The motivation behind early SIMD computers was to amortize the gate delay of the processor's control unit over multiple instructions.[55] In 1964, Slotnick had proposed building a massively parallel computer for the Lawrence Livermore National Laboratory.[53] His design was funded by the US Air Force, which was the earliest SIMD parallel-computing effort, ILLIAC IV.[53] The key to its design was a fairly high parallelism, with up to 256 processors, which allowed the machine to work on large datasets in what would later be known as vector processing. However, ILLIAC IV was called "the most infamous of supercomputers", because the project was only one-fourth completed, but took 11 years and cost almost four times the original estimate.[48] When it was finally ready to run its first real application in 1976, it was outperformed by existing commercial supercomputers such as the Cray-1.
8 See also

List of important publications in concurrent, parallel, and distributed computing
List of distributed computing conferences
Concurrency (computer science)
Synchronous programming
Content Addressable Parallel Processor
Manycore
Serializability

9 References

[5] "Concurrency is not Parallelism", Waza conference, Jan 11, 2012, Rob Pike (slides) (video)

[6] "Parallelism vs. Concurrency". Haskell Wiki.

[7] Hennessy, John L.; Patterson, David A.; Larus, James R. (1999). Computer organization and design: the hardware/software interface (2. ed., 3rd print. ed.). San Francisco: Kaufmann. ISBN 1-55860-428-6.

[8] Barney, Blaise. "Introduction to Parallel Computing". Lawrence Livermore National Laboratory. Retrieved 2007-11-09.

[9] Hennessy, John L.; Patterson, David A. (2002). Computer architecture / a quantitative approach (3rd ed.). San Francisco, Calif.: International Thomson. p. 43. ISBN 1-55860-724-2.
[19] Lamport, Leslie (1 September 1979). "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs". IEEE Transactions on Computers. C-28 (9): 690–691. doi:10.1109/TC.1979.1675439.

[20] Patterson and Hennessy, p. 748.

[21] Culler, David; Singh, J. P. (1997). Parallel computer architecture ([Nachdr.] ed.). San Francisco: Morgan Kaufmann Publ. p. 15. ISBN 1-55860-343-3.

[22] Culler et al. p. 15.

[23] Patt, Yale (April 2004). "The Microprocessor Ten Years From Now: What Are The Challenges, How Do We Meet Them?" (wmv). Distinguished Lecturer talk at Carnegie Mellon University. Retrieved on November 7, 2007.

[24] Culler et al. p. 124.

[25] Culler et al. p. 125.

[26] Patterson and Hennessy, p. 713.

[27] Hennessy and Patterson, p. 549.

[28] Patterson and Hennessy, p. 714.

[29] What is clustering? Webopedia computer dictionary. Retrieved on November 7, 2007.

[30] Beowulf definition. PC Magazine. Retrieved on November 7, 2007.

[31] Architecture share for 06/2007. TOP500 Supercomputing Sites. "Clusters make up 74.60% of the machines on the list." Retrieved on November 7, 2007.

[32] Interconnect.

[33] Hennessy and Patterson, p. 537.

[34] MPP Definition. PC Magazine. Retrieved on November 7, 2007.

[35] Kirkpatrick, Scott (2003). "COMPUTER SCIENCE: Rough Times Ahead". Science. 299 (5607): 668–669. doi:10.1126/science.1081623. PMID 12560537.

[36] D'Amour, Michael R., Chief Operating Officer, DRC Computer Corporation. "Standard Reconfigurable Computing". Invited speaker at the University of Delaware, February 28, 2007.

[37] Boggan, Sha'Kia and Daniel M. Pressel (August 2007). GPUs: An Emerging Platform for General-Purpose Computation (PDF). ARL-SR-154, U.S. Army Research Lab. Retrieved on November 7, 2007.

[38] Maslennikov, Oleg (2002). "Systematic Generation of Executing Programs for Processor Elements in Parallel ASIC or FPGA-Based Systems and Their Transformation into VHDL-Descriptions of Processor Element Control Units". Lecture Notes in Computer Science, 2328/2002: p. 272.

[39] Shimokawa, Y.; Fuwa, Y.; Aramaki, N. (18–21 November 1991). "A parallel ASIC VLSI neurocomputer for a large number of neurons and billion connections per second speed". International Joint Conference on Neural Networks. 3: 2162–2167. doi:10.1109/IJCNN.1991.170708. ISBN 0-7803-0227-3.

[40] Acken, Kevin P.; Irwin, Mary Jane; Owens, Robert M. (July 1998). "A Parallel ASIC Architecture for Efficient Fractal Image Coding". The Journal of VLSI Signal Processing. 19 (2): 97–113. doi:10.1023/A:1008005616596.

[41] Kahng, Andrew B. (June 21, 2004). "Scoping the Problem of DFM in the Semiconductor Industry". University of California, San Diego. "Future design for manufacturing (DFM) technology must reduce design [non-recoverable expenditure] cost and directly address manufacturing [non-recoverable expenditures]—the cost of a mask set and probe card—which is well over $1 million at the 90 nm technology node and creates a significant damper on semiconductor-based innovation."

[42] Patterson and Hennessy, p. 751.

[43] The Sidney Fernbach Award given to MPI inventor Bill Gropp refers to MPI as "the dominant HPC communications interface".

[44] Shen, John Paul; Mikko H. Lipasti (2004). Modern processor design: fundamentals of superscalar processors (1st ed.). Dubuque, Iowa: McGraw-Hill. p. 561. ISBN 0-07-057064-7. "However, the holy grail of such research—automated parallelization of serial programs—has yet to materialize. While automated parallelization of certain classes of algorithms has been demonstrated, such success has largely been limited to scientific and numeric applications with predictable flow control (e.g., nested loop structures with statically determined iteration counts) and statically analyzable memory access patterns (e.g., walks over large multidimensional arrays of float-point data)."

[45] Encyclopedia of Parallel Computing, Volume 4 by David Padua 2011 ISBN 0387097651 page 265

[46] Asanovic, Krste, et al. (December 18, 2006). The Landscape of Parallel Computing Research: A View from Berkeley (PDF). University of California, Berkeley. Technical Report No. UCB/EECS-2006-183. See table on pages 17–19.

[47] Dobel, B., Hartig, H., & Engel, M. (2012). "Operating system support for redundant multithreading". Proceedings of the Tenth ACM International Conference on Embedded Software, 83–92. doi:10.1145/2380356.2380375

[48] Patterson and Hennessy, pp. 749–50: "Although successful in pushing several technologies useful in later projects, the ILLIAC IV failed as a computer. Costs escalated from the $8 million estimated in 1966 to $31 million by 1972, despite the construction of only a quarter of the planned machine. It was perhaps the most infamous of supercomputers. The project started in 1965 and ran its first real application in 1976."
[49] Menabrea, L. F. (1842). Sketch of the Analytic Engine Invented by Charles Babbage. Bibliothèque Universelle de Genève. Retrieved on November 7, 2007. quote: "when a long series of identical computations is to be performed, such as those required for the formation of numerical tables, the machine can be brought into play so as to give several results at the same time, which will greatly abridge the whole amount of the processes."

[50] Patterson and Hennessy, p. 753.

[51] R. W. Hockney, C. R. Jesshope. Parallel Computers 2: Architecture, Programming and Algorithms, Volume 2. 1988. p. 8. quote: "The earliest reference to parallelism in computer design is thought to be in General L. F. Menabrea's publication in 1842, entitled Sketch of the Analytical Engine Invented by Charles Babbage."

[52] "Parallel Programming", S. Gill, The Computer Journal Vol. 1 #1, pp. 2–10, British Computer Society, April 1958.

[53] Wilson, Gregory V. (1994). "The History of the Development of Parallel Computing". Virginia Tech/Norfolk State University, Interactive Learning with a Digital Library in Computer Science. Retrieved 2008-01-08.

[54] Anthes, Gary (November 19, 2001). "The Power of Parallelism". Computerworld. Retrieved 2008-01-08.

[55] Patterson and Hennessy, p. 749.

10 Further reading

Rodriguez, C.; Villagra, M.; Baran, B. (29 August 2008). "Asynchronous team algorithms for Boolean Satisfiability". Bio-Inspired Models of Network, Information and Computing Systems, 2007. Bionetics 2007. 2nd: 66–69. doi:10.1109/BIMNICS.2007.4610083.

11 External links

Go Parallel: Translating Multicore Power into Application Performance
Designing and Building Parallel Programs, by Ian Foster
Internet Parallel Computing Archive
Parallel processing topic area at IEEE Distributed Computing Online
Parallel Computing Works Free On-line Book
Frontiers of Supercomputing Free On-line Book Covering topics like algorithms and industrial applications
Universal Parallel Computing Research Center
Course in Parallel Programming at Columbia University (in collaboration with IBM T.J. Watson X10 project)
Parallel and distributed Gröbner bases computation in JAS
Course in Parallel Computing at University of Wisconsin-Madison
OpenHMPP, A New Standard for Manycore
Berkeley Par Lab: progress in the parallel computing landscape, Editors: David Patterson, Dennis Gannon, and Michael Wrinn, August 23, 2013
The trouble with multicore, by David Patterson, posted 30 Jun 2010
The Landscape of Parallel Computing Research: A View From Berkeley (one too many dead link at this site)
12.2 Images
File:AmdahlsLaw.svg Source: https://upload.wikimedia.org/wikipedia/commons/e/ea/AmdahlsLaw.svg License: CC BY-SA 3.0 Con-
tributors: Own work based on: File:AmdahlsLaw.png Original artist: Daniels220 at English Wikipedia
File:Beowulf.jpg Source: https://upload.wikimedia.org/wikipedia/commons/8/8c/Beowulf.jpg License: GPL Contributors: ? Original
artist: User Linuxbeak on en.wikipedia
File:BlueGeneL_cabinet.jpg Source: https://upload.wikimedia.org/wikipedia/commons/a/a7/BlueGeneL_cabinet.jpg License: CC-BY-
SA-3.0 Contributors: ? Original artist: ?
File:Commons-logo.svg Source: https://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg License: PD Contributors: ? Origi-
nal artist: ?
File:Cray_1_IMG_9126.jpg Source: https://upload.wikimedia.org/wikipedia/commons/6/6e/Cray_1_IMG_9126.jpg License: CC BY-
SA 2.0 fr Contributors: Own work Original artist: Rama
File:En-Parallel_computing.ogg Source: https://upload.wikimedia.org/wikipedia/commons/3/3b/En-Parallel_computing.ogg License:
CC BY-SA 3.0 Contributors:
Derivative of Parallel computing Original artist: Speaker: Mangst
Authors of the article
File:Fivestagespipeline.png Source: https://upload.wikimedia.org/wikipedia/commons/2/21/Fivestagespipeline.png License: CC-BY-
SA-3.0 Contributors: ? Original artist: ?
File:Folder_Hexagonal_Icon.svg Source: https://upload.wikimedia.org/wikipedia/en/4/48/Folder_Hexagonal_Icon.svg License: Cc-by-
sa-3.0 Contributors: ? Original artist: ?
File:Gustafson.png Source: https://upload.wikimedia.org/wikipedia/commons/d/d7/Gustafson.png License: CC BY-SA 3.0 Contributors:
Own work Original artist: Peahihawaii
File:IBM_Blue_Gene_P_supercomputer.jpg Source: https://upload.wikimedia.org/wikipedia/commons/d/d3/IBM_Blue_Gene_P_
supercomputer.jpg License: CC BY-SA 2.0 Contributors: originally posted to Flickr as Blue Gene / P Original artist: Argonne National
Laboratory's Flickr page
File:ILLIAC_4_parallel_computer.jpg Source: https://upload.wikimedia.org/wikipedia/commons/9/91/ILLIAC_4_parallel_
computer.jpg License: CC BY 2.0 Contributors: Flickr Original artist: Steve Jurvetson from Menlo Park, USA