Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel"). There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has been employed for many years, mainly in high-performance computing, but interest in it has grown lately due to the physical constraints preventing frequency scaling. As power consumption (and consequently heat generation) by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.
A. Types of parallelism
Bit-level parallelism
From the advent of very-large-scale integration (VLSI) computer-chip fabrication technology in the 1970s until about 1986, speed-up in computer architecture was driven by doubling the computer word size, the amount of information the processor can manipulate per cycle. Increasing the word size reduces the number of instructions the processor must execute to perform an operation on variables whose sizes are greater than the length of the word. For example, where an 8-bit processor must add two 16-bit integers, the processor must first add the 8 lower-order bits from each integer using the standard addition instruction, then add the 8 higher-order bits using an add-with-carry instruction and the carry bit from the lower-order addition; thus, an 8-bit processor requires two instructions to complete a single operation, where a 16-bit processor would be able to complete the operation with a single instruction.
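To make the arithmetic concrete, the following C sketch (an illustrative example, not taken from any particular instruction set) performs a 16-bit addition using only 8-bit operations, mirroring the add / add-with-carry sequence described above.

    #include <stdint.h>
    #include <stdio.h>

    /* Add two 16-bit values using only 8-bit arithmetic, as an 8-bit
       processor would: first the low bytes with a standard add, then the
       high bytes plus the carry produced by the low-byte addition. */
    static uint16_t add16_with_8bit_ops(uint16_t a, uint16_t b) {
        uint8_t a_lo = a & 0xFF, a_hi = a >> 8;
        uint8_t b_lo = b & 0xFF, b_hi = b >> 8;

        uint16_t lo_sum = (uint16_t)a_lo + b_lo;   /* standard addition  */
        uint8_t  carry  = lo_sum >> 8;             /* carry bit          */
        uint8_t  hi_sum = a_hi + b_hi + carry;     /* add-with-carry     */

        return ((uint16_t)hi_sum << 8) | (lo_sum & 0xFF);
    }

    int main(void) {
        printf("%u\n", add16_with_8bit_ops(300, 500));  /* prints 800 */
        return 0;
    }

A 16-bit processor performs the same addition with one instruction, which is exactly the saving that wider words provide.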
Instruction-level parallelism
Figure: a canonical five-stage pipeline in a RISC machine (IF = instruction fetch, ID = instruction decode, EX = execute, MEM = memory access, WB = register write back).
A computer program is, in essence, a stream of instructions executed by a processor. These instructions can be re-ordered and combined into groups that are then executed in parallel without changing the result of the program. This is known as instruction-level parallelism. Advances in instruction-level parallelism dominated computer architecture from the mid-1980s until the mid-1990s.
Modern processors have multi-stage instruction pipelines. Each stage in the pipeline corresponds to a different action the processor performs on the instruction in that stage; a processor with an N-stage pipeline can have up to N different instructions at different stages of completion. The canonical example of a pipelined processor is a RISC processor, with five stages: instruction fetch, decode, execute, memory access, and write back. The Pentium 4 processor had a 35-stage pipeline.
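As a small illustration (not from the source text), the first function below contains three statements with no data dependences on one another, so a pipelined or superscalar processor can overlap their execution; the second forms a dependence chain that limits the available instruction-level parallelism.

    /* Illustrative only: ilp_friendly() exposes instruction-level
       parallelism because its statements are mutually independent;
       dependent_chain() does not, because each statement consumes the
       result of the previous one. */
    int ilp_friendly(int a, int b, int c, int d, int h) {
        int e = a + b;   /* independent */
        int f = c + d;   /* independent */
        int g = h * 2;   /* independent */
        return e + f + g;
    }

    int dependent_chain(int a, int b, int c, int d) {
        int x = a + b;   /* each line waits on the one before it */
        int y = x + c;
        int z = y + d;
        return z;
    }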
Task parallelism
Task parallelism is the characteristic of a parallel program that "entirely different calculations can be performed on either the same or different sets of data". This contrasts with data parallelism, where the same calculation is performed on the same or different sets of data. Task parallelism does not usually scale with the size of a problem.
B. Distributed computing
A distributed computer (also known as a distributed memory multiprocessor) is a distributed-memory computer system in which the processing elements are connected by a network. Distributed computers are highly scalable.
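As a minimal sketch of programming such a machine (assuming an MPI installation; the message contents are illustrative), each process below runs in its own address space, possibly on a different node, and the processes cooperate only by exchanging messages over the network.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id     */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes   */

        if (rank == 0) {
            /* Process 0 sends a value to every other processing element. */
            for (int dest = 1; dest < size; dest++) {
                int value = 100 + dest;
                MPI_Send(&value, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
            }
        } else {
            int value;
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("process %d of %d received %d\n", rank, size, value);
        }

        MPI_Finalize();
        return 0;
    }

Such a program is typically launched with a command like mpirun -np 4 ./program, one process per processing element.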
C. Parallel computer architecture
Figure: a logical view of a non-uniform memory access (NUMA) architecture. Processors in one directory can access that directory's memory with lower latency than they can access memory in the other directory.
D. Parallel programming languages
Concurrent programming languages, libraries, APIs, and parallel programming models (such as algorithmic skeletons) have been created for programming parallel computers. Generally, these can be divided into classes based on the assumptions they make about the underlying memory architecture: shared memory, distributed memory, or shared distributed memory. Shared memory programming languages communicate by manipulating shared memory variables, whereas distributed memory uses message passing. POSIX Threads and OpenMP are two of the most widely used shared memory APIs, whereas Message Passing Interface (MPI) is the most widely used message-passing system API. One concept used in programming parallel programs is the future concept, where one part of a program promises to deliver a required datum to another part of the program at some future time.
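As a minimal OpenMP sketch of the shared-memory style (the array size and variable names are assumptions), all threads operate on the same array and the same sum variable in a single address space, with the reduction clause handling the concurrent updates.

    #include <stdio.h>

    int main(void) {
        double a[1000], sum = 0.0;   /* shared by all threads */

        /* Each thread handles a slice of the shared array; the reduction
           clause combines the per-thread partial sums safely. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < 1000; i++) {
            a[i] = i * 0.5;
            sum += a[i];
        }

        printf("sum = %f\n", sum);
        return 0;
    }

With GCC, for example, this is compiled with the -fopenmp flag; without it the pragma is ignored and the loop simply runs sequentially.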
CAPS entreprise and Pathscale are also coordinating their efforts to make HMPP (Hybrid Multicore Parallel Programming) directives an open standard called OpenHMPP. The OpenHMPP directive-based programming model offers a syntax to efficiently offload computations onto hardware accelerators and to optimize data movement to and from the hardware memory. OpenHMPP directives describe remote procedure calls (RPCs) on an accelerator device (e.g. a GPU) or, more generally, a set of cores. The directives annotate C or Fortran code to describe two sets of functionalities: the offloading of procedures (denoted codelets) onto a remote device, and the optimization of the data transfers between the CPU main memory and the accelerator memory.
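A hedged sketch of this directive style is shown below; the exact clause spellings are an assumption and vary across OpenHMPP implementations and versions. The codelet directive marks a C function for offloading to an accelerator, and the callsite directive marks where the RPC-style remote call is made.

    /* Illustrative OpenHMPP-style directives; the clause details here
       are an assumption, not a definitive reference. */
    #pragma hmpp scale codelet, target=CUDA, args[v].io=inout
    static void scale(int n, float v[n], float factor) {
        for (int i = 0; i < n; i++)
            v[i] *= factor;          /* runs as a codelet on the accelerator */
    }

    int main(void) {
        float data[1024];
        for (int i = 0; i < 1024; i++) data[i] = (float)i;

    #pragma hmpp scale callsite
        scale(1024, data, 2.0f);     /* RPC-style call offloaded to the device */
        return 0;
    }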
E. Introductory GPU programming with CUDA
CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA and implemented by the graphics processing units (GPUs) that they produce. CUDA gives developers direct access to the virtual instruction set and memory of the parallel computational elements in CUDA GPUs. Using CUDA, GPUs can be used for general purpose processing (i.e., not exclusively graphics); this approach is known as GPGPU. Unlike CPUs, however, GPUs have a parallel throughput architecture that emphasizes executing many concurrent threads slowly, rather than executing a single thread very quickly.
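As a minimal CUDA sketch (the kernel and variable names are illustrative), each GPU thread adds a single pair of array elements, which is exactly the many-slow-threads style of throughput computing described above.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Each GPU thread handles one element of the vectors.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        // Unified (managed) memory keeps host and device views consistent.
        cudaMallocManaged(&a, bytes);
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        // Launch enough 256-thread blocks to cover all n elements.
        int threads = 256, blocks = (n + threads - 1) / threads;
        vecAdd<<<blocks, threads>>>(a, b, c, n);
        cudaDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);   // expected: 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }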
General-Purpose Computing on Graphics Processing Units (GPGPU, rarely GPGP or GP²U) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU).
Any GPU providing a functionally complete set of operations performed on arbitrary bits can compute any computable value. Additionally, the use of multiple graphics cards in one computer, or
large numbers of graphics chips, further parallelizes the already
parallel nature of graphics processing.
OpenCL is the currently dominant open general-purpose GPU computing language, while the dominant proprietary framework is Nvidia's CUDA.