Last change: 7.4.1995
Distributed Shared Memory
Two main attempts to solve the problems arising with the DSM approach have
been made:
- building hardware that supports DSM
- emulating DSM in software
A broad survey of all kinds of DSM systems is "A. Mohindra,
U. Ramachandran, A Survey of
Distributed Shared Memory in Loosely-coupled Systems". The report dates
from 1991, however, and is therefore somewhat outdated.
A paper discussing the lack of user acceptance of current DSM systems is
"John B. Carter, Dilip Khandekar, Linus Kamb,
Distributed Shared Memory: Where We Are and Where We Should Be Headed".
Theoretical aspects of DSM systems, mainly sequentially consistent memory
models, are covered in "M. Mizuno, M. Raynal, J.Z. Zhou,
Sequential Consistency in Distributed
Systems: Theory and Implementation".
Case Studies
The following case studies provide an overview of software-based DSM systems.
IVY
One of the first DSM runtime systems ever designed was
IVY. It was implemented at
Yale University and provides the abstraction
of two classes of memory: private and shared.
IVY uses a write-invalidate protocol and implements
multiple-reader, single-writer semantics. The granularity of
access is a 1 Kbyte page. Accesses to shared memory locations are detected
with the virtual memory primitives: write accesses and first read accesses to
a shared page cause page faults, and the page fault handler acquires the page
from the current holder. Using these techniques, IVY provides a
strictly consistent memory model.
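The fault handling described above can be sketched as a small state machine. This is only an illustration, assuming a single directory entry per page; the names `Page`, `read_fault` and `write_fault` are hypothetical and not taken from IVY.

```python
class Page:
    """Bookkeeping for one shared page (hypothetical structure)."""

    def __init__(self, owner):
        self.owner = owner        # node holding the master copy
        self.copyset = {owner}    # nodes that hold a valid copy
        self.writable = True      # True while the owner may write


def read_fault(page, node):
    """First read access by a node without a valid copy."""
    if node in page.copyset:
        return                    # valid copy present, no fault occurs
    page.writable = False         # owner is downgraded to read-only
    page.copyset.add(node)        # reader receives a copy of the page


def write_fault(page, node):
    """Write access by a node that is not the current single writer."""
    if page.owner == node and page.writable:
        return                    # already the single writer
    page.copyset = {node}         # all other copies are invalidated
    page.owner = node             # ownership moves to the writer
    page.writable = True
```

After any write fault exactly one node holds the page, which is the multiple-reader, single-writer invariant.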
Three page management implementations were integrated into IVY:
- centralized manager scheme
- fixed distributed manager scheme
- dynamic distributed manager scheme
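The three schemes differ mainly in how a faulting node locates the owner of a page. The sketch below assumes a modulo mapping for the fixed distributed scheme and per-node "probable owner" hints for the dynamic scheme; `NUM_NODES`, the function names and the hint table layout are hypothetical, and the updating of hints after a request is omitted.

```python
NUM_NODES = 4  # hypothetical cluster size


def centralized_manager(page_no):
    """Centralized scheme: one fixed node manages every page."""
    return 0


def fixed_distributed_manager(page_no):
    """Fixed distributed scheme: pages are mapped statically to managers."""
    return page_no % NUM_NODES


def find_owner_dynamic(prob_owner, start, page_no):
    """Dynamic distributed scheme: follow probable-owner hints.

    prob_owner[node][page_no] is that node's guess of the owner.
    The true owner's hint points to itself. Returns (owner, hops).
    """
    node = start
    hops = 0
    while prob_owner[node][page_no] != node:
        node = prob_owner[node][page_no]   # forward the request
        hops += 1
    return node, hops
```

For example, with hints `[{7: 1}, {7: 3}, {7: 3}, {7: 3}]` a request for page 7 issued on node 0 is forwarded via node 1 to the owner, node 3.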
In all three implementations the double fault problem is
inherent: a read access followed by a write access to the same page on a
single node causes the page to be transferred twice. The authors provide a
scheme to eliminate this problem using sequence numbers for every shared page.
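The effect of the sequence numbers can be illustrated by counting page transfers. The text does not spell out the scheme, so this sketch assumes one plausible reading: the faulting node sends the sequence number of its read copy along with the write request, and the page contents are shipped again only if that number is stale. The function name and the value 5 are hypothetical.

```python
def page_transfers_for_read_then_write(use_sequence_numbers):
    """Count page transfers for a read fault followed by a write fault."""
    owner_seq = 5                 # version of the page at its owner
    transfers = 0

    # Read fault: the node has no copy, so the page must be shipped.
    local_seq = owner_seq
    transfers += 1

    # Write fault on the same node shortly afterwards.
    copy_is_current = use_sequence_numbers and local_seq == owner_seq
    if not copy_is_current:
        transfers += 1            # page is shipped a second time
    return transfers
```

Without sequence numbers the function counts two transfers, with them only one, because the second request moves ownership but not data.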
IVY's synchronisation primitives, which are needed to serialize concurrent
accesses to shared memory locations, are eventcounts. These
eventcounts provide atomic operations on shared counters, which are implemented
through the system's shared memory semantics.
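A minimal sketch of an eventcount follows. It uses threads inside one process as a stand-in for nodes sharing memory, so it shows the semantics of the counter rather than IVY's implementation. The operations read, advance and await are the classic eventcount operations; IVY's exact interface is not given here, and `wait` is used as the method name because `await` is reserved in Python.

```python
import threading


class EventCount:
    """Monotonically increasing counter with blocking wait."""

    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def read(self):
        """Return the current value."""
        with self._cond:
            return self._value

    def advance(self):
        """Atomically increment the counter and wake all waiters."""
        with self._cond:
            self._value += 1
            self._cond.notify_all()
            return self._value

    def wait(self, target):
        """Block until the counter has reached at least target."""
        with self._cond:
            self._cond.wait_for(lambda: self._value >= target)
            return self._value
```

A consumer can call `wait(n)` to block until a producer has called `advance()` n times, which is enough to order accesses to a shared buffer.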
Mirage
Mirage extends the IVY mechanisms by introducing a time interval during which
a page is pinned to a certain processor. During this interval, the ownership
of the page will not be passed to another processor. This avoids page
thrashing when two processors reference a single page repeatedly.
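The pinning rule can be sketched as follows, assuming a logical clock passed in by the caller. The class name `PinnedPage`, the parameter `delta` and the retry behaviour are hypothetical and only illustrate the idea.

```python
class PinnedPage:
    """A page that stays with its owner for a fixed interval."""

    def __init__(self, owner, delta, now):
        self.owner = owner
        self.delta = delta                  # length of the pin interval
        self.pinned_until = now + delta     # end of the current interval

    def request(self, node, now):
        """Try to move the page to node at time now; True if granted."""
        if node == self.owner:
            return True                     # nothing to transfer
        if now < self.pinned_until:
            return False                    # still pinned, retry later
        self.owner = node                   # ownership is transferred
        self.pinned_until = now + self.delta
        return True
```

A request that arrives inside the interval is refused, so the current owner is guaranteed some uninterrupted time with the page before it can move again.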
Clouds
Clouds enables the programmer to define "pin intervals" for certain shared
data segments. It also allows the shared memory granularity to be reduced to
the needs of the application. A paper further describing the Clouds programming
model and distribution mechanisms is "M. Ahamad, et al.,
Shared Memory Programming in a
Distributed System".
Munin
Munin attacks the main problems in conventional DSM systems
with four techniques:
- software release consistency
- multiple consistency protocols
- write-shared protocols
- update-with-timeout mechanisms
These techniques mostly aim at reducing the communication overhead and
lowering the message counts caused by
- double faults and
- false sharing.
Munin provides distinct consistency protocols for these types of access
patterns:
- conventional (single-writer, multiple-reader)
- read-only (replication on demand)
- migratory (write-access on first access)
- write-shared (program driven synchronisation)
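For the write-shared protocol, the Munin papers describe a twin-and-diff mechanism: a node copies a page before its first write and later sends only the changed parts. The sketch below assumes byte-level comparison and hypothetical function names; it illustrates how two nodes can write disjoint parts of one page without false sharing traffic.

```python
def make_twin(page):
    """Save an unmodified copy of the page before the first write."""
    return bytes(page)


def make_diff(twin, page):
    """List (offset, value) pairs for every byte that changed."""
    return [(i, page[i]) for i in range(len(page)) if page[i] != twin[i]]


def apply_diff(page, diff):
    """Merge the changes of another node into a local copy."""
    for offset, value in diff:
        page[offset] = value


# Two nodes modify different bytes of the same 8-byte page.
home = bytearray(8)
copy_a, copy_b = bytearray(home), bytearray(home)
twin_a, twin_b = make_twin(copy_a), make_twin(copy_b)
copy_a[0] = 1
copy_b[7] = 9

# At synchronisation time both diffs are merged into the home copy.
apply_diff(home, make_diff(twin_a, copy_a))
apply_diff(home, make_diff(twin_b, copy_b))
```

Because each diff carries only the bytes its node changed, neither node overwrites the other's update.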
All these issues are discussed in
"John B. Carter, University of Utah; John K. Bennett and Willy Zwaenepoel, Rice University, Techniques for Reducing Consistency-Related Communication in Distributed Shared Memory Systems".
Detailed implementation issues are presented in
"J.B. Carter, Design of the Munin Distributed Shared Memory System" and
"J.B. Carter, et al., Implementation and Performance of Munin".
A new consistency model for DSM systems called lazy release
consistency (LRC) is currently being evaluated in Munin and TreadMarks.
P. Keleher wrote his Ph.D. thesis,
"Lazy Release Consistency for Distributed Shared Memory",
on this topic. LRC reduces memory-coherence-related communication with
mechanisms similar to those of entry consistency, developed for the Midway
system. The thesis discusses LRC in great detail, dealing heavily with
performance and correctness issues.
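The "lazy" part of LRC can be sketched with a single lock: a release only records which pages were written, and a node learns about those writes when it next acquires the lock. This is a strong simplification under stated assumptions (one lock, page-number write notices, invalidation on acquire); all names are hypothetical, and the interval and timestamp machinery of the real protocol is omitted.

```python
class Node:
    def __init__(self, valid_pages):
        self.valid = set(valid_pages)  # pages with a valid local copy
        self.dirty = set()             # pages written since last release
        self.seen = 0                  # write notices already applied


class Lock:
    def __init__(self):
        self.notices = []              # write notices in release order


def write(node, page):
    node.dirty.add(page)
    node.valid.add(page)


def release(node, lock):
    # Lazy: record the write notices, send nothing to other nodes.
    lock.notices.extend(sorted(node.dirty))
    node.dirty.clear()
    node.seen = len(lock.notices)


def acquire(node, lock):
    # Only the acquirer applies the notices it has not seen yet.
    for page in lock.notices[node.seen:]:
        node.valid.discard(page)       # stale copy is invalidated
    node.seen = len(lock.notices)
```

A node that never acquires the lock keeps its copies untouched, which is where the saving in messages comes from.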
Midway
In the Midway project
launched at The Midway Distributed Shared Memory System".
The write detection mechanism is described in
"Matthew J. Zekauskas, Wayne A. Sawdon and Brian N. Bershad,
Software Write Detection for a Distributed Shared Memory".
And finally the concept of entry consistency is further discussed in
"Brian N. Bershad, Matthew J. Zekauskas,
Shared Memory Parallel Programming with Entry Consistency for Distributed Memory Multiprocessors".
An unsupported snapshot of the Midway code is available
here.
Erich Meier, Uni Erlangen, 1995