Bellosa, Frank: Techniques for building a fast threads package on NUMA architectures.
Erlangen: FAU, 1994.
TR-I4-94-06. Internal report.
Abstract:
Operating system abstractions do not always reach high enough for
direct use by a language or applications designer. The gap is filled by
application-specific runtime environments. Typical arguments for their
use include complete user-level control over thread scheduling and
the ability to customize thread synchronization and communication
constructs. On NUMA architectures in particular, an interface between
scheduler and application is essential for overlapping computation with
memory transfer.
We consider a nonpreemptive user-space threads package with an
application interface. The application should be able to obtain
information about the scheduling decisions of the runtime system in
order to invoke prefetch operations. Furthermore, efficient
machine-dependent code for creating, running, and stopping threads has
to be provided by the runtime system. By separating the notion of
execution (starting and stopping threads) from thread allocation and
scheduling, changing
scheduling policies can be as simple as using different function
pointers and can be done efficiently at runtime. Thus details of the
threads package are not fixed, but can instead be tuned to the needs of
the application. To implement this package, we intend to follow a
two-level approach: the lower level consists of assembler code for fast
thread initialization and context switching; the upper level is a
toolbox for building application-specific schedulers and
synchronization operations. The kernel threads provided by the
operating system represent the "virtual processors" of the runtime
system. This kind of threads package can work efficiently only if we
use gang-scheduled kernel threads in a multiuser environment, or
individually scheduled kernel threads in an environment with just one
running application on each processor set.
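
The separation of scheduling policy from execution described above can be
illustrated with a small C sketch. It is not taken from the report: all
names (uthread_t, sched_policy_t, sched_next, example_prefetch_hook, and so
on) are hypothetical, the machine-dependent context switch is omitted
entirely, and the run queue is a fixed-size array chosen for brevity. The
sketch only shows how a policy might be represented as a pair of function
pointers that can be swapped at runtime, and how the application might be
told about each scheduling decision so that it can start a prefetch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Hypothetical thread descriptor. A real package would also store the
   saved machine context (stack pointer, registers) here. */
typedef struct uthread {
    int   id;
    int   priority;     /* consulted only by the priority policy */
    void *working_set;  /* data the thread will touch (prefetch target) */
} uthread_t;

typedef struct run_queue {
    uthread_t *slot[MAX_THREADS];
    size_t     count;
} run_queue_t;

/* A scheduling policy is nothing more than a set of function pointers. */
typedef struct sched_policy {
    void       (*enqueue)(run_queue_t *q, uthread_t *t);
    uthread_t *(*pick_next)(run_queue_t *q);
} sched_policy_t;

/* Application hook: invoked with the thread chosen to run next. */
typedef void (*sched_notify_fn)(const uthread_t *next);

typedef struct scheduler {
    run_queue_t           queue;
    const sched_policy_t *policy;  /* may be reassigned at runtime */
    sched_notify_fn       notify;  /* may be NULL */
} scheduler_t;

static void append(run_queue_t *q, uthread_t *t) {
    assert(q->count < MAX_THREADS);
    q->slot[q->count++] = t;
}

/* Remove and return the entry at index i, keeping the order of the rest. */
static uthread_t *remove_at(run_queue_t *q, size_t i) {
    uthread_t *t = q->slot[i];
    for (size_t j = i + 1; j < q->count; j++)
        q->slot[j - 1] = q->slot[j];
    q->count--;
    return t;
}

static uthread_t *pick_fifo(run_queue_t *q) {
    return q->count ? remove_at(q, 0) : NULL;
}

static uthread_t *pick_priority(run_queue_t *q) {
    if (q->count == 0)
        return NULL;
    size_t best = 0;
    for (size_t i = 1; i < q->count; i++)
        if (q->slot[i]->priority > q->slot[best]->priority)
            best = i;
    return remove_at(q, best);
}

const sched_policy_t FIFO_POLICY     = { append, pick_fifo };
const sched_policy_t PRIORITY_POLICY = { append, pick_priority };

void sched_add(scheduler_t *s, uthread_t *t) {
    s->policy->enqueue(&s->queue, t);
}

/* Make a scheduling decision and report it to the application before
   any context switch would take place. */
uthread_t *sched_next(scheduler_t *s) {
    uthread_t *next = s->policy->pick_next(&s->queue);
    if (next != NULL && s->notify != NULL)
        s->notify(next);
    return next;
}

/* Example hook: a real application would issue prefetch operations for
   next->working_set here; this one only records the decision. */
int last_announced_id = -1;
void example_prefetch_hook(const uthread_t *next) {
    last_announced_id = next->id;
}
```

In this sketch, switching from FIFO to priority scheduling is a single
pointer assignment (s.policy = &PRIORITY_POLICY), and the dispatch path in
sched_next stays the same under either policy. Whether the hook should
announce the thread about to run or the one after it is a design choice
that the sketch leaves open.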
A fast threads package on NUMA architectures is a prerequisite for an
easy implementation of adaptive numerical methods on unstructured
grids. A first approach to an implementation is given in the next
section. Last but not least, a fast threads package can serve as the
support library for a compiler that performs automatic parallelization.