Distributed Deadlock Detection
Deadlock can occur whenever two or more processes compete for limited
resources and are allowed to acquire and hold a resource (obtain a lock),
preventing others from using that resource while the holder waits for
additional resources.
Two common places where deadlocks may occur are with
processes in an operating system (distributed or centralized) and with transactions
in a database. The concepts discussed
here are applicable to any system that allocates resources to processes.
Locking protocols such as the popular Two Phase Locking
(see concurrency control) give rise to deadlock as follows: process A gets a
lock on data item X while process B gets a lock on data item Y. Process A then tries to get a lock on
Y. As Y is already locked, process A
enters a blocked state. Process B now
decides to get a lock on X, but is blocked.
Both processes are now blocked, and, by the rules of Two Phase Locking,
neither will relinquish its locks.
This is a deadlock: process A is waiting for a resource
held by process B and process B is waiting for a resource held by process
A. No progress will take place without
outside intervention. Several processes
can be involved in a deadlock when there exists a cycle of processes waiting
for each other: for example, process A waits for B, which waits for C, which waits for A.
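As a concrete illustration of the two-process scenario above, here is a
hypothetical Python sketch (not taken from any particular system) in which two
threads take lock_x and lock_y in opposite orders; the timeout on the second
acquire is only there so the demonstration terminates instead of hanging.

```python
import threading
import time

lock_x = threading.Lock()   # resource X
lock_y = threading.Lock()   # resource Y

def process_a():
    with lock_x:                              # A locks X
        time.sleep(0.1)                       # give B time to lock Y
        # A now waits for Y, which B holds
        if not lock_y.acquire(timeout=2):
            print("A blocked on Y: deadlock suspected")
        else:
            lock_y.release()

def process_b():
    with lock_y:                              # B locks Y
        time.sleep(0.1)                       # give A time to lock X
        # B now waits for X, which A holds
        if not lock_x.acquire(timeout=2):
            print("B blocked on X: deadlock suspected")
        else:
            lock_x.release()

a = threading.Thread(target=process_a)
b = threading.Thread(target=process_b)
a.start(); b.start()
a.join(); b.join()
```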
Four conditions must hold for deadlock to occur:
1. Exclusive use – when a process accesses a resource, it is granted exclusive use of that resource.
2. Hold and wait – a process is allowed to hold onto some resources while it is waiting for other resources.
3. No preemption – a process cannot preempt or take away the resources held by another process.
4. Cyclical wait – there is a circular chain of waiting processes, each waiting for a resource held by the next process in the chain.
The structure of the system may allow additional
complexities in the deadlock problem.
The simplest model, single-resource,
requires that a process have no more than one unfulfilled request. Thus a blocked process is waiting for only
one other process and can be involved in at most one deadlock cycle.
In the AND model (also called the multiple-resource model),
a process is allowed to make several resource requests, and it is blocked until
all of the requests are granted.
Processes in this model can be involved in several deadlock cycles at
once.
In the OR model (also called the communication model), a process
makes several requests and is blocked until any one of them is granted. The AND-OR model allows a combination of
request types, such as a request for resource X and either Y or Z. Unless otherwise stated, this article
assumes the AND model.
The problem of deadlocks can be handled in several ways:
Prevention, Avoidance, and Detection.
In prevention, some requirement of the system makes deadlocks impossible
so that no runtime support is required.
Avoidance schemes require decisions by the system while it is running to
ensure that deadlocks will not occur.
Detection requires the most sophisticated runtime support: the system
must find deadlocks and break them by choosing a suitable victim that is terminated or aborted
and restarted if appropriate.
Deadlock Prevention
Prevention is the name given to schemes that guarantee
that deadlocks can never happen because of the way the system is
structured. One of the four conditions
for deadlock is made impossible to satisfy, so deadlocks cannot occur. One way to do this is to make processes
declare, when they first start, all of the resources they might eventually need. Only if all the resources are available is
the process allowed to continue. All of
the resources are acquired together, and the process proceeds, releasing all
the resources when it is finished.
Thus, hold and wait cannot
occur.
The major disadvantage of this scheme is that resources
must be acquired because they might be used, not because they will be
used. Also, the pre-allocation
requirement reduces potential concurrency.
Another prevention scheme is to impose an order on the
resources and require processes to request resources in increasing order. This prevents cyclic wait and thus makes deadlocks impossible.
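A minimal sketch of this ordering rule follows, assuming a hypothetical table
that assigns each resource a fixed rank; any process that requests its locks
through acquire_in_order takes them in increasing rank, so a cyclic wait
cannot form.

```python
import threading

# Hypothetical rank table: each resource gets a fixed, globally agreed rank.
locks = {
    "X": (1, threading.Lock()),
    "Y": (2, threading.Lock()),
    "Z": (3, threading.Lock()),
}

def acquire_in_order(resource_names):
    """Acquire the named resources in increasing rank order."""
    ordered = sorted(resource_names, key=lambda name: locks[name][0])
    acquired = []
    for name in ordered:
        locks[name][1].acquire()
        acquired.append(name)
    return acquired

def release_all(acquired):
    """Release held resources (in reverse order of acquisition)."""
    for name in reversed(acquired):
        locks[name][1].release()

# Even if a process asks for Y before X, it still locks X first,
# so two processes can never wait for each other in a cycle.
held = acquire_in_order(["Y", "X"])
# ... use the resources ...
release_all(held)
```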
One advantage of prevention is that process aborts are
never required due to deadlocks. While
most systems can deal with rollbacks, some systems may not be designed to
handle them and thus must use deadlock prevention.
Deadlock Avoidance
In deadlock avoidance the system considers resource
requests while the processes are running and takes action to ensure that those
requests do not lead to deadlock.
Avoidance based on the banker's
algorithm, sometimes used in centralized systems, is considered not
practical for a distributed system. Two
popular avoidance algorithms based on timestamps or priorities are wound-wait and wait-die. They depend on
the assignment of a unique global timestamp or priority to each process when it
starts. Some authors refer to these as
prevention.
In wound-wait, if process A requests a resource currently
held by process B, their timestamps are compared. B is wounded and must
restart if it has a larger timestamp (is younger) than A. Process A is allowed to wait if B has the
smaller timestamp. Deadlock cycles
cannot occur since processes only wait for older processes. In wait-die, if a request from process A
conflicts with process B, A will wait if B has the larger timestamp (is
younger). If B is the older process, A
is not allowed to wait, so it dies
and restarts.
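The two decision rules can be summarized in a short sketch. This is an
illustrative Python fragment, assuming each process carries the globally
unique timestamp it received at startup (smaller timestamp = older process);
the function names are hypothetical.

```python
def wound_wait(requester_ts, holder_ts):
    """Requester asks for a resource held by holder (wound-wait rule)."""
    if requester_ts < holder_ts:      # requester is older
        return "wound"                # abort (wound) the younger holder
    else:                             # requester is younger
        return "wait"                 # younger processes may wait for older ones

def wait_die(requester_ts, holder_ts):
    """Requester asks for a resource held by holder (wait-die rule)."""
    if requester_ts < holder_ts:      # requester is older
        return "wait"                 # older processes may wait for younger ones
    else:                             # requester is younger
        return "die"                  # requester aborts and restarts

# Example: process A (timestamp 5) requests a resource held by B (timestamp 9).
print(wound_wait(5, 9))   # "wound" -- the younger B is aborted
print(wait_die(5, 9))     # "wait"  -- the older A waits for B
```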
In timeout-based avoidance, a process is blocked when it
requests a resource that is not currently available. If it has been blocked longer than a timeout period, it is
aborted and restarted. Given the
uncertainty of message delays in distributed systems, it is difficult to
determine good timeout values.
These avoidance strategies have the disadvantage that the
aborted process may not have been actually involved in a deadlock.
Deadlock Detection
Deadlock detection attempts to find and resolve actual
deadlocks. These strategies rely on a
Wait-For-Graph (WFG) that in some schemes is explicitly built and analyzed for
cycles. In the WFG, the nodes
represent processes and the edges represent the blockages or dependencies. Thus, if process A is waiting for a resource
held by process B, there is an edge in the WFG from the node for process A to
the node for process B.
In the AND model (resource model), a cycle in the graph
indicates a deadlock. In the OR model,
a cycle may not mean a deadlock since any of a set of requested resources may
unblock the process. A knot in the WFG is needed to declare a
deadlock. A knot exists when all nodes
that can be reached from some node in a directed graph can also reach that
node.
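A small sketch of both tests follows, assuming the WFG is available as a
dictionary that maps each process to the set of processes it waits for (the
node names are illustrative only): has_cycle detects AND-model deadlock by
depth-first search, and in_knot applies the knot test used for the OR model.

```python
def has_cycle(wfg):
    """Detect a cycle (deadlock under the AND model) by depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in wfg}

    def visit(node):
        color[node] = GREY
        for nxt in wfg.get(node, ()):
            if color.get(nxt, WHITE) == GREY:          # back edge -> cycle
                return True
            if color.get(nxt, WHITE) == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in wfg)

def reachable(wfg, start):
    """Set of nodes reachable from start by following wait-for edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in wfg.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def in_knot(wfg, node):
    """Knot test (deadlock under the OR model): every node reachable from
    `node` can also reach `node`."""
    reach = reachable(wfg, node)
    return bool(reach) and all(node in reachable(wfg, other) for other in reach)

# A waits for B, B waits for C, C waits for A: both a cycle and a knot.
wfg = {"A": {"B"}, "B": {"C"}, "C": {"A"}}
print(has_cycle(wfg))      # True
print(in_knot(wfg, "A"))   # True
```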
In a centralized system, a WFG can be constructed fairly
easily. The WFG can be checked for
cycles periodically or every time a process is blocked, thus potentially adding
a new edge to the WFG. When a cycle is
found, a victim is selected and aborted.
Distributed Deadlock
A distributed system consists of a number of sites
connected by a network. Each site
maintains some of the resources of the system.
Processes with a globally unique identifier run on the distributed
system. They make resource requests to
a controller. There is one controller
per site. If the resource is local, the
process makes a request of the local controller. If the desired resource is at a remote site, the process sends a
message to that site's controller. After a process makes a
request, but before it is granted, it is blocked and said to be dependent on the process that holds the
desired resource.
The controller at each site could maintain a WFG on the
process requests that it knows about.
This is the local WFG. However,
each site's WFG could be cycle-free and yet the distributed system could be
deadlocked. This is called global deadlock. This would occur in the following
situation:
1. Process A at site 1 holds a lock on resource X.
2. Process A has requested, but has not been granted,
resource Y at site 2.
3. Process B at site 2 holds a lock on resource Y.
4. Process B has requested, but has not been granted,
resource X at site 1.
Each process is blocked by the other: there is a global deadlock. However, the deadlock will not be discovered
at either site unless they exchange information via some detection scheme.
Distributed Detection Schemes
A detection scheme is evaluated by two criteria: 1) if an actual deadlock
exists, it must be detected in a finite amount of time; and 2) the scheme must
not report a deadlock that does not actually exist.
One way to detect deadlocks in distributed systems is for
each site to construct a local WFG on the part it knows about. A site that suspects deadlock initiates a
global snapshot protocol that constructs a consistent global state of the
distributed system (see Distributed Snapshots). The global state corresponds to a deadlock if the union of the
local WFGs has a cycle. This algorithm
is correct because deadlock is a stable property.
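Continuing the earlier sketch, the hypothetical fragment below builds the two
local WFGs from the two-site example above and shows that neither has a cycle
while their union does; has_cycle is the helper defined in the earlier sketch,
and the site and process names are illustrative.

```python
# Local WFGs from the two-site example (each edge: waiter -> holder).
site1_wfg = {"B": {"A"}}     # at site 1, B waits for A (A holds X)
site2_wfg = {"A": {"B"}}     # at site 2, A waits for B (B holds Y)

def union_wfgs(*local_wfgs):
    """Merge local wait-for graphs into one global wait-for graph."""
    global_wfg = {}
    for wfg in local_wfgs:
        for node, waits_for in wfg.items():
            global_wfg.setdefault(node, set()).update(waits_for)
    return global_wfg

# Neither local graph has a cycle, but their union does: a global deadlock.
print(has_cycle(site1_wfg))                          # False
print(has_cycle(site2_wfg))                          # False
print(has_cycle(union_wfgs(site1_wfg, site2_wfg)))   # True
```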