Next: Redirection Mechanisms Up: A DNS-based Client Redirector Previous: Contents Contents

Problem description

With the continuous growth of the Internet, there is also a growing need to replicate popular (and thus frequently accessed) services. Having multiple copies of a service can increase its availability, as it reduces the chance that none of the copies can be reached. More importantly, however, replication also aims at reducing the delay between sending a request to the service and receiving the answer. Since the actual data transfer accounts for about 80% of the total request time, accessing a nearby copy is expected to have a significant impact on performance [26]. The key problem here is how to select the service replica that is closest to a given client. A second problem is how to efficiently direct the client to the selected replica.

In this thesis we assume that a service has one home address. The home machine can redirect each client to the service replica that is likely to serve it most efficiently. An important observation is that the home machine need not run the service itself at all. Instead, it can run only a redirection mechanism and leave the main part of the work to the selected replica each time. As will be shown in this thesis, accepting this assumption makes the redirection problem considerably easier.

The redirection problem can be split into the following two components: a redirection mechanism, and a redirection policy. The redirection mechanism (or, simply, the redirector) has three main duties. Firstly, it has to keep track of existing replicas, which can join or leave the group frequently. Each replica is described by a list of services it provides, thus allowing the redirector to isolate a subset of replicas supporting a specific service. The actual operations of inserting and removing replicas from the group are initiated by an external managerial component, for which an appropriate interface is provided.
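The bookkeeping duty described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that replicas are identified by their IP address and services by plain string names. The class `ReplicaRegistry` and its methods `add_replica`, `remove_replica`, and `replicas_for` are hypothetical names standing in for the management interface; they are not taken from the actual implementation.

```python
from typing import Dict, Iterable, List, Set


class ReplicaRegistry:
    """Hypothetical registry of replicas kept by the redirector."""

    def __init__(self) -> None:
        # Maps a replica address to the set of services that replica provides.
        self._replicas: Dict[str, Set[str]] = {}

    def add_replica(self, address: str, services: Iterable[str]) -> None:
        # Called by the external management component when a replica joins.
        self._replicas[address] = set(services)

    def remove_replica(self, address: str) -> None:
        # Called when a replica leaves; unknown addresses are silently ignored.
        self._replicas.pop(address, None)

    def replicas_for(self, service: str) -> List[str]:
        # Isolate the subset of replicas supporting the requested service
        # (sorted only to make the result deterministic).
        return sorted(a for a, svcs in self._replicas.items() if service in svcs)


registry = ReplicaRegistry()
registry.add_replica("192.0.2.10", ["www", "ftp"])
registry.add_replica("198.51.100.7", ["www"])
print(registry.replicas_for("www"))  # ['192.0.2.10', '198.51.100.7']
registry.remove_replica("192.0.2.10")
print(registry.replicas_for("www"))  # ['198.51.100.7']
```

Keeping the service list per replica makes joins and leaves constant-time operations, at the cost of a linear scan per query; a real redirector with many replicas might additionally index replicas by service.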

Secondly, the redirector gathers information that can be useful for accurate replica selection. Both the type and the amount of this information depend on the redirection policy in use. It may include, for example, the network load, the processing capabilities and availability of each replica, or any QoS constraints a replica can meet.

Thirdly, the redirector must respond to incoming client queries. Each query contains a service identifier, which is used to determine the subset of replicas supporting the service. Knowing this subset, the address of the client, and all the previously gathered information, the redirector can apply a redirection policy to choose one or more "best" replicas. The addresses of these replicas are then sent back to the client, which can subsequently connect to any of them to use the actual service.
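The query-handling step can be summarized by the following sketch. It assumes, purely for illustration, that the gathered information is a per-replica load figure and that a policy is any function mapping the client address, the candidate replicas, and that information to a ranked list. The names `handle_query`, `lowest_load_policy`, and `max_answers` are hypothetical; the example policy is a deliberately simple stand-in, not one of the policies studied in this thesis.

```python
from typing import Callable, Dict, List, Set

# Hypothetical policy signature: (client address, candidate replicas,
# gathered information) -> candidates ordered from best to worst.
Policy = Callable[[str, List[str], Dict[str, float]], List[str]]


def lowest_load_policy(client: str, candidates: List[str],
                       load: Dict[str, float]) -> List[str]:
    # Illustrative policy: ignore the client and prefer lightly loaded
    # replicas; replicas with no load information are ranked last.
    return sorted(candidates, key=lambda r: load.get(r, float("inf")))


def handle_query(service: str, client: str, replicas: Dict[str, Set[str]],
                 info: Dict[str, float], policy: Policy,
                 max_answers: int = 2) -> List[str]:
    # 1. Use the service identifier to isolate the supporting replicas.
    candidates = [addr for addr, services in replicas.items()
                  if service in services]
    # 2. Rank them with the policy, using only previously gathered information.
    ranked = policy(client, candidates, info)
    # 3. Return the addresses of the best replicas to the client.
    return ranked[:max_answers]


replicas = {"192.0.2.10": {"www"},
            "198.51.100.7": {"www", "ftp"},
            "203.0.113.5": {"ftp"}}
load = {"192.0.2.10": 0.9, "198.51.100.7": 0.2}
print(handle_query("www", "10.1.2.3", replicas, load, lowest_load_policy))
# ['198.51.100.7', '192.0.2.10']
```

Passing the policy in as a parameter mirrors the separation between redirection mechanism and redirection policy made above: the mechanism stays the same while policies can be swapped.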

While fulfilling the above three duties, the redirector has to meet a number of non-functional requirements. The most important one is that the redirection must be transparent to the clients. As long as clients are not aware of being switched between different replicas, they can access the service like any other, non-replicated one. With non-transparent redirection, references remain bound to a specific replica. Even if a different replica later becomes preferable for that client, the client will keep using the original one, which can result in a loss of performance. Worse yet, if the original replica has been removed from the set of replicas, the client can no longer use its reference. Instead, we want clients to be able to save a reference obtained while working with replica A and use it later, even though they are physically working with replica B this time.

Another requirement, which is somewhat related to transparency, is that the redirector makes use of network protocols that are understood by existing clients. In this way we ensure that our users do not need to change their software.

One more requirement is that the redirector must be scalable: it has to serve a huge number of clients. Since the service has been replicated because of its popularity, the number of its clients is likely to be extremely large.

The last requirement is that the complete redirection mechanism must be easy to deploy on a wide range of network architectures. To ensure portability, it has to be independent of both the underlying hardware and the network structure. Moreover, it must not be too complicated, because it is going to be installed, configured, and maintained by people who may have no detailed knowledge of computer networks.

Apart from the redirection mechanism, we also need a redirection policy, which has a few important features of its own. The most important one is that the selection method must be accurate. In general, it is difficult to predict which replica will serve a given client most efficiently, as network conditions and replica loads change continuously. Thus, in many cases, the policies will only provide approximations. The criteria used to evaluate the quality of service depend on the characteristics of the service and are usually hard to determine.

Another feature of the redirection policy is that it must be capable of selecting more than one replica for a given client and ranking them according to the predicted quality of service. In this way the policy cooperates with client-side redirection mechanisms, which can still apply their own replica-selection criteria to the set of replicas chosen by the policy. Moreover, selecting several replicas increases the availability of the service: if the first replica is unreachable, the client can still access another one.
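The client side of this cooperation can be sketched as follows, assuming the client receives the policy's ranked list of addresses. The function `choose_replica`, the `try_connect` callback (which stands in for an actual connection attempt), and the optional `client_key` criterion are all hypothetical and serve only to illustrate the idea.

```python
from typing import Callable, Iterable, Optional


def choose_replica(ranked: Iterable[str],
                   try_connect: Callable[[str], bool],
                   client_key: Optional[Callable[[str], int]] = None) -> Optional[str]:
    # Optionally re-order the policy's ranking with the client's own criterion.
    # Python's sort is stable, so replicas the client considers equal keep
    # the order suggested by the policy.
    candidates = sorted(ranked, key=client_key) if client_key else list(ranked)
    # Fail over: return the first replica the client can actually reach.
    for address in candidates:
        if try_connect(address):
            return address
    return None  # none of the selected replicas was reachable


ranked = ["192.0.2.10", "198.51.100.7", "203.0.113.5"]
down = {"192.0.2.10"}
print(choose_replica(ranked, lambda a: a not in down))  # 198.51.100.7
# A client that prefers replicas in 203.0.113.0/24 overrides the ranking:
print(choose_replica(ranked, lambda a: True,
                     client_key=lambda a: 0 if a.startswith("203.") else 1))
# 203.0.113.5
```

Because the policy returns a ranked list rather than a single address, an unreachable first choice costs the client one failed attempt instead of a new query to the redirector.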

Each result can also be associated with a lease period for which it is known to remain valid. In this way we allow the results to be cached by the client, thus off-loading the redirector and reducing the time needed to obtain the service address.
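A lease plays much the same role as the time-to-live (TTL) attached to DNS records. The sketch below shows a client-side cache that honours such leases. The class `LeaseCache`, its methods, and the injectable `clock` parameter are hypothetical; the clock is only there so that expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class LeaseCache:
    """Hypothetical client-side cache of redirection results with leases."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # service -> (replica addresses, absolute expiry time)
        self._entries: Dict[str, Tuple[List[str], float]] = {}

    def store(self, service: str, replicas: List[str], lease_seconds: float) -> None:
        # Remember the answer until its lease period runs out.
        self._entries[service] = (list(replicas), self._clock() + lease_seconds)

    def lookup(self, service: str) -> Optional[List[str]]:
        entry = self._entries.get(service)
        if entry is None:
            return None  # never cached: the redirector must be queried
        replicas, expires = entry
        if self._clock() >= expires:
            # Lease expired: drop the entry so the redirector is asked again.
            del self._entries[service]
            return None
        return replicas


now = [100.0]  # fake clock for the example
cache = LeaseCache(clock=lambda: now[0])
cache.store("www", ["192.0.2.10"], lease_seconds=30)
print(cache.lookup("www"))  # ['192.0.2.10']
now[0] = 130.0
print(cache.lookup("www"))  # None
```

Every lookup served from the cache is one query the redirector does not have to answer, which is exactly the off-loading effect described above; the lease length trades this saving against how quickly clients notice a better replica.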

Both the redirection mechanism and the redirection policy have to work fast. The complete redirection process must not generate significant overhead on the request-reply delay, as its main goal is to minimize it. Using non-local information or performing complex computations while servicing a client request is therefore prohibited.

We plan to use the redirector as part of another project of ours, called Globule [18]. Globule is a platform for self-replicating Web documents. It automates all aspects of document replication, such as creating and destroying replicas, and maintaining their consistency. Globule is designed as a module for the Apache HTTP server. To ensure that the redirection mechanism easily cooperates with other Globule components, we decided to implement it as part of Apache, too.

This thesis is structured as follows. In Chapter 2 we discuss the advantages and drawbacks of three widely used redirection mechanisms. We compare them and show that DNS redirection comes closest to the objectives described above. In Chapter 3 we present our experience with implementing a DNS redirector inside Apache. Chapter 4 discusses redirection policies and the different kinds of measures that can be useful for replica selection. Finally, Chapter 5 describes the efficiency tests we performed, and Chapter 6 concludes.


root 2002-08-27