Introduction to rdist

"Using rdist doesn't make sense unless you are dealing with a sufficiently large number of machines---let's say at least two." (Markus Moster)
Derived from: https://web.archive.org/web/20230516133631/http://www.benedikt-stockebrand.de/rdist-intro_en.html

Introduction

In a Nutshell

Synchronizing files and file hierarchies across multiple machines is an ever-recurring task in almost any reasonably sized computer network. Since the arrival of 4.3BSD a widely ignored tool called rdist has been available to deal with this task. At least among machines running Un*x-style operating systems, that is.

Simple Real World Examples

What does rdist provide? The basic functionality is best demonstrated with some simplified real-world examples.

First consider a WAN connected through low-bandwidth (64 kbit/s) links. General-purpose servers exist at about 40 sites. Each of these servers provides a software repository to, say, twenty clients. Whenever users feel like it they may install software from that repository on their desktop clients. The repository is frequently updated and holds about 500 MB worth of software. Using rdist you simply define one server as the "master" or "reference" machine. Whenever you've modified the repository on that machine you run rdist, for large modifications preferably overnight, and it updates the remote servers without further sysadmin intervention. Since rdist only updates files that have changed, this saves both bandwidth and time. If you ever fear that a repository has got out of sync for some reason, you re-run rdist. Unless things actually have changed this will have little impact on the servers or the bandwidth available in between. Of course the same approach works for the document roots of multiple web servers behind a load balancer, too.

Consider the same servers again. Most of their configuration is the same on all machines, or at least should be. If you set them up identically and then use rdist to distribute all configuration files that shouldn't be adjusted individually, you won't have to update 40 copies of /etc/profile by hand (eventually getting things out of sync anyway); you make the change once and then run rdist. If you want to test a new configuration you do so on only one machine. If things work out you run rdist from that machine; otherwise you can easily roll back by running rdist from one of the remaining functional machines. This has proven particularly useful in cluster environments where cluster members must be kept identically configured, no excuses. Guess what Markus Moster (see the quote above) is habitually working on.

If you are responsible for quality control in a software development team you may want to test the installation packages you get from the developers in a clearly defined test environment. Of course you might reinstall those machines either from backup or through an install server (like a Sun[TM] Solaris[TM] JumpStart[TM]). You might, however, also want to find out which system files the package installation has messed up. If you define a reference file system somewhere, rdist lets you do all this with a reasonable amount of trouble. And imagine the educational value of test machines that are always in a clearly defined state every morning.

In a security-sensitive environment where you need to maintain a consistent user base on multiple machines but can afford neither the security issues of NIS nor the hassles and Solaris-onliness of NIS+, you can easily set up one machine as the "master" passwd server and have cron run rdist to distribute whatever /etc/passwd, /etc/shadow, /etc/group and such it has to all other machines. New user accounts get created on that machine, or people change their password on that machine, and maybe a minute later the new account or password is active on all machines.

How rdist Works

How does rdist provide this functionality? Well, assume that you have a machine master that wants to copy some files to a list of machines slave_1 to slave_n. Here's what happens:

  1. You invoke rdist, possibly with some options.

  2. If you didn't specify directly what to copy where, the first thing rdist does is read a distfile. This file tells rdist what to copy where.

  3. Now rdist starts a "daemon" program called rdistd on all slaves. By default it does so using rsh but anybody working in a remotely security-sensitive environment will tell it to use the Secure Shell (ssh) instead.

  4. Once the connection between the master's rdist and the slaves' rdistd processes is up and running, the master starts to stat(2) all files and file hierarchies specified in the distfile. It asks the slaves to do so as well. Whenever the stat(2) results differ, the file is sent from the master to the slave, where it is installed.

  5. Once the master is through it tells the slave rdistd to terminate which will also shut down the connection between the machines.
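The comparison in step 4 can be imitated locally with a few lines of shell. This is only a rough sketch, not how rdist is implemented: it assumes that file size and modification time are the attributes being compared, it works on two local paths instead of a master and a slave, and the function names needs_update and sync_file are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when dst is missing or differs from
# src in size or modification time, i.e. when a copy would be needed.
needs_update() {
    src=$1
    dst=$2
    [ -e "$dst" ] || return 0
    # The arithmetic expansion strips the padding some wc versions print
    src_size=$(( $(wc -c < "$src") ))
    dst_size=$(( $(wc -c < "$dst") ))
    if [ "$src_size" -ne "$dst_size" ]; then
        return 0
    fi
    # -nt ("newer than") is a widespread test extension, not strictly POSIX
    if [ "$src" -nt "$dst" ] || [ "$dst" -nt "$src" ]; then
        return 0
    fi
    return 1
}

# Copy src over dst only when needed; cp -p keeps the modification time,
# so an immediate second run finds nothing left to do.
sync_file() {
    if needs_update "$1" "$2"; then
        cp -p "$1" "$2"
        echo "updated: $2"
    else
        echo "unchanged: $2"
    fi
}
```

Calling sync_file twice in a row on the same pair of files copies on the first call and does nothing on the second, which is the property that makes re-running rdist cheap.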

This general behaviour can be modified in various ways. Among others you can:

Setting Up rdist

Here's an outline of how to set up rdist to work in your environment. If you want to use it in conjunction with ssh, make sure that ssh is up and running first.

  1. Get, compile and install an up-to-date version of rdist. Sources should be available from your favourite FTP site. At the time of this writing the latest version is 6.1.5. Note that many Un*xen come with an older version you shouldn't use. Make sure you install the "daemon" binary rdistd on all slave machines, preferably somewhere within the default PATH or at least in the same directory on all machines. If you install rdistd outside the default PATH you need to specify where to find it with the "-p path-to-rdistd" option.

  2. Set up the underlying transport and authentication mechanism. Choose between standard security-free rsh and non-standard secure ssh.

    1. By default rdist uses the traditional rlogin/rsh for this. If that's what you want you need to add the master machine to the ~/.rhosts or /etc/hosts.equiv or whatever file on all slave machines. And of course the slave machines have to run the rshd, which is usually started via inetd. Make sure that an "rsh slave hostname" will output the hostname of the slave.

    2. If you're in any way concerned about security you should set up ssh so that you can connect from the master to the slave machines. Make sure that the master knows about all slaves in its known_hosts file by ssh'ing to all slaves once. To use rdist with ssh you need to specify ssh as the underlying transport mechanism using the option "-P path-to-ssh".

  3. Now see if rdist works. First create some temporary file in /tmp to distribute:

    	  master$ cd /tmp
    	  master$ echo "Hello, world!" >distribute-me
    	

    To test our setup use the "-c" option, which lets you specify a mini "distfile" on the command line and send it to a slave. Don't forget to add the "-p path-to-rdistd" and/or "-P path-to-ssh" options here if necessary.

    	  master$ rdist -c distribute-me slave:/var/tmp
    	

    This will distribute our file /tmp/distribute-me from the master to the /var/tmp directory on our slave.

  4. If you want to make life easier you may want to create a simple shell script that takes care of those "-p" and "-P" options. Try this one:

    	  #! /bin/sh
    	  SSH="`which ssh`"
    	  RDISTD="`which rdistd`"
    	  
    	  rdist -p "$RDISTD" -P "$SSH" "$@"
    	

    Adjust as needed, especially if you don't have ssh and/or rdistd in your path at all.
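Before pointing rdist at a long list of slaves it can help to check that each one is reachable through the chosen transport and has rdistd in its default PATH. The following is a minimal sketch under some assumptions: the function name check_slaves and the REMOTE_SHELL variable are made up for illustration, ssh (or rsh) is expected to log in without prompting, and "which" on the slave is expected to fail when rdistd is missing, which may not hold on every system.

```shell
#!/bin/sh
# Transport command; set REMOTE_SHELL=rsh or a full path to override.
# Assumption: the command takes "host command" arguments like ssh and rsh do.
REMOTE_SHELL=${REMOTE_SHELL:-ssh}

# Print one status line per host. Returns non-zero if any host is unreachable
# or has no rdistd in its default PATH.
check_slaves() {
    failed=0
    for host in "$@"; do
        # stdin is redirected so the remote shell cannot swallow our input
        if "$REMOTE_SHELL" "$host" 'which rdistd' >/dev/null 2>&1 </dev/null; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    return "$failed"
}
```

A call such as "check_slaves slave_1 slave_2" then prints one line per machine, and its exit status can be used to abort a larger script before any files are touched.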

Now the environment is ready to use rdist. At least for one-shot jobs, that is. The real power of rdist is in its distfiles however, so read on.

Elementary Distfiles

Basics

A distfile defines what files and file hierarchies to copy where and with which options. You may specify a particular distfile using the "-f distfile" option. Otherwise rdist looks for a file named "distfile", or failing that "Distfile", in the current working directory.

The syntax is somewhat similar to a standard makefile. Here's a distfile that'd copy our previous test file /tmp/distribute-me to some slave machine slave_1.

      # Distfile 1
      /tmp/distribute-me -> slave_1
          install /tmp/distribute-me;
    

This distfile tells rdist to install our file on slave_1 in its /tmp directory. In detail, the line with the arrow specifies the source file and the destination machine. The "install" line below it states that the file needs to be installed on the slave machine, here under the same pathname.
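When the same file has to go to several machines, a distfile of this shape can be generated rather than typed. The sketch below assumes the parenthesized host list form "( host1 host2 )" for the destination; the function name make_distfile and the host name slave_2 are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a distfile that installs one file, under the
# same pathname, on every host given.
# Usage: make_distfile FILE HOST...
make_distfile() {
    file=$1
    shift
    # "$*" joins the remaining arguments (the hosts) with single spaces
    printf '%s -> ( %s )\n' "$file" "$*"
    printf '    install %s;\n' "$file"
}
```

Running "make_distfile /tmp/distribute-me slave_1 slave_2 > Distfile" writes a two-line distfile to the current directory. Before using it for real it is probably worth a dry run with "rdist -n -f Distfile"; the rdist 6 manual page describes "-n" as printing the commands without executing them.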

Note that the opt_dest_name, as the manual page labels it (i.e. the "/tmp/distribute-me" part of the install line), is optional. It may also specify a different pathname on the destination host(s), or in certain cases it may be a directory. It is considered a directory if one of the following conditions is true: