| Russ Allbery > Technical Notes > Debian | Debian Packaging Tools > |
Introduction
File System Layout
Packages vs. Templates
Packaging Issues
Template Layout
The Build Script
The Bundle
Log Rotation and Filtering
This document describes the standards for running infrastructure servers on Debian GNU/Linux at Stanford University. The policies and standards here are in many respects specific to Stanford, although other sites may find some of our decisions of interest. It will be kept up to date here as we modify and evolve our procedures.
One of the goals in moving to Debian is to minimize the amount of custom work that we need to do for servers. Under Solaris, we built nearly all software that we used from source and made extensive changes to the operating system configuration, mostly not using the software that came with Solaris. With Debian, the goal is to use already-packaged software as much as possible, to use the native Debian configuration methods and file locations, and to do only the work that is necessary and directly related to the differences in Stanford's environment.
One implication of this is that software will be managed as Debian packages wherever possible. Not only does this let us easily move to the native Debian package when our local customized package isn't necessary, but it lets us use the Debian packaging system and package tools to manage upgrades and system maintenance rather than inventing our own procedures. As a general rule, any compiled software should be built as a Debian package, as should any scripts and configuration which are usable by more than one service. Those Debian packages should be managed like any other Debian package, following the Debian policy standard to the degree that it is applicable, with proper changelog entries. This will make it easier for us to contribute packages back to Debian where possible.
Exceptions to this policy may be made for software that is extremely difficult to package for some reason (an Oracle server, for instance), or software that uses its own automatic update features and for which maintaining a package would just mean pointless extra work (such as Sophos PureMessage).
Production servers are expected to be running the current stable version of Debian, including whatever updates have been released. Some test and development boxes may run Debian unstable or testing. Special attention must be given to keeping those systems up to date with relevant security fixes.
There are innumerable different ways of organizing software and services in the file system that all make sense. Some are better in some situations and others are better in other situations. However, standardization on a workable single layout across all services has more advantages than picking a layout particularly well-suited for a particular application. Furthermore, Debian itself already enforces a standard for all native packages.
Accordingly, we will stick as closely as possible to the Filesystem Hierarchy Standard for all software managed as Debian packages. This will make our local packages and configuration consistent with what Debian does, make it easier to contribute packages back to Debian, allow us to use normal checking tools like lintian that assume FHS compliance, and make it easier to train new staff.
Specifically, this means that user binaries will go in /usr/bin,
daemons and other sysadmin-only binaries will go in /usr/sbin,
architecture-independent data will go in /usr/share, libraries and
architecture-dependent support binaries, modules, or data will go in
/usr/lib, data modified by the service during normal operation will
go into the appropriate directory in /var, PID files will go into
/var/run, and logs will go into /var/log. See the Filesystem
Hierarchy Standard for all of the exact details.
If we have scripts that aren't worth packaging as a regular Debian
package, they can be installed in /usr/local/bin with manual pages
in /usr/local/man. However, this should be minimized; where
reasonable, scripts should be generalized and packaged as regular Debian
packages to allow for easier upgrades, testing, and revision control.
Scripts that do live in /usr/local/bin may look for configuration
files in either /etc or in /usr/local/etc as seems the most
appropriate for that script.
Executables should not have extensions indicating what language they're
written in. Users don't care that a given executable is implemented as a
Perl script, and invocations of executables end up in other scripts and
aliases that would then break if the implementation language changes.
Avoid any extensions like .pl or .sh.
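A quick way to spot violations of this rule is to search a directory for executables whose names end in a language extension. The following is only a sketch: the function name is hypothetical, and it checks just the two extensions named above.

```sh
#!/bin/sh
# Sketch: list executables in a directory whose names end in a language
# extension.  executables_with_extensions is a hypothetical name, and
# .pl and .sh are simply the two extensions mentioned in the text.
executables_with_extensions() {
    # -perm -100 selects files whose owner execute bit is set.
    find "$1" -type f -perm -100 \( -name '*.pl' -o -name '*.sh' \) | sort
}

# Example: executables_with_extensions /usr/local/bin
```

An empty result means that nothing in the directory needs to be renamed.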
Normally, we won't have extensive service-specific data that will be
served off local disk and won't fit naturally into /var/lib or the
like. However, some applications will have this (such as a Subversion
server or a streaming media server). The FHS has a provision for these
sorts of applications, namely the /srv directory, which is a
location for site data that should be independent of the package
management system. This is a fairly new provision, but we should try to
use it and only move away from it if it really doesn't work.
The following exceptions to the FHS are permitted:
/service
We run svscan on all of our systems to provide a simple and
easy way of running monitored services. Rather than try to modify
daemontools to make it FHS-compliant, we use it stock, which means
that the control files for svscan will be in /service.
/command
To avoid too many unnecessary divergences, we let djb's daemontools
install itself in /command (and likewise with other djb
software that has been modified to use that packaging system).
/apps
For Java applications that we have not yet packaged or that are too
difficult to package for some reason, we may still install them into
/apps with their own startup scripts and log directory. This
is not a good long-term solution. Long-term, we need to figure out
how to properly package Java applications so that we can use all of
our normal packaging infrastructure to configure and deploy them.
However, in the short term, this will likely be more trouble than it's
worth.
Any additional exceptions should be kept to a minimum and only used for cases where it's very difficult to modify the software to comply with the FHS and there is little gain in doing so. For internally-developed software, we should strive to make it as FHS-compliant as possible.
Note that while we are maintaining symlinks in /etc/leland for
backward compatibility reasons, we're moving away from /etc/leland
towards putting configuration files in /etc like all normal Debian
software does. This particularly includes the Kerberos configuration
files, which work fine with the standard Debian Kerberos packages. All of
krb5.conf, krb5.keytab, krb.conf, krb.realms,
and srvtab must be present in /etc as well as (only if
needed for backward compatibility) /etc/leland so that the native
Kerberos packages will work properly.
For right now, we're also using /etc/leland as a place to put
additional srvtabs and keytabs used by the system. This isn't an
unreasonable place to put such files, although this may change later to
something like /etc/keytabs, depending on whether the migration
trouble seems worth it.
Each of our services will have a template, in /afs/ir/service,
which contains the scripts, bundles, and
configuration files to build a new instance of that service. For each
service we run under Debian, we have to decide what files and
configuration should go into the template and what should go into Debian
packages.
As stated above, the basic rule of thumb is that all compiled software and non-trivial scripts should go into Debian packages so that we can use the package management system to upgrade and maintain them. Similarly, any software that's used across multiple services should be made into a Debian package so that it can easily be reused. For other components of a service, the question can be somewhat more ambiguous.
If a service needs to have a Debian package (generally named something like stanford-service to make it clear that it's Stanford-specific), it's easiest to put everything service-related that can be put into that package there. That way, as much of the configuration as possible is all in the same place. The exception is that we don't want to produce multiple versions of the Debian package for different tiers of servers (production, preprod, test, development), so any configuration that needs to change according to the tier of the service needs to live in the template. (This is far easier than prompting during the installation of the package.) The service should be structured so that the quantity of those configuration files is kept to an absolute minimum, preferably only a single file, so that as much of the configuration as possible can be kept generic.
The Debian package for a service should also only contain things specific to that service, not more general system configuration. Most of the generic configuration will be maintained in the stanford-server Debian package and in the bundles installed off of the Debian build system, but if configuration for other daemons (such as lbcd) is required, this should go into the template rather than in the Debian package for a service. Also, all SSL certificates, srvtabs, and keytabs should be downloaded by the build script in the template area rather than by a Debian package.
Modifying files that are installed by other packages in a Debian package
is difficult and prone to trouble, and should be avoided unless necessary.
As a result, it is preferable to maintain overrides to configuration
files for other packages in the template rather than in the Debian
package. If a Debian package does modify the configuration of another
Debian package, it must use dpkg-divert appropriately to avoid
packaging conflicts.
If the service doesn't warrant its own package (if, for example, we can use stock Debian packages), it's fine to put more borderline configuration or small scripts into the template area to avoid the overhead of maintaining a separate Debian package. Beware of letting the template become too large, though; as soon as it has acquired comprehensive configuration or extensive scripts that are infrequently updated and are better maintained by a Debian package, a Debian package for that service should be created. Note that having a Debian package is also convenient if the packages that make up a service are complex, since the Debian package for the service can depend on all of the other required packages, making the service easier to install.
Debian has excellent documentation of how to create Debian packages, and anyone learning how to create Debian packages should read that documentation first. The three primary guides are the New Maintainer's Guide, the Debian Policy Manual, and the Debian Developer's Reference. The New Maintainer's Guide is where you start; Policy is more of a detailed reference manual. The Developer's Reference contains useful information about packaging, but also explains how Debian itself is managed, how to use Debian's bug tracking system, and similar topics.
All of our packages, even the ones that are purely local to Stanford, should comply as closely as possible with Debian policy, since that's what makes Debian a consistent operating system and keeps the overall quality high.
All of our packages should be lintian-clean, which means that running
lintian on the package should not produce any output. Some lintian
errors are unavoidable, particularly for packages that cannot comply with
Debian policy for whatever reason. Those packages should include lintian
overrides to silence the error messages.
If there is any software that we've developed locally that would be useful for people at other sites, we should always try to clean it up, make it sufficiently generic that someone else could use it, and package it independently of anything Stanford-specific. Even if we can't get it completely generic, it's worthwhile to get as far as we can without spending too much time on it, since someone else may always have the chance to finish it later.
By making it generic, I mean that it should expect configuration files in
/etc or a directory in /etc like normal Debian software,
comply with Policy and the FHS (although all of our packages should do
that), have an appropriate license and a correct copyright file,
have manual pages, and have all of the other components of a regular
Debian package.
For software we maintain locally and make releases of with regular version
numbers (things like newsyslog, kstart, or remctl), we maintain the Debian
packaging files in a debian directory in the regular CVS tree for
that package. Any changes outside of the debian directory warrant
a new, regular release of the package and a new Debian build with Debian
version -1. Changes only to the files in debian can be made
without a regular release and with a regular increment of the Debian
version. The Debian packaging files are not included in the regular
release of the package, following Debian packaging best practices, but
it's convenient to have them in the same CVS repository.
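The version rule above can be sketched as a small shell function. The function name and its arguments are illustrative only, not part of any existing tool, and the sketch assumes simple versions of the form upstream-revision with a numeric Debian revision.

```sh
#!/bin/sh
# Hypothetical helper: compute the next Debian version string.
# Usage: next_debian_version CURRENT NEW_UPSTREAM
#   CURRENT       current Debian version of the package, e.g. 3.2-2
#   NEW_UPSTREAM  version of the regular release being packaged, e.g. 3.2
next_debian_version() {
    current=$1
    new_upstream=$2
    upstream=${current%-*}      # text before the last hyphen
    revision=${current##*-}     # text after the last hyphen
    if [ "$new_upstream" = "$upstream" ]; then
        # Only files in debian changed: keep the release version and
        # increment the Debian revision.
        echo "$upstream-$(( $revision + 1 ))"
    else
        # Something outside debian changed: new regular release, and the
        # Debian revision starts again at 1.
        echo "$new_upstream-1"
    fi
}
```

For example, a packaging-only change to version 3.2-2 gives 3.2-3, while a new 3.3 release gives 3.3-1.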
If we use software that we don't maintain but that hasn't been packaged for Debian, we should package it following the normal rules and best practices for packaging software for Debian. These packages should be maintained using svn-buildpackage and the Debian Subversion repository on subversion.stanford.edu.
Packages specific to Stanford should generally have names beginning with "stanford-" and should only contain the Stanford-specific bits of configuration and scripts that are not generally usable at other sites. When possible, more generic software should live in separate packages with more generic names and the Stanford-specific package should depend on the more generic packages.
One particularly valuable use of Stanford-specific packages is to have them depend on all of the regular packages that a particular service needs. This is useful enough that it may even be worthwhile creating an empty Stanford-specific package for some services, just to manage the dependencies.
While it is not as important for Stanford-specific packages to comply with Debian policy in every respect, every effort should still be made to keep them compliant. This is both for consistency and so that standard packaging tools like lintian can be used to check for real errors without producing a lot of noise.
dpkg-divert
Much Debian software supports configuration by dropping additional files
into a particular directory (see, for instance,
/etc/apache2/conf.d), precisely so that multiple packages can add
their own relevant configuration without having to modify the main
configuration files. This mechanism should be used whenever possible,
since it's a significant additional hassle to manage cases where multiple
packages provide the same configuration file.
Sometimes, however, it's necessary to override a configuration file
provided by another package with one that contains Stanford-specific
configuration. In this case, the dpkg-divert command should be run
in the preinst and postrm scripts of the Stanford-specific package
to add and remove a diversion for that configuration file.
The basic idea of a diversion is to allow one package to provide the same
file as is provided by another package by forcing the other package's file
to be installed under a different name. For more information, see the
dpkg-divert manual page.
Here is an example preinst script from a Stanford-specific Debian
package named stanford-weblogin, which diverts the files
/etc/webkdc/token.acl and /etc/webkdc/webkdc.conf:
#!/bin/sh
# preinst for stanford-weblogin. Divert some of the standard
# configuration files from libapache2-webkdc and libwebkdc-perl so
# that we can install the Stanford ones.
set -e
if [ "$1" = install -o "$1" = upgrade ] ; then
    dpkg-divert --add --package stanford-weblogin --rename \
        --divert /etc/webkdc/token.acl.generic /etc/webkdc/token.acl
    dpkg-divert --add --package stanford-weblogin --rename \
        --divert /etc/webkdc/webkdc.conf.generic /etc/webkdc/webkdc.conf
fi
#DEBHELPER#
exit 0
Here is the corresponding postrm script:
#!/bin/sh
# postrm for stanford-weblogin. Remove the diversions added for the
# generic configuration files.
set -e
if [ "$1" = remove ] ; then
    dpkg-divert --remove --package stanford-weblogin --rename \
        --divert /etc/webkdc/token.acl.generic /etc/webkdc/token.acl
    dpkg-divert --remove --package stanford-weblogin --rename \
        --divert /etc/webkdc/webkdc.conf.generic /etc/webkdc/webkdc.conf
fi
#DEBHELPER#
exit 0
When creating packages that need diversions, use the above examples as models, changing the package name of the Stanford-specific package and, of course, the file names to divert as appropriate. You can also do the equivalent of the above in the build script if a Stanford-specific package isn't warranted.
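For the build script case, where there is no Stanford-specific package to own the diversion, one possible approach is a local diversion. This is a sketch rather than an established convention: the function name and the DPKG_DIVERT variable are hypothetical (the variable only exists so the command can be dry-run), while --local is the standard dpkg-divert option for diversions that do not belong to any package.

```sh
#!/bin/sh
# Sketch: divert a packaged configuration file from a build script.
# divert_generic and DPKG_DIVERT are hypothetical names; set
# DPKG_DIVERT=echo to print the command instead of running it.
divert_generic() {
    file=$1
    # --local applies the diversion to every package's version of the
    # file, so the packaged copy is installed as FILE.generic instead.
    ${DPKG_DIVERT:-dpkg-divert} --local --add --rename \
        --divert "$file.generic" "$file"
}

# In a build script, before installing the Stanford versions:
#   divert_generic /etc/webkdc/token.acl
#   divert_generic /etc/webkdc/webkdc.conf
```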
All services should have a template in /afs/ir/service. See the
instructions in the README file in that directory for more
information on the layout and how it should be used.
Inside the template for a particular service should be at least the
following elements: a script to install the files needed for a particular
service, named something like build-service; a bundle to
install whatever files are maintained in the template area, traditionally
named setup.b; and a (possibly very brief) README file that
explains the purpose of the template.
The preferred layout of the template is to organize files within the
template as a mirror of where they will be installed on disk. In other
words, a replacement for /etc/syslog.conf on the system would be in
a directory named etc in the template, and scripts to be installed
in /usr/local/bin should be in a directory named
usr/local/bin in the template. This has several advantages: it
makes it easier to find the file in the template corresponding to a system
file, it requires less thought in figuring out how to organize the
template and therefore makes template organization more consistent, and it
potentially allows templates to be easily turned into Debian packages
using rsync or the like.
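To illustrate the mirrored layout, the following sketch lists the on-disk destination of every file in a template. It is not part of the build system: the function name is hypothetical, and it only looks at the etc and usr directories used as examples above.

```sh
#!/bin/sh
# Sketch: show where each file in a mirrored template would be installed.
# Run from the top of the template.  template_targets is a hypothetical
# name; etc and usr are the example directories from the text.
template_targets() {
    for dir in etc usr; do
        [ -d "$dir" ] || continue
        # A relative path in the template is the absolute path on disk.
        find "$dir" -type f | sort | sed 's|^|/|'
    done
}
```

A template containing etc/syslog.conf therefore maps to /etc/syslog.conf with no per-file rules to remember.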
Some older templates instead have directories like system (for
system configuration like log rotation), scripts, acl, and
so forth and use the bundle to put things in the right place. This is
still acceptable, and may be preferable if there are a lot of scripts or
ACL files, but is not as commonly used and probably shouldn't be used for
new templates without a good reason.
The build script should be a Bourne or bash shell script that does everything required to turn a generic build into a server for a particular service. This includes installing all required packages, running the appropriate bundles in the template area, installing the required keytabs, srvtabs, and SSL certificates, and installing iptables rules as required by that service.
The build script must never change directories into the template directory, since there may be multiple versions of the template and the one being installed may not be the current version. Instead, it should assume that it's being run from the top level of the service template, and any directory changes that it has to do internally should be done in a subshell (in other words, wrap the set of commands requiring a directory change in parentheses).
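The subshell technique looks like this in practice. The directory and the command inside the parentheses are placeholders; only the pattern matters.

```sh
#!/bin/sh
set -e

start_dir=$(pwd)
work_dir=${TMPDIR:-/tmp}    # placeholder for wherever the work must happen

# The parentheses start a subshell, so the cd affects only these commands.
(
    cd "$work_dir"
    pwd > /dev/null         # stand-in for the real commands
)

# Back in the parent shell, the working directory is unchanged, so
# relative references to the template still resolve.
[ "$(pwd)" = "$start_dir" ]
```

Because a failing command makes the subshell exit with a non-zero status, a failure inside the parentheses still stops a script that runs with set -e.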
The build script should start with set -e so that it will exit
automatically if any command it runs fails. This will make it easier to
pinpoint failures and to recover from them.
The build script should begin with:
. /usr/pubsw/lib/build-scripts/functions
so that it can use the various generic functions we've developed for use in these build scripts.
Avoid defining new functions in the build script and then calling them; instead, just do everything in linear order. This is easier to read. If there are particular things that are just crying out to be functions, put them into the global function library instead.
If the service has multiple supported tiers (production, preprod, etc.), the build script should either prompt for the tier (making sure that the entered value is valid) or, more commonly, should prompt for the hostname that the system is being built as and then use that to determine the appropriate tier. The build script can then generate appropriate configuration files or run appropriate bundles based on the tier, as well as use the tier information to choose the right keytabs, srvtabs, and SSL certificates to install.
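Mapping a hostname to a tier can be done with a case statement. In this sketch the function name and the hostname patterns are invented for illustration; each service would substitute its own naming convention. Unknown names are rejected rather than silently assigned a tier.

```sh
#!/bin/sh
# Sketch: derive the tier from the hostname being built.  tier_for_host
# and the weblogin* naming patterns are hypothetical examples.
tier_for_host() {
    case $1 in
        weblogin[0-9]*)      echo prod ;;
        weblogin-pre[0-9]*)  echo preprod ;;
        weblogin-test[0-9]*) echo test ;;
        weblogin-dev[0-9]*)  echo dev ;;
        *)
            echo "unknown host: $1" >&2
            return 1
            ;;
    esac
}

# In a build script:
#   printf 'Hostname to build as: '
#   read host
#   tier=$(tier_for_host "$host")
```

With set -e in effect, the assignment fails and the script stops if the hostname does not match any known pattern.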
Keytabs and srvtabs should be installed with the get_keytab function. This function takes three arguments: the name of the principal (must be in K4 format, not K5 format), the path to where to put the keytab, and the path to where to put the srvtab. The last argument may be omitted if no srvtab is wanted (for webauth principals, for instance). For example:
get_keytab webauth.www /etc/webauth/keytab
The needed Debian packages should be installed with apt-get.
Normally, apt-get update should be run before package installation
to make sure to get the latest versions and to make sure that one doesn't
get "file not found" errors from apt-get install due to old package
indices.
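A minimal sketch of that sequence follows. The function name and the APT_GET variable are hypothetical (the variable exists only so the commands can be dry-run), and passing -y to skip the confirmation prompt is an assumption about how interactive the build should be.

```sh
#!/bin/sh
# Sketch: refresh the package indices, then install the named packages.
# install_packages and APT_GET are hypothetical names; set APT_GET=echo
# to print the commands instead of running them.
install_packages() {
    # The install only runs if the index update succeeded.
    ${APT_GET:-apt-get} update &&
        ${APT_GET:-apt-get} install -y "$@"
}

# In a build script:
#   install_packages apache2 libapache2-webkdc libwebkdc-perl
```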
SSL certificates should be installed with the get_cert function. This function takes three arguments: the base path of the certificate to install, the name under which to install the certificate on local disk, and which Comodo root certificate to use (optional, assumes 2012 if not given). The base path should be in a certs directory under the relevant service directory, one up from the template directory, and the name of the cert should generally be the machine name or class. For example, the base path for the certificate for the production weblogin pool would be:
/afs/ir/service/auth/certs/weblogin
.cert.pgp and .key.pgp will be appended to find the
encrypted files, and the certificate will be installed in
/etc/apache2/ssl. The second argument should generally be the name
of the system or pool.
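The naming convention for the encrypted files can be made concrete with a small helper. This is only an illustration of the suffix rule described above: cert_sources is a hypothetical function, not part of the build-scripts function library.

```sh
#!/bin/sh
# Sketch: print the encrypted source files implied by a certificate base
# path.  cert_sources is a hypothetical helper for illustration only.
cert_sources() {
    printf '%s\n' "$1.cert.pgp" "$1.key.pgp"
}

# For the production weblogin pool this prints:
#   /afs/ir/service/auth/certs/weblogin.cert.pgp
#   /afs/ir/service/auth/certs/weblogin.key.pgp
cert_sources /afs/ir/service/auth/certs/weblogin
```

Going by the argument order described above, the matching call in a build script would presumably be get_cert /afs/ir/service/auth/certs/weblogin weblogin, with the pool name as the second argument.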
The build script should also run:
/afs/ir/service/jumpstart/scripts/build-iptables -i
iptables-restore < /etc/iptables/eth0
to set up the iptables rules for the system. If there are any additional iptables modules needed for the system, add them to the build-iptables command line.
At the end of the build script, it's best to print out a summary of what (if anything) has to be done next to produce a running system, and also a summary of all variables that the script prompted for or figured out for itself while running. This serves as a useful final sanity check on the build process.
Note that the build script should ideally be something one can re-run repeatedly.
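One way to keep individual steps safe to re-run is to check for the desired state before changing anything. The following sketch shows the idea for adding a line to a file; the function name is hypothetical and the technique is a general shell idiom rather than something taken from our function library.

```sh
#!/bin/sh
# Sketch: add a line to a file only if it is not already present, so a
# second run of the build script does not duplicate it.  ensure_line is
# a hypothetical name.
ensure_line() {
    line=$1
    file=$2
    # -q: no output, -x: match whole lines only, -F: fixed string.
    if ! grep -qxF -e "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

Calling ensure_line twice with the same arguments leaves exactly one copy of the line in the file.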
Any configuration not installed by the general build system or by packages installed by the build script should be managed by bundle. This allows easy application of configuration changes, verification against the template, and diffs against the installed configuration.
The main bundle for a template should generally be named setup.b.
An exception is if the same template installs multiple systems, in which
case one can name the bundles after the systems they install, with or
without a setup- prefix as seems appropriate. Try to keep
everything together in one bundle as much as possible; if there are shared
files between all of the systems using that template (and if not, why are
they together in one template?), having a single setup.b that does
the shared work and then separate supplemental bundles for the different
services is the best approach. Try to avoid breaking particular
subsystems off into separate bundles unless required for some reason; it's
much easier to not have to figure out which bundle needs applying after
making a change.
All references to template files in the bundle must be relative (unless the bundle is, for some reason, pulling files from outside of that template -- that should almost never be the case, since such shared files should be installed as a Debian package). The bundle may assume that it will be run with a working directory of the top of the template area. Absolute paths should not be used since there may be multiple copies of the same template corresponding to different version numbers or work in progress on a new template.
It's harder to update systems if the main setup.b bundle requires
that variables be set on the command-line, so instead setup.b
should just do everything that can be done without requiring knowledge of
any variable settings and which is the same across all possible
installation tiers. Separate bundles should then be used for each
installation tier (or for the master and the slaves if the service is
broken down like that), and for the portion of the installation that
requires variables be set. That way, the build script can invoke all of
the bundles as appropriate, with the right variable settings, and for the
most common case of an update that doesn't change anything that's
tier-specific or dependent on a variable, a systems administrator can just
run the setup.b bundle. Tier-specific bundles should be named
setup-tier.b, such as setup-prod.b,
setup-dev.b, etc. Try to avoid variables whenever possible, even
in supplemental bundles, since they make it very hard to apply selected
changes without re-running the build script.
If it can possibly be helped, do not install files via bundle that contain variables and then filter the files after installation to replace those variables with something else. If you must do this, set younger=1 when installing those files in bundle so that they don't always show up as changed, and be aware of the side effects of that choice. A far better tactic is to avoid the need for this sort of customization. For example, it should never be necessary to substitute the system name or IP address into an Apache configuration. The need for the IP address can be avoided by using name-based virtual hosts with the wildcard cert for SSL, and the system name can be handled with tier-specific bundles or, if necessary, be written into a one-line Apache configuration fragment in the build script rather than installed via bundle.
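Writing the system name into a one-line Apache configuration fragment from the build script might look like the following sketch. The function name and the fragment file name servername are assumptions; the default directory is the conf.d directory mentioned earlier in this document, and ServerName is the standard Apache directive.

```sh
#!/bin/sh
# Sketch: write the system name into a one-line Apache fragment.
# write_servername and the file name "servername" are hypothetical.
# Usage: write_servername HOSTNAME [CONF_DIR]
write_servername() {
    host=$1
    conf_dir=${2:-/etc/apache2/conf.d}
    # Overwriting rather than appending keeps the step safe to re-run.
    printf 'ServerName %s\n' "$host" > "$conf_dir/servername"
}
```

Since the fragment is generated rather than installed via bundle, it never shows up as a difference against the template.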
We use newsyslog for log rotation and use filter-syslog for syslog analysis instead of the standard Debian packages. This is for a variety of reasons, mostly related to staff familiarity with newsyslog and its improved support for saving logs into AFS. At some point, we may modify logrotate to do what we need, but it's not a high priority project.
Normally, all system logs will go to /var/log/syslog rather than
using Debian's default log layout, so that everything can be found in the
same place. This log should be filtered with filter-syslog against a list
of known expected log entries, so that we're notified of anything unusual
in that log. This is set up automatically by the Debian build system; the
only thing that each template has to do is install additional filtering
rules in /etc/filter-syslog if necessary. At least two weeks of
old logs should be kept on local disk, but this log should not be saved
into AFS. The analysis should be sent to the local root account, which
should be forwarded to the appropriate alerts address for the service so
that the people maintaining the service will see the mail (this is set up
by the Debian build system).
If the service logs through syslog and is verbose or provides logs that we
want to keep for metric information, those entries should be directed to a
different file with a modified /etc/syslog.conf. That file should
be rotated into AFS and in some cases may also need to be analyzed.
newsyslog is configured to run all of the configuration fragments in
/etc/newsyslog.daily, /etc/newsyslog.weekly, and
/etc/newsyslog.monthly on those intervals, and other configuration
can be dropped into the appropriate directory to rotate application and
web server logs.