Notes on the 1997 LISA conference

Norman Wilson
Computer Science
York University

Boilerplate

LISA is an annual system administration conference held by SAGE, the system administrators' special interest group within the USENIX Association. The 1997 conference was held in San Diego CA, from 26-31 October 1997. LISA has a conventional format: several days of half- and full-day tutorials (at extra cost) followed by a three-day conference. The main conference has two parallel tracks: `refereed papers', usually three talks to a 90-minute session; and `invited talks', 90 minutes each. There is also a collection of one- and two-hour informal birds-of-a-feather (BOF) sessions each evening on a variety of special topics.

Proceedings are published in two volumes: one containing the full text of each refereed paper, the other containing whatever the speaker provided (often just slides) for each invited talk. USENIX members may view the proceedings on the USENIX web server, http://www.usenix.org; anyone may read abstracts there, and hardcopy proceedings may be purchased by anyone.

What follows are my notes on the parts of the conference I found most interesting; beware that my tastes may not be the same as yours. None of the tutorials seemed attractive enough to justify the extra cost, so I attended just the regular conference. I spent most of my time in the invited talks, since one can always read the papers later, and few speakers can say much of interest in only half an hour.

Sysadmin-killer apps

Several purveyors of web cache engines described their wares at a BOF:

Some caches are http proxies, which means you must configure your browser to use the cache; others transparently intercept http requests, but that doesn't work unless your cache is also your network gateway. Transparent caches are easier to use and to enforce, but it's harder to put such smarts in a gateway these days, since gateways must be faster and faster. (If most of your network use is web-browsing, maybe it doesn't matter.) There seems to be some disagreement about whether a cache is meant to speed things up (your big GIF image arrives faster) or to make better use of bandwidth (your limited pipe to the Internet isn't clogged by web data); there are implications for how things work inside, and how much memory and disk is needed to make the cache work well.

Paul Vixie ran a BOF on dealing with spam (unsolicited junk e-mail). Vixie maintains a list of IP addresses of systems known to generate or forward spam, including those used unknowingly for forwarding spam. (Spammers do this to muddy their tracks, and to try to circumvent lists such as Vixie's.) The list may be queried through a name server; a full copy of the list may be fetched if you're willing to sign an indemnification agreement. The list is updated regularly, to add new offenders and to remove those who have reformed (e.g. innocent third parties who have placed restrictions on mail forwarding). Vixie is well known in the Internet community, and enough people are programming their routers to discard packets to or from addresses on the list that about 18% of the Internet is inaccessible to putative spammers.

Vixie seems to have two goals in mind. One is just to make life harder for spammers, and at the same time to goad system administrators into configuring their systems more carefully (e.g. disabling promiscuous mail forwarding, which is usually enabled by default but rarely needed). The other is to attract lawsuits to try to clarify the legality of spam, and perhaps to encourage legislation to forbid it.
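To make the name-server query mentioned above concrete, here is a minimal sketch of how a mail system might check an address against such a list. It assumes the reversed-octet naming convention commonly used by DNS-based lists (these notes do not record how Vixie's list is actually laid out), and the zone name `blackhole.example.org` and both function names are made up for illustration.

```python
import ipaddress
import socket

# Hypothetical zone name; the real list's zone is not given in these notes.
BLACKLIST_ZONE = "blackhole.example.org"


def blacklist_query_name(ip: str, zone: str = BLACKLIST_ZONE) -> str:
    """Build the DNS name to look up for an IPv4 address.

    Assumed convention: reverse the four octets and append the list's zone,
    so 192.0.2.99 becomes 99.2.0.192.<zone>.
    """
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str = BLACKLIST_ZONE) -> bool:
    """Return True if the query name resolves, i.e. the address is listed.

    A failed lookup is treated as "not listed"; note that this cannot
    distinguish an unlisted address from a name server that is unreachable.
    """
    try:
        socket.gethostbyname(blacklist_query_name(ip, zone))
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print(blacklist_query_name("192.0.2.99"))
```

Doing the check through the DNS means the answer is cached by ordinary name servers, and nobody has to hold a full copy of the list just to ask about one address.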

Network monitoring

Marcus Ranum (Network Flight Recorder, Inc.) described the Network Flight Recorder (NFR), a software package that records and analyzes packets on live networks. The general idea is to record what happens during some interesting event like a security breach, and be able to go back and pick out the details later (e.g. when assembling evidence to lay criminal charges). The hard part is in picking out interesting network packets and presenting them in a comprehensible way; there is a special language to control things, and there are a handful of output filters. Although the system was designed for tracking breakins, it has proved general enough to be useful for more benign sorts of performance monitoring and network problem debugging. NFR isn't finished yet, but will soon be available: free for noncommercial use, paid license for money-making products. (Paper in proceedings.)

Peter Honeyman (University of Michigan) described the Packet Vault, a more ambitious system that continuously records all network traffic, storing every packet on a CD-ROM. As with NFR, the primary goal is to be able to check the records when a breakin is reported. Pains are taken to keep data private: all data on the CD are encrypted, using a different key for each communication session; keys are stored on the CD, but are themselves encrypted. An amusing implementation detail: separate computers are used to collect the network data and to record it on the CD. Both are Intel PCs; the collector runs OpenBSD for its good packet filter, the recorder Linux for its good CD-ROM writing code. The system is still being developed. (Slides in invited talk notes.)

Honeyman remarked that Vern Paxson will be describing a third network monitor, said to be a very good one, at the 1998 USENIX Security Symposium.

Honeyman's talk was paired with one given by Joe Saul (University of Michigan), who spoke of the legal issues surrounding monitoring, with the Packet Vault as a concrete example. It is important to remember that three parties have an interest in whether and how a network is monitored--the system administrator, the user community, and the institution--and that each of these has different needs and views. There are some U.S. laws that restrict monitoring of electronic communications (the ECPA explicitly, other laws implicitly by restricting access to student academic records, library patron records, and the like), but there is little case law to make things clear. Some laws may even require monitoring of certain communications; e.g. securities traders are often required to keep records of their transactions. Sometimes it may be better not to monitor, or at least to discard records after a short interval, lest someone be able to demand them under the Freedom of Information Act or in a civil discovery action. It is not clear whether encryption makes a difference; analogy with a locked filing cabinet suggests that you may be forced to surrender encryption keys with the encrypted document, though there is no case law yet. Monitoring may even offend laws regulating research on human subjects, or offend the U.S. First Amendment by chilling free speech. The point is not that one shouldn't record network packets, but that there are serious issues involved that are worth pondering, and that the hard parts haven't been tested in court yet. (Slides in invited talk notes.)

Network configuration war stories

Joel Avery (Nortel Technology) described how Nortel regularized its domain name structure. Nortel is made of several formerly semi-independent companies; each formerly had its own top-level domain name, and the different companies had different rules and conventions and fragmentary and mostly informal name server administration, including such oddities as internal fake root servers. There is now a single, centrally-administered name space for all of Nortel's 180000 hosts, with officially-supported name servers and corresponding support staff. There were both cultural and technical lessons learned in the transition:

(Slides in invited talk notes.)

Tom Limoncelli (Lucent Bell Laboratories) told a different sort of tale. The network in the southern New Jersey parts of Bell Labs had to be split to reflect the AT&T/Lucent corporate breakup. This meant that the physical network infrastructure had to be separated, so that AT&T and Lucent employees had non-overlapping networks, and that hosts had to be renumbered. All this had to be done without turning the network off for very long. The most interesting part of the story was an unexpected bottleneck: a carefully-thought-out plan to rework the network incrementally fell down because a critical link carried far more traffic than expected. As in the Nortel renaming, the change was also taken as an opportunity to clean up an existing messy network, though the gain was less visible to users. (Paper in proceedings.)

New network hardware

Alteon Networks, one of the vendors building and selling Gigabit Ethernet hardware, held a BOF to describe the technology and their implementation.

The technology: it's just Ethernet sped up by another factor of ten; even the minimum and maximum packet sizes are the same. The standard is still in draft; it will probably be out in the spring. The initial standard will define transmission over multi-mode and single-mode optical fiber, and over up to 25m of twinaxial cable, but not over unshielded twisted pair (UTP). The first three media use a signalling scheme borrowed from Fibre Channel, but apparently it isn't good enough for UTP. Twisted pair will be defined by a separate standard which is expected to take another 18-24 months to work out; the hope is that 100m of 4-pair Category 5 wire will be good enough, but nobody is sure yet.

The implementation details: 1000 megabits per second is fast enough to make a real difference in network interface card design. Many host I/O buses and memory subsystems can't promise to transfer a whole packet that fast, so it is mandatory for the card to buffer packets. Packets arrive ten times faster, so hosts have to handle much more interrupt load. (Alteon offers a cheap hack workaround: their interfaces will send and receive overlength packets up to 9000 bytes, so two cooperating systems can use longer packets and handle fewer interrupts.) It has even become useful for the network interface card to compute IP and TCP checksums, which is ironic since those checksums were never much good except for spotting software errors.
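A little arithmetic shows why the overlength packets help. The sketch below assumes one interrupt per packet and ignores framing overhead (preamble, headers, inter-frame gap), so the figures are rough upper bounds, not measurements of Alteon's hardware; the function name is mine.

```python
LINK_BITS_PER_SECOND = 1_000_000_000  # Gigabit Ethernet, 1000 megabits per second


def packets_per_second(frame_bytes: int, link_bps: int = LINK_BITS_PER_SECOND) -> float:
    """Packets per second on a saturated link, ignoring framing overhead."""
    return link_bps / (frame_bytes * 8)


if __name__ == "__main__":
    standard = packets_per_second(1500)  # ordinary maximum-size Ethernet payload
    jumbo = packets_per_second(9000)     # the overlength packets described above
    print(f"1500-byte packets: about {standard:,.0f} per second")
    print(f"9000-byte packets: about {jumbo:,.0f} per second")
    print(f"reduction in per-packet work: {standard / jumbo:.0f}x")
```

With full-size packets in both cases, the longer packets cut the packet rate (and so, under the one-interrupt-per-packet assumption, the interrupt rate) by a factor of six.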

New network protocols

Bob Fink (Lawrence Berkeley National Laboratory) gave an update on IPv6.

The 6bone (experimental IPv6 test network) has about 130 sites in 30 countries now, with 40 core routers connected by tunnels over the existing IPv4 network. A religious war over routing protocols (IDRP versus BGP4) was recently settled pragmatically: three interoperable BGP4 implementations appeared, so that protocol won.

A new addressing scheme is being tried out. AGGR (aggregatable unicast addressing) divides the 128-bit IPv6 address into two pieces: a 64-bit interface ID, conventionally an IEEE EUI-64 ID, usually just the EUI-64 encoding for a 48-bit Ethernet MAC address; and a 61-bit routing tag with parts reserved for top-level (backbone) routing and for provider- and site-specific routing. (The other three bits are a constant identifying this as an AGGR address.) The idea is that identity and routing are separate, and that since EUI-64 IDs are already unique, one might as well just use them. Under AGGR, you don't have to apply for an IP address; it is automatically made from your network interface card's EUI-64 ID and your network service provider's routing tag. If you change providers, the tag changes, but the ID stays the same; if the prior provider is aware and willing, packets sent to your old address can even be forwarded for a while. A new DNS record type that distinguishes between ID and routing tag is in the works, to further ease changing providers.
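The address layout is easy to sketch. The code below glues a 3-bit constant, a 61-bit routing tag, and a 64-bit interface ID derived from an Ethernet MAC address into one 128-bit address. Several details are assumptions not taken from the talk: the constant is taken to be binary 001, the MAC-to-EUI-64 mapping follows the convention later standardized for IPv6 (insert FF FE in the middle and flip the universal/local bit), and the routing tag and MAC address in the example are invented.

```python
import ipaddress

FORMAT_PREFIX = 0b001  # assumed value of the 3-bit constant


def mac_to_interface_id(mac: str) -> int:
    """Derive a 64-bit interface ID from a 48-bit MAC address.

    Assumed mapping: insert the bytes FF FE between the two halves of the
    MAC address, then flip the universal/local bit of the first octet.
    """
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address such as 00:a0:c9:12:34:56")
    eui64 = octets[:3] + b"\xff\xfe" + octets[3:]
    # The universal/local bit is 0x02 in the first octet, i.e. bit 57 of 64.
    return int.from_bytes(eui64, "big") ^ (1 << 57)


def build_address(routing_tag: int, mac: str) -> ipaddress.IPv6Address:
    """Combine prefix (3 bits), routing tag (61 bits), interface ID (64 bits)."""
    if not 0 <= routing_tag < (1 << 61):
        raise ValueError("routing tag must fit in 61 bits")
    upper = (FORMAT_PREFIX << 61) | routing_tag
    return ipaddress.IPv6Address((upper << 64) | mac_to_interface_id(mac))


if __name__ == "__main__":
    mac = "00:a0:c9:12:34:56"           # invented example MAC address
    old_tag = 0x00010DB800000001        # invented tag from one provider
    new_tag = 0x00010DB8000A0001        # invented tag from another provider
    print(build_address(old_tag, mac))
    print(build_address(new_tag, mac))  # same low 64 bits, different routing tag
```

Changing providers changes only the upper half of the address; the lower 64 bits, which identify the interface, are the same in both printed addresses.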

Fink stressed that IPv6 is still a work in progress. It will be several years yet before it is clear whether the new protocol is successful; real migration won't start until then, and it will probably be a long time before IPv4 entirely disappears. It is important to encourage implementors to hide the protocol difference from high-level applications; network administrators will have to know that the protocol has changed, but ordinary users shouldn't. (Slides in invited talk notes.)

Programming

Scott Fritchie (Minnesota Regional Network) discovered the INN netnews transport system was severely limited by disk speed, and that much of the delay had to do with searching and updating directories and file attributes. He sped it up by changing the way articles are stored: in effect, INN has its own simple log-structured file system kept in a set of circular buffers. The circular buffers also aid space management: when space runs out, the oldest articles are quietly overwritten (the circular buffer wraps around). The new INN uses appreciably more CPU time, but spends much less time waiting for disk seeks; throughput increases by a factor of three or four. There are complications in teaching news-reading programs about the new storage structure, but not outrageous ones. (Paper in proceedings.)
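The storage idea can be shown in miniature. This is a toy in-memory sketch of a single circular buffer, not Fritchie's on-disk format; the class and method names are invented. Articles are appended at a write position that wraps around, and an article is considered lost once newer data has been written over any part of it.

```python
from typing import Dict, Optional, Tuple


class CircularArticleStore:
    """Toy circular buffer: new articles quietly overwrite the oldest ones."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buf = bytearray(capacity)
        self.written = 0  # total bytes ever written; never wraps
        # message ID -> (absolute start position, length)
        self.index: Dict[str, Tuple[int, int]] = {}

    def store(self, msgid: str, body: bytes) -> None:
        if len(body) > self.capacity:
            raise ValueError("article larger than the whole buffer")
        start = self.written
        for i, byte in enumerate(body):
            self.buf[(start + i) % self.capacity] = byte  # wrap around
        self.written += len(body)
        self.index[msgid] = (start, len(body))

    def fetch(self, msgid: str) -> Optional[bytes]:
        entry = self.index.get(msgid)
        if entry is None:
            return None
        start, length = entry
        # Only the most recent `capacity` bytes survive; anything that
        # starts before that window has been at least partly overwritten.
        if self.written - start > self.capacity:
            del self.index[msgid]
            return None
        return bytes(self.buf[(start + i) % self.capacity] for i in range(length))


if __name__ == "__main__":
    store = CircularArticleStore(capacity=10)
    store.store("<a@example>", b"aaaa")
    store.store("<b@example>", b"bbbb")
    store.store("<c@example>", b"cccc")  # wraps and overwrites part of <a@example>
    for msgid in ("<a@example>", "<b@example>", "<c@example>"):
        print(msgid, store.fetch(msgid))
```

Storing an article touches no directories or file attributes, only the buffer and a small index, which is the point of the design; expiry costs nothing because it is a side effect of wrapping.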

Mark Burgess (Oslo College) has made Oslo's system administration task scheduler more robust by adding two timers. The scheduler uses locks to prevent a new copy of a task from starting while an old one is still running; the new timers specify an `expireafter' interval after which the lock may be broken (the running task is killed, and a new instance may be started), and an `ifelapsed' interval since a given task was started during which the same task may not be started again, even if the first instance has finished. `Expireafter' allows hung tasks to be caught, killed, and restarted; `ifelapsed' allows the scheduler to make frequent retries without risk of overload. None of this seems to be rocket science; just good engineering. (Paper in proceedings.)
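The two timers amount to a small decision rule, sketched below. This is my reading of the description above, not Burgess's code: the names `TaskState` and `decide` and the returned strings are invented, and times are plain seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskState:
    last_started: Optional[float]  # when the task last started; None if never
    running: bool                  # True while the lock is still held


def decide(state: TaskState, now: float, ifelapsed: float, expireafter: float) -> str:
    """Decide what the scheduler should do with one task at time `now`."""
    if state.last_started is None:
        return "start"
    age = now - state.last_started
    if state.running:
        # Lock is held: break it only once the task has outlived expireafter.
        return "kill-and-restart" if age >= expireafter else "skip"
    # Task has finished: refuse to rerun it until ifelapsed has passed.
    return "start" if age >= ifelapsed else "skip"


if __name__ == "__main__":
    hung = TaskState(last_started=0.0, running=True)
    done = TaskState(last_started=0.0, running=False)
    print(decide(hung, now=7200.0, ifelapsed=600.0, expireafter=3600.0))
    print(decide(done, now=60.0, ifelapsed=600.0, expireafter=3600.0))
    print(decide(done, now=900.0, ifelapsed=600.0, expireafter=3600.0))
```

With this rule the scheduler can be invoked as often as one likes: most invocations fall into the "skip" branches, and only a hung task or a genuinely due one causes any work.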

Henry Spencer (SP Systems) nominally spoke about Shuse, a system for maintaining user data (the passwd file and the like) in a large community with many users and many people. In fact he focused on the lessons learned from two years of developing, maintaining, and enhancing Shuse, and from its use in environments other than its birthplace:

(Paper in proceedings, but said to be an early draft; a better one will allegedly appear anon.)

Politics and management

There were two sessions about managing systems at educational institutions: a panel with representatives from Northeastern University (Computer Science), Penn State (Applied Research Lab, a centre for Navy-funded research), Harvard (EECS, i.e. computer science), and the University of Wisconsin (Computer Science), and a BOF organized by folks from the University of Waterloo (Mathematics, which includes computer science). Although nearly everyone was from a computer science department, which is only one of the sorts of environment one finds at a university, department sizes and focuses varied; e.g. Harvard EECS is much smaller.

Everyone does things differently: different funding models (funding is often assembled of dribs and drabs, but from different places); different staffing levels; different sorts of student involvement (some places let undergraduates do a lot of work, including some sensitive system management, while others are scared to trust students); different system support philosophies (some places have a help desk, some assert that computer scientists should be able to stand on their own feet); different styles of problem-tracking (large places tend to use formal trouble-ticket software, mostly a package called req; small places do things informally). Only a few general principles emerged:

Steve Johnson (Transmeta Corp) is a technical person who has also been a manager of technical people; he gave a technically-oriented view of management from both sides. The trouble with managing technical people is that it is much more likely that the manager and the employee will view the world in different ways, and communications will be scrambled. It doesn't help that technical people often take too technical a view of the world, and Johnson unwittingly demonstrated that in his talk: he used a model that classifies people into three types (thinkers, feelers, and act-ers), and applied it as if everyone can be placed in exactly one of those categories. People certainly are different, and the model helps make that clear; but they are not so easily classified. The talk offered much food for thought, but should be thought about, not taken as a script. (Slides in invited talk notes.)

Real life

Jim Reid (RTFM Ltd) gave a wonderful old-fashioned talk about the UNIX systems used by Eurocontrol, the organization that runs air traffic control in Europe.

UNIX systems maintain important near-real-time databases: IFPS, a central registry of all active flight plans; TACT, a four-dimensional model of all controlled air space in Europe, used to allocate slots for planes, watch for congestion, and decide how to reroute planes when necessary. These are scheduling and supplementary tools for the controllers; planes won't crash if TACT or IFPS stops working, but if they stay down for very long, controllers will be annoyed and airlines may lose a lot of money.

All hardware is duplicated: `we don't like to have one of anything.' There are two servers (one active, one hot standby); on each server, there are two copies of each database, on independent disks (and all the disks are dual-ported, so either server may access them). Two networks connect the servers to the controllers' workstations. All of this is duplicated at two sites, the main one in Brussels and a backup in Brétigny.

The software environment is minimal; stock vendor OS software is used, but stripped down as much as possible to simplify it. Network services in particular are stripped down: no NIS, no DNS, no NFS, no rdist, no e-mail. (These systems exist just to run TACT and IFPS, not as general-purpose computers.) This makes life simpler in some ways, harder in others (e.g. it's clumsy to add a user to all systems). The TACT and IFPS applications make up for all this simplicity: e.g. TACT is 300,000 lines of Ada.

There are two off-line UNIX environments: a pre-operational environment for testing changes and training new staff; a development environment, where systems are not as stripped-down. The pre-operational environment is fed live data, but the controllers never consult it (it is not a backup system). Both off-line worlds are completely separate from the operational systems; no shared networks, for example.

Public domain software is officially banned, mostly because of the desire to have an official support contact for everything in the systems. In practice, some PD software has been sneaked in; e.g. TACT uses Tcl/Tk, but one TACT engineer has been assigned to support it, and the system vendor has been talked into providing special support for a few packages like xntpd.

There is very strict change management for the operational systems. To make even the smallest change, one must submit a written request, explaining in detail what is to be changed, which systems will be affected, and when the work will be done; the request must be approved at a weekly committee meeting; if approved, the change may be performed only at the time specified in the request (if you get stuck in traffic and miss your time slot, you have to start all over). Many logs and audit trails are kept. Only one change may be specified per request, and only one may be approved at a time; if the same change must be made to six systems, six requests must be submitted, and the process will take (at least) six weeks. There are, of course, faster processes to deal with genuine emergencies (you don't have to submit a request to the committee to reboot a system that has crashed), but even these are carefully logged. All this sounds (and is) pretty tedious, but Reid pointed out that it also lends a useful discipline to tasks that are often done too hackishly, and that it's nice to have explicit written authority to do things.

There is a monthly one-hour maintenance window (early in the morning when there are few planes in the sky) when the roles of the live and backup server are swapped. System software may not be changed on the live server; the backup server is updated instead, it becomes the live server at the next swap, and the former live server (now the backup) is then updated. Swapping roles takes about 45 minutes (mostly a slow database startup), so there are really only 15 minutes for system work.

Operationally, all this is a success: end users have seen no important downtime and seem happy. Having redundant hardware has been important. The change management procedures are painful but useful. On the other hand, migrating to the next version of the vendor's OS (support for Ada and Oracle on the old one is drying up) is likely to be pretty painful.