The Design and Implementation of the 4.4BSD Operating System

Marshall Kirk McKusick, Keith Bostic, Michael J. Karels & John S. Quarterman

April 1996
Price (indicative)
67 €
Number of pages
Experienced programmer or system administrator

This book describes the design and implementation of the 4.4BSD operating system, the latest release of what previously was known as the "Berkeley" version of UNIX. Because the 4.4BSD operating system is freely available in source and binary form, it is the system of choice for researchers, developers, and programmers. From its years of use in commercial environments, the robustness of the 4.4BSD operating system has made it the most common platform for providing network services on the Internet including routing, firewall protection, WWW services, and electronic mail handling. As key participants in the development of 4.4BSD, the authors provide comprehensive and up-to-date technical information needed by system programmers and application programmers.


This book is an extensive revision of the first authoritative and full-length description of the design and implementation of the research versions of the UNIX system developed at the University of California at Berkeley. Most detail is given about 4.4BSD, which incorporates the improvements of the previous Berkeley versions. Although 4.4BSD includes nearly 500 utility programs in addition to the kernel, this book concentrates almost exclusively on the kernel.

The UNIX System

The UNIX system runs on computers ranging from personal home systems to the largest supercomputers. It is the operating system of choice for most multiprocessor, graphics, and vector-processing systems, and is widely used for its original purpose of timesharing. It is the most common platform for providing network services (from FTP to WWW) on the Internet. It is the most portable operating system ever developed. This portability is due partly to its implementation language, C [Kernighan & Ritchie, 1978] (which is itself one of the most widely ported languages), and partly to the elegant design of the system. Many of the system's features are imitated in other systems [O'Dell, 1987].

Since its inception in 1969 [Ritchie & Thompson, 1978], the UNIX system has developed in a number of divergent and rejoining streams. The original developers continued to advance the state of the art with their Ninth and Tenth Edition UNIX inside AT&T Bell Laboratories, and then their Plan 9 successor to UNIX. Meanwhile, AT&T licensed UNIX System V as a product, before selling it to Novell. Novell passed the UNIX trademark to X/OPEN and sold the source code and distribution rights to Santa Cruz Operation (SCO). Both System V and Ninth Edition UNIX were strongly influenced by the Berkeley Software Distributions produced by the Computer Systems Research Group (CSRG) of the University of California at Berkeley.

Berkeley Software Distributions

These Berkeley systems have introduced several useful programs and facilities to the UNIX community:

  • 2BSD (the Berkeley PDP-11 system): the text editor vi.
  • 3BSD (the first Berkeley VAX system): demand-paged virtual-memory support.
  • 4.0BSD: performance improvements.
  • 4.1BSD: job control, autoconfiguration, and long C identifiers.
  • 4.2BSD and 4.3BSD: reliable signals; a fast filesystem; improved networking, including a reference implementation of TCP/IP; sophisticated interprocess-communication (IPC) primitives; and more performance improvements.
  • 4.4BSD: a new virtual memory system; a stackable and extensible vnode interface; a network filesystem (NFS); a log-structured filesystem, numerous filesystem types, including loopback, union, and uid/gid mapping layers; an ISO9660 filesystem (e.g., CD-ROM); ISO networking protocols; support for 68K, SPARC, MIPS, and PC architectures; POSIX support, including termios, sessions, and most utilities; multiple IP addresses per interface; disk labels; and improved booting.

4.2BSD, 4.3BSD, and 4.4BSD are the bases for the UNIX systems of many vendors, and are used internally by the development groups of many other vendors. Many of these developments have also been incorporated by System V, or have been added by vendors whose products are otherwise based on System V.

The implementation of the TCP/IP networking protocol suite in 4.2BSD and 4.3BSD, and the availability of those systems, explain why the TCP/IP networking protocol suite is implemented so widely throughout the world. Numerous vendors have adapted the Berkeley networking implementations, whether their base system is 4.2BSD, 4.3BSD, 4.4BSD, System V, or even Digital Equipment Corporation's VMS or Microsoft's Winsock interface in Windows '95 and Windows/NT.

4BSD has also been a strong influence on the POSIX (IEEE Std 1003.1) operating-system interface standard, and on related standards. Several features--such as reliable signals, job control, multiple access groups per process, and the routines for directory operations--have been adapted from 4.3BSD for POSIX.

Material Covered in this Book

This book is about the internal structure of 4.4BSD [Quarterman et al, 1985], and about the concepts, data structures, and algorithms used in implementing 4.4BSD's system facilities. Its level of detail is similar to that of Bach's book about UNIX System V [Bach, 1986]; however, this text focuses on the facilities, data structures, and algorithms used in the Berkeley variant of the UNIX operating system. The book covers 4.4BSD from the system-call level down--from the interface to the kernel to the hardware itself. The kernel includes system facilities, such as process management, virtual memory, the I/O system, filesystems, the socket IPC mechanism, and network protocol implementations. Material above the system-call level--such as libraries, shells, commands, programming languages, and other user interfaces--is excluded, except for some material related to the terminal interface and to system startup. Like Organick's book about Multics [Organick, 1975], this book is an in-depth study of a contemporary operating system.

Where particular hardware is relevant, the book refers to the Hewlett-Packard HP300 (Motorola 68000-based) architecture. Because 4.4BSD was developed on the HP300, that is the architecture with the most complete support, so it provides a convenient point of reference.

Readers who will benefit from this book include operating-system implementors, system programmers, UNIX application developers, administrators, and curious users. The book can be read as a companion to the source code of the system, falling as it does between the manual [CSRG, 1994] and the code in detail of treatment. But this book is specifically neither a UNIX programming manual nor a user tutorial (for a tutorial, see [Libes & Ressler, 1988]). Familiarity with the use of some version of the UNIX system (see, for example, [Kernighan & Pike, 1984]), and with the C programming language (see, for example, [Kernighan & Ritchie, 1988]) would be extremely useful.

Use in Courses on Operating Systems

This book is suitable for use as a reference text to provide background for a primary textbook in a second-level course on operating systems. It is not intended for use as an introductory operating-system textbook; the reader should have already encountered terminology such as memory management, process scheduling, and I/O systems [Silberschatz & Galvin, 1994]. Familiarity with the concepts of network protocols [Tanenbaum, 1988; Stallings, 1993; Schwartz, 1987] will be useful for understanding some of the later chapters.

Exercises are provided at the end of each chapter. The exercises are graded into three categories indicated by zero, one, or two asterisks. The answers to exercises that carry no asterisks can be found in the text. Exercises with a single asterisk require a step of reasoning or intuition beyond a concept presented in the text. Exercises with two asterisks present major design projects or open research questions.


This text discusses both philosophical and design issues, as well as details of the actual implementation. Often, the discussion starts at the system-call level and descends into the kernel. Tables and figures are used to clarify data structures and control flow. Pseudocode similar to the C language is used to display algorithms. Boldface font identifies program names and filesystem pathnames. Italics font introduces terms that appear in the glossary and identifies the names of system calls, variables, routines, and structure names. Routine names (other than system calls) are further identified by the name followed by a pair of parentheses (e.g., malloc() is the name of a routine, whereas argv is the name of a variable).

The book is divided into five parts, organized as follows:

Part 1, Overview
Three introductory chapters provide the context for the complete operating system and for the rest of the book. Chapter 1, History and Goals, sketches the historical development of the system, emphasizing the system's research orientation. Chapter 2, Design Overview of 4.4BSD, describes the services offered by the system, and outlines the internal organization of the kernel. It also discusses the design decisions that were made as the system was developed. Sections 2.3 through 2.14 in Chapter 2 give an overview of their corresponding chapter. Chapter 3, Kernel Services, explains how system calls are done, and describes in detail several of the basic services of the kernel.

Part 2, Processes
The first chapter in this part--Chapter 4, Process Management--lays the foundation for later chapters by describing the structure of a process, the algorithms used for scheduling the execution of processes, and the synchronization mechanisms used by the system to ensure consistent access to kernel-resident data structures. In Chapter 5, Memory Management, the virtual-memory-management system is discussed in detail.

Part 3, I/O System
First, Chapter 6, I/O System Overview, explains the system interface to I/O and describes the structure of the facilities that support this interface. Following this introduction are four chapters that give the details of the main parts of the I/O system. Chapter 7, Local Filesystems, details the data structures and algorithms that implement filesystems as seen by application programs. Chapter 8, Local Filestores, describes how local filesystems are interfaced with local media. Chapter 9, The Network Filesystem, explains the network filesystem from both the server and client perspectives. Chapter 10, Terminal Handling, discusses support for character terminals, and provides a description of a character-oriented device driver.

Part 4, Interprocess Communication
Chapter 11, Interprocess Communication, describes the mechanism for providing communication between related or unrelated processes. Chapters 12 and 13, Network Communication and Network Protocols, are closely related, as the facilities explained in the former are implemented by specific protocols, such as the TCP/IP protocol suite, explained in the latter.

Part 5, System Operation
Chapter 14, System Startup, discusses system startup, shutdown, and configuration, and explains system initialization at the process level, from kernel initialization to user login.

The book is intended to be read in the order that the chapters are presented, but the parts other than Part 1 are independent of one another and can be read separately. Chapter 14 should be read after all the others, but knowledgeable readers may find it useful independently.

At the end of the book are a Glossary with brief definitions of major terms and an Index. Each chapter contains a Reference section with citations of related material.

About the Authors
Marshall Kirk McKusick writes books and articles, consults, and teaches classes on UNIX- and BSD-related subjects. While at the University of California at Berkeley, he implemented the 4.2BSD fast file system, and was the Research Computer Scientist at the Berkeley Computer Systems Research Group (CSRG) overseeing the development and release of 4.3BSD and 4.4BSD. His particular areas of interest are the virtual-memory system and the filesystem. One day, he hopes to see them merged seamlessly. He earned his undergraduate degree in Electrical Engineering from Cornell University, and did his graduate work at the University of California at Berkeley, where he received Masters degrees in Computer Science and Business Administration, and a doctoral degree in Computer Science. He is a past president of the Usenix Association, and is a member of ACM and IEEE. In his spare time, he enjoys swimming, scuba diving, and wine collecting. The wine is stored in a specially constructed wine cellar (accessible from the net using the command "telnet McKusick.COM 451") in the basement of the house that he shares with Eric Allman, his domestic partner of 17-and-some-odd years.

Keith Bostic is a member of the technical staff at Berkeley Software Design, Inc. He spent 8 years as a member of the CSRG, overseeing the development of over 400 freely redistributable UNIX-compatible utilities, and is the recipient of the 1991 Distinguished Achievement Award from the University of California, Berkeley, for his work to make 4.4BSD freely redistributable. Concurrently, he was the principal architect of the 2.10BSD release of the Berkeley Software Distribution for PDP-11s, and the coauthor of the Berkeley Log Structured Filesystem and the Berkeley database package (DB). He is also the author of the widely used vi implementation, nvi. He received his undergraduate degree in Statistics and his Masters degree in Electrical Engineering from George Washington University. He is a member of the ACM, the IEEE, and several POSIX working groups. In his spare time, he enjoys scuba diving in the South Pacific, mountain biking, and working on a tunnel into Kirk and Eric's specially constructed wine cellar. He lives in Massachusetts with his wife, Margo Seltzer, and their cats.

Michael J. Karels is the System Architect and Vice President of Engineering at Berkeley Software Design, Inc. He spent 8 years as the Principal Programmer of the CSRG at the University of California, Berkeley as the system architect for 4.3BSD. Karels received his Bachelor's degree in Microbiology from the University of Notre Dame. While a graduate student in Molecular Biology at the University of California, he was the principal developer of the 2.9BSD UNIX release of the Berkeley Software Distribution for the PDP-11. He is a member of the ACM, the IEEE, and several POSIX working groups. He lives with his wife Teri Karels in the backwoods of Minnesota.

John S. Quarterman is Senior Technical Partner at Texas Internet Consulting, which consults in networks and open systems with particular emphasis on TCP/IP networks, UNIX systems, and standards. He is the author of The Matrix: Computer Networks and Conferencing Systems Worldwide (Digital Press, 1990), and is a coauthor of UNIX, POSIX, and Open Systems: The Open Standards Puzzle (1993), Practical Internetworking with TCP/IP and UNIX (1993), The Internet Connection: System Connectivity and Configuration (1994), and The E-Mail Companion: Communicating Effectively via the Internet and Other Global Networks (1994), all published by Addison-Wesley. He is editor of Matrix News, a monthly newsletter about issues that cross network, geographic, and political boundaries, and of Matrix Maps Quarterly; both are published by Matrix Information and Directory Services, Inc. (MIDS) of Austin, Texas. He is a partner in Zilker Internet Park, which provides Internet access from Austin. He and his wife, Gretchen Quarterman, split their time among his home in Austin, hers in Buffalo, New York, and various other locations.

Chapter 2 "Design Overview of 4.4BSD" [HTML: 121 KB]
Topics (or table of contents)


  • 1. History and Goals.
    History of the UNIX System.
    Research UNIX.
    AT&T UNIX System III and System V.
    Other Organizations.
    Berkeley Software Distributions.
    UNIX in the World.
    BSD and Other Systems.
    The Influence of the User Community.
    Design Goals of 4BSD.
    4.2BSD Design Goals.
    4.3BSD Design Goals.
    4.4BSD Design Goals.
    Release Engineering.
  • 2. Design Overview of 4.4BSD.
    4.4BSD Facilities and the Kernel.
    The Kernel.
    Kernel Organization.
    Kernel Services.
    Process Management.
    Process Groups and Sessions.
    Memory Management.
    BSD Memory-Management Design Decisions.
    Memory Management Inside the Kernel.
    I/O System.
    Descriptors and I/O.
    Descriptor Management.
    Socket IPC.
    Scatter/Gather I/O.
    Multiple Filesystem Support.
    Network Filesystem.
    Interprocess Communication.
    Network Communication.
    Network Implementation.
    System Operation.
  • 3. Kernel Services.
    Kernel Organization.
    System Processes.
    System Entry.
    Run-Time Organization.
    Entry to the Kernel.
    Return from the Kernel.
    System Calls.
    Result Handling.
    Returning from a System Call.
    Traps and Interrupts.
    I/O Device Interrupts.
    Software Interrupts.
    Clock Interrupts.
    Statistics and Process Scheduling.
    Memory-Management Services.
    Timing Services.
    Real Time.
    Adjustment of the Time.
    External Representation.
    Interval Time.
    User, Group, and Other Identifiers.
    Host Identifiers.
    Process Groups and Sessions.
    Resource Services.
    Process Priorities.
    Resource Utilization.
    Resource Limits.
    Filesystem Quotas.
    System-Operation Services.


  • 4. Process Management.
    Introduction to Process Management.
    Process State.
    The Process Structure.
    The User Structure.
    Context Switching.
    Process State.
    Low-Level Context Switching.
    Voluntary Context Switching.
    Process Scheduling.
    Calculations of Process Priority.
    Process-Priority Routines.
    Process Run Queues and Context Switching.
    Process Creation.
    Process Termination.
    Comparison with POSIX Signals.
    Posting of a Signal.
    Delivering a Signal.
    Process Groups and Sessions.
    Job Control.
    Process Debugging.
  • 5. Memory Management.
    Processes and Memory.
    Replacement Algorithms.
    Working-Set Model.
    Advantages of Virtual Memory.
    Hardware Requirements for Virtual Memory.
    Overview of the 4.4BSD Virtual-Memory System.
    Kernel Memory Management.
    Kernel Maps and Submaps.
    Kernel Address-Space Allocation.
    Kernel Malloc.
    Per-Process Resources.
    4.4BSD Process Virtual-Address Space.
    Page-Fault Dispatch.
    Mapping to Objects.
    Objects to Pages.
    Shared Memory.
    Mmap Model.
    Shared Mapping.
    Private Mapping.
    Collapsing of Shadow Chains.
    Private Snapshots.
    Creation of a New Process.
    Reserving Kernel Resources.
    Duplication of the User Address Space.
    Creation of a New Process Without Copying.
    Execution of a File.
    Process Manipulation of Its Address Space.
    Change of Process Size.
    File Mapping.
    Change of Protection.
    Termination of a Process.
    The Pager Interface.
    Vnode Pager.
    Device Pager.
    Swap Pager.
    Page Replacement.
    Paging Parameters.
    The Pageout Daemon.
    The Swap-In Process.
    Portability.
    The Role of the pmap Module.
    Initialization and Startup.
    Mapping Allocation and Deallocation.
    Change of Access and Wiring Attributes for Mappings.
    Management of Page-Usage Information.
    Initialization of Physical Pages.
    Management of Internal Data Structures.

    III. I/O System.

  • 6. I/O System Overview.
    I/O Mapping from User to Device.
    Device Drivers.
    I/O Queueing.
    Interrupt Handling.
    Block Devices.
    Entry Points for Block-Device Drivers.
    Sorting of Disk I/O Requests.
    Disk Labels.
    Character Devices.
    Raw Devices and Physical I/O.
    Character-Oriented Devices.
    Entry Points for Character-Device Drivers.
    Descriptor Management and Services.
    Open File Entries.
    Management of Descriptors.
    File-Descriptor Locking.
    Multiplexing I/O on Descriptors.
    Implementation of Select.
    Movement of Data Inside the Kernel.
    The Virtual-Filesystem Interface.
    Contents of a Vnode.
    Vnode Operations.
    Pathname Translation.
    Exported Filesystem Services.
    Filesystem-Independent Services.
    The Name Cache.
    Buffer Management.
    Implementation of Buffer Management.
    Stackable Filesystems.
    Simple Filesystem Layers.
    The Union Mount Filesystem.
    Other Filesystems.
  • 7. Local Filesystems.
    Hierarchical Filesystem Management.
    Structure of an Inode.
    Inode Management.
    Finding of Names in Directories.
    Pathname Translation.
    File Locking.
    Other Filesystem Semantics.
    Large File Sizes.
    File Flags.
  • 8. Local Filestores.
    Overview of the Filestore.
    The Berkeley Fast Filesystem.
    Organization of the Berkeley Fast Filesystem.
    Optimization of Storage Utilization.
    Reading and Writing to a File.
    Filesystem Parameterization.
    Layout Policies.
    Allocation Mechanisms.
    Block Clustering.
    Synchronous Operations.
    The Log-Structured Filesystem.
    Organization of the Log-Structured Filesystem.
    Index File.
    Reading of the Log.
    Writing to the Log.
    Block Accounting.
    The Buffer Cache.
    Directory Operations.
    Creation of a File.
    Reading and Writing to a File.
    Filesystem Cleaning.
    Filesystem Parameterization.
    Filesystem-Crash Recovery.
    The Memory-Based Filesystem.
    Organization of the Memory-Based Filesystem.
    Filesystem Performance.
    Future Work.
  • 9. The Network Filesystem.
    History and Overview.
    NFS Structure and Operation.
    The NFS Protocol.
    The 4.4BSD NFS Implementation.
    Client-Server Interactions.
    RPC Transport Issues.
    Security Issues.
    Techniques for Improving Performance.
    Crash Recovery.
  • 10. Terminal Handling.
    Terminal-Processing Modes.
    Line Disciplines.
    User Interface.
    The tty Structure.
    Process Groups, Sessions, and Terminal Control.
    RS-232 and Modem Control.
    Terminal Operations.
    Output Line Discipline.
    Output Top Half.
    Output Bottom Half.
    Input Bottom Half.
    Input Top Half.
    The stop Routine.
    The ioctl Routine.
    Modem Transitions.
    Closing of Terminal Devices.
    Other Line Disciplines.
    Serial Line IP Discipline.
    Graphics Tablet Discipline.


  • 11. Interprocess Communication.
    Interprocess-Communication Model.
    Use of Sockets.
    Implementation Structure and Overview.
    Memory Management.
    Storage-Management Algorithms.
    Mbuf Utility Routines.
    Data Structures.
    Communication Domains.
    Socket Addresses.
    Connection Setup.
    Data Transfer.
    Transmitting Data.
    Receiving Data.
    Passing Access Rights.
    Passing Access Rights in the Local Domain.
    Socket Shutdown.
  • 12. Network Communication.
    Internal Structure.
    Data Flow.
    Communication Protocols.
    Network Interfaces.
    Socket-to-Protocol Interface.
    Protocol User-Request Routine.
    Internal Requests.
    Protocol Control-Output Routine.
    Protocol-Protocol Interface.
    Interface between Protocol and Network Interface.
    Packet Transmission.
    Packet Reception.
    Kernel Routing Tables.
    Routing Lookup.
    Routing Redirects.
    Routing-Table Interface.
    User-Level Routing Policies.
    User-Level Routing Interface: Routing Socket.
    Buffering and Congestion Control.
    Protocol Buffering Policies.
    Queue Limiting.
    Raw Sockets.
    Control Blocks.
    Input Processing.
    Output Processing.
    Additional Network-Subsystem Topics.
    Out-of-Band Data.
    Address Resolution Protocol.
  • 13. Network Protocols.
    Internet Network Protocols.
    Internet Addresses.
    Broadcast Addresses.
    Internet Multicast.
    Internet Ports and Associations.
    Protocol Control Blocks.
    User Datagram Protocol (UDP).
    Control Operations.
    Internet Protocol (IP).
    Transmission Control Protocol (TCP).
    TCP Connection States.
    Sequence Variables.
    TCP Algorithms.
    Estimation of Round-Trip Time.
    Connection Establishment.
    Connection Shutdown.
    TCP Input Processing.
    TCP Output Processing.
    Sending of Data.
    Avoidance of the Silly-Window Syndrome.
    Avoidance of Small Packets.
    Delayed Acknowledgments and Window Updates.
    Retransmit State.
    Slow Start.
    Source-Quench Processing.
    Buffer and Window Sizing.
    Avoidance of Congestion with Slow Start.
    Fast Retransmission.
    Internet Control Message Protocol (ICMP).
    OSI Implementation Issues.
    Summary of Networking and Interprocess Communication.
    Creation of a Communication Channel.
    Sending and Receiving of Data.
    Termination of Data Transmission or Reception.


  • 14. System Startup.
    The boot Program.
    Kernel Initialization.
    Assembly-Language Startup.
    Machine-Dependent Initialization.
    Message Buffer.
    System Data Structures.
    Device Probing.
    Device Attachment.
    New Autoconfiguration Data Structures.
    New Autoconfiguration Functions.
    Device Naming.
    Machine-Independent Initialization.
    User-Level Initialization.
    System-Startup Topics.
    Kernel Configuration.
    System Shutdown and Autoreboot.
    System Debugging.
    Passage of Information To and From the Kernel.
  • Glossary.