ops-class.org | Introduction to File Systems

Technical Women

Figure 1. Grace Hopper

Today

Hierarchical file systems.
On-disk data structures.

$ cat announce.txt

Please do ASST3! The first part is not that hard.

UNIX File Interface

Establishing relationships:

open("foo"): "I’d like to use the file named foo."
close("foo"): "I’m finished with foo."

Reading and writing:

read(2): "I’d like to perform a read from file handle 2 at the current position."
write(2): "I’d like to perform a write from file handle 2 at the current position."

Positioning:

lseek(2, 100): "Please move my saved position for file handle 2 to position 100.

What’s Missing?

dup2 is really about manipulating a processes names for files and doesn’t have much to do with the file system.

Files Together: File Organization

Each file has to have a unique name. No problem!

Letter to Mom: LetterToMom.txt
Letter to Suzanna: LetterToSuzanna.txt
Letter to Chuchu: WoofWoofWagWag.txt
Another letter to Suzanna: AnotherLetterToSuzanna.txt
A third letter to Suzanna: LetterToSuzanna22Mar2012.txt

Flat name spaces were actually used by some early file systems but naming gets gross fast…

Hierarchical Implications

Big idea: don’t look at everything all at once. Allows users to store and examine related files together.

letters/Mom/Letter.txt
letters/Chuchu/WoofWoofWagWag.txt
letters/Suzanna/Letters/1.txt
letters/Suzanna/Letters/2.txt
letters/Suzanna/Letters/2.txt

Each file should be stored in one place. (Although we’ll discuss exceptions to this rule.)

Location Implications

Location requires navigation and relative navigation is useful, meaning that locations (directories) can include pointers to other locations (other directories).
Finally, location is only meaningful if it is tied to a files name, so hierarchical file systems implement name spaces, which require that a file’s name map to a single unique location within the file system.

Why Trees?

File systems usually require that files be organized into an acyclic graph with a single root, also known as a tree.

Why?

What is the name of the file in the diagram below?

OK, I picked a root. What is the name of the file now?

OK, I eliminated the cycles. What is the name of the file now?

Tree Naming

Trees produce a single canonical name for each file on the system as well as an infinite number of relative names:

Canonical name: /you/used/to/love/well
Relative name: /you/used/to/love/me/../well
Relative name: love/me/../../love/me/../well

Sestina, by Clara Shuttleworth

You used to love me well.
Well, you— me— used love to…
to… well… love. You used me.
Me, too, used… well… you. Love,
love me. You, too well used,
used love well. Me, Too. You!
You used to love me well.

— Clara Shuttleworth

Files: Questions?

File System Design Goals

Efficiently translate file names to file contents.
Allow files to move, grow, shrink and otherwise change.
Optimize access to single files.
Optimize access to multiple files, particularly related files.
Survive failures and maintain a consistent view of file names and contents.

Three of These Things Are All Like Each Other

The file systems we will discuss all support the following features:

Files, including some number of file attributes and permissions.
Names, organized into a hierarchical name space.

This is the file interface and feature set we are all used to. The difference lie in the implementations and what happens on disk.

Implementing Hierarchical File Systems

Broadly speaking, two types of disk blocks:

Data blocks: contain file data.
Index nodes (inodes): contain not file data.

One of These Things Is Not Like the Others

What makes file systems different?

On-disk layout. How does the file system decide where to put data and metadata blocks in order to optimize file access?
Data structures. What data structures does the file system use to translate names and locate file data?
Crash recovery. How does the file system prepare for and recover from crashes?

File System Challenges

File systems are really maintaining a large and complex data structure using disk blocks as storage.
This is hard because making changes potentially requires updating many different structures.

Example `write`

Say a process wants to write data to the end of a file. What does the file system have to do?

Find empty disk blocks to use and mark them as in use.
Associate those blocks with the file that is being written to.
Adjust the size of the file that is being written to.
Actually copy the data to the disk blocks being used.

From the perspective of a process all of these things need to happen synchronously.
In reality, many different asynchronous operations are involved touching many different disk blocks. (Each operation above modifies at least one disk block.)
This creates both a consistency and a performance problem!

What Happens On Disk?

Let’s consider the on-disk structures used by modern file systems.

Specifically we are going to investigate how file systems:

translate paths to file index nodes or inodes.
find data blocks associated with a given inode (file).
allocate and free inodes and data blocks.

We’re going to try and keep this at a relatively high level, but examples are used for concreteness and drawn from the Linux ext4 file system.

Sectors, Blocks, Extents

Sector: the smallest unit that the disk allows to be written, usually 256 bytes.
Block: the smallest unit that the file system actually writes, usually 4K bytes.
Extent: a set of contiguous blocks used to hold part of a file. Described by a start and end block.

Why would file systems not write chunks smaller than 4K?

Because contiguous writes are good for disk head scheduling and 4K is the page size which affects in-memory file caching.

Why would file systems want to write file data in even larger chunks?

Because contiguous writes are good for disk head scheduling and many files are larger than 4K!

`ext4` inodes

1 inode per file.
256 bytes, so 1 per sector or 16 per block.

Contains:

Location of file data blocks (contents).
Permissions including user, read/write/execute bits, etc.
Timestamps including creation (crtime), access (atime), content modification (mtime), attribute modification (ctime) and delete (dtime) times.
Named and located by number.

`ext4` inodes

Locating inodes

How does the system translate an inode number into an inode structure?

All inodes are created at format time at well-known locations.

What are the consequences of this?

inodes may not be located near file contents. ext4 creates multiple blocks of inodes within the drive to reduce seek times between inodes and data.
Fixed number of inodes for the file system. Can run out of inodes before we run out of data blocks! ext4 creates approximately one inode per 16 KB of data blocks, but this can be configured at format time.

Directories

Simply a special file the contents of which map inode numbers to relative names.

File System Names are inode Numbers, Directories Are Files

Using `debugfs`

Using `debugfs`

`open`: Path Name Translation

open("/etc/default/keyboard") must translate "/etc/default/keyboard" to an inode number.

Get inode number for root directory. This is usually a fixed agreed-on inode number, like 2.
Open the directory with inode number 2. Look for "etc". Find "etc" with inode number 393218.
Open the directory with inode number 393218. Look for "default". Find "default" with inode number 393247.
Open the directory with inode number 393247. Look for "keyboard". Find keyboard with inode number 394692.
Open the file with inode number 394692.

Next Time

File system structures.
File system operations.
Caching.

← →

Technical Women

Today

$ cat announce.txt

UNIX File Interface

What’s Missing?

Files Together: File Organization

Hierarchical Implications

Location Implications

Why Trees?

Tree Naming

Sestina, by Clara Shuttleworth

Files: Questions?

File System Design Goals

Three of These Things Are All Like Each Other

Implementing Hierarchical File Systems

One of These Things Is Not Like the Others

File System Challenges

Example write

What Happens On Disk?

Sectors, Blocks, Extents

ext4 inodes

ext4 inodes

Locating inodes

Directories

File System Names are inode Numbers, Directories Are Files

Using debugfs

Using debugfs

open: Path Name Translation

Next Time

Example `write`

`ext4` inodes

`ext4` inodes

Using `debugfs`

Using `debugfs`

`open`: Path Name Translation