Technical Women

Figure 1. Chieko Asakawa

Today

  • Just files.

  • File metadata.

  • UNIX file system semantics.

  • Hierarchical file systems.

ASST3.1 Checkpoint

At this point:
  • If you have not started ASST3.1, you’re behind.

  • If your coremap allocator is mostly working, you’re OK.

ASST3.1 is due Friday at 5PM. No extensions.

Disks: Questions?

Disk Parts

  • Platter: a circular flat disk on which magnetic data is stored. Constructed of a rigid non-magnetic material coated with a very thin layer of magnetic material. Can have data written on both sides.

  • Spindle: the drive shaft on which multiple platters are mounted and spun at between 4,200 and 15,000 RPM.

  • Head: the component, carried on an actuator arm, that reads and writes data on the magnetic surface of the platters. It flies tens of nanometers above the platter surface while the platter rotates beneath it.

Disk Locations

  • Track: think of a lane on a race track running around the platter.

  • Sector: resembles a slice of pie cut out of a single platter.

  • Cylinder: imagine the intersection between a cylinder and the set of platters. Composed of a set of vertically-aligned tracks on all platters.

Spinning Disks Are Different

Spinning disks are fundamentally different from the other system components we have discussed so far.

  • Difference in kind: disks move!

  • Difference in degree: disks are slow!

  • Difference in integration: disks are devices, and less tightly-coupled to the abstractions built on top of them.

Sources of Slowness

Reading or writing from the disk requires a series of steps, each of which is a potential source of latency.

  1. Issue the command. The operating system has to tell the device what to do, the command has to cross the device interconnect (IDE, SATA, etc.), and the drive has to select which head to use.

  2. Seek time. The drive has to move the heads to the appropriate track.

  3. Settle time. The heads have to stabilize on the (very narrow) track.

  4. Rotation time. The platters have to rotate to the position where the data is located.

  5. Transfer time. The data has to be read and transmitted back across the interconnect into system memory.
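The steps above can be added up as a rough back-of-the-envelope model. This is only a sketch: the seek, settle, and transfer figures are made-up illustrative values, and the only number taken from these slides is the spindle speed range. On average the platter has to turn half a rotation before the data arrives under the head.

```python
def rotation_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def average_access_ms(rpm: float, seek_ms: float, settle_ms: float,
                      transfer_ms: float, command_ms: float = 0.0) -> float:
    """Sum the five latency sources, assuming an average wait of half a rotation."""
    return command_ms + seek_ms + settle_ms + rotation_ms(rpm) / 2 + transfer_ms


# Spindle speeds within the range quoted earlier; all other figures are assumptions.
for rpm in (4200, 7200, 15000):
    total = average_access_ms(rpm, seek_ms=8.0, settle_ms=0.5, transfer_ms=0.1)
    print(f"{rpm:>6} RPM: one rotation {rotation_ms(rpm):.2f} ms, "
          f"average access {total:.2f} ms")
```

Even with generous assumptions the total is measured in milliseconds, which is why disks are a difference in degree compared to memory.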

The I/O Crisis

Disks are getting bigger but not faster.

Disks: Questions?

File Systems To The Rescue

Low-level disk interface is messy and very limited:
  • Requires reading and writing entire 512-byte blocks.

  • No notion of files, directories, etc.
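To see why whole-block access is awkward, here is a minimal sketch of changing a few bytes on a block device. The BlockDevice class and the write_bytes helper are hypothetical names invented for this illustration, and the device is just an in-memory byte array rather than a real disk.

```python
BLOCK_SIZE = 512  # block size from the slide above


class BlockDevice:
    """Toy in-memory stand-in for a disk: only whole blocks can be read or written."""

    def __init__(self, num_blocks: int) -> None:
        self._data = bytearray(num_blocks * BLOCK_SIZE)

    def read_block(self, n: int) -> bytes:
        return bytes(self._data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE])

    def write_block(self, n: int, block: bytes) -> None:
        if len(block) != BLOCK_SIZE:
            raise ValueError("must write exactly one full block")
        self._data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE] = block


def write_bytes(dev: BlockDevice, offset: int, payload: bytes) -> None:
    """Write an arbitrary byte range using read-modify-write on whole blocks."""
    pos = 0
    while pos < len(payload):
        block_no, start = divmod(offset + pos, BLOCK_SIZE)
        chunk = payload[pos:pos + BLOCK_SIZE - start]
        block = bytearray(dev.read_block(block_no))  # read the whole block
        block[start:start + len(chunk)] = chunk      # modify a few bytes
        dev.write_block(block_no, bytes(block))      # write the whole block back
        pos += len(chunk)


disk = BlockDevice(num_blocks=4)
write_bytes(disk, 510, b"hello")  # five bytes that straddle blocks 0 and 1
print(disk.read_block(0)[510:], disk.read_block(1)[:3])
```

Writing five bytes costs two block reads and two block writes here, which is part of the messiness a file system hides.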

File systems take this limited block-level device and create the file abstraction almost entirely in software.

  • Compared to the CPU and memory that we have studied previously, more of the file abstraction is implemented in software.

  • This explains the plethora of available file systems: ext2, ext3, and ext4, reiserfs, NTFS, jfs, lfs, xfs, etc.

  • This is probably why many systems people have a soft spot for file systems even if they seem a bit outdated these days.

What About Flash?

No moving parts! Great! We can eliminate a lot of the complexity of modern file systems. Yippee!

Except that…​
  • Have to erase an entire large chunk before we can rewrite it.

  • And it wears out faster than magnetic drives, and can wear unevenly if we are not careful.

Sigh…​ things are sounding complicated again.
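A toy model of the two flash problems above, as a sketch only. FlashChip and both rewrite helpers are hypothetical names; real flash translation layers also keep a logical-to-physical mapping, which is omitted here. The "wear leveling" shown is the most naive policy possible: always rewrite into the least-erased block.

```python
class FlashChip:
    """Toy flash model: an erase block must be erased before it is rewritten."""

    def __init__(self, num_blocks: int) -> None:
        self.erased = [True] * num_blocks
        self.erase_counts = [0] * num_blocks
        self.contents = [b""] * num_blocks

    def erase(self, n: int) -> None:
        self.erased[n] = True
        self.contents[n] = b""
        self.erase_counts[n] += 1  # every erase wears the block a little

    def program(self, n: int, data: bytes) -> None:
        if not self.erased[n]:
            raise RuntimeError("block must be erased before it is rewritten")
        self.contents[n] = data
        self.erased[n] = False


def rewrite_in_place(chip: FlashChip, n: int, data: bytes) -> None:
    """Always reuse the same block: simple, but all wear lands on one block."""
    chip.erase(n)
    chip.program(n, data)


def rewrite_wear_leveled(chip: FlashChip, data: bytes) -> int:
    """Naive wear leveling: rewrite into whichever block has been erased least."""
    n = min(range(len(chip.erase_counts)), key=chip.erase_counts.__getitem__)
    chip.erase(n)
    chip.program(n, data)
    return n


hot = FlashChip(num_blocks=4)
leveled = FlashChip(num_blocks=4)
for i in range(100):
    rewrite_in_place(hot, 0, b"update %d" % i)
    rewrite_wear_leveled(leveled, b"update %d" % i)

print("in place:    ", hot.erase_counts)
print("wear leveled:", leveled.erase_counts)
```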

Clarifying the Concept of a File

Most of us are familiar with files, but the semantics of files have a variety of sources that are worth separating:

  • Just a file: the minimum it takes to be a file.

  • About a file: what other useful information do most file systems typically store about files?

  • Files and processes: what additional properties does the UNIX file system interface introduce to allow user processes to manipulate files?

  • Files together: given multiple files, how do we organize them in a useful way?

Just a File: The Minimum

What does a file have to do to be useful?
  • Reliably store data. (Duh.)

  • Be located! Usually via a name.


Basic File Expectations

At minimum we expect that
  • file contents should not change unexpectedly.

  • file contents should change when requested and as requested.

These requirements seem simple but many file systems do not meet them!

03 Mar 2012: Bug Report–Serious file system corruption and data loss caused to other NTFS drives by Windows 8 CP

Basic File Expectations

Failures such as power outages and sudden ejects make file system design difficult and expose tradeoffs between durability and performance.

  • Memory: fast, transient. Disk: slow, stable.
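The tradeoff is visible even from user space. The sketch below, using hypothetical helper names write_durably and write_fast, contrasts a write that may linger in memory with one that asks the operating system to push the data to the device before returning. It assumes a POSIX-like system; whether the drive's own cache is also flushed depends on the platform and hardware.

```python
import os
import tempfile


def write_fast(path: str, data: bytes) -> None:
    """Fast: after close, the data may still sit in OS memory caches for a while."""
    with open(path, "wb") as f:
        f.write(data)


def write_durably(path: str, data: bytes) -> None:
    """Slower: do not return until the OS has been asked to reach stable storage."""
    with open(path, "wb") as f:
        f.write(data)         # lands in a user-space buffer (memory: fast, transient)
        f.flush()             # hand the bytes to the operating system
        os.fsync(f.fileno())  # ask the OS to write its cached copy to the device


demo_dir = tempfile.mkdtemp()
fast_path = os.path.join(demo_dir, "fast.txt")
durable_path = os.path.join(demo_dir, "durable.txt")
write_fast(fast_path, b"scratch data")
write_durably(durable_path, b"important data")
```

Both files read back the same way while the machine stays up. The difference only shows after a failure, which is exactly what makes these bugs hard to find.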

About a File: File Metadata

What else might we want to know about a file?
  • When was the file created, last accessed, or last modified?

  • Who is allowed to do what to the file: read, write, rename, change other attributes, etc.?

  • Other file attributes?
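On a UNIX-like system much of this metadata can be inspected with the stat call. A minimal sketch using Python's os.stat on a temporary file, assuming a POSIX system; note that on Linux st_ctime is the time of the last metadata change, not the creation time.

```python
import os
import stat
import tempfile
import time

# Create a small temporary file so there is something to inspect.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)
os.chmod(path, 0o640)  # owner: read/write, group: read, others: nothing

info = os.stat(path)
print("size in bytes:", info.st_size)
print("last modified:", time.ctime(info.st_mtime))
print("last accessed:", time.ctime(info.st_atime))
print("owner user id:", info.st_uid)
print("permissions:  ", stat.filemode(info.st_mode))

os.remove(path)
```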

Where to Store File Metadata?

An MP3 file contains audio data. But it also has attributes such as:

  • title

  • artist

  • date

Where should these attributes be stored?
  • In the file itself.

  • In another file.

  • In attributes associated with the file and maintained by the file system.

Where to Store File Metadata?

In the file:
  • Example: MP3 ID3 tag, a data container stored within an MP3 file in a prescribed format.

  • Pro: travels along with the file from computer to computer.

  • Con: requires all programs that access the file to understand the format of the embedded metadata.

In another file:
  • Example: iTunes database.

  • Pro: can be maintained separately by each application.

  • Con: does not move with the file, and the separate file must be kept in sync whenever the files it describes change.

In attributes:
  • Example: attributes have been supported by a variety of file systems including prominently by BFS, the BeOS file system.

  • Pro: maintained by the file system, so attributes can be queried, and queried quickly.

  • Con: does not move with the file, and creates compatibility problems with other file systems.
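As an illustration of the "in the file" option, here is a sketch that reads an ID3v1 tag, the simplest ID3 variant: a fixed 128-byte record at the very end of the file that begins with the bytes TAG. The audio bytes, song title, and artist below are made-up placeholders, and the helper names are hypothetical. Newer ID3v2 tags use a different, more flexible layout that is not handled here.

```python
import struct

# ID3v1 layout: "TAG", title, artist, album, year, comment, genre = 128 bytes.
ID3V1 = struct.Struct("<3s30s30s30s4s30sB")


def make_id3v1(title: str, artist: str, year: str) -> bytes:
    """Build a tag; struct pads short fields with zero bytes."""
    return ID3V1.pack(b"TAG", title.encode("latin-1"), artist.encode("latin-1"),
                      b"", year.encode("latin-1"), b"", 255)


def read_id3v1(file_bytes: bytes):
    """Return the embedded metadata, or None if the file carries no ID3v1 tag."""
    tail = file_bytes[-ID3V1.size:]
    if len(tail) < ID3V1.size or not tail.startswith(b"TAG"):
        return None
    _, title, artist, _album, year, _comment, _genre = ID3V1.unpack(tail)

    def clean(raw: bytes) -> str:
        return raw.rstrip(b"\x00 ").decode("latin-1")

    return {"title": clean(title), "artist": clean(artist), "date": clean(year)}


fake_audio = b"\xff\xfb" + bytes(1000)  # placeholder bytes, not real MP3 audio
mp3 = fake_audio + make_id3v1("Example Song", "Example Artist", "2012")
print(read_id3v1(mp3))
```

Any program that copies the file copies the tag with it, but every program that wants the title has to know this byte layout.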

Processes and Files: UNIX Semantics

Many file systems provide an interface for establishing a relationship between a process and a file.

  • "I have the file open. I am using this file."

  • "I am finished using the file and will close it now."

Why does the file system want to establish these process-file relationships?
  • If the OS knows which files are actively being used, it can improve performance through caching or read-ahead.

  • The file system may provide guarantees to processes based on this relationship, such as exclusive access.

  • Some file systems, particularly networked file systems, don’t even bother to establish these relationships. (What happens if a networked client opens a file exclusively and then dies?)
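One way to see an exclusive-access guarantee tied to an open file is the flock call, sketched below through Python's fcntl module. This assumes Linux or a similar UNIX with a local file system. flock locks are advisory: they only restrain other users of flock, and they may behave differently on networked file systems.

```python
import fcntl
import os
import tempfile

fd0, path = tempfile.mkstemp()
os.close(fd0)

# Two independent opens of the same file, standing in for two users of it.
first = os.open(path, os.O_RDWR)
second = os.open(path, os.O_RDWR)

fcntl.flock(first, fcntl.LOCK_EX)  # the first open claims exclusive access

try:
    # LOCK_NB makes the call fail immediately instead of waiting for the lock.
    fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
    second_got_lock = True
except OSError:
    second_got_lock = False
print("second open got the lock while first held it:", second_got_lock)

os.close(first)  # closing the first open releases its lock

fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)  # succeeds now
second_got_lock_after_close = True

os.close(second)
os.remove(path)
```

Because the lock is tied to the open file, the system can release it when the holder closes the file or exits. That cleanup is much harder when the holder is a remote client that has silently died.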

File Location: UNIX Semantics

UNIX semantics simplify reads and writes to files by storing the file position for processes.

  • This is a convenience, not a requirement: processes could be required to provide a position with every read and write.
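Both styles exist on POSIX systems, and a short sketch shows the difference. os.read uses and advances the position the kernel saves for the open file, while os.pread takes an explicit position with every call and leaves the saved position alone. The example assumes a UNIX-like system, since os.pread is not available everywhere.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"abcdefghij")
os.lseek(fd, 0, os.SEEK_SET)   # rewind the saved position to the start

first = os.read(fd, 3)         # reads at the saved position, which moves 0 -> 3
second = os.read(fd, 3)        # continues where the last read stopped, 3 -> 6
explicit = os.pread(fd, 3, 0)  # position 0 given explicitly; saved position unchanged
third = os.read(fd, 3)         # still continues from 6

print(first, second, explicit, third)

os.close(fd)
os.remove(path)
```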

UNIX File Interface

Establishing relationships:
  • open("foo"): "I’d like to use the file named foo." Returns a file handle, such as 2.

  • close(2): "I’m finished with file handle 2."

Reading and writing:
  • read(2): "I’d like to perform a read from file handle 2 at the current position."

  • write(2): "I’d like to perform a write to file handle 2 at the current position."

Positioning:
  • lseek(2, 100): "Please move my saved position for file handle 2 to position 100."
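The calls above are simplified. A runnable sketch of the same sequence using Python's os module, which wraps the underlying system calls closely, looks like this. The file name foo follows the slide; the temporary directory, the data written, and the handle number the kernel actually returns are all incidental.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "foo")

# open: establish the relationship and receive a file handle (a small integer).
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

os.write(fd, b"x" * 200)         # write at the current position, which moves 0 -> 200
os.lseek(fd, 100, os.SEEK_SET)   # move the saved position to 100
os.write(fd, b"HELLO")           # overwrites bytes 100..104; position is now 105

os.lseek(fd, 100, os.SEEK_SET)   # go back to position 100
data = os.read(fd, 5)            # read at the current position
position = os.lseek(fd, 0, os.SEEK_CUR)  # a zero-byte relative seek reports the position

# close: end the relationship; the handle must not be used afterwards.
os.close(fd)

print(data, position, os.path.getsize(path))
```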

Midterm Review

Next Time

  • File system data structures and challenges.

  • Example path resolution.

  • How files grow and shrink.