Technical Women

Figure 1. Susan Landau

Today

  • How to read a research paper.

  • Redundant Arrays of Inexpensive Disks (RAID).

ASST3.2 Checkpoint

At this point:
  • If you have not started ASST3.2, you’re way behind.

  • If you have most things working and are debugging corner cases, you’re OK.

ASST3.2 is due Friday. Good luck finishing!

How To Read A Research Paper

  1. Don’t be afraid: you’re ready!

  2. Read good papers from the top conferences

  3. Skim to get the big picture, read for the details.

    • Most systems papers have one or two big ideas and a lot of implementation.

  4. Understand the places where papers get published

  5. Understand the kinds of papers

  6. Understand the parts of a paper

Places Where Papers Get Published

  1. Workshop papers: short (5-6 pages), usually containing only a provocative argument, system design, or very preliminary results.

  2. Conference papers: long (12-14 pages), enough space to describe and evaluate a complete novel system.

  3. Journal papers: longer (than conference papers), usually a published conference paper with extra material (frequently all of the unnecessary results that the authors removed to meet the conference page limit).

    • My advice: read the conference paper.

Kinds of Papers

Clearly this is an incomplete list, but identifying the kind of paper can help you read it and understand its contributions.

  1. (Big) idea papers: present a new approach to an existing problem or a new idea about how to build systems. Should convince you that the solution (1) is new, (2) works, and (3) is useful.

  2. Problem papers: present a new problem and, usually, some ideas about how to solve it. Should convince you that the problem (1) is new, (2) matters, and (3) has some possible solutions.

  3. Data papers: present novel analysis or analysis of a novel data set that produces interesting insights. Should convince you that the results are useful to the design of future systems.

  4. New technology papers: describe some new hardware capability or device feature and why it’s interesting. Should convince you that the hardware can be used to build better systems.

  5. Wrong way papers: argue that the community is solving an existing problem incorrectly. Frequently these are workshop-length papers and eventually lead to idea papers. Should be able to convince you that everyone else is confused and misguided. (Good luck!)

We’ll read examples of some of these kinds of papers.

Parts of a Research Paper

Not all of these are included in every paper, and they are not always called the same thing. Many variants exist but these are the common elements.

The "why you should read this paper" parts:
  1. Abstract: an overview of the paper and its contribution. Great place to get the big picture.

  2. Introduction: an extension of the abstract. Usually contains:

    • A problem and solution statement (if appropriate)

    • Persuasive arguments to keep you reading

    • A preview of the interesting results ahead

    • Navigational information to guide you through the rest of the paper

  3. Motivation:

    • More arguments about why this is a problem, why people have been solving this problem the wrong way, or why this data is interesting, depending on the type of paper.

The "what we did" parts:
  1. Design: presents the design of the system, usually at as high a level as possible.

  2. Implementation: presents details of the implementation and any interesting implementation challenges.

  3. Related work: puts the work in context by comparing it to other systems. Important to establish novelty.

  4. Results: for data analysis papers, most of the paper is spent analyzing the data set that was collected. This usually replaces a typical evaluation, since there may not be a new system to evaluate.

The "did it work" part:
  1. Evaluation: measures things about the prototype system intended to demonstrate that it works.

Questions About Approaching A Research Paper?

Redundant Arrays of Inexpensive Disks

What kind of paper is this?
  • Big idea paper!

  • Spawned a commonly-used technology, an entire industry, and lots of similar approaches.

What is the big idea? (Hint: it has nothing to do with disks…)
  • Several cheap things can be better than one expensive thing!

Where else do we see this idea applied today?
  • Multicore processors.

  • Google.

  • Crowdsourcing.

RAID: Problems

What is the problem that the RAID paper identifies?
  • Computer CPUs are getting faster…

  • Computer memory is getting faster…

  • Hard drives are not keeping up!

While we can imagine improvements in software file systems via buffering for near term I/O demands, we need innovation to avoid an I/O crisis.

What is the problem with the RAID solution?
  • Many cheap things fail much more frequently than one expensive thing.

  • So we need a plan to handle failures.
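
The failure problem can be made concrete with a back-of-the-envelope calculation. The sketch below assumes disks fail independently at a constant rate, so an array with no redundancy fails as soon as any one disk does. The function name and the numbers are illustrative, not measurements.

```python
def array_mttf_hours(disk_mttf_hours: float, num_disks: int) -> float:
    """Mean time to failure of an array with no redundancy.

    Assumes independent failures at a constant rate, so the array's
    failure rate is the sum of the per-disk failure rates.
    """
    if num_disks < 1:
        raise ValueError("need at least one disk")
    return disk_mttf_hours / num_disks


# Illustrative numbers only: 100 disks, each with a 30,000-hour MTTF.
print(array_mttf_hours(30_000.0, 100))  # 300.0 hours, under two weeks
```

Under these assumptions, adding disks for performance shrinks the time to the first failure proportionally, which is why the redundancy schemes that follow are needed.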

RAID 1

RAID 1 (Mirroring)
  • Two duplicate disks.

  • Writes must go to both disks, reads can come from either.

  • Performance: better for reads.

  • Capacity: unchanged!
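
A minimal in-memory sketch of mirroring, assuming fixed-size blocks. The class `MirroredPair` and its methods are hypothetical names for illustration and do not model a real driver.

```python
import random
from typing import List, Optional


class MirroredPair:
    """Two in-memory 'disks' that always hold identical blocks."""

    def __init__(self, num_blocks: int, block_size: int = 4) -> None:
        self.block_size = block_size
        self.disks: List[List[bytes]] = [
            [bytes(block_size)] * num_blocks for _ in range(2)
        ]

    def write(self, block_no: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        # Every write goes to both copies.
        for disk in self.disks:
            disk[block_no] = data

    def read(self, block_no: int, disk_index: Optional[int] = None) -> bytes:
        # Either copy is valid, so reads can be spread across the disks.
        if disk_index is None:
            disk_index = random.randrange(2)
        return self.disks[disk_index][block_no]


pair = MirroredPair(num_blocks=8)
pair.write(3, b"abcd")
assert pair.read(3, 0) == pair.read(3, 1) == b"abcd"
```

Reads can be served by whichever disk is idle, which is where the read speedup comes from. Usable capacity stays at one disk because every block is stored twice.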

RAID 2

  • Bit-level striping, multiple error-correction disks.

  • Hamming codes to detect failures and correct errors.

  • Most reads and writes require all disks.

  • Capacity: improved.
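
To see how a Hamming code both locates and repairs an error, here is a sketch using Hamming(7,4): four data bits plus three check bits. The code size is an assumption chosen for brevity; in a RAID 2 array each bit position would live on its own disk.

```python
from functools import reduce
from typing import List

DATA_POSITIONS = (3, 5, 6, 7)  # 1-based; positions 1, 2 and 4 hold check bits


def syndrome(code: List[int]) -> int:
    """XOR of the 1-based positions that hold a 1; zero for a valid codeword."""
    positions = (i + 1 for i, bit in enumerate(code) if bit)
    return reduce(lambda acc, pos: acc ^ pos, positions, 0)


def encode(data: List[int]) -> List[int]:
    """Place 4 data bits, then set check bits so the syndrome becomes zero."""
    code = [0] * 7
    for pos, bit in zip(DATA_POSITIONS, data):
        code[pos - 1] = bit
    s = syndrome(code)
    for k in range(3):
        code[(1 << k) - 1] = (s >> k) & 1
    return code


def correct(code: List[int]) -> List[int]:
    """Repair a single flipped bit: the syndrome is the bad position."""
    fixed = list(code)
    s = syndrome(fixed)
    if s:
        fixed[s - 1] ^= 1
    return fixed


word = encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1  # flip one bit, as if one disk returned bad data
assert correct(damaged) == word
```

A non-zero syndrome both signals an error and names the failed position, so the array does not need the disk to report its own failure.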

RAID 3

  • Only needs to correct errors, since disks can detect when they fail.

  • Byte-level striping, single parity disk.

  • Most reads and writes require all disks.

  • Capacity: improved.
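
Because the failed disk is known, a single XOR parity disk is enough to recover its contents. The sketch below assumes equal-length chunks, one per data disk; the function names are illustrative.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(chunks: List[bytes]) -> bytes:
    """Parity chunk: the XOR of all data chunks."""
    return reduce(xor_bytes, chunks)


def reconstruct(surviving: List[bytes], parity_chunk: bytes) -> bytes:
    """Recover the one missing chunk from the survivors and the parity."""
    return reduce(xor_bytes, surviving, parity_chunk)


data = [b"\x0f\xf0", b"\x33\x33", b"\xaa\x55"]  # one chunk per data disk
p = parity(data)

# Suppose the disk holding data[1] fails.
assert reconstruct([data[0], data[2]], p) == data[1]
```

XOR-ing the parity with every surviving chunk cancels their contributions and leaves only the missing chunk.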

RAID 4

  • Block-level striping, single parity disk.

  • Better distribution of reads between disks due to larger stripe size.

  • But all writes must access the parity disk.

  • Performance: improved for reads.
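
The parity-disk bottleneck follows from how a small write updates parity. A sketch, assuming XOR parity and equal-length blocks (the function names are illustrative): the new parity can be computed from the old parity, the old data, and the new data, without reading the other data disks.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Parity after overwriting one block: remove the old data, add the new."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


d0, d1, d2 = b"\x01\x01", b"\x02\x02", b"\x04\x04"
old_parity = xor_bytes(xor_bytes(d0, d1), d2)

new_d1 = b"\x0a\x0b"
new_parity = updated_parity(old_parity, d1, new_d1)

# Same result as recomputing parity over the whole stripe.
assert new_parity == xor_bytes(xor_bytes(d0, new_d1), d2)
```

Each small write therefore costs two reads and two writes, and one read and one write always land on the parity disk, no matter which data disk is being updated.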

RAID 5 (Full Victory)

  • Block-level striping

  • Parity blocks distributed across all disks (no dedicated parity disk).

  • Better distribution of writes between disks.

  • Performance: improved for writes.
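
One way to spread parity writes is to rotate the parity block to a different disk on each stripe. The rotation rule below is an assumed layout for illustration; real implementations offer several layouts.

```python
from typing import List


def parity_disk(stripe: int, num_disks: int) -> int:
    """Disk holding the parity block for a stripe (assumed rotation rule)."""
    return (num_disks - 1 - stripe) % num_disks


def layout(num_stripes: int, num_disks: int) -> List[List[str]]:
    """Grid of 'D' (data) and 'P' (parity), one row per stripe."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk(stripe, num_disks)
        rows.append(["P" if disk == p else "D" for disk in range(num_disks)])
    return rows


for row in layout(num_stripes=4, num_disks=4):
    print(" ".join(row))
# D D D P
# D D P D
# D P D D
# P D D D
```

Writes to different stripes now update parity on different disks, so they can proceed in parallel instead of queuing on one parity disk.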

RAID 0 (Non-RAID)

  • Data is striped across the disks: with two disks, each stores half of the data.

  • No error correction or redundancy.

  • Performance: fantastic!

  • Capacity: fantastic!

  • Redundancy: ZERO!
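
Striping can be sketched as a simple address mapping, assuming a stripe unit of one block and round-robin placement. The function `locate` is a hypothetical helper.

```python
from typing import Tuple


def locate(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks


# With two disks, consecutive logical blocks alternate between disks.
for block in range(4):
    print(block, locate(block, num_disks=2))
# 0 (0, 0)
# 1 (1, 0)
# 2 (0, 1)
# 3 (1, 1)
```

Sequential transfers keep every disk busy at once, which is the source of the performance gain. There is no second copy and no parity, so losing any disk loses data.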

RAID: Redundancy

  • RAID arrays can tolerate the failure of one (or more) disks.

  • Once a disk (or several) fails, the array is vulnerable to data loss.

  • An administrator must replace the disk(s) and then rebuild the array.

The RAID Aftermath


But perhaps our most enduring contribution is our experience demonstrating how a common intellectual framework and terminology, developed by researchers outside of the pressures and positioning of the marketplace, can allow engineers and technical developers to talk with each other, exchange ideas, and ultimately accelerate the development of what became a multibillion dollar industry sector.

— Randy Katz

RAID: Questions?

Next Week

  • Virtualization.