10 - File Management

Summary

| File structure | Directory structure | File information tables | Buffers |
| --- | --- | --- | --- |
| Byte array | Single-level | Per-process file descriptors | Read buffer |
| Fixed length records | Tree-structured | System-wide open-files | Write buffer |
| Variable length records | DAG | System-wide V-nodes | |
| | General graph | | |

File system

Physical memory is volatile, so external/secondary storage is needed to store persistent info.

Key question: how can information on a storage medium still be located after the medium is moved to another device?

Criteria

Desired file structure

  1. Metadata is contained within the file itself
  2. The files in a folder may not be in a contiguous region (similar fragmentation problems)
  3. Data of a file is in contiguous region

Comparison with memory management

| | Memory Management | File System Management |
| --- | --- | --- |
| Storage | RAM | Disk |
| Access speed | Constant | Variable disk I/O time |
| Unit of addressing | Physical memory address | Disk sector |
| Usage | Address space of process, implicit during process execution | Non-volatile data, explicit access |
| Organization | Paging/Segmentation | ext* (Linux), FAT/NTFS (Windows), HFS* (MacOS) |

Each time the OS accesses a different file system, it has to swap in the file management implementation for that file system.

File

Definition: A logical unit of information created by a process

Linux vs Windows FS

| | Linux | Windows |
| --- | --- | --- |
| File names | Case-sensitive | Case-insensitive |
| Special characters | Allowed | Not allowed |
| File type | Magic number | Determined by file extension |
| Directory entry | Variable (depends on file name length) | Fixed (32/64-bit) |

File type: a description of the information contained in the file (based on a certain format etc.)

File extension: part of the file name after the dot, used in Windows etc. to identify file type

Magic number: a k-byte identifier (usually $k=2$) at the start of a file.
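As a minimal sketch of how a magic number identifies a file type, the lookup below compares the first bytes of a file against a few well-known signatures. The table `MAGIC` and the function `guess_type` are hypothetical names for illustration, and the signatures are a small sample (some are longer than 2 bytes).

```python
# Hypothetical lookup table: leading bytes -> description.
MAGIC = {
    b"#!": "script with interpreter line",
    b"\x7fELF": "ELF executable",
    b"%PDF": "PDF document",
    b"\x89PNG": "PNG image",
}


def guess_type(first_bytes):
    """Return a description based only on the leading bytes of a file."""
    for magic, description in MAGIC.items():
        if first_bytes.startswith(magic):
            return description
    return "unknown"


if __name__ == "__main__":
    print(guess_type(b"#!/bin/sh\necho hi\n"))
```

Note that the file name plays no part in the decision, which is the contrast with extension-based typing.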

Access Control List (ACL):

File system architecture

File data structure options:

Access methods:

The open operation plays an important role: it keeps references to all the pointers needed to access the file on the physical disk, so the full file path does not have to be resolved on every access. It also tracks the file offset.

File operations:

| Operation | Description |
| --- | --- |
| Create | New file, no data |
| Open | Prepare necessary information for later operations |
| Read | Read data from current position |
| Write | Write data to current position |
| Seek | Move position to a new location |
| Truncate | Remove data from current position to end of file |
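The operations above map closely onto the POSIX calls exposed by Python's `os` module. The sketch below is one possible walk-through, not a prescribed sequence; `file_ops_demo` is a hypothetical helper. One assumption to note: `os.ftruncate` takes an explicit length, so the current position is passed in to match the "from current position to end of file" description.

```python
import os
import tempfile


def file_ops_demo():
    """Exercise create/open, write, seek, read and truncate on a scratch file."""
    dirname = tempfile.mkdtemp()
    path = os.path.join(dirname, "demo.txt")

    # Create + Open: a new, empty file; the returned fd carries the offset.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    os.write(fd, b"hello world")          # Write at current position (0 -> 11)
    os.lseek(fd, 6, os.SEEK_SET)          # Seek to byte 6
    tail = os.read(fd, 5)                 # Read from current position

    # Truncate from the current position to the end of the file.
    os.lseek(fd, 5, os.SEEK_SET)
    os.ftruncate(fd, os.lseek(fd, 0, os.SEEK_CUR))

    os.lseek(fd, 0, os.SEEK_SET)
    remaining = os.read(fd, 100)          # Whatever is left in the file
    os.close(fd)
    os.remove(path)
    os.rmdir(dirname)
    return tail, remaining


if __name__ == "__main__":
    print(file_ops_demo())
```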

File information (3 tables):

The three file information tables

Op. type: If a file is opened for reading only, any attempt to write to it is rejected.
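The check on operation type can be observed directly. In this small sketch (the helper name `write_to_readonly_fd` is hypothetical), a file is opened with `O_RDONLY` and a write is attempted; on Linux and other POSIX systems the kernel is expected to reject it with `EBADF`.

```python
import errno
import os
import tempfile


def write_to_readonly_fd():
    """Try to write through a read-only descriptor; return the errno raised."""
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDONLY)
        try:
            os.write(fd, b"x")
        except OSError as exc:
            return exc.errno
        finally:
            os.close(fd)
    return None  # the write unexpectedly succeeded


if __name__ == "__main__":
    print(write_to_readonly_fd() == errno.EBADF)
```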

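  1. Case 1: Same file, 2 processes, independent offsets. Each process has its own FD table and its own open-file table entry.
  2. Case 2: Same file, 1 parent and 1 child process (just forked, no modification). The child's FD table is a copy of the parent's, so both point to the same open-file table entry and share one offset.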
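The two cases can be checked with a short experiment, sketched here for a Unix-like system where `os.fork` is available. `offsets_demo` is a hypothetical helper: it opens the same file twice (standing in for two independent opens) and then forks, so the child inherits an already open descriptor.

```python
import os
import tempfile


def offsets_demo():
    """Return (offset seen via an independent open, offset seen after a fork)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"abcdefgh")
        path = tmp.name
    try:
        # Case 1: two separate opens give two open-file entries.
        fd_a = os.open(path, os.O_RDONLY)
        fd_b = os.open(path, os.O_RDONLY)
        os.read(fd_a, 3)                               # moves fd_a only
        independent = os.lseek(fd_b, 0, os.SEEK_CUR)   # fd_b is unaffected
        os.close(fd_b)

        # Case 2: the forked child shares the parent's open-file entry.
        pid = os.fork()
        if pid == 0:
            os.read(fd_a, 2)                           # child advances the shared offset
            os._exit(0)
        os.waitpid(pid, 0)
        shared = os.lseek(fd_a, 0, os.SEEK_CUR)        # parent sees the child's read
        os.close(fd_a)
        return independent, shared
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(offsets_demo())
```

The first value stays at 0 because the second open has its own offset; the second value reflects both the parent's 3-byte read and the child's 2-byte read.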

Directories

Purpose: Provide a logical grouping (user view), keep track of files (actual system usage)

Possible data structures:

Directory read/write access: These permissions apply to the directory itself. They do not extend to the contents of the directory!

Disk scheduling

On an HDD, accessing a particular disk location requires two fundamental movements: rotating the disk and moving the head. There are more sectors than data blocks (clusters), since each block groups one or more sectors.

Traditional disk scheduling algorithms, needed to minimize access latency:

SCAN
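A minimal sketch of the SCAN (elevator) idea: the head keeps moving in one direction and services every pending request on the way, then reverses and services the rest. The function name `scan_order`, its parameters, and the sample request queue are assumptions for illustration. Only the service order is computed; whether the head travels to the physical end of the disk before reversing does not change that order.

```python
def scan_order(requests, head, direction="up"):
    """Return the order in which SCAN services the requested cylinders."""
    # Requests at or beyond the head, in the order an upward sweep meets them.
    upper = sorted(r for r in requests if r >= head)
    # Requests below the head, in the order a downward sweep meets them.
    lower = sorted((r for r in requests if r < head), reverse=True)
    return upper + lower if direction == "up" else lower + upper


if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]
    print(scan_order(queue, head=53, direction="up"))
```

Compared with servicing requests in arrival order, this avoids the head jumping back and forth across the disk.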

Newer Disk scheduling algorithms

Buffering

Motivation: File operations are inherently expensive.

  1. Each file operation requires a syscall, which in turn requires an execution mode change (user -> kernel)
  2. High disk access latency

A buffer maintains intermediate storage for information that is to be read from or written to a file.

Typically implemented with a circular array (see stdin, stdout, stderr).

Other benefits include: error checking, packing/unpacking of datatypes.

Buffered read/write operations read from or write to the buffer's memory region; the underlying system call is issued only when the buffer is full (or empty, for reads) or explicitly flushed.

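// Buffered read example (pseudocode, assumes reqSize <= buf.size):
bufread(file, outputArr, reqSize):
	if buf.availCount < reqSize:
		// Refill: one read syscall tops up the free space of the circular buffer
		n = read(file, buf.freeSpace, buf.size - buf.availCount)
		buf.availCount += n	// read may return fewer bytes than requested
	n = min(reqSize, buf.availCount)
	memcpy(outputArr, buf.readPtr, n)	// memcpy(dest, src, count)
	advance buf.readPtr by n (wrapping around the array)
	buf.availCount -= n
	return n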
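The effect of the pseudocode above can be measured with a runnable sketch. It makes several simplifying assumptions: the class `BufferedReader` and the helper `count_syscalls` are hypothetical names, the buffer is a plain `bytes` object rather than a circular array, and each `os.read` call is counted as one syscall.

```python
import os
import tempfile


class BufferedReader:
    """Hypothetical user-space read buffer over a raw file descriptor."""

    def __init__(self, fd, size=4096):
        self.fd = fd
        self.size = size
        self.buf = b""       # bytes fetched from the kernel but not yet consumed
        self.syscalls = 0    # number of os.read calls issued

    def read(self, req_size):
        # Refill only when the buffer cannot satisfy the request.
        while len(self.buf) < req_size:
            chunk = os.read(self.fd, max(self.size, req_size - len(self.buf)))
            self.syscalls += 1
            if not chunk:    # end of file
                break
            self.buf += chunk
        out, self.buf = self.buf[:req_size], self.buf[req_size:]
        return out


def count_syscalls(data, req_size, buf_size):
    """Read data back in req_size pieces; return (bytes_read, read_syscalls)."""
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        fd = f.fileno()
        os.lseek(fd, 0, os.SEEK_SET)
        reader = BufferedReader(fd, buf_size)
        total = 0
        while True:
            piece = reader.read(req_size)
            if not piece:
                break
            total += len(piece)
        return total, reader.syscalls


if __name__ == "__main__":
    print(count_syscalls(b"x" * 10000, req_size=10, buf_size=4096))
```

Reading 10,000 bytes in 10-byte requests would take on the order of a thousand unbuffered `read` calls, whereas the buffered version should need only a handful.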