This project involves creating a Distributed File System (DFS), a server-side application that enables multiple clients to access and manipulate shared files concurrently over a network. The distributed file system provides operations for reading, writing, and deleting files using an HTTP/REST API. It draws inspiration from Amazon’s S3, a popular storage system.
Key learning goals include:
- Understanding on-disk structures for file systems.
- Exploring the internals of file systems.
- Learning about distributed storage systems.
The project is implemented in three parts:
- Reading on-disk storage: Using command-line utilities.
- Creating a local file system: Organizing disk blocks into a hierarchical structure.
- Building a distributed file system: Extending the local file system to provide network-accessible APIs.
- Persistent storage with crash consistency.
- Directory and file management.
- HTTP/REST API for client interactions.
- Error handling to ensure filesystem integrity.
The DFS uses HTTP methods to interact with files and directories:
- Create/Update: Use
PUTwith the file path and contents in the request body. - Read: Use
GETwith the file path to retrieve its contents. - Delete: Use
DELETEwith the file path.
- Create: Created implicitly during file creation.
- List Contents: Use
GETwith the directory path to list entries, sorted alphabetically. Entries for subdirectories include a trailing/. - Delete: Use
DELETEwith the directory path. Deletion of non-empty directories is an error.
# Create or update a file
curl -X PUT -d "file contents" http://localhost:8080/ds3/a/b/c.txt
# Read a file
curl http://localhost:8080/ds3/a/b/c.txt
# List directory contents
curl http://localhost:8080/ds3/a/b/
# Delete a file
curl -X DELETE http://localhost:8080/ds3/a/b/c.txtThe local file system uses the following on-disk structures:
- Super Block: Metadata about the file system.
- Bitmaps: Tracks allocated inodes and data blocks.
- Inode Table: Metadata for files and directories.
- Data Blocks: Stores file and directory contents.
- Each directory has an inode and one or more data blocks.
- Directories include two special entries:
.(self)..(parent)
- Unused directory entries have an inode number of
-1.
To maintain consistency, all disk writes are carefully ordered. Transactions ensure that:
- Errors do not modify the file system.
- Changes are committed or rolled back atomically.
Errors return appropriate HTTP responses:
- Resource not found:
ClientError::notFound() - Insufficient storage:
ClientError::insufficientStorage() - Conflict:
ClientError::conflict()(e.g., creating a directory where a file exists). - Bad request:
ClientError::badRequest()
Three command-line utilities are provided to debug and inspect the file system:
Lists all files and directories in a disk image recursively, starting from the root. Entries are printed in a depth-first order.
Prints the contents of a file to standard output. It includes both the file’s data blocks and actual data.
Displays metadata about the file system, including the super block and inode/data bitmaps.
- API handlers are implemented in
DistributedFileSystemService.cpp. - Use provided tools like
mkfsto create file system images. - Test APIs with
cURLor similar HTTP clients.
- Use
LocalFileSystemfor read/write operations. - Implement transaction handling using
beginTransaction,commit, androllback.
This project demonstrates the fundamental principles of distributed storage systems, enabling scalable and efficient file management across a network.