Principles Of Distributed Database Systems Exercise Solutions

This essay explores the core principles of distributed database systems (DDBS) by analyzing common architectural challenges and their standard exercise solutions. Distributed databases manage data across multiple physical locations while appearing as a single logical unit to the user, necessitating complex solutions for transparency, consistency, and reliability.

The Principle of Distribution Transparency

A primary goal of a DDBS is to hide the complexities of data distribution from the user. Exercise solutions in this area typically focus on Location Transparency and Fragmentation Transparency.

Problem: How can a user query a table without knowing it is split across servers in New York and London?

Solution: Systems use a Global Conceptual Schema (GCS) that maps logical tables to physical fragments. Solutions often involve "Transparent Mapping," where the query optimizer automatically decomposes a global query into sub-queries targeted at specific nodes. This ensures that the user's SQL remains identical regardless of where the data resides.

Data Fragmentation and Allocation

Efficiency in a distributed system depends on how data is divided. Exercises often ask for the best way to fragment a database based on access patterns.

Horizontal Fragmentation: Dividing a relation into subsets of tuples (rows). Solutions usually involve using selection predicates (e.g., WHERE City = 'Chicago') to keep data close to its most frequent users.

Vertical Fragmentation: Dividing a relation into subsets of attributes (columns). Solutions focus on grouping attributes that are frequently accessed together to reduce unnecessary I/O across the network.

Allocation: The "Materialization" of these fragments. Exercise solutions typically apply the "Locality of Reference" principle: placing data where it is most frequently accessed to minimize communication costs.

Distributed Query Processing

Querying across multiple nodes introduces the "Join" problem. Since moving large tables across a network is expensive, solutions prioritize minimizing data transfer.

Semijoin Optimization: A classic exercise solution to reduce communication cost. Instead of sending an entire Table A to Table B's site for a join, the system sends only the distinct values of A's join column. Table B's site filters its rows against these values and sends back only the matching records, and the final join is completed at A's site. When few rows of B match, this drastically reduces the volume of data crossing the network.

Concurrency Control and Consistency

Maintaining data integrity across sites is perhaps the most difficult aspect of DDBS. Exercises often center on the CAP Theorem (Consistency, Availability, Partition Tolerance) and the Two-Phase Commit (2PC) protocol.

Two-Phase Commit (2PC): To ensure atomicity (all or nothing), solutions follow a "Prepare" phase and a "Commit" phase. A coordinator asks all participants if they are ready; if even one node fails or votes "No," the entire transaction is rolled back.
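The two phases can be sketched in a few lines of Python. This is a minimal illustration, not a production protocol: the `Participant` class, its `can_commit` and `reachable` flags, and the state names are all hypothetical, and the sketch omits the logging and timeouts that a real 2PC implementation relies on for recovery.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical participant that votes and then applies the global decision."""

    def __init__(self, name, can_commit=True, reachable=True):
        self.name = name
        self.can_commit = can_commit
        self.reachable = reachable
        self.state = "INITIAL"

    def prepare(self):
        # Phase 1: an unreachable site never answers; the coordinator
        # treats the resulting timeout as a "No" vote.
        if not self.reachable:
            raise TimeoutError(f"{self.name} did not answer")
        self.state = "READY" if self.can_commit else "ABORTED"
        return Vote.YES if self.can_commit else Vote.NO

    def finish(self, decision):
        # Phase 2: apply the coordinator's decision.
        if self.reachable:
            self.state = decision


def two_phase_commit(participants):
    """Return "COMMITTED" only if every participant votes YES."""
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except TimeoutError:
            votes.append(Vote.NO)
    decision = "COMMITTED" if all(v is Vote.YES for v in votes) else "ABORTED"
    for p in participants:
        p.finish(decision)
    return decision
```

Calling `two_phase_commit([Participant("A"), Participant("B", can_commit=False)])` returns `"ABORTED"`, which matches the all-or-nothing rule: a single "No" vote or unreachable site rolls back the whole transaction.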

Deadlock Detection and Prevention: In distributed systems, deadlocks can occur across sites. Detection-based solutions build a "Global Wait-For Graph" (GWFG) and look for cycles in it, while timestamp-based prevention schemes such as "Wait-Die" and "Wound-Wait" rule out circular waits between remote transactions before they can form.

Reliability and Replication

Replication ensures that if one node fails, the system remains operational. However, keeping replicas synchronized is a major hurdle.

Exercise Solution: Solutions often utilize a Primary Copy or Voting algorithm. In a Primary Copy setup, all updates go to one master node first. In Voting, a transaction must write to a "quorum" (majority) of replicas to be considered successful, balancing the trade-off between high availability and strict consistency.

Conclusion

The study of distributed database system exercises reveals a consistent theme: the trade-off between performance and transparency. Solutions to these problems—ranging from semijoins for query optimization to two-phase commits for integrity—demonstrate the necessity of rigorous protocols to manage the inherent "noise" and latency of networked environments. Understanding these principles is essential for building scalable, resilient modern applications.

Finding reliable exercise solutions for Principles of Distributed Database Systems (by M. Tamer Özsu and Patrick Valduriez) usually involves looking for the official instructor manual or community-verified repositories.

Here is a breakdown of the core principles typically covered in those exercises, along with how to find specific solutions:

Key Principles Covered in Exercises

If you are working through the textbook, most problems focus on these four pillars:

Fragmentation & Distribution Design: Exercises often ask you to perform Horizontal Fragmentation (using predicates) or Vertical Fragmentation (using affinity matrices). The goal is to minimize irrelevant data access.

Query Decomposition & Optimization: You'll likely encounter problems on converting relational calculus queries to relational algebra and using the Iterative Dynamic Programming algorithms to find the lowest-cost join order across sites.

Distributed Concurrency Control: Solutions here focus on 2-Phase Locking (2PL) and Timestamp Ordering. A common exercise involves detecting "Global Deadlock" using a Distributed Wait-For Graph.

Reliability & 2-Phase Commit (2PC): Problems usually simulate a site or link failure and ask you to determine if the coordinator or participants can reach a decision (Commit/Abort) based on their logs.

Where to Find Solutions

University Course Pages: Many professors (e.g., from Waterloo, ETH Zurich, or Georgia Tech) post "Assignment Solutions" for this specific curriculum. Searching for "CS 448 solutions Distributed Databases syllabus PDF" often yields direct answer keys.

GitHub Repositories: Search GitHub for "Özsu Valduriez solutions." Several graduate students have uploaded their worked-out LaTeX solutions for the 3rd and 4th editions.

Publisher Resources: If you have an instructor's login, the official Pearson or Springer site provides the "Instructor's Guide," which contains the definitive answers to every end-of-chapter problem.

Quick Tip for Solving "Joins"

When solving distributed join exercises, always check if a semijoin is more efficient than a standard join. Reducing the size of the relation before shipping it across the network is almost always the "correct" answer in this textbook's context.
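To make the comparison concrete, here is a small Python sketch that counts how many attribute values cross the network under a semijoin program versus shipping a whole relation. The relations `emp` and `asg`, the function names, and the "values shipped" cost measure are assumptions made for illustration; textbook exercises usually measure cost in bytes or tuples.

```python
def semijoin_plan(r, s, key):
    """Join r (at site 1) with s (at site 2) using a semijoin program.

    Returns the join result and the number of attribute values shipped.
    """
    # Step 1: site 1 projects r on the join attribute and ships it to site 2.
    join_values = {row[key] for row in r}
    shipped = len(join_values)

    # Step 2: site 2 keeps only the tuples of s that match (s semijoin r)
    # and ships them back to site 1.
    s_reduced = [row for row in s if row[key] in join_values]
    shipped += sum(len(row) for row in s_reduced)

    # Step 3: site 1 performs the final join locally.
    result = [{**a, **b} for a in r for b in s_reduced if a[key] == b[key]]
    return result, shipped


def ship_whole_plan(r, s, key):
    """Baseline: ship all of s to site 1 and join there."""
    shipped = sum(len(row) for row in s)
    result = [{**a, **b} for a in r for b in s if a[key] == b[key]]
    return result, shipped


# Hypothetical data: few employees, many assignments.
emp = [{"eno": 1, "name": "Ann"}, {"eno": 2, "name": "Bob"}]
asg = [{"eno": n, "pno": 100 + n} for n in range(1, 51)]

semi_result, semi_cost = semijoin_plan(emp, asg, "eno")
full_result, full_cost = ship_whole_plan(emp, asg, "eno")
```

Both plans produce the same join result, but here the semijoin program ships 6 values instead of 100. If almost every tuple of `asg` matched, the extra round trip for the join column would make the semijoin the more expensive plan, which is why the check is worth doing explicitly.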

Official exercise solutions for the textbook "Principles of Distributed Database Systems" by M. Tamer Özsu and Patrick Valduriez are primarily reserved for instructors who teach courses using the book. However, select resources and examples of specific solutions are available through academic platforms and institutional sites.

Official Instructor Resources

Access to the full, authorized solution manual is typically restricted to educators to maintain the integrity of student assessments:

Official Book Site: The Principles of Distributed Database Systems site notes that solutions are only available to verified instructors.

Requesting Access: If you are an instructor, you can often request these materials directly from the publisher or through the University of Waterloo CS faculty portal.

Publicly Accessible Solution Samples

For students looking for practice or specific problem breakdowns, some chapters and problems have been shared online:

Fragmentation Exercise (Ch 3): A detailed solution for Primary Horizontal Fragmentation (Exercise 3.2) is available, illustrating how to derive minterm predicates for distributed design.

Technical Summaries: Platforms like GitHub host community-generated study notes that summarize key principles like CSMA/CD, network topologies (Bus, Star, Ring, Mesh), and data distribution strategies.

Assignment Banks: Academic sites like Scribd and Course Hero often host student-uploaded assignments and partial solution sets covering query processing and concurrency control.

Key Concepts Covered in Exercises

Most solutions focus on the following foundational distributed principles:

Fragmentation & Allocation: Dividing relations into horizontal or vertical fragments and placing them across nodes.

Transparency: Exercises often ask to define or apply levels of transparency (location, fragmentation, replication).

Distributed Transactions: Implementation of ACID properties (Atomicity, Consistency, Isolation, Durability) across multiple sites.

Concurrency Control: Managing simultaneous data access using distributed locking or timestamp ordering.

Query Optimization: Calculating the cost of moving data versus local processing for global queries.


Mastering the Core: Principles of Distributed Database Systems Exercise Solutions

Distributed database systems (DDBS) are the backbone of modern, globalized computing. From social media feeds to international banking, the ability to manage data across multiple physical locations is essential. However, the complexity of these systems—covering fragmentation, replication, query optimization, and transaction management—can be daunting.

Working through exercise solutions is often the only way to bridge the gap between abstract theory and technical implementation. This article explores the fundamental principles of DDBS through the lens of common problem sets and their solutions.

1. Data Fragmentation and Allocation

One of the first challenges in a distributed environment is deciding how to split data (fragmentation) and where to put it (allocation).

Horizontal vs. Vertical Fragmentation

Horizontal Fragmentation: Dividing a relation into subsets of tuples (rows). Solutions usually involve defining selection predicates (e.g., WHERE City = 'New York').

Vertical Fragmentation: Dividing a relation into subsets of attributes (columns). Solutions focus on grouping attributes frequently accessed together, often using an Attribute Affinity Matrix.

Common Exercise Scenario:

Problem: Given a global schema and specific site queries, determine the optimal fragments.

Solution Tip: Use Minterm Predicates. By combining all simple predicates from applications, you create non-overlapping fragments that satisfy the "completeness" and "disjointness" rules.

2. Distributed Query Processing

In a distributed system, the cost of moving data over a network often outweighs the cost of local disk I/O.

Localization and Optimization

Query processing solutions typically follow a four-step process:

Query Decomposition: Rewriting the calculus query into an algebraic one.

Data Localization: Replacing global relations with their fragments.

Global Optimization: Finding the best join order and communication strategy.

Local Optimization: Selecting the best local access paths.

Common Exercise Scenario:

Problem: Calculate the cost of a join between two tables located at different sites using a Semi-join.

Solution Tip: Remember that a semi-join reduces the size of the operand before it is sent across the network. For a join of R and S stored at different sites, the semi-join pays off when Size(join column of R) + Size(S semi-joined with R) < Size(S), that is, when shipping the projected join column plus the reduced S costs less than shipping all of S.

3. Distributed Concurrency Control

Ensuring consistency when multiple users access data across sites requires sophisticated locking and ordering mechanisms.

Locking and Timestamping

Distributed 2-Phase Locking (2PL): Managing "lock" and "unlock" phases across multiple nodes. Solutions often deal with Global Deadlock Detection, where a cycle exists in the Wait-For-Graph across different sites.
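A global deadlock can be invisible to every individual site, which is why the local wait-for graphs have to be combined. The following Python sketch merges local graphs and searches the result for a cycle with a depth-first traversal. The graph representation (a dict mapping each transaction to the set of transactions it waits for) and the sample sites are assumptions made for this illustration.

```python
def merge_wait_for_graphs(local_graphs):
    """Union local wait-for graphs (dict: transaction -> set it waits for)."""
    global_graph = {}
    for graph in local_graphs:
        for txn, waits_for in graph.items():
            global_graph.setdefault(txn, set()).update(waits_for)
    return global_graph


def find_deadlock(graph):
    """Return one cycle as a list of transactions, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    path = []

    def visit(txn):
        color[txn] = GREY
        path.append(txn)
        for nxt in graph.get(txn, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is on the current path, so the path closes a cycle.
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(graph):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


# Hypothetical local graphs: neither site sees a cycle on its own.
site_1 = {"T1": {"T2"}}  # at site 1, T1 waits for T2
site_2 = {"T2": {"T1"}}  # at site 2, T2 waits for T1

global_graph = merge_wait_for_graphs([site_1, site_2])
cycle = find_deadlock(global_graph)
```

Each local graph is acyclic, but the merged graph contains the cycle between `T1` and `T2`. A centralized detector like this is the simplest option; hierarchical and fully distributed detection schemes trade that simplicity for less traffic to a single site.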

Timestamp Ordering: Assigning unique timestamps to transactions to ensure serializability without explicit locking.

4. Reliability and the Two-Phase Commit (2PC)

How do we ensure that a transaction either commits at every site or aborts at every site?

The 2PC Protocol

Voting Phase: The coordinator asks participants if they are ready to commit.

Decision Phase: Based on the votes, the coordinator sends a "Global Commit" or "Global Abort" message.

Common Exercise Scenario:

Problem: What happens if the coordinator fails after sending a "Prepare" message but before receiving all votes?

Solution Tip: Participants that have already voted "Yes" enter a "blocked" state. They cannot decide on their own because they don't know the global outcome, and they must hold their locks until the coordinator recovers or a termination protocol resolves the transaction. A participant that has not yet voted can still abort unilaterally. This blocking is a major weakness of basic 2PC and the motivation for 3PC and recovery protocols.

5. Parallel Database Systems

While distributed systems focus on geographic separation, parallel systems focus on performance via multiple processors and disks.

Architectures

Shared Memory: Fast but limited scalability.

Shared Disk: Good for clusters but suffers from communication overhead.

Shared Nothing: The gold standard for massive scalability, and the architecture behind frameworks such as MapReduce and Hadoop.

Conclusion: How to Approach Exercise Solutions

When studying "Principles of Distributed Database Systems," don't just look for the answer. Focus on the correctness rules:

Completeness: No data is lost during fragmentation.

Reconstruction: You can rebuild the original relation from fragments.

Disjointness: Data isn't unnecessarily duplicated (unless specifically replicated for availability).
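For horizontal fragments, the three rules above can be checked mechanically. The sketch below is one way to do so in Python, assuming the relation and its fragments are given as collections of hashable tuples; the function name and the sample `emp` relation are hypothetical.

```python
def check_horizontal_fragmentation(relation, fragments):
    """Check completeness, reconstruction, and disjointness.

    relation and fragments are collections of hashable tuples.
    """
    relation = set(relation)
    fragment_sets = [set(f) for f in fragments]
    union = set().union(*fragment_sets)

    # Completeness: every tuple of the relation appears in some fragment.
    complete = relation <= union
    # Reconstruction: the union of the fragments is exactly the relation.
    reconstructible = union == relation
    # Disjointness: no tuple appears in more than one fragment.
    disjoint = sum(len(f) for f in fragment_sets) == len(union)
    return {
        "complete": complete,
        "reconstructible": reconstructible,
        "disjoint": disjoint,
    }


# Hypothetical relation: (id, name, city).
emp = [(1, "Ann", "Chicago"), (2, "Bob", "New York"), (3, "Cy", "Chicago")]
chicago = [t for t in emp if t[2] == "Chicago"]
elsewhere = [t for t in emp if t[2] != "Chicago"]

good = check_horizontal_fragmentation(emp, [chicago, elsewhere])
overlapping = check_horizontal_fragmentation(emp, [chicago, emp])
```

Splitting on a predicate and its negation passes all three checks, while the second call reports `disjoint` as false because the Chicago tuples appear in both fragments. Vertical fragments need a different check, since they are reconstructed with a join on the key rather than a union.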

By mastering these mathematical and logical foundations, you move beyond rote memorization and toward designing resilient, high-performance distributed architectures.

The flickering neon sign of "The Partitioned Plate," a diner known for its chaotic yet surprisingly efficient service, hummed with a low-frequency buzz. Inside, Elara, a database architect with a penchant for solving unsolvable puzzles, sat hunched over a worn copy of "Principles of Distributed Database Systems."

She wasn't just reading; she was wrestling with a phantom. A phantom named "The Inconsistent State."

For weeks, her team's distributed transaction system had been plagued by phantom reads and lost updates. Every time they thought they had the concurrency control figured out, a new anomaly would ripple through the nodes like a digital seismic wave.

"Trouble with the exercise sets again, Elara?" a voice rasped from across the counter. It was Silas, the diner's owner, a man whose wisdom was as deep as his coffee was black.

Elara sighed, pushing the book toward him. "Exercise 12.4. Reliability and Fault Tolerance. I can't seem to find the right balance between replication and performance. Every time I increase the replication factor to handle node failures, the write latency skyrockets."

Silas leaned in, his eyes twinkling. "Think of this diner, Elara. We've got three kitchens, right? All serving the same menu. If one kitchen goes down, the others pick up the slack. But if we try to make sure every single chef in every kitchen knows exactly what every customer ordered the second they order it, nothing would ever get cooked."

Elara frowned. "But we need consistency, Silas. We can't have one customer getting their pancakes while another is told they're out of stock when they're not."

"Exactly," Silas said, tapping the book. "The key isn't perfect synchronization. It's about eventual consistency. You don't need every node to be identical every millisecond. You just need them to agree on the final state before the bill is paid."

He pointed to a specific diagram in the exercise set—a complex web of message exchanges and heartbeat protocols. "Look at the quorum-based protocols. They don't require everyone to agree, just a majority. It's like my staff. If three out of five servers say we're out of blueberry muffins, we're out of blueberry muffins. We don't need to wait for the other two to check the pantry."

Elara's eyes widened. She began to see the logic. The exercise wasn't about finding a single, perfect solution; it was about understanding the trade-offs. The "answer" wasn't a formula, but a strategy.

She spent the rest of the night scribbling notes, mapping out quorum systems and failure-aware commit protocols. The solutions weren't just lines of code; they were a blueprint for a resilient, distributed world.

As the sun began to peek over the horizon, Elara finally closed the book. The phantom of inconsistency hadn't vanished, but it was no longer a threat. She had the principles. She had the solutions. And most importantly, she had a fresh perspective, courtesy of a diner owner and a very challenging exercise set.

She left a generous tip, not just for the coffee, but for the clarity. The "Principles of Distributed Database Systems" were no longer just abstract concepts; they were the tools she would use to build something truly robust. And as she stepped out into the crisp morning air, she knew that even in a world of distributed systems and inevitable failures, consistency, eventually, would always prevail.

Introduction

Distributed database systems are designed to store and manage data across multiple sites or nodes, which can be geographically dispersed. The primary goal of a distributed database system is to provide a unified view of the data, while ensuring that the data is consistent, reliable, and easily accessible. In this write-up, we will discuss the principles of distributed database systems and provide solutions to exercises that illustrate these principles.

Principles of Distributed Database Systems

Fragmentation: Fragmentation involves dividing a large database into smaller, more manageable pieces called fragments. Each fragment is stored at a different site, and the fragments are combined to provide a unified view of the data.
Replication: Replication involves maintaining multiple copies of data at different sites to improve availability and reliability. Each copy of the data is called a replica.
Distribution: Distribution involves storing data across multiple sites, which can be geographically dispersed.
Autonomy: Autonomy refers to the ability of each site to operate independently, making decisions about data management and consistency.
Transparency: Transparency refers to the ability of the system to hide the distribution of data from the users, providing a unified view of the data.

Exercise Solutions

Exercise 1: Fragmentation and Replication

Consider a distributed database system that stores information about customers, orders, and products. The database is fragmented into three fragments:

Fragment 1: Customers (Customer_ID, Name, Address)
Fragment 2: Orders (Order_ID, Customer_ID, Order_Date)
Fragment 3: Products (Product_ID, Product_Name, Price)

Each fragment is replicated at two sites:

Fragment 1: Site A and Site C
Fragment 2: Site B and Site D
Fragment 3: Site A and Site B

Draw a diagram showing the fragmentation and replication of the database.

Solution

The diagram below shows the fragmentation and replication of the database:

            +---------------+
            |  Fragment 1   |
            |  (Customers)  |
            +---------------+
                    |
        +-----------+-----------+
        |                       |
        v                       v
+---------------+       +---------------+
|    Site A     |       |    Site C     |
|  (Replica 1)  |       |  (Replica 2)  |
+---------------+       +---------------+

            +---------------+
            |  Fragment 2   |
            |   (Orders)    |
            +---------------+
                    |
        +-----------+-----------+
        |                       |
        v                       v
+---------------+       +---------------+
|    Site B     |       |    Site D     |
|  (Replica 1)  |       |  (Replica 2)  |
+---------------+       +---------------+

            +---------------+
            |  Fragment 3   |
            |  (Products)   |
            +---------------+
                    |
        +-----------+-----------+
        |                       |
        v                       v
+---------------+       +---------------+
|    Site A     |       |    Site B     |
|  (Replica 1)  |       |  (Replica 2)  |
+---------------+       +---------------+

Exercise 2: Distribution and Autonomy

Consider a distributed database system that stores information about employees and departments. The database is distributed across three sites: Site A, Site B, and Site C. Each site has its own local database and is autonomous.

Site A: Employees (Employee_ID, Name, Department_ID)
Site B: Departments (Department_ID, Department_Name)
Site C: Employee_Department (Employee_ID, Department_ID)

Describe how the system ensures autonomy and distribution.

Solution

The system ensures autonomy by allowing each site to operate independently, making decisions about data management and consistency. Each site has its own local database, which can be updated independently.

The system ensures distribution by storing data across multiple sites. Each of the three relations is stored at a different site, and the global schema combines them into a unified view of the data.

For example, if a new employee is added at Site A, the employee's information is stored in the local database at Site A. If the employee's department is updated at Site B, the updated information is stored in the local database at Site B. The system ensures that the data is consistent across all sites by using distributed transactions and concurrency control.

Exercise 3: Transparency

Consider a distributed database system that stores information about customers and orders. The database is fragmented and replicated across multiple sites. Describe how the system provides transparency.

Solution

The system provides transparency by hiding the distribution of data from the users, providing a unified view of the data. The users interact with the system through a global schema, which provides a single, unified view of the data.

For example, a user can submit a query to retrieve all customers who have placed an order. The system will automatically determine which sites have the relevant data, retrieve the data, and provide the result to the user. The user is not aware of the fragmentation and replication of the data, and the system provides a unified view of the data.

Conclusion

In conclusion, distributed database systems are designed to store and manage data across multiple sites or nodes. The principles of distributed database systems include fragmentation, replication, distribution, autonomy, and transparency. By understanding these principles and how they are applied, we can design and implement effective distributed database systems that provide a unified view of the data, while ensuring that the data is consistent, reliable, and easily accessible.

Introduction

Distributed database systems are designed to store and manage large amounts of data across multiple sites or nodes. The data is typically replicated or partitioned across multiple nodes to improve performance, reliability, and scalability. In this write-up, we will discuss the principles of distributed database systems and provide solutions to common exercises.

Principles of Distributed Database Systems

Distribution: The data is divided into smaller fragments and stored across multiple nodes.
Autonomy: Each node operates independently and makes its own decisions about data management.
Heterogeneity: Nodes may have different hardware, software, and data models.
Transparency: The distribution of data is transparent to users, who can access data without knowing its location.

Types of Distributed Database Systems

Client-Server Systems: A centralized server manages data and clients access data through queries.
Peer-to-Peer Systems: All nodes are equal and can act as both clients and servers.
Federated Systems: Multiple autonomous databases are integrated to provide a unified view.

Exercise Solutions

Exercise 1: Design a Distributed Database Schema

Suppose we have a distributed database system for a university with three nodes: Node A (New York), Node B (Chicago), and Node C (Los Angeles). The database has two relations: Students and Courses.

Solution

We can design a distributed database schema as follows, adding an Enrollments relation to link students to courses:

Node A (New York): Students relation with attributes Student_ID, Name, Age
Node B (Chicago): Courses relation with attributes Course_ID, Course_Name, Credits
Node C (Los Angeles): Enrollments relation with attributes Student_ID, Course_ID, Grade

Exercise 2: Fragmentation and Allocation

Suppose we have a relation Orders with attributes Order_ID, Customer_ID, Order_Date, and Total. We want to fragment this relation into two fragments: Orders_1 and Orders_2. We also want to allocate these fragments to two nodes: Node A and Node B.

Solution

We can fragment the Orders relation based on the Order_Date attribute:

Orders_1: Orders with Order_Date between 2020 and 2022
Orders_2: Orders with Order_Date between 2023 and 2025

We can allocate these fragments to nodes as follows:

Node A: Orders_1
Node B: Orders_2
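The fragmentation and allocation above can be written down as selection predicates plus a placement map. This Python sketch uses a handful of made-up `Orders` tuples and interprets "between 2020 and 2022" as the full calendar years; both the sample data and that interpretation are assumptions.

```python
from datetime import date

# Hypothetical Orders tuples: (Order_ID, Customer_ID, Order_Date, Total).
orders = [
    (1, 10, date(2020, 3, 14), 120.0),
    (2, 11, date(2022, 12, 31), 75.5),
    (3, 10, date(2023, 1, 1), 40.0),
    (4, 12, date(2025, 6, 30), 310.0),
]

# Each fragment is defined by a selection predicate on Order_Date.
predicates = {
    "Orders_1": lambda o: date(2020, 1, 1) <= o[2] <= date(2022, 12, 31),
    "Orders_2": lambda o: date(2023, 1, 1) <= o[2] <= date(2025, 12, 31),
}

# Allocation from the solution: one fragment per node.
allocation = {"Orders_1": "Node A", "Orders_2": "Node B"}

fragments = {
    name: [o for o in orders if pred(o)] for name, pred in predicates.items()
}
placement = {allocation[name]: rows for name, rows in fragments.items()}

# The union of the fragments should give back the original relation.
reconstructed = sorted(fragments["Orders_1"] + fragments["Orders_2"])
```

With this data, orders 1 and 2 land on Node A and orders 3 and 4 on Node B. Note that an order dated outside 2020 to 2025 would satisfy neither predicate, so the fragmentation is only complete if the date ranges cover every value that can occur.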

Exercise 3: Distributed Query Processing

Suppose we have a query to retrieve the names of students who are enrolled in a course with a specific course ID.

Solution

We can process this query in a distributed manner as follows:

Node A (New York) receives the query and sends a subquery to Node C (Los Angeles) to retrieve the Student_IDs of students enrolled in the course.
Node C (Los Angeles) executes the subquery and sends the Student_IDs back to Node A.
Node A (New York) receives the Student_IDs and runs a local subquery on its Students relation to retrieve the names of students with those Student_IDs.
Node A (New York) returns the names of the students to the user.
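The steps above can be sketched as three small Python functions, one per piece of work. The sample rows, the course IDs, and the function names are hypothetical; in a real system the call to Node C would be a network request rather than a local function call.

```python
# Hypothetical site contents, following the schema in Exercise 1.
node_a_students = [
    {"Student_ID": 1, "Name": "Ann", "Age": 20},
    {"Student_ID": 2, "Name": "Bob", "Age": 22},
    {"Student_ID": 3, "Name": "Cy", "Age": 21},
]
node_c_enrollments = [
    {"Student_ID": 1, "Course_ID": "CS101", "Grade": "A"},
    {"Student_ID": 3, "Course_ID": "CS101", "Grade": "B"},
    {"Student_ID": 2, "Course_ID": "MA201", "Grade": "A"},
]


def enrolled_ids_at_node_c(course_id):
    """Subquery run at Node C: only Student_IDs travel back over the network."""
    return {
        e["Student_ID"] for e in node_c_enrollments if e["Course_ID"] == course_id
    }


def names_at_node_a(student_ids):
    """Local lookup at Node A against its Students relation."""
    return sorted(
        s["Name"] for s in node_a_students if s["Student_ID"] in student_ids
    )


def students_in_course(course_id):
    """Coordinator logic at Node A: remote subquery, then local lookup."""
    ids = enrolled_ids_at_node_c(course_id)
    return names_at_node_a(ids)
```

The design choice worth noticing is that Node C returns only the Student_ID column rather than whole Enrollments tuples, so the amount of data shipped depends on the number of matching students and not on the width of the relation.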

Conclusion

Distributed database systems are complex systems that require careful design, implementation, and management. Understanding the principles of distributed database systems, including distribution, autonomy, heterogeneity, and transparency, is crucial for designing and implementing efficient and scalable systems. The exercise solutions provided in this write-up demonstrate how to apply these principles to real-world problems.


Official exercise solutions for Principles of Distributed Database Systems (by M. Tamer Özsu and Patrick Valduriez) are generally restricted to instructors. However, specific chapter solutions and study guides are available through academic platforms.

📖 Accessing Solutions

Official Instructor Site: The authors provide a dedicated portal for the 4th Edition and 3rd Edition. Access typically requires a verified teaching account.

Chapter-Specific Previews: Detailed solutions for Chapter 3 (Distributed Database Design), including fragmentation and join graph exercises, can be found on Studocu.

Academic Repositories: Full solution manuals are sometimes uploaded to student resource sites like Course Hero.

💡 Sample Exercise Solution: Horizontal Fragmentation

Below is a summary of a common exercise from the text regarding Primary Horizontal Fragmentation:

Problem: Derive fragments for an employee relation ASG based on two applications:

Accesses employees by their role (RESP).
Accesses employees by their assignment duration (DUR).

Solution Steps:

Define Simple Predicates: Write one simple predicate for each RESP or DUR condition the applications use.
Form Minterm Predicates: Combine the role and duration predicates into conjunctions.
Create Fragments: Each non-empty minterm defines a fragment of the database.

Finding formal exercise solutions for the authoritative textbook Principles of Distributed Database Systems (4th Edition, 2020) by M. Tamer Özsu and Patrick Valduriez can be challenging because the authors primarily restrict full solution manuals to instructors.

However, you can access specific helpful resources and sample solutions through the following official and verified academic channels:

1. Official Textbook Resources

The authors maintain a dedicated site at the University of Waterloo for the 4th edition. While the full manual is restricted, this site is the most reliable source for:

Solutions to Selected Exercises: Links to specific PDFs containing verified answers for core chapters.

Presentation Slides: These often contain "in-class" examples and solved problems that mirror the exercises in the book.

Errata: Crucial for ensuring you aren't trying to solve an exercise with a typo.

2. Verified Solutions for Key Concepts

Common exercises in this field often focus on specific algorithmic problems. You can find high-quality, solved examples for these topics on academic platforms:

Data Fragmentation & Allocation: Step-by-step solutions for vertical and horizontal fragmentation.

Distributed Query Optimization: Look for solutions regarding join ordering and semijoin programs, which are frequently used in distributed systems homework.

Concurrency Control: Solutions involving Two-Phase Commit (2PC) and Paxos consensus algorithms are often provided in university course repositories.

3. Alternative Peer-to-Peer Learning

If official solutions are unavailable for a specific problem, these platforms host student-uploaded solution sets:

Course Hero: Hosts various versions of the "Principles of Distributed Database Systems Exercise Solutions" uploaded by students from institutions like GITAM University and BITS Pilani.

Database System Concepts (Practice Site): While for a different book, the Practice Exercises by Silberschatz et al. provide publicly available solutions for overlapping topics like distributed transactions and deadlock.

Principles of Distributed Database Systems

A distributed database system is a collection of multiple, logically interrelated databases distributed over a network, allowing users to access and share data across different locations as if it were stored in one place. The main goals of a distributed database system are:

Improved data availability: Data is available at multiple sites, reducing the risk of data loss or unavailability.
Increased scalability: Distributed databases can handle large amounts of data and support a large number of users.
Enhanced performance: Data can be accessed from multiple sites, reducing the load on individual databases.

Key Concepts

Fragmentation: Breaking a large database into smaller fragments, each stored at a different site.
Replication: Maintaining multiple copies of data at different sites to improve availability and performance.
Distribution: Storing data across multiple sites, each with its own database management system.

Types of Distributed Database Systems

Client-Server Systems: A central server manages data, and clients access data through a network.
Peer-to-Peer Systems: All sites are equal, and each site can act as both a client and a server.

Exercise Solutions

Exercise 1: What are the main advantages of a distributed database system?

Solution: The main advantages of a distributed database system are:

Improved data availability
Increased scalability
Enhanced performance

Exercise 2: What is fragmentation in a distributed database system?

Solution: Fragmentation is the process of breaking a large database into smaller fragments, each stored at a different site.

Exercise 3: What is replication in a distributed database system?

Solution: Replication is the process of maintaining multiple copies of data at different sites to improve availability and performance.

Exercise 4: Consider a distributed database system with three sites: A, B, and C. Each site stores part of a relation R. The relation R has the following tuples:

| ID | Name | Age |
| --- | --- | --- |
| 1 | John | 25 |
| 2 | Jane | 30 |
| 3 | Joe | 35 |

Site A has the following fragment of R:

| ID | Name | Age |
| --- | --- | --- |
| 1 | John | 25 |
| 2 | Jane | 30 |

Site B has the following fragment of R:

| ID | Name | Age |
| --- | --- | --- |
| 2 | Jane | 30 |
| 3 | Joe | 35 |

Site C has the following fragment of R:

| ID | Name | Age |
| --- | --- | --- |
| 1 | John | 25 |
| 3 | Joe | 35 |

a. What is the fragmentation of R?

b. What is the replication factor of R?

Solution:

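a. R is horizontally fragmented: each fragment is a subset of the tuples of R with all attributes kept. The fragmentation is:

R = R1 ∪ R2 ∪ R3

where R1, R2, and R3 are the fragments of R at sites A, B, and C, respectively. The fragmentation is complete (the union reconstructs R) but not disjoint, because every tuple appears in two fragments.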

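b. The replication factor of R is 2: every tuple is stored at exactly two of the three sites (tuple 1 at A and C, tuple 2 at A and B, tuple 3 at B and C). No single site holds a full copy of R.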
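Claims about completeness, disjointness, and replication can be checked mechanically by counting how many sites store each tuple. The following Python sketch uses the fragments exactly as listed in the exercise; the variable names are illustrative only.

```python
from collections import Counter

# Fragments of R as listed in the exercise, keyed by site.
fragments = {
    "A": {(1, "John", 25), (2, "Jane", 30)},
    "B": {(2, "Jane", 30), (3, "Joe", 35)},
    "C": {(1, "John", 25), (3, "Joe", 35)},
}

R = {(1, "John", 25), (2, "Jane", 30), (3, "Joe", 35)}

# Completeness: the union of the fragments must reconstruct R.
reconstructed = set().union(*fragments.values())

# Count how many sites store each tuple.
copies = Counter(t for frag in fragments.values() for t in frag)

# Disjointness holds only if no tuple is stored more than once.
is_disjoint = all(n == 1 for n in copies.values())

print(reconstructed == R, dict(copies), is_disjoint)
```

Every tuple has a count of 2, so the fragments overlap and each tuple survives the loss of any single site.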

Exercise 5: Consider a distributed database system with two sites: A and B. Site A has a relation R1, and site B has a relation R2. The relations R1 and R2 have the following tuples:

R1:

| ID | Name | Age |
| --- | --- | --- |
| 1 | John | 25 |
| 2 | Jane | 30 |

R2:

| ID | Name | Age |
| --- | --- | --- |
| 3 | Joe | 35 |
| 4 | Sarah | 20 |

Design a distributed query to retrieve all tuples from R1 and R2.

Solution:

The distributed query can be written as:

SELECT * FROM R1 UNION SELECT * FROM R2

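This query retrieves all tuples from R1 at site A and R2 at site B, and combines them into a single result set. UNION also removes duplicate rows; because R1 and R2 share no tuples here, UNION ALL would return the same rows without the cost of duplicate elimination.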
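To illustrate how such a query might be evaluated, the sketch below simulates the two sites with separate in-memory SQLite databases and merges the local results at a coordinator. This is an assumption-laden toy: the helper names `make_site` and `global_select_all` are hypothetical, and a real distributed DBMS would ship subqueries over the network rather than call local connections.

```python
import sqlite3

def make_site(table, rows):
    """Create an in-memory database standing in for one site."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {table} (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER)"
    )
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    return conn

site_a = make_site("R1", [(1, "John", 25), (2, "Jane", 30)])
site_b = make_site("R2", [(3, "Joe", 35), (4, "Sarah", 20)])

def global_select_all():
    """Run one local subquery per site, then merge at the coordinator."""
    part_a = site_a.execute("SELECT * FROM R1").fetchall()
    part_b = site_b.execute("SELECT * FROM R2").fetchall()
    # UNION semantics: duplicates removed; sorted only for a stable order.
    return sorted(set(part_a) | set(part_b))

result = global_select_all()
print(result)
```

Only the result rows cross the simulated site boundary, which is the property a distributed query plan tries to preserve.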

Dr. Elara Vance stared at the error log. It wasn't just red; it was a deep, angry crimson that seemed to pulse on her terminal. Twenty-three nodes in her distributed database cluster, spread across three continents, were returning a "referential integrity anomaly." It was 3:00 AM. The CET-SAT simulation, a global test of their distributed financial ledger, had failed catastrophically.

"Not tonight," she whispered, kneading her temples. The exercise was simple in theory: execute a series of atomic transactions that moved virtual currency between accounts while maintaining ACID properties across the network. The solution, the beautiful theoretical proof on her whiteboard, had promised convergence. Reality, as always, had other plans.

The problem was a phantom read. A classic edge case in multi-version concurrency control (MVCC). Node Alpha in London and Node Gamma in Tokyo had both approved a withdrawal from the same phantom account within 50 milliseconds of each other. Their local timestamps had conflicted, and the global consensus protocol—a modified Paxos—had chosen both. Now the ledger was in a superposition of states: both rich and poor.

Elara pulled up her copy of the instructor's manual, Principles of Distributed Database Systems: Exercise Solutions. It wasn't a book she had written; rather, it was the accumulated wisdom of a hundred previous failures, curated by her mentor, Professor Hideo Tanaka. He called it "The Grimoire."

She flipped to Chapter 9: Global Commit Protocols. Exercise 9.4 read:

Problem: Two-phase commit (2PC) is blocking. Describe a scenario where a coordinator failure leads to an indefinite wait for subordinate nodes. Propose a remedy using three-phase commit (3PC) or Paxos.

The solution in the grimoire was clear. But her current problem wasn't just a blocking coordinator. It was a lying coordinator. Node Alpha's leader had crashed after sending "PREPARE" but before logging its decision. Upon recovery, it had no memory of the transaction. The other nodes, waiting for a "GLOBAL-COMMIT," had timed out and unilaterally aborted—except Node Gamma, which had already applied the withdrawal due to a rogue heuristic.

She reached for the physical, dog-eared copy of the Grimoire. Inside, a handwritten note from Professor Tanaka said: "The exercise is never the storm. The exercise is learning how to patch the hull while the storm is still raging."

The official solution to 9.4 was a Paxos-based replacement for 2PC. But Paxos assumes a fair leader. She didn't have a leader. She had anarchy.

So she closed the book. She would not follow the solution. She would extend it.

She opened a new terminal window and began to write a corrective algorithm. She called it the "Phoenix Commit."

Step 1 (Detect): Run a distributed diff on the write-ahead logs of all 23 nodes. Find the anomaly: transaction #A442.

Step 2 (Quarantine): 2PC is blocking. 3PC is non-blocking but assumes no network partitions. Phoenix Commit would assume a Byzantine failure—a node that lies about its state. She instructed each node to broadcast not just its vote, but its entire log hash since the last global checkpoint.

Step 3 (Reconcile): Use a quorum of 15 nodes (a strict majority + 3) to rebuild the true sequence of events. The majority spoke: Node Gamma had acted alone. The withdrawal from account #LK-99 was invalid.

Step 4 (Heal): Issue a compensating transaction. Not a rollback (that would violate isolation in their current read-committed snapshot), but a reverse transfer with a zero-value timestamp. A ghost transaction that would cancel the error without ever having existed in the official timeline.

She typed the final command:

EXECUTE PHOENIX_COMMIT ('A442', 'HEAL');

Silence.

Then, one by one, the nodes turned from angry red to calm green. Node London. Node Singapore. Node São Paulo. Finally, Node Tokyo. All 23 nodes reported STATE: CONSISTENT. The ledger re-converged. The virtual accounts balanced. The CET-SAT simulation passed with a score of 99.9999%—the 0.0001% being the ephemeral trace of the ghost transaction, a scar that only Elara would ever know to look for.

She leaned back, exhausted. The principles from the textbook—atomicity, consistency, isolation, durability—weren't commandments. They were constraints. And the exercise solutions weren't recipes. They were starting points.

Professor Tanaka's voice echoed from a memory: "The best solution to a distributed systems problem is the one you don't have to deploy. The second best is the one that survives first contact with the enemy—which is always the network, the clock, or your own hubris."

Elara looked at her whiteboard, at the beautiful theoretical proof. Then she looked at her terminal, at the ugly, elegant, 47-line Phoenix Commit patch.

She saved the patch as exercise_9.4_vance_solution.pdf and added a new note to the Grimoire:

Addendum: The official solution works for 99% of failures. For the other 1%, you must be willing to forget the exercise and solve the principle. The principle is not "don't fail." The principle is "fail in a way you can survive."

Outside, dawn bled over the data center. The distributed database hummed, its 23 hearts beating in silent agreement. And Elara Vance, for the first time that night, smiled.

The storm had passed. The hull was patched. And the ledger was true.

4. Use formal tools where useful

Two-phase locking (2PL): show locking sequences and check for deadlocks.
Timestamp ordering: show timestamps, compare read/write rules, and show conflicts.
Conflict/serialization graphs: draw nodes = transactions, directed edges = conflicting ops.
Vector clocks: show vectors at events to prove causality or detect concurrent updates.
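As one example of these tools, the sketch below builds a conflict (serialization) graph from a schedule and tests it for cycles; a schedule is conflict-serializable exactly when the graph is acyclic. The schedule encoding as `(transaction, operation, item)` triples and both sample schedules are assumptions made for illustration.

```python
def conflict_graph(schedule):
    """Return edges (Ti, Tj): an op of Ti precedes a conflicting op of Tj."""
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            # Two ops conflict if they come from different transactions,
            # touch the same item, and at least one of them is a write.
            if ti != tj and x == y and "w" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {n: 0 for n in graph}  # 0 = unvisited, 1 = on stack, 2 = done

    def visit(n):
        state[n] = 1
        for m in graph[n]:
            if state[m] == 1 or (state[m] == 0 and visit(m)):
                return True
        state[n] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in graph)

# Hypothetical schedule r1(x) w2(x) w1(x): T1 -> T2 and T2 -> T1.
s1 = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]
# Hypothetical schedule r1(x) w1(x) r2(x) w2(x): only T1 -> T2.
s2 = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]

print(has_cycle(conflict_graph(s1)), has_cycle(conflict_graph(s2)))
```

In a written solution, listing the edges alongside the operations that caused them makes the argument easy to verify.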

Exercise 1.1: Horizontal Fragmentation

Problem:
A global relation EMPLOYEE(EmpID, Name, Dept, Salary, Location) is accessed from two sites:

Site A: handles queries with Dept = ‘Sales’
Site B: handles queries with Dept = ‘Eng’
Design a horizontal fragmentation. Write the min-term predicates.

Solution:
Horizontal fragmentation splits a relation into subsets of tuples based on a predicate.

Step 1 – Identify simple predicates:
p1 : Dept = ‘Sales’
p2 : Dept = ‘Eng’

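Step 2 – Determine min-term predicates (conjunctions in which each simple predicate appears either as is or negated). Two simple predicates give four candidates; p1 AND p2 is unsatisfiable because a tuple has a single Dept value, so it is dropped, and the remaining three simplify to:

m1 : Dept = ‘Sales’ (from p1 AND NOT p2)
m2 : Dept = ‘Eng’ (from NOT p1 AND p2)
m3 : Dept != ‘Sales’ AND Dept != ‘Eng’ (from NOT p1 AND NOT p2; e.g., ‘HR’ or ‘IT’)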

Step 3 – Assign fragments to sites:

Fragment F1 (Dept = ‘Sales’) → Site A
Fragment F2 (Dept = ‘Eng’) → Site B
Fragment F3 (other) → stored either at a central site or replicated.

Answer:
F1 = σ_Dept=‘Sales’(EMPLOYEE)
F2 = σ_Dept=‘Eng’(EMPLOYEE)
F3 = σ_Dept≠‘Sales’ ∧ Dept≠‘Eng’(EMPLOYEE)
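The three selections can be applied to sample data to confirm that the fragmentation is complete and disjoint. The rows below are hypothetical, invented only to exercise each predicate; the predicates themselves follow the answer above.

```python
# Hypothetical sample rows: (EmpID, Name, Dept, Salary, Location).
EMPLOYEE = [
    (1, "Ann", "Sales", 50000, "NY"),
    (2, "Bob", "Eng", 70000, "SF"),
    (3, "Cy", "HR", 45000, "NY"),
    (4, "Dee", "Eng", 72000, "SF"),
]
DEPT = 2  # index of the Dept attribute

min_terms = {
    "F1": lambda t: t[DEPT] == "Sales",
    "F2": lambda t: t[DEPT] == "Eng",
    "F3": lambda t: t[DEPT] != "Sales" and t[DEPT] != "Eng",
}

# Apply each min-term predicate as a selection over EMPLOYEE.
fragments = {
    name: [t for t in EMPLOYEE if pred(t)]
    for name, pred in min_terms.items()
}

# For each tuple, count how many min-term predicates it satisfies.
matches = {t: sum(pred(t) for pred in min_terms.values()) for t in EMPLOYEE}
complete = all(n >= 1 for n in matches.values())  # every tuple lands somewhere
disjoint = all(n <= 1 for n in matches.values())  # no tuple lands twice

print(fragments, complete, disjoint)
```

Without F3, the HR row would match no predicate and the completeness check would fail, which is why the "other" fragment is needed.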

5. Replication & consistency

For replicated-data exercises, state which protocol is used (primary-backup, quorum-based, lazy/eventually consistent replication, Paxos/Raft).
For quorum proofs: use read quorum size R, write quorum size W, and replica count N such that R + W > N to ensure read-write overlap (show that any read set and any write set must intersect).
For eventual consistency: demonstrate convergence by showing monotonic updates/CRDT commutativity or a reconciliation rule.
For strong consistency: show protocol steps that enforce total order (e.g., leader serializes writes, consensus ballot numbers).
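The quorum condition can be sanity-checked by brute force for small N before writing the set-intersection argument. This is a minimal sketch, not a proof; the function name `quorums_always_intersect` is hypothetical.

```python
from itertools import combinations

def quorums_always_intersect(n, r, w):
    """True if every read quorum of size r overlaps every write quorum of size w."""
    replicas = range(n)
    return all(
        set(rq) & set(wq)
        for rq in combinations(replicas, r)
        for wq in combinations(replicas, w)
    )

# With N = 5: R = 3, W = 3 satisfies R + W > N; R = 2, W = 3 does not.
print(quorums_always_intersect(5, 3, 3))
print(quorums_always_intersect(5, 2, 3))
```

The general argument is a pigeonhole one: if a read set and a write set were disjoint, together they would need R + W distinct replicas, which is impossible when R + W > N.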

The Problem Type

Given local wait-for graphs from two or three sites, construct the global WFG and identify deadlocks. Then determine if a centralized or hierarchical detector would find them.
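A small sketch of the construction: take the union of the local wait-for graphs and search the result for a cycle. The three local graphs below are a hypothetical example in which no single site sees a deadlock; the function names are illustrative.

```python
def merge_wfgs(*local_graphs):
    """Union the edge sets of the local wait-for graphs."""
    global_graph = {}
    for g in local_graphs:
        for waiter, holders in g.items():
            global_graph.setdefault(waiter, set()).update(holders)
            for h in holders:
                global_graph.setdefault(h, set())
    return global_graph

def find_cycle(graph):
    """Return one cycle as a list of transactions, or None if acyclic."""
    visited = set()

    def dfs(node, path):
        if node in path:            # back edge: cycle found
            return path[path.index(node):]
        if node in visited:         # already explored, no cycle through it
            return None
        visited.add(node)
        for nxt in sorted(graph.get(node, ())):
            cycle = dfs(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in sorted(graph):
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None

# Hypothetical local graphs: an edge Ti -> Tj means Ti waits for Tj.
site1 = {"T1": {"T2"}}
site2 = {"T2": {"T3"}}
site3 = {"T3": {"T1"}}

global_wfg = merge_wfgs(site1, site2, site3)
print([find_cycle(s) for s in (site1, site2, site3)], find_cycle(global_wfg))
```

Each local graph is acyclic, yet the merged graph contains T1 → T2 → T3 → T1, so only a detector with the global view (or a hierarchy node covering all three sites) would report this deadlock.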

Key Principles for Solutions

Horizontal Fragmentation (HF): Uses selection predicates (e.g., DeptID = 10). Ensure completeness (every tuple goes to some fragment) and disjointness (a tuple belongs to at most one HF fragment, unless replication is used).
Vertical Fragmentation (VF): Uses projection over attribute subsets. Must include the primary key in every fragment for reconstruction. Ensure lossless-join property.
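The role of the primary key in vertical fragmentation can be demonstrated by projecting a relation two ways and joining the pieces back. The sample rows and attribute splits below are hypothetical; `project` and `natural_join` are minimal helpers written for this sketch, not a library API.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    return {tuple(row[a] for a in attrs) for row in rows}

def natural_join(f1, attrs1, f2, attrs2):
    """Join two fragments on every attribute they share."""
    common = [a for a in attrs1 if a in attrs2]
    out = set()
    for a in f1:
        ra = dict(zip(attrs1, a))
        for b in f2:
            rb = dict(zip(attrs2, b))
            if all(ra[c] == rb[c] for c in common):
                out.add(tuple(sorted({**ra, **rb}.items())))
    return out

# Hypothetical relation with primary key EmpID.
rows = [
    {"EmpID": 1, "Name": "Ann", "Dept": "Sales", "Salary": 50000},
    {"EmpID": 2, "Name": "Bob", "Dept": "Sales", "Salary": 60000},
    {"EmpID": 3, "Name": "Cy", "Dept": "Eng", "Salary": 70000},
]
original = {tuple(sorted(r.items())) for r in rows}

# Key kept in both fragments: the join reconstructs the relation.
a1, a2 = ("EmpID", "Name", "Dept"), ("EmpID", "Salary")
with_key = natural_join(project(rows, a1), a1, project(rows, a2), a2)

# Key missing from the second fragment: joining on Dept adds spurious tuples.
b1, b2 = ("EmpID", "Name", "Dept"), ("Dept", "Salary")
without_key = natural_join(project(rows, b1), b1, project(rows, b2), b2)

print(with_key == original, len(without_key))
```

The second split pairs each Sales employee with both Sales salaries, so the join returns more tuples than the original relation and the decomposition is lossy.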

Mastering the Principles of Distributed Database Systems: A Comprehensive Guide to Exercise Solutions

Distributed Database Systems (DDBS) represent a core pillar of modern data management. From Google Spanner to Amazon DynamoDB, the principles of fragmentation, replication, distributed query processing, and concurrency control are essential knowledge for any data professional. However, the theoretical rigor of courses like Principles of Distributed Database Systems (often based on the classic textbook by Özsu and Valduriez) means that exercises can be challenging.

This article provides a structured approach to solving common exercises in this domain. We will break down solutions by topic, explain the underlying reasoning, and offer strategies to tackle problems ranging from fragmentation to distributed deadlock detection.