Week 4 (cont.)#
Lecturer: Barsha Mitra, BITS Pilani, Hyderabad Campus
Date: 22/Aug/2021
Topics Covered#
- Terminology and Basic Algorithms
  - Notations and Definitions
  - Synchronous Single-Initiator Spanning Tree Algorithm using Flooding
    - Algorithm Design
      - Struct for each process \(P_i\)
      - Algorithm pseudo code
      - Algorithm in action
  - Broadcast and Convergecast Algorithm on a Tree
    - Broadcast Algorithm
    - Convergecast Algorithm
    - Complexity
- Message Ordering
Terminology and Basic Algorithms#
Notations and Definitions#
- An undirected unweighted graph \(G = (N, L)\) represents the topology of the system, for example the graph shown below:

- Vertices represent nodes
- Edges represent channels
- \(n = |N|\), the number of nodes, where \(N = \{A, B, C, D, E, F\}\)
- \(l = |L|\), the number of edges, where \(L = \{AB, BC, CF, DE, EF, AD, AE, CE\}\)
- Diameter of a graph:
  - the maximum, over all pairs of nodes, of the minimum number of edges that must be traversed to get from one node to the other
  - \(Diameter = \max_{i, j \in N} \{\text{length of the shortest path between } i \text{ and } j\}\); that is, we find the shortest distance between every pair \(i, j\), and the diameter is the largest of these values
  - In the above example, take \(i = A\) and \(j \in \{A, B, C, D, E, F\}\): the largest shortest-path length from \(A\) is 2 (for the path from \(A\) to \(F\)), and the same holds starting from every other node, so the \(diameter = 2\)
 
 
- A spanning tree of a graph is a subgraph that contains all the nodes and is a tree, i.e. it is connected and has no cycles. A spanning tree for the above example graph is shown below:

- A spanning tree of a graph with \(n\) nodes always has exactly \(n - 1\) edges
  - So the example graph, which has 6 nodes, has spanning trees with 5 edges (the BFS sketch below checks both the diameter and this edge count)
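Both facts can be checked quickly with a short breadth-first-search sketch over the example edge list. This code is an illustration added here with assumed variable names, not part of the lecture:

```python
from collections import deque

# Example graph from the notes: N = {A, ..., F}, L = {AB, BC, CF, DE, EF, AD, AE, CE}
adj = {v: set() for v in "ABCDEF"}
for a, b in ["AB", "BC", "CF", "DE", "EF", "AD", "AE", "CE"]:
    adj[a].add(b)
    adj[b].add(a)

def bfs(src):
    """Shortest path lengths from src, plus BFS-tree parent pointers."""
    dist, parent = {src: 0}, {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], parent[v] = dist[u] + 1, u
                queue.append(v)
    return dist, parent

# Diameter: maximum over all pairs (i, j) of the shortest path length from i to j
diameter = max(max(bfs(v)[0].values()) for v in adj)
print(diameter)                                          # 2 for this graph

# A BFS tree is one spanning tree: every node except the root has exactly one
# parent edge, so the tree has n - 1 = 5 edges.
_, parent = bfs("A")
print(sum(1 for p in parent.values() if p is not None))  # 5
```

BFS is used here because, in an unweighted graph, it directly yields shortest path lengths in terms of number of edges.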
 
 
Synchronous Single-Initiator Spanning Tree Algorithm using Flooding#
- The algorithm executes in synchronous rounds (a message sent in one round reaches its destination before the next round begins, so all nodes proceed in lockstep)
- The root initiates the algorithm (an arbitrary node can be designated as the root)
- QUERY messages are flooded: the root first sends QUERY to all its neighbors, those nodes in turn forward it to their neighbors, and so on
- The final spanning tree is rooted at the initially selected node
- Each process \(P_i\) (\(P_i \ne root\)) outputs its own parent in the spanning tree (in distributed computing, the terms node and process are used interchangeably)
 
Algorithm Design#
Struct for each process \(P_i\)#
| Variables maintained at each \(P_i\) | Initial variable values at each \(P_i\) | 
|---|---|
| int visited | 0 | 
| int depth | 0 | 
| int parent | NULL | 
| set of int Neighbors | set of neighbors | 
Algorithm pseudo code#
Algorithm for Pi
When Round r = 1
if Pi = root then
    visited = 1
    depth = 0
    send QUERY to Neighbors
if Pi receives a QUERY message then
    visited = 1
    depth = r
    parent = root
    plan to send QUERY to Neighbors at the next round
When Round r > 1 and r <= diameter
if Pi planned to send in previous round then
    Pi sends QUERY to Neighbors
if Pi receives one or more QUERY messages and visited = 0 then
    visited = 1
    depth = r
    parent = a randomly selected node from which QUERY was received
    plan to send QUERY to Neighbors at the next round, but not to any node from which QUERY was received
Algorithm in action#
Round 1 :
Root \(A\) sends \(QUERY\) to neighbors \([B, F]\)

\(B\) and \(F\) set \(A\) as parent and plan to send \(QUERY\) to their neighbors

Round 2 :
\(B\) sends \(QUERY\) to neighbors \([C, E]\) and \(F\) sends \(QUERY\) to neighbors \([E]\)

\(E\) randomly chooses \(F\) as parent and \(C\) chooses \(B\) as parent, and both \(E\) and \(C\) plan to send \(QUERY\) to their neighbors

Round 3 :
\(E\) sends \(QUERY\) to neighbors \([C, D]\) and \(C\) sends \(QUERY\) to neighbors \([E, D]\)

Since \(C\) and \(E\) are already visited, the \(QUERY\) is ignored in those cases and \(D\) randomly chooses \(C\) as parent over \(E\)

The spanning tree generated is as follows :

The above spanning tree has 6 nodes and 5 edges
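The rounds above can be reproduced with a small Python simulation. The adjacency dictionary and the helper name flood_spanning_tree below are assumptions reconstructed from the QUERY exchanges described in the rounds, not the lecture's code:

```python
import random

# Adjacency assumed from the rounds above: edges A-B, A-F, B-C, B-E, E-F, C-E, C-D, D-E
neighbors = {
    "A": {"B", "F"},
    "B": {"A", "C", "E"},
    "C": {"B", "D", "E"},
    "D": {"C", "E"},
    "E": {"B", "C", "D", "F"},
    "F": {"A", "E"},
}

def flood_spanning_tree(root, neighbors, diameter):
    """Simulate the synchronous single-initiator flooding algorithm round by round."""
    visited = {p: 0 for p in neighbors}
    depth = {p: 0 for p in neighbors}
    parent = {p: None for p in neighbors}
    planned = set()                                  # processes planning to send QUERY
    heard_from = {p: set() for p in neighbors}       # nodes a process got QUERY from

    for r in range(1, diameter + 1):
        # send phase of round r
        senders = {root} if r == 1 else planned
        inbox = {p: [] for p in neighbors}
        for p in senders:
            if p == root:
                visited[root], depth[root] = 1, 0
            for q in neighbors[p] - heard_from[p]:   # never send back to a QUERY sender
                inbox[q].append(p)

        # receive phase of round r
        planned = set()
        for q, from_nodes in inbox.items():
            heard_from[q].update(from_nodes)
            if from_nodes and visited[q] == 0:       # already-visited nodes ignore QUERY
                visited[q], depth[q] = 1, r
                parent[q] = random.choice(from_nodes)
                planned.add(q)                       # plan to send in the next round
    return parent, depth

parent, depth = flood_spanning_tree("A", neighbors, diameter=3)
print(parent)  # one run: {'A': None, 'B': 'A', 'F': 'A', 'C': 'B', 'E': 'F', 'D': 'C'}
```

Depending on the random choices, \(E\) may pick \(B\) or \(F\) as its parent and \(D\) may pick \(C\) or \(E\), exactly the "randomly chooses" steps in the rounds above; every outcome is a valid spanning tree with 6 nodes and 5 edges.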
Broadcast and Convergecast Algorithm on a Tree#
A spanning tree is useful for distributing information to all the nodes (via Broadcast) and for collecting information from all the nodes (via Convergecast)
Broadcast Algorithm#

- BC1:
    -   The root sends the information to be sent to all its children
- BC2:
    - When a non root node receives information from its parent, it copies it and forwards it to its children
Convergecast Algorithm#

- CVC1:
    - A leaf node sends its report to its parent
- CVC2:
    - A non-leaf node that is not the root waits until it has received reports from all its child nodes, combines them with its own, and sends the collective report to its parent
- CVC3:
    - When the root has received reports from all its child nodes, it evaluates the global function using these reports (see the sketch below)
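The following Python sketch walks through BC1–BC2 and CVC1–CVC3 on one possible spanning tree from the flooding example. Recursion over a children map stands in for actual message passing, and the local values and the choice of sum() as the global function are assumptions made for illustration:

```python
# Tree taken from the flooding example (one possible outcome); local values are made up.
children = {"A": ["B", "F"], "B": ["C"], "F": ["E"], "C": ["D"], "D": [], "E": []}
local_value = {"A": 3, "B": 1, "C": 4, "D": 1, "E": 5, "F": 9}

def broadcast(node, info, received=None):
    """BC1/BC2: the root pushes info down; every node stores a copy and forwards it."""
    received = {} if received is None else received
    received[node] = info                      # copy the information locally
    for child in children[node]:
        broadcast(child, info, received)       # forward to the children
    return received

def convergecast(node):
    """CVC1-CVC3: leaves report up; inner nodes combine child reports with their own."""
    child_reports = [convergecast(child) for child in children[node]]
    return local_value[node] + sum(child_reports)   # collective report for the parent

print(broadcast("A", "hello"))   # every node ends up holding "hello"
print(convergecast("A"))         # root evaluates the global function: 3+1+4+1+5+9 = 23
```

Each recursive call corresponds to one message travelling over a tree edge, which is where the \(n - 1\) messages per broadcast or convergecast in the complexity analysis below come from.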
Complexity#
- Each broadcast and each convergecast requires \(n - 1\) messages.
- Each broadcast and each convergecast requires time equal to the height \(h\) of the tree, which is \(\mathcal{O}(n)\) in the worst case
 
Message Ordering#
Group Communication#
- Broadcast - Sending a message to all members in the distributed system
 - Multicasting - A message is sent to a certain subset, identified as a group, of the processes in the system.
 - Unicasting - Point-to-point message communication
 
Causal Order#
- Causal Order is explained here: Week3DC#Causal Ordering Model
- Two criteria must be satisfied by a causal ordering protocol:
  - Safety:
    - A message \(M\) arriving at a process may need to be buffered until all system-wide messages sent in the causal past of the \(send(M)\) event to the same destination have already arrived
    - A distinction is made between:
      - the arrival of a message at a process
      - the event at which the message is given to the application process (its delivery)
  - Liveness:
    - A message that arrives at a process must eventually be delivered to that process
Raynal-Schiper-Toueg Algorithm#
- Each message \(M\) should carry a log of all other messages that were sent causally before \(M\)'s send event and were sent to the same destination \(dest(M)\)
- The log can be examined to determine when it is safe to deliver a message
- Channels are assumed to be FIFO
Local Variables#
- array of int \(SENT[1 \ldots n, 1 \ldots n]\) (an \(n \times n\) array)
  - \(SENT_i[j, k]\) = number of messages sent by \(P_j\) to \(P_k\), as known to \(P_i\)
- array of int \(DELIV[1 \ldots n]\)
  - \(DELIV_i[j]\) = number of messages from \(P_j\) that have been delivered to \(P_i\)
The Algorithm#
- Message Send event, where \(P_i\) wants to send message \(M\) to \(P_j\):
  - \(send(M, SENT_i)\) to \(P_j\) (the piggybacked \(SENT_i\) array is the log the receiver uses to decide whether \(M\) has to be buffered)
  - \(SENT_i[i, j] = SENT_i[i, j] + 1\)
- Message Arrival event, when \((M, SENT_j)\) arrives at \(P_i\) from \(P_j\):
  - deliver \(M\) to \(P_i\) only when, for each process \(x\): \(DELIV_i[x] \ge SENT_j[x, i]\); until then, \(M\) is buffered
  - once \(M\) is delivered:
    - \(\forall x, y:\ SENT_i[x, y] = \max(SENT_i[x, y], SENT_j[x, y])\)
    - \(DELIV_i[j] = DELIV_i[j] + 1\)
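A compact way to see the two events together is the Python sketch below; the class name RSTProcess, the explicit buffer with a retry loop, and the 1-indexed arrays are assumptions made for this illustration, not part of the lecture notes:

```python
# Minimal sketch of the Raynal-Schiper-Toueg bookkeeping; real processes would
# exchange these messages over FIFO channels.
class RSTProcess:
    def __init__(self, pid, n):
        self.pid = pid                                     # process ids are 1..n
        self.n = n
        self.SENT = [[0] * (n + 1) for _ in range(n + 1)]  # SENT[j][k], 1-indexed
        self.DELIV = [0] * (n + 1)                         # DELIV[j], 1-indexed
        self.buffer = []                                   # arrived but not yet delivered

    def send(self, dest):
        """Send event: piggyback a copy of SENT, then count the new message."""
        msg = (self.pid, [row[:] for row in self.SENT])
        self.SENT[self.pid][dest] += 1
        return msg                                         # hand this to the FIFO channel

    def _deliverable(self, sent_j):
        # Safety check: everything sent to me in the causal past must already be delivered
        return all(self.DELIV[x] >= sent_j[x][self.pid] for x in range(1, self.n + 1))

    def receive(self, msg):
        """Arrival event: deliver if safe, otherwise buffer; retry buffered messages."""
        self.buffer.append(msg)
        delivered, progress = [], True
        while progress:
            progress = False
            for m in list(self.buffer):
                sender, sent_j = m
                if self._deliverable(sent_j):
                    self.buffer.remove(m)
                    for x in range(1, self.n + 1):         # merge the piggybacked log
                        for y in range(1, self.n + 1):
                            self.SENT[x][y] = max(self.SENT[x][y], sent_j[x][y])
                    self.DELIV[sender] += 1
                    delivered.append(m)
                    progress = True
        return delivered                                   # messages delivered just now
```

The whole buffer is rescanned after every delivery because delivering one message raises \(DELIV\) and may make an earlier-buffered message deliverable.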
 
 
Algorithm Complexity#
- Space complexity at each process: \(\mathcal{O}(n^2)\) integers
 - Space overhead per message: \(n^2\) integers
 - Time complexity at each process for each send and deliver event: \(\mathcal{O}(n^2)\)
 
Algorithm in action#
Assume the following messages have been sent so far:
- \(P_1\) sent 3 messages to \(P_2\)
- \(P_1\) sent 4 messages to \(P_3\)
- \(P_2\) sent 5 messages to \(P_1\)
- \(P_2\) sent 2 messages to \(P_3\)
- \(P_3\) sent 4 messages to \(P_2\)
Variable state
Assuming every process is aware of all the sends listed above (so the \(SENT\) arrays at all three processes are identical), \(SENT\) of \(P_1\) is:
| 0   | 3   | 4   |
| 5   | 0   | 2   |
| 0   | 4   | 0   |
Assume the following values for \(DELIV\) arrays
\(DELIV_1 = [0, 4, 0]\)
\(DELIV_2 = [3, 0, 4]\)
\(DELIV_3 = [3, 2, 0]\)
MESSAGE EVENTS:
Now, suppose \(P_1\) sends \(m_1\) to \(P_2\)
1. \(SEND\) event from \(P_1\)
    1. \(SENT_1\)
    
| 0   | 3   | 4   |
| 5   | 0   | 2   |
| 0   | 4   | 0   |
2. Send \((m_1, SENT_1)\)
3. Updated \(SENT_1\)
| 0   | 3+1 | 4   |    | 0   | 4   | 4   |
| 5   | 0   | 2   | => | 5   | 0   | 2   |
| 0   | 4   | 0   |    | 0   | 4   | 0   |
2. \(RECEIVE\) \(m_1\) from \(P_1\) at \(P_2\)
1. \(DELIV_2[1] \ge SENT_1[1, 2]\) (TRUE, \(3 \ge 3\))
2. \(DELIV_2[3] \ge SENT_1[3, 2]\) (TRUE, \(4 \ge 4\))
3. Deliver \(m_1\) to \(P_2\)
4. \(DELIV_2[1] = DELIV_2[1] + 1 = 4\)
5. \(DELIV_2 = [4, 0, 4]\)
6. \(SENT_2\)
| 0   | 3   | 4   |
| 5   | 0   | 2   |
| 0   | 4   | 0   |
Now, suppose \(P_3\) sends \(m_2\) to \(P_2\)
1. \(SEND\) event from \(P_3\)
    1. \(SENT_3\)
    
| 0   | 3   | 4   |
| 5   | 0   | 2   |
| 0   | 4   | 0   |
2. Send \((m_2, SENT_3)\)
3. Updated \(SENT_3\)
| 0   | 3   | 4   |    | 0   | 3   | 4   |
| 5   | 0   | 2   | => | 5   | 0   | 2   |
| 0   | 4+1 | 0   |    | 0   | 5   | 0   |
2. \(RECEIVE\) \(m_2\) from \(P_3\) at \(P_2\)
1. \(DELIV_2[1] \ge SENT_3[1, 2]\) (TRUE, \(4 \ge 3\))
2. \(DELIV_2[3] \ge SENT_3[3, 2]\) (TRUE, \(4 \ge 4\))
3. Deliver \(m_2\) to \(P_2\)
4. \(DELIV_2[3] = DELIV_2[3] + 1 = 5\)
5. \(DELIV_2 = [4, 0, 5]\)
6. \(SENT_2\)
| 0   | 3   | 4   |
| 5   | 0   | 2   |
| 0   | 4   | 0   |
Now, suppose \(P_1\) sends \(m_3\) to \(P_3\)
1. \(SEND\) event from \(P_1\)
    1. \(SENT_1\)
    
| 0   | 4   | 4   |
| 5   | 0   | 2   |
| 0   | 4   | 0   |
2. Send \((m_3, SENT_1)\)
3. Updated \(SENT_1\)
| 0   | 4   | 4+1 |     | 0   | 4   | 5   |
| 5   | 0   | 2   | =>  | 5   | 0   | 2   |
| 0   | 4   | 0   |     | 0   | 4   | 0   |
2. \(RECEIVE\) \(m_3\) from \(P_1\) at \(P_3\)
1. \(DELIV_3[1] \ge SENT_1[1, 3]\) (FALSE, \(3 < 4\))
2. \(DELIV_3[2] \ge SENT_1[2, 3]\) (TRUE, \(2 \ge 2\))
3. Put \(m_3\) in the buffer since the first condition failed; it can be delivered only after the remaining earlier message from \(P_1\) has been delivered to \(P_3\)
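This last scenario can be reproduced with the RSTProcess sketch from earlier; the state below is initialised to the assumed \(SENT\)/\(DELIV\) values of this example:

```python
# P_3 with DELIV_3 = [3, 2, 0] (index 0 unused, arrays are 1-indexed)
p3 = RSTProcess(pid=3, n=3)
p3.DELIV[1], p3.DELIV[2], p3.DELIV[3] = 3, 2, 0

# SENT_1 as piggybacked on m_3 (P_1's array before incrementing SENT_1[1, 3])
sent_1 = [[0, 0, 0, 0],
          [0, 0, 4, 4],
          [0, 5, 0, 2],
          [0, 0, 4, 0]]

delivered = p3.receive((1, sent_1))   # m_3 arrives at P_3 from P_1
print(delivered)   # [] -> m_3 stays buffered: DELIV_3[1] = 3 < SENT_1[1, 3] = 4
```

Once the outstanding fourth message from \(P_1\) arrives and is delivered, \(DELIV_3[1]\) becomes 4 and the buffered \(m_3\) becomes deliverable on the next pass over the buffer.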