## Dynamic Connectivity

**Goal**: Given a series of **pairwise** connectedness declarations, determine if two items are connected.

Two operations:

`connect(p, q)`

: Connect items`p`

and`q`

`isConnected(p, q)`

: return if`p`

and`q`

are connected

Note: There’s no way to add items, or unlink items.

## The Disjoint Sets ADT

```
public interface DisjointSets {
/** Connects two items P and Q. */
void connect(int p, int q);
/** Checks to see if two items are connected. */
boolean isConnected(int p, int q);
}
```

**Goal**: Design an efficient DisjointSets implementation.

- Number of elements
`N`

can be huge. - Number of method calls
`M`

can be huge. - Calls to methods may be interspersed.

### The Naive Approach

- Connect two things: Record every single connecting line in some data structure.
- Cheking connectedness: do some sort of iteration over the lines to see if one thing can be reached from another.

It turns out this approach is more complex than it needs to me.

### Better Approach – Connected Components

Rather an writing down every single line, we can probably just record down which “club” each item belongs to. We don’t have to explicitly state that `1`

, `3`

are connected, we can have a way to denote that they belong to the same set.

For example, `{0, 1, 3, 5}, {2, 4}, {6}, {7}, {8}`

Downside: We can’t disconnect, but that’s not something we’re trying to do here.

### Quick Find

**Question**: What data structure should we pick to support the tracking of sets?

**Solution**: Using an array, **each index represent an item**, and the **value of the index represent the group** in which it is in.

- Use
`int[] id`

where the i^th^ entry is the set number of the item`i`

`connect(p, q)`

: Change entries that equal`id[p]`

to`id[q]`

**Downside**: Connecting two items takes linear time, which is too slow.

### Quick Union

**Idea**: Assign each node a parent (instead of an id).

Suppose we want to `connect(5, 2)`

, what is the array going to end up being?

We could change the **boss of 5** to **2**. However, this makes the tree more spindly.

So, we should change the **boss of 5** to the **boss of 2** instead, like this:

And this is the Quick Union approach.

**Downsides**:

- Finding the boss is expensive, especially for more spindly trees.
- For N items, the worst case runtime for
`connect(p, q)`

is . - The worst case runtime for
`isConnected(p, q)`

is .

Is there any way we can avoid getting tall, spindly trees? Yes there is!

#### Weighted Quick Union

**Idea**:

- Track tree size (number of elements)
- When connecting, link root of
**smaller**tree to**larger**tree.

One question to ponder: You might be thinking, why not link the root of the **shorter** tree to the **taller** tree (*Heighted Quick Union*)? It turns out, both works, and both give the same performance, but weighing by size is easier to implement.

##### Implementing Weighted Quick Union

Very few changes need to be made based on Quick Union. We only need another `int[]`

array called `size`

, and change `connect(p, q)`

to update `size[]`

every time we connect.

```
public void connect(int p, int q) {
int i = find(p);
int j = find(q);
if (i == j) return;
if (size[i] < size[j]) { parent[i] = j; size[j] += size[i]; }
else { parent[j] = i; size[i] += size[j]; }
}
```

This new approach still requires time that is proportional to the depth of the tree.

However, what’s amazing is that with this approach, the **maximum depth of any item is always **.

Implementation | Constructor | `connect` |
`isConnected` |
---|---|---|---|

QuickFindDS | |||

QuickUnionDS | |||

WeightedQuickUnionDS |

Performing M operations on a DisojointSet object with N elements:

- Runtime ranges from to .

## Path Compression: A Clever Idea

**Idea**: Whenever we do `isConnected(p, q)`

, as we go, we tie all the nodes we’ve seen to the root.

In other words, **when doing any kind of query involving a node, set that node’s parent to the root**.

This results in a union/connected operation that is **very very close to amortized constant time**.

- M operations on N nodes is ( is the
**Iterative Logarithm**) - is less than 5 for any realistic input.
- A tighter bound:

The end result of our iterative design process is the standard

Hey, kudos for making it this far! Wanted to let you know that if you liked this, you might also like tmux Cheatsheet and Shortcuts.