Dynamic Connectivity: Input is a sequence of integer pairs, where each integer represents an object of some kind and we interpret the pair p q as "p is connected to q". We assume that "being connected to" is an equivalence relationship:
- Symmetrical: if
pis connected toq, thenqis connected top; - Transitive: if
pis connected toqandqis connected tor, thenpis connected tor; - Reflective:
pis connected top.
An equivalence relationship partitions objects into equivalence classes or related components (or only components or groups). In this case, two items are in the same component if and only if they are connected.
We can identify both items and components by integers between 0 and N-1.
Initially, there are N independent components (no connections), with each item in its own component. The identifier of a component is one of the items that compose it. Two items have the same component identifier if and only if they are part of the same component.
The objective is to write a program to filter external pairs in a sequence:
when the program reads a pair p q from standard input, it must write the pair itself that was read from standard output only if the pairs read so far do not imply that p is connected to q. If the pairs read so far imply that p is connected to q, so the program should not print anything and proceed to the next pair.
Below is the Union-Find API. It encapsulates the basic operations we need:
UF(int N) // initializes N items with integer names (0 to N-1)
void union(int p, int q) // add connection between p and q
int find(int p) // identifies the component of p (0 to N-1)
boolean connected(int p, int q) // returns true if p and q are in the same component
int count() // returns the number of componentsThe identifier of a component can change only from a call to the union method. This identifier cannot be changed by the find, connected or count methods.
To test the above API, the main() method in the UF2.java file resolves the problem of dynamic connectivity. The challenge here is to implement the other functions (find(), connected() and union()) to get the best results possible (in this case best results are printing the right pairs, counting the components and finishing in the shortest time possible).
Data to test the program is also available:
The tinyUF.txt file contains 11 connections and at the end is supposed to have 2 components The mediumUF.txt file contains 900 connections and at the end is supposed to have 3 components The largeUF.txt file contains millions of connections and at the end is supposed to have 6 components;
The algorithm is based on the following principles:
- Using the idea of the concept of pointers in C (in this case, an element of a group will "point" to another element that will be used as a "reference" within the group).
- Identifying which group each element belongs to does not need to be up to date all the time, but needs a mechanism that enables identification based on the element that serves as the group's reference
- Within a group, the reference value that serves as the group identification is the value of the smallest element in the group and therefore the implementation of the functions is always based on the smallest value.
- The array that is created and initialized in the algorithm is the very universe of elements and groups: the array indices represent each element and the array content at each position is an integer value representing which group such element is part of.
- For two elements to be connected, they just have to be in the same group, i.e.
id[p] == id[q](takingid[]as the int array that is created at the beginning of the algorithm).
Explanation of Methods (Functions):.
As we assume that it is not necessary to keep updated in the array the number that identifies which group that element is part of, but to have this value updated only when necessary (i.e. when the find(p) function is called), this function 'find()' implements logic to update this group information of the element being fetched. The only element that always needs to be up to date is the element being used as a reference in a group, because if it is no longer the reference (when that group is linked to a lower value element), the other elements of the group must have a way of knowing what the new group value is. Therefore, the main mechanisms in the algorithm are:
-
'
union(int p, int q)' method: checks which of the elements passed as an argument have the lowest group identifier value, so that it will be possible to change the identifying value of the right group (whether this is a unitary group or a group with multiple elements ). Before changing the group ID ofporqthis method makes the same change to the element that was referencing that element to be changed. For example: suppose there is a group with elements3and4and8. In this group the group identifier is3, soid[3] == 3,id[4] == 3andid[8] == 3. Assuming that a link between2(which belongs to its own unitary group2) and4is made, the method will verify that2has the smallest stored value and will change the reference to4(i.e.id[4]will be equal to2). But before doing so, the function changes the value of the reference element of4, i.e. theid[4]element, which has the value of the groupid[id[4]], which in this case would beid[3], ie3. Therefore, the element3that used to reference the group and "pointed to itself" will now point to2, i.e.id[3] == id[2]which is equal to2. Thus, element4had its value changed and the reference element also had its value changed. One detail is that the element8, which is part of the group has not yet been updated and will only be when it is requested to identify the group (in the case of this algorithm when thefind()method is executed). -
connected(int p, int q)method: Within the framework of this algorithm, the usage of this method will ensure that only unconnected pairs are connected. In this method there is an initial check to know which of the two numbers,porq, has the smallest group identification. If passed to the function in reverse, i.e.id[q] < id[p]the method swaps, as this ensures that elements with smaller component identifiers are always updated before larger ones because components are always referenced with the lowest value. -
find(int p)method - What this method does is to check if the group identifier of the element passed to the method is equal to the group identifier that the element is referencing (i.e. the element that is supposed to be the smallest in the group and therefore the one that identifies the group). If not, it means that the reference has been changed (due to some connection that has occurred) and the new group reference should be searched. For this happens, it calls the method again recursively, but then passing this value that was once the reference. When the method "arrives" at the group reference, it starts thereturnby passing the number of this current reference that updates all group elements. Here's an example:
Starting from the 10-position array, just after it is started, that is, the value of the array in each index is the index value itself (each element within its own unitary group), the following connections occur:
8and4:id[4] == 4andid[8] == 4;3and4:id[3] == 3andid[4] == 3;4and5:id[4] == 3andid[5] == 3;2and5:id[2] == 2andid[5] == 2and, in addition, theunionmethod causes the old reference of5to update as well, i.e.id[3] == 2. It is important to notice that the other elements of the group have not been updated. So at this moment we have:2,3,4,5, and8all in the same group (logically speaking), but in practice their group identifier values are as follows:id[2] == 2,id[3] == 2,id[ 4] == 3,id[5] == 2andid[8] == 4;- Assuming now that it is checked if the
8and2are connected. This will call theconnected(2, 8)method which will call thefindmethod for both numbers. When callingfind(2)the function verifies that2points to itself, i.e.id[2] == 2and thereforeid[id[2]] == 2and simply returns the value2. But when calling the methodfind()passing8as an argument, i.e.find(8)what happens is as follows: id[8] == 4and then compare withid[id[8]], i.e. compare withid[4].- Like
id[4] == 3and thereforeid[8]! = id[id[8]]the methodfind()is called again, but this time passingid[8]as an argument, i.e.4. - When calling
find(4), it checks thatid[4]! = id[id[4]], sinceid[4] == 3andid[id[4]](ieid[3]) is2. Therefore, it recursively calls thefindmethod, but this time passingid[4], i.e.3. - When calling
id[4](3in this case) the method checks that ifid[3] == id[id[3]], i.e.3is up to date with the group reference, which in this case is2. Thus, it starts the "cascade"returnsby returning2forid[4]and consequently forid[8]and thus updating the group values starting from that initial value (8) and checking smaller ones. Note: Assuming that before all the connections listed in this example, there had been a connection between8and9, i.e.id[9] == 8, whenfind(8)was called, the reference of9would not be updated, but, if afterfind(8)were executed afind(9)would be updated with only one iteration.
After reading the total value of components at the beginning of the algorithm, a count variable counts how many groups are left. Each time a connection is made, this value decreases by 1. Therefore, the control of the total components at the end is left to the count attribute.
At the end of the algorithm, there will logically be the number of groups indicated by count, but in practice the identification of groups could point to another value, that is, assuming that there are 3 components left (according to count) if the group indication was verified in each one would have more groups because group IDs only update when you need to know the group of an element. That is, if after the algorithm were asked which group each element is part of, the universe of this algorithm would have been updated as expected, i.e., in this hypothetical situation, there would be 3 components.
This algorithm is for analysis purposes, so it is not failsafe. This means that if it is not executed with a .txt file (like the ones in this repository), the standard input will be the keyboard and the program will stop working if char or string inputs differ from numbers or if a number beyond the array index is passed to make the union.
To run the code it is important to point one of the files provided as the stdin. To do this simple run in the terminal:
java UF2 < name_of_the_file.txt. For example: java UF2 < largeUF.txt