5/31/19
5/10/19
Simplified Algorithm to find largest Color Blob in a Grid (in Python and C++)
I have taught how to use algorithms and data structures for years as a professor. And many students think that just learning about algorithm and data structure is about choosing the correct implementation of a queue, search, standard graph algorithm, etc. from the either a library or canned code they learned in their class.
The best students can use a deep knowledge of how each algorithm and data structure works and then design variations or combinations of the original structures and algorithms to solve a problem. This shows an real application of knowledge and the ability to think out of the box that interviewers look for.
This article will show how I created a custom algorithm that combines a 2d array, and linked list, and some other 'tricks' to solve the problem of given a 2d grid where the cells of the grid are colored(based in integer in grid). I believe this is a very efficient, and with a small amount of code to solve this problem.
I was inspired to develop this method after the article: Bet you can't solve this Google interview question! by Kevin Ghadyani and then also by the Geeksforgeeks article: Largest connected component on a grid .
The Problem:
Find the size of the largest contiguous color area (the number cells in the color blob). For example given this grid with 4 different colors
[[1, 4, 4, 4, 4, 3, 3, 1],
[2, 1, 1, 4, 3, 3, 1, 1],
[3, 2, 1, 1, 2, 3, 2, 1],
[3, 3, 2, 1, 2, 2, 2, 2],
[3, 1, 3, 1, 1, 4, 4, 4],
[1, 1, 3, 1, 1, 4, 4, 4]]
... which represents this colored grid:
You can see the biggest color group are the underlined 1's, there are 9 of them
The approach:
There are a lot of ways to approach this problem. Most articles will create a graph and do depth first search recursivly or bredth first search. You need to keep track of what cells have been visited and have a way to pick the next un-processed cell.
When every you are working a problem with a grid that is represented by a 2d array, you don't need to use a generalized graph alorithm such as a graph library that keeps a map of nodes and list of adjacent nodes. The indexes to the 2d array serve as a constant time lookup, and the adjacentcy list is just the four directions left, right, top or bottom limited just limited at the outer bounds of the gird. You do need an effencient way to keep track of visited and to pick a unvisited cell to continue. You can easily do this by having each cell(node) have composit data of not just the color number, but a visited flag.
Below is both a python 3.6 version and a C++ version:
Python 3.6:
""" | |
Finding largest contiguous region of same color (or integer value) in grid. | |
bfs is run to find all connected cells for each region, | |
each cell is linked into a double linked list builtin to grid and counted(visited) flag | |
""" | |
class Node: | |
def __init__(self, x, y, color, prev, next): | |
self.x = x | |
self.y = y | |
self.color = color | |
self.prev = prev # double linked list of unvisited cells | |
self.next = next # so we can choose next cell from this list | |
self.counted = False # keeps track of visited cells | |
class LargestColorBlob: | |
def __init__(self, grid): | |
self.n = len(grid) | |
self.m = len(grid[0]) | |
self.head = None | |
self.tail = None | |
self.grid = [] | |
prev = None | |
for x in range(self.n): # build grid and double linked list | |
self.grid.append(list()) | |
for y in range(self.m): | |
self.grid[-1].append(Node(x, y, grid[x][y], prev, None)) | |
n = self.grid[x][y] | |
if prev: | |
prev.next = n | |
prev = n | |
self.head = self.grid[0][0] | |
self.tail = self.grid[-1][-1] | |
def delete_linked(self, n): # delete node in double linked list | |
before = n.prev | |
after = n.next | |
if before: | |
before.next = after | |
else: | |
self.head = after | |
if after: | |
after.prev = before | |
else: | |
self.tail = before | |
# note that if you are solving such large grids that the stack would overflow, then you can change | |
# to a non-recursive dfs using a list to hold the adjacent cells to be explored from. | |
# in fact you could unlink the cell from the big double linked list and string them together to form the list | |
def dfs(self, n, c): # dfs to find all connected colors and count them | |
if n.color != c or n.counted: | |
return 0 | |
count = 1 | |
self.delete_linked(n) | |
n.counted = True | |
# explore adjacent cells, not edge conditions are checked before | |
if n.x > 0: count += self.dfs(self.grid[n.x-1][n.y], c) | |
if n.x < self.n-1: count += self.dfs(self.grid[n.x+1][n.y], c) | |
if n.y > 0: count += self.dfs(self.grid[n.x][n.y-1], c) | |
if n.y < self.m-1: count += self.dfs(self.grid[n.x][n.y+1], c) | |
return count | |
def biggest_blob(self): | |
max_blob = 0 # 1 for starting cell | |
while self.head: # each time though loop will collect a color blob count | |
m = self.dfs(self.head, self.head.color) # do next item in not counted list | |
max_blob = max(m, max_blob) # only need to keep track of max count | |
return max_blob | |
data = [[1, 4, 4, 4, 4, 3, 3, 1], | |
[2, 1, 1, 4, 3, 3, 1, 1], | |
[3, 2, 1, 1, 2, 3, 2, 1], | |
[3, 3, 2, 1, 1, 2, 2, 2], | |
[3, 1, 3, 1, 1, 2, 4, 4], | |
[1, 1, 3, 1, 1, 4, 4, 4]] | |
lcb = LargestColorBlob(data) | |
print("largest size: ", lcb.biggest_blob()) | |
C++:
#include <iostream> | |
using namespace std; | |
// Finding largest contiguous region in grid or same color in value. | |
// bfs is used run to find all connected cells for each region, | |
// each cell is linked into a double linked list builtin to grid and counted(visited) flag | |
// this also shows how to do it outside of the standard C++ libraries for data structures | |
// which in mostly a exercise in learning to develop customize implementations of algorithms that | |
// can be a little faster. Not in a big O way but just less instructions cycles or memory. | |
// I could have writen a vector based version, but instead I keep the nodes for the grid | |
// in a linear array and use a macro I(X,Y) to calculate the linear index from x and y indexes | |
struct Node { | |
int color; | |
int x, y; | |
Node *next; | |
Node *prev; | |
bool visited; | |
}; | |
struct G { | |
int n, m; | |
Node *head; | |
Node *tail; | |
Node *grid; | |
} g; | |
// macro for indexing into g.grid | |
#define I(X,Y) (Y)+g.m*(X) | |
void setup(int a[], int n, int m) { | |
g.n = n; | |
g.m = m; | |
g.grid = new Node[n*m]; // index as [x*g.m + y] with I(X,Y) macro | |
Node * prev = NULL; | |
for(int x = 0; x < n; x++) | |
for(int y = 0; y < m; y++) { // build grid of nodes and linked list | |
Node* node = &g.grid[I(x,y)]; | |
node->color = a[I(x,y)]; | |
node->prev = prev; | |
node->x = x; | |
node->y = y; | |
if(prev != NULL) prev->next = node; | |
prev = node; | |
} | |
g.head = &g.grid[I(0,0)]; | |
g.tail = &g.grid[I(n-1,m-1)]; | |
}; | |
void delete_linked(Node* n) { | |
Node *before = n->prev; | |
Node *after= n->next; | |
if(before != NULL) { | |
before->next = after; | |
} else { | |
g.head = after; | |
} | |
if(after != NULL) { | |
after->prev = before; | |
} else { | |
g.tail = before; | |
} | |
} | |
int dfs(Node* node, int color) { // find count of color blob starting at node | |
if(node->color != color || node->visited) { | |
return 0; | |
} | |
delete_linked(node); | |
node->visited = true; | |
int count = 1; | |
// explore adjacent cells filter if at edge of grid | |
if(node->x > 0) | |
count += dfs(&g.grid[I( node->x-1, node->y)], color); | |
if(node->x < g.n-1) | |
count += dfs(&g.grid[I( node->x+1, node->y)], color); | |
if(node->y > 0) | |
count += dfs(&g.grid[I( node->x, node->y-1)], color); | |
if(node->y < g.m-1) | |
count += dfs(&g.grid[I( node->x, node->y+1)], color); | |
return count; | |
} | |
int max_color() { | |
int max_blob = 0; // biggest blob size | |
while(g.head != NULL) { | |
int blob_max = dfs(g.head, g.head->color); // current blob size | |
if(blob_max > max_blob) max_blob = blob_max; | |
} | |
return max_blob; | |
} | |
int main() { | |
int n = 6, m = 8; | |
int a[] = { // note that I am using linear array to hold grid! | |
1, 4, 4, 4, 4, 3, 3, 1, | |
2, 1, 1, 4, 3, 3, 1, 1, | |
3, 2, 1, 1, 2, 3, 2, 1, | |
3, 3, 2, 1, 1, 2, 2, 2, | |
3, 1, 3, 1, 1, 2, 4, 4, | |
1, 1, 3, 1, 1, 4, 4, 4 | |
}; | |
setup(a,n,m); | |
std::cout << "max: " << max_color() << std::endl; | |
} |
Hope you enjoyed this. -- Professor Jenkins
Check out my youtube channel https://www.youtube.com/user/gjenkinslbcc