Data structures are a specialized means of organizing and storing data on computers in such a way that we can perform operations on stored data more efficiently. Data structures have a wide and diverse scope of use in the fields of computer science and software engineering.
Data structures are used in almost every software program or system that has been developed. In addition, data structures are included in the fundamentals of computer science and software engineering. It’s a key issue when it comes to Software Engineering interview questions. Therefore, as developers, we must have a good knowledge about data structures.
In this article, I will briefly explain 8 commonly used data structures that every programmer should know.
An array is a fixed-size structure, which can contain elements of the same data type. It can be an array of integers, an array of floating-point numbers, an array of strings, or even an array of arrays (such as 2-dimensional arrays). The arrays are indexed, which means that random access is possible.
- Traverse: Walk through the elements and print them
- Search: Searches for an element in the array. You can search
- : Update the value of an existing item at a particular index
.
for the item by its value or its index Update
Inserting elements into an array and deleting elements from an array cannot be done immediately, because arrays have a fixed size. If you want to insert an element into an array, you will first need to create a new array with larger size (current size + 1), copy the existing elements, and add the new element. The same goes for deleting with a new small-sized array.
Matrix Applications
It is used as
- building blocks to construct other data structures, such as lists of arrays, heaps, hash tables, vectors, and arrays
- It is used for different sorting algorithms, such as insertion sorting, quick sorting, bubble sorting, and merge sorting.
.
A linked list It is a sequential structure consisting of a sequence of elements in linear order that are linked together. Therefore, you must access the data sequentially and random access is not possible. Linked lists provide a simple and flexible representation of dynamic sets.
Consider the following terms with respect to linked lists. You can get a clear idea by referring to Figure 2.
Items in a linked list are known as
- nodes
- contains a key and a pointer to its successor node, known as next
- attribute named head points to the first
- The last item in the linked list is known as the queue.
. Each node
. The
item in the linked list.
Below are the various types of linked lists available
. Individually linked list
- : The traversal of items can only be done in the forward direction.
- Doubly linked list : the path of the elements can be done both forward and backward. The nodes consist of an additional pointer known as a prev, which points to the previous node.
- Linked circular lists: Linked lists where the front pointer of the head points to the tail and the next pointer of the tail points to the head.
Linked list operations
Find: Finds the first item with the key k in the given linked
- list using a simple linear lookup and returns a pointer to this item Insert: Inserts a key into the linked
- list. An insertion can be done in 3 different ways; Insert at the beginning of the list, insert at the end of the list, and insert in the middle of the list.
- Delete: Deletes an x item from a given linked list. You cannot delete a node in one step. A removal can be done in 3 different ways; Delete from top of list, delete from bottom of list, and delete from center list.
Linked list applications
It
- is used for the management of symbol tables in the design of the compiler. It
- used to switch between programs using Alt + Tab (implemented using Circular Linked List).
is
A stack is a LIFO (Last In First Out) structure that can be commonly found in many programming languages. This structure is called a “stack” because it resembles a real-world stack, a stack of plates.
Stack
operations
Below are the 2 basic operations that can be performed on a stack. See Figure 3 to better understand stack operations.
- Push: Inserts an item at
- Pop: Delete the top item and return it.
the top of the stack.
In addition, the following additional functions are provided for a stack to check its status
.
- Peek: Returns the top item in the stack without deleting it
- isEmpty: Check if the stack is empty
- isFull: Check if the stack is full.
.
.
Stack applications
It is used for the evaluation of expressions (for
- example: yard algorithm for analyzing and evaluating mathematical expressions).
- It is used to implement function calls in recursion
programming.
A queue is a FIFO (First In First Out) structure that can be commonly found in many programming languages. This structure is called a “queue” because it resembles a real-world queue: people waiting in a queue.
operations
Below are the 2 basic operations that can be performed on a queue. See Figure 4 to better understand queue operations.
- Enqueue: Inserts an item at the end of the queue. Dequeue:
- Deletes the item from the beginning of the queue.
Applications
- It is used to manage threads in multithreading
- Used to implement queuing systems (for example: priority queues).
.
A hash table is a data structure that stores values that have keys associated with each of them. In addition, it supports search efficiently if we know the key associated with the value. Therefore, it is very efficient in insertion and search, regardless of the size of the data.
Direct addressing uses one-to-one mapping between values and keys when stored in a table. However, there is a problem with this approach when there are a large number of key-value pairs. The table will be huge with so many registers and may be impractical or even impossible to store, given the memory available on a typical computer. To avoid this problem we use hash tables.
Hash
function
A special function called a hash function (h) is used to overcome the problem mentioned above in direct addressing
.
In the shortcut, a value with a k key is stored in the k slot. Using the hash function, we calculate the index of the table (slot) to which each value goes. The value calculated using the hash function for a given key is called the hash value that indicates the index of the table to which the value is assigned.
h(k) = k %
m
- h: Hash function
- k: Key whose hash value must be determined
- m: Size of the hash table (number of available slots). A prime value that is not close to an exact power of 2 is a good choice for m.
)Consider the hash
function h(k) = k % 20, where the size of the hash table is 20. Given a set of keys, we want to calculate the hash value of each to determine the index where it should go in the hash table. Consider that we have the following keys, the hash and the index of the hash table.
- 1 → 1%20 → 1
- 5 → 5%20 → 5
- 23 → 23%20 → 3
- 63 → 63%20 → 3
From the last two examples given above, we can see that collision can arise when the hash function generates the same index for more than one key. We can solve collisions by selecting a suitable h hash function and using techniques such as chaining and open addressing.
Hash table applications
Used to implement
- database indexes
- associative arrays
- Used to implement the “set”
. Used to implement
.
data structure.
A tree is a hierarchical structure where data is organized hierarchically and linked together. This structure is different from a linked list, whereas in a linked list, items are linked in a linear order.
Several types of trees have been developed over the last decades, in order to adapt to certain applications and meet certain limitations. Examples include the binary search tree, the B-tree, the treap, the red-black tree, the splay tree, the AVL tree, and the n-ary tree.
A
binary search tree (BST), as the name suggests, is a binary tree where data is organized into a hierarchical structure. This data structure stores values in ordered order.
Each node in a binary search tree comprises the following attributes.
key: The value stored in the node. left: the pointer to the left child. right: the pointer to the
- correct child.
- p: the pointer to the parent node
- .
A binary search tree exhibits a unique property that distinguishes it from other trees. This property is known as the binary-search-tree property.
Let x be a node in a binary search tree.
If y is a node in the left subtree of x, then y.key ≤ x.key
- If y is a node in the right subtree of x, then
- y.key ≥ x.key
Applications
Binary trees: Used
- to implement expression parsers and expression solvers.Binary
- tree: Used in many search applications where data is constantly coming in and out.
- Heaps – Used by JVM (Java Virtual Machine) to store Java objects.
- Treaps: used in wireless networks.
search
Check out my articles below on 8 useful tree data structures and self-balancing binary search trees.
A lot is a special case of a binary tree where parent nodes are compared with their children with their values and organized accordingly.
Let’s see how we can represent heaps. Heaps can be represented using trees and matrices. Figures 7 and 8 show how we can represent a binary heap using a binary tree and a matrix.
Heaps can be of 2 types
.
- Min Heap — the father’s key is less than or equal to that of his children. This is called the min-heap property. The root will contain the minimum value of the heap.
- Max Heap — the father’s key is greater than or equal to those of his children. This is called the max-heap property. The root will contain the maximum value of the heap.
- Heaps Applications Used in the heapsort algorithm
- It is used to implement priority queues, because priority values can be sorted according to the heap property where the heap can be deployed using an array
- Queue functions can be implemented using heaps within O(log n) time.
- It is used to find the smallest (or largest) k-th value in a given matrix.
.
.
See my article below on implementing a heap using the python heapq module.
A graph consists of a finite set of vertices or nodes and a set of edges connecting these vertices.
The order of a graph is the number of vertices in the graph. The size of a chart is the number of edges in the chart.
Two nodes are said to be adjacent if they are connected to each other by the same edge.
Directed graphs
A graph G is said to
be a directed graph if all its edges have a direction indicating
which is the initial vertex and which is the final vertex.
We say that (u, v) is incident from or leaves vertex you and is incident to or enters vertex v.
Auto loops: edges of a vertex to itself.
Undirected graphs
A graph G is said to be an undirected graph if all its edges have no direction. It can go both ways between the two vertices.
If a vertex is not connected to any other node in the graph, it is said to be isolated.
You can read more about charting algorithms in my article 10 Graphing Algorithms Explained Visually
.
Graphics applications
- It is used to represent social networks. Each user is a vertex, and when users connect they create an edge.
- It is used to represent web pages and links by search engines. Web pages on the Internet are linked to each other by hyperlinks. Each page is a vertex, and the hyperlink between two pages is an edge. It is used for page ranking in Google.
- It is used to represent locations and routes in GPS. Locations are vertices, and the paths connecting locations are edges. It is used to calculate the shortest route between two locations.
A cheat sheet for the time complexities of data structure operations can be found at this link. Also, check out my article below where I’ve implemented some common data structures from scratch using C++.
Finally, I would like to thank Mr. A Alkaff Ahamed for providing valuable comments and suggestions to improve this article.
I hope you found this article useful as a simple introduction to data structures. I’d love to hear your thoughts. 😇
Thank you very much for reading.
Cheers! 😃
[
1] Introduction to Algorithms, Third Edition by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein.
😊
[2] List of Wikipedia data structures (https://en.wikipedia.org/wiki/List_of_data_structures)