B-Tree
Introduction
A B-Tree is a self-balancing multi-way search tree in which every node can contain multiple keys and multiple child nodes. Unlike a Binary Search Tree, where every node has at most two children, a B-Tree allows many children, making it highly efficient for handling very large datasets.
B-Trees are widely used in database management systems, file systems, and storage devices because they minimize disk accesses while maintaining sorted data.
B-Tree Illustration
Figure 1 : Structure of a B-Tree of Order 4
Why Do We Need a B-Tree?
When millions of records are stored in databases, ordinary Binary Search Trees become inefficient because their height may increase significantly. A B-Tree stores multiple keys in one node, reducing the tree height and minimizing disk reads.
| Binary Search Tree | B-Tree |
|---|---|
| Maximum 2 children. | Multiple children. |
| Large height. | Small height. |
| Used mainly in memory. | Used for external storage. |
| Less efficient for databases. | Ideal for databases. |
Properties of a B-Tree
| Property | Description |
|---|---|
| Balanced Tree | All leaf nodes are at the same level. |
| Multiple Keys | Each node may contain multiple keys. |
| Multiple Children | A node may have more than two child nodes. |
| Sorted Keys | Keys inside every node remain sorted. |
| Efficient Operations | Searching, insertion and deletion require logarithmic time. |
Terminology
| Term | Description |
|---|---|
| Order (m) | Maximum number of child nodes allowed. |
| Root Node | The topmost node of the B-Tree. |
| Internal Node | A node having one or more child nodes. |
| Leaf Node | A node without children. |
| Key | A value stored inside a node. |
Solved Example
Construct a B-Tree of Order 4 by inserting the following keys:
Initially, the tree is empty. Keys are inserted one by one. Whenever a node becomes full, it is split into two nodes and the middle key is promoted to the parent node. This process continues until all keys are inserted while maintaining the balanced structure of the B-Tree.
Working Principle
- Search the correct leaf node.
- Insert the new key in sorted order.
- If the node overflows, split the node.
- Move the middle key to the parent.
- If the parent also overflows, continue splitting upward.
- The root may split and create a new level.
B-Tree Insertion Algorithm
Insert(Key)
Step 1 : Search the appropriate leaf node.
Step 2 : Insert the key in sorted order.
Step 3 : If the node is not full,
Insertion is complete.
Step 4 : If the node becomes full,
Split the node into two parts.
Step 5 : Move the middle key to the parent node.
Step 6 : Repeat the splitting process if the parent also overflows.
Step 7 : If the root splits,
Create a new root.
Searching in a B-Tree
Searching starts from the root node. The required key is compared with all keys present in the current node. If the key is found, the search terminates. Otherwise, the algorithm follows the appropriate child pointer and repeats the process until the key is found or a leaf node is reached.
Insertion Example
| Step | Operation |
|---|---|
| 1 | Insert 10. |
| 2 | Insert 20. |
| 3 | Insert 5. |
| 4 | Insert 6 (Node Overflow). |
| 5 | Split the node and promote key 10. |
| 6 | Continue inserting remaining keys. |
Node Splitting Illustration
Figure 2 : Splitting a Full Node in B-Tree
Searching Example
Suppose the key 17 is to be searched.
- Start from the Root node.
- Compare 17 with available keys.
- Move to the appropriate child.
- Continue until the key is found.
- Searching finishes successfully.
Complexity Analysis
| Operation | Time Complexity | Explanation |
|---|---|---|
| Searching | O(log n) | Tree height remains very small. |
| Insertion | O(log n) | May require node splitting. |
| Deletion | O(log n) | May require merge or borrowing. |
| Traversal | O(n) | Visits every key. |
| Space Complexity | O(n) | Storage for all keys. |
Advantages of B-Tree
- Maintains a balanced tree automatically.
- Supports efficient searching, insertion and deletion.
- Reduces disk accesses.
- Suitable for storing millions of records.
- Height remains very small even for huge datasets.
Limitations of B-Tree
- Implementation is more complex than Binary Search Tree.
- Node splitting and merging increase implementation complexity.
- Consumes extra memory for child pointers.
- Not suitable for very small datasets.
Applications of B-Tree
| Application | Description |
|---|---|
| Database Systems | Database indexing. |
| File Systems | NTFS, FAT and Ext file systems. |
| Search Engines | Large-scale indexing. |
| Cloud Storage | Fast data retrieval. |
| Operating Systems | Directory management. |
B-Tree vs Binary Search Tree
| B-Tree | Binary Search Tree |
|---|---|
| Multi-way tree. | Binary tree. |
| Multiple keys per node. | One key per node. |
| Balanced. | May become skewed. |
| Small height. | Height depends upon insertion order. |
| Preferred for databases. | Preferred for in-memory searching. |
B-Tree vs B+ Tree
| B-Tree | B+ Tree |
|---|---|
| Keys and records stored in all nodes. | Records stored only in leaf nodes. |
| Leaf nodes are not linked. | Leaf nodes are linked together. |
| Searching is efficient. | Range searching is more efficient. |
| Used in databases. | Widely used in modern databases. |
Summary
A B-Tree is a balanced multi-way search tree that stores multiple keys in each node. It minimizes tree height and significantly reduces disk access, making it highly suitable for database management systems and file systems. By splitting overflowing nodes and promoting middle keys, the B-Tree maintains balance and guarantees efficient searching, insertion and deletion in O(log n) time.