Hadoop DataNodes with Dynamic Storage using LVM

Prathamesh Mistry
5 min read · Jan 1, 2021

Hadoop is an open-source, Java-based framework used for storing and processing big data. It is used by tech giants such as Facebook, which runs one of the largest HDFS clusters in the world. This short article covers running DataNodes on dynamically allocated storage using Logical Volume Management (LVM).

A Hadoop cluster is a special type of computational cluster designed specifically for storing and analyzing huge amounts of unstructured data in a distributed computing environment. An HDFS cluster consists of a number of DataNodes and a NameNode, arranged in a master-slave topology.

Logical volume management (LVM) is a form of storage virtualization that offers system administrators a more flexible approach to managing disk storage space than traditional partitioning.

This article focuses on the practical steps for creating an HDFS cluster in which the DataNode's shared directory sits on a volume of a specific size. If needed, the size of that volume can be changed on the fly. This architecture gives near-zero downtime when the shared volume has to be grown or shrunk.

We will be using the Oracle VirtualBox hypervisor. For the purpose of the demonstration, we will create a single-DataNode architecture. The OS used for both the NameNode and the DataNode is Red Hat Enterprise Linux 8.

Attaching a Virtual Hard Drive to the DataNode

To create dynamic storage, let's attach a virtual hard drive to the DataNode. The disk we are attaching to the virtual machine is 100 GB.

Create a VDI and attach it to the DataNode

Next, we boot up the machine and check that the hard disk is visible.
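A quick way to confirm the new disk from inside the guest is lsblk (or fdisk -l); the /dev/sdb device name is an assumption, carried through the rest of this walkthrough.

$ lsblk /dev/sdb
$ sudo fdisk -l /dev/sdb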

Creating the Logical Volume

To create a logical volume we need to perform a series of steps:

  • Convert the storage device to a physical volume and display the created physical volume
$ sudo pvcreate /dev/sdb 
Physical volume "/dev/sdb" successfully created.
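To display the physical volume we just created, we can use pvdisplay (or the shorter pvs):

$ sudo pvdisplay /dev/sdb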
  • Creating and Displaying Volume Group
$ sudo vgcreate datanode_lv_vol /dev/sdb
Volume group "datanode_lv_vol" successfully created
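Similarly, the new volume group can be displayed with vgdisplay (or vgs):

$ sudo vgdisplay datanode_lv_vol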
  • Creating and Displaying Logical Volume

A logical volume is created by carving out a partition of the volume group. We can create any number of logical volumes from a volume group, unlike MBR-style physical partitioning, which allows at most four primary partitions.

$ sudo lvcreate --size 50G --name vol_01 datanode_lv_vol
Logical volume "vol_01" created.

If we now take a look at the Volume Group again, we can see that the Current LV attached is 1.
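The check itself is just another vgdisplay; the logical volume can also be inspected with lvdisplay:

$ sudo vgdisplay datanode_lv_vol | grep "Cur LV"
$ sudo lvdisplay /dev/datanode_lv_vol/vol_01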

  • Format the newly created Logical Volume.

In order to write data to any partition, we need to format it first. We will be using the ext4 file system to format the logical volume.

$ sudo mkfs.ext4 /dev/datanode_lv_vol/vol_01

Our Logical Volume is now ready to be mounted on the shared directory of the data node.

  • Create a directory and mount the hard-disk on the directory

It is assumed that the machine is being configured as a DataNode for the first time. Let's create a directory to store the data pushed by the client.

$ sudo mkdir -pv /servera_data
mkdir: created directory '/servera_data'
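The mount step itself is a single command, using the logical volume and directory created so far; adding a matching entry to /etc/fstab would make the mount persist across reboots.

$ sudo mount /dev/datanode_lv_vol/vol_01 /servera_data
$ df -h /servera_data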

Configure Hadoop to store data in the /servera_data directory

This is the configuration file that tells the DataNode where to store its data. Set the configuration value to the /servera_data directory.
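The property lives in hdfs-site.xml; on Hadoop 2.x and later it is called dfs.datanode.data.dir (dfs.data.dir on the older 1.x line). A minimal sketch of the snippet, assuming the directory created above:

<configuration>
  <property>
    <!-- local directory on the DataNode where HDFS blocks are stored -->
    <name>dfs.datanode.data.dir</name>
    <value>/servera_data</value>
  </property>
</configuration>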

And that’s it! Let’s connect the Datanode to the cluster.

$ sudo hadoop-daemon.sh start datanode
starting datanode, logging to /var/log/hadoop/servera/hadoop-servera-datanode-datanode.out

Checking the admin report of the cluster
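The report is produced by the dfsadmin tool (hadoop dfsadmin -report on Hadoop 1.x, hdfs dfsadmin -report on newer releases); it lists each live DataNode along with its configured and remaining capacity.

$ hadoop dfsadmin -report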

Let’s now try increasing the size of the volume to 80 GB.

  • Increase the size of the volume
$ sudo lvextend --size +30G /dev/datanode_lv_vol/vol_01
Size of logical volume datanode_lv_vol/vol_01 changed from 50.00 GiB (12800 extents) to 80.00 GiB (20480 extents).
Logical volume datanode_lv_vol/vol_01 successfully resized.
  • Resize the filesystem
$ sudo resize2fs /dev/datanode_lv_vol/vol_01 
resize2fs 1.44.3 (10-July-2018)
Filesystem at /dev/datanode_lv_vol/vol_01 is mounted on /servera_data; on-line resizing required
old_desc_blocks = 9, new_desc_blocks = 10
The filesystem on /dev/datanode_lv_vol/vol_01 is now 20971520 (4k) blocks long.

Let’s check the admin report again.

Notice that the size of the storage has increased by almost 30 GB without even restarting the service. The resize2fs command extends the filesystem onto the newly added, not-yet-written portion of the device, so none of the existing data is lost. We can also reduce the size of the volume in a similar fashion. This is how we create Hadoop DataNodes with dynamic storage. This method is very useful if we are not sure of the scale of the data that would be pushed by the client.
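For completeness, here is a sketch of shrinking the volume back down, assuming a hypothetical target size of 60 GB. Note that unlike growing, shrinking an ext4 filesystem requires unmounting it first, so this particular operation does involve a brief downtime for the DataNode:

$ sudo umount /servera_data
$ sudo e2fsck -f /dev/datanode_lv_vol/vol_01
$ sudo resize2fs /dev/datanode_lv_vol/vol_01 60G
$ sudo lvreduce --size 60G /dev/datanode_lv_vol/vol_01
$ sudo mount /dev/datanode_lv_vol/vol_01 /servera_data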

Thank You!
