Friday, March 13, 2015

Changing Storage in Hadoop

After installing an HDP 2.1.2 cluster, I noticed that none of the nodes were using the drive partition planned for storage. Each Linux box had two partitions assigned during the OS install: one for the OS and one for data storage.

The data partition was not picked up during cluster installation, most likely because it was not mounted. Below are the steps I performed to change the HDFS storage location, along with the drive configuration needed.

First, format and tune the partition or drive
mkfs -t ext4 -m 1 -O dir_index,extent,sparse_super /dev/sdb
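
As a quick check, tune2fs can dump the superblock to confirm the features took effect (assuming /dev/sdb is the device formatted above)
tune2fs -l /dev/sdb | grep -i "features"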

Create a mount directory
mkdir -p /disk/sdb1

Mount with optimized settings (mount options go after the -o flag)
mount -o noatime,nodiratime /dev/sdb /disk/sdb1
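
To verify the mount is active and picked up the options
mount | grep /disk/sdb1
df -h /disk/sdb1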

Append an entry to the fstab file so the partition is mounted on every boot (critical: without it, HDFS would silently write to the root partition after a reboot)
echo "/dev/sdb /disk/sdb1 ext4 defaults,noatime,nodiratime 1 2" >> /etc/fstab

Add a folder for HDFS DataNode data
mkdir -p /disk/sdb1/data

Location to store Namenode data
mkdir -p /disk/sdb1/hdfs/namenode

Location to store Secondary NameNode checkpoint data
mkdir -p /disk/sdb1/hdfs/namesecondary

Set these in hdfs-site.xml or through Ambari (note that the Secondary NameNode location is its own property, dfs.namenode.checkpoint.dir, not a second dfs.namenode.name.dir entry)
dfs.namenode.name.dir = /disk/sdb1/hdfs/namenode
dfs.namenode.checkpoint.dir = /disk/sdb1/hdfs/namesecondary
dfs.datanode.data.dir = /disk/sdb1/data
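
For reference, when editing hdfs-site.xml by hand rather than through Ambari, the same settings take the standard property form (paths as created above)
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/disk/sdb1/hdfs/namenode</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>/disk/sdb1/hdfs/namesecondary</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/disk/sdb1/data</value>
</property>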

Set ownership so the hdfs user can write to the DataNode directory
sudo chown -R hdfs:hadoop /disk/sdb1/data
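
The NameNode and Secondary NameNode directories created earlier need the same ownership, since those daemons also run as the hdfs user
sudo chown -R hdfs:hadoop /disk/sdb1/hdfs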

Format the NameNode. Only do this on a fresh cluster, since formatting erases any existing HDFS metadata (hdfs namenode -format is the non-deprecated equivalent)
hadoop namenode -format

Start the NameNode through Ambari, or from the CLI as the hdfs user
hadoop-daemon.sh start namenode

Start all nodes and services. The new drive should now be listed as DataNode storage.
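
To confirm from the command line, the dfsadmin report prints the configured capacity per DataNode, which should now reflect the new drive
sudo -u hdfs hdfs dfsadmin -report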

References:
http://www.slideshare.net/leonsp/best-practices-for-deploying-hadoop-biginsights-in-the-cloud

