
Hadoop 2.6.3: Dynamically Adding and Removing DataNodes

This guide assumes the cluster runs CentOS 6.7 x64 with Hadoop 2.6.3.

Dynamically Adding a DataNode

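Prepare the new DataNode machine and set up passwordless SSH trust by copying the authorized_keys and id_rsa files from the .ssh directory of an existing DataNode.
Copy the Hadoop installation directory (including its etc/hadoop configuration) to the new DataNode, and create the HDFS data directory and the temporary (tmp) directory there. Leave these two directories empty: a data directory copied from another DataNode carries that node's storage identity in its VERSION file, which conflicts with the original node.
Start the DataNode and NodeManager daemons on the new node: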
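The copy step might look like the sketch below. The function names copy_tree and make_empty_dirs, the host datanode4 and all paths are hypothetical; the empty directories must match whatever dfs.datanode.data.dir (hdfs-site.xml) and hadoop.tmp.dir (core-site.xml) point to on your cluster. It streams the installation through tar over ssh and creates the data and tmp directories fresh rather than cloning them.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): copy the Hadoop installation to a new
# DataNode and create EMPTY HDFS data / tmp directories on it.

# copy_tree SRC DEST [HOST]
# Streams SRC through tar into DEST. If HOST is given, DEST is created on
# HOST via ssh; otherwise DEST is a local path.
copy_tree() {
  local src=$1 dest=$2 host=${3:-}
  if [ -n "$host" ]; then
    tar -C "$src" -cf - . | ssh "$host" "mkdir -p '$dest' && tar -C '$dest' -xf -"
  else
    mkdir -p "$dest" && tar -C "$src" -cf - . | tar -C "$dest" -xf -
  fi
}

# make_empty_dirs HOST DIR...
# Creates the given directories on HOST without copying any block data,
# so the new DataNode registers with its own storage identity.
make_empty_dirs() {
  local host=$1
  shift
  ssh "$host" mkdir -p "$@"
}
```

Example call, run from an existing node with hypothetical paths: copy_tree /usr/local/hadoop2 /usr/local/hadoop2 datanode4, then make_empty_dirs datanode4 /data/hdfs/datanode /data/hadoop-tmp.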

./sbin/hadoop-daemon.sh start datanode
./sbin/yarn-daemon.sh start nodemanager
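To confirm that both daemons actually came up, you can check the JDK's jps tool on the new node, which lists running Java processes as "PID ClassName". The helper below is a hypothetical sketch that reads jps output on stdin and prints each expected daemon that is missing.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): read `jps` output on stdin and print, one per
# line, each expected daemon (DataNode, NodeManager) that is not running.
missing_daemons() {
  local out
  out=$(cat)
  for d in DataNode NodeManager; do
    # -w matches whole words only, so e.g. "SecondaryNameNode" never
    # satisfies a search for a shorter class name.
    printf '%s\n' "$out" | grep -qw -- "$d" || echo "$d"
  done
}
```

On the new node, jps | missing_daemons prints nothing when both daemons are running.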

On the NameNode, refresh the node list, then run the balancer so that existing blocks are redistributed onto the new DataNode:

./bin/hdfs dfsadmin -refreshNodes
./sbin/start-balancer.sh

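So that the new node can be resolved and is included in the next cluster startup, add its hostname and IP address to /etc/hosts on every node, and add its hostname to etc/hadoop/slaves on the master.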
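Because these files are edited by hand on several machines, an idempotent append avoids duplicate entries when the step is repeated. The sketch below uses a hypothetical add_line_once helper; the IP 192.168.1.14 and hostname datanode4 are placeholders. etc/hadoop/slaves is the file Hadoop 2.x start-dfs.sh and start-yarn.sh read to decide which worker nodes to start.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): append LINE to FILE only if an identical
# line is not already present, so reruns do not duplicate entries.
add_line_once() {
  local file=$1 line=$2
  touch "$file"
  # -x: whole-line match, -F: fixed string (dots in IPs are not regex wildcards)
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, as root on each node: add_line_once /etc/hosts "192.168.1.14 datanode4"; and on the master: add_line_once /usr/local/hadoop2/etc/hadoop/slaves datanode4.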

Dynamically Removing a DataNode

In the NameNode's hdfs-site.xml, add the dfs.hosts.exclude property pointing to an excludes file. If the DataNodes that remain would be fewer than the dfs.replication value, lower dfs.replication accordingly.

<property>
    <name>dfs.hosts.exclude</name>
    <value>/usr/local/hadoop2/etc/hadoop/excludes</value>
</property>

Create the excludes file at that path (/usr/local/hadoop2/etc/hadoop/excludes) and write the hostname or IP address of each DataNode to be removed, one per line.
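On the NameNode, refresh the node list so that it rereads the excludes file and starts decommissioning the listed DataNodes:

./bin/hdfs dfsadmin -refreshNodes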
In the NameNode web UI (http://<NameNode IP>:50070), the DataNode first shows as decommissioning while its blocks are re-replicated to the remaining nodes, and then as decommissioned.
Once it is fully decommissioned, stop the Hadoop daemons on that DataNode.
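Instead of watching the web UI, the decommissioning progress can also be followed from the command line with hdfs dfsadmin -report. The sketch below assumes the Hadoop 2.x report layout, where each DataNode section contains a "Hostname: <host>" line followed by "Decommission Status : <status>" (Normal, Decommission in progress, or Decommissioned). The helper names decom_status and wait_decommissioned and the 30-second interval are hypothetical, and HOST must match the hostname the NameNode reports.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers) for following decommissioning.
# Assumes the Hadoop 2.x `hdfs dfsadmin -report` layout:
#   Hostname: <host>
#   Decommission Status : <status>

# decom_status HOST: read report text on stdin, print HOST's status
# (prints nothing if HOST does not appear in the report).
decom_status() {
  awk -v host="$1" '
    /^Hostname: / { cur = $2 }
    /^Decommission Status : / && cur == host {
      sub(/^Decommission Status : /, "")
      print
      exit
    }'
}

# wait_decommissioned HOST [INTERVAL]: poll the NameNode until HOST
# reports "Decommissioned", sleeping INTERVAL seconds between checks.
wait_decommissioned() {
  local host=$1 interval=${2:-30}
  until [ "$(hdfs dfsadmin -report | decom_status "$host")" = "Decommissioned" ]; do
    sleep "$interval"
  done
}
```

After wait_decommissioned datanode4 returns, the daemons on that node can be stopped with ./sbin/hadoop-daemon.sh stop datanode and ./sbin/yarn-daemon.sh stop nodemanager.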
