
Hadoop 2.6.3: Dynamically Adding and Removing a DataNode


This guide assumes the cluster runs CentOS 6.7 x64 and Hadoop 2.6.3.

Dynamically Adding a DataNode

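  • Prepare a new DataNode machine and set up SSH trust by copying the authorized_keys and id_rsa files from the .ssh directory of an existing DataNode.
  • Copy the Hadoop installation directory, the HDFS directory, and the temporary (tmp) directory to the new DataNode. Empty the copied HDFS data and tmp directories before the first start; otherwise the new node reuses the DataNode UUID stored in the source node's current/VERSION file and conflicts with it.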
  • Start Hadoop on the new DataNode.
./sbin/hadoop-daemon.sh start datanode
./sbin/yarn-daemon.sh start nodemanager
  • On the NameNode, refresh the node list, then start the balancer so that existing blocks are redistributed onto the new node.
./bin/hdfs dfsadmin -refreshNodes
./sbin/start-balancer.sh
  • To simplify the next startup, add the hostname and IP address of the new DataNode to /etc/hosts.
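
The steps above can be sketched as a small set of shell functions. This is a minimal sketch, not a definitive script: HADOOP_HOME defaulting to /usr/local/hadoop2 is taken from the path used later in this article, while the DRY_RUN switch, the function names, and the example host dn4 (192.168.1.14) are hypothetical names introduced for illustration. With DRY_RUN=1 the Hadoop commands are only printed, so the sequence can be reviewed before it is run for real.

```shell
#!/usr/bin/env bash
# Sketch: bring a new worker into a running Hadoop 2.6.x cluster.
# HADOOP_HOME is an assumption; point it at your own installation.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop2}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run on the new DataNode machine: start the HDFS and YARN worker daemons.
start_worker() {
    run "$HADOOP_HOME/sbin/hadoop-daemon.sh" start datanode
    run "$HADOOP_HOME/sbin/yarn-daemon.sh" start nodemanager
}

# Run on the NameNode: reread the node lists, then rebalance blocks.
register_and_balance() {
    run "$HADOOP_HOME/bin/hdfs" dfsadmin -refreshNodes
    run "$HADOOP_HOME/sbin/start-balancer.sh"
}

# Append "IP hostname" to a hosts file unless the hostname is already listed.
# Comment lines are ignored; only fields after the IP column are compared.
add_host_entry() {
    local hosts_file="$1" ip="$2" name="$3"
    if ! awk -v n="$name" '
            $1 ~ /^#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}
```

A typical sequence would be start_worker on the new machine, register_and_balance on the NameNode, and add_host_entry /etc/hosts 192.168.1.14 dn4 on each node that needs to resolve the new host. Because add_host_entry checks for an existing entry first, running it twice does not duplicate the line.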

Dynamically Removing a DataNode

  • Edit hdfs-site.xml on the NameNode: add the dfs.hosts.exclude property, and lower dfs.replication if removing the node would leave fewer DataNodes than replicas, since decommissioning cannot complete in that case.
<property>
    <name>dfs.hosts.exclude</name>
    <value>/usr/local/hadoop2/etc/hadoop/excludes</value>
</property>
  • Create the excludes file at the configured path (/usr/local/hadoop2/etc/hadoop/excludes) and write the IP or hostname of each DataNode to be removed, one per line.
  • Refresh the node list on the NameNode so that it rereads the excludes file.
./bin/hdfs dfsadmin -refreshNodes
  • At this point, the NameNode web UI (IP:50070) shows the DataNode as Decommission In Progress while its blocks are copied to other nodes, and then as Decommissioned.
  • Once the DataNode is reported as Decommissioned, stop the Hadoop services on it. It is listed as Dead after that.
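
The removal steps can be sketched in the same way. This is a hedged sketch under stated assumptions: the function names, the DRY_RUN switch, and the example host dn4 are hypothetical, and decommission_status assumes that the output of hdfs dfsadmin -report contains a "Hostname:" line followed by a "Decommission Status :" line for each node, which is the layout printed by Hadoop 2.x.

```shell
#!/usr/bin/env bash
# Sketch: decommission one DataNode, driven from the NameNode host.
# HADOOP_HOME and EXCLUDES follow the paths in the hdfs-site.xml example.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop2}"
EXCLUDES="${EXCLUDES:-$HADOOP_HOME/etc/hadoop/excludes}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print Hadoop commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Add a host to the excludes file once (one host per line).
exclude_host() {
    local file="$1" host="$2"
    touch "$file"
    if ! grep -qxF -- "$host" "$file"; then
        echo "$host" >> "$file"
    fi
}

# Ask the NameNode to reread its include/exclude files.
refresh_nodes() {
    run "$HADOOP_HOME/bin/hdfs" dfsadmin -refreshNodes
}

# Read `hdfs dfsadmin -report` text on stdin and print the decommission
# status of one host (assumed report layout, see the note above).
decommission_status() {
    awk -v h="$1" '
        $1 == "Hostname:" { current = $2 }
        /^Decommission Status/ && current == h {
            sub(/^[^:]*:[ \t]*/, "")   # drop the "Decommission Status : " label
            print
            exit
        }'
}

# Run on the DataNode itself once it is reported as Decommissioned.
stop_worker() {
    run "$HADOOP_HOME/sbin/hadoop-daemon.sh" stop datanode
    run "$HADOOP_HOME/sbin/yarn-daemon.sh" stop nodemanager
}
```

On the NameNode, a run might call exclude_host "$EXCLUDES" dn4 and refresh_nodes, then pipe "$HADOOP_HOME/bin/hdfs" dfsadmin -report into decommission_status dn4 until it prints Decommissioned. Only then would stop_worker be run on dn4, so that no block loses its last replica.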