
Hadoop 2.6.3 dynamically add/remove DataNode


This guide assumes the cluster's operating system is CentOS 6.7 x64 and the Hadoop version is 2.6.3.

Dynamically Adding a DataNode

  • Prepare the new DataNode machine and set up passwordless SSH; the simplest way is to copy the authorized_keys and id_rsa files from the .ssh directory of an existing DataNode.
  • Copy the Hadoop installation directory, the HDFS data directory, and the tmp directory to the new DataNode.
  • Start the DataNode daemon on the new machine.
./sbin/hadoop-daemon.sh start datanode
  • Refresh the nodes on the NameNode.
./bin/hdfs dfsadmin -refreshNodes
  • To facilitate the next startup, add the domain name and IP of the new DataNode to /etc/hosts.
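The /etc/hosts update can be made idempotent so repeated runs never duplicate an entry. A minimal sketch, assuming a hypothetical new node 192.168.1.100 named datanode4 (both placeholders); it writes to a local demo file so it is safe to try anywhere, but on a real cluster you would target /etc/hosts:

```shell
# Placeholders for the new DataNode; substitute your real values.
NEW_IP="192.168.1.100"
NEW_HOST="datanode4"
HOSTS_FILE="hosts.demo"   # use /etc/hosts on the actual cluster nodes

# Seed a demo hosts file (on a real node the file already exists).
printf '127.0.0.1 localhost\n' > "$HOSTS_FILE"

# Append the mapping only if the hostname is not already present (idempotent).
grep -qw "$NEW_HOST" "$HOSTS_FILE" || echo "$NEW_IP $NEW_HOST" >> "$HOSTS_FILE"
# A second run is a no-op, so the entry is never duplicated.
grep -qw "$NEW_HOST" "$HOSTS_FILE" || echo "$NEW_IP $NEW_HOST" >> "$HOSTS_FILE"
```

The guard avoids the classic failure of blind `>>` appends accumulating duplicate lines across repeated node additions.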

Dynamically Removing a DataNode

  • On the NameNode, edit hdfs-site.xml: lower dfs.replication if necessary (it must not exceed the number of DataNodes that will remain), and add the dfs.hosts.exclude property.
  • Create an excludes file under the corresponding path (/etc/hadoop/) and write the IP or hostname of each DataNode to be removed, one per line.
  • Refresh the nodes on the NameNode.
./bin/hdfs dfsadmin -refreshNodes
  • You can then watch the node decommission and eventually show as Dead on the NameNode web UI (http://IP:50070).
  • Once the DataNode is completely Dead, stop the Hadoop service on it.
./sbin/hadoop-daemon.sh stop datanode
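The removal steps above can be sketched end to end. A minimal sketch, assuming the Hadoop config directory is etc/hadoop/ and the node to retire is a hypothetical datanode3 (both placeholders); the cluster-side commands are shown as comments because they require a live NameNode:

```shell
CONF_DIR="etc/hadoop"     # Hadoop config directory (placeholder path)
DECOM_HOST="datanode3"    # DataNode to decommission (hypothetical name)

mkdir -p "$CONF_DIR"

# One hostname or IP per line in the excludes file.
echo "$DECOM_HOST" >> "$CONF_DIR/excludes"

# hdfs-site.xml must point at this file via dfs.hosts.exclude;
# this writes the relevant property excerpt for illustration.
cat > "$CONF_DIR/hdfs-site.excerpt.xml" <<EOF
<property>
  <name>dfs.hosts.exclude</name>
  <value>$CONF_DIR/excludes</value>
</property>
EOF

# Then, on the NameNode (needs a live cluster, so shown as comments):
#   ./bin/hdfs dfsadmin -refreshNodes
# And on the retired node, once the web UI reports it fully decommissioned:
#   ./sbin/hadoop-daemon.sh stop datanode
```

Because refreshNodes only re-reads the excludes file, the same file can be edited and refreshed again later to decommission further nodes without restarting the NameNode.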

  • Licensed under CC BY-NC 4.0