Rowanto Luo


Just another blog. or log.


Why Datanode is Denied Communication With Namenode

So, for those of you trying to set up HDFS and struggling with an error saying that the datanode was denied communication with the namenode:

2015-05-07 08:04:19,694 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1184863888-172.17.0.3-1430984962919 (Datanode Uuid null) service to dockernamenode.zanox.com/172.17.0.3:8020 beginning handshake with NN
2015-05-07 08:04:19,697 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-1184863888-172.17.0.3-1430984962919 (Datanode Uuid null) service to dockernamenode.zanox.com/172.17.0.3:8020 Datanode denied communication with namenode because hostname cannot be resolved (ip=172.17.0.5, hostname=172.17.0.5): DatanodeRegistration(0.0.0.0, datanodeUuid=f8c97ad9-3cb3-49a1-9e4e-bc2341779f19, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=CID-9ab30cb5-ada8-4201-9d94-6400e83af6e4;nsid=1471492370;c=0)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:904)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:5042)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1069)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
    at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:27329)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)

It can be triggered by many configuration problems, but in the end it boils down to one thing, and that's what I am going to tell you. It cost me a whole day to fix this problem when I was running test-kitchen to provision Hadoop servers.

So we have a namenode up and running, and now we boot up a datanode. We have the defaultFS correct, so the datanode knows where the namenode is, and it tries to connect. The connection comes in over IP, so the namenode only sees the datanode's IP address. The problem is in this process. The namenode has a list of blacklisted hostnames which should not connect to it. This list can be empty, but the namenode checks it anyway, so it does a reverse DNS lookup to find out which hostname the IP has. If the lookup fails, the namenode throws the exception you saw (notice that the log shows hostname=172.17.0.5, which is just the IP again), and this is the source of all the problems.
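To see whether the reverse lookup is really what fails, you can try it by hand on the machine where the namenode runs. Here is a minimal Python sketch of that idea. It is not Hadoop's own code, just the same kind of lookup done with the standard library; the function names reverse_lookup and would_pass_check are made up for this post, and the resolver parameter exists only so the logic can be tried without a real DNS server.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the hostname for ip, or None if the reverse lookup fails."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except (socket.herror, socket.gaierror):
        # No PTR record and no matching /etc/hosts entry.
        return None
    return hostname


def would_pass_check(ip, resolver=socket.gethostbyaddr):
    """Rough approximation (an assumption, not Hadoop's implementation):
    the address counts as resolved only if we get a name back that is
    not simply the IP itself, as in the log line hostname=172.17.0.5."""
    hostname = reverse_lookup(ip, resolver)
    return hostname is not None and hostname != ip
```

Calling would_pass_check("172.17.0.5") on the namenode host (with the datanode IP from the log above) should tell you roughly what the namenode will see. If it returns False, the datanode will most likely be rejected.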

So basically, we have two options to fix this:
1. Fix reverse DNS for the namenode, so that it can do the reverse lookup properly.
2. Decide that it doesn't make sense to block anything in your case, and turn off the check.

Well, if your cluster is in a private network (which it usually is), what is the chance that a datanode trying to connect to your namenode is not one of your own datanodes? Not big. So let's just turn it off.

You will have to add this setting to the namenode's hdfs-site.xml:

<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
</property>
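Since the setting has to end up in the namenode's copy of the file and not the datanode's, it can be worth double-checking what the namenode actually has. A small Python sketch for that, assuming the usual Hadoop layout of property elements inside a configuration root; the helper name read_property is made up for this post.

```python
import xml.etree.ElementTree as ET

PROPERTY = "dfs.namenode.datanode.registration.ip-hostname-check"


def read_property(xml_text, name):
    """Return the value of a property from hdfs-site.xml text, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None
```

Read the file on the namenode host (its location depends on your installation) and pass the text in with PROPERTY as the name. You want to get "false" back; None means the property is not in that file at all.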

If for whatever reason you still want to keep the check, you could put an entry with the IP and hostname of the datanode in /etc/hosts on whichever machine your namenode is running.
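To give an idea of what such an entry looks like and how to check for it, here is a small Python sketch. The IP is the datanode's from the log above, but the hostname datanode1.example.com is purely hypothetical (use your datanode's real name), and hosts_names_for is a helper made up for this post.

```python
def hosts_names_for(ip, hosts_text):
    """Return the hostnames mapped to ip in /etc/hosts-style text."""
    names = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if fields[0] == ip:
            names.extend(fields[1:])
    return names


# Example /etc/hosts content on the namenode machine.
# The datanode hostname here is a hypothetical placeholder.
EXAMPLE_HOSTS = """\
127.0.0.1   localhost
172.17.0.5  datanode1.example.com datanode1
"""
```

If hosts_names_for returns an empty list for the datanode's IP when you feed it the namenode's real /etc/hosts, the entry is missing, and the lookup will have to go through DNS.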

It was really confusing for me at first, because it is not clear whether the denied communication is the result of a misconfiguration on the datanode or on the namenode. There are some Stack Overflow entries about this, but they cover single-machine setups, and they confused me even more because their solutions did not work for me. I hope that if you are experiencing the same trouble, this post has helped you understand a bit of the picture.