Re-IPing Hadoop Nodes (Don't do it)

(I’m writing this story a bit after the fact, so events are a bit fuzzy…)

We recently discovered (read: were told by a member of our Network Engineering team who was auditing systems) that about half of our Hadoop nodes had been configured in the wrong VLANs. It didn’t need to be fixed immediately, but we needed to get the hosts moved to the correct VLAN (which meant an IP change) as soon as possible.

Hadoop hosts use DNS names for most services, so re-IPing shouldn’t be too big a problem as long as DNS updates quickly. And for hosts only running simpler portions of the stack (datanodes, regionservers, YARN nodemanagers), this held true. It didn’t take much work to move those hosts to their new IPs. The challenge came when we attempted to re-IP the hosts running the namenode services and the hosts running ZooKeeper…

There were two major challenges in this migration:

1. Java caches DNS aggressively, meaning many services need a restart to pick up DNS changes even when they shouldn’t.
2. ZooKeeper and JournalNode processes both (appear to) rely on IP addresses instead of DNS.

Addressing problem one means every process that caches too aggressively needs a restart to pick up the IP change.
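Before restarting anything, it helps to confirm that DNS itself has caught up. Below is a minimal sketch of such a check, assuming a hand-maintained map of hostnames to their intended new IPs; the hostnames and addresses are hypothetical placeholders, not our real ones. Note that it only shows what a fresh lookup returns on the machine where it runs. A long-running JVM may still hold the old address, so a clean result here tells you a restart will pick up the right IP, not that running services already have.

```python
import socket


def stale_hosts(expected, resolve=socket.gethostbyname):
    """Return {hostname: (expected_ip, actual_ip)} for every mismatch.

    `expected` maps hostname -> the IP it should have after the move.
    `resolve` is injectable so the check can be exercised without real DNS.
    A failed lookup is reported with an actual_ip of None.
    """
    mismatches = {}
    for host, want in expected.items():
        try:
            got = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            got = None
        if got != want:
            mismatches[host] = (want, got)
    return mismatches


if __name__ == "__main__":
    # Hypothetical hosts and target addresses, for illustration only.
    expected = {
        "dn01.example.com": "10.20.0.11",
        "zk01.example.com": "10.20.0.21",
    }
    for host, (want, got) in stale_hosts(expected).items():
        print(f"{host}: expected {want}, DNS currently returns {got}")
```

Running this from a few different hosts in the cluster is worthwhile, since each may sit behind a different resolver or local cache.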

Addressing problem two means that every time we change a ZK or Journalnode IP, we have to restart the entire service for the IP change to be properly picked up.

Both of these problems meant that, of the 11 hosts we had to re-IP, 7 of them would go rather smoothly (they only had “client” processes running: datanodes, regionservers, etc.). Naturally, we did most of our initial testing for the move with some of these 7 hosts. And it went fairly well! We took the opportunity to reboot the nodes to make sure kernel patches/etc. were also applied (which had the added benefit of ensuring all services were restarted on the box), and everything was going swimmingly. In fact, when we reached the first ZK node, things still appeared to go smoothly, but only because we didn’t check the ZK logs, which would have shown that the node hadn’t rejoined the cluster properly.

It wasn’t ’til we re-IPed one of the namenode servers (and with it the JournalNode and HBase master processes) that we saw problems. The namenode didn’t properly rejoin and we got to have a bit of a panic.

After a bit of digging, we found that the namenode wasn’t properly joining the cluster because it wasn’t talking to ZooKeeper correctly. This led us to find that ZK hadn’t registered our new instances properly, and there was actually only one properly running ZK host. We restarted the other ZooKeeper instances and the failing host joined the cluster.
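In hindsight, a quick check of each ZooKeeper server right after its IP change would have caught this much earlier. Here is a rough sketch of one way to do it, using ZooKeeper’s `srvr` four-letter command (the same thing as `echo srvr | nc host 2181`) and reading the `Mode:` line from the reply. The hostnames are hypothetical, port 2181 is assumed, and on newer ZooKeeper versions the command has to be permitted by the server’s four-letter-word whitelist. The "healthy" rule used here (exactly one leader, everyone else a follower or observer) is my own simplification, not an official health check.

```python
import socket


def parse_mode(srvr_output):
    """Extract the value of the 'Mode:' line from `srvr` output, or None."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None  # e.g. the server answered but is not serving requests


def zk_mode(host, port=2181, timeout=5.0):
    """Send `srvr` to one ZooKeeper server and return its reported mode."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", "replace"))


def quorum_problems(modes):
    """Given {host: mode-or-None}, return a list of human-readable problems."""
    problems = []
    leaders = sorted(h for h, m in modes.items() if m == "leader")
    if len(leaders) != 1:
        problems.append(f"expected exactly 1 leader, found {len(leaders)}")
    for host in sorted(modes):
        if modes[host] not in ("leader", "follower", "observer"):
            problems.append(f"{host} is not part of the quorum (mode={modes[host]})")
    return problems


if __name__ == "__main__":
    # Hypothetical ensemble members, for illustration only.
    hosts = ["zk01.example.com", "zk02.example.com", "zk03.example.com"]
    modes = {}
    for host in hosts:
        try:
            modes[host] = zk_mode(host)
        except OSError:
            modes[host] = None  # unreachable or connection refused
    for problem in quorum_problems(modes):
        print(problem)
```

An empty result after each ZK host is moved is a reasonable signal to continue; anything else is a reason to stop and look at the logs before touching the namenodes.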

We moved to the second namenode and found that the namenodes weren’t joining properly. This came down to the journalnodes not being able to create a proper quorum now that one of the nodes had a new IP. Restarting the journalnodes fixed this, and we could properly start up the namenode again.

This was, generally, far more stressful than I would have liked. It was nice to have people around who could help in the situation. But my primary take-away is: don’t change IPs on Hadoop nodes. If you need to change the IP, it’s likely just as easy to change the DNS name, too.