It is recommended practice to deploy a big data cluster in a multi-homed network. Multi-homing means connecting a node to two different networks. The big data nodes are connected to a private network, and a management node interfaces with both the private network and the public/corporate network, as shown in the following figure.
There are two reasons for multi-homing a big data cluster:
1. To reduce congestion on the corporate network, since big data jobs are network intensive.
2. To secure the data flow by keeping it isolated within the private network.
The client components are deployed on the management node so that end users can log in and run analytics.
Services running on the management node can bind to both interfaces, like Zookeeper listening on 0.0.0.0, or only to the private interface, like the HBase master bound to a private address such as 172.16.150.173. In the second case the service is inaccessible from the public network unless you log in to the management node.
Consider a scenario where an HBase Java client on the corporate/public network wants to connect to an HBase master bound to the private network interface. The connection fails because the client has no visibility into the private network.
To overcome this drawback, we can enable port forwarding on the management node using the iptables command.
The following command has been tested on Red Hat Linux 6.x:
iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 60000 -j DNAT --to-destination 172.16.150.173:60000
The command rewrites the destination of TCP traffic arriving on the public Ethernet interface eth1 at port 60000, so that it is delivered to port 60000 on the private IP 172.16.150.173.
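Since the interface name, port, and private IP will differ from one cluster to another, it may help to generate the rule from parameters and review it before applying it. The following is a minimal sketch, not a definitive implementation; build_dnat_rule is a hypothetical helper name, and the values in the example are the ones used in this post.

```shell
#!/bin/sh
# build_dnat_rule is a hypothetical helper, not part of iptables.
# It only prints the DNAT rule so it can be reviewed before being run as root.
build_dnat_rule() {
    pub_if="$1"    # public-facing interface, e.g. eth1
    port="$2"      # TCP port to forward, e.g. 60000 (HBase master in this post)
    priv_ip="$3"   # private address the service is bound to
    echo "iptables -t nat -A PREROUTING -i $pub_if -p tcp --dport $port -j DNAT --to-destination $priv_ip:$port"
}

# Example: print the rule for the values used in this post.
# build_dnat_rule eth1 60000 172.16.150.173
```

After reviewing the printed rule, it can be run as root. The installed rules can be listed with "iptables -t nat -L PREROUTING -n --line-numbers". On Red Hat 6.x, "service iptables save" should persist them across reboots.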
If you do not know which interface is the public one, use the following command to identify it:
ip route show | grep default
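The default route line names its interface after the "dev" keyword. To pull out just that name (for example, to feed it into a script), the output can be filtered further. This is a small sketch under the assumption that the default route points at the public/corporate network; default_iface is a hypothetical helper name.

```shell
#!/bin/sh
# default_iface is a hypothetical helper: it reads `ip route show` output on
# stdin and prints the device that follows the "dev" keyword on default routes.
default_iface() {
    awk '$1 == "default" { for (i = 2; i < NF; i++) if ($i == "dev") print $(i + 1) }'
}

# Example usage on a live system:
# ip route show | default_iface
```

If the node has more than one default route, the helper prints one interface per line, and you would need to pick the right one by hand.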