A Hadoop file upload utility for secure cloud-hosted clusters, using WebHDFS and the Knox Gateway.

Hadoop clusters hosted on a secure cloud do not allow direct connections to HDFS ports. Instead, connections from external applications have to be routed through a gateway such as Knox.

The Apache Knox Gateway provides authentication and a single REST interface for accessing several big data services, namely HDFS, Ambari, and Hive. This makes Knox a natural choice for routing external traffic to big data clusters.

External applications can connect to the Knox Gateway with any HTTP client and interact with Hadoop by invoking WebHDFS REST API calls. The WebHDFS API is documented in the Apache Hadoop documentation:
https://hadoop.apache.org/docs/r1.0.4/webhdfs.html
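
To make this concrete, here is a minimal sketch of the two-step WebHDFS CREATE call routed through Knox, written with Apache HttpClient. The host, topology path (gateway/default), credentials, and file paths are placeholders taken from the sample configuration later in this article, and SSL trust setup for a self-signed gateway certificate is omitted; this is an illustration of the flow, not the utility's exact code.

import java.io.File;

import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPut;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.FileEntity;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class WebhdfsUploadSketch {

    public static void main(String[] args) throws Exception {
        // Placeholder values; in the utility these come from the properties file.
        String knoxHostPort = "bluemixcluster.ibm.com:8443";
        String hdfsFileUrl  = "/tmp/hdfsfile.txt";
        String dataFile     = "/Users/macadmin/Desktop/input.txt";

        // Knox exposes WebHDFS under its gateway context path; "gateway/default" is an
        // assumption -- the topology name depends on how the cluster administrator set up Knox.
        String createUrl = "https://" + knoxHostPort
                + "/gateway/default/webhdfs/v1" + hdfsFileUrl + "?op=CREATE&overwrite=true";

        // Knox authenticates callers with HTTP Basic credentials.
        BasicCredentialsProvider creds = new BasicCredentialsProvider();
        creds.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials("guest", "guest-password"));

        try (CloseableHttpClient client = HttpClients.custom()
                .setDefaultCredentialsProvider(creds)
                .build()) {

            // Step 1: WebHDFS CREATE answers with a redirect whose Location header tells us
            // where to send the file contents (Knox rewrites it to point back through the gateway).
            String uploadLocation;
            try (CloseableHttpResponse step1 = client.execute(new HttpPut(createUrl))) {
                uploadLocation = step1.getFirstHeader("Location").getValue();
            }

            // Step 2: PUT the file bytes to that location; HTTP 201 Created means success.
            HttpPut upload = new HttpPut(uploadLocation);
            upload.setEntity(new FileEntity(new File(dataFile), ContentType.APPLICATION_OCTET_STREAM));
            try (CloseableHttpResponse step2 = client.execute(upload)) {
                System.out.println("HTTP status: " + step2.getStatusLine().getStatusCode());
            }
        }
    }
}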

In this article I show how to build your own upload manager for pushing files to HDFS. The logic can be embedded in any desktop or mobile application, allowing users to interact with their big data cluster remotely.

File upload utility.

The utility uses the Apache HttpClient library, release 4.5.3, and supports file sizes of up to 5 MB.
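
If you want to build a similar client yourself, the corresponding Maven dependency would look like the following (coordinates as published on Maven Central):

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.3</version>
</dependency>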

The project can be downloaded from my Git repo: https://github.com/bharathdcs/hadoop_fileuploader

The application expects a properties file as input; the format and a sample are shown below:

knoxHostPort=bluemixcluster.ibm.com:8443
knoxUsername=guest
knoxPassword=guest-password
hdfsFileUrl=/tmp/hdfsfile.txt
dataFile=/Users/macadmin/Desktop/input.txt
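
Inside the application, a plain java.util.Properties load is all that is needed to read this file. A minimal sketch, assuming the path to the properties file arrives as the first command-line argument (which matches the run step shown later); class and variable names here are illustrative, not the utility's actual code:

import java.io.FileInputStream;
import java.util.Properties;

public class ConfigSketch {
    public static void main(String[] args) throws Exception {
        // The path to the properties file is the first command-line argument.
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(args[0])) {
            props.load(in);
        }
        // The keys mirror the sample shown above.
        String knoxHostPort = props.getProperty("knoxHostPort");
        String hdfsFileUrl  = props.getProperty("hdfsFileUrl");
        String dataFile     = props.getProperty("dataFile");
        System.out.println("Uploading " + dataFile + " to " + knoxHostPort + hdfsFileUrl);
    }
}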

Most of the parameters are self-explanatory: knoxHostPort, knoxUsername, and knoxPassword identify the Knox Gateway and its credentials, hdfsFileUrl is the path to create on HDFS, and dataFile is the local file to upload.

The file is created with a default permission of world-writable. If a different permission is desired, pass the octal value using the following parameter in the properties file:

hdfsFilePermission=440
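
In WebHDFS terms this simply adds the optional permission query parameter to the CREATE request. A hedged sketch of how such a property might be wired in (withPermission and createUrl are illustrative names, not the utility's actual code):

import java.util.Properties;

public class PermissionSketch {

    // Appends the optional WebHDFS "permission" query parameter (an octal string such as 440)
    // to a CREATE URL when the property is present; otherwise the server-side default applies.
    static String withPermission(String createUrl, Properties props) {
        String permission = props.getProperty("hdfsFilePermission");
        return (permission == null) ? createUrl : createUrl + "&permission=" + permission;
    }
}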

Run the application as follows:

mvn exec:java -Dexec.mainClass="twc.webhdfs.App" -Dexec.args="/Users/macadmin/Desktop/input.properties"

Once it finishes, you should see the following message confirming that the file was created:

File creation successful