Custom Spark Streaming Adapter for reading Weather data from IBM The Weather Company API.

Most of the real world price models use weather as one of the dimensions. Weather patterns can influence the pricing decisions for commodities. But how do you obtain the weather data in realtime to use it in spark analytics or persist for offline BI analysis .

In this article I will be discussing about IBM Weather Company API from Bluemix and how to read the data from the API for realtime analytics in Spark .

IBM Bluemix offers a free subscription plan for Weather Company API https://console.ng.bluemix.net/catalog/ .

Screen Shot 2017-03-16 at 3.58.15 pm

TWC API has various REST Urls one can use to acccess weather data releated to particular geo location, post code and so on.. More details is documented  here .

Reading the JSON string in Spark Streaming.

Spark streaming does not have out of box support for reading JSON from remote REST API. Hence users should create a custom adapter . Custom adapters should extend Receiver class from Spark streaming and override start() and stop() function .

Once the adapter is created we have to use the same in SparkStreamingContext as shown in following snippet

JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, Durations.seconds(20));

JavaReceiverInputDStream<String> lines = ssc.receiverStream(new WcReceiver(args[0]));

Here WCReceiver is my custom HTTP adapter to read weather json data. The complete project is available in my git repo

The JSON data is further persisted to HDFS so that external bigsql or hive tables can be mounted on it and used as a dimension for analytics .

Steps to run the project

  • Compile the program using following maven command

           mvn compile package

  • Copy the generated jar file from target folder to spark cluster
    target/wc-0.0.1-SNAPSHOT.jar
  • To test the program launch it using spark-submit local mode.The program accepts two parameters Weather data API URL and HDFS output folder as shown following.

spark-submit –class “spark.wc.AnalyzeWeather” –master local[4] wc-0.0.1-SNAPSHOT.jar  <weather data url> /tmp/output.

 

 

 

 

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s