Most of the real world price models use weather as one of the dimensions. Weather patterns can influence the pricing decisions for commodities. But how do you obtain the weather data in realtime to use it in spark analytics or persist for offline BI analysis .
In this article I will be discussing about IBM Weather Company API from Bluemix and how to read the data from the API for realtime analytics in Spark .
IBM Bluemix offers a free subscription plan for Weather Company API https://console.ng.bluemix.net/catalog/ .
TWC API has various REST Urls one can use to acccess weather data releated to particular geo location, post code and so on.. More details is documented here .
Reading the JSON string in Spark Streaming.
Spark streaming does not have out of box support for reading JSON from remote REST API. Hence users should create a custom adapter . Custom adapters should extend Receiver class from Spark streaming and override start() and stop() function .
Once the adapter is created we have to use the same in SparkStreamingContext as shown in following snippet
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, Durations.seconds(20));
JavaReceiverInputDStream<String> lines = ssc.receiverStream(new WcReceiver(args));
Here WCReceiver is my custom HTTP adapter to read weather json data. The complete project is available in my git repo
The JSON data is further persisted to HDFS so that external bigsql or hive tables can be mounted on it and used as a dimension for analytics .
Steps to run the project
- Compile the program using following maven command
mvn compile package
- Copy the generated jar file from target folder to spark cluster
- To test the program launch it using spark-submit local mode.The program accepts two parameters Weather data API URL and HDFS output folder as shown following.
spark-submit –class “spark.wc.AnalyzeWeather” –master local wc-0.0.1-SNAPSHOT.jar <weather data url> /tmp/output.