Making investment decisions like a data scientist

Buying an investment property is a major financial decision, given the current record low interest rates and inflated property prices there is a lot at stake when deciding the areas to invest in.The idea of this exercise is to show how we can do our own research with available data and tools in aiding this major financial decision . You dont have to be a major statistician or a computer scientist to perform this task, by using IBM Data Science Experience on cloud most of the complexities in building analytics infrastructure, managing and analyzing the data is taken out of picture. So you as the end user can focus on the problem at hand which is buying an investment property. 🙂

The following notebook was authored and exported from IBM Data Science Experience

Analyze Sydney Rental data

This notebook analyses Sydney rental price growth for the last ten years. We will compare the estimated average raise against actual rental raises experienced in the areas. We will plot the relative difference in actual and estimated rental returns.

Table of contents

  1. Data preprocessing
  2. Load data
  3. Explore historical rental growth
  4. Calculate simple moving average
  5. Calculate the variance between forecast and actual rental returns
  6. Summary

1. Data preprocessing

In this notebook, we will explore and compare the rental growth for several Sydney suburbs

The raw  data set is from NSW government rent and sales report.

I have only extracted the data related to Sydney Apartment rental prices and data dimensions has been reformatted to assist the analysis.

I have uploaded the data to IBM Bluemix Spark Object store which we will load and analyze in subsequent sections.

2. Load Data

In this section we will load the Sydney rental data set from object store and extracts it  for analysis

In [12]:
import requests, StringIO, pandas as pd, json, re

def get_file_content(credentials):
"""For given credentials, this functions returns a StringIO object containing the file content."""

url1 = ''.join([credentials['auth_url'], '/v3/auth/tokens'])
data = {'auth': {'identity': {'methods': ['password'],
'password': {'user': {'name': credentials['username'],'domain': {'id': credentials['domain_id']},
'password': credentials['password']}}}}}
headers1 = {'Content-Type': 'application/json'}
resp1 = requests.post(url=url1, data=json.dumps(data), headers=headers1)
resp1_body = resp1.json()
for e1 in resp1_body['token']['catalog']:
if(e1['type']=='object-store'):
for e2 in e1['endpoints']:
if(e2['interface']=='public'and e2['region']==credentials['region']):
url2 = ''.join([e2['url'],'/', credentials['container'], '/', credentials['filename']])
s_subject_token = resp1.headers['x-subject-token']
headers2 = {'X-Auth-Token': s_subject_token, 'accept': 'application/json'}
resp2 = requests.get(url=url2, headers=headers2)
return StringIO.StringIO(resp2.content)

#Credentials for reading from the object store.
credentials = {
'auth_url':'https://identity.open.softlayer.com',
'project':'----',
'project_id':'--',
'region':'dallas',
'user_id':'--',
'domain_id':'--',
'domain_name':'1123361',
'username':'--',
'password':"""--""",
'filename':'Sydney_unit_rent.csv',
'container':'notebooks',
'tenantId':'s5ec-15c742976f006e-2278ae9ffa81'
}

content_string = get_file_content(credentials)
rental_df = pd.read_csv(content_string)
rental_df.head()
Out[12]:
Month-Of-Year GREATER SYDNEY Inner Ring Ashfield Botany Bay Lane Cove Leichhardt Marrickville
0 Mar-90 165 190 160 160 190 170 150
1 Jun-90 170 195 165 160 190 170 150
2 Sep-90 170 200 165 160 190 190 150
3 Dec-90 170 190 160 160 190 180 150
4 Mar-91 170 195 165 160 190 175 150
Set the index for the data frame which will be useful for charting

 

In [13]:
rental_df = rental_df.set_index(rental_df["Month-Of-Year"])
rental_df.drop(['Month-Of-Year'], axis=1, inplace=True)
rental_df.head()
Out[13]:
GREATER SYDNEY Inner Ring Ashfield Botany Bay Lane Cove Leichhardt Marrickville
Month-Of-Year
Mar-90 165 190 160 160 190 170 150
Jun-90 170 195 165 160 190 170 150
Sep-90 170 200 165 160 190 190 150
Dec-90 170 190 160 160 190 180 150
Mar-91 170 195 165 160 190 175 150

3. Explore data

Now lets explore the historical rental growth in the areas.

Pandas data frame has a default plot function to chart the frame data.

In [14]:
%matplotlib inline
rental_df.ix[+5:].plot(figsize=(15, 6))
Out[14]:

 

 

 

4. Simple moving average forecast

Lets calculate rolling average on the rental data, with window of 10

In [15]:

 

rental_df_mean=pd.rolling_mean(rental_df,10)
rental_df_mean.ix[+10:].head()
Out[15]:
GREATER SYDNEY Inner Ring Ashfield Botany Bay Lane Cove Leichhardt Marrickville
Month-Of-Year
Sep-92 170 195.0 162.5 160.5 188.5 178.3 150
Dec-92 170 195.0 162.0 160.5 187.5 178.3 150
Mar-93 170 194.8 161.5 160.5 186.5 177.3 150
Jun-93 170 195.8 161.0 160.5 186.0 177.8 150
Sep-93 170 196.3 160.5 160.5 186.0 178.6 150

5. Calculate the variance between forecast and actual rental returns

Lets calculate the variance between predicted rental returns v/s actual and plot the variance.

In [16]:
rental_df_var=rental_df[+10:]-rental_df_mean[+10:]
rental_df_var.ix[+10:].plot(figsize=(15, 6))
Out[16]:

5. Summary
In summary, Botany Bay experiences too much variance in rental returns ,Leichhardt comes next. The rest of the areas have had a steady rental returns. This blog is an attempt to show how various tools can be used to aid with major financial decisions in life.<br />

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s