Plotting recent house price trends with R

IT IS TIME FOR AN UPDATE ON HOUSE PRICE TRENDS AROUND THE UNITED STATES.

I have been experimenting with some new visualizations and updating some old favorites. Let’s collect them here.

This post will be an extension of my Visual Meditations on House Prices series from last year. Check out those posts for additional visualizations.

Data

We’ll use the recently updated Freddie Mac House Price Index (link to source) data and use R to create some plots.

You can download spreadsheets containing (.xls links) U.S. national and state and metro area trends. Let’s presume you’ve saved those data in a folder, I call mine data.

We’ll use readxl to wrangle some data in Excel spreadsheets. See this post for more examples using readxl with global house price data.

Also, a little later we will be combining our house price data with unemployment data from the U.S. Bureau of Labor Statistics. In order to merge our data, we’ll need a simple crosswalk file (.txt) file:

Let’s load the state and national data and plot recent trends.

#####################################################################################
## Load Libraries ##
#####################################################################################
library(data.table)
library(quantmod)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'xts'
## The following objects are masked from 'package:data.table':
## 
##     first, last
## Loading required package: TTR
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## between():   dplyr, data.table
## filter():    dplyr, stats
## first():     dplyr, xts, data.table
## lag():       dplyr, stats
## last():      dplyr, xts, data.table
## transpose(): purrr, data.table
library(viridis)
## Loading required package: viridisLite
library(readxl)
library(cowplot)
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggplot2':
## 
##     ggsave
library(ggbeeswarm)
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
library(ggjoy)
library(forcats)
library(kandinsky)
library(geofacet)

#####################################################################################
## Load data ##
#####################################################################################


df<-read_excel("data/states.xls", 
               sheet = "State Indices",
               range="B6:BB516" )

df$date<-seq.Date(as.Date("1975-01-01"),as.Date("2017-06-01"),by="1 month")

#####################################################################################
## Manipulate data data ##
#####################################################################################
df.state<-df %>% gather(geo,hpi,-date) %>% mutate(type="state",state=geo) %>%
  group_by(geo) %>%
  mutate(hpa=hpi/shift(hpi,12)-1,
         hpa1=hpi/shift(hpi,1)-1,
         hpilag12=shift(hpi,12,fill=NA),
         hpimax12=rollmax(hpi,13,align="right",fill=NA),
         hpimin12=-rollmax(-hpi,13,align="right",fill=NA)) %>% ungroup() %>%
  group_by(date) %>%
  mutate(us.hpa=hpa[geo=="United States not seasonally adjusted"],
         us.hpi=hpi[geo=="United States not seasonally adjusted"]) %>%
  ungroup() %>% mutate(up=ifelse(hpa>us.hpa,hpa,us.hpa),
                       down=ifelse(hpa<=us.hpa,hpa,us.hpa),
                       dlabel=paste(as.character(date,format="%B-%Y")," \n ")) %>% 
  ungroup() %>%
  filter( !( state %in% c("United States not seasonally adjusted",
                         "United States seasonally adjusted"))) %>%
  group_by(state) %>% mutate(id = row_number()) %>% ungroup()

#####################################################################################
## Create U.S. only data ##
#####################################################################################

df.us<-df %>% select("United States seasonally adjusted",date) %>% 
  gather(geo,hpi,-date) %>% mutate(type="US",state=geo) %>% 
  mutate(hpa=hpi/shift(hpi,12)-1,         hpa1=hpi/shift(hpi,1)-1)

#####################################################################################
## Plot U.S. trends ##
#####################################################################################
plot.hpi<-
  ggplot(data=filter(df.us,year(date)>1999), aes(x=date,y=hpi,label=round(hpi,0)))+
  geom_line()+theme_minimal()+
  labs(x="", y="",
       title="U.S. house price index",
       subtitle="House price index: Dec 2000 = 100, seasonally adjusted",
       caption="@lenkiefer Source: Freddie Mac House Price Index")+
  theme(plot.title=element_text(size=14,face="bold"),
        plot.subtitle=element_text(size=10,face="italic"),
        plot.caption=element_text(hjust=0,size=8),
        axis.ticks.length=unit(0.25,"cm")) + 
  geom_point(data=tail(df.us,1),color="red",size=3,alpha=0.82)+
    geom_hline(yintercept=tail(df.us,1)$hpi,color="red",linetype=2)

plot.hpa<-
  ggplot(data=filter(df.us,year(date)>1999), aes(x=date,y=hpa,label=round(hpa,0)))+
  geom_line()+theme_minimal()+
  scale_y_continuous(labels=percent)+
  labs(x="", y="",
       title="12-month percent change",
       subtitle="U.S. House price index",
       caption="")+
  theme(plot.title=element_text(size=14,face="bold"),
        plot.subtitle=element_text(size=10,face="italic"),
        plot.caption=element_text(hjust=0,size=8),
        axis.ticks.length=unit(0.25,"cm")) + 
  geom_point(data=tail(df.us,1),color="red",size=3,alpha=0.82)+
    geom_hline(yintercept=tail(df.us,1)$hpa,color="red",linetype=2)+
    geom_hline(yintercept=0,color="black",linetype=2)

plot_grid(plot.hpi, plot.hpa)

Note that I used the cowplot library to arrange multiple plots.

From this chart we can see that house prices are above their pre-recession peak and that the rate of growth is approaching seven percent on an annual basis. How do trends look at the state level? And how do they compare to the U.S. trend we plotted?

First, let’s look at state house price appreciation. We’ll create two small multiples, first a rectangular grid of state appreciation. Then we’ll use geo_facet for different layout. We’ll also use the ggjoy package for some sweet gradient shading.

df.state$statef<-fct_reorder(as.factor(df.state$state),
                              df.state$hpa,fun=last,.desc = T)

ggplot(data=filter(df.state,state !="DC" & year(date)>1999), 
         aes(x=date,y=hpa,label=round(hpa,0)))+
  geom_line()+theme_minimal()+
    geom_ridgeline_gradient(aes(fill=hpa,y=0,height=hpa),min_height=-10)+
    scale_fill_viridis(option="C")   +
    facet_wrap(~statef,ncol=5)  +
  scale_y_continuous(labels=percent)+
  labs(x="", y="",
       title="12-month percent change in house prices",
       subtitle="House price index: Dec 2000 = 100, seasonally adjusted",
       caption="@lenkiefer Source: Freddie Mac House Price Index through June 2017")+
  theme(plot.title=element_text(size=14,face="bold"),
                legend.position="none",
        plot.subtitle=element_text(size=10,face="italic"),
        plot.caption=element_text(hjust=0,size=8),
        axis.ticks.length=unit(0.25,"cm")) +
    geom_hline(yintercept=0,color="black",linetype=2)

Now let’s try a geo facet layout:

ggplot(data=filter(df.state, year(date)>1999), 
         aes(x=date,y=hpa,label=round(hpa,0)))+
  geom_line()+theme_minimal()+
    geom_ridgeline_gradient(aes(fill=hpa,y=0,height=hpa),min_height=-10)+
    scale_fill_viridis(option="C")   +
  scale_y_continuous(labels=percent)+
  scale_x_date(date_labels="%y",date_breaks="5 years")+
  labs(x="", y="",
       title="House price appreciation by state: Jan 2000 - Jun 2017",
       subtitle="12-month percent change in house price index",
       caption="@lenkiefer Source: Freddie Mac House Price Index through June 2017")+
  theme(plot.title=element_text(size=14,face="bold"),
        legend.position="none",
        plot.subtitle=element_text(size=10,face="italic"),
        plot.caption=element_text(hjust=0,size=8),
        axis.ticks.length=unit(0.25,"cm")) +
    geom_hline(yintercept=0,color="black",linetype=2)+
    facet_geo(~ state, grid = "us_state_grid2")

Joyswarm

Let’s now turn to metro area data and make a joyswarm plot (see the post for more on joyswarm plots). We can get monthly house price indices for over 300 metro areas. They are stored in two separate worksheets in the metro spreadsheet. Let’s load the data, compute 12-month appreciation rates by metro area and plot a joyswarm plot.

###############################################################################
#### Load metro data
###############################################################################
df2<-read_excel("data/msas_new.xls", 
                sheet = "MSA Indices A-L",
               range="B6:HG516" )
df2$date<-seq.Date(as.Date("1975-01-01"),as.Date("2017-06-01"),by="1 month")

df3<-read_excel("data/msas_new.xls", 
                sheet = "MSA Indices M-Z",
                range="B6:FM516" )
df3$date<-seq.Date(as.Date("1975-01-01"),as.Date("2017-06-01"),by="1 month")

###############################################################################
#### merge data and compute 12-month % change
###############################################################################

df4<-left_join(df2,df3,by="date")
df.metro <- df4 %>% gather(geo,hpi,-date) %>% mutate(type="metro")
df.metro
## # A tibble: 194,820 x 4
##          date         geo      hpi  type
##        <date>       <chr>    <dbl> <chr>
##  1 1975-01-01 Abilene, TX 44.61498 metro
##  2 1975-02-01 Abilene, TX 45.06790 metro
##  3 1975-03-01 Abilene, TX 45.86827 metro
##  4 1975-04-01 Abilene, TX 47.03022 metro
##  5 1975-05-01 Abilene, TX 47.38393 metro
##  6 1975-06-01 Abilene, TX 46.81175 metro
##  7 1975-07-01 Abilene, TX 46.41641 metro
##  8 1975-08-01 Abilene, TX 46.35436 metro
##  9 1975-09-01 Abilene, TX 46.44499 metro
## 10 1975-10-01 Abilene, TX 46.57779 metro
## # ... with 194,810 more rows
df.metro <- df.metro %>% mutate(state=substr(geo,nchar(geo)-1,nchar(geo)))
df.metro<-df.metro %>% group_by(geo) %>% 
  mutate(hpa=hpi/lag(hpi,12)-1) %>% ungroup()

###############################################################################
#### create joyswarm
###############################################################################

ggplot(data=filter(df.metro,year(date)>2003 & month(date)==6), aes(x=hpa,y=forcats::fct_reorder(factor(year(date)),date,.desc=T),color=hpa,fill=hpa))+
  geom_joy(alpha=0.25,rel_min_height=0.005)+
  geom_quasirandom(alpha=0.8,size=0.2)+
  scale_fill_viridis(option="C")+
  scale_color_viridis(option="C")+
  theme_minimal()+
  theme(legend.position="none",
        plot.caption=element_text(hjust=0))+
  scale_x_continuous(labels=percent)+
  labs(x="12-month percent change in house price",y="",
       title="Distribution of metro area house price appreciation",
       subtitle="12-month percent change in June of each year",
       caption="@lenkiefer Source: Freddie Mac House Price Index. Each dot a metro area, curve density over metros.")
## Picking joint bandwidth of 0.0112

This plot shows the distribution of house price increases has shifted higher over the past couple of years, but remains below the 2013 levels (when housing markets were recovering from the Great Recession) and the levels of last decade.

House price and unemployment

#####################################################################################
# add region crosswalk
#####################################################################################

region<-fread("data/region.txt")


#####################################################################################
# merge up data
#####################################################################################
ur.plot3<-merge(ur.plot2,select(region,statecode,statename),
                by.x="state",by.y="statename") %>%
  rename(statename=state)

dfhu<-merge(df.state,select(ur.plot3,-up,-down),
            by.x=c("geo","date"),by.y=c("statecode","date"))


ggplot(data=filter(dfhu,date==max(dfhu$date)),
       aes(x=ur,y=hpa,group=state,label=state))+
  geom_text(size=3,alpha=0.82)+
  theme_minimal()+
  scale_x_continuous()+
  scale_y_continuous(labels=percent)+
  scale_colour_gradient(low="red",high="blue",
                               name = "12-month percent change",
                               labels = percent  )+
  labs(y="12-month percent change in house prices", 
       x="Unemployment rate (percent)",
       title="House prices and unemployment",
       subtitle="June, 2017",
       caption="@lenkiefer Source: Freddie Mac house price index, U.S. Bureau of Labor Statistics")+
    theme(plot.title=element_text(size=18),
          plot.caption=element_text(size=7,hjust=0,vjust=1),
          legend.key.width=unit(1,"cm"),
          legend.position="top",
          panel.grid.major.y =element_blank())

This plot shows that in general, states with relatively higher unemployment rates tend to have lower house price growth.

Let’s see how the relationship evolves over time. We’ll trace out a pather using geom_path, first for just the state of California.

#####################################################################################
# make list of dates
#####################################################################################
dlist<-unique(filter(dfhu,year(date)>1999)$date)

#####################################################################################
# generate some functions (the utility will become apparent soon)
#####################################################################################

myf5<- function (s, slist="CA") { return(filter(dfhu, 
                                                date>as.Date("2000-11-01") &
                                                  date<=s & state %in% slist) )}

myplot5<-function(df3) {
  g<-
    ggplot(data=df3, 
           aes(x=ur,y=hpa,group=state,color=hpa1,label=state))+
    geom_point(data=filter(df3,date==max(df3$date)),size=2)+
    geom_path(aes(alpha=id))+
    guides(alpha=F,color=F)+
    theme_minimal()+
    geom_hline(yintercept=0,color="black",linetype=2)+
    geom_vline(xintercept=100,color="black",linetype=2)+
    scale_x_continuous(limits=c(0,15))+
    scale_y_continuous(labels=percent,limits=c(-.4,.4))+
    scale_colour_gradient(low="red",high="blue",name = "12-month percent change",
                          #limits=c(-.6,.6),
                          labels = percent  )+
    labs(y="12-month percent change in house prices", 
         x="Unemployment rate (percent)",
         title="House prices and unemployment",
         subtitle="December 2000 through June 2017",
         caption="@lenkiefer Source: Freddie Mac house price index, U.S. Bureau of Labor Statistics")+
    theme(plot.title=element_text(size=18),
          plot.caption=element_text(size=7,hjust=0,vjust=1),
          legend.key.width=unit(1,"cm"),
          legend.position="top",
          panel.grid.major.y =element_blank())+facet_wrap(~state)
  return(g) }

myplot5(myf5(dlist[210],"CA"))
## Warning: Removed 1 rows containing missing values (geom_vline).

Now let’s try it for several states:

st.list<- c("CA","TX","FL","NV","AZ","WA","CO","MI","NC")
myplot5(myf5(dlist[210],st.list))
## Warning: Removed 9 rows containing missing values (geom_vline).

Now let’s try to show it for all states.

ggplot(data=filter(dfhu, date>as.Date("2000-11-01")), 
       aes(x=ur,y=hpa,color=hpa1,label=state))+
  geom_path(aes(alpha=id))+
  
  guides(alpha=F,color=F)+
  theme_minimal()+
  geom_hline(yintercept=0,color="black",linetype=2)+
  geom_vline(xintercept=100,color="black",linetype=2)+
  scale_x_continuous(limits=c(0,15))+
  scale_y_continuous(labels=percent,limits=c(-.4,.4))+
  scale_colour_gradient(low="red",high="blue",name = "12-month percent change",
                        #limits=c(-.6,.6),
                        labels = percent  )+
  labs(y="12-month percent change in house prices", 
       x="Unemployment rate (percent)",
       title="House prices and unemployment",
       subtitle="December 2000 through June 2017",
       caption="@lenkiefer Source: Freddie Mac house price index, U.S. Bureau of Labor Statistics")+
  theme(plot.title=element_text(size=18),
        plot.caption=element_text(size=7,hjust=0,vjust=1),
        legend.key.width=unit(1,"cm"),
        legend.position="top",
        panel.grid.major.y =element_blank())+  # use facet_geo
  facet_geo(~ state, grid = "us_state_grid2")
## Warning: Removed 96 rows containing missing values (geom_vline).

Some animations

We can animate the swirly plot above for several states:

C:.github.com_aug_7_2017 redblueswirl9.gif

We can also animate this chart we made last year, but with updated data through June 2017.

Kandinsky plots

Finally, let’s make some Kandisky style plots (see e.g. this post)

The US

kandinsky(filter(df.us,year(date)>1999) %>% select(date,hpi))
  grid.text(label=paste("U.S. house price index Jan 2000 - Jun 2017\n@lenkiefer, made using R package Kandinsky"),
            gp=gpar(fontsize=12),
            x=.95,y=0.075,just="right")

Virginia, Texas, Ohio, Kentucky

kandinsky(filter(df.state, 
                   state %in% c("VA","TX","OH","KY")) %>% select(date,state,hpa))
  grid.text(label=paste("12-month percent change in house price index\nJan 1976 - Jun 2017\nVA, TX, OH, and KY\n@lenkiefer, made using R package Kandinsky"),
            gp=gpar(fontsize=12),
            x=.95,y=0.8,just="right")

 Share!