Tuesday, March 7, 2017

Assignment 3

Goals and Background

The purpose of the assignment is to provide us with experience calculating Z-Scores and Probability for a given data set. Additionally, the assignment will provide experience relating the calculated information (Z-Scores and Probability) to a given scenario for pattern analysis.

Terms

Z-Score: A Z-Score is the exact number of standard deviations by which a specific observation lies above or below the mean on the standard deviation curve. If a numeric value lies between 1 and 2 standard deviations from the mean, the Z-Score formula (Fig. 1) calculates its exact location.

(Fig. 1) Z-Score formula. Zi: Z-Score, Xi: observation value, u: mean of the data, S: standard deviation of the data.
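
Written out with the symbols from the Fig. 1 caption (using \mu for the caption's "u"), the standard Z-Score formula is shown below. The worked line uses made-up numbers purely for illustration; they are not values from the Dane County data.

```latex
z_i = \frac{x_i - \mu}{S}
\qquad\text{e.g. } x_i = 15,\ \mu = 10,\ S = 5
\;\Rightarrow\; z_i = \frac{15 - 10}{5} = 1.0
```

A Z-Score of 1.0 means the observation sits exactly one standard deviation above the mean.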


Probability: Probability can be described as the likelihood (expressed as a percent) that an event with a given numeric value will occur. The probability is looked up from the Z-Score in a probability chart (Fig. 2). Z-Scores can also be calculated in Microsoft Excel and a host of other programs. Probability is calculated from the absolute frequency of a specific event occurring compared to all of the events in the data.

(Fig. 2) Probability chart based on Z-Scores.
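
As an alternative to reading the chart by hand, the same lookup can be sketched in Python. This is a minimal sketch that assumes the data are approximately normally distributed, which is the same assumption the chart makes. The function names are my own; `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist


def probability_below(z: float) -> float:
    """Probability that a standard normal value is at or below z."""
    return NormalDist().cdf(z)


def probability_above(z: float) -> float:
    """Probability that a standard normal value exceeds z."""
    return 1.0 - probability_below(z)


if __name__ == "__main__":
    # A Z-Score of 1.0 has roughly 84% of the curve below it.
    print(round(probability_below(1.0), 4))
    # A Z-Score of -0.52 is exceeded roughly 70% of the time.
    print(round(probability_above(-0.52), 4))
```

The chart and the function should agree to the precision the chart is printed at.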


The Scenario
     You have been hired by an independent research consortium to study the geography of foreclosures in Dane County, Wisconsin. County officials are worried about the increase in foreclosures from 2011 to 2012. As an independent researcher you have been given the addresses of all foreclosures in Dane County for 2011 and 2012, and they have been geocoded and then added to the Census Tracts for Dane County. While you realize that you cannot determine the reasons for foreclosures occurring, you do have the tools to analyze them spatially. Specifically, you are interested to see how the patterns of these foreclosures have changed from one year to the next. Explain what the patterns are and also provide some understanding as to the chance foreclosures will increase by 2013.
A second question is to be answered after calculating the Z-Score for three specific tracts located in Dane County.

If these patterns for 2012 hold next year in Dane County, based on this data what number of foreclosures for all of Dane County will be exceeded 70% of the time? Exceeded only 20% of the time?

Methods

The first step was to create a map displaying the change between 2011 and 2012 for Dane County using ArcMap. I added a field to the attribute table and subtracted the 2012 foreclosure value from the 2011 value for each tract. The result was displayed using a standard deviation classification (see the change map labeled Fig. 3 in Results).

(Fig. 3) Display of locations of selected Census tracts 114.01, 122.01, and 31.



The next step in the instructions was to calculate the Z-Score for 3 selected tracts in the data (Fig. 3). I used ArcMap to extract the mean and the standard deviation for both years of data. I then extracted the values for the specific tracts from both years and entered all of the values in Microsoft Excel. The Z-Score was then calculated using Excel (Fig. 4).
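
The same calculation can be sketched outside of Excel. The example below is only an illustration: the mean, standard deviation, and tract counts are hypothetical placeholders, not the values I extracted from ArcMap, and the function name is my own.

```python
def z_score(observation: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations an observation lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (observation - mean) / std_dev


# Hypothetical placeholder values, NOT the actual Dane County statistics.
county_mean = 10.0
county_std_dev = 5.0
tract_counts = {"114.01": 20, "122.01": 8, "31": 13}

# One Z-Score per tract, computed against the county-wide mean and std. dev.
z_scores = {
    tract: z_score(count, county_mean, county_std_dev)
    for tract, count in tract_counts.items()
}

if __name__ == "__main__":
    for tract, z in z_scores.items():
        print(f"Tract {tract}: z = {z:+.2f}")
```

Repeating this with each year's own mean and standard deviation gives one Z-Score per tract per year.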

Results


(Fig. 3) Display of the foreclosure change between 2011-2012 by Census Tract in Dane County.
Examining the map, you can see that the areas with a significant increase in foreclosures are shown in dark/bright red. Conversely, the areas in darker blue have seen a decrease in foreclosures since 2011. Areas with increased foreclosures appear to lie mostly outside of the downtown/capital area (see the tract location map, Fig. 3 in Methods), though a few tracts outside the capital area also show a decrease. More information is required to determine why this pattern is emerging.

(Fig. 4) Excel spreadsheet with the Z-Score calculation data and results.

(Fig. 5) Display of 2011 foreclosures by standard deviation classification.
Fig. 5 is a map displaying the number of foreclosures for 2011 with a standard deviation classification. The standard deviation classification effectively calculates the Z-Score for each value and assigns it to the matching class. Tract 114.01 has a higher number of foreclosures than the county average. Tract 31 has a slightly higher number of foreclosures than the county average. Tract 122.01 has a slightly lower number of foreclosures than the county average. These classes are corroborated by the results I calculated in the Excel sheet (Fig. 4). Comparing these results to the change map (the Fig. 3 shown earlier in Results) reveals a few noteworthy observations. The central northernmost tract displays a higher than average number of foreclosures, yet on the change map it had a significant reduction in foreclosures. The same observation can be made for the large tract east of Tract 114.01, though its change between 2011 and 2012 was less significant.
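
To make the classification step concrete, here is a rough sketch of how a Z-Score could be binned into standard deviation classes. The class breaks (one-standard-deviation intervals centered on the mean) are an assumption on my part; the maps only confirm that a 0.5 to 1.5 Std. Dev. class exists. The function name and labels are hypothetical and this is not ArcMap's actual implementation.

```python
def std_dev_class(z: float) -> str:
    """Assign a Z-Score to an assumed standard deviation class."""
    # (upper bound, label) pairs, checked from lowest to highest.
    breaks = [
        (-1.5, "< -1.5 Std. Dev."),
        (-0.5, "-1.5 to -0.5 Std. Dev."),
        (0.5, "-0.5 to 0.5 Std. Dev."),
        (1.5, "0.5 to 1.5 Std. Dev."),
    ]
    for upper_bound, label in breaks:
        if z < upper_bound:
            return label
    return "> 1.5 Std. Dev."


if __name__ == "__main__":
    # Illustrative Z-Scores only, not values from the Excel sheet.
    for z in (-2.0, -0.4, 0.6, 2.0):
        print(f"z = {z:+.1f} -> {std_dev_class(z)}")
```

Under these assumed breaks, a tract only changes color on the map when its Z-Score crosses one of the class boundaries.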

(Fig. 6) Display of 2012 foreclosures by standard deviation classification.
Fig. 6 is a map displaying the number of foreclosures for 2012 with a standard deviation classification. The results for the 3 selected tracts are essentially the same as in 2011. Though the Z-Score for Tract 31 decreased a fair amount from 2011, it still fell in the 0.5 to 1.5 Std. Dev. class. As with the 2011 results, a few anomalies appear when comparing the change between the 2 years with the Z-Score map, though they are not as easy to identify.

Finally, I will answer the following question:
If these patterns for 2012 hold next year in Dane County, based on this data what number of foreclosures for all of Dane County will be exceeded 70% of the time? Exceeded only 20% of the time?
A Z-Score of -0.52 will be exceeded 70% of the time. Converted back to a foreclosure count, this means that 70% of the time the foreclosures for a given tract will exceed approximately 7.

A Z-Score of 0.84 will be exceeded 20% of the time. Converted back to a foreclosure count, this means that 20% of the time the foreclosures for a given tract will exceed approximately 20.
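
The conversion from an exceedance probability to a foreclosure count can be sketched as below: find the Z-Score that leaves the desired probability above it, then rearrange the Z-Score formula to X = mean + Z * standard deviation. The mean and standard deviation in the example are hypothetical placeholders, not the 2012 Dane County statistics, so the printed counts will not match my answers above. The function name is my own.

```python
from statistics import NormalDist


def value_exceeded_with_probability(p: float, mean: float, std_dev: float):
    """Return (z, x) such that x is exceeded with probability p,
    assuming the data are approximately normally distributed."""
    # The Z-Score with probability p above it has (1 - p) below it.
    z = NormalDist().inv_cdf(1.0 - p)
    return z, mean + z * std_dev


if __name__ == "__main__":
    # Hypothetical placeholder statistics, NOT the actual 2012 values.
    mean_2012 = 10.0
    std_dev_2012 = 5.0
    for p in (0.70, 0.20):
        z, x = value_exceeded_with_probability(p, mean_2012, std_dev_2012)
        print(f"Exceeded {p:.0%} of the time: z = {z:+.2f}, count = {x:.1f}")
```

The Z-Scores this produces (about -0.52 and 0.84) are the same ones read from the probability chart; only the final counts depend on the year's mean and standard deviation.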

Conclusion

A pattern of higher foreclosures seems to fall outside of the downtown/capital area of Dane County. While more information is needed to fully analyze why this pattern appears, I have my own assumptions. There was a significant housing market crash around these years, tied to a weakening economy. People who had moved from the inner city to more posh suburbs bought houses which they could no longer afford once they lost their jobs. Thus many homes in these areas went into foreclosure. Again, this is merely a guess, and more research would be required to verify that claim.

Analyzing the change between years doesn't give you the full picture of what is going on with the data. Calculating the Z-Scores or creating a standard deviation classified map provides additional information which is critical when attempting to interpret data. One example is the central northernmost tract in Dane County, which showed a decrease in foreclosures but was still above the average for the county. That observation tells you there is more to investigate in the area to gain a full understanding of what is going on. These observations, combined with further data from more recent years, would be very beneficial to many government agencies in Dane County.