Terry's GIS Studies and Transition to a New Career

Wednesday, March 25, 2020

Module 4--Data Classification

In this module, I reviewed the four levels of measurement (nominal, ordinal, interval, and ratio) along with four common data classification methods for displaying data on a map--equal interval, quantile, standard deviation, and natural breaks.

The exercise was very straightforward and built on skills previously learned. The task was to map persons aged 65 or over in the census tracts of Miami-Dade County. I created two series of four maps, each looking at the data differently. The first series displayed the four classification methods applied to the percentage of the population aged 65 or over in each census tract. The second series displayed the same methods, but with the number of persons aged 65 or over normalized by area in square miles. As a reminder, choropleth maps should use normalized data.

The first set of maps is displayed below and shows the non-normalized percent of the population aged 65 or over:

Percentage of Persons Age 65+, Non-Normalized

As you can see, the maps show the percentage of the population aged 65 or over in each census tract, presented in the four classification methods. In my opinion, the least useful is the equal interval method, as it imposes artificial data breaks and generalizes the data too much. The quantile method is more effective; it also imposes artificial breaks, but it shows much more detail, comparable to the natural breaks and standard deviation maps. The standard deviation method assumes that the data is normally distributed. Because the data may not be normally distributed or may be skewed, this might not be the best method. Additionally, the data is placed into six intervals spanning +/- 3 standard deviations, so most tracts fall into the classes nearest the mean, and the breaks can be influenced by outliers. I personally like the natural breaks (Jenks) method, as it uses an algorithm to establish intervals that minimize intra-class variance while maximizing inter-class variance. I believe that this provides the best representation of the data and takes natural clusters into account.
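To make the differences between the four methods concrete, here is a minimal Python sketch of how each one could derive its class breaks. It is not the GIS software's implementation: the function names and the sample values are made up for illustration, the quantile cut points follow the "inclusive" rule of Python's `statistics.quantiles`, and the natural breaks function is a brute-force search that only works on very small samples (real tools use a much faster algorithm for the same goal).

```python
import statistics
from itertools import combinations


def equal_interval_breaks(values, k):
    """Upper bounds of k classes of equal width between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile_breaks(values, k):
    """Upper bounds of k classes holding roughly equal numbers of values."""
    cuts = statistics.quantiles(values, n=k, method="inclusive")
    return cuts + [max(values)]


def std_dev_breaks(values, span=3):
    """Class boundaries at the mean +/- 1, 2, ... span standard deviations.

    With span=3 this gives 7 boundaries, i.e. six intervals.
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [mean + sd * i for i in range(-span, span + 1)]


def natural_breaks_bruteforce(values, k):
    """Upper bounds of the k classes with the smallest total within-class
    squared deviation (the natural breaks goal), found by trying every
    possible split of the sorted values. Only practical for tiny samples.
    """
    data = sorted(values)
    n = len(data)

    def squared_deviation(chunk):
        m = sum(chunk) / len(chunk)
        return sum((x - m) ** 2 for x in chunk)

    best_total, best_bounds = None, None
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        total = sum(squared_deviation(data[a:b])
                    for a, b in zip(bounds, bounds[1:]))
        if best_total is None or total < best_total:
            best_total, best_bounds = total, bounds
    return [data[b - 1] for b in best_bounds[1:]]


# Hypothetical percentages with three obvious clusters (not the course data).
sample = [2, 3, 4, 5, 6, 20, 22, 24, 60, 65]

print(equal_interval_breaks(sample, 3))
print(quantile_breaks(sample, 3))
print(std_dev_breaks(sample))
print(natural_breaks_bruteforce(sample, 3))
```

On this made-up sample, the equal interval breaks fall wherever the arithmetic puts them, while the natural breaks search places its boundaries in the gaps between the clusters, which is the behavior I described above.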

The second set of maps is displayed below and shows the number (not the percentage) of persons aged 65 or over, normalized by area in square miles:

Number of Persons Aged 65+, Normalized for Sq. Miles

The normalized data provides the number of persons aged 65 or over per square mile. I will not repeat my assessment of the classification methods, as it did not change. I acknowledge that choropleth maps should use normalized data, but I do not believe it is necessary in this exercise, though the purpose of the map would determine this. Normalizing by area shows how the people are spread across each tract. If a large number of people are spread over a large tract, their presence can appear much smaller than a smaller number of people concentrated in a smaller tract. The normalized map therefore favors the smaller tracts and may give a misleading picture. For instance, if a commission wanted to allocate services to the elderly population, the number of people would be important, but not necessarily how that number compares to the area. To further explain my point, if a commission were setting up medical centers, a large area with a larger population may require more providers than a smaller area with a smaller population, even though the normalized map would display a higher density in the smaller area.
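The point about counts versus density can be shown with a small Python sketch. The two tracts and their numbers below are hypothetical, chosen only to illustrate the effect, and are not taken from the Miami-Dade data.

```python
# Hypothetical tracts: A is large with many seniors, B is small with fewer.
tracts = [
    {"name": "Tract A", "pop_65_plus": 1200, "area_sq_mi": 6.0},
    {"name": "Tract B", "pop_65_plus": 400, "area_sq_mi": 0.5},
]


def density(tract):
    """Persons aged 65+ per square mile (the area-normalized value)."""
    return tract["pop_65_plus"] / tract["area_sq_mi"]


# Rank the same tracts two ways: by raw count and by density.
top_by_count = max(tracts, key=lambda t: t["pop_65_plus"])
top_by_density = max(tracts, key=density)

for t in tracts:
    print(t["name"], t["pop_65_plus"], "persons,",
          density(t), "per sq. mile")
print("Most seniors:", top_by_count["name"])
print("Highest density:", top_by_density["name"])
```

Tract A has three times as many seniors, but Tract B has four times the density, so the normalized map would shade Tract B darker even though Tract A may need more providers.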

Again, this was an enjoyable exercise, and it provided more considerations to keep in mind when designing maps.

