Wednesday, February 18, 2015

Cartography and Classification Methods

Good day GIS enthusiasts and other viewers,
         The topic of this post is classification methods for visualizing map data. My completed assignment map below shows the percentage of residents above age 65 by census tract for Escambia County, FL. The data originated from the 2010 Census and was provided by UWF; the map itself was created entirely in ArcMap. Before we get into that, let's look at some of the learning objectives that drove this assignment.
  • Demonstrate four common data classification methods.
  • Utilize ArcGIS to prepare a map with four data frames.
  • Symbolize a map for intuitive data acquisition.
  • Compare and contrast classification methods.
  • Identify classification method best suited to represent spatial data.
Based on the above, let me give you a brief explanation of each of the classification methods you'll see in my finalized map.

Equal Interval:

The equal interval classification method separates your data into classes by dividing the total range (maximum minus minimum value) by the number of classes you've created. It is one of the simplest methods and can be done by hand if necessary. However, it doesn't take into account how the data falls along a number line, which can lead to classes with no values in them. On the plus side, there are no gaps between the ranges in your legend to cause confusion.
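The arithmetic above can be sketched in a few lines of Python. This is a hypothetical illustration, not ArcMap's internal code; the sample percentages are made up:

```python
def equal_interval_breaks(values, num_classes):
    """Divide the total range (max - min) into equal-width classes.

    Returns the upper boundary of each class; the last break is the maximum.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_classes
    return [lo + width * i for i in range(1, num_classes + 1)]

# Made-up percent-over-65 values for illustration.
percent_over_65 = [2.0, 4.5, 5.0, 7.5, 9.0, 12.0, 18.0, 22.0]
print(equal_interval_breaks(percent_over_65, 4))  # [7.0, 12.0, 17.0, 22.0]
```

Note that the class boundaries depend only on the minimum and maximum, so a single outlier can leave whole classes empty.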
Quantile:

Classifying your data into quantiles separates it into an equal number of observations per class. This is done through rank ordering (lowest to highest) until all the data has been evenly distributed among your classes. It does not take into account any clustering or "natural breaks" that you would see by plotting the values along a number line. However, unlike equal interval, there is no chance of an empty class skewing your map.
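The rank-ordering idea can be sketched as follows. Again, this is an illustrative version with made-up data, not the exact routine ArcMap uses:

```python
def quantile_breaks(values, num_classes):
    """Rank-order the values and place an equal number in each class.

    Returns the value of the last observation in each class.
    """
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for i in range(1, num_classes + 1):
        # Index of the last observation assigned to class i.
        idx = min(round(i * n / num_classes) - 1, n - 1)
        breaks.append(ordered[idx])
    return breaks

data = [1, 2, 2, 3, 5, 6, 8, 9, 12, 15, 18, 20]
print(quantile_breaks(data, 4))  # [2, 6, 12, 20] -- three observations per class
```

Because each class holds the same number of tracts, very different values can land in the same class when the data is clustered.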
Standard Deviation:

This classification method does take into account where the values lie on a number line. Its big advantage, and also its big disadvantage, is that it relies on normally distributed data. If your data follows a Gaussian (standard bell) curve, this is an excellent choice. When your data does not have roughly equal numbers of points on either side of the mean, however, you're likely to once again end up with empty classes or a skewed map.
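A minimal sketch of how such boundaries can be computed, assuming classes one standard deviation wide centered on the mean (one common layout for this method; not ArcMap's exact code):

```python
import statistics

def stddev_breaks(values):
    """Class boundaries at mean -1.5, -0.5, +0.5, and +1.5 standard
    deviations, giving classes one standard deviation wide centered
    on the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [mean + m * sd for m in (-1.5, -0.5, 0.5, 1.5)]

print(stddev_breaks([1, 2, 3, 4, 5]))  # symmetric boundaries around the mean of 3
```

Since the boundaries are fixed relative to the mean, skewed data piles into the classes on one side and leaves the classes on the other side nearly or completely empty.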
Natural Breaks:

This method, as alluded to above, takes into account where your data falls along a number line and groups values by where they cluster most. It tries to keep similar values together and dissimilar values in separate classes. It is the default in ArcMap, likely because it attempts to compute logically which values should be grouped (classed) together and which should not. The resulting breaks are also editable, so the final classes remain somewhat subjective to the person creating the map.
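The grouping idea behind natural breaks can be sketched with a small dynamic program that minimizes the within-class sum of squared deviations. This is a simplified Jenks-style illustration, not ArcMap's actual implementation:

```python
def natural_breaks(values, k):
    """Partition sorted values into k classes minimizing the total
    within-class sum of squared deviations (a Jenks-style criterion)."""
    v = sorted(values)
    n = len(v)
    # Prefix sums let us compute each class's squared-deviation cost in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for x in v:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def cost(i, j):  # cost of grouping v[i..j] (inclusive) into one class
        s = prefix[j + 1] - prefix[i]
        sq = prefix_sq[j + 1] - prefix_sq[i]
        return sq - s * s / (j - i + 1)

    INF = float("inf")
    # best[i][j] = minimal cost of splitting the first i values into j classes
    best = [[INF] * (k + 1) for _ in range(n + 1)]
    cut = [[0] * (k + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for p in range(j - 1, i):
                c = best[p][j - 1] + cost(p, i - 1)
                if c < best[i][j]:
                    best[i][j] = c
                    cut[i][j] = p
    # Walk the cut points backward to recover each class's upper value.
    breaks, i = [], n
    for j in range(k, 0, -1):
        breaks.append(v[i - 1])
        i = cut[i][j]
    return breaks[::-1]

print(natural_breaks([1, 2, 3, 10, 11, 12, 100], 3))  # [3, 12, 100]
```

On that sample, the algorithm keeps the three tight clusters intact rather than splitting them at arbitrary range boundaries, which is exactly the behavior that makes this method the ArcMap default.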

Now that you have a little more understanding of, or a refresher on, the above classification methods, please look at my finalized map below:

One of the intended takeaways in creating this map was not only to explore the different classification methods, but also to learn through comparison how they relate to and differ from one another. With some careful study you can see that the lower percentages are generally fairly similar across the maps, and this is mostly true of the high percentages as well. So the real distinction between the methods, and which best portrays the data, comes from the middle classes. Looking only at the middle classes, we have ranges of approximately 2.5% (Quantile), 3% (Natural Breaks), 4.5% (Standard Deviation, +/- 0.5), and 5% (Equal Interval), respectively. So which is the best?
Well, if you aren't statistically inclined, I'd say let's forgo the Standard Deviation discussion and focus on the best of the remaining options: Natural Breaks. Not only does its name make it sound like the natural choice, but the way it portrays the data makes it so as well. It does the best job of showing the natural groups within this data set. Once again, it's all in the middle classes. The small percentage range in the middle class, and the relatively larger ranges on either side, show that the data is clustered about the center and tapers off at the ends, somewhat like a bell curve. This representation is more fluid than the other choices.
This discussion could go on for quite some time, but above all it's open to your interpretation of the data. I personally like how logically the Standard Deviation method presents the data, but it is in a class of its own compared to the other three. So, in an effort not to drone on about statistics, we default to the best of the other three: the Natural Breaks method. I hope you took something away from this other than a few lost minutes of reading. Thank you.

