Saturday, February 28, 2015

GIS Data Quality and Data Search

Hello readers,
        The focus of this post is the midterm lab marking the halfway point of my intro to GIS course. This was the biggest project for the class to date, emphasizing, as the title suggests, data searching and data quality. We were given select element criteria and a randomly assigned Florida county, and (with a few helpful hints) told: Go!
         Here is a small breakdown of the process I went through to build my maps:
  1. The Collection: I collected all of the useful data layers I already had on hand. For example, county boundaries. 
  2. Initial Search: I did a quick search to determine what I could find easily.
  3. Brainstorm: I brainstormed layout and data presentation, mostly by exploring the data I found by viewing it in ArcMAP. 
  4. The Search Continued: I continued to search for the more pesky data, or data to fill gaps in my brainstorming.
  5. The Big Conversion: I converted all of the data layers to a common projection and geographic coordinate system.
  6. Getting into the Nitty Gritty: Here was the bulk of the data organization, and the breakdown of what data should go on what map. 
  7. Own My Map: I made sure I had all the basic map design and cartographic processes covered to the best of my ability.
  8. I patented it, packaged it, and slapped it on a plastic lunch box. Just kidding, but I did compile the deliverable. 
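As a side note on step 5: reprojection in ArcMAP is a button click, but underneath it is just math. Here is a minimal pure-Python sketch of one common projection, Web Mercator from WGS84 longitude/latitude, offered only as an illustration of what reprojection tools compute for you:

```python
import math

R = 6378137.0  # WGS84 semi-major axis in meters

def to_web_mercator(lon_deg, lat_deg):
    """Project WGS84 lon/lat (degrees) to Web Mercator x/y (meters)."""
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# The origin (0, 0) in lon/lat maps to (0, 0) on the projected plane.
```

Real GIS software handles datums, units, and edge cases far beyond this, which is exactly why the assignment has us lean on ArcMAP for the conversion.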
So, with all of the above said, here are my two maps for Dixie County, Florida.

Map 1

Essentially I wanted to start with a broad overview of the county through its land use scheme. This paired well with the digital elevation model, together forming the big background information for the entire county. You can also see the distribution of towns around this area, and how relatively non-populous the county is overall. There is also not much elevation change across the county, with the high point to the north being 27 feet. This makes sense, as the county sits next to the Gulf of Mexico. It does look almost heart shaped, doesn't it?

Map 2


One of the requirements of this project was to incorporate one Digital Orthographic Quarter Quad (DOQQ), which in its entirety is the image on the right. To make proper use of this requirement I chose to have one map zoomed in to the working area covered by a single DOQQ. For comparison, the other frame shows a subdued priority ranking for the surrounding Strategic Habitat Conservation Areas. Note that there isn't much conservation in the county seat town of Cross City, but relatively high priority surrounding it. The inset also gives you a good overview of where the priority areas for this county are concentrated. Looking back at the previous map, you can compare this to the fact that the majority of the conservation area is agricultural land.

Overall I hope you have enjoyed this limited presentation on Dixie County. There are so many facets of mapping and data out there that could be applied to any county anywhere. It is amazing to see what is being produced now compared with 7 short weeks ago. Thank you.

Monday, February 23, 2015

Choropleth and Proportional Symbol Mapping... and Wine

Hello lovers of GIS and wine.
    This post is dedicated to choropleth mapping, proportional vs. graduated symbols, and wine. We will be looking at my finalized assignment map, which breaks down Europe by population density and gender as well as by wine consumed per capita. The data for all of this was graciously provided by UWF from the European Census of 2013, with the wine data coming from WineInstitute.Org from 2012. The project was created in ArcMAP with some minor dabbling in Corel Draw. Let's look at some of the learning objectives that drove this assignment:
  • Recognize when a choropleth map should be used
  • Compute varying standardization methods
  • Know what color schemes are appropriate for certain data types
  • Employ proper choropleth legend design   
  • Evaluate what type of proportional symbol map to use
  • Develop an appropriate strategy to work around proportional symbol challenges
  • Choose an appropriate color scheme for a choropleth and proportional symbols map
  • Create an appropriate legend for the classification scheme
  • Implement appropriate classification method for population data
Let us hit the big item definitions before we get to my map. 

What is a choropleth map? Simply put, it's a map displaying data that is collected and contained within enumeration units (counties, states, countries, etc.), changing abruptly at the transition from one unit to the next. The data is then grouped into classes, with each class assigned a color, or varying shades of a color, to represent it.
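One of the objectives above was computing standardization methods, since choropleth maps should usually show standardized values (counts divided by area or by a total) rather than raw counts. A tiny sketch with made-up numbers, not values from my map:

```python
# Hypothetical enumeration units: (name, population, area in sq km)
units = [("County A", 50000, 1200.0), ("County B", 8000, 950.0)]

def density(population, area_km2):
    """Standardize a raw count by area (people per sq km)."""
    return population / area_km2

# Densities, not raw populations, are what get classed and shaded.
densities = {name: density(pop, area) for name, pop, area in units}
```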

What is a proportional symbol map? This is a map with a symbol representing the data within each enumeration unit. The symbol is sized in proportion to the data value measured for that enumeration unit. 

How do graduated symbols differ from proportional symbols? Graduated symbols are given a particular size to represent a range of the data, rather than being scaled as a ratio like proportional symbols. 
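To make the size difference concrete, here is a hedged sketch of both sizing approaches. Proportional symbols commonly scale the radius by the square root of the value so that symbol *area* tracks the data; graduated symbols just look up a fixed size per class. The break values and sizes here are illustrative, not from my map:

```python
import math

def proportional_radius(value, max_value, max_radius=20.0):
    """Radius scaled so symbol area is proportional to the data value."""
    return max_radius * math.sqrt(value / max_value)

def graduated_radius(value, breaks=(10, 50, 100), sizes=(4, 8, 12, 16)):
    """Fixed symbol size per class range (graduated symbols)."""
    for upper, size in zip(breaks, sizes):
        if value <= upper:
            return size
    return sizes[-1]  # anything above the last break gets the largest size
```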

What is wine? Wine is the delicious byproduct of collecting and fermenting grape juice. For me, the sweeter the better. 

Below you can see my conglomerated map for this assignment. It is broken down into three data frames (population density (large), female percentage, and male percentage) with four data representations (the fourth being the wine consumption proportional symbols).

Sequential (lighter to darker) color schemes abound in the three choropleth maps above. All three also utilize the quantile classification method. You may remember from last week's post that this method rank orders the data and puts equal numbers of observations in each class. It is quite interesting to see that with this method the male and female population percentages are almost directly inverse of each other. Take Russia as an example: the data shows 53% of the population is women, and happily the other map correspondingly shows 47% are male.
In regard to the wine, we have the backdrop of population density with wine consumption in liters per capita overlaid. Proportional symbols show where the most and least consumption by country occurs. Central and western Europe have far more consumption than the north and east. Plan accordingly for your next wine tour of Europe. 
Now, I did try to refine the map by incorporating wine bottles; however, with my current work constraints and the small scale of the population density map, I had to forgo their use, as seen below. The overlap just doesn't look as clear and understandable. Thank you for your time.


Wednesday, February 18, 2015

Cartography and Classification Methods

Good day GIS enthusiasts and other viewers,
         The topic of this post will be classification methods for visualizing map data. My completed assignment map below is based on the percentage of those above age 65, by census tract, for Escambia County, FL. The data originated from the 2010 Census but was provided by UWF. The data and map were completely integrated and created in ArcMAP. Before we get into that, let's look at some of the learning objectives that drove this assignment.
  •  Demonstrate four common data classification methods.
  • Utilize ArcGIS to prepare a map with four data frames.
  • Symbolize a map for intuitive data acquisition.
  • Compare and contrast classification methods.
  • Identify classification method best suited to represent spatial data.
Based on the above let me give you a brief explanation on each of the classification methods you'll see in my finalized map.

Equal Interval:

The equal interval classification method presents your data by separating it into classes, dividing the total range (max minus min value) by however many classes you've created. This method is one of the simplest and can be done by hand if necessary. However, it doesn't take into account how the data falls along a number line and can lead to classes with no values in them. A positive, though, is that there are no gaps in your legend ranges to cause confusion.
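The arithmetic really is simple enough to do by hand, or to sketch in a few lines of Python (upper class bounds only):

```python
def equal_interval_breaks(values, k):
    """Upper bound of each of k equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]
```

Notice nothing here looks at where the values actually fall, which is exactly why empty classes can happen.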
Quantile:

Classifying your data into quantiles separates it into equal numbers of observations per class. This is done by rank ordering the values (lowest to highest) until all data has been equally dispersed among your classes. This does not take into account any data clustering or "natural breaks" observed by seeing where values sit in relation to each other on a number line. However, unlike equal interval, there is no chance of an empty class skewing your map. 
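A minimal sketch of quantile breaks, where each break is the largest value in its class (one common convention among several):

```python
import math

def quantile_breaks(values, k):
    """Upper bound of each of k classes holding (nearly) equal counts."""
    s = sorted(values)  # rank order, lowest to highest
    n = len(s)
    return [s[math.ceil(n * i / k) - 1] for i in range(1, k + 1)]
```

Every class is guaranteed to contain observations, but near-identical values can land on opposite sides of a break.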
Standard Deviation:

This method of classification does take into account where the numbers lie on a number line. One of the big advantages, and disadvantages, of this method is that it relies on having normally distributed data. In the event your data follows a Gaussian or standard bell curve, this is an excellent choice. When your data does not have roughly equal numbers of points on either side of the mean, you're likely to once again end up with empty classes or a skewed portrayal. 
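As a sketch, the class boundaries fall at half-standard-deviation steps around the mean, matching the +/- 0.5 interval my map uses (using the population standard deviation here is my assumption; ArcMAP offers several interval sizes):

```python
from statistics import mean, pstdev

def stdev_breaks(values):
    """Class boundaries at mean +/- 0.5 and +/- 1.5 standard deviations."""
    m, s = mean(values), pstdev(values)
    return [m + f * s for f in (-1.5, -0.5, 0.5, 1.5)]
```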
Natural Breaks:

This method, as alluded to above, takes into account where your data sits along a number line and tries to group data items based on where they occur most frequently. It tries to keep like values together and unlike values in separate classes. This is the default for ArcMAP, likely because it attempts to logically compute which values should be grouped (classed) together and which should not. The resulting breaks are both editable by, and subjective to, the person creating the map.  
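ArcMAP's natural breaks classification uses the Jenks optimization algorithm, which is more involved than I want to reproduce here. But a crude stand-in of my own, cutting the sorted data at its widest gaps, shows the spirit of keeping like values together (this is a simplification, not Jenks itself):

```python
def gap_breaks(values, k):
    """Approximate natural breaks: cut sorted data at the k-1 widest gaps."""
    s = sorted(values)
    # indices of the k-1 largest jumps between neighboring values
    cuts = sorted(
        sorted(range(1, len(s)), key=lambda i: s[i] - s[i - 1], reverse=True)[: k - 1]
    )
    # upper bound of each class: value just before each cut, plus the maximum
    return [s[i - 1] for i in cuts] + [s[-1]]
```

For the clustered data below, the cuts land exactly between the three obvious groups, which is the behavior that makes natural breaks feel "natural."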

Now that you have a little more understanding or refresher on the above classification methods please look at my finalized map below:

 One of the intended takeaways in creating this map was to explore the different classification methods, but also to learn, through comparison, how they interrelate or function independently. Upon some intense studying you can see that generally the lower percentages are all fairly similar across the maps, and this is mostly true with the high percentages as well. So the real delineation of which is the best portrayal of the data comes from the middle classes, and it really comes down to which percentage range representation is best for the data. Looking only at the middle classes we have ranges of approximately 2.5% (quantile), 3% (natural breaks), 4.5% (standard deviation (+/- .5)), and 5% (equal interval) respectively. So which is the best?
Well, if you aren't statistically inclined, I'd say let's forgo the standard deviation discussion and focus on the best of the remaining worlds: natural breaks. Not only does its name make it sound like the natural choice, the method in which it portrays the data makes it so as well. It does the best job of showing the natural groups associated with this data set. Once again going back to a few sentences ago, it's all in the middle classes. You can see by the small percentage range in the middle class, and the relatively larger ranges on either side, that the data is clustered about the center and tapers off at the ends, somewhat representative of a bell curve. This representation is more fluid than the other choices. 
This discussion could go on for quite some time, but above all it's open to your interpretation of the data. I personally like how logically the standard deviation method presents the data, but it is in a separate class compared to the other three. So, in an effort not to drone on about statistics, we default to the optimum of the other three: the natural breaks method. I hope you took something away from this other than a few lost minutes in reading. Thank you.


Saturday, February 14, 2015

Projections Part 2, The Storage and Contamination Monitoring Project

Happy Valentine's Day, wonderful page viewers.
What better way to herald Valentine's Day than with a post about contaminated lubricants. More specifically, this post will highlight the second week of projections work in Intro to GIS by discussing the Escambia County, FL Petroleum Storage and Contamination Monitoring project I created. First, let's look at some of the learning objectives that led the way for this project. 
  • Explore and download aerials, topographic quadrangles, shapefiles and tabular xy data from two different online data sources for Florida
  • Define a spatial reference for an unknown data set
  • Reproject GIS data to a common coordinate system and projection (understanding the difference between defining and projecting the coordinate system)
  • Identify UTM and state plane zones for a specific area 
  • Decipher Federal Information Processing Standards (FIPS) code
  • Create/Convert x,y data using Microsoft Excel and import to ArcGIS
  • Locate important accuracy information regarding GIS/GPS data
  • Relate coordinate values to the appropriate earth hemisphere and double check that calculations make geographical sense
  • Generate a map displaying aerials, topographic quads, shapefiles and tabular xy data
 This project served as a guided trial run for the impending midterm. As such, the key tasks were to gather imagery (quarter quads) from reputable sources, convert and project provided tabular data (an Excel spreadsheet) for ArcMAP use, and build a map (exclusively in ArcMAP) incorporating these and other base information, while ensuring all of the various data and input layers were projected in a common coordinate system. My version of the project is seen below.
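Two of the small calculations behind the objectives above are easy to sketch: converting degrees/minutes/seconds from a spreadsheet into decimal degrees, and working out which UTM zone a longitude falls in. The example coordinate is approximate and for illustration only:

```python
def dms_to_dd(degrees, minutes, seconds):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    sign = -1 if degrees < 0 else 1
    return sign * (abs(degrees) + minutes / 60 + seconds / 3600)

def utm_zone(lon_deg):
    """UTM zones are 6 degrees wide, numbered 1-60 starting at 180 W."""
    return int((lon_deg + 180) // 6) + 1

# Pensacola sits near 87.2 W longitude, which lands in UTM zone 16.
```

A quick sanity check like this is exactly the "do the coordinates make geographical sense" habit the objectives call for.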

One of the particulars of the assignment was to incorporate two adjacent 7.5 minute quarter quad images. For my project I chose two north-south oriented quads, those for Pace and Pensacola proper, which give you an entire overview of eastern Pensacola as it rests on Pensacola and Escambia Bays. As such, this very data-populated area is the result. It provides a good overview for spotting major clusters of certain types of monitoring sites. We could very easily make the scale larger and focus on any central area to further explore this data. It also leads you to believe the areas not pictured to the west would be just as populated, whereas the eastern residential shore has fewer sites. Please enjoy this overview map. Thank you.


Monday, February 9, 2015

Cartography and Spatial Statistics

Good day all,
     Here we have a post about how spatial statistics interrelate with cartography. This specifically applies to the analysis of data on a map. The integration of these two items is seen through this week's learning objectives below:
  • Define key spatial statistics terms
  • Know what questions to ask about your data before choosing an analysis tool.
  • Examine the spatial distribution of a data set to identify clusters and spatial relationships in the data.
  • Interpret a histogram to determine the frequency distribution of a data set.
  • Find outliers in your data using a semivariogram cloud, Voronoi map, histogram, and normal QQ plot.
  • Use a trend analysis graph to identify patterns in your data.
  • Assess which analysis tools are appropriate given the spatial distribution and values of your data. 
The majority of the above were exercised through the Esri user training "Exploring Spatial Patterns in Your Data Using ArcGIS." The examples used in this week's assignments covered map data for the depth of wells across an area, weather station temperature readings, and murder rates across various cities. You can see that data requiring statistical/spatial analysis can cover just about any topic you can imagine. Below we will look at some of the particular tools I delved into using ArcGIS this week. 

Let's start with some of the basics: the mean center, median center, and general distribution of data. 
Recall:
Mean = the average of all of your data points
Median Center = the middle value of all of your data
Distribution = how your data is graphically oriented against a particular standard, i.e., a normal distribution is judged against a bell curve.
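Those first two measures can be sketched for a list of x/y points in a few lines. One caveat: ArcGIS's Median Center tool actually iterates toward the point minimizing total distance, so the component-wise median below is a simplification of mine, not the tool's algorithm:

```python
from statistics import median

def mean_center(points):
    """Average x and average y of a point set."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def median_center(points):
    """Component-wise median (a simplification of ArcGIS's iterative tool)."""
    return (median(x for x, _ in points), median(y for _, y in points))
```

Even in this toy form you can see how an outlying point drags the mean center while barely moving the median center.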


The above data was provided by ESRI, with the map created by myself exclusively in ArcGIS. I first had ArcGIS compute the mean center and median center, which show there is only a slight difference in location between the two. This difference lets you see how the more densely packed areas in the north and south-southeast pull the mean and median locations slightly apart. You can also see an overall directional distribution computed to one standard deviation (meaning approximately 68% of the overall data falls in this area). With this overlay we can see that the station locations have a roughly east-west distribution. Overall, this portion of the lab essentially tells us where and how the data is located.

The next thing I did with this data was to determine if it was normally distributed. This was done in two ways, showing a histogram and QQ Plot of the data. You can see these computations and graphics below, as generated in ArcGIS.


Luckily for me, ArcGIS does all of the heavy computations; the statistics side would make my brain hurt. Above we have a histogram with a potential outlier on the right side. But in general you can see that the data lends itself to a mostly normal distribution, with only a slight positive skew due in part to that potential outlier. 


Again, the beauty of ArcGIS is that it does the calculations and has these amazing displays to save the aspiring cartographer / GIS professional from having to do such things by hand, like the wonderful scientists who came before. Here we have the QQ plot, whose data should predominantly fall upon that central line if the data follows a normal distribution. Thankfully, with few exceptions, this data is normally distributed. 
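For the curious, the pairing behind a normal QQ plot can be sketched with the standard library: sort the data and match each value with the standard-normal quantile at its plotting position. The (i + 0.5)/n plotting position is one common convention among several:

```python
from statistics import NormalDist

def qq_pairs(values):
    """(theoretical normal quantile, observed value) pairs for a QQ plot."""
    s = sorted(values)
    n = len(s)
    nd = NormalDist()  # standard normal
    # plotting position (i + 0.5) / n keeps quantiles strictly inside (0, 1)
    return [(nd.inv_cdf((i + 0.5) / n), v) for i, v in enumerate(s)]
```

Plot the pairs and, if they hug a straight line, the data is roughly normal, which is exactly what ArcGIS's display is showing.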

The scope of this week's lesson went beyond the examples above, but those items are beyond the scope of this post for today. Spatial statistics are a huge portion of the mapping world, of which I have only scratched the surface. Thanks for viewing this little bit.