This is a very brief overview of how Python can be used to carry out common GIS tasks. We will begin with a quick outline of the advantages and disadvantages of using Python for GIS and then see Python in action with a small demonstration.
Traditional desktop GIS applications are powerful tools for reading, writing, and manipulating geospatial data. However, there are many compelling reasons for using programming languages such as Python for working with spatial data.
In this example, we'll pretend you're interested in learning about the distribution of Bigfoot sightings in mainland United States. We'll use the geopandas
Python library to read and manipulate relevant geospatial data. We'll also try to answer a very important research question: Is Bigfoot a cheesehead?
Note: Don't worry about trying to understand every detail of the code shown here. This is merely intended to give you an overview of what's possible in Python.
Loading relevant modules is often the first thing done when working in Python. Modules are collections of code that provide specialized functions. In this case, we will rely on the geopandas
module to perform geospatial functions.
Note: geopandas
is a powerful tool for reading and writing data, while also having capabilities for performing the geometric operations that are essential for geospatial work. However, see the Additional Links section for other great Python geospatial modules.
# Load the geopandas library, which will let us work with geospatial data
import geopandas
# This code helps us configure the appearance of our outputs (not important to understand right now)
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (12, 9)
Here, we load two shapefies by calling the read_file
function:
bigfoot_sightings.shp
: This is a point file where each point represents a Bigfoot sighting.cb_2016_us_state_5m.shp
: This is a polygon file of all states in the United States.sightings = geopandas.read_file('data/bigfoot_sightings.shp')
usa = geopandas.read_file('data/cb_2016_us_state_5m.shp')
Similar to desktop GIS, geopandas
allows us to view geospatial data using a table view as well as a graphical view.
The table view allows us to see the attributes that are associated with our features.
usa
The graphical view is helpful when we are particularly interested in the geometry of our features. We show the graphical view by using the plot
function.
usa.plot(color='white')
So far, we've only been looking at the usa
dataset, but we can also plot the sightings
layer on top of usa
.
base_layer = usa.plot(color='white')
sightings.plot(ax=base_layer, color='red')
From the plots above we can see that our usa
layer contains many states and territories that are not part of mainland US, but luckily geopandas
gives us the ability to filter our dataset by attribute. Since we only want to work with features within mainland US, we first create a list of features to filter out by inspecting the NAME
attribute shown in the table above. Then we use the query
function in geopandas
to keep only the features that have a NAME
attribute value that is not in our list.
# NAME values that are not mainland features
not_mainland_names = ['American Samoa', 'Guam', 'Puerto Rico', 'United States Virgin Islands',
'Commonwealth of the Northern Mariana Islands', 'Alaska', 'Hawaii']
# Use `query` to keep only features where the NAME values is NOT in our list
mainland_usa = usa.query('NAME not in @not_mainland_names')
Now, we can again use the plot
function to make sure our new dataset only contains features from mainland US. Looks good!
mainland_usa.plot(color='white')
Now we'll use a spatial join to determine how many Bigfoot sightings have occurred in each state.
In geopandas
we use the sjoin
function to perform a spatial join. (Note: Ignore the "CRS does not match!" warning for now.)
points_in_polygon = geopandas.sjoin(sightings, mainland_usa, how='inner', op='within')
Next we count the points in each state and create a new attribute called SIGHTINGS_COUNT
to hold this count for each state.
# These lines of code might look scary, but all you need to know is that this code aggregates
# all points within each state and creates a new attribute called SIGHTINGS_COUNT that reflects
# the number of sightings in that state.
counts = points_in_polygon.groupby('NAME').size()
counts = counts.to_frame('SIGHTINGS_COUNT').reset_index()
mainland_with_counts = mainland_usa.merge(counts)
We can now visialize our data as a choropleth map by telling the plot
function that we would like to use the SIGHTINGS_COUNT
attribute as our thematic variable.
mainland_with_counts.plot(column='SIGHTINGS_COUNT', cmap='Reds', scheme='quantiles', k=7, legend=True)
Finally, we can find out if Sasquatch is a cheesehead! This is a rather silly example, so we'll take an equally silly (and simplistic) approach to answering this question by merely finding out what percentage of the total mainland US sightings are in Wisconsin.
First we select the Wisconsin feature and get the SIGHTINGS_COUNT
value.
wi_with_count = mainland_with_counts.query('NAME == "Wisconsin"')
wi_sightings = wi_with_count.SIGHTINGS_COUNT.iloc[0]
Next we find the total number of sightings in mainland US and then use this value to find the percentage of sightings that occurred in Wisconsin.
mainland_total_sightings = mainland_with_counts.SIGHTINGS_COUNT.sum()
wi_percentage = 100 * wi_sightings / float(mainland_total_sightings)
We now simply print out the percentage.
print "{0:.2f} percent of sightings are in WI.".format(wi_percentage)
The idea for this notebook was inspired by work from Joshua Stevens and Timothy Renner.
bigfoot_sightings.shp
) came from Timothy Renner.cb_2016_us_state_5m.shp
) came from the Census Bureau.This is a short list of other useful Python geospatial modules.