An Introduction to Geoprocessing with Python

This is a very brief overview of how Python can be used to carry out common GIS tasks. We will begin with a quick outline of the advantages and disadvantages of using Python for GIS and then see Python in action with a small demonstration.

Link: https://ccchris-allen.github.io/

1. Python vs. Desktop GIS

Traditional desktop GIS applications are powerful tools for reading, writing, and manipulating geospatial data. However, there are many compelling reasons for using programming languages such as Python for working with spatial data.

Advantages of Python

  • Repeatability: Using Python can help you automate tedious geoprocessing tasks. Python scripts can create a repeatable workflow for loading, manipulating, and writing data.
  • Extensibility: Python has many great general purpose analysis modules, which can allow you to easily combine geoprocessing and non-spatial forms analysis (eg: data mining, statistical anlysis, text mining) into a single procedure. Setting up a similar workflow is often difficult (or impossible) when peforming geoprocessing in desktop GIS.
  • Affordability: Python is free!
  • Cross-Platform: Unlike some desktop GIS applications, Python can be run on macOS, Windows, Linux, etc.

Disadvantages of Python

  • Less Graphical: Desktop GIS software tends to prioritize the graphical view of geospatial data; by default, you see the geometry of your features. As we'll see below, you're able to view the geometry of GIS features in Python, but it's not as automatic as in desktop GIS.
  • Less Familiar: In general, desktop software is more familiar to most of us, since we carry out most of our daily computing tasks in desktop programs that have user-friendly interfaces. Because of this, there is a learning curve when using Python, since programming involves using text commands to carry out tasks (which we'll see below).

2. Is Bigfoot a Cheesehead? Or, Mapping Bigfoot Sightings using Python

Bigfoot

In this example, we'll pretend you're interested in learning about the distribution of Bigfoot sightings in mainland United States. We'll use the geopandas Python library to read and manipulate relevant geospatial data. We'll also try to answer a very important research question: Is Bigfoot a cheesehead?

Note: Don't worry about trying to understand every detail of the code shown here. This is merely intended to give you an overview of what's possible in Python.

Importing necessary Python modules

Loading relevant modules is often the first thing done when working in Python. Modules are collections of code that provide specialized functions. In this case, we will rely on the geopandas module to perform geospatial functions.

Note: geopandas is a powerful tool for reading and writing data, while also having capabilities for performing the geometric operations that are essential for geospatial work. However, see the Additional Links section for other great Python geospatial modules.

In [1]:
# Load the geopandas library, which will let us work with geospatial data
import geopandas

# This code helps us configure the appearance of our outputs (not important to understand right now)
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (12, 9)

Reading the GIS files

Here, we load two shapefies by calling the read_file function:

  1. bigfoot_sightings.shp: This is a point file where each point represents a Bigfoot sighting.
  2. cb_2016_us_state_5m.shp: This is a polygon file of all states in the United States.
In [2]:
sightings = geopandas.read_file('data/bigfoot_sightings.shp')
usa = geopandas.read_file('data/cb_2016_us_state_5m.shp')

Viewing the data

Similar to desktop GIS, geopandas allows us to view geospatial data using a table view as well as a graphical view.

The table view allows us to see the attributes that are associated with our features.

In [3]:
usa
Out[3]:
AFFGEOID ALAND AWATER GEOID LSAD NAME STATEFP STATENS STUSPS geometry
0 0400000US01 1.311737e+11 4.593686e+09 01 00 Alabama 01 01779775 AL (POLYGON ((-88.04374299999999 30.517423, -88.0...
1 0400000US02 1.477946e+12 2.453905e+11 02 00 Alaska 02 01785533 AK (POLYGON ((-133.655819 55.625617, -133.624921 ...
2 0400000US04 2.941986e+11 1.027346e+09 04 00 Arizona 04 01779777 AZ POLYGON ((-114.799683 32.593621, -114.809393 3...
3 0400000US08 2.684293e+11 1.175113e+09 08 00 Colorado 08 01779779 CO POLYGON ((-109.060253 38.599328, -109.059541 3...
4 0400000US09 1.254264e+10 1.815476e+09 09 00 Connecticut 09 01779780 CT POLYGON ((-73.72777499999999 41.100696, -73.69...
5 0400000US12 1.389242e+11 3.138604e+10 12 00 Florida 12 00294478 FL (POLYGON ((-80.751643 24.857254, -80.729063 24...
6 0400000US13 1.491698e+11 4.741101e+09 13 00 Georgia 13 01705317 GA POLYGON ((-85.605165 34.984678, -85.4741053158...
7 0400000US16 2.140429e+11 2.398670e+09 16 00 Idaho 16 01779783 ID POLYGON ((-117.242675 44.396548, -117.234835 4...
8 0400000US18 9.279055e+10 1.536767e+09 18 00 Indiana 18 00448508 IN POLYGON ((-88.09566199999999 37.905808, -88.08...
9 0400000US20 2.117533e+11 1.346236e+09 20 00 Kansas 20 00481813 KS POLYGON ((-102.051744 40.003078, -101.916696 4...
10 0400000US22 1.119049e+11 2.374630e+10 22 00 Louisiana 22 01629543 LA (POLYGON ((-88.88145399999999 30.053202, -88.8...
11 0400000US25 2.020444e+10 7.130621e+09 25 00 Massachusetts 25 00606926 MA (POLYGON ((-70.275526 41.310464, -70.260632 41...
12 0400000US27 2.062323e+11 1.892918e+10 27 00 Minnesota 27 00662849 MN POLYGON ((-97.239209 48.968684, -97.2380249999...
13 0400000US29 1.780527e+11 2.487575e+09 29 00 Missouri 29 01779791 MO POLYGON ((-95.773549 40.578205, -95.7685270000...
14 0400000US30 3.769650e+11 3.866987e+09 30 00 Montana 30 00767982 MT POLYGON ((-116.049155 48.481247, -116.04915652...
15 0400000US32 2.843293e+11 2.047351e+09 32 00 Nevada 32 01779793 NV POLYGON ((-120.005743 39.228664, -120.005142 3...
16 0400000US34 1.905076e+10 3.541928e+09 34 00 New Jersey 34 01779795 NJ POLYGON ((-75.559102 39.629056, -75.5594459999...
17 0400000US36 1.220530e+11 1.924310e+10 36 00 New York 36 01779796 NY (POLYGON ((-72.034958 41.255458, -72.029438 41...
18 0400000US38 1.787118e+11 4.399094e+09 38 00 North Dakota 38 01779797 ND POLYGON ((-104.048652 48.865734, -104.048824 4...
19 0400000US40 1.776637e+11 3.373836e+09 40 00 Oklahoma 40 01102857 OK POLYGON ((-103.002565 36.526588, -103.002188 3...
20 0400000US42 1.158822e+11 3.396806e+09 42 00 Pennsylvania 42 01779798 PA POLYGON ((-80.519891 40.906661, -80.5190910000...
21 0400000US45 7.786172e+10 5.070944e+09 45 00 South Carolina 45 01779799 SC POLYGON ((-83.35323800000001 34.728648, -83.34...
22 0400000US46 1.963484e+11 3.380783e+09 46 00 South Dakota 46 01785534 SD POLYGON ((-104.057698 44.997431, -104.05021 44...
23 0400000US48 6.766335e+11 1.902599e+10 48 00 Texas 48 01779801 TX POLYGON ((-106.645479 31.89867, -106.64084 31....
24 0400000US50 2.387347e+10 1.031125e+09 50 00 Vermont 50 01779802 VT POLYGON ((-73.43773999999999 44.045006, -73.43...
25 0400000US54 6.226560e+10 4.899028e+08 54 00 West Virginia 54 01779805 WV POLYGON ((-82.64299699999999 38.16956, -82.639...
26 0400000US72 8.868100e+09 4.923178e+09 72 00 Puerto Rico 72 01779808 PR (POLYGON ((-65.335701 18.349535, -65.329334 18...
27 0400000US05 1.347715e+11 2.960192e+09 05 00 Arkansas 05 00068085 AR POLYGON ((-94.61791900000002 36.49941400000001...
28 0400000US06 4.035011e+11 2.046672e+10 06 00 California 06 01779778 CA (POLYGON ((-118.603375 33.478098, -118.598783 ...
29 0400000US10 5.047195e+09 1.398721e+09 10 00 Delaware 10 01779781 DE (POLYGON ((-75.570798 39.626768, -75.559445999...
30 0400000US11 1.583650e+08 1.863340e+07 11 00 District of Columbia 11 01702382 DC POLYGON ((-77.119759 38.934343, -77.1045 38.94...
31 0400000US15 1.663410e+10 1.177770e+10 15 00 Hawaii 15 01779782 HI (POLYGON ((-156.05722 19.742536, -156.052315 1...
32 0400000US17 1.437887e+11 6.206694e+09 17 00 Illinois 17 01779784 IL POLYGON ((-91.512974 40.181062, -91.511073 40....
33 0400000US19 1.446676e+11 1.077808e+09 19 00 Iowa 19 01779785 IA POLYGON ((-96.639704 42.737071, -96.635886 42....
34 0400000US21 1.022663e+11 2.388771e+09 21 00 Kentucky 21 01779786 KY (POLYGON ((-89.405654 36.528165, -89.398685 36...
35 0400000US23 7.988522e+10 1.174876e+10 23 00 Maine 23 01779787 ME (POLYGON ((-68.37659099999999 44.113762, -68.3...
36 0400000US24 2.514775e+10 6.983312e+09 24 00 Maryland 24 01714934 MD (POLYGON ((-76.04861699999999 38.014843, -76.0...
37 0400000US26 1.464553e+11 1.040316e+11 26 00 Michigan 26 01779789 MI (POLYGON ((-84.009664 46.101332, -83.987684 46...
38 0400000US28 1.215299e+11 3.930506e+09 28 00 Mississippi 28 01779790 MS (POLYGON ((-88.510674 30.217021, -88.492378 30...
39 0400000US31 1.989727e+11 1.356291e+09 31 00 Nebraska 31 01779792 NE POLYGON ((-104.053514 41.157257, -104.052666 4...
40 0400000US33 2.318740e+10 1.028679e+09 33 00 New Hampshire 33 01779794 NH POLYGON ((-72.554232 42.860038, -72.555132 42....
41 0400000US35 3.141694e+11 7.557230e+08 35 00 New Mexico 35 00897535 NM POLYGON ((-109.050173 31.480004, -109.049843 3...
42 0400000US37 1.259213e+11 1.347006e+10 37 00 North Carolina 37 01027616 NC (POLYGON ((-75.72680699999999 35.935844, -75.7...
43 0400000US39 1.058331e+11 1.026460e+10 39 00 Ohio 39 01085497 OH (POLYGON ((-82.73570699999999 41.603361, -82.7...
44 0400000US41 2.486043e+11 6.195106e+09 41 00 Oregon 41 01155107 OR POLYGON ((-124.552441 42.840568, -124.500141 4...
45 0400000US44 2.677899e+09 1.323552e+09 44 00 Rhode Island 44 01219835 RI (POLYGON ((-71.36152 41.464831, -71.34707 41.4...
46 0400000US47 1.067977e+11 2.355189e+09 47 00 Tennessee 47 01325873 TN POLYGON ((-90.309877 35.00975, -90.30689699999...
47 0400000US49 2.128855e+11 6.999529e+09 49 00 Utah 49 01455989 UT POLYGON ((-114.052962 37.592783, -114.052472 3...
48 0400000US51 1.022573e+11 8.528606e+09 51 00 Virginia 51 01779803 VA (POLYGON ((-75.999658 37.848198, -75.996859 37...
49 0400000US53 1.721134e+11 1.255845e+10 53 00 Washington 53 01779804 WA (POLYGON ((-122.328343 48.021335, -122.321721 ...
50 0400000US55 1.402736e+11 2.936139e+10 55 00 Wisconsin 55 01779806 WI (POLYGON ((-86.956198 45.352006, -86.953389 45...
51 0400000US60 1.977591e+08 1.307244e+09 60 00 American Samoa 60 01802701 AS (POLYGON ((-168.163532 -14.536407, -168.159438...
52 0400000US66 5.435583e+08 9.343350e+08 66 00 Guam 66 01802705 GU (POLYGON ((144.642331 13.241898, 144.649492 13...
53 0400000US69 4.722767e+08 4.644268e+09 69 00 Commonwealth of the Northern Mariana Islands 69 01779809 MP (POLYGON ((146.033197 16.004517, 146.05429 16....
54 0400000US78 3.480131e+08 1.550245e+09 78 00 United States Virgin Islands 78 01802710 VI (POLYGON ((-64.634323 17.795891, -64.622142 17...
55 0400000US56 2.514649e+11 1.861273e+09 56 00 Wyoming 56 01779807 WY POLYGON ((-111.056888 44.866658, -111.055629 4...

The graphical view is helpful when we are particularly interested in the geometry of our features. We show the graphical view by using the plot function.

In [4]:
usa.plot(color='white')
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x106dc0f0>

So far, we've only been looking at the usa dataset, but we can also plot the sightings layer on top of usa.

In [5]:
base_layer = usa.plot(color='white')
sightings.plot(ax=base_layer, color='red')
Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x127402e8>

Filtering our dataset by attribute

From the plots above we can see that our usa layer contains many states and territories that are not part of mainland US, but luckily geopandas gives us the ability to filter our dataset by attribute. Since we only want to work with features within mainland US, we first create a list of features to filter out by inspecting the NAME attribute shown in the table above. Then we use the query function in geopandas to keep only the features that have a NAME attribute value that is not in our list.

In [6]:
# NAME values that are not mainland features
not_mainland_names = ['American Samoa', 'Guam', 'Puerto Rico', 'United States Virgin Islands', 
                       'Commonwealth of the Northern Mariana Islands', 'Alaska', 'Hawaii']

# Use `query` to keep only features where the NAME values is NOT in our list
mainland_usa = usa.query('NAME not in @not_mainland_names')

Now, we can again use the plot function to make sure our new dataset only contains features from mainland US. Looks good!

In [7]:
mainland_usa.plot(color='white')
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x1d133160>

How many sightings are there per state?

Now we'll use a spatial join to determine how many Bigfoot sightings have occurred in each state.

In geopandas we use the sjoin function to perform a spatial join. (Note: Ignore the "CRS does not match!" warning for now.)

In [8]:
points_in_polygon = geopandas.sjoin(sightings, mainland_usa, how='inner', op='within')

Next we count the points in each state and create a new attribute called SIGHTINGS_COUNT to hold this count for each state.

In [9]:
# These lines of code might look scary, but all you need to know is that this code aggregates
# all points within each state and creates a new attribute called SIGHTINGS_COUNT that reflects 
# the number of sightings in that state.
counts = points_in_polygon.groupby('NAME').size()
counts = counts.to_frame('SIGHTINGS_COUNT').reset_index()
mainland_with_counts = mainland_usa.merge(counts)

We can now visialize our data as a choropleth map by telling the plot function that we would like to use the SIGHTINGS_COUNT attribute as our thematic variable.

In [10]:
mainland_with_counts.plot(column='SIGHTINGS_COUNT', cmap='Reds', scheme='quantiles', k=7, legend=True)
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x20b85f98>

What percentage of sightings are in Wisconsin?

Finally, we can find out if Sasquatch is a cheesehead! This is a rather silly example, so we'll take an equally silly (and simplistic) approach to answering this question by merely finding out what percentage of the total mainland US sightings are in Wisconsin.

First we select the Wisconsin feature and get the SIGHTINGS_COUNT value.

In [11]:
wi_with_count = mainland_with_counts.query('NAME == "Wisconsin"')
wi_sightings = wi_with_count.SIGHTINGS_COUNT.iloc[0]

Next we find the total number of sightings in mainland US and then use this value to find the percentage of sightings that occurred in Wisconsin.

In [12]:
mainland_total_sightings = mainland_with_counts.SIGHTINGS_COUNT.sum()
wi_percentage = 100 * wi_sightings / float(mainland_total_sightings)

We now simply print out the percentage.

In [13]:
print "{0:.2f} percent of sightings are in WI.".format(wi_percentage)
1.81 percent of sightings are in WI.

Only about 2% of total sightings occurred in Wisconsin, so we must conclude that Bigfoot is NOT a cheesehead!


3. Appendix

Credits

The idea for this notebook was inspired by work from Joshua Stevens and Timothy Renner.

Data Sources

  • Data on bigfoot sightings (bigfoot_sightings.shp) came from Timothy Renner.
  • US state boundaries (cb_2016_us_state_5m.shp) came from the Census Bureau.

This is a short list of other useful Python geospatial modules.

  • GDAL/OGR: A Python implementation of GDAL/OGR
  • arcpy: ESRI's Python interface to ArcGIS.
  • fiona: A user friendly interface for reading and writing geospatial data.
  • shapely: A module for performing geometric operations.
  • pyproj: For working with map projections.