Plotting African ecosystems
This post is my first foray into plotting geospatial data. I have no prior experience whatsoever, so there may be mistakes here. If you spot anything that looks wrong, please drop a comment below.
For a paper we’re currently writing I need to plot a map of sub-Saharan Africa with colours to indicate something about the type of ecosystem or land-cover. I found this map of standardized terrestrial ecosystems for which the underlying data are freely available and can be downloaded from the USGS website. I need to overlay some points and other graphics on this map, and so I’d like to plot the data using matplotlib.
I’ve downloaded the data to a local directory and unzipped, here are the file contents:
total 904M
-rw-rw-r-- 1 aliman aliman 82 Mar 20 2013 Africa_IVC_20130316_final_MG.tfw
-rw-rw-r-- 1 aliman aliman 643M Mar 20 2013 Africa_IVC_20130316_final_MG.tif
-rw-rw-r-- 1 aliman aliman 50K Mar 20 2013 Africa_IVC_20130316_final_MG_tif_arc10_1.lyr
-rw-rw-r-- 1 aliman aliman 49K Apr 2 2013 Africa_IVC_20130316_final_MG_tif_arc10.lyr
-rw-rw-r-- 1 aliman aliman 2.8K Oct 3 17:09 Africa_IVC_20130316_final_MG.tif.aux.xml
-rw-rw-r-- 1 aliman aliman 260M Apr 2 2013 Africa_IVC_20130316_final_MG.tif.ovr
-rw-rw-r-- 1 aliman aliman 45K Oct 4 14:40 Africa_IVC_20130316_final_MG.tif.vat.csv
-rw-rw-r-- 1 aliman aliman 393K Mar 20 2013 Africa_IVC_20130316_final_MG.tif.vat.dbf
-rw-rw-r-- 1 aliman aliman 3.1K Mar 27 2013 Africa_IVC_20130316_final_MG.tif.xml
-rw-rw-r-- 1 aliman aliman 9.1K Mar 27 2013 Africa_IVC_20130316_final_MG.xml
-rw-rw-r-- 1 aliman aliman 69K Mar 23 2013 African and Malagasy Veg_Macrogroups_2013.xlsx
Inspect the GeoTIFF file
The main data are contained in the file Africa_IVC_20130316_final_MG.tif
which is a GeoTIFF file containing raster data. To read a GeoTIFF file we can use GDAL.
gdal 2.1.1
The first thing to do is figure out what spatial reference system (a.k.a. coordinate reference system) has been used.
GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433],AUTHORITY["EPSG","4326"]]
The proj_wkt
variable above is a string formatted according to the well-known text (WKT) markup language. We can convert this to a spatial reference object.
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4326"]]
Here “GEOGCS” stands for “geographic coordinate system”, where positions are measured in longitude and latitude relative to a geodetic datum, which roughly speaking is a definition of the shape of the Earth.
Note that this is not a projected coordinate system. There is some nice documentation on geographic and projected coordinate systems on the GDAL website.
The next thing to do is figure out where the boundaries of this image are.
(-26.00013888888887,
0.0008333333333,
0.0,
38.00013888888887,
0.0,
-0.0008333333332999999)
These numbers define how to transform from pixel raster space to coordinate space. From the GDAL documentation and this helpful blog post by Max König, if geo_transform[2]
and geo_transform[4]
are zero then the image is “north up”. Then geo_transform[1]
is the pixel width, geo_transform[5]
is the pixel height, and the upper left corner of the upper left pixel is at position (geo_transform[0], geo_transform[3])
.
Get some information about the raster data.
(108000, 87600)
1
'Int32'
So the dataset contains a single band of integers. This makes sense as the data are a classification of ecosystems, and so each integer corresponds to an ecosystem class.
The dataset is also big, too large to fit in memory.
'37.8 GB'
Here’s a few useful things to know about the band, thanks to the Python GDAL/OGR Cookbook.
-2147483648.0
0.0
971.0
True
Extract the colour table
Before we can do anything exciting like actually plotting data, we need to extract the colour table that maps the integer codes onto colours. This has been provided by USGS as a .dbf
(DBase) file, which I’ll convert to text for further processing (thanks to pvanb for hints on how to handle this). Prepare for some munging.
Field Name Type Length Decimal Pos
Value N 9 0
Count F 19 11
hierarchy C 254 0
class C 254 0
subclass C 254 0
formation C 254 0
formation C 254 0
division k C 254 0
division c C 254 0
Division C 254 0
Mgkey C 254 0
Mg code C 254 0
Mg name fi C 254 0
Macrogroup C 254 0
Mapped N 4 0
Red F 13 11
Green F 13 11
Blue F 13 11
Opacity F 13 11
0:2.73051336800e+009:::::::::::::0:0.00000e+000:0.00000e+000:0.00000e+000:1.00000e+000:
1:3.36479726000e+008:1.A.2.Fd:1 Forest to Open Woodland:1.A Tropical Forest:1.A.2:1.A.2 Tropical Lowland Humid Forest:D147:1.A.2.Fd:1.A.2.Fd Guineo-Congolian Evergreen & Semi-Evergreen Rainforest:MA001:1.A.2.Fd.1:1.A.2.Fd.1-Guineo-Congolian Evergreen Rainforest:Guineo-Congolian Evergreen Rainforest:1:0.00000e+000:4.58824e-001:3.92157e-002:1.00000e+000:
3:3.99018260000e+007:1.A.2.Fd:1 Forest to Open Woodland:1.A Tropical Forest:1.A.2:1.A.2 Tropical Lowland Humid Forest:D147:1.A.2.Fd:1.A.2.Fd Guineo-Congolian Evergreen & Semi-Evergreen Rainforest:MA003:1.A.2.Fd.3:1.A.2.Fd.3-Guineo-Congolian Semi-Deciduous Rainforest:Guineo-Congolian Semi-Deciduous Rainforest:1:6.66667e-002:4.82353e-001:4.70588e-002:1.00000e+000:
Field Name | Type | Length | Decimal Pos |
---|---|---|---|
Value | N | 9 | 0 |
Count | F | 19 | 11 |
hierarchy | C | 254 | 0 |
class | C | 254 | 0 |
subclass | C | 254 | 0 |
formation | C | 254 | 0 |
formation | C | 254 | 0 |
division k | C | 254 | 0 |
division c | C | 254 | 0 |
Division | C | 254 | 0 |
Mgkey | C | 254 | 0 |
Mg code | C | 254 | 0 |
Mg name fi | C | 254 | 0 |
Macrogroup | C | 254 | 0 |
Mapped | N | 4 | 0 |
Red | F | 13 | 11 |
Green | F | 13 | 11 |
Blue | F | 13 | 11 |
Opacity | F | 13 | 11 |
Value | Count | hierarchy | class | subclass | formation | formation | division k | division c | Division | Mgkey | Mg code | Mg name fi | Macrogroup | Mapped | Red | Green | Blue | Opacity |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2730513368.0 | 0 | 0.0 | 0.0 | 0.0 | 1.0 | ||||||||||||
1 | 336479726.0 | 1.A.2.Fd | 1 Forest to Open Woodland | 1.A Tropical Forest | 1.A.2 | 1.A.2 | D147 | 1.A.2.Fd | 1.A.2.Fd Guineo-Congolian Evergreen & Semi-Evergreen Rainforest | MA001 | 1.A.2.Fd.1 | 1.A.2.Fd.1-Guineo-Congolian Evergreen Rainforest | Guineo-Congolian Evergreen Rainforest | 1 | 0.0 | 0.458824 | 0.0392157 | 1.0 |
3 | 39901826.0 | 1.A.2.Fd | 1 Forest to Open Woodland | 1.A Tropical Forest | 1.A.2 | 1.A.2 | D147 | 1.A.2.Fd | 1.A.2.Fd Guineo-Congolian Evergreen & Semi-Evergreen Rainforest | MA003 | 1.A.2.Fd.3 | 1.A.2.Fd.3-Guineo-Congolian Semi-Deciduous Rainforest | Guineo-Congolian Semi-Deciduous Rainforest | 1 | 0.0666667 | 0.482353 | 0.0470588 | 1.0 |
4 | 3666429.0 | 1.A.2.Fd | 1 Forest to Open Woodland | 1.A Tropical Forest | 1.A.2 | 1.A.2 | D147 | 1.A.2.Fd | 1.A.2.Fd Guineo-Congolian Evergreen & Semi-Evergreen Rainforest | MA004 | 1.A.2.Fd.4 | 1.A.2.Fd.4-Guineo-Congolian Littoral Rainforest | Guineo-Congolian Littoral Rainforest | 1 | 0.027451 | 0.431373 | 0.0392157 | 1.0 |
5 | 1398415.0 | 1.A.2.Fe | 1 Forest to Open Woodland | 1.A Tropical Forest | 1.A.2 | 1.A.2 | D148 | 1.A.2.Fe | 1.A.2.Fe Malagasy Evergreen & Semi-Evergreen Forest | MA005 | 1.A.2.Fe.5 | 1.A.2.Fe.5-Madagascar Evergreen Littoral Forest | Madagascar Evergreen Littoral Forest | 1 | 0.054902 | 0.588235 | 0.184314 | 1.0 |
...
Now build a matplotlib colour map.
971
array([[ 0. , 0. , 0. ],
[ 0. , 0.458824 , 0.0392157],
[ 0. , 0. , 0. ],
...,
[ 1. , 0.882353 , 0.882353 ],
[ 0. , 0. , 0. ],
[ 1. , 0.921569 , 0.745098 ]])
0
Extract raster data
To plot the data I’m going to load from the GeoTIFF file into a numpy array. The size of the raster is too big to load the whole thing into memory, so I need to resample the data. A very nice feature of the GDAL API is the ability to specify the size of the output buffer along with the resampling algorithm at the point of extracting data from the GeoTIFF file.
array([[-2147483648, -2147483648, -2147483648, ..., -2147483648,
-2147483648, -2147483648],
[-2147483648, -2147483648, -2147483648, ..., -2147483648,
-2147483648, -2147483648],
[-2147483648, -2147483648, -2147483648, ..., -2147483648,
-2147483648, -2147483648],
...,
[-2147483648, -2147483648, -2147483648, ..., -2147483648,
-2147483648, -2147483648],
[-2147483648, -2147483648, -2147483648, ..., -2147483648,
-2147483648, -2147483648],
[-2147483648, -2147483648, -2147483648, ..., -2147483648,
-2147483648, -2147483648]], dtype=int32)
(876, 1080)
Before we do any proper plotting with cartopy or basemap, let’s just take a quick look at the data to check the color map is correct.
Plot with cartopy
The quick plot above just plots the raster data without any projection onto a coordinate system. For doing this and other useful geospatial plotting things there are two Python libraries, cartopy and basemap. Let’s look at cartopy first.
cartopy 0.14.2
Define the extent of the data.
Cartopy needs a projection for plotting. In this nice example notebook from Filipe Fernandes the EPSG code from the projection information given in the GeoTIFF file is used to create a projection via cartopy. However, we cannot do that here because the GeoTIFF we’re working with uses a geographical coordinate system and not a projected coordinate system.
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4326"]]
In this case, the datum is ‘WGS84’ which is also the default globe that cartopy uses. And because we are using longitude and latitude coordinates, the appropriate projection is the plate carrée projection which maps meridians to vertical straight lines of constant spacing, and circles of latitude to horizontal straight lines of constant spacing, assuming the standard parallel is the equator. I.e., longitude and latitude become X and Y coordinates.
Now we can plot using this projection, adding in coastlines and country borders.
The plate carrée projection distorts both area and shape. Apparently Lambert azimuthal equal area is a better projection for plotting continental Africa, so let’s use that.
This takes a little bit longer to run, I’m guessing because the data are being transformed on the fly.
Plot with basemap
Let’s try and do the same thing with basemap.
basemap 1.0.8
Basemap docs on managing projections and plotting raster data were useful. Note that there is no South Sudan, so for publication images a more up-to-date set of country borders is needed.
Here’s the same plot using a different map projection.
Credit to this SO answer for help on transforming coordinates from lon/lat to the map’s projection.
Further reading
- USGS global ecosystems
- A New Map of Standardized Terrestrial Ecosystems of Africa
- GDAL - Geospatial Data Abstraction Library
- Python GDAL/OGR Cookbook
- Cartopy
- Matplotlib Basemap Toolkit
- QGIS
- Geospatial data with Python (tutorial from SciPy 2013)
- Reading Raster Data with Python and gdal (blog)
- Plotting geotiff with cartopy (blog)
- Creating a QGIS color map from text file (blog)