Tag Maps rendering with Python and Mapnik

Alexander Dunkel, TU Dresden; Institute of Cartography

TL;DR This notebook illustrates the process of rendering tag map shapefiles in Mapnik. The notebook can be run with the Jupyter Docker image for cartography (docker tag jupyterlab:mapnik), which contains a stable Mapnik installation with Python bindings. Note that Mapnik must be accessed through the system Python installation (/usr/bin/python3), not the conda environment.

The source files for this notebook are available at https://gitlab.vgiscience.de/ad/tagmaps-mapnik-jupyter.

Out[1]:

Last updated: Jul-29-2022

Preparations

Import dependencies

In [68]:
import sys
import math
import fiona
import rasterio
import numpy as np
import contextily as cx
import mapclassify as mc
import libpysal as lps
from IPython import display
from contextily import Place
from typing import Tuple, Optional
from rasterio.plot import show as rioshow
from pathlib import Path
from esda.getisord import G_Local
from shapely.geometry import shape

We are using a number of helper functions from py/modules/tools.py to classify values.

In [3]:
module_path = str(Path.cwd().parents[0] / "py")
if module_path not in sys.path:
    sys.path.append(module_path)
from modules import tools
In [4]:
%load_ext autoreload
%autoreload 2

Test Mapnik bindings

In [5]:
!/usr/bin/python3 -c "import mapnik;print(mapnik.__file__)"
/usr/lib/python3/dist-packages/mapnik/__init__.py

Global Parameters

Some parameters are used throughout this notebook. Adjust to your needs.

In [6]:
INPUT: Path = Path.cwd().parents[0] / "input" # define path to input and working directory (shapefiles, stylesheets etc.)
OUTPUT: Path = Path.cwd().parents[0] / "output" # define path to output directory (map, notebook html etc.)
TMP: Path = INPUT / "intermediate"
MAP_X: int = 2500 # x dimensions of the final map, in pixels
MAP_Y: int = 1400 # y dimensions of the final map, in pixels
In [61]:
for folder in [OUTPUT, TMP]:
    if not folder.exists():
        folder.mkdir()

Test creating a map

  • Data source: Shapefiles in input/shapefiles, created using TagMaps
    • Original data from Flickr, Twitter and Instagram for the TUD Campus
    • Clustered and aggregated to illustrate collective use of terms (allTagCluster.shp)
    • and overall frequentation of places in the area (allLocationCluster.shp)
  • see this notebook for how to generate these files
  • A mapnik stylesheet input/tagmap_style.xml, containing the style information for how the data should be rendered
  • A python script input/tagmap_xml.py, using Mapnik Python bindings to process data and generate the map

The Mapnik renderer can be accessed through Python bindings available in the system Python installation. Below, we use a Python script that provides specific instructions to render a map in Mapnik.

In [157]:
%%time
!cd {INPUT} && /usr/bin/python3 tagmap_xml.py
CPU times: user 2.98 s, sys: 1.32 s, total: 4.29 s
Wall time: 2min 27s

Load the generated image in Jupyter

In [158]:
display.Image(f"{OUTPUT}/tagmap_style.png")
Out[158]:

Add Basemap

Contextily allows plotting existing basemaps in Python.

In [8]:
berlin = Place("Berlin", source=cx.providers.Esri.WorldImagery)
ax = berlin.plot()

It also provides options to store results as GeoTIFF; see the docs.

In [10]:
filename = f"{OUTPUT}/berlin_carto.tif"
img, ext = cx.bounds2raster(
    berlin.w, berlin.s, berlin.e, berlin.n,
    filename, source=cx.providers.CartoDB.PositronNoLabels,
    ll=True)
In [11]:
with rasterio.open(filename) as r:
    rioshow(r)

Get the basemap for our tag maps TUD Campus area. We also add a zoom parameter to get a higher level of detail.

In [12]:
zoom = 17
bg_name = f"{INPUT}/bg/tudcampus_carto_{zoom}.tif"
if not Path(bg_name).exists():
    img, ext = cx.bounds2raster(
        13.71216, 51.0218707395, 13.749046, 51.0340579,
        bg_name, source=cx.providers.CartoDB.PositronNoLabels,
        ll=True, zoom=zoom)

Render tagmap with basemap

Below, we use mapnik-cli to run map generation with parameters.

In [52]:
stylesheet = "tagmap_bg_style.xml"
output_name = "tagmap_bg_style.png"
In [53]:
%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x {MAP_X} \
    --map_dimensiony_y {MAP_Y} \
    --input_path {INPUT} \
    --output_path {OUTPUT}
CPU times: user 3.03 s, sys: 1.25 s, total: 4.27 s
Wall time: 2min 25s
In [54]:
display.Image(f'{OUTPUT}/{output_name}')
Out[54]:

Add Cluster Points

TagMaps outputs two shapefiles: one for Emoji+Tag clusters (allTagCluster.shp) and one for location clusters (allLocationCluster.shp), which can be used to visualize overall frequentation patterns, regardless of the tags/emoji used.

The steps to visualize these clusters in ESRI ArcMap are explained in the docs. The process requires additional tools, such as Incremental Spatial Autocorrelation and the Getis-Ord GI* Statistic.

Below, the same approach is followed using open-source software in Python.

In [65]:
stylesheet = "tagmap_points_style.xml"
output_name = "tagmap_points_style.png"

For the actual point marker, we need to create an SVG.

Create the SVG first:

  • the radius r defines how 'smooth' the SVG curve is (how many vertices are used); adapt it to your final output map dimensions
In [66]:
point_svg = \
"""<?xml version="1.0" standalone="yes"?>
<svg height="128" width="128" version="2"
     xmlns="http://www.w3.org/2000/svg">
  <circle cx="0" cy="0" r="100" stroke="white" stroke-width="10" fill="#849EB9" />
</svg>
"""

Write svg to file:

In [67]:
with open(TMP / "circle.svg", "w") as svg_file:
    svg_file.write(point_svg)
In [68]:
%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x {MAP_X} \
    --map_dimensiony_y {MAP_Y} \
    --input_path {INPUT} \
    --output_path {OUTPUT}
CPU times: user 57 ms, sys: 31.1 ms, total: 88.1 ms
Wall time: 2.73 s
In [69]:
display.Image(f'{OUTPUT}/{output_name}')
Out[69]:

The Mapnik PointSymbolizer offers little control over point size. Directly mapping the [Join_Count] or [Weights] attribute from the location cluster shapefile provides no useful output: the largest cluster dominates the map above.

We can read the shapefile with Fiona and add a field whose values are used as the SVG scale factor for points in Mapnik's PointSymbolizer. This approach provides more flexibility to influence the style of the location cluster layer. The steps below mainly follow the Fiona docs.

Read allLocationCluster.shp

In [70]:
data_src = Path(INPUT / "shapefiles" / "allLocationCluster.shp")
locations = fiona.open(data_src, encoding='UTF-8', mode="r")

The Join_Count refers to the number of users who took photos or posted from a particular cluster. Weights is a version of Join_Count normalized to the value range 1..1000.
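The normalization is performed by TagMaps internally; a common min-max normalization to the 1..1000 range can be sketched as follows (the exact formula TagMaps uses is an assumption here):

```python
def normalize(values, lower=1, upper=1000):
    """Min-max normalize values to the range [lower, upper]."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:
        # all values equal: map everything to the lower bound
        return [float(lower)] * len(values)
    return [lower + (v - vmin) * (upper - lower) / span for v in values]

print(normalize([1, 5, 10]))  # [1.0, 445.0, 1000.0]
```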

In [71]:
locations.schema
Out[71]:
{'properties': OrderedDict([('Join_Count', 'int:18'),
              ('Weights', 'float:24.15')]),
 'geometry': 'Point'}

Copy the source schema and add a new property, for point size.

In [72]:
updated_schema = locations.schema
updated_schema["properties"]["point_size"] = "float"

Write a new shapefile with the same format and coordinate reference system as the source, with the new field.

The formula used for point size below:

4+(math.sqrt(feature["properties"]["Join_Count"])*8)

Taking the square root compresses the largest values most, which helps reduce the dominance of the largest clusters on the map.
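The compressing effect of the square root can be illustrated for a few sample cluster sizes:

```python
import math

def point_size(join_count: int) -> float:
    """Map a cluster's user count to an SVG scale factor:
    baseline of 4 plus square-root-scaled cluster size."""
    return 4 + (math.sqrt(join_count) * 8)

# Join_Count spanning three orders of magnitude: sizes grow
# far more slowly than the raw counts
for count in [1, 10, 100, 1000]:
    print(count, round(point_size(count), 1))
```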

In [73]:
with fiona.open(
        Path(TMP / "allLocationCluster_Size.shp"), "w",
        crs=locations.crs,
        driver=locations.driver,
        schema=updated_schema,
    ) as shapefile:
        for feature in locations:
            _size = 4+(math.sqrt(feature["properties"]["Join_Count"])*8)
            feature["properties"].update(point_size=_size)
            shapefile.write(feature)

Create map

In [74]:
stylesheet = "tagmap_pointssize_style.xml"
output_name = "tagmap_pointssize_style.png"
In [75]:
%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x {MAP_X} \
    --map_dimensiony_y {MAP_Y} \
    --input_path {INPUT} \
    --output_path {OUTPUT}
CPU times: user 54.9 ms, sys: 33.2 ms, total: 88 ms
Wall time: 2.68 s
In [76]:
display.Image(f'{OUTPUT}/{output_name}')
Out[76]:

Create a map with all labels

In [77]:
stylesheet = "tagmap_pointssize_style_v2.xml"
output_name = "tagmap_pointssize_style_v2.png"
In [78]:
%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x {MAP_X} \
    --map_dimensiony_y {MAP_Y} \
    --input_path {INPUT} \
    --output_path {OUTPUT}
CPU times: user 3.32 s, sys: 1.43 s, total: 4.75 s
Wall time: 2min 36s
In [79]:
display.Image(f'{OUTPUT}/{output_name}')
Out[79]:

Local Autocorrelation: Hot Spots, Cold Spots, and Spatial Outliers

The process below is based on the PySal Notebook Exploratory Analysis of Spatial Data: Spatial Autocorrelation and the API docs for esda.G_Local.

Load the location clusters and prepare the point data set for spatial analysis. We use the Join_Count field as the weight for clusters in our spatial autocorrelation analysis.

In [150]:
points = []
weights = []
with fiona.open(data_src, encoding='UTF-8', mode="r") as shapefile:
    for feature in shapefile:
        point = (feature["geometry"]['coordinates'][0], feature["geometry"]['coordinates'][1])
        points.append(point)
        weights.append(float(feature["properties"]["Join_Count"]))

The points are already projected to a UTM coordinate system (Zone 33N), meaning that the values are given in meters.

In [151]:
points[:5]
Out[151]:
[(409779.71121669677, 5653978.9132233495),
 (410910.41373917606, 5654203.42148416),
 (411427.90304027294, 5653979.014157857),
 (410355.74448030593, 5653851.490872544),
 (412093.73301671044, 5654235.220379112)]
In [152]:
weights[:5]
Out[152]:
[1.0, 1.0, 1.0, 1.0, 1.0]

Below, libpysal is used to create a weights object from points.

The threshold value refers to the distance threshold and is specified in meters (based on the UTM Zone 33N coordinate system). Smaller values will include fewer points, larger values will include more points in the distance matrix of the statistical calculation. The result is either a smooth transition between colors (larger values) or a more spatially fine-grained color classification (smaller values).
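The effect of the distance threshold can be sketched in plain Python with Euclidean distances, using hypothetical points in meters:

```python
import math

def neighbors_within(points, threshold):
    """Return, for each point index, the indices of other points
    within the given distance threshold (meters)."""
    result = {}
    for i, (xi, yi) in enumerate(points):
        result[i] = [
            j for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= threshold]
    return result

# Three points on a line, 100 m and 300 m apart
pts = [(0, 0), (100, 0), (400, 0)]
print(neighbors_within(pts, 150))  # {0: [1], 1: [0], 2: []}
print(neighbors_within(pts, 350))  # {0: [1], 1: [0, 2], 2: [1]}
```

A larger threshold connects more points, which is what produces the smoother color transitions described above.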

In [153]:
distance_threshold = 350
In [154]:
w = lps.weights.DistanceBand(points, threshold=distance_threshold, binary=False)

Preparing the weights variable

In [85]:
y = np.array(weights)

Applying Getis and Ord local G* test using a binary weights object

There are a number of other parameter examples in the pysal/esda docs. We are using the Getis-Ord Gi*, where the star means that the focal observation is included in the statistic.

In [86]:
%%time
lg_star = G_Local(y, w, transform='B', star=True)
CPU times: user 2.03 s, sys: 89 ms, total: 2.12 s
Wall time: 10 s
In [87]:
lg_star.Zs
Out[87]:
array([-0.32400565, -0.209922  ,  0.30461145, ...,  0.91187328,
        0.10473242,  1.39941857])

p-value based on standard normal approximation from permutations. Smaller p-values indicate greater significance.
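A permutation-based pseudo p-value can be sketched as the share of permutation statistics at least as extreme as the observed one (a simplified illustration, not esda's exact implementation):

```python
# Toy permutation statistics and an observed test statistic
replicates = [0.2, 1.9, -0.5, 2.1, 0.7]
observed = 1.8

# Count replicates at least as extreme as the observed value;
# +1 in numerator and denominator accounts for the observed value itself
larger = sum(1 for r in replicates if r >= observed)
p_sim = (larger + 1) / (len(replicates) + 1)
print(p_sim)  # (2 + 1) / 6 = 0.5
```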

In [88]:
round(lg_star.p_sim[0], 3)
Out[88]:
0.398

Classify values

PySal's mapclassify provides classification schemes such as Quantiles, HeadTailBreaks, or NaturalBreaks.

In [89]:
print(lg_star.Zs.min())
print(lg_star.Zs.max())
-0.9078873268279701
2.3824995596530747
In [90]:
scheme_breaks = tools.classify_data(
    values=lg_star.Zs, scheme="Quantiles")
In [91]:
scheme_breaks
Out[91]:
Quantiles             

   Interval      Count
----------------------
[-0.91, -0.61] |   163
(-0.61, -0.43] |   163
(-0.43, -0.31] |   163
(-0.31, -0.03] |   162
(-0.03,  0.38] |   162
( 0.38,  1.37] |   163
( 1.37,  2.38] |   163
In [92]:
cat_nr = scheme_breaks.find_bin(
    lg_star.Zs)
In [93]:
cat_nr
Out[93]:
array([2, 3, 4, ..., 5, 4, 6])
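find_bin maps each z-score to its class index; the same lookup can be sketched with bisect against the class upper bounds (bounds taken from the Quantiles output above):

```python
from bisect import bisect_left

# Upper bounds of the seven Quantiles classes shown above
bins = [-0.61, -0.43, -0.31, -0.03, 0.38, 1.37, 2.38]

def find_bin(value, bins):
    """Return the index of the first class whose upper bound
    is >= value (right-closed intervals)."""
    return bisect_left(bins, value)

print([find_bin(v, bins) for v in [-0.9, -0.5, 0.0, 2.0]])  # [0, 1, 4, 6]
```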

Update schema and write to shapefile

In [94]:
updated_schema["properties"]["cat_nr"] = "int"
In [95]:
with fiona.open(
        Path(TMP / "allLocationCluster_SizeClass.shp"), "w",
        crs=locations.crs,
        driver=locations.driver,
        schema=updated_schema,
    ) as shapefile:
        for ix, feature in enumerate(locations):
            _size = 4+(math.sqrt(feature["properties"]["Join_Count"])*8)
            feature["properties"].update(point_size=_size)
            feature["properties"].update(cat_nr=int(cat_nr[ix]))
            shapefile.write(feature)

Create SVG markers for classes

The last remaining step is to assign colors to classes. Since Mapnik stylesheets provide no parameter to override SVG color values, everything must be defined in the SVG itself. We will create a separate SVG for each of the 7 classes, each with a different fill color.

Define list of colors (hot and cold spots):

In [96]:
hex_list = ['#4576b5', '#849eba', '#c0ccbe', '#ffffc0', '#fab984', '#ed7551', '#d62f27']
In [97]:
tools.display_hex_colors(hex_list)
  #4576b5 #849eba #c0ccbe #ffffc0 #fab984 #ed7551 #d62f27
Colors
In [52]:
def get_svg_circle(color: str) -> str:
    """Returns a circle as svg with the given color"""
    return \
    f"""<?xml version="1.0" standalone="yes"?>
    <svg height="128" width="128" version="2"
         xmlns="http://www.w3.org/2000/svg">
      <circle cx="0" cy="0" r="100" stroke="white" stroke-width="10" fill="{color}" />
    </svg>
    """
In [99]:
for ix, hex_color in enumerate(hex_list):
    with open(TMP / f"circle_{ix}.svg", "w") as svg_file:
        point_svg = get_svg_circle(color=hex_color)
        svg_file.write(point_svg)

Produce Map

In [9]:
stylesheet = "tagmap_production.xml"
output_name = "tagmap_production.png"
In [101]:
%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x {MAP_X} \
    --map_dimensiony_y {MAP_Y} \
    --input_path {INPUT} \
    --output_path {OUTPUT}
CPU times: user 3.12 s, sys: 1.32 s, total: 4.43 s
Wall time: 2min 29s
In [102]:
display.Image(f'{OUTPUT}/{output_name}')
Out[102]:

Focus regions

Commonly, these maps would be interactively explored using a tile server with multiple zoom levels. In order to zoom in to certain clusters, we can also create static close-ups of focus regions.

To locate elements, select a hashtag that is written in bold (e.g. BISMARCKTURM). Bold tells us that this is the most important cluster of the hashtag BISMARCKTURM. In the data table, this entry is flagged with HImpTag=1. We can use this information to get the latitude and longitude of the element and tell Mapnik to zoom to this location.

In [31]:
data_src = Path(INPUT / "shapefiles" / "allTagCluster.shp")
focus_tag = 'BISMARCKTURM'
In [34]:
def add_buffer_bbox(
        bbox: Tuple[float,float,float,float], buffer: int) -> Tuple[float,float,float,float]:
    """Add buffer to bbox tuple (Meters)"""
    return (bbox[0]-buffer, bbox[1]-buffer, bbox[2]+buffer, bbox[3]+buffer)
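As a quick check of the helper, with hypothetical coordinates in meters:

```python
from typing import Tuple

def add_buffer_bbox(
        bbox: Tuple[float, float, float, float],
        buffer: int) -> Tuple[float, float, float, float]:
    """Add buffer to bbox tuple (meters)."""
    return (bbox[0] - buffer, bbox[1] - buffer, bbox[2] + buffer, bbox[3] + buffer)

# Expands (minx, miny, maxx, maxy) outward by 50 m on each side
print(add_buffer_bbox((100.0, 200.0, 300.0, 400.0), 50))
```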
In [69]:
def find_feature(
        data_src: Path, feature_name: str,
        add_buffer: Optional[int] = None) -> Optional[str]:
    """Return the bounding box of a feature as a comma-separated string, if found"""
    with fiona.open(data_src, encoding='UTF-8', mode="r") as shapefile:
        for feature in shapefile:
            properties = feature["properties"]
            if properties["HImpTag"] == 1 and properties["ImpTag"] == feature_name.lower():
                bounds = shape(feature["geometry"]).bounds
                if add_buffer:
                    bounds = add_buffer_bbox(bounds, buffer=add_buffer)
                return str(bounds).lstrip("(").rstrip(")").replace(" ", "")
In [105]:
focus_bbox = find_feature(data_src=data_src, feature_name=focus_tag)
In [128]:
focus_bbox
Out[128]:
(411271.46579418617, 5652996.481350748, 411351.8284562295, 5653134.999408296)

Create map and zoom in to bounding box

In [132]:
output_name = f"tagmap_production_{focus_tag.lower()}.png"
In [133]:
%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x 500 \
    --map_dimensiony_y 250 \
    --input_path {INPUT} \
    --output_path {OUTPUT} \
    --bbox {focus_bbox}
CPU times: user 154 ms, sys: 91 ms, total: 245 ms
Wall time: 8.48 s
In [134]:
display.Image(f'{OUTPUT}/{output_name}')
Out[134]:

Sometimes the bounding box returned for a cluster is small. We can add a buffer to crop the map to a larger area around the cluster under investigation.

In [35]:
focus_tag = 'BOCKBIERANSTICH'
focus_bbox = find_feature(data_src=data_src, feature_name=focus_tag, add_buffer=100)
output_name = f"tagmap_production_{focus_tag.lower()}.png"
In [146]:
%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x 500 \
    --map_dimensiony_y 250 \
    --input_path {INPUT} \
    --output_path {OUTPUT} \
    --bbox {focus_bbox}
CPU times: user 311 ms, sys: 158 ms, total: 469 ms
Wall time: 16.7 s
In [147]:
display.Image(f'{OUTPUT}/{output_name}')
Out[147]:

Experiment

Now it is time to experiment. For instance, change the background map to aerial imagery and filter to show only emoji.

In [48]:
focus_tag = 'TUDRESDEN'
In [7]:
zoom = 17
bg_name = f"{INPUT}/bg/tudcampus_aerial_{zoom}.tif"
if not Path(bg_name).exists():
    img, ext = cx.bounds2raster(
        13.71216, 51.0218707395, 13.749046, 51.0340579,
        bg_name, source=cx.providers.Esri.WorldImagery,
        ll=True, zoom=zoom)

Create a white SVG circle

In [58]:
with open(TMP / "circle_white.svg", "w") as svg_file:
    point_svg = get_svg_circle(color='#FFFFFF')
    svg_file.write(point_svg)

Zoom to Campus

In [102]:
focus_bbox = find_feature(data_src=data_src, feature_name=focus_tag, add_buffer=450)

Visualize

In [103]:
stylesheet = "tagmap_aerial_emoji.xml"
output_name = f"tagmap_production_aerial_{focus_tag.lower()}.png"
In [104]:
%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x 1600 \
    --map_dimensiony_y 900 \
    --input_path {INPUT} \
    --output_path {OUTPUT} \
    --bbox {focus_bbox}
CPU times: user 168 ms, sys: 100 ms, total: 268 ms
Wall time: 8.67 s
In [105]:
display.Image(f'{OUTPUT}/{output_name}')
Out[105]:

Further work

Experiment further, e.g.:

  • Emoji Font: Replace Segoe UI Symbol Regular in input/*.xml Stylesheets with another Emoji font, e.g. Twitter Color Emoji Regular, Segoe UI Emoji Regular, Noto Emoji Regular, Symbola Regular etc.
  • Use gray location markers and use color to classify different types of emoji, e.g. all emoji of the unicode group "activity emoji" as red
  • Adjust Rules in XML styles to affect label placement
  • Create a custom basemap from vector/osm data
  • Use of different svg markers to classify different types of places (e.g. based on the majority of posts from Instagram, Twitter, or Flickr)
  • Output a tile map with mapnik, for different resolutions and areas

Create Notebook HTML

In [106]:
!jupyter nbconvert --to html_toc \
    --output-dir=../output/html ./01_mapnik-tagmaps.ipynb \
    --template=../nbconvert.tpl \
    --ExtractOutputPreprocessor.enabled=False >&- 2>&- # create single output file and suppress output

Copy HTML file to resource folder

In [107]:
!cp ../output/html/01_mapnik-tagmaps.html ../resources/html/
In [ ]: