Workshop: Social Media, Data Science, & Cartography
Alexander Dunkel, Madalina Gugulica
First step: Enable worker_env in jupyter lab
!cd .. && sh activate_workshop_env.sh
This is the first notebook in a series of four notebooks.
Open these notebooks through the file explorer on the left side.
We are creating several output graphics and temporary files.
These will be stored in the subfolder notebooks/out/.
from pathlib import Path
OUTPUT = Path.cwd() / "out"
OUTPUT.mkdir(exist_ok=True)
To reduce the code shown in this notebook, some helper methods are made available in a separate file.
Load the helper module from ../py/modules/tools.py.
import sys
module_path = str(Path.cwd().parents[0] / "py")
if module_path not in sys.path:
sys.path.append(module_path)
from modules import tools
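For orientation, here is a minimal sketch of what two of the helpers used below (tools.HEADER and tools.print_link) might look like. This is an illustration only; the actual implementations in ../py/modules/tools.py may differ.
# Illustrative sketch only - see ../py/modules/tools.py for the real code.
HEADER = {
    # a browser-like user agent, so that requests are not rejected outright
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

def print_link(url: str, hashtag: str) -> str:
    """Return a small HTML snippet with a clickable link for display()."""
    return f'<a href="{url}">Instagram tag page for #{hashtag}</a>'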
Activate autoreload of changed python files:
%load_ext autoreload
%autoreload 2
Load Instagram data for a specific hashtag.
hashtag = "park"
query_url = f'https://www.instagram.com/explore/tags/{hashtag}/?__a=1'
from IPython.core.display import HTML
display(HTML(tools.print_link(query_url, hashtag)))
First, try to get the JSON data without login. This may or may not work:
import requests
json_text = None
response = requests.get(
url=query_url, headers=tools.HEADER)
if response.status_code != 429 and "/login/" not in response.url:
json_text = response.text
print("Loaded live json")
Optionally, write the data to a temporary file:
if json_text:
with open(OUTPUT / f"live_{hashtag}.json", 'w') as f:
f.write(json_text)
If the url refers to the "login" page (or the status code is 429), access is blocked. In this case, check for a previously stored local json:
if not json_text:
# check if a manually stored json exists
local_json = list(OUTPUT.glob('*.json'))
if local_json:
# read local json
with open(local_json[0], 'r') as f:
json_text = f.read()
print("Loaded local json")
If neither live nor local json has been loaded, load sample json:
if not json_text:
sample_url = tools.get_sample_url()
sample_json_url = f'{sample_url}/download?path=%2F&files=park.json'
response = requests.get(url=sample_json_url)
json_text = response.text
print("Loaded sample json")
Turn text into json format:
import json
json_data = json.loads(json_text)
Have a peek at the returned data.
print(json.dumps(json_data, indent=2)[0:550])
The json data is nested. Values can be accessed with dictionary keys.
total_cnt = json_data["graphql"]["hashtag"]["edge_hashtag_to_media"].get("count")
display(HTML(
f'''<details><summary>Working with the JSON Format</summary>
The json data is nested. Values can be accessed with dictionary keys. <br>For example,
for the hashtag <strong>{hashtag}</strong>,
the total count of available images on Instagram is <strong>{total_cnt:,.0f}</strong>.
</details>
'''))
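Individual posts are stored in the "edges" list below "edge_hashtag_to_media"; each entry contains a "node" dictionary with the post attributes. A quick illustration (the thumbnail_src key is assumed here, based on the columns used further below):
# access the first post ("edge") and print one of its attributes
first_node = json_data["graphql"]["hashtag"]["edge_hashtag_to_media"]["edges"][0]["node"]
print(first_node["thumbnail_src"])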
Another, more flexible data analytics interface is available with pandas.DataFrame().
import pandas as pd
pd.set_option("display.max_columns", 4)
df = pd.json_normalize(
json_data["graphql"]["hashtag"]["edge_hashtag_to_media"]["edges"],
errors="ignore")
pd.reset_option("display.max_columns")
df.transpose()
View the first few images
First, define a function. Each image is retrieved with requests, scaled down with the resize function, and blurred with the ImageFilter.BLUR filter. The image is processed in-memory. Afterwards, plt.subplot() is used to plot the images in a row. Can you modify the code to plot images in a multi-line grid?
from typing import List
import matplotlib.pyplot as plt
from PIL import Image, ImageFilter
from io import BytesIO
def image_grid_fromurl(url_list: List[str]):
"""Load and show images in a grid from a list of urls"""
count = len(url_list)
plt.figure(figsize=(11, 18))
for ix, url in enumerate(url_list):
r = requests.get(url=url)
i = Image.open(BytesIO(r.content))
resize = (150, 150)
i = i.resize(resize)
i = i.filter(ImageFilter.BLUR)
ax = plt.subplot(1, count, ix + 1)
ax.axis('off')
plt.imshow(i)
Use the function to display images from the "node.thumbnail_src" column.
image_grid_fromurl(
df["node.thumbnail_src"][:10])
Define the coordinates (WGS1984 latitude and longitude) of the location to query:
lat = 51.03711
lng = 13.76318
Get a list of nearby places using the commons.wikimedia.org API:
query_url = 'https://commons.wikimedia.org/w/api.php'
params = {
"action":"query",
"list":"geosearch",
"gsprimary":"all",
"gsnamespace":14,
"gslimit":50,
"gsradius":1000,
"gscoord":f'{lat}|{lng}',
"format":"json"
}
response = requests.get(
url=query_url, params=params)
if response.status_code == 200:
print(f"Query successful. Query url: {response.url}")
json_data = json.loads(response.text)
print(json.dumps(json_data, indent=2)[0:500])
Get the list of places.
location_dict = json_data["query"]["geosearch"]
Turn the list into a DataFrame.
df = pd.DataFrame(location_dict)
display(df.head())
df.shape
If we have queried 50 records, we have reached the limit specified in our query. More places are likely available; these would need to be retrieved with subsequent queries (e.g. by grid/bounding box). However, for the workshop, 50 locations are enough.
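For illustration only, here is one way such follow-up queries could look, reusing the parameters from above and querying a small grid of coordinates around the center, with duplicates removed by pageid. Grid size and spacing are arbitrary choices; the rest of the notebook continues with the original 50 records.
# Sketch: repeat the geosearch query on a 3x3 grid of coordinates around the
# center and merge the results, removing duplicates via their pageid.
step = 0.01  # roughly 1 km in latitude; arbitrary spacing for illustration
unique_places = {}
for lat_offset in (-step, 0, step):
    for lng_offset in (-step, 0, step):
        grid_params = dict(
            params, gscoord=f'{lat + lat_offset}|{lng + lng_offset}')
        grid_response = requests.get(url=query_url, params=grid_params)
        if grid_response.status_code != 200:
            continue
        for place in grid_response.json()["query"]["geosearch"]:
            unique_places[place["pageid"]] = place
print(f"{len(unique_places)} unique places found")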
Modify the data: remove the "Category:" prefix from the title column and rename the column to "name".
df["title"] = df["title"].str.replace("Category:", "")
df.rename(
columns={"title":"name"},
inplace=True)
Turn DataFrame into a GeoDataFrame
import geopandas as gp
gdf = gp.GeoDataFrame(
df, geometry=gp.points_from_xy(df.lon, df.lat))
Set projection, reproject
CRS_PROJ = "epsg:3857" # Web Mercator
CRS_WGS = "epsg:4326" # WGS1984
gdf.crs = CRS_WGS # Set projection
gdf = gdf.to_crs(CRS_PROJ) # Project
gdf.head()
Display location on a map
Import contextily, which provides static background tiles to be rendered with matplotlib.
import contextily as cx
1. Create a bounding box for the map
x = gdf.loc[0].geometry.x
y = gdf.loc[0].geometry.y
margin = 1000 # meters
bbox_bottomleft = (x - margin, y - margin)
bbox_topright = (x + margin, y + margin)
gdf.loc[0] is the loc-indexer from pandas. It means: access the first record of the (Geo)DataFrame.
.geometry.x is used to access the (projected) x coordinate of the geometry (point). This is only available for a GeoDataFrame (geopandas).
2. Create point layer, annotate and plot.
from matplotlib.patches import ArrowStyle
# create the point-layer
ax = gdf.plot(
figsize=(10, 15),
alpha=0.5,
edgecolor="black",
facecolor="red",
markersize=300)
# set display x and y limit
ax.set_xlim(
bbox_bottomleft[0], bbox_topright[0])
ax.set_ylim(
bbox_bottomleft[1], bbox_topright[1])
# turn off axes display
ax.set_axis_off()
# add callouts
# for the name of the places
for index, row in gdf.iterrows():
# offset labels by odd/even
label_offset_x = 30
if (index % 2) == 0:
label_offset_x = -100
label_offset_y = -30
if (index % 4) == 0:
label_offset_y = 100
ax.annotate(
text=row["name"],
xy=(row["geometry"].x, row["geometry"].y),
xytext=(label_offset_x, label_offset_y),
textcoords="offset points",
bbox=dict(
boxstyle='round,pad=0.5',
fc='white',
alpha=0.5),
arrowprops=dict(
mutation_scale=4,
arrowstyle=ArrowStyle(
"simple, head_length=2, head_width=2, tail_width=.2"),
connectionstyle='arc3,rad=-0.3',
color='black',
alpha=0.2))
cx.add_basemap(
ax, alpha=0.5,
source=cx.providers.OpenStreetMap.Mapnik)
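Optionally, store the finished map in the output folder created at the beginning of the notebook (the filename is an arbitrary choice):
# write the rendered map to notebooks/out/
ax.figure.savefig(
    OUTPUT / "wikimedia_places_map.png",
    dpi=150, bbox_inches="tight")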