Python data science 101, Slides

This was a slight update for the 2022/23 lecture on “Python data science 101”. Our aim was to have a more practical introduction to Python visual analytics and introduce some best practices, before we head into the workshop weeks.

Title: Python data science 101 for Social Media Cartography (and beyond)

This semester, I revised the slides for better code legibility. RevealJS 4.4.0 supports new animations (e.g. see slides 61-63), which I used to better explain concepts such as code formatting conventions. Code line highlighting also helps to break content down into individual pieces (e.g. see slides 57-61).

We also explored how ChatGPT can be used to program.

The question was:

Write a really short python method to download a zip file from an URL and unzip it to a folder.

This is the code that was returned:

import urllib.request
import zipfile

def download_and_unzip(url, target_dir):
  # Download the zip file
  zip_file, headers = urllib.request.urlretrieve(url)

  # Extract the contents of the zip file to the target directory
  with zipfile.ZipFile(zip_file, 'r') as zf:
    zf.extractall(target_dir)

Pretty good.

Compare this to my code on slide 35:

import requests, zipfile, io
r = requests.get(f'{URI}{FILENAME}')
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall(DATA_PATH)

Instead of the third-party requests library, ChatGPT suggests using urllib.request from the Python standard library (requests itself builds on urllib3, a separate third-party package, rather than on urllib.request). However, requests provides a simpler API. I tested the ChatGPT function, and it worked out of the box. Customizations, however, turn out to be more complex with urllib.request than with requests. For instance, I later had to add headers (a user agent) to request the zip file, which is easier in requests.
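To illustrate the point about headers, here is a minimal sketch of how the slide 35 snippet might look with a custom user agent. The function name `fetch_and_unzip`, the `USER_AGENT` string, and the 30-second timeout are assumptions made for illustration; they are not taken from the slides.

```python
import io
import zipfile

import requests

# Hypothetical user agent string, chosen for illustration only
USER_AGENT = "Mozilla/5.0 (compatible; lecture-example)"


def fetch_and_unzip(url, target_dir, user_agent=USER_AGENT):
    """Download a zip file with a custom User-Agent and extract it."""
    # Headers are passed as a plain dict; no opener setup is needed
    r = requests.get(url, headers={"User-Agent": user_agent}, timeout=30)
    # Raise on HTTP errors (e.g. 403, 404) instead of trying to
    # unzip an error page
    r.raise_for_status()
    # Unzip directly from memory, as in the slide 35 snippet
    with zipfile.ZipFile(io.BytesIO(r.content)) as z:
        z.extractall(target_dir)
```

It would be called like the original snippet, e.g. `fetch_and_unzip(f'{URI}{FILENAME}', DATA_PATH)`.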

Here’s a full code example based on the ChatGPT function, extended with a user agent header:

import urllib.request
import zipfile
from pathlib import Path

def download_and_unzip(url, target_dir):
    opener = urllib.request.build_opener()
    opener.addheaders = [
        ('User-agent', 
         'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
         'AppleWebKit/537.36 (KHTML, like Gecko) '
         'Chrome/102.0.0.0 Safari/537.36')]
    urllib.request.install_opener(opener)
    # Download the zip file
    zip_file, headers = urllib.request.urlretrieve(url)
    
    # Extract the contents of the zip file to the target directory
    with zipfile.ZipFile(zip_file, 'r') as zf:
        zf.extractall(target_dir)
      
DATA_PATH = Path.cwd() / "out"
URI = "https://www.naturalearthdata.com/http//www." \
      "naturalearthdata.com/download/50m/cultural/"
FILENAME = "ne_50m_admin_0_map_subunits.zip"

print(f'{URI}{FILENAME}')

> https://www.naturalearthdata.com/http//www.naturalearthdata.com/ \
> download/50m/cultural/ne_50m_admin_0_map_subunits.zip

download_and_unzip(f'{URI}{FILENAME}', DATA_PATH)
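One side effect of the example above is that `install_opener` changes the default opener for every later `urllib.request` call in the same process. As a hedged alternative, the sketch below attaches the header to a single `urllib.request.Request` instead. The function name `download_and_unzip_scoped` and the default user agent value are assumptions for illustration, not part of the lecture code.

```python
import shutil
import tempfile
import urllib.request
import zipfile


def download_and_unzip_scoped(url, target_dir, user_agent="Mozilla/5.0"):
    """Download and extract a zip file without changing global state."""
    # The header applies to this one request only
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as response, \
            tempfile.TemporaryFile() as tmp:
        # Stream the download into a temporary file, then rewind it
        shutil.copyfileobj(response, tmp)
        tmp.seek(0)
        # ZipFile accepts any seekable file-like object
        with zipfile.ZipFile(tmp) as zf:
            zf.extractall(target_dir)
```

Usage stays the same: `download_and_unzip_scoped(f'{URI}{FILENAME}', DATA_PATH)`.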