EvaVGI V2 Meeting Slides
EvaVGI V2 Meeting Slides, August 8, 2019, Dresden
EvaVGI, meinGrün, & Virginia Mainstreet21
Dr.-Ing. Alexander Dunkel
Technische Universität Dresden
Environmental Sciences, Institute of Cartography
Slide structure:
- HyperLogLog: Privacy aware data
- Parametric Visualizations
- Explicit VGI: App + API
Slide setup:
➡ Right arrows: Main sections
⬇ Down arrows: Section slides
HyperLogLog: Privacy aware data
What is HyperLogLog (HLL)?
HLL is a probabilistic data structure first presented by Flajolet et al. in 2007.
HLL allows approximate counting of the number of distinct elements in a set.
E.g.: Count distinct users in "Dresden"
Usually:
# initialize set
user_ids_dresden = set()
# loop over LBSM posts and keep those located in Dresden
# (posts and intersects_dresden_shape are assumed to be defined elsewhere)
for post in posts:
    if intersects_dresden_shape(post.latlng):
        # e.g. user_id: 64974314@N08
        user_ids_dresden.add(post.user_id)
# derive count of unique ids
distinct_usercount = len(user_ids_dresden)
print(distinct_usercount)
48.408
In HLL:
# get hll shard from original ids
user_ids_dresden_hll = hll_add_agg(user_ids_dresden)
# derive estimated count of unique ids
distinct_usercount = cardinality(user_ids_dresden_hll)
print(distinct_usercount)
48.401
Difference
Data sensitivity spectrum (complemented). From: What Are Data? A Categorization of the Data Sensitivity Spectrum. Rumbold & Pierscionek (2018)
Operator | Name | Description |
---|---|---|
A⋂B | intersection | Subset of elements found in both A and B |
A⋃B | union | Combine distinct elements in A and B |
add(a, A) | update | Add single element to A |
cardinality(A) | cardinality | Get estimate of unique elements in A |
→ HLL supports streaming updates
(no storage of raw data necessary)
→ Original IDs cannot be derived from HLL
(HLL is classified as statistics data)
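The operators in the table above can be illustrated with a minimal, self-contained HyperLogLog sketch. This is not the implementation behind `hll_add_agg` and `cardinality` on the slides. The class name `HLL`, the precision `p=12` (4096 registers), the SHA-1 hash and the `user-…` IDs are all assumptions chosen for illustration. Note that only register values are stored, never the IDs themselves, and that elements are added one at a time as in a stream.

```python
import hashlib
import math


class HLL:
    """Minimal HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        """add(a, A): update the sketch with a single element."""
        # 64-bit hash of the item; SHA-1 is an arbitrary choice for this sketch
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # rank = position of the leftmost 1-bit within the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def union(self, other):
        """A ⋃ B: register-wise maximum of two sketches of equal precision."""
        assert self.p == other.p
        merged = HLL(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def cardinality(self):
        """cardinality(A): estimate of the number of distinct elements."""
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)        # bias correction, valid for m >= 128
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:
            # small-range correction (linear counting)
            estimate = m * math.log(m / zeros)
        return round(estimate)


def intersection_cardinality(a, b):
    """|A ⋂ B| via inclusion-exclusion: |A| + |B| - |A ⋃ B|."""
    return a.cardinality() + b.cardinality() - a.union(b).cardinality()


# Hypothetical user IDs: 20,000 in A, 20,000 in B, 10,000 shared
a, b = HLL(), HLL()
for i in range(20000):
    a.add(f"user-{i}")
for i in range(10000, 30000):
    b.add(f"user-{i}")

print(a.cardinality())                  # estimate close to 20000
print(a.union(b).cardinality())         # estimate close to 30000
print(intersection_cardinality(a, b))   # rougher estimate close to 10000
```

In this sketch the intersection is not a native operation. It is derived from three estimates, so its relative error is larger than that of a union or a plain cardinality.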
'base' = distinct element
(space, time, social, thematic)
'overlay' = hll
(count/measure)
We've created bases for:
- all terms (thematic)
- all lat-lng coordinates (spatial)
- all dates (temporal)
- all services (social)
... and overlay hll shards for
- post-ids
- user-ids
- user-days
- user-post-locations
e.g.:
- Base: location, area, path, raster (spatial)
- Overlay: post_ids, user_ids, terms, hours, days, user-days etc.
HLL shards for different locations
- Base: term, emoji, topic (thematic)
- Overlay: post_ids, user_ids, location_ids, hours, days, user-days etc.
HLL shards for emoji
- Base: day, month, unique-day, period etc. (temporal)
- Overlay: post_ids, user_ids, terms, hours, days, user-days etc.
HLL shards for dates
- Base: service, group, community (social)
- Overlay: post_ids, user_ids, terms, hours, days, user-days etc.
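The base/overlay structure described above can be sketched as a nested mapping: facet → base value → overlay name → shard. In the sketch below, plain Python sets stand in for the HLL shards so that the example stays short; in the actual setup each set would be replaced by an HLL structure, so no raw IDs would be kept. The post records, field names and the `store` layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical post records; all field names and values are invented for illustration
posts = [
    {"post_id": "p1", "user_id": "u1", "date": "2019-08-01",
     "latlng": (51.05, 13.74), "terms": ["park", "elbe"]},
    {"post_id": "p2", "user_id": "u1", "date": "2019-08-01",
     "latlng": (51.05, 13.74), "terms": ["park"]},
    {"post_id": "p3", "user_id": "u2", "date": "2019-08-02",
     "latlng": (51.05, 13.74), "terms": ["park", "garten"]},
    {"post_id": "p4", "user_id": "u2", "date": "2019-08-03",
     "latlng": (51.03, 13.70), "terms": ["elbe"]},
]

# facet -> base value -> overlay name -> set (stand-in for an HLL shard)
store = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))


def add_post(post):
    """Update the overlays of every base (spatial, temporal, thematic) the post touches."""
    user_day = f'{post["user_id"]}:{post["date"]}'
    bases = [("spatial", post["latlng"]), ("temporal", post["date"])]
    bases += [("thematic", term) for term in post["terms"]]
    for facet, base in bases:
        overlays = store[facet][base]
        overlays["post_ids"].add(post["post_id"])
        overlays["user_ids"].add(post["user_id"])
        overlays["user_days"].add(user_day)


for post in posts:
    add_post(post)

# Overlay counts for the thematic base "park"
park = store["thematic"]["park"]
print(len(park["post_ids"]), len(park["user_ids"]), len(park["user_days"]))  # 3 2 2
```

Each base keeps its own overlays, so a question such as "how many distinct users mentioned this term" is answered from the overlay of a single base.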
We're live!
Parametric Visualizations
- improve information on green spaces
- combine objectively measured GI with LBSM subjective information
- Analyst (city planner) provides 'context' (parameters)
Input: {set,of,terms}, {lat,lng}-bounds
Output: HeatMap (HTML, Vector, GeoJSON)
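A minimal sketch of how a set of terms and lat-lng bounds could be turned into heat map values, assuming a simple regular grid and distinct-user counts per cell. The function `heatmap_grid`, the post fields, the bounds format `((lat_min, lng_min), (lat_max, lng_max))` and the sample data are hypothetical. The rendering step to HTML, vector or GeoJSON is left out.

```python
def heatmap_grid(posts, terms, bounds, rows=2, cols=2):
    """Count distinct users per grid cell for posts that match any term inside bounds."""
    (lat_min, lng_min), (lat_max, lng_max) = bounds
    users = [[set() for _ in range(cols)] for _ in range(rows)]
    for post in posts:
        lat, lng = post["latlng"]
        # spatial filter (half-open bounds)
        if not (lat_min <= lat < lat_max and lng_min <= lng < lng_max):
            continue
        # thematic filter: at least one of the given terms must occur
        if not terms & set(post["terms"]):
            continue
        # map the coordinate to a cell; min() guards against rounding at the upper edge
        row = min(rows - 1, int((lat - lat_min) / (lat_max - lat_min) * rows))
        col = min(cols - 1, int((lng - lng_min) / (lng_max - lng_min) * cols))
        users[row][col].add(post["user_id"])
    return [[len(cell) for cell in row] for row in users]


# Hypothetical sample posts
posts = [
    {"user_id": "u1", "latlng": (51.02, 13.65), "terms": ["park"]},
    {"user_id": "u2", "latlng": (51.03, 13.66), "terms": ["park", "see"]},
    {"user_id": "u1", "latlng": (51.02, 13.65), "terms": ["park"]},
    {"user_id": "u3", "latlng": (51.07, 13.75), "terms": ["garten"]},
    {"user_id": "u4", "latlng": (51.07, 13.75), "terms": ["auto"]},   # no matching term
    {"user_id": "u5", "latlng": (52.50, 13.40), "terms": ["park"]},   # outside bounds
]

grid = heatmap_grid(posts, {"park", "garten"}, ((51.00, 13.60), (51.10, 13.80)))
print(grid)  # [[2, 0], [0, 1]], row 0 is the southernmost row
```

Counting distinct users rather than posts keeps a single very active user from dominating a cell.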
Input: Sets of {set,of,terms} ▶ activity
Output: ATKIS Landuse Correlation Map (HTML, PNG, SVG)
Input: GeoJSON (Lines) / OSM Graph
Output: LBSN "popularity"-weighted OSM Graph (HTML, PNG, CSV)
Input: GeoJSON (e.g. Parks)
Output: LBSN Popularity Map (HTML, PNG, SVG)
Prototypes in Jupyter Lab, final implementation in Web-API
Towards Explicit VGI: App + API
- combine implicitly collected LBSM with explicit discourse data ("eVGI")
- compare outside picture from tourists (LBSM) with perception of locals
Presentation of prototype app, Waynesboro (VA)
Prototype: waynesboro.theplink.org
Features
- Containerized infrastructure built entirely with open source software
- published publicly as Open Source Software (GitHub)
- everyone can collaborate and contribute
- state-of-the-art security: HTTPS, OAuth (login with Google/Yahoo ...)
- separation of concerns principle: those who develop the software have no access to the actual data
- local knowledge stays local
- automatically deployed and updated through git continuous integration
- instances for different cities are completely separated, data is stored in a persistent volume for each instance
- minimal hardware requirements
- Progressive Web App: works on Apple iOS, Android and as a regular webpage
- "Waynesboro App" can be added to users’ home screen
Public "City" API for App developers
A platform for strengthening the local discourse
Made possible by:
- GitLab Continuous Integration (CI)
- Git submodules & semantic versioning
- Docker Service Containerization
- Jupyter Lab & Conda PyViz Channel