Alaska (City of Tshwane) data - collected by UP GMT320

From Geobach wiki


Over the past few years, the GMT320 students at the University of Pretoria have collected various datasets, such as dwelling numbers and land use, in the informal settlement of Alaska in Mamelodi, City of Tshwane. This page provides an overview of the general process followed and how the data was cleaned.

Getting the datasets for Alaska, City of Tshwane

If you would like to get access to the data, please contact Victoria Rautenbach (

What is unique about informal settlements?

What is an informal settlement?

Informal settlements (also known as squatter camps, shantytowns or slums) are densely populated, illegal or unauthorised settlements characterised by rapid, unstructured expansion and improvised dwellings made from scrap material. They are traditionally located along the borders of urban areas, close to social and economic hubs. Because informal settlements are considered illegal or unauthorised, they lack secure tenure, basic service delivery (e.g. access to water, electricity and waste removal) and infrastructure (e.g. roads and storm water drainage). In South Africa, informal settlements arise from biased planning, the housing backlog and people's search for work and a better quality of life.

Issues in and around informal settlements

Informal settlements are plagued by various socioeconomic problems, for example, a lack of service delivery and insecure tenure. Below is a short explanation of two of these issues:

  • Illegal electricity connections:
Due to the lack of service delivery, the community creates illegal electricity connections (also known as Izinyoka). These connections are very dangerous, and electrocution is a real threat.
Example of illegal connections in Alaska
  • Multiple dwelling numbers:
As the dwellings are not on proclaimed land, a dwelling does not always have a dwelling number. In other cases, however, a dwelling can have multiple numbers assigned by more than one organisation. For example, one number can be assigned by a service provider and another by the community itself.
Example of multiple dwelling numbers in Alaska

Data collected

For the last two years, the final-year BSc Geoinformatics students have been collecting data in the informal settlement of Alaska, City of Tshwane. Before we got involved in Alaska, no information was available for the area, and Viva had to rely on their local knowledge to make their own maps (see below). So far, we have collected the following datasets:

  • Dwelling points
  • Roads and foot paths
  • Hazards, such as illegal electricity connections and dumping sites
  • Communal water taps
Accuracy achieved with EpiCollect

How was the data collected?

What tools were used?

The data was collected in the field using several tools. The following tools were used to collect the data in 2015 and 2016:

  • Handheld GPS
  • EpiCollect+ (mobile application)
  • Notebook (paper notes)

What was collected?

The table below shows what was collected.

Tool | Location | Primary address | Secondary address | Tertiary address | Type of structure | Notes
Handheld GPS | X | | | | |
EpiCollect+ | X | X | X | X | X | X
Notebook | | X | X | X | X | X

Why did we use various tools?

Two different datasets were collected, with the paper notes as a backup:

  • GPS waypoints (using the handheld GPS)
  • Attribute data for each point captured (using EpiCollect+)

We used different tools because of the density of dwellings in the settlement and the limited accuracy of points collected with a mobile application such as EpiCollect+. With EpiCollect+, the best accuracy we could achieve was 4-6 m. This is suitable for capturing points in areas that are not very dense, but in an informal settlement, where more than one dwelling can fall within this radius, it is simply not good enough.
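A minimal sketch of the density argument: given one phone fix and its stated accuracy, count how many dwellings lie inside that radius. The coordinates are hypothetical easting/northing values in metres (a projected system such as UTM), invented for illustration rather than taken from the Alaska data, and `dwellings_within` is a hypothetical helper.

```python
import math


def dwellings_within(fix, dwellings, radius_m):
    """Return the dwellings whose centre lies within radius_m of a fix.

    Points are (x, y) tuples in a metre-based projected system (e.g. UTM),
    so the plain Euclidean distance is a valid ground distance.
    """
    fx, fy = fix
    return [d for d in dwellings if math.hypot(d[0] - fx, d[1] - fy) <= radius_m]


# Hypothetical dwellings about 4 m apart, as in a dense settlement.
dwellings = [(0.0, 0.0), (4.0, 0.0), (8.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
fix = (2.0, 1.0)  # phone fix with a stated accuracy of 5 m

candidates = dwellings_within(fix, dwellings, radius_m=5.0)
print(len(candidates))  # 4
```

Any count above one means the fix on its own cannot say which dwelling was captured.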

Accuracy achieved with EpiCollect

How was the data integration done?

Data integration was important because the data was captured by more than one group, which resulted in inconsistencies in the attribute information. Additionally, two locations were captured for each dwelling (one with the handheld GPS and one with EpiCollect+), and these needed to be combined. The following datasets were used:

  • GPS waypoints (2015: 7 sets; 2016: 16 sets)
      Location of dwelling with an accuracy of about 3 m
  • EpiCollect+
      Location of dwelling with an accuracy of about 5 m
      Attribute data
  • City of Tshwane aerial photographs from 2012

The table below is an example of the data before the cleaning process.

Primary address | Secondary address | Tertiary address
cd-132 | 50000906 | AL 75
AL-45 | 50000565 | CD 145
50000123 | cd226 | Al36
Cd 256 | al-22 | 500001025

The first step in the data integration process was to review the data manually for inconsistencies and then find and replace lowercase letters, hyphens and additional spaces. For example, Al-987 -> AL987 and al 987 -> AL987. The next step was to decide which numbers are the primary, secondary and tertiary dwelling numbers. Using PostgreSQL and PostGIS, the values were then moved into the correct attribute columns. The table below is an example of the results of the first two steps:

Primary address | Secondary address | Tertiary address
CD132 | 50000906 | AL75
CD145 | 50000565 | AL45
CD226 | 50000123 | AL36
CD256 | 500001025 | AL22
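The cleaning itself was done in PostgreSQL and PostGIS. Purely as an illustration, the same two steps can be sketched in Python. The prefix rules are an assumption drawn from the examples on this page and from the attribute descriptions further down (CD or CCD numbers as primary, the all-digit StatsSA number as secondary, AL numbers as tertiary); `normalise` and `assign_columns` are hypothetical names.

```python
import re


def normalise(raw: str) -> str:
    """Uppercase a dwelling number and drop hyphens and spaces ('al 987' -> 'AL987')."""
    return re.sub(r"[\s-]+", "", raw).upper()


def assign_columns(values):
    """Place each normalised number in the column implied by its prefix.

    Assumed rules: CD/CCD -> priadd, all digits -> secadd, AL -> teradd.
    Values that fit no rule, or a second value for an already filled
    column, are collected under 'review' for manual checking.
    """
    row = {"priadd": None, "secadd": None, "teradd": None, "review": []}
    for raw in values:
        if not raw or not raw.strip():
            continue
        value = normalise(raw)
        if value.startswith(("CD", "CCD")):
            column = "priadd"
        elif value.isdigit():
            column = "secadd"
        elif value.startswith("AL"):
            column = "teradd"
        else:
            row["review"].append(value)
            continue
        if row[column] is None:
            row[column] = value
        else:
            row["review"].append(value)
    return row


raw_rows = [
    ("cd-132", "50000906", "AL 75"),
    ("AL-45", "50000565", "CD 145"),
    ("50000123", "cd226", "Al36"),
    ("Cd 256", "al-22", "500001025"),
]
for raw in raw_rows:
    print(assign_columns(raw))
```

Run on the raw rows from the first table, this prints four rows whose address columns match the cleaned table. Flagging conflicts instead of overwriting them keeps the manual review step in the loop.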

After this step, the attribute data should be consistent, and the focus shifts to the locations. First, the waypoints were merged into one dataset using QGIS. Viewed in QGIS, the data looks as follows:


Along with the type of structure, we also recorded whether a dwelling is actually a shop or a local tavern (also known as a shebeen).


As discussed, each dwelling has two locations associated with it. The image below illustrates the locations and the different accuracies achieved by the handheld GPS and mobile phone respectively.


Next, a buffer was created around each point using its recorded accuracy as the radius. If the buffers of a handheld GPS point and an EpiCollect+ point intersected, and the GPSid recorded in EpiCollect+ matched the GPSid captured by the handheld GPS, the two points were integrated. The result is a dataset that combines the location from the handheld GPS with the attributes from EpiCollect+. It contains just over 1400 dwelling points.
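A minimal sketch of this integration rule, assuming both datasets are in a metre-based projected system (such as UTM zone 35S) and that each `gpsid` is unique across the merged waypoints. If waypoints from different handheld units reuse ids, the key would also need the unit or set. Two circular buffers intersect exactly when the distance between their centres is at most the sum of their radii, so no buffer geometry needs to be built. The class and function names are hypothetical, and this is not necessarily the workflow the students used.

```python
import math
from dataclasses import dataclass, field


@dataclass
class GpsPoint:
    """Waypoint from a handheld GPS (hypothetical structure)."""
    gpsid: str
    x: float           # easting in metres
    y: float           # northing in metres
    accuracy_m: float  # recorded accuracy, used as the buffer radius


@dataclass
class EpiPoint:
    """EpiCollect+ record: a phone fix plus the captured attributes."""
    gpsid: str         # handheld GPS waypoint id typed into EpiCollect+
    x: float
    y: float
    accuracy_m: float
    attributes: dict = field(default_factory=dict)


def buffers_intersect(a, b) -> bool:
    """Circular buffers intersect when centre distance <= sum of the radii."""
    return math.hypot(a.x - b.x, a.y - b.y) <= a.accuracy_m + b.accuracy_m


def integrate(gps_points, epi_points):
    """Pair each EpiCollect+ record with the GPS waypoint that has the same
    gpsid, but only if their buffers intersect. Merged records keep the
    handheld GPS location and the EpiCollect+ attributes."""
    by_id = {p.gpsid: p for p in gps_points}  # assumes gpsid is unique
    merged, unmatched = [], []
    for epi in epi_points:
        gps = by_id.get(epi.gpsid)
        if gps is not None and buffers_intersect(gps, epi):
            merged.append({**epi.attributes, "gpsid": gps.gpsid, "x": gps.x, "y": gps.y})
        else:
            unmatched.append(epi)  # left for manual review
    return merged, unmatched


# Hypothetical example: the second record's id points to a waypoint 30 m away.
gps = [GpsPoint("101", 100.0, 200.0, 3.0), GpsPoint("102", 130.0, 200.0, 3.0)]
epi = [
    EpiPoint("101", 104.0, 203.0, 5.0, {"priadd": "CD132"}),
    EpiPoint("102", 100.0, 200.0, 5.0, {"priadd": "CD145"}),
]
merged, unmatched = integrate(gps, epi)
print(merged)                        # [{'priadd': 'CD132', 'gpsid': '101', 'x': 100.0, 'y': 200.0}]
print([e.gpsid for e in unmatched])  # ['102']
```

Requiring both the id match and the spatial check catches mistyped GPSids: a matching id whose buffers do not touch is treated as suspect rather than merged.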

What data has been collected?

The dwellings point data set contains the following attributes:

  • X - x coordinate of the point in WGS84 UTM zone 35S
  • Y - y coordinate of the point in WGS84 UTM zone 35S
  • gpsid - The GPSid is the foreign key to the waypoint dataset collected with the handheld GPSs.
  • priadd - This is the primary address. We decided to consider the CD or CCD numbers the primary address.
  • secadd - This is the secondary address. This number was assigned by StatsSA in 2011 during the national census. It is not actually a dwelling number; it was only used to indicate that the dwelling and its inhabitants were counted during the census.
  • teradd - This is the tertiary address. For this, we selected the AL number. However, last year we learned that new AL numbers have been assigned to each dwelling and this is considered to be the official number within the community.
  • type - The type indicates whether the structure is a dwelling or is used for another purpose:
      1 - dwelling
      2 - shop
      3 - barber or salon
      4 - tyre repair
      5 - taxi stop
      6 - recycling
      7 - health hazard
      8 - NGO
      9 - daycare or school
      10 - church
  • remoteid - This refers to the SAPRI remote number assigned to that dwelling
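The attribute list above can be summarised as a record type. This is a sketch only: the class name `DwellingPoint`, storing the address and id fields as strings, and rejecting unknown type codes are assumptions, not the actual table definition.

```python
from dataclasses import dataclass
from typing import Optional

# Structure type codes as listed on this page.
STRUCTURE_TYPES = {
    1: "dwelling",
    2: "shop",
    3: "barber or salon",
    4: "tyre repair",
    5: "taxi stop",
    6: "recycling",
    7: "health hazard",
    8: "NGO",
    9: "daycare or school",
    10: "church",
}


@dataclass
class DwellingPoint:
    """One dwelling point (hypothetical class; field types are assumptions)."""
    x: float                 # easting, WGS84 / UTM zone 35S (metres)
    y: float                 # northing, WGS84 / UTM zone 35S (metres)
    gpsid: str               # foreign key to the handheld GPS waypoints
    priadd: Optional[str]    # CD or CCD number
    secadd: Optional[str]    # StatsSA 2011 census number
    teradd: Optional[str]    # AL number
    type: int                # code from STRUCTURE_TYPES
    remoteid: Optional[str]  # SAPRI remote number

    def __post_init__(self):
        # Reject codes that are not in the list above.
        if self.type not in STRUCTURE_TYPES:
            raise ValueError(f"unknown structure type code: {self.type}")

    @property
    def type_label(self) -> str:
        return STRUCTURE_TYPES[self.type]
```

For example, a record created with `type=2` reports `type_label` as `"shop"`, while a code such as 11 raises a `ValueError`.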