mobility_pipeline.lib package

Submodules

mobility_pipeline.lib.make_matrix module

Functions for making tower-tower, tower-admin, and admin-tower matrices

Matrices:

  • tower-tower: The raw mobility data between cell towers. The value at row i and column j is the number of people who move on that day from the region served by tower i to the region served by tower j. Strictly speaking, this is the number of cell phones that connect to tower i in the morning and to tower j in the evening, each of which we assume represents one person moving. Row indices are the origin towers and column indices are the destination towers.

  • tower-admin: Computed from the Voronoi tessellation and the country shapefile, this matrix represents the fraction of each admin that is covered by each tower. For any x in the matrix at row i and column j, we know that a fraction x of the admin with index i is covered by the tower with index j. This means that the matrix has row indices of admin indices and column indices of tower indices.

  • admin-tower: Computed from the Voronoi tessellation and the country shapefile, this matrix represents the fraction of each tower’s range that is within each admin. For any x in the matrix at row i and column j, we know that a fraction x of the Voronoi cell for the tower with index i is within the admin with index j. This means that the matrix has row indices of tower indices and column indices of admin indices.

  • admin-admin: This is the final mobility matrix, which represents the number of people who move between admins each day. The value at row i and column j is the number of people who move on that day from the admin with index i to the admin with index j. This is, of course, being estimated from cell phone data and the overlaps as computed in the other matrices.

This strategy is explained by Mike Fabrikant at UNICEF: https://medium.com/@mikefabrikant/cell-towers-chiefdoms-and-anonymized-call-detail-records-a-guide-to-creating-a-mobility-matrix-d2d5c1bafb68

mobility_pipeline.lib.make_matrix.generate_rtree(polygons: collections.abc.Sequence) → Tuple[shapely.strtree.STRtree, Dict[Tuple[tuple, ...], int]]

Helper function that builds an RTree from MultiPolygons

The RTree is built from MultiPolygons using shapely.strtree.STRtree. Since the RTree returns the overlapping MultiPolygons themselves, we need a way to retrieve each one’s index. We do this with a dictionary from the exterior coordinates (Polygon.exterior.coords) of every Polygon in the MultiPolygon to the MultiPolygon’s index in the provided Sequence.

Specifically, you can generate the key for a given MultiPolygon mpoly like so:

key = tuple([tuple(p.exterior.coords) for p in mpoly])

Parameters

polygons – A Sequence of MultiPolygons. Must be iterable and able to be passed to the STRtree constructor. Iteration must be deterministic.

Returns

A tuple of the RTree and the index mapping dictionary.
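To illustrate the index mapping described above, here is a minimal sketch that builds only the dictionary half of the return value. The helper name build_index_mapping is hypothetical, and the sketch iterates over mpoly.geoms instead of over the MultiPolygon directly; that is an assumption made so the example also runs on Shapely releases where direct iteration is unavailable.

```python
from typing import Dict, Sequence, Tuple

from shapely.geometry import MultiPolygon, box


def build_index_mapping(
    polygons: Sequence[MultiPolygon],
) -> Dict[Tuple[tuple, ...], int]:
    """Map each MultiPolygon's exterior coordinates to its index.

    Hypothetical helper; not part of mobility_pipeline.
    """
    mapping = {}
    for index, mpoly in enumerate(polygons):
        # Same key shape as in the text: one coordinate tuple per Polygon.
        key = tuple(tuple(p.exterior.coords) for p in mpoly.geoms)
        mapping[key] = index
    return mapping


# Two toy cells: unit squares side by side.
cells = [MultiPolygon([box(0, 0, 1, 1)]), MultiPolygon([box(1, 0, 2, 1)])]
mapping = build_index_mapping(cells)

# Recover the index of a MultiPolygon from its geometry alone.
lookup_key = tuple(tuple(p.exterior.coords) for p in cells[1].geoms)
found_index = mapping[lookup_key]
```

Because the key is derived from coordinates, two MultiPolygons with identical exteriors would map to the same key; the sketch assumes the input cells are distinct.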

mobility_pipeline.lib.make_matrix.make_a_to_b_matrix(a_cells: List[shapely.geometry.multipolygon.MultiPolygon], b_cells: List[shapely.geometry.multipolygon.MultiPolygon]) → numpy.ndarray

Create an overlap matrix from sequence A to B

Computes for every pair of MultiPolygons between A and B, the fraction of the MultiPolygon in B that is covered by the one in A. We use an RTree to reduce the number of overlaps we have to compute by only computing overlaps between MultiPolygons that have overlapping bounding boxes.

Parameters
  • a_cells – Sequence A of MultiPolygons

  • b_cells – Sequence B of MultiPolygons

Returns

A matrix with row indices that correspond to the indices of B and column indices that correspond to the indices of A. Every element at row i and column j in the matrix represents the fraction of the MultiPolygon in B at index i that overlaps with the MultiPolygon in A at index j.
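The row and column convention can be checked with a small sketch. This version deliberately swaps the RTree for a brute-force double loop, so it only illustrates the shape and meaning of the result and not the bounding-box optimization. The function name brute_force_a_to_b and the toy geometries are assumptions made for illustration.

```python
import numpy as np
from shapely.geometry import MultiPolygon, box


def brute_force_a_to_b(a_cells, b_cells):
    """Overlap matrix with rows indexed by B and columns indexed by A.

    Hypothetical stand-in that checks every pair instead of using an RTree.
    """
    matrix = np.zeros((len(b_cells), len(a_cells)))
    for i, b in enumerate(b_cells):
        if b.area == 0:
            continue  # avoid dividing by zero for degenerate cells
        for j, a in enumerate(a_cells):
            if a.intersects(b):
                # Fraction of the B cell that is covered by the A cell.
                matrix[i, j] = a.intersection(b).area / b.area
    return matrix


# Sequence A: three cells along the x axis, covering [0,1], [1,3], [3,4].
a_cells = [
    MultiPolygon([box(0, 0, 1, 1)]),
    MultiPolygon([box(1, 0, 3, 1)]),
    MultiPolygon([box(3, 0, 4, 1)]),
]
# Sequence B: two cells covering [0,2] and [2,4].
b_cells = [MultiPolygon([box(0, 0, 2, 1)]), MultiPolygon([box(2, 0, 4, 1)])]

overlap = brute_force_a_to_b(a_cells, b_cells)  # shape (2, 3)
```

Here each cell in B is split evenly between two cells in A, so every nonzero entry is 0.5.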

mobility_pipeline.lib.make_matrix.make_admin_admin_matrix(tower_tower: numpy.ndarray, tower_admin: numpy.ndarray, admin_tower: numpy.ndarray) → numpy.ndarray

Compute the admin-to-admin matrix

Computed by multiplying the three provided matrices like so: (tower_admin) * (tower_tower) * (admin_tower)

Parameters
  • tower_tower – The tower-to-tower mobility data

  • tower_admin – Stores the fraction of each admin that is covered by each cell tower

  • admin_tower – Stores the fraction of each cell tower’s range that is within each admin

Returns

An admin-to-admin mobility matrix such that each value with row index i and column index j is the estimated number of people who moved that day from the admin with index i to the admin with index j.
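A small numeric sketch of the product, assuming the * in the formula above denotes matrix multiplication (the only reading under which the documented shapes line up). The toy values for 2 admins and 3 towers are made up for illustration.

```python
import numpy as np

# Rows: admins, columns: towers (2 x 3).
tower_admin = np.array([[0.5, 0.5, 0.0],
                        [0.0, 0.5, 0.5]])

# Rows: origin towers, columns: destination towers (3 x 3).
tower_tower = np.array([[0.0, 10.0, 0.0],
                        [4.0, 0.0, 6.0],
                        [0.0, 2.0, 0.0]])

# Rows: towers, columns: admins (3 x 2).
admin_tower = np.array([[1.0, 0.0],
                        [0.5, 0.5],
                        [0.0, 1.0]])

# (2 x 3) @ (3 x 3) @ (3 x 2) gives a 2 x 2 admin-to-admin matrix.
# Note that @ is the matrix product; * on ndarrays would be elementwise.
admin_admin = tower_admin @ tower_tower @ admin_tower
```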

mobility_pipeline.lib.make_matrix.make_admin_to_tower_matrix(admin_cells: List[shapely.geometry.multipolygon.MultiPolygon], tower_cells: List[shapely.geometry.multipolygon.MultiPolygon]) → numpy.ndarray

Compute the admin-to-tower matrix.

This is a wrapper function for make_a_to_b_matrix(), with sequences A and B as denoted for each argument.

Parameters
  • tower_cells – Sequence of Voronoi cells; used as sequence A.

  • admin_cells – Sequence of administrative regions; used as sequence B.

Returns

The admin-tower matrix.

mobility_pipeline.lib.make_matrix.make_tower_to_admin_matrix(tower_cells: List[shapely.geometry.multipolygon.MultiPolygon], admin_cells: List[shapely.geometry.multipolygon.MultiPolygon]) → numpy.ndarray

Compute the tower-to-admin matrix.

This is a wrapper function for make_a_to_b_matrix(), with sequences A and B as denoted for each argument.

Parameters
  • admin_cells – Sequence of administrative regions; used as sequence A.

  • tower_cells – Sequence of Voronoi cells; used as sequence B.

Returns

The tower-admin matrix.

mobility_pipeline.lib.make_matrix.make_tower_tower_matrix(mobility: pandas.core.frame.DataFrame, n_towers: int) → numpy.ndarray

Make tower-to-tower mobility matrix

Thank you to Tomas Bencomo (https://github.com/tjbencomo) for writing the initial version of this function.

Parameters
  • mobility – DataFrame of mobility data with columns [ORIGIN, DESTINATION, COUNT]. All values should be numeric.

  • n_towers – Number of towers, which defines the length of each matrix dimension

Returns

The tower-to-tower matrix, which has shape (n_towers, n_towers) and where the value at row i and column j is the mobility count for origin i and destination j.
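The following sketch shows one way such a matrix could be filled. To stay dependency-light it takes plain (origin, destination, count) integer triples in place of a DataFrame; the function name tower_tower_from_rows is hypothetical, and the sketch assumes each origin-destination pair appears at most once.

```python
import numpy as np


def tower_tower_from_rows(rows, n_towers):
    """Build an (n_towers, n_towers) count matrix from integer triples.

    Hypothetical helper; rows is an iterable of (origin, destination, count).
    """
    data = np.asarray(list(rows), dtype=int).reshape(-1, 3)
    matrix = np.zeros((n_towers, n_towers), dtype=int)
    # Index by (origin, destination); pairs missing from rows stay zero.
    matrix[data[:, 0], data[:, 1]] = data[:, 2]
    return matrix


toy_matrix = tower_tower_from_rows([(0, 1, 5), (2, 0, 3), (1, 1, 7)], n_towers=3)
```

With a DataFrame, the same indexing would apply to the arrays behind its ORIGIN, DESTINATION, and COUNT columns.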

mobility_pipeline.lib.overlap module

Utilities for working with overlapping Polygons and MultiPolygons

mobility_pipeline.lib.overlap.compute_overlap(polygon_1: Union[shapely.geometry.polygon.Polygon, shapely.geometry.multipolygon.MultiPolygon], polygon_2: Union[shapely.geometry.polygon.Polygon, shapely.geometry.multipolygon.MultiPolygon])

Computes the fraction of the first polygon that intersects the second

The returned fraction is (area of intersection) / (area of polygon_1).

Parameters
  • polygon_1 – The first polygon, whose total area will be the denominator for the computed fraction

  • polygon_2 – The second polygon

Returns

The fraction of the first polygon that intersects the second
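As a worked instance of the formula above, here is a minimal sketch using two rectangles. The function name overlap_fraction is hypothetical, and the sketch does not handle a first polygon with zero area.

```python
from shapely.geometry import box


def overlap_fraction(polygon_1, polygon_2):
    """(area of intersection) / (area of polygon_1); hypothetical helper."""
    return polygon_1.intersection(polygon_2).area / polygon_1.area


square = box(0, 0, 2, 2)  # area 4
strip = box(1, 0, 3, 2)   # covers the right half of the square
fraction = overlap_fraction(square, strip)  # intersection area 2, so 0.5
```

The fraction is not symmetric in general: swapping the arguments changes the denominator.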

mobility_pipeline.lib.validate module

Functions for validating data file formats and contents

mobility_pipeline.lib.validate.AREA_THRESHOLD = 0.0001

Allowable deviation, as a fraction of the area of the union, between the area of the union of polygons and the sum of the polygons’ individual areas. Agreement between these values indicates the polygons are disjoint and contiguous. The threshold was chosen based on the deviations observed in known good Voronoi tessellations.
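The comparison this threshold governs can be sketched with plain arithmetic. The function name areas_agree is hypothetical, and the use of an absolute difference with a non-strict comparison is an assumption about how the check is applied.

```python
AREA_THRESHOLD = 0.0001  # value documented for this module


def areas_agree(cell_areas, union_area, threshold=AREA_THRESHOLD):
    """True if the summed areas deviate from the union area by at most
    threshold, measured as a fraction of the union area (hypothetical helper).
    """
    deviation = abs(sum(cell_areas) - union_area) / union_area
    return deviation <= threshold


# Cells that tile the union exactly: areas sum to the union area.
tiling_ok = areas_agree([1.0, 1.0, 2.0], union_area=4.0)

# Overlapping cells: the summed area exceeds the union area.
overlap_detected = not areas_agree([1.0, 1.0, 2.5], union_area=4.0)
```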

mobility_pipeline.lib.validate.all_numeric(string: str) → bool

Check that a string is composed entirely of digits

Parameters

string – String to check

Returns

True if and only if the string is composed entirely of digits
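A minimal sketch of such a check follows. Restricting the test to ASCII digits and rejecting the empty string are both assumptions; the original function may behave differently on those inputs.

```python
def all_numeric_sketch(string: str) -> bool:
    """True if string is non-empty and made only of ASCII digits 0-9.

    Hypothetical stand-in. str.isdigit() alone also accepts characters such
    as superscript digits, so isascii() is added to exclude them.
    """
    return string.isascii() and string.isdigit()


checks = {
    "123": all_numeric_sketch("123"),
    "12a": all_numeric_sketch("12a"),
    "": all_numeric_sketch(""),
}
```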

mobility_pipeline.lib.validate.validate_admins(country_id) → Optional[str]

Check that the admins defined in the shapefile are reasonable

Admins are loaded using load_admin_cells().

Checks:

  • That the cells can be loaded by load_admin_cells.

  • That the cells are contiguous and disjoint. This is checked by comparing the sum of areas of each polygon and the area of their union. These two should be equal.

  • That at least one cell is loaded.

Parameters

country_id – Country identifier.

Returns

A description of a found error, or None if no error found.

mobility_pipeline.lib.validate.validate_contiguous_disjoint_cells(cells: List[Union[shapely.geometry.multipolygon.MultiPolygon, shapely.geometry.polygon.Polygon]])

Check that cells are contiguous and disjoint and that they exist

Checks:

  • That the cells are contiguous and disjoint. This is checked by comparing the sum of areas of each polygon and the area of their union. These two should be equal. The allowable deviation is specified by AREA_THRESHOLD

  • That at least one cell is loaded.

Returns

A description of a found error, or None if no error found.

mobility_pipeline.lib.validate.validate_mobility(raw: List[List[str]]) → Optional[str]

Checks that the text from a CSV file is in a valid format for mobility

The text must consist of a list of rows, where each row is a list of exactly 4 strings: a date (not checked), an origin tower, a destination tower, and a count.

The origin and destination must be composed of digits following data_interface.TOWER_PREFIX. The count must be composed entirely of digits and represent a non-negative integer.

The origin and destination tower numeric portions must strictly increase in origin-major order.

Parameters

raw – Mobility CSV data as a list of rows, obtained by applying list(csv.reader(f)) to the open file f

Returns

None if the input is valid, a string describing the error otherwise.
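The rules above can be sketched as a single pass over the rows. Everything named here is an assumption for illustration: the function name validate_mobility_sketch, the prefix value "br" standing in for data_interface.TOWER_PREFIX, the absence of a header row, and the wording of the error messages.

```python
from typing import List, Optional

TOWER_PREFIX = "br"  # hypothetical value; the real one lives in data_interface


def _is_ascii_digits(text: str) -> bool:
    return text.isascii() and text.isdigit()


def validate_mobility_sketch(raw: List[List[str]]) -> Optional[str]:
    """Return None if raw looks valid, otherwise a description of the error."""
    previous = None
    for row_number, row in enumerate(raw):
        if len(row) != 4:
            return f"row {row_number}: expected 4 fields, found {len(row)}"
        _date, origin, destination, count = row  # the date is not checked
        tower_ids = []
        for name in (origin, destination):
            suffix = name[len(TOWER_PREFIX):]
            if not name.startswith(TOWER_PREFIX) or not _is_ascii_digits(suffix):
                return f"row {row_number}: bad tower name {name!r}"
            tower_ids.append(int(suffix))
        if not _is_ascii_digits(count):
            return f"row {row_number}: bad count {count!r}"
        # Tuple comparison orders by origin first, then destination,
        # which is the origin-major order described above.
        current = (tower_ids[0], tower_ids[1])
        if previous is not None and current <= previous:
            return f"row {row_number}: towers not strictly increasing"
        previous = current
    return None


valid_rows = [
    ["2018-01-01", "br0", "br0", "3"],
    ["2018-01-01", "br0", "br1", "0"],
    ["2018-01-01", "br1", "br0", "2"],
]
result = validate_mobility_sketch(valid_rows)  # None: all checks pass
```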

mobility_pipeline.lib.validate.validate_mobility_full(mobility: List[List[str]]) → Optional[str]

Check whether the mobility data file is correctly ordered and full

The mobility data file is loaded from the file at path mobility_pipeline.data_interface.MOBILITY_PATH(). Correctly ordered means that the tower names’ numeric portions strictly increase in origin-major order. Full means that there is a row for every combination of origin and destination tower.

If this order were perfect, it would make forming the mobility matrix as easy as reshaping the last column. Unfortunately, this function showed that some coordinates are missing or out of order, so counts must be inserted manually.

Parameters

mobility – Mobility CSV data as a list of rows, obtained by applying list(csv.reader(f)) to the open file f

Returns

None if there is no error, otherwise a description of the error.
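To make the reshaping remark concrete, here is a sketch of what would work if the file were full and perfectly ordered. The toy counts are made up, and n_towers = 3 is an assumption for illustration.

```python
import numpy as np

n_towers = 3

# Last column of a hypothetical full, origin-major ordered file:
# rows (0,0), (0,1), (0,2), (1,0), ..., (2,2).
counts = np.arange(n_towers * n_towers)

# Row-major reshape puts the count for origin i, destination j at [i, j].
matrix = counts.reshape(n_towers, n_towers)
```

When rows are missing or out of order, position in the file no longer determines (i, j), which is why each count has to be placed by its own origin and destination indices instead.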

mobility_pipeline.lib.validate.validate_tower_cells_aligned(cells: List[shapely.geometry.multipolygon.MultiPolygon], towers: numpy.ndarray) → Optional[str]

Check that each tower’s index matches the cell at the same index

For any cell c at index i, an error is found if c has nonzero area and the tower at index i is not within c.

Parameters
  • cells – List of the cells’ (Multi)Polygons, in order

  • towers – List of the towers’ coordinates (latitude, longitude), in order

Returns

A description of a found error, or None if no error found.

mobility_pipeline.lib.validate.validate_tower_index_name_aligned(csv_reader: Iterator) → Optional[str]

Check that in the towers data file, the tower names match their indices

Indices are zero-indexed from the second row in the file (to skip the header). An error is considered found if any tower name is not exactly TOWER_PREFIX appended with the tower’s index.

Parameters

csv_reader – CSV reader from calling csv.reader(f) on the open data file f

Returns

A description of a found error, or None if no error found.
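A sketch of the name-versus-index check follows. The function name validate_names_sketch, the prefix value "br" standing in for TOWER_PREFIX, and the assumption that the tower name is the first column are all hypothetical.

```python
import csv
import io
from typing import Iterator, Optional

TOWER_PREFIX = "br"  # hypothetical value


def validate_names_sketch(csv_reader: Iterator) -> Optional[str]:
    """Return None if every tower name is TOWER_PREFIX + its index."""
    next(csv_reader, None)  # skip the header row
    # Indices start at zero on the first data row.
    for index, row in enumerate(csv_reader):
        expected = f"{TOWER_PREFIX}{index}"
        if not row or row[0] != expected:
            found = row[0] if row else None
            return f"tower at index {index}: expected {expected!r}, found {found!r}"
    return None


good_file = io.StringIO("name,lat,lon\nbr0,1.0,2.0\nbr1,3.0,4.0\n")
bad_file = io.StringIO("name,lat,lon\nbr0,1.0,2.0\nbr2,3.0,4.0\n")

good_result = validate_names_sketch(csv.reader(good_file))
bad_result = validate_names_sketch(csv.reader(bad_file))
```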

mobility_pipeline.lib.validate.validate_voronoi(voronoi_path) → Optional[str]

Check that the Voronoi cells are reasonable

Checks:

  • That the cells can be loaded by load_cells.

  • That the cells are contiguous and disjoint. This is checked by comparing the sum of areas of each polygon and the area of their union. These two should be equal.

  • That at least one cell is loaded.

Returns

A description of a found error, or None if no error found.

mobility_pipeline.lib.voronoi module

Tools for working with Voronoi tessellations

Given a 2-dimensional space with a set of points (called seeds), the Voronoi tessellation is a partitioning of the space such that for every partition, which is called a cell, the cell contains exactly one seed, and every point in the cell is closer to the cell’s seed than it is to any other seed. For more information, see https://en.wikipedia.org/wiki/Voronoi_diagram.

class mobility_pipeline.lib.voronoi.VoronoiCell

Bases: dict

This class is for type hinting Voronoi cell JSONs

mobility_pipeline.lib.voronoi.json_to_polygon(points_json: List[List[float]]) → shapely.geometry.polygon.Polygon

Loads a Polygon from a JSON of points

Loads Polygon from a JSON of the format:

where each latitude-longitude pair describes a point defining the boundary of the polygon.

Parameters

points_json – The points that define the boundary of the polygon

Returns

A polygon

mobility_pipeline.lib.voronoi.load_cell(cell_json: mobility_pipeline.lib.voronoi.VoronoiCell) → shapely.geometry.multipolygon.MultiPolygon

Loads a Voronoi cell from JSON in Polygon or MultiPolygon format

Loads Voronoi cell from a JSON of the Polygon format:

or of the MultiPolygon format:

where each latitude-longitude pair describes a point of the Voronoi tessellation. If the JSON is in the Polygon format, a shapely.geometry.MultiPolygon object will be returned where the MultiPolygon has one member, the described polygon.

The value of the type key is used to distinguish Polygon and MultiPolygon formats.

Parameters

cell_json – The points that define the boundary of the Voronoi cell

Returns

A MultiPolygon of the Voronoi cell

Module contents