mobility_pipeline.lib package
Submodules
mobility_pipeline.lib.make_matrix module
Functions for making tower-tower, tower-admin, and admin-tower matrices
Matrices:
tower-tower: The raw mobility data between cell towers. The value at row i and column j is the number of people who move on that day from the region served by tower i to the region served by tower j. Strictly speaking, this is the number of cell phones that connect to tower i in the morning and tower j in the evening, which we assume represents a person moving. This matrix has row indices of the origin towers and column indices of the destination towers.
tower-admin: Computed from the Voronoi tessellation and the country shapefile, this matrix represents the fraction of each admin that is covered by each tower. For any x in the matrix at row i and column j, we know that a fraction x of the admin with index i is covered by the tower with index j. This means that the matrix has row indices of admin indices and column indices of tower indices.
admin-tower: Computed from the Voronoi tessellation and the country shapefile, this matrix represents the fraction of each tower’s range that is within each admin. For any x in the matrix at row i and column j, we know that a fraction x of the Voronoi cell for the tower with index i is within the admin with index j. This means that the matrix has row indices of tower indices and column indices of admin indices.
admin-admin: This is the final mobility matrix, which represents the number of people who move between admins each day. The value at row i and column j is the number of people who move on that day from the admin with index i to the admin with index j. This is an estimate derived from cell phone data and the overlaps computed in the other matrices.
This strategy is explained by Mike Fabrikant at UNICEF: https://medium.com/@mikefabrikant/cell-towers-chiefdoms-and-anonymized-call-detail-records-a-guide-to-creating-a-mobility-matrix-d2d5c1bafb68
-
mobility_pipeline.lib.make_matrix.generate_rtree(polygons: collections.abc.Sequence) → Tuple[shapely.strtree.STRtree, Dict[Tuple[tuple, ...], int]]
Helper function that builds an RTree from MultiPolygons
The RTree is built from MultiPolygons using shapely.strtree.STRtree. Since the RTree returns the overlapping MultiPolygons, we need a way to retrieve each MultiPolygon’s index. We do this with a dictionary from the exterior coordinates (Polygon.exterior.coords) of every Polygon in the MultiPolygon to the MultiPolygon’s index in the provided Sequence.
Specifically, you can generate the key for a given MultiPolygon mpoly like so:
key = tuple([tuple(p.exterior.coords) for p in mpoly])
- Parameters
polygons – A Sequence of MultiPolygons. Must be iterable and able to be passed to the STRtree constructor. Iteration must be deterministic.
- Returns
A tuple of the RTree and the index mapping dictionary.
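The index mapping described above can be sketched without any geometry library. In this minimal illustration, plain coordinate lists stand in for shapely objects (each ring plays the role of Polygon.exterior.coords), and the names make_key and build_index_mapping are hypothetical, not part of the package.

```python
from typing import Dict, List, Sequence, Tuple

Ring = List[Tuple[float, float]]  # stand-in for Polygon.exterior.coords
MultiRing = List[Ring]            # stand-in for a MultiPolygon


def make_key(mpoly: MultiRing) -> Tuple[tuple, ...]:
    """Build a hashable key from the exterior ring of every member polygon."""
    return tuple(tuple(ring) for ring in mpoly)


def build_index_mapping(mpolys: Sequence[MultiRing]) -> Dict[Tuple[tuple, ...], int]:
    """Map each key to the position of its MultiPolygon in the sequence."""
    return {make_key(mpoly): index for index, mpoly in enumerate(mpolys)}


# One single-part shape and one two-part shape (made-up coordinates).
square = [[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]]
two_part = [
    [(2.0, 0.0), (3.0, 0.0), (3.0, 1.0), (2.0, 0.0)],
    [(4.0, 0.0), (5.0, 0.0), (5.0, 1.0), (4.0, 0.0)],
]

mapping = build_index_mapping([square, two_part])

# Given a shape returned by a spatial query, recover its original index.
recovered_index = mapping[make_key(two_part)]
```

Lists are not hashable, which is why every ring is converted to a tuple before it is used as part of a dictionary key.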
-
mobility_pipeline.lib.make_matrix.make_a_to_b_matrix(a_cells: List[shapely.geometry.multipolygon.MultiPolygon], b_cells: List[shapely.geometry.multipolygon.MultiPolygon]) → numpy.ndarray
Create an overlap matrix from sequence A to B
For every pair of MultiPolygons drawn from A and B, computes the fraction of the MultiPolygon in B that is covered by the one in A. We use an RTree to reduce the number of overlaps we have to compute by only computing overlaps between MultiPolygons that have overlapping bounding boxes.
- Parameters
a_cells – Sequence A of MultiPolygons
b_cells – Sequence B of MultiPolygons
- Returns
A matrix with row indices that correspond to the indices of B and column indices that correspond to the indices of A. Every element at row i and column j in the matrix represents the fraction of the MultiPolygon in B at index i that overlaps with the MultiPolygon in A at index j.
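To make the row and column convention concrete, here is a minimal sketch that uses axis-aligned rectangles in place of MultiPolygons and a brute-force bounding-box test in place of the RTree. The helper names and the sample coordinates are assumptions made for illustration; the real function works on shapely geometries.

```python
from typing import List, Tuple

import numpy as np

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def boxes_overlap(a: Box, b: Box) -> bool:
    """Cheap pre-check, analogous to the bounding-box filter an RTree provides."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def box_area(box: Box) -> float:
    return (box[2] - box[0]) * (box[3] - box[1])


def intersection_area(a: Box, b: Box) -> float:
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def a_to_b_matrix(a_cells: List[Box], b_cells: List[Box]) -> np.ndarray:
    """Rows follow B, columns follow A; each entry is the fraction of B[i] covered by A[j]."""
    matrix = np.zeros((len(b_cells), len(a_cells)))
    for i, b in enumerate(b_cells):
        for j, a in enumerate(a_cells):
            if boxes_overlap(a, b):  # skip pairs that cannot intersect
                matrix[i, j] = intersection_area(a, b) / box_area(b)
    return matrix


# Made-up layout: two tower cells (A) tiling a 4x2 strip, two admins (B) tiling the same strip.
towers = [(0.0, 0.0, 2.0, 2.0), (2.0, 0.0, 4.0, 2.0)]
admins = [(0.0, 0.0, 1.0, 2.0), (1.0, 0.0, 4.0, 2.0)]

overlap = a_to_b_matrix(towers, admins)
```

In this layout each row sums to 1, because the tower cells cover every admin completely.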
-
mobility_pipeline.lib.make_matrix.make_admin_admin_matrix(tower_tower: numpy.ndarray, tower_admin: numpy.ndarray, admin_tower: numpy.ndarray) → numpy.ndarray
Compute the admin-to-admin matrix
Computed as the matrix product of the three provided matrices, in this order: (tower_admin) * (tower_tower) * (admin_tower)
- Parameters
tower_tower – The tower-to-tower mobility data
tower_admin – Stores the fraction of each admin that is covered by each cell tower
admin_tower – Stores the fraction of each cell tower’s range that is within each admin
- Returns
An admin-to-admin mobility matrix such that each value with row index i and column index j is the estimated number of people who moved that day from the admin with index i to the admin with index j.
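The shapes involved in this product are easiest to see with a small example. The sketch below assumes 2 admins and 3 towers and uses made-up values; it follows the index conventions given in the module description (tower_admin has admin rows and tower columns, admin_tower has tower rows and admin columns).

```python
import numpy as np

# Shape (n_admins, n_towers): fraction of admin i covered by tower j.
tower_admin = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.5, 0.5],
])

# Shape (n_towers, n_towers): movements from origin tower i to destination tower j.
tower_tower = np.array([
    [0.0, 4.0, 2.0],
    [6.0, 0.0, 0.0],
    [8.0, 0.0, 0.0],
])

# Shape (n_towers, n_admins): fraction of tower i's cell that lies within admin j.
admin_tower = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
])

# (n_admins, n_towers) @ (n_towers, n_towers) @ (n_towers, n_admins) -> (n_admins, n_admins)
admin_admin = tower_admin @ tower_tower @ admin_tower
```

The inner dimensions only line up in this order, which is why the product is written as tower_admin, then tower_tower, then admin_tower.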
-
mobility_pipeline.lib.make_matrix.make_admin_to_tower_matrix(admin_cells: List[shapely.geometry.multipolygon.MultiPolygon], tower_cells: List[shapely.geometry.multipolygon.MultiPolygon]) → numpy.ndarray
Compute the admin-to-tower matrix.
This is a wrapper function for make_a_to_b_matrix(), with sequences A and B as denoted for each argument.
- Parameters
tower_cells – Sequence of Voronoi cells; used as sequence A.
admin_cells – Sequence of administrative regions; used as sequence B.
- Returns
The admin-tower matrix.
-
mobility_pipeline.lib.make_matrix.make_tower_to_admin_matrix(tower_cells: List[shapely.geometry.multipolygon.MultiPolygon], admin_cells: List[shapely.geometry.multipolygon.MultiPolygon]) → numpy.ndarray
Compute the tower-to-admin matrix.
This is a wrapper function for make_a_to_b_matrix(), with sequences A and B as denoted for each argument.
- Parameters
admin_cells – Sequence of administrative regions; used as sequence A.
tower_cells – Sequence of Voronoi cells; used as sequence B.
- Returns
The tower-admin matrix.
-
mobility_pipeline.lib.make_matrix.make_tower_tower_matrix(mobility: pandas.core.frame.DataFrame, n_towers: int) → numpy.ndarray
Make tower-to-tower mobility matrix
Thank you to Tomas Bencomo (https://github.com/tjbencomo) for writing the initial version of this function.
- Parameters
mobility – DataFrame of mobility data with columns [ORIGIN, DESTINATION, COUNT]. All values should be numeric.
n_towers – Number of towers, which defines the length of each matrix dimension
- Returns
The tower-to-tower matrix, which has shape (n_towers, n_towers) and where the value at row i and column j is the mobility count for origin i and destination j.
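A rough sketch of how such a matrix could be filled is shown below. It is not the package's implementation: plain tuples replace the DataFrame, origins and destinations are assumed to be zero-based integer indices, and pairs absent from the input are assumed to stay at zero.

```python
from typing import Iterable, Tuple

import numpy as np


def tower_tower_from_rows(rows: Iterable[Tuple[int, int, float]], n_towers: int) -> np.ndarray:
    """Place each count at [origin, destination]; unlisted pairs remain zero."""
    matrix = np.zeros((n_towers, n_towers))
    for origin, destination, count in rows:
        matrix[origin, destination] = count
    return matrix


# Made-up (ORIGIN, DESTINATION, COUNT) rows for three towers.
rows = [(0, 1, 12), (1, 0, 7), (2, 2, 3)]
tower_tower = tower_tower_from_rows(rows, n_towers=3)
```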
mobility_pipeline.lib.overlap module
Utilities for working with overlapping Polygons and MultiPolygons
-
mobility_pipeline.lib.overlap.compute_overlap(polygon_1: Union[shapely.geometry.polygon.Polygon, shapely.geometry.multipolygon.MultiPolygon], polygon_2: Union[shapely.geometry.polygon.Polygon, shapely.geometry.multipolygon.MultiPolygon])
Computes the fraction of the first polygon that intersects the second
The returned fraction is (area of intersection) / (area of polygon_1).
- Parameters
polygon_1 – The first polygon, whose total area will be the denominator for the computed fraction
polygon_2 – The second polygon
- Returns
The fraction of the first polygon that intersects the second
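Because the denominator is the area of the first polygon, the result depends on argument order. The following sketch shows this with axis-aligned rectangles standing in for shapely polygons; overlap_fraction is a hypothetical name used only here.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def overlap_fraction(first: Box, second: Box) -> float:
    """Return (area of intersection) / (area of first)."""
    width = min(first[2], second[2]) - max(first[0], second[0])
    height = min(first[3], second[3]) - max(first[1], second[1])
    intersection = max(width, 0.0) * max(height, 0.0)
    first_area = (first[2] - first[0]) * (first[3] - first[1])
    return intersection / first_area


small = (0.0, 0.0, 1.0, 1.0)  # area 1, lies entirely inside `large`
large = (0.0, 0.0, 2.0, 2.0)  # area 4

small_in_large = overlap_fraction(small, large)  # all of `small` is covered
large_in_small = overlap_fraction(large, small)  # only a quarter of `large` is covered
```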
mobility_pipeline.lib.validate module
Functions for validating data file formats and contents
-
mobility_pipeline.lib.validate.AREA_THRESHOLD = 0.0001
Allowable deviation, as a fraction of the area of the union, between the area of the union of polygons and the sum of the polygons’ individual areas. Agreement between these values indicates the polygons are disjoint and contiguous. The threshold was chosen based on the deviations observed in known good Voronoi tessellations.
-
mobility_pipeline.lib.validate.all_numeric(string: str) → bool
Check that a string is composed entirely of digits
- Parameters
string – String to check
- Returns
True if and only if the string is composed entirely of digits
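One possible way to implement such a check is sketched below. It assumes that only the ASCII characters 0 through 9 count as digits and that the empty string is rejected; the documentation above does not say how the real function treats either case.

```python
def all_numeric_sketch(string: str) -> bool:
    """True only for non-empty strings made up of the characters 0-9."""
    # Assumption: the empty string and non-ASCII digits are both rejected.
    return len(string) > 0 and all(ch in "0123456789" for ch in string)


plain_digits = all_numeric_sketch("12345")
with_letter = all_numeric_sketch("12a")
negative_number = all_numeric_sketch("-1")
```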
-
mobility_pipeline.lib.validate.validate_admins(country_id) → Optional[str]
Check that the admins defined in the shapefile are reasonable
Admins are loaded using load_admin_cells().
Checks:
That the cells can be loaded by load_admin_cells.
That the cells are contiguous and disjoint. This is checked by comparing the sum of the areas of the polygons with the area of their union. These two should be equal.
That at least one cell is loaded.
- Parameters
country_id – Country identifier.
- Returns
A description of a found error, or None if no error found.
-
mobility_pipeline.lib.validate.validate_contiguous_disjoint_cells(cells: List[Union[shapely.geometry.multipolygon.MultiPolygon, shapely.geometry.polygon.Polygon]])
Check that cells are contiguous and disjoint and that they exist
Checks:
That the cells are contiguous and disjoint. This is checked by comparing the sum of the areas of the polygons with the area of their union. These two should be equal. The allowable deviation is specified by AREA_THRESHOLD.
That at least one cell is loaded.
- Returns
A description of a found error, or None if no error found.
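The area comparison can be sketched as follows, assuming the individual cell areas and the union area have already been computed elsewhere (in practice they would come from the geometry library). The function name and error messages are hypothetical; only the threshold value is taken from the documentation above.

```python
from typing import List, Optional

AREA_THRESHOLD = 0.0001  # value documented for mobility_pipeline.lib.validate


def check_areas(cell_areas: List[float], union_area: float) -> Optional[str]:
    """Return an error description, or None if the areas agree within the threshold."""
    if not cell_areas:
        return "No cells were loaded"
    if union_area <= 0:
        return "Union of cells has no area"
    # Deviation is measured as a fraction of the union's area.
    deviation = abs(sum(cell_areas) - union_area) / union_area
    if deviation > AREA_THRESHOLD:
        return f"Areas deviate by {deviation:.6f}, above {AREA_THRESHOLD}"
    return None


disjoint_result = check_areas([1.0, 1.0, 2.0], union_area=4.0)
overlapping_result = check_areas([1.0, 1.0], union_area=1.5)
```

When cells overlap, the shared area is counted once in the union but twice in the sum, so the sum exceeds the union and the check fails.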
-
mobility_pipeline.lib.validate.validate_mobility(raw: List[List[str]]) → Optional[str]
Checks that the text from a CSV file is in a valid format for mobility
The text must consist of a list of rows, where each row is a list of exactly 4 strings: a date (not checked), an origin tower, a destination tower, and a count.
The origin and destination must be composed of digits following data_interface.TOWER_PREFIX. The count must be composed entirely of digits and represent a non-negative integer.
The origin and destination tower numeric portions must strictly increase in origin-major order.
- Parameters
raw – List of mobility CSV data obtained by applying list(csv.reader(f))
- Returns
None if the input is valid, a string describing the error otherwise.
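The rules above could be checked along the following lines. This is a sketch under several assumptions: the prefix value "br" is a placeholder for data_interface.TOWER_PREFIX, no header row is present, and "strictly increase in origin-major order" is read as the (origin, destination) pairs increasing lexicographically.

```python
from typing import List, Optional

TOWER_PREFIX = "br"  # hypothetical placeholder; the real value lives in data_interface


def parse_tower(name: str) -> Optional[int]:
    """Return the numeric portion of a tower name, or None if it is malformed."""
    if not name.startswith(TOWER_PREFIX):
        return None
    digits = name[len(TOWER_PREFIX):]
    if not (digits.isascii() and digits.isdigit()):
        return None
    return int(digits)


def validate_rows(raw: List[List[str]]) -> Optional[str]:
    """Return None if every row is valid, otherwise a description of the first error."""
    previous = None
    for line_number, row in enumerate(raw):
        if len(row) != 4:
            return f"Row {line_number} has {len(row)} fields, expected 4"
        _date, origin, destination, count = row  # the date is not checked
        origin_index = parse_tower(origin)
        destination_index = parse_tower(destination)
        if origin_index is None or destination_index is None:
            return f"Row {line_number} has a malformed tower name"
        if not (count.isascii() and count.isdigit()):
            return f"Row {line_number} has a non-numeric count"
        current = (origin_index, destination_index)
        if previous is not None and current <= previous:
            return f"Row {line_number} is out of origin-major order"
        previous = current
    return None


good_rows = [
    ["2018-01-01", "br0", "br0", "5"],
    ["2018-01-01", "br0", "br1", "2"],
    ["2018-01-01", "br1", "br0", "0"],
]
good_result = validate_rows(good_rows)
reordered_result = validate_rows([good_rows[1], good_rows[0]])
```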
-
mobility_pipeline.lib.validate.validate_mobility_full(mobility: List[List[str]]) → Optional[str]
Check whether the mobility data file is correctly ordered and full
The mobility data file is loaded from the file at path mobility_pipeline.data_interface.MOBILITY_PATH(). Correctly ordered means that the tower names’ numeric portions strictly increase in origin-major order. Full means that there is a row for every combination of origin and destination tower.
If this order were perfect, it would make forming the mobility matrix as easy as reshaping the last column. Unfortunately, this function showed that some coordinates are missing or out of order, so counts must be inserted manually.
- Parameters
mobility – List of mobility CSV data obtained by applying list(csv.reader(f))
- Returns
None if there is no error, otherwise a description of the error.
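The reshaping shortcut mentioned above only works when the file is both full and ordered. The sketch below illustrates that idea with made-up data for three towers; the helper name is hypothetical and tower names are assumed to be already reduced to integer indices.

```python
from typing import List, Tuple

import numpy as np


def is_full_and_ordered(pairs: List[Tuple[int, int]], n_towers: int) -> bool:
    """True if pairs list every (origin, destination) combination in origin-major order."""
    expected = [(o, d) for o in range(n_towers) for d in range(n_towers)]
    return pairs == expected


n_towers = 3
pairs = [(o, d) for o in range(n_towers) for d in range(n_towers)]
counts = [0, 4, 1, 2, 0, 7, 3, 5, 0]  # made-up last column, one value per pair

if is_full_and_ordered(pairs, n_towers):
    # With a perfect file, row i of the matrix is simply the i-th run of n_towers counts.
    matrix = np.array(counts).reshape(n_towers, n_towers)
```

If a single pair is missing, the remaining counts shift by one position and the reshape either fails or silently misplaces values, which is why the check matters.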
-
mobility_pipeline.lib.validate.validate_tower_cells_aligned(cells: List[shapely.geometry.multipolygon.MultiPolygon], towers: numpy.ndarray) → Optional[str]
Check that each tower’s index matches the cell at the same index
For any cell c at index i, an error is found if c has nonzero area and the tower at index i is not within c.
- Parameters
cells – List of the cells, as MultiPolygons, in order
towers – Array of the towers’ coordinates (latitude, longitude), in order
- Returns
A description of a found error, or None if no error found.
-
mobility_pipeline.lib.validate.validate_tower_index_name_aligned(csv_reader: Iterator) → Optional[str]
Check that in the towers data file, the tower names match their indices
Indices are zero-indexed from the second row in the file (to skip the header). An error is considered found if any tower name is not exactly TOWER_PREFIX appended with the tower’s index.
- Parameters
csv_reader – CSV reader from calling csv.reader(f) on the open data file f
- Returns
A description of a found error, or None if no error found.
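A check of this kind might look like the sketch below. The prefix value "br", the column names, and the assumption that the tower name sits in the first column are all placeholders chosen for illustration; the real file layout is not described here.

```python
import csv
import io
from typing import Iterator, Optional

TOWER_PREFIX = "br"  # hypothetical placeholder for the real prefix


def check_names(csv_reader: Iterator) -> Optional[str]:
    """Return an error description, or None if every name equals prefix + index."""
    next(csv_reader, None)  # skip the header row
    for index, row in enumerate(csv_reader):
        expected = f"{TOWER_PREFIX}{index}"
        name = row[0] if row else ""  # assumption: name is the first column
        if name != expected:
            return f"Tower at index {index} is named {name!r}, expected {expected!r}"
    return None


aligned_file = io.StringIO("name,lat,lon\nbr0,1.0,2.0\nbr1,1.5,2.5\n")
misaligned_file = io.StringIO("name,lat,lon\nbr0,1.0,2.0\nbr2,1.5,2.5\n")

aligned_result = check_names(csv.reader(aligned_file))
misaligned_result = check_names(csv.reader(misaligned_file))
```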
-
mobility_pipeline.lib.validate.validate_voronoi(voronoi_path) → Optional[str]
Check that the Voronoi cells are reasonable
Checks:
That the cells can be loaded by load_cells.
That the cells are contiguous and disjoint. This is checked by comparing the sum of the areas of the polygons with the area of their union. These two should be equal.
That at least one cell is loaded.
- Returns
A description of a found error, or None if no error found.
mobility_pipeline.lib.voronoi module
Tools for working with Voronoi tessellations
Given a 2-dimensional space with a set of points (called seeds), the Voronoi tessellation is a partitioning of the space such that for every partition, which is called a cell, the cell contains exactly one seed, and every point in the cell is closer to the cell’s seed than it is to any other seed. For more information, see https://en.wikipedia.org/wiki/Voronoi_diagram.
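The defining property can be turned around: a point belongs to the cell of whichever seed is nearest to it. The sketch below uses that fact with planar Euclidean distance and made-up seed coordinates; it does not construct the cell polygons, and the function name is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nearest_seed_index(point: Point, seeds: List[Point]) -> int:
    """Return the index of the seed whose Voronoi cell contains the point."""
    return min(
        range(len(seeds)),
        key=lambda i: math.hypot(point[0] - seeds[i][0], point[1] - seeds[i][1]),
    )


seeds = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]

cell_of_a = nearest_seed_index((1.0, 1.0), seeds)
cell_of_b = nearest_seed_index((3.0, 1.0), seeds)
cell_of_c = nearest_seed_index((1.0, 3.5), seeds)
```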
-
class mobility_pipeline.lib.voronoi.VoronoiCell
Bases: dict
This class is for type hinting Voronoi cell JSONs
-
mobility_pipeline.lib.voronoi.json_to_polygon(points_json: List[List[float]]) → shapely.geometry.polygon.Polygon
Loads a Polygon from a JSON of points
Loads Polygon from a JSON of the format:
where each latitude-longitude pair describes a point defining the boundary of the polygon.
- Parameters
points_json – The points that define the boundary of the polygon
- Returns
A polygon
-
mobility_pipeline.lib.voronoi.load_cell(cell_json: mobility_pipeline.lib.voronoi.VoronoiCell) → shapely.geometry.multipolygon.MultiPolygon
Loads a Voronoi cell from JSON in Polygon or MultiPolygon format
Loads Voronoi cell from a JSON of the Polygon format:
or of the MultiPolygon format:
where each latitude-longitude pair describes a point of the Voronoi tessellation. If the JSON is in the Polygon format, a shapely.geometry.MultiPolygon object will be returned where the MultiPolygon has one member, the described polygon.
The value of the type key is used to distinguish Polygon and MultiPolygon formats.
- Parameters
cell_json – The points that define the boundary of the Voronoi cell
- Returns
A MultiPolygon of the Voronoi cell