ROAD-MAP stands for Repository of Open Access Data – Malaria Atlas Project. The II refers to our second major tranche of funding, which will go towards the following primary outcomes:
- Assembly, curation, and dissemination of input malariometric and relevant ancillary data.
- Curation and dissemination of modelled malariometric surfaces (spatial) and cubes (spatiotemporal) to support the malaria research community.
- Packaging, dissemination, and tailoring services for policy and control communities.
The diagram below gives a high-level summary of the application stack being developed to help meet the above outcomes. Each red circled number is explained in the following sections.
1) MAP Administration Systems
These are internal systems concerned with servers, authentication, documentation, source code control, and working practices.
2) Relational Databases
Relational databases store malariometric data (Raw Data Stack on the diagram) and spatial data (Vector Data on the diagram). These data are recorded at the lowest possible level of granularity, for example an individual survey respondent. The relational database models are currently being designed, with a view to making them scalable and extensible.
An important byproduct of this approach is that, although the database will store malariometric data, its design could just as easily accommodate other epidemiological data (e.g. dengue fever). We intend to publicize this database design and make the database creation scripts and documentation available via open source platforms.
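As an illustration only, the idea of respondent-level storage with a disease-agnostic design can be sketched as below. This is a minimal sketch, not the MAP schema: every table and column name here (disease, survey, respondent, test_positive, and so on) is a hypothetical stand-in, and SQLite is used purely because it ships with Python.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions,
# not the actual MAP database design.
SCHEMA = """
CREATE TABLE disease (
    disease_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE
);
CREATE TABLE survey (
    survey_id  INTEGER PRIMARY KEY,
    disease_id INTEGER NOT NULL REFERENCES disease(disease_id),
    cluster    TEXT NOT NULL,
    latitude   REAL,
    longitude  REAL,
    year       INTEGER
);
CREATE TABLE respondent (
    respondent_id INTEGER PRIMARY KEY,
    survey_id     INTEGER NOT NULL REFERENCES survey(survey_id),
    age_years     REAL,
    test_positive INTEGER NOT NULL CHECK (test_positive IN (0, 1))
);
"""


def create_database(path=":memory:"):
    """Create the illustrative schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
# The disease table is what lets the same structure hold non-malaria data.
with conn:
    conn.executemany(
        "INSERT INTO disease (name) VALUES (?)", [("malaria",), ("dengue",)]
    )
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

Keeping the disease as a row in a lookup table, rather than building it into table names, is one way a design could accommodate other epidemiological data without structural changes.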
3) The “Business Intelligence” database
Business Intelligence (hereafter BI) is a term widely used in the IT industry to refer to reporting from aggregated data sources such as a data warehouse. Malariometric data are usually aggregated to cluster level (for example, at the level of a village).
In our technology stack, we have chosen to separate out the relational database stack (explained in point 2 above) from the BI stack in order to allow the maximum degree of flexibility over how to aggregate the relational data for the BI stack. Reporting is generally done against the BI stack for the sake of processing efficiency.
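The aggregation step from respondent-level records to cluster-level figures might look something like the following. This is a minimal sketch under stated assumptions: the record layout (cluster name, test result), the sample values, and the function name aggregate_by_cluster are all hypothetical and chosen only to show the idea.

```python
from collections import defaultdict

# Hypothetical respondent-level records: (cluster, test_positive).
records = [
    ("Village A", 1), ("Village A", 0), ("Village A", 0), ("Village A", 1),
    ("Village B", 0), ("Village B", 1),
]


def aggregate_by_cluster(rows):
    """Roll individual results up to per-cluster counts and prevalence."""
    totals = defaultdict(lambda: {"examined": 0, "positive": 0})
    for cluster, positive in rows:
        totals[cluster]["examined"] += 1
        totals[cluster]["positive"] += positive
    return {
        cluster: {**counts, "prevalence": counts["positive"] / counts["examined"]}
        for cluster, counts in totals.items()
    }


summary = aggregate_by_cluster(records)
```

Because the respondent-level rows are kept separately, the same records could be re-aggregated along a different key (for example by year or by district) without touching the source data.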
4) ETL Tools
MAP accumulates published and unpublished data from a variety of sources, in both structured and unstructured formats. In order to incorporate these data efficiently into our relational database stack, we will write scripts using ETL tools.
ETL is an acronym from the IT industry that stands for “Extract, Transform, and Load”. Each of these three steps is relevant to how we bring data into our databases and ensure proper data integrity and auditability. In addition, we will use the same ETL tools to populate the BI database stack (point 3) from the relational database stack (point 2).
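The three steps can be sketched in a few lines of Python. This is an illustrative sketch only, not the ETL tooling MAP will use: the input columns, the validation rules, and the survey_count and reject_log tables are assumptions made for the example. Rejected rows are logged with their source line number rather than silently dropped, which is one simple way to support auditability.

```python
import csv
import io
import sqlite3

# Hypothetical source file; the third and fourth data rows are deliberately invalid.
RAW_CSV = """site,examined,positive
Village A,40,10
village b ,25,5
Village C,10,12
Village D,abc,3
"""


def extract(text):
    """Extract: read the raw rows as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise names, convert types, and separate invalid rows."""
    clean, rejected = [], []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        try:
            examined = int(row["examined"])
            positive = int(row["positive"])
        except (TypeError, ValueError):
            rejected.append((line_no, "non-numeric count"))
            continue
        if not 0 <= positive <= examined:
            rejected.append((line_no, "positive count outside 0..examined"))
            continue
        clean.append((row["site"].strip().title(), examined, positive))
    return clean, rejected


def load(conn, clean, rejected):
    """Load: write accepted rows and the reject log in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS survey_count "
        "(site TEXT, examined INTEGER, positive INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reject_log (source_line INTEGER, reason TEXT)"
    )
    with conn:
        conn.executemany("INSERT INTO survey_count VALUES (?, ?, ?)", clean)
        conn.executemany("INSERT INTO reject_log VALUES (?, ?)", rejected)


conn = sqlite3.connect(":memory:")
clean, rejected = transform(extract(RAW_CSV))
load(conn, clean, rejected)
```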
5) MAP Processing Toolbox and Server
A core function for the new MAP data stack will be to allow users to easily create maps using our malariometric and covariate data. We will develop a “tool box” of scripts and procedures to make this possible.
6) Geo Portal and Geo Server stack
For the sake of data security, all of the components above need to sit securely behind our firewalls. In order to expose the data safely to our externally facing website (www.map.ox.ac.uk), we will use a combination of a geospatial server and portal. We are still exploring which technology to use for this; there are several options, both open-source and proprietary.
7) MAP API
An Application Programming Interface (API) will be provided to allow:
- The automated extraction of data (for example to other systems)
- The creation of spreadsheet templates for collecting data
- Easy uploading of data into MAP databases
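The MAP API itself is not specified here, so the following is not its interface. It is only a rough sketch of how a spreadsheet template and a matching upload check could fit together, using CSV as the spreadsheet format. The column list and the function names make_template and check_upload are hypothetical.

```python
import csv
import io

# Hypothetical column set for a data collection template.
TEMPLATE_COLUMNS = ["site", "latitude", "longitude", "year", "examined", "positive"]


def make_template():
    """Return a CSV template containing only the expected header row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(TEMPLATE_COLUMNS)
    return buffer.getvalue()


def check_upload(text):
    """Parse an uploaded CSV, refusing files whose header differs from the template."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != TEMPLATE_COLUMNS:
        raise ValueError("header does not match the template")
    return list(reader)


template = make_template()
# A collaborator fills in the template and sends it back (invented sample row).
filled = template + "Village A,-1.29,36.82,2014,40,10\n"
rows = check_upload(filled)
```

Generating the template and validating the upload from the same column list keeps the two from drifting apart.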
8) Public MAP Pages
The next three points relate to what public users of the system will see. www.map.ox.ac.uk will have a number of pages describing what MAP does, explaining the science and statistical models, crediting our collaborators, and so on. These will be managed via the open-source content management system Drupal. Drupal has been selected because of its good functionality and large user base within the University sector.
9) “Explorer Content”
This will be the replacement system for www.map.ox.ac.uk/explorer, with improved functionality and a greater range of data and raster surfaces available. Access will be principally non-authenticated and public, but please see the next point. New features will include:
- Creation of maps, with the ability to incorporate modelled outputs and covariates
- Downloading of shape files with associated raw or aggregated data
- Graphs and charts
- Query data and modelled outputs for a given coordinate
- Vector (i.e. mosquito) distribution
- Blood disorder distribution
10) Additional research tools
In addition to the non-authenticated public access to the data described in 9 above, people will be able to register for an account, which will allow authenticated access to the data. This access allows further features to be made available as follows:
- The ability to save searches and outputs
- The ability to store private data and choose when and how to share it
- The ability to choose how to aggregate raw data, stored as the user’s own private data-warehouse
- Fully backed-up, secure, and properly curated IT infrastructure
In addition to setting up the service and providing public and authenticated access, all the code and documentation relating to the above technology stack will be made available via an open source platform. This would allow third parties to create their own instances of this technology stack to support (for example) research into another disease such as dengue fever.
We are currently at the stage of gathering requirements, evaluating technology options, and designing and building the relational and BI database stacks. We are recruiting an application developer on a fixed-term contract, with a start date of the beginning of September. It is at this point that the main user-facing tools and website will be built.
We have established a number of groups to work with us as “early adopters” of the new system in order to run a pilot system from the beginning of 2016. Our plan is to have a final version released by April 2016.