More and more data becomes available online every day, from both public-sector and private sources. For example, the European Data Portal lists over 400,000 public datasets. Most of this data is published in some kind of (semi-)structured format (XML, RDF, JSON,…), which in theory facilitates its consumption and combination. Indeed, the open data movement promises to put all the data citizens need at their fingertips, whether for planning their next trip or for government oversight.
Unfortunately, this is still far from reality. Our society is opening its data but not building the technology and infrastructure required to enable citizens to access and manipulate it. Only technical people have the skills to consume heterogeneous data sources, while everyone else is forced to depend on third-party applications or companies.
This research project aims to change this. Our goal is to empower all citizens to exploit and benefit from open data, helping them become not only consumers but also creators of data that adds new value to our society. To this end, the project will automatically infer a unified global schema of the knowledge available in open datasets and present that schema to the average citizen in a form she can easily browse and query to get the information she needs. Her request will then be transparently translated into a combined sequence of accesses to the required data sources to retrieve, visualize and, if desired, republish the data. When several data sources could be used (e.g. due to an overlap in the exposed data), quality aspects of each source or even monetary costs (some sources may be only partially free) will be taken into account to provide an optimal solution.
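To make the idea of choosing among overlapping sources concrete, here is a minimal sketch in Python. It is not the project's actual algorithm: the `Source` class, the `quality` and `cost` attributes, the source names, and the weighted scoring rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical description of a data source (all attributes are assumed)."""
    name: str
    fields: frozenset  # fields the source exposes
    quality: float     # assumed quality score in [0, 1]
    cost: float        # assumed monetary cost per request


def pick_source(sources, needed, cost_weight=0.1):
    """Return the source covering all needed fields with the best quality/cost trade-off."""
    # Keep only the sources that expose every requested field.
    candidates = [s for s in sources if needed <= s.fields]
    if not candidates:
        return None
    # Assumed scoring rule: quality minus a weighted cost penalty.
    return max(candidates, key=lambda s: s.quality - cost_weight * s.cost)


sources = [
    Source("city_portal", frozenset({"stop", "line"}), quality=0.7, cost=0.0),
    Source("premium_feed", frozenset({"stop", "line", "delay"}), quality=0.95, cost=5.0),
    Source("regional_portal", frozenset({"stop"}), quality=0.9, cost=0.0),
]

best = pick_source(sources, frozenset({"stop", "line"}))
```

With the default weight, the free `city_portal` source is preferred over the paid one; lowering `cost_weight` shifts the choice toward the higher-quality paid source. A real solution would likely combine several sources per request rather than pick a single one.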
To achieve this ambitious goal, the project will pursue the following key research contributions:
- APIfication of data sources: (Web) APIs are becoming the de facto choice for publishing content online. We will unify access to all kinds of data sources behind an API.
- Schema discovery: Most sources won’t have any kind of formal description we could use to precisely understand what information they provide. A systematic analysis of data samples will help us infer each source’s schema, enriched with annotations on quality aspects (e.g. reliability and availability) to better characterize the data source.
- Schema composition: Individual schemas will be matched and merged to create the global schema representing all available knowledge.
- Citizen languages: Human-computer interaction techniques will be used to build a user-friendly language to express and visualize information requests on this global schema.
- Query resolution: Each request will be translated into an optimal sequence of API calls on the underlying data sources to retrieve the data needed to respond to the request.
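As a rough illustration of the schema discovery and schema composition steps listed above, the following Python sketch infers a flat schema from sample records and then merges schemas by exact field name. It is a simplification under stated assumptions, not the project's method: the function names, the sample data, and the `presence` annotation (used here as a stand-in for a quality annotation) are all hypothetical, and real schema matching would need far more than name equality.

```python
from collections import defaultdict


def infer_schema(samples):
    """Infer observed types and presence ratio for each field in sample records."""
    types = defaultdict(set)
    counts = defaultdict(int)
    for record in samples:
        for field, value in record.items():
            types[field].add(type(value).__name__)
            counts[field] += 1
    total = len(samples)
    return {
        field: {"types": sorted(types[field]), "presence": counts[field] / total}
        for field in types
    }


def merge_schemas(schemas):
    """Naive composition: union fields by exact name, tracking the providing sources."""
    merged = defaultdict(lambda: {"types": set(), "sources": []})
    for source, schema in schemas.items():
        for field, info in schema.items():
            merged[field]["types"].update(info["types"])
            merged[field]["sources"].append(source)
    return {
        field: {"types": sorted(info["types"]), "sources": sorted(info["sources"])}
        for field, info in merged.items()
    }


# Hypothetical samples from two overlapping transport sources.
bus_samples = [
    {"stop": "Sants", "line": 27, "wait_min": 4.5},
    {"stop": "Clot", "line": 33},
]
metro_samples = [{"stop": "Sants", "line": "L3"}]

bus_schema = infer_schema(bus_samples)
global_schema = merge_schemas({"bus": bus_schema, "metro": infer_schema(metro_samples)})
```

In this sketch the merged `line` field ends up with two observed types, which is exactly the kind of conflict a real composition step would have to resolve before exposing the global schema to users.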
Project Reference Card
|English Title:|ODA – Open Data for All|
|Original Title:|Open Data para todos: Una infraestructura basada en APIs para la explotación de fuentes de datos online|
|Researcher's beneficiary organization:|Fundació per a la Universitat Oberta de Catalunya|
|Duration:|48 months|
|Start date:|December 30, 2016|
|End date:|December 29, 2020|
|Area:|National|
|Project type:|Competitive R&D project|
|Funding entity:|Ministerio de Economía, Industria y Competitividad|
|Principal Investigator:|Jordi Cabot|
|Type of participation:|Team Member|