DISK: Automated DIscovery of Scientific Knowledge

The DISK Project

The automation of important aspects of scientific discovery will significantly accelerate scientific progress. Our claim is that, given the right knowledge and methods, computers could autonomously carry out discovery processes by searching hypothesis spaces in a systematic, comprehensive, and efficient manner.

In this project, we are investigating a novel approach to automating the hypothesize-test-evaluate discovery cycle with an intelligent system that a scientist can task with lines of inquiry to test hypotheses of interest. We are implementing this approach in DISK (automated DIscovery of Scientific Knowledge), a system that builds on the existing WINGS intelligent workflow system for scientific data analysis, and we are applying it to multiple domains.

Our work to date has focused on four major research objectives:

1) Representing hypotheses and associated evidence and confidence values;

2) Formulating lines of inquiry that express how to test hypotheses by running data analysis workflows against the latest data available;

3) Designing a meta-analysis engine that uses meta-workflows to assess the results of lines of inquiry and revise and extend the original hypotheses accordingly; and

4) Developing intelligent agents for interactive discovery that explain new findings to scientists.

An Overview of the DISK framework

The image above gives an overview of the DISK framework and illustrates how the four main objectives are integrated. First, a user defines the hypothesis to test with the help of the interactive discovery agent, which helps transform the hypothesis statement into a machine-readable representation. If the hypothesis matches a line of inquiry, the system searches for appropriate data to test it, exploring the open repositories for each specific domain.

Once the data is found, the workflows in the line of inquiry are sent to the workflow system, where they are executed. The execution results are then stored in the DISK system and can be used for subsequent analysis. Finally, the meta-workflows associated with the line of inquiry explore the repository and previous results to produce a revision of the original hypothesis.
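
The cycle described above (match a hypothesis to a line of inquiry, find data, run the workflow, store the result, revise the hypothesis) can be sketched in a few lines of Python. This is only an illustrative sketch, not DISK's actual implementation or API: the names Hypothesis, LineOfInquiry and run_cycle, and the toy data and scoring functions, are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Hypothesis:
    """Hypothetical stand-in for a machine-readable hypothesis."""
    statement: str
    confidence: Optional[float] = None
    evidence: List[float] = field(default_factory=list)


@dataclass
class LineOfInquiry:
    """Hypothetical line of inquiry: how to test one kind of hypothesis."""
    matches: Callable[[Hypothesis], bool]        # does this line apply?
    find_data: Callable[[Hypothesis], List[float]]  # data search (assumed)
    workflow: Callable[[List[float]], float]     # data analysis workflow
    meta_workflow: Callable[[Hypothesis, List[float]], Hypothesis]


def run_cycle(hypothesis: Hypothesis,
              lines_of_inquiry: List[LineOfInquiry],
              results_store: List[float]) -> Hypothesis:
    """Run one hypothesize-test-evaluate pass and return the revision."""
    for loi in lines_of_inquiry:
        if not loi.matches(hypothesis):
            continue
        data = loi.find_data(hypothesis)
        if not data:
            continue  # no data available yet for this line of inquiry
        result = loi.workflow(data)
        results_store.append(result)  # keep the result for later analysis
        # The meta-workflow sees all stored results, not only the newest.
        return loi.meta_workflow(hypothesis, results_store)
    return hypothesis  # nothing matched: hypothesis is left unchanged


# Toy example with made-up data and a made-up scoring rule.
toy_line = LineOfInquiry(
    matches=lambda h: "associated with" in h.statement,
    find_data=lambda h: [0.25, 0.5, 0.75],
    workflow=lambda data: sum(data) / len(data),
    meta_workflow=lambda h, results: Hypothesis(
        h.statement, confidence=max(results), evidence=list(results)
    ),
)

store: List[float] = []
revised = run_cycle(
    Hypothesis("entity X is associated with condition Y"), [toy_line], store
)
print(revised.confidence, revised.evidence)
```

In this sketch the results store is a plain list passed in by the caller, so repeated calls accumulate results that the meta-workflow can draw on, mirroring the role of previous results in the description above.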


Representing hypotheses

To represent hypotheses and goals, we use the Scientific Question Ontology (SQO). SQO aims to define Semantic Templates for Scientific Questions so that users can customize them. The ontology defines Question Templates, Question Variables, and their respective options and constraints. In this way, users are guided in creating testable hypotheses.

The latest version of the ontology is available at w3id.org/sqo
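
To make the idea of a question template with constrained variables concrete, here is a minimal Python sketch. It does not use SQO's actual vocabulary or any SQO tooling: the classes QuestionVariable and QuestionTemplate, the template text, and the option values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class QuestionVariable:
    """Hypothetical variable slot in a question template."""
    name: str
    options: Tuple[str, ...] = ()  # allowed values; empty means unconstrained


@dataclass(frozen=True)
class QuestionTemplate:
    """Hypothetical template whose slots are filled in by the user."""
    pattern: str
    variables: Tuple[QuestionVariable, ...]

    def instantiate(self, **bindings: str) -> str:
        """Check each binding against its variable, then fill the pattern."""
        for var in self.variables:
            if var.name not in bindings:
                raise ValueError(f"missing value for variable '{var.name}'")
            value = bindings[var.name]
            if var.options and value not in var.options:
                raise ValueError(
                    f"'{value}' is not an allowed option for '{var.name}'"
                )
        return self.pattern.format(**bindings)


# Made-up template: 'entity' is restricted to two options, 'condition' is free.
template = QuestionTemplate(
    pattern="Is {entity} associated with {condition}?",
    variables=(
        QuestionVariable("entity", options=("protein A", "protein B")),
        QuestionVariable("condition"),
    ),
)

question = template.instantiate(entity="protein A", condition="condition Y")
print(question)
```

Rejecting values outside a variable's options is one simple way to guide users toward hypotheses the system can actually test; the real ontology may express such constraints quite differently.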