Automating important aspects of scientific discovery will significantly
accelerate research. Our claim is that, given the right knowledge and
methods, computers could autonomously carry out discovery processes
by searching hypothesis spaces in a systematic, comprehensive, and
efficient manner.
In this project, we are investigating a novel approach to automate the
hypothesize-test-evaluate discovery cycle with an intelligent system
that a scientist can task with lines of inquiry to test hypotheses of
interest. We are implementing this approach in DISK (automated DIscovery
of Scientific Knowledge), a system that uses the existing WINGS
intelligent workflow system for scientific data analysis, and applying
it to multiple domains.
Our work to date has focused on four major research objectives:
1) Representing hypotheses and associated evidence and confidence values;
2) Formulating lines of inquiry that express how to test hypotheses by running data analysis workflows against the latest data available;
3) Designing a meta-analysis engine that uses meta-workflows to assess the results of lines of inquiry and revise and extend the original hypotheses accordingly; and
4) Developing intelligent agents for interactive discovery that explain new findings to scientists.
An overview of the DISK framework can be seen in the image above, which illustrates how the four main objectives are integrated. First, a user defines the hypothesis to test with the help of the interactive discovery agent, which helps transform the hypothesis statements into a machine-readable representation. If the hypothesis matches a line of inquiry, the system searches for appropriate data to test it, exploring the open repositories available for the relevant domain.
Once the data are found, the workflows in the line of inquiry are sent to the workflow system, where they are executed. The execution results are then stored in the DISK system and can be used for later analysis. Finally, the meta-workflows associated with the line of inquiry explore the repository and previous results to produce a revised version of the original hypothesis.
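The cycle described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the DISK implementation: the names `Hypothesis`, `LineOfInquiry`, and `evaluate_hypothesis` are hypothetical, the workflow system is replaced by a plain function, and the results store is an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Hypothesis:
    # Hypothetical stand-in for a machine-readable hypothesis.
    statement: str
    confidence: Optional[float] = None
    evidence: List[str] = field(default_factory=list)


@dataclass
class LineOfInquiry:
    # All four callables are assumptions made for this sketch.
    matches: Callable[[Hypothesis], bool]          # does this line of inquiry apply?
    find_data: Callable[[Hypothesis], List[str]]   # search open repositories
    workflow: Callable[[List[str]], float]         # stand-in for the workflow system
    meta_workflow: Callable[[List[float]], float]  # combine stored results


def evaluate_hypothesis(
    hyp: Hypothesis,
    lines_of_inquiry: List[LineOfInquiry],
    results_store: Dict[str, List[float]],
) -> Hypothesis:
    """Run the first matching line of inquiry and return a revised hypothesis."""
    for loi in lines_of_inquiry:
        if not loi.matches(hyp):
            continue
        datasets = loi.find_data(hyp)
        if not datasets:
            continue  # no data found yet, so nothing to execute
        result = loi.workflow(datasets)
        # Store the result so later analyses can reuse it.
        results_store.setdefault(hyp.statement, []).append(result)
        # The meta-workflow looks at all stored results, old and new.
        revised = loi.meta_workflow(results_store[hyp.statement])
        return Hypothesis(hyp.statement, revised, hyp.evidence + datasets)
    return hyp  # no line of inquiry matched; hypothesis is unchanged


if __name__ == "__main__":
    loi = LineOfInquiry(
        matches=lambda h: "associated with" in h.statement,
        find_data=lambda h: ["dataset-1", "dataset-2"],  # placeholder identifiers
        workflow=lambda data: 0.8,                        # placeholder result
        meta_workflow=lambda results: sum(results) / len(results),
    )
    store: Dict[str, List[float]] = {}
    print(evaluate_hypothesis(Hypothesis("A is associated with B"), [loi], store))
```

Because the stored results accumulate per hypothesis, calling `evaluate_hypothesis` again when new data appear lets the meta-workflow step revise the confidence using both old and new executions.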
To represent hypotheses and goals we use the Scientific Question Ontology (SQO). SQO defines semantic templates for scientific questions so that users can customize them. The ontology defines Question Templates, Question Variables, and their respective options and constraints. In this way, users are guided in creating testable hypotheses.
The latest version of the ontology is available at w3id.org/sqo
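To make the idea of templates, variables, and options concrete, here is a minimal Python sketch. It does not use the actual SQO vocabulary or any RDF tooling: the class names only loosely mirror the terms above, the `{name}` placeholder syntax is an assumption, and the only constraint modeled is membership in a fixed list of options.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QuestionVariable:
    # Hypothetical: a named slot with the values a user may choose from.
    name: str
    options: List[str]


@dataclass
class QuestionTemplate:
    # Hypothetical: question text with {name} placeholders, one per variable.
    pattern: str
    variables: List[QuestionVariable]

    def instantiate(self, bindings: Dict[str, str]) -> str:
        """Fill the template, rejecting values outside each variable's options."""
        checked: Dict[str, str] = {}
        for var in self.variables:
            if var.name not in bindings:
                raise ValueError(f"missing value for variable '{var.name}'")
            value = bindings[var.name]
            if value not in var.options:
                raise ValueError(
                    f"'{value}' is not an allowed option for '{var.name}'"
                )
            checked[var.name] = value
        return self.pattern.format(**checked)


if __name__ == "__main__":
    template = QuestionTemplate(
        pattern="Is {factor} associated with {outcome}?",
        variables=[
            QuestionVariable("factor", ["A", "B"]),   # placeholder options
            QuestionVariable("outcome", ["C", "D"]),
        ],
    )
    print(template.instantiate({"factor": "A", "outcome": "C"}))
```

Restricting each variable to known options is what guides the user: only questions the system can map to a line of inquiry can be expressed.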