The DISK Project
Automating important aspects of scientific discovery will significantly accelerate the pace of research. Our claim is that, given the right knowledge and methods, computers could autonomously carry out discovery processes by searching hypothesis spaces in a systematic, comprehensive, and efficient manner.
In this project, we are investigating a novel approach to automate the hypothesize-test-evaluate discovery cycle with an intelligent system that a scientist can task with lines of inquiry to test hypotheses of interest. We are implementing this approach in DISK (automated DIscovery of Scientific Knowledge), a system that extends the existing WINGS intelligent workflow system for scientific data analysis, and applying it to multi-omics.
Our work to date has focused on four major research objectives:
1) Representing hypotheses and associated evidence and confidence values;
2) Formulating lines of inquiry that express how to test hypotheses by running data analysis workflows against the data available;
3) Designing a meta-analysis engine that uses meta-workflows to assess the results of lines of inquiry and to revise and extend the original hypotheses accordingly; and
4) Developing intelligent agents for interactive discovery that explain new findings to scientists.
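To make the first objective concrete, the following is a minimal sketch of how a hypothesis could be stored together with its evidence and confidence values. It is an illustration only, not DISK's actual representation: the class names, the field names, the triple-style statement, and the convention that confidence lies in [0, 1] are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Evidence:
    """One piece of evidence for a hypothesis (hypothetical structure)."""
    workflow_run: str   # assumed identifier of the analysis run that produced it
    confidence: float   # assumed to be a score in [0, 1]


@dataclass
class Hypothesis:
    """A hypothesis statement plus the evidence accumulated for it."""
    statement: Tuple[str, str, str]  # assumed (subject, predicate, object) form
    evidence: List[Evidence] = field(default_factory=list)

    def add_evidence(self, ev: Evidence) -> None:
        # Reject scores outside the range assumed for this sketch.
        if not 0.0 <= ev.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        self.evidence.append(ev)

    def latest_confidence(self) -> Optional[float]:
        # The most recently added evidence determines the current confidence.
        return self.evidence[-1].confidence if self.evidence else None


# Usage with placeholder names (not real genes, conditions, or runs).
h = Hypothesis(("geneA", "associatedWith", "conditionB"))
h.add_evidence(Evidence("run-1", 0.4))
h.add_evidence(Evidence("run-2", 0.7))
print(h.latest_confidence())
```

Keeping every piece of evidence, rather than only the latest score, is one way to preserve the history behind each revision of a hypothesis.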
An overview of the DISK framework can be seen in the image above, illustrating how the four main objectives are integrated. First, a user defines the hypothesis to test with the interactive discovery agent, which helps transform the hypothesis statements into a machine-readable representation. If the hypothesis matches a line of inquiry, the system searches for appropriate data to test it, exploring open repositories such as The Cancer Genome Atlas (TCGA).
When the data is found, the workflows in the line of inquiry are sent to the workflow system, where they are executed. The results of the execution are then stored in a Linked Data repository. Finally, the meta-workflows associated with the line of inquiry explore the repository to analyze the results of all the workflows and create a revision of the original hypothesis.
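The cycle described above can be sketched in a few lines of code. This is a toy, in-memory illustration under stated assumptions, not the DISK or WINGS implementation: the names `LineOfInquiry` and `run_cycle`, the dictionary-based hypothesis, the list standing in for the Linked Data repository, and the use of simple averages as the workflow and meta-workflow are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional


@dataclass
class LineOfInquiry:
    """Hypothetical stand-in for a line of inquiry."""
    predicate: str                                   # kind of hypothesis it can test
    find_data: Callable[[dict, dict], List[list]]    # data query against a repository
    workflow: Callable[[list], float]                # analysis run on each dataset
    meta_workflow: Callable[[List[float]], float]    # combines all workflow results


def run_cycle(hypothesis: dict,
              lines_of_inquiry: List[LineOfInquiry],
              data_repository: Dict[str, List[list]],
              results_store: List[float]) -> Optional[dict]:
    """Run one hypothesize-test-evaluate pass; return a revised hypothesis or None."""
    # 1. Match the hypothesis against the available lines of inquiry.
    loi = next((l for l in lines_of_inquiry
                if l.predicate == hypothesis["predicate"]), None)
    if loi is None:
        return None
    # 2. Search the data repository for datasets that can test the hypothesis.
    datasets = loi.find_data(hypothesis, data_repository)
    if not datasets:
        return None
    # 3. Execute the workflow on each dataset and store the results.
    runs = [loi.workflow(d) for d in datasets]
    results_store.extend(runs)
    # 4. Let the meta-workflow assess all results and revise the hypothesis.
    return {**hypothesis, "confidence": loi.meta_workflow(runs)}


# Toy usage: placeholder names and made-up numbers, for illustration only.
repository = {"geneA": [[0.25, 0.75], [1.0, 0.5]]}
loi = LineOfInquiry(
    predicate="associatedWith",
    find_data=lambda h, repo: repo.get(h["subject"], []),
    workflow=mean,
    meta_workflow=mean,
)
store: List[float] = []
hypothesis = {"subject": "geneA", "predicate": "associatedWith", "object": "conditionB"}
revised = run_cycle(hypothesis, [loi], repository, store)
print(revised["confidence"])  # 0.625
```

Passing the data query, workflow, and meta-workflow as plain callables keeps the sketch independent of any particular workflow engine; in a real deployment these steps would be delegated to the workflow system and the results repository.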