The automation of important aspects of scientific discovery is expected to
accelerate significantly. We claim that, given the right knowledge
and methods, computers could autonomously carry out discovery processes
by searching hypothesis spaces in a systematic, comprehensive, and
efficient manner.
In this project, we are investigating a novel approach to automating the
hypothesize-test-evaluate discovery cycle with an intelligent system
that a scientist can task with lines of inquiry to test hypotheses of
interest. We are implementing this approach in DISK (automated DIscovery
of Scientific Knowledge), a system that extends the existing WINGS
intelligent workflow system for scientific data analysis, and applying
it to multi-omics.
Our work to date has focused on four major research objectives:
1) Representing hypotheses and associated evidence and confidence values;
2) Formulating lines of inquiry that express how to test hypotheses by running data analysis workflows against the data available;
3) Designing a meta-analysis engine that uses meta-workflows to assess the results of lines of inquiry and to revise and extend the original hypotheses accordingly; and
4) Developing intelligent agents for interactive discovery that explain new findings to scientists.
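To make the first objective more concrete, the following is a minimal sketch of how a hypothesis might be stored together with its evidence and confidence values. It is an illustration only, not DISK's actual representation: the class names, the field names, the subject/relation/object form of the statement, and the rule of reporting the highest confidence among the evidence are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Evidence:
    """One piece of evidence for a hypothesis (hypothetical structure)."""
    workflow_run: str   # identifier of the workflow execution that produced it
    dataset: str        # identifier of the data that was analyzed
    confidence: float   # confidence value, assumed here to lie in [0, 1]

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


@dataclass
class Hypothesis:
    """A hypothesis statement with its accumulated evidence (hypothetical structure)."""
    subject: str
    relation: str
    obj: str
    evidence: List[Evidence] = field(default_factory=list)

    def add_evidence(self, item: Evidence) -> None:
        self.evidence.append(item)

    def confidence(self) -> Optional[float]:
        """Return the highest confidence among the evidence, or None if untested.

        Taking the maximum is a placeholder aggregation rule for this sketch.
        """
        if not self.evidence:
            return None
        return max(item.confidence for item in self.evidence)
```

Keeping the evidence as separate records, each pointing back to a workflow run and a dataset, means a reported confidence value can always be traced to the analyses that produced it.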
The image above gives an overview of the DISK framework and illustrates how the four main objectives are integrated. First, a user defines the hypothesis to test with the help of the interactive discovery agent, which helps transform the hypothesis statements into a machine-readable representation. If the hypothesis matches a line of inquiry, the system searches for appropriate data to test it, exploring open repositories such as The Cancer Genome Atlas (TCGA).
Once the data is found, the workflows in the line of inquiry are sent to the workflow system for execution, and the results are stored in a Linked Data repository. Finally, the meta-workflows associated with the line of inquiry explore the repository to analyze the results of all the workflows and produce a revision of the original hypothesis.
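The cycle described above can be sketched in a few lines of code. This is a simplified illustration under stated assumptions, not the DISK implementation: every name below (`SimpleHypothesis`, `LineOfInquiry`, `run_cycle`) is hypothetical, a plain dictionary stands in for the Linked Data repository, matching is reduced to comparing a relation string, and each workflow is modeled as a function that maps a dataset identifier to a numeric result.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SimpleHypothesis:
    statement: str
    relation: str                        # used here to match a line of inquiry
    confidence: Optional[float] = None   # None until the hypothesis is tested


@dataclass
class LineOfInquiry:
    relation: str                                         # kind of hypothesis it can test
    find_data: Callable[[SimpleHypothesis], List[str]]    # data search step
    workflows: List[Callable[[str], float]]               # dataset id -> result
    meta_workflow: Callable[[List[float]], float]         # results -> revised confidence


def run_cycle(
    hypothesis: SimpleHypothesis,
    lines_of_inquiry: List[LineOfInquiry],
    repository: Dict[str, List[float]],
) -> Optional[SimpleHypothesis]:
    """Run one hypothesize-test-evaluate pass; return the revised hypothesis.

    Returns None when no line of inquiry matches or no data is found.
    """
    # 1. Match the hypothesis against the available lines of inquiry.
    loi = next((l for l in lines_of_inquiry if l.relation == hypothesis.relation), None)
    if loi is None:
        return None

    # 2. Search for data that can be used to test the hypothesis.
    datasets = loi.find_data(hypothesis)
    if not datasets:
        return None

    # 3. Execute every workflow on every dataset and store the results.
    stored = repository.setdefault(hypothesis.statement, [])
    for dataset in datasets:
        for workflow in loi.workflows:
            stored.append(workflow(dataset))

    # 4. The meta-workflow reads all stored results and revises the hypothesis.
    revised_confidence = loi.meta_workflow(stored)
    return SimpleHypothesis(hypothesis.statement, hypothesis.relation, revised_confidence)
```

Because the meta-workflow reads from the repository rather than from the workflow outputs directly, results accumulated in earlier passes for the same hypothesis are included when it is revised again.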