My research passion
I help biologists put information from literature into a form that computers can understand.
Biologists and computers
If you want to understand how cancer or Alzheimer’s works, you’re dealing with really complex biological systems. So “systems biologists” start by collecting lots of information from the scientific literature, and they often put it into Excel sheets to get a proper overview.
But it’s impossible to collect all knowledge into Excel tables, because there are so many different types of research. It doesn’t fit. The information in research conclusions is so diverse, and so context-dependent, that we’ll never have enough tables or columns to hold all relevant details.
For example, you don’t want to just store that “molecule A binds to DNA”, when the paper actually reported that “A, attached to molecule B, binds DNA, in cell type C, under some conditions D, according to an experiment type E”. You want to store all relevant context! And the next day you’d want to capture something completely different, like “neuron X with gene variant Y does not signal brain region Z, when molecules M or N are depleted”. And so on.
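To make the contrast concrete, here is a minimal sketch of why a fixed set of columns falls short while a nested structure does not. It is only an illustration under my own assumptions: the `Statement` class, its field names, and the `render` helper are hypothetical and are not VSM's actual data format.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Statement:
    """Hypothetical nested statement: a subject, a relation, an object,
    plus any number of context pairs. Not VSM's real format."""
    subject: "Node"
    relation: str
    obj: "Node"
    context: tuple = ()  # tuple of (relation, Node) pairs


# A node is either a plain term or a whole statement, so statements can nest.
Node = Union[str, Statement]


def render(node: Node) -> str:
    """Turn a node back into a readable sentence fragment."""
    if isinstance(node, str):
        return node
    text = f"{render(node.subject)} {node.relation} {render(node.obj)}"
    for relation, value in node.context:
        text += f", {relation} {render(value)}"
    return text


# What a three-column table row can hold: "molecule A binds to DNA".
flat = Statement("A", "binds to", "DNA")

# What the paper reported: the subject is itself a statement,
# and three pieces of context are attached.
rich = Statement(
    Statement("A", "attached to", "B"),
    "binds",
    "DNA",
    context=(
        ("in", "cell type C"),
        ("under", "conditions D"),
        ("according to", "experiment type E"),
    ),
)

print(render(flat))
print(render(rich))
```

The flat version needs three columns. The rich version needs a subject that is itself a statement, plus an open-ended list of context, and the next day's finding would need a different shape again.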
Capturing real-world information is not practical with any current tool.
One tool to catch it all
Therefore I invented the tool ‘VSM’. VSM is a new method (and user interface) that enables people to translate any kind of information into a digital form, and in particular a form that both humans and computers can understand. So let me rephrase: this one, single tool lets you take any complex thought, anything you can think of, and formulate it in a precise digital form that is both easily readable by humans and ‘understandable’ by computers.
It is hard to emphasize enough how important this is.
It means that for example, a biologist could use this tool to create a full, digital summary of all findings reported in a scientific publication.
And imagine this: we could make something like a Wikipedia, but where every page holds the summary of one scientific publication, in a form that is both human-readable and computer-understandable! – And if we crowd-source this, then over time this could become like a giant digital brain that contains all scientific knowledge, in computable form. Then we could ask it questions, or visualize networks and relations, or it could help us reason over the sum of human knowledge.
That is my research vision. And mission.
First applications
Apart from 100 demo examples, we are already testing VSM in a couple of biological niches. And our users are giving enthusiastic feedback:
“I’m excited about all the complexity we can now handle in such an elegant manner. It allows us to focus on the biology; no more need to worry about the entry format so much. It makes it so quick and smooth.”
Looking far ahead
In 100 years, the digital brain I mentioned will most certainly exist. It has to. Currently, biologists have to dig for pieces of information that are hidden in 25 million (and counting) research papers. That is often an inefficient use of research time.
We must start producing computable knowledge on a large scale.
And no, specialized algorithms cannot yet extract knowledge from text reliably. And small numbers of people can never extract enough knowledge. We need a user-friendly tool like VSM that can be scaled up for use in a huge crowd-sourcing approach.
This means that this project is not just a technical challenge, but also a social one: to raise awareness, to get organized, and to build a critical mass of computable knowledge. To create an open resource that people automatically start going to and contributing to, enabled by a universal information-capturing interface, shared across biological domains.
Our immediate plans to develop this further
The road has been hard. Very hard. I had designed much of the theory for VSM by the time my postdoc ended here in 2011. And even though we wrote a dozen project proposals for funding and got no support, I decided to program a prototype anyway, without funding. The resulting web platform (SciCura) has been used intensively at NTNU.
The past three years were better. I got a Researcher position to extend that prototype and provide user support. But as any software developer will tell you: a prototype is far from a universally usable tool, and you’ll have to rebuild it properly.
So, given my determination that this tool must exist in this world, and my experience with unpredictable funding, I prioritized two things: 1) to at least write down all the ideas for future use, and 2) to create an open-source, well-designed, well-documented, reusable version of the core user interface. And now this exists: the ideas are here, and a proper web component is nearly ready on GitHub. Official manuscripts that incorporate lots of feedback are also being finalized.
And now funding is about to dry up again. Although it would be great to plan for the ambitious future that I described to you, this is impossible on short-term contracts. So the plan is… to resume the search for funding or partnerships.
Conclusion
In the combined words of dozens of external observers:
“This is high-risk, high-reward research. Really important, solid, well-founded. Bold, but feasible. This should definitely be funded. It is too important not to try.”
So it is my and many others’ hope that this story is… to be continued.
Further info
– An in-depth, enjoyable long-blog intro to the research field
– A project proposal on VSM and SciCura (PDF)
– A full explanation of VSM (soon to be submitted for official publication)
– About me
This blog post was written by Steven Vercruysse, Researcher at the Department of Biology.