Data Science in practice
Joost Kok translates his knowledge of data storage and analysis into many different practical applications. He gives a few examples.
Networks and molecules
Joost Kok conducts research into networks: datasets whose elements refer to other data. ‘If you think about webpages, for example, they are datasets in themselves that often link to other pages.’ As part of their research into networks, Kok and some colleagues looked at molecules. They studied a set of 500,000 molecules, some of which are toxic and some of which are not. These molecules all have substructures. If you know which molecules are toxic and which are not, and what their substructures are, can you then use a computer to work out which substances (often composed of multiple molecules) are poisonous and which are not?
‘To find out that kind of thing, you have to dig deep into all the available data. It is not simply a case of working your way through an Excel sheet,’ Kok explains. To make the process easier, he developed software that can easily search structures, such as friendship networks on Facebook, and their substructures.
This software also proved capable of searching the substructures of molecules and determining which substructures are ‘more important’ in a substance made up of different molecules. The software can quickly determine whether a substance is toxic, which is important information in, for example, the development of new treatments. Software of this kind also helps computers and robots use deep learning to understand complex structures such as language and speech. In other words, they become smarter more quickly.
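The question of which substructures are more important can be illustrated with a deliberately small sketch. It assumes a toy representation in which a molecule is a list of atoms plus a list of bonds, and it treats only single bonded atom pairs (such as C-N) as substructures; real substructure-mining software searches far larger connected fragments. Each fragment is scored by the share of toxic molecules among those that contain it, and a new molecule is flagged if it contains a high-scoring fragment. The function names, the data and the 0.5 threshold are hypothetical and do not describe Kok's software.

```python
from collections import Counter

# Toy representation (an assumption for illustration): a molecule is a pair
# (atoms, bonds), where atoms is a list of element symbols and bonds is a list
# of (i, j) index pairs into that list.

def fragments(molecule):
    """Return the bonded atom pairs in a molecule as labels such as 'C-N'."""
    atoms, bonds = molecule
    return {"-".join(sorted((atoms[i], atoms[j]))) for i, j in bonds}

def fragment_scores(molecules, is_toxic):
    """For each fragment, the fraction of molecules containing it that are toxic."""
    seen, toxic = Counter(), Counter()
    for molecule, toxic_flag in zip(molecules, is_toxic):
        for frag in fragments(molecule):
            seen[frag] += 1
            if toxic_flag:
                toxic[frag] += 1
    return {frag: toxic[frag] / seen[frag] for frag in seen}

def predict_toxic(molecule, scores, threshold=0.5):
    """Flag a molecule if any known fragment in it scores above the threshold."""
    return any(scores.get(frag, 0.0) > threshold for frag in fragments(molecule))

# Hypothetical training data with known toxicity labels.
train = [
    (["C", "C", "O"], [(0, 1), (1, 2)]),  # fragments C-C, C-O
    (["C", "N", "O"], [(0, 1), (1, 2)]),  # fragments C-N, N-O
    (["C", "C"], [(0, 1)]),               # fragment C-C
    (["N", "O"], [(0, 1)]),               # fragment N-O
]
labels = [False, True, False, True]

scores = fragment_scores(train, labels)
print(predict_toxic((["O", "N"], [(0, 1)]), scores))  # True: N-O only occurs in toxic molecules
print(predict_toxic((["C", "O"], [(0, 1)]), scores))  # False
```

Counting fragment occurrences separately for toxic and non-toxic molecules is the simplest possible importance measure; it stands in here for the more sophisticated statistics a real system would use.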
The Hollandse Brug, a bridge on the A6 motorway between Amsterdam and Almere, had to close suddenly in 2007 because a Ministry of Transport employee had discovered a crack. ‘Lorry drivers were disgruntled, because they suddenly needed to take a detour. The firm Strukton decided to implement a plan that involved placing various sensors (such as cameras and weather stations) on the bridge during the renovation work. The aim was to be able to predict at an earlier stage when the bridge needs maintenance, so that motorists are not confronted with a sudden closure again.’ Joost Kok was asked to help work out the best ways to analyse all the data the sensors collected. Every day, these sensors measure a huge number of factors, such as temperature, wind, the number of cars on the bridge and the force of the waves.
Kok and his colleagues started working on the computer architecture and software for the project in 2008. No fewer than 528 computers can now work simultaneously to analyse three months' worth of data in just two minutes. To give an impression of how much data this is: together, the sensors collect more data each day than can fit on a DVD. Periodic inspections, which require the bridge to close, should no longer be necessary.
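Analysing that much data in two minutes relies on splitting the work: if three months (roughly 90 days) of measurements were divided evenly over 528 computers, each would handle about four hours' worth. The minimal sketch below shows this split, analyse and combine pattern in Python. It is an illustration under stated assumptions, not the project's architecture: threads on one machine stand in for separate computers, and the per-chunk analysis (a peak value and a count of readings above a hypothetical limit) is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(readings, n_chunks):
    """Divide readings into n_chunks contiguous slices whose sizes differ by at most one."""
    size, rest = divmod(len(readings), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < rest else 0)
        chunks.append(readings[start:end])
        start = end
    return chunks

def analyse_chunk(chunk, limit):
    """Partial result for one chunk: its peak value and how many readings exceed the limit."""
    peak = max(chunk) if chunk else None
    over_limit = sum(1 for value in chunk if value > limit)
    return peak, over_limit

def analyse_parallel(readings, n_workers, limit):
    """Analyse all chunks concurrently (threads stand in for machines), then merge."""
    chunks = split_into_chunks(readings, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda chunk: analyse_chunk(chunk, limit), chunks))
    peaks = [peak for peak, _ in partials if peak is not None]
    overall_peak = max(peaks) if peaks else None
    total_over = sum(count for _, count in partials)
    return overall_peak, total_over

# Hypothetical readings from a single sensor and a hypothetical alarm limit.
readings = [1.0, 3.5, 2.2, 7.1, 0.4, 5.0, 6.3]
print(analyse_parallel(readings, n_workers=3, limit=5.0))  # (7.1, 2)
```

Both partial results can be merged without revisiting the raw data (the largest of the chunk peaks is the overall peak, and the counts simply add up), which is what makes this kind of analysis easy to spread over many machines.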
‘We have learnt a great deal from the project and we now use this knowledge in other sensor projects. For example, we have worked with psychologists who wanted to learn more about children’s play in the school playground and which “types” of game involved the most interaction. We investigated this by fitting the children with sensors before playtime. These sensors registered when the children came close to one another. Combined with video images and analysis software that we wrote for the project, this enabled the researchers to analyse the children’s play thoroughly and draw their conclusions.’ Kok and his colleagues continue to work on various sensor projects, including one at the Leiden University Medical Center (LUMC), for which they wrote software to analyse data from movement sensors worn by patients.
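Proximity records of this kind naturally form a network, which links the playground project back to Kok's research into networks. The sketch below assumes a hypothetical log format in which each detection is a tuple (timestamp, child, child); it counts how often each pair of children was close together and how many different playmates each child had. The combination with video images described in the project is not modelled.

```python
from collections import Counter

def contact_network(detections):
    """Weighted network: each pair of children -> number of times they were detected close together."""
    weights = Counter()
    for _timestamp, child_a, child_b in detections:
        if child_a != child_b:
            # Sort the pair so (Ann, Bo) and (Bo, Ann) count as the same edge.
            weights[tuple(sorted((child_a, child_b)))] += 1
    return weights

def playmate_counts(weights):
    """Number of distinct children each child was detected close to (the node's degree)."""
    degree = Counter()
    for pair in weights:
        for child in pair:
            degree[child] += 1
    return degree

# Hypothetical log: (seconds since start of playtime, child, child).
detections = [
    (0, "Ann", "Bo"),
    (5, "Bo", "Ann"),
    (9, "Ann", "Cas"),
    (12, "Dee", "Bo"),
]
weights = contact_network(detections)
print(weights.most_common(1))            # [(('Ann', 'Bo'), 2)]
print(playmate_counts(weights)["Ann"])   # 2
```

Edge weights show which pairs spend the most time together, while the degree shows which children mix with many others; both are simple starting points for the kind of play analysis the psychologists were after.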