Molecular big data

Using quantum mechanical calculations, a dataset of 62k organic molecules was generated
Figure: Dataset overview and statistics for the 62k dataset of organic molecules. The number of molecules and the number of different chemical species are given for each data subset; sample molecules are shown on the right.

We generated a dataset of 61,489 molecules extracted from organic crystals in the Cambridge Structural Database. For each molecule, we provide the geometry and several electronic properties calculated with density-functional theory (DFT). For two subsets, we also supply data from higher-level methods, such as hybrid functionals and the GW Green's function method. The dataset is available through Nature Scientific Data. The data are open access and can be freely used for applications, data science, and machine learning.
