Across research institutions and industrial partners, decades of microstructural characterisation have produced vast archives of optical microscopy (OM), scanning electron microscopy (SEM) and electron backscatter diffraction (EBSD) images.
These archives represent an enormous investment in experimental effort, yet much of this data remains underutilised. Images are scattered across local drives, labelling is inconsistent or incomplete, and finding relevant prior work often relies on individual researchers’ memory rather than systematic search.
Within AID4GREENEST, ePotentia is developing MicrostructureDB, an open data platform that combines structured data management with integrated AI tools, making both the data and the intelligence built on top of it directly accessible through an intuitive web interface.
MicrostructureDB development is supported by Ghent University, which provides seed data, metallurgical expertise and model training; by Fraunhofer IWM, which contributes insights into the standardisation of data and workflows; and by various other partners, who contribute data and user feedback.
From pixels to materials space
The core of MicrostructureDB’s AI capability is a foundation model for steel microstructures: a deep learning system trained using self-supervised learning techniques that converts any microstructural image into a compact numerical representation, a “fingerprint”, where similar microstructures are automatically grouped together.
This mapping is learned entirely from visual structure, without manual labels, making it applicable to the messy, inconsistently labelled historical archives that are common in practice.
The result is a materials space where thousands of images are organised by visual and morphological similarity.
In this space, related microstructures cluster naturally, different magnifications of the same steel family sit nearby, dual-phase microstructures appear between their constituent phases, and imaging artifacts stand out as outliers.
Researchers can explore this space interactively, discovering relationships across datasets that would be invisible through manual inspection.
Overcoming data scarcity with synthetic microstructures
Even in large archives, data for certain steel types can be rare, either because the material is uncommon, or because the data is proprietary. To address this, generative AI models developed within the project produce synthetic microstructure images that expand the available data. These synthetic images are now virtually indistinguishable from real micrographs by visual inspection alone.
The foundation model’s embedding space serves as a quality check: synthetic images that land in plausible regions alongside real data confirm the generator’s fidelity, while the model’s sensitivity to subtle differences exposes those that deviate from real data, even when they look convincing to the eye.
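As a rough illustration of this kind of check, the sketch below accepts a synthetic fingerprint only if it lies about as close to the real data as real fingerprints lie to each other. Everything in it is an assumption made for illustration: the function names, the Euclidean distance, the median-based threshold and the toy two-dimensional fingerprints are hypothetical and do not describe the platform's actual implementation.

```python
import math
from statistics import median


def nearest_distance(point, others):
    """Euclidean distance from a fingerprint to its closest neighbour."""
    return min(math.dist(point, other) for other in others)


def plausibility_threshold(real, factor=2.0):
    """Assumed rule: factor times the median nearest-neighbour distance
    among the real fingerprints (each compared with all the others)."""
    distances = [
        nearest_distance(p, real[:i] + real[i + 1:])
        for i, p in enumerate(real)
    ]
    return factor * median(distances)


def check_synthetic(synthetic, real, factor=2.0):
    """True for each synthetic fingerprint that lands near real data."""
    limit = plausibility_threshold(real, factor)
    return [nearest_distance(s, real) <= limit for s in synthetic]


# Toy 2-D fingerprints; real embeddings would have many more dimensions.
real = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
synthetic = [[0.05, 0.05], [1.0, 1.0]]

flags = check_synthetic(synthetic, real)
print(flags)
```

In this toy case the first synthetic point sits inside the cluster of real fingerprints and passes, while the second lies far away and is rejected. The `factor` parameter controls how strict the check is and would need tuning on real data.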
From representation to practical tools
The foundation model powers several practical tools integrated directly into the platform:
- Image similarity search: uploading a new image immediately retrieves the most similar microstructures from the entire archive, helping researchers find relevant prior work across steel families and institutions.
- Anomaly detection: unusual images, whether due to mislabelling, imaging artifacts, or genuinely novel microstructures, are automatically flagged, supporting data quality management.
- Property and processing prediction: the learned representations serve as input features for models that predict mechanical properties, processing parameters, and phase classifications.
- Automated EBSD batch processing: raw EBSD data can be uploaded and automatically processed to generate orientation maps, phase maps, grain statistics, and other derived visualisations, all viewable live in the browser without requiring local analysis software.
The EBSD processing pipeline has received particularly positive feedback, as it removes one of the most time-consuming bottlenecks in the characterisation workflow, turning hours of manual post-processing into automated, reproducible analysis accessible from any computer.
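The similarity search, anomaly detection and property prediction tools listed above all operate on the same fingerprints, which a minimal sketch can make concrete. The code below is illustrative only: the function names, the choice of cosine similarity, the nearest-neighbour averaging and the toy three-dimensional fingerprints and hardness values are assumptions, not the platform's actual models or data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two fingerprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, archive, k=3):
    """Similarity search: the k archive entries closest to the query."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in archive.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def anomaly_score(query, archive):
    """Anomaly detection: 1 minus the best similarity to any archive entry.
    A high score means the image resembles nothing already stored."""
    return 1.0 - max(cosine_similarity(query, vec)
                     for vec in archive.values())


def predict_property(query, archive, values, k=2):
    """Property prediction (assumed k-nearest-neighbour model): the mean
    of the known values of the k most similar archive entries."""
    neighbours = most_similar(query, archive, k)
    return sum(values[name] for name, _ in neighbours) / len(neighbours)


# Toy fingerprints and made-up hardness values, not measurements.
archive = {
    "ferrite_a": [0.9, 0.1, 0.0],
    "ferrite_b": [0.8, 0.2, 0.0],
    "martensite_a": [0.1, 0.9, 0.1],
    "bainite_a": [0.2, 0.6, 0.7],
}
hardness = {"ferrite_a": 150.0, "ferrite_b": 160.0,
            "martensite_a": 600.0, "bainite_a": 350.0}

query = [0.85, 0.15, 0.0]
outlier = [0.0, 0.0, 1.0]

hits = most_similar(query, archive, k=2)
estimate = predict_property(query, archive, hardness, k=2)
print(hits, estimate, anomaly_score(outlier, archive))
```

Here the query retrieves the two ferrite entries and inherits their average value, while the outlier scores much higher on the anomaly measure than the query does. At archive scale, the exhaustive comparison in `most_similar` would normally be replaced by an approximate nearest-neighbour index.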
Explainability and trust
We recognise that AI tools are only useful if domain experts can understand and trust their outputs. Our foundation model provides inherent interpretability maps: spatial visualisations showing which regions of an image the model focuses on when making its assessment.
These maps consistently highlight metallurgically relevant features such as grain boundaries, precipitate regions, and phase interfaces, providing researchers with intuitive explanations of the AI’s reasoning.
Making AI accessible
A key challenge with AI in materials science is that powerful models often remain locked in Python notebooks, accessible only to those with programming expertise. MicrostructureDB addresses this through a web-based interface, where all AI functionality is available through point-and-click interactions.
In addition, we are developing a chatbot interface that allows researchers to query the database and AI tools using natural language, asking questions like “show me similar microstructures to this sample” or “what processing conditions produced this type of bainite?”
Current status and call for contributors
The AI pipeline is operational with the foundation model integrated into the platform. We are currently refining the automation of the embedding and prediction pipeline, developing novel explainable AI methods tailored to metallurgical interpretation, and preparing for cross-partner validation with data from consortium members.
During the current beta phase, the platform and AI tools can be requested for free by data contributors, reflecting the project’s commitment to open data and collaborative development.
MicrostructureDB supports multiple data ingestion pathways (browser upload, API access, and local synchronisation tools) to accommodate different institutional workflows.
Following the project, a credit-based system will sustain the AI services, with free monthly credits and additional credits for open data contributors, ensuring that sharing data is always rewarded.
We welcome consortium partners and external collaborators interested in contributing microstructural data or testing the platform. Contact: michael.sluydts@epotentia.com