IDTechEx Explains the Pillars of Success in Materials Informatics
May 16, 2024
Sam Dale
The impact of machine learning and data science techniques on the materials industry has grown exponentially since IDTechEx started covering the field of materials informatics in 2020, with real-world results ranging from lightweight alloys to new battery chemistries. However, the keys to project success have changed remarkably little in this time, as outlined in IDTechEx's report, "Materials Informatics 2024-2034: Markets, Strategies, Players". What, then, are the essential areas of focus for an organization looking to deploy a materials informatics strategy, whether as an end-user or as a provider of software?
Pieces of the materials informatics puzzle. Source: IDTechEx
"Garbage in, garbage out" - the problem of sourcing good data
Machine learning models can only ever be as good as the data they are trained on. The major choices consist of performing your own experiments, computational simulation, pulling data from public or private repositories, or scraping data from patents and scientific literature.
Predictably, experimentation is the gold standard in terms of accuracy and will almost universally be needed at the verification stage of a materials informatics project, but its high cost can limit data volume significantly. Simulation is cheaper, but the cost of computation remains significant, and how faithfully the results reflect reality can be doubtful.
On the other hand, if data repositories and scraped data cover the right problem space, they can yield higher data volumes, but bias is likely to be a significant issue, including the non-inclusion of negative results. Often, limited information is available about the full experimental conditions. Some larger end-users of materials informatics have told IDTechEx that the limitations of external data have led them to reject it entirely, but this is not an option for players without such deep pockets.
Japanese player Preferred Computational Chemistry (PFCC) have approached the challenge of speeding up data gathering with their Matlantis software, which trains a graph neural network surrogate model for potential energy surfaces on density functional theory (DFT) data. By taking millions of DFT simulation results and performing its own simulations on unstable structures close to the stable structures in the pre-existing data, PFCC gives its model a much fuller picture of the infinite combinatorial space, allowing it to reproduce first-principles results more closely. The end product delivers results in seconds, whereas DFT simulations can take hours to months. The model will only ever be as good as the DFT data it's trained on, but being able to produce more results will offset much of this disadvantage.
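To make the surrogate idea concrete, here is a minimal Python sketch. It is not PFCC's method and not a graph neural network: the "expensive" calculation is replaced by a Lennard-Jones pair energy in reduced units, and the surrogate is a simple interpolation table. All function names and sampling ranges are illustrative assumptions. It only shows why adding off-equilibrium ("unstable") structures to the training set can help a surrogate away from the stable minimum.

```python
import bisect

def reference_energy(r):
    """Stand-in for an expensive first-principles call (assumption:
    Lennard-Jones pair energy in reduced units, not real DFT)."""
    return 4.0 * (r ** -12 - r ** -6)

def build_surrogate(training_r):
    """Fit a cheap surrogate: piecewise-linear interpolation over the
    training points, clamped outside the sampled range."""
    rs = sorted(training_r)
    es = [reference_energy(r) for r in rs]

    def surrogate(r):
        if r <= rs[0]:
            return es[0]
        if r >= rs[-1]:
            return es[-1]
        i = bisect.bisect_right(rs, r)
        t = (r - rs[i - 1]) / (rs[i] - rs[i - 1])
        return es[i - 1] + t * (es[i] - es[i - 1])

    return surrogate

def max_error(surrogate, test_r):
    """Worst-case absolute deviation from the reference on test points."""
    return max(abs(surrogate(r) - reference_energy(r)) for r in test_r)

r_min = 2 ** (1 / 6)  # separation at the Lennard-Jones energy minimum

# Training set 1: only structures close to the stable minimum.
near_equilibrium = [r_min + 0.01 * k for k in range(-5, 6)]
# Training set 2: the same points plus distorted (off-equilibrium) ones.
with_distorted = near_equilibrium + [0.95 + 0.05 * k for k in range(22)]

# Test separations spanning compressed to stretched configurations.
test_r = [1.0 + 0.01 * k for k in range(101)]

narrow = build_surrogate(near_equilibrium)
broad = build_surrogate(with_distorted)

print("max error, near-equilibrium data only:", max_error(narrow, test_r))
print("max error, with distorted structures: ", max_error(broad, test_r))
```

In this toy setting, the surrogate trained only near the minimum has no information about compressed or stretched configurations, while the broader training set covers them. A production surrogate would replace the interpolation table with a learned model and the toy energy with real simulation data.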
Pulling information together
Managing data is typically the key stumbling block for a materials firm seeking digital transformation in its R&D efforts, especially given the conservative nature of this industry. Electronic lab notebook and laboratory information management (ELN/LIMS) software generally form the easiest off-the-shelf tools for moving away from disparate Excel files or even paper notebooks. Problems arise when different business units take different approaches here, which can lead to data siloing and overspending.
Fortunately, most materials informatics software is designed to interface with the APIs of common ELN/LIMS software. Indeed, the offerings of some materials informatics providers, including Uncountable Inc., MaterialsZone, and Albert Invent, focus heavily on managing information in the lab while integrating advanced machine learning features. For end-users looking for a one-stop shop, opting for an integrated platform may make a lot of sense.
Applying AI requires creative approaches
There are opportunities to use machine learning at every stage of the materials informatics process, from scraping data to using large language models to help design an experimental process. However, the application that typically attracts the most interest is modeling the behavior of a class of materials to suggest candidates that meet a desired set of properties.
This inverse design process will typically use high-dimensional data that often has many missing values and may be pulled from many data sources, offering a substantially different challenge from "big data" problems. Active or sequential learning, where the performance of suggested candidates is verified experimentally and the underlying model is retrained on the results, is a common approach to forming an optimal experimental strategy, pursued by players like Citrine Informatics. Advanced AI approaches abound: a cherry-picked example from UK player Intellegens modifies the input/output structure of neural networks to allow missing properties to be estimated iteratively. Because this class of problem is so particular to materials science, materials informatics companies have, in general, been founded specifically to focus on materials R&D rather than pivoting from another class of AI problem.
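The sequential learning loop described above can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the method of any named vendor: `run_experiment` is a hypothetical stand-in for a lab measurement, the "model" is a simple inverse-distance-weighted estimate, and distance to the nearest measured point serves as a crude uncertainty proxy. The loop suggests a candidate, "measures" it, and refits on the enlarged dataset.

```python
import math

def run_experiment(x):
    """Hypothetical stand-in for a lab measurement: a property that
    peaks at composition x = 0.62 (an arbitrary, made-up target)."""
    return math.exp(-(x - 0.62) ** 2 / 0.02)

def predict(x, measured):
    """Inverse-distance-weighted estimate of the property at x, plus the
    distance to the nearest measured point as an uncertainty proxy."""
    nearest = min(abs(x - xm) for xm, _ in measured)
    if nearest == 0.0:
        return next(y for xm, y in measured if xm == x), 0.0
    weights = [(1.0 / (x - xm) ** 2, y) for xm, y in measured]
    total = sum(w for w, _ in weights)
    mean = sum(w * y for w, y in weights) / total
    return mean, nearest

def sequential_learning(candidates, initial, rounds, kappa=2.0):
    """Suggest, verify, refit: each round picks the untested candidate
    with the best optimistic score (estimate + kappa * uncertainty)."""
    measured = [(x, run_experiment(x)) for x in initial]
    for _ in range(rounds):
        tried = {x for x, _ in measured}
        untested = [x for x in candidates if x not in tried]
        if not untested:
            break

        def score(x):
            mean, uncertainty = predict(x, measured)
            return mean + kappa * uncertainty

        suggestion = max(untested, key=score)
        # "Verification": run the experiment and add it to the dataset,
        # so the next round's predictions use the new result.
        measured.append((suggestion, run_experiment(suggestion)))
    return measured

candidates = [i / 50 for i in range(51)]  # compositions 0.00 to 1.00
history = sequential_learning(candidates, initial=[0.0, 1.0], rounds=10)
best_x, best_y = max(history, key=lambda point: point[1])
print("best candidate found:", best_x, "with property value", best_y)
```

The `kappa` parameter trades off exploiting the current estimate against exploring poorly sampled regions. Real projects would typically use a model that provides calibrated uncertainty and would work in many more dimensions than this one-variable toy.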
Usability is key
The role of a materials informatics provider is to link the work of data scientists and materials scientists, who will typically have significantly different expertise. Interfaces need to be accessible to users who have no coding experience while offering more powerful code inputs to those with programming and machine-learning skills. Visualization of results should be flexible and intuitive to allow users to get the most out of the platform. The materials informatics SaaS companies that have seen the most success so far have tended to excel in making the software easy to use while offering more advanced tools for power users, allowing end users to get enthusiastic about the platform long before its use on a full-scale commercial project. Putting usability front and center should be a top priority for anyone looking to enter this industry.
Further insights
IDTechEx's recent report, "Materials Informatics 2024-2034: Markets, Strategies, Players", is now in its fourth edition. Informed by first-hand interviews with the industry's major players, the report provides market forecasts, player profiles, investments, roadmaps, and comprehensive company lists, making this essential reading for anyone wanting to get ahead in this field.
To find out more about this report, including downloadable sample pages, please visit www.IDTechEx.com/MaterialsInformatics.
For the full portfolio of advanced materials and critical minerals market research from IDTechEx, please visit www.IDTechEx.com/Research/AM.
IDTechEx provides trusted independent research on emerging technologies and their markets. Since 1999, we have been helping our clients to understand new technologies, their supply chains, market requirements, opportunities and forecasts. For more information, contact research@IDTechEx.com or visit www.IDTechEx.com.