University of Turku and SiloGen launch consortium to build the world’s largest open LLM

28.08.2023

The University of Turku and Silo AI, one of Europe’s largest private AI labs, together with its large language model (LLM) arm SiloGen, are today announcing a large-scale initiative on open and trustworthy LLMs. SiloGen is launching a consortium together with the TurkuNLP research group to develop a family of open LLMs, including the world’s largest open source LLM. The initiative aims to ensure European digital sovereignty and democratise access to LLMs.

To ensure digital sovereignty and democratise access to LLMs, SiloGen and the TurkuNLP research group at the University of Turku are launching a consortium whose key focus is to develop the world’s largest open source language model, covering all official European languages.

Backed by compute access totaling approximately 15 million GPU hours, the initiative is dedicated to ensuring that the data used in these models accurately represents European languages, while also covering the English-speaking world. The initiative is conducted in close collaboration with key European institutions and agencies, and is committed to adhering to European regulations. Beyond Europe, the open source initiative will democratise access to LLMs and enable the development of use-case specific downstream applications.

The consortium, led by SiloGen and the TurkuNLP research group at the University of Turku, stands apart from many other initiatives in that it uniquely combines resources to build LLMs:

  • World-class LLM team, including professors and leading scholars such as Filip Ginter, Jussi Karlgren, Sampo Pyysalo, Magnus Sahlgren and Aarne Talman, along with others drawn from Silo AI’s more than 150 PhDs and 300 AI experts,
  • Data resources covering all European languages and code, including High-Performance Language Technology (HPLT) data, and other collected and curated data, and 
  • Access to compute, including software infrastructure to train LLMs and access to the LUMI supercomputer and other hardware and cloud services for LLM training.

In addition to a world-class team, the consortium has access to, and experience with, the LUMI supercomputer, which, as one of the European High-Performance Computing (EuroHPC) undertakings, is the third-largest supercomputer in the world and the largest in Europe. Having built LLMs on LUMI for more than a year, the team has developed a distinctive software layer for training LLMs effectively and efficiently on the AMD-based hardware. As part of the EU-funded HPLT project, the data for this initiative has been collected and curated since early 2022 to provide a representative basis for LLM development. Combining all of this with a total of 15 million GPU hours, Silo AI and TurkuNLP are uniquely positioned to train a family of language models, including the world’s largest open LLM.

Close-up of Sampo Pyysalo.

University Research Fellow in data analysis Sampo Pyysalo is the principal investigator of the High Performance Language Technologies consortium at Turku.

“LLMs are rapidly reshaping how we access information and interact with technology. As their impact grows, it is increasingly important to assure that the models are developed in a transparent and reproducible manner and made openly available to ensure accountability and equal access to the technology. From a European perspective, it is also critical that models are designed from the outset to prioritize multilinguality and an equitable approach to all languages. The High Performance Language Technologies (HPLT) project is addressing these goals through the creation of open European data resources and language models and is delighted to partner in this consortium with SiloGen and Silo AI, an industry leader with shared goals,” says Sampo Pyysalo, University of Turku Research Fellow and HPLT principal investigator.

Close-up of Peter Sarlin.

Peter Sarlin is the CEO and co-founder of Silo AI.

“We are honored to contribute to the development of open LLMs. The development of base models aligned with European values is imperative for our digital sovereignty. This initiative helps to ensure that underlying models are based on data and information representing the citizens and organisations of the region, and overall compliance with regulation, data privacy and other vital concerns. And eventually we need sovereignty on how downstream applications and value creation happen. This requires trusted and secure approaches to independent base models that enable fine-tuning for domain-specific needs. This way we can ensure digital sovereignty, while advancing technological development,” says Peter Sarlin, CEO and co-founder of Silo AI.

The TurkuNLP research group’s extensive experience in NLP and LLMs aligns with Silo AI’s and SiloGen’s commitment to contribute to world-class research on generative AI. This alignment, combined with the resources of the consortium, provides a robust foundation for redefining the boundaries of what is possible in the world of open source language models. Together with the LLM development platform, this opens a unique path for companies to create value using independent, trusted and secure base models, with the possibility to fine-tune, instruct and control LLMs for domain-specific needs.

TurkuNLP research group
The TurkuNLP group of the University of Turku was founded in 2001 and has been carrying out research in natural language processing for over 20 years with a focus on machine learning applications to the automatic analysis and generation of text. TurkuNLP is the leading Finnish research group in large generative language models and one of the partners in the Horizon EU High Performance Language Technologies project, which is currently creating the next generation of European language models on the LUMI supercomputer.

Silo AI 
Silo AI is one of Europe’s largest private AI labs – a trusted AI partner that brings competitive advantage to product R&D. We build AI-driven solutions and products to enable smart devices, autonomous vehicles, industry 4.0, and smart cities. Silo AI provides its customers with unique access to world-class AI expertise, including a team of more than 300 AI experts and more than 150 PhDs, as well as the Silo OS infrastructure to speed up AI development and deployment. Established in 2017, Silo AI is on a mission to build a European flagship AI company, with offices currently in Finland, Sweden, Denmark, the Netherlands and Canada.

SiloGen
SiloGen is a large-scale initiative with the aim of building generative AI technology for Europe’s digital sovereignty. It is gathering some of Europe’s leading generative AI and Large Language Model (LLM) experts, as well as access to data sources, powerful computational resources and infrastructure to train, run and operate LLMs. SiloGen has been operational since late 2022 and is currently working on its technology at full speed with clients like Allianz, Sandvik and Tietoevry. Its core focus is to improve downstream and domain-specific applications and to ensure companies can utilize trustworthy models for private, confidential and proprietary data. As a trusted provider, SiloGen offers base and specialized models as well as a development suite to ensure accurate, trustworthy and robust downstream applications.
 

Created 28.08.2023 | Updated 28.08.2023