LUSA 11/20/2024

Lusa - Business News - Portugal: Final version of Portuguese large language model to launch in 2026

Lisbon, Nov. 19, 2024 (Lusa) - Portugal's planned artificial intelligence (AI) large language model (LLM) is to be called Amália, and its final version will be launched in 2026, the CEO of the Centre for Responsible AI told Lusa in an interview.

On 11 November, on the opening night of the Web Summit, the prime minister, Luís Montenegro, announced the launch, in the first quarter of next year, of an LLM in Portuguese.

The project involves the Centre for Responsible AI, of which Paulo Dimas is CEO, and the research centres Nova FCT and Instituto Superior Técnico.

The first version "will not be a perfect version" but rather "beta, initial, to start getting feedback and, over time, it will be improved," said the centre's CEO, Paulo Dimas, adding that it is "a nineteen-month project."

The final version "won't be released until 2026," he added.

The three fundamental points of this project are the language variant - Portuguese from Portugal - cultural representativeness and data protection, he pointed out.

Dimas emphasised that, as the prime minister said, the model will be ready "in the first quarter" of 2025.

"We're going to be working on top of work already done by these research centres: so there's been work in this area for several years, both in the area of data for the Portuguese language, work done by the research centre of the Nova Faculdade de Ciências e Tecnologia (FCT), there's also work done at Técnico [and] there's also work that's going to be transferred from Unbabel's side, because of all the experience" that this Portuguese tech company "has in creating multilingual models and models that are currently being trained on supercomputers," he said.

In short, he stressed, "the team that will be working on the creation of this LLM is a team that already has many years of experience in this area." 

On top of this work "it's possible to deliver this LLM in the first quarter" and "to this is added a very close collaboration with the Foundation for Science and Technology, which has created the conditions in terms of computing" essential for this type of large-scale modelling, he added. "And the Foundation for Science and Technology has been investing in computing capacity that will be used here [since] in practice we're going to use... a computer that's in Barcelona, but part of it is Portuguese."

In other words, "we have a Portuguese computer that is physically in Barcelona, but a percentage of it belongs to the Portuguese state," he summarised.

Now, if "we were training this, for example, in a Microsoft, Google or Amazon cloud, it would cost a lot, but as we'll be using this national resource, it will be done in a much more efficient way from a financial point of view," he explained.

Asked what the Portuguese LLM represents for him, Dimas described it as "a key part of the national artificial intelligence ecosystem" because "on top of this LLM it will be possible to create new artificial intelligence applications where the Portuguese language is preserved, where we have control over the Portuguese language." 

Dimas, who is also vice-president of innovation at Unbabel, gave the example of Halo, a product that he considers one of the "most emotional" he has developed in his professional life.

Developed by the Unbabel team, this project allows "patients suffering from Amyotrophic Lateral Sclerosis [ALS] to regain their ability to communicate," having lost the ability to write and speak because of their general muscular disability.

"The only way they can communicate again with the people they love most, with their family, with their carers, is through alternative and augmentative communication technology," he said. "Using artificial intelligence, we've managed to clone patients' voices [and] we're already working with ALS patients who have started to speak again."

However, "this speech results from text that is often produced in the variant spoken in Brazil," which is "not at all natural" for people who are Portuguese.

But as soon as "we have Amália, which will be the name given to the LLM, a name inspired by a very important figure in our history [fado singer Amália Rodrigues], we will be able to control what is said in these conversations," he stressed. 

In this way, patients will be able to speak in the Portuguese spoken in Portugal and this "is a fundamental piece," but more than that, "it's a transversal piece for the Public Administration," he said, explaining that "we can, for example, work on this model in the area of education and have our children learn in schools with a personalised tutor who knows the national educational curriculum."

In short, the use of LLM Amália "is completely transversal," he said.

On the other hand, "it gives us technological autonomy, it allows us to improve the model over time, particularly in terms of introducing the multimodality system, which is to add images as well, and then in the future, possibly speech as well."

This, Dimas stressed, is "a national technological resource that cuts across all areas of our society, research and start-ups [and] it's also going to be an important piece for start-ups. She won't speak at first [but] we have Amália writing correct Portuguese, Portuguese spoken in Portugal and a basis for such cultural representativeness [and] definitely getting to know more about Portuguese culture."

The LLM Amália will also play a "very important role" in the civil service, from education to innovation and for the "development of artificial intelligence in Portugal." 

A "very important" partner in this initiative "will be the Agency for Administrative Modernisation, AMA," because it will be the way to "transpose this LLM, this technology, to the Public Administration," Dimas said. 

Basically, "it's an example of a partnership that brings together research centres and the Public Administration" and that "also draws on the know-how developed in national start-ups like Unbabel," with the Centre for Responsible AI driving these collaborations, he added.

 

ALU/ARO // ARO.

Lusa