In 2024, alongside the new Galaxy S24 flagship phone, Samsung introduced Galaxy AI. This AI-powered technology package was designed to help users with everyday tasks, from translation and transcription in chats and calls to editing pictures and searching with images. Live Translation was one of the most fascinating ingredients of Galaxy AI. Suddenly you could call a person anywhere in the world and talk with them without knowing the language they were speaking! The feature has been a huge success, so we reached out to the Machine Translation team at Samsung R&D Institute Poland (SRPOL) in Warsaw, which is responsible for developing translation models for Samsung worldwide, to tell us how this exciting new technology came into being.
First, the Machine Translation team does not operate in a void. To handle such a monumental project, it has to cooperate with many other teams to create a robust pipeline for the seamless production of hundreds of models (of which only the best ended up on your phone). The most important of these is probably the Data Acquisition Linguistics team, which analyzes generated translations and produces training data to improve translation quality; there are also Quality Assurance teams, tirelessly evaluating model outputs and searching for errors to fix.
But let us start with an introduction to how Live Translation was created. Since the service deals with the human voice, the team decided to separate speech recognition and generation from the core function of translation. The system therefore consists of three steps. The first step recognizes the caller’s speech and turns it into text. The second translates that text into the target language, and the third transforms it back into speech, this time in the language of the second caller. The Machine Translation team was responsible for the translation step, and the voice-related tasks were covered by other Samsung Research teams.
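To make the three-step design more concrete, here is a minimal sketch in Python. It is an illustration only, not Samsung's implementation: the class name, the stage signatures and the toy stand-in functions are all assumptions made for this example, and the stand-ins do no real speech processing.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; the real on-device components are not public.
Recognizer = Callable[[bytes], str]    # caller's audio -> source-language text
Translator = Callable[[str], str]      # source text -> target-language text
Synthesizer = Callable[[str], bytes]   # target text -> audio for the listener


@dataclass
class LiveTranslationPipeline:
    """Chains the three independent stages described in the article."""
    recognize: Recognizer
    translate: Translator
    synthesize: Synthesizer

    def run(self, audio: bytes) -> bytes:
        text = self.recognize(audio)        # step 1: speech recognition
        translated = self.translate(text)   # step 2: machine translation
        return self.synthesize(translated)  # step 3: speech synthesis


# Toy stand-ins so the sketch runs end to end (assumptions, not real models).
def toy_recognize(audio: bytes) -> str:
    # Pretend the "audio" is simply UTF-8 encoded text.
    return audio.decode("utf-8")


TOY_DICTIONARY = {"dzień dobry": "good morning"}


def toy_translate(text: str) -> str:
    # Look the phrase up; fall back to the unchanged input if unknown.
    return TOY_DICTIONARY.get(text.lower(), text)


def toy_synthesize(text: str) -> bytes:
    # Pretend that encoding the text produces audio.
    return text.encode("utf-8")


pipeline = LiveTranslationPipeline(toy_recognize, toy_translate, toy_synthesize)
result = pipeline.run("Dzień dobry".encode("utf-8"))
```

Because each stage is just a function with a fixed input and output type, any one of them can be replaced or improved without touching the other two, which mirrors the way the work was divided between teams.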
Another important factor shaping the final version of the service was the need for the system to respond quickly regardless of the quality of the internet connection. That meant that the software and models behind all of the system’s functions had to run on the phones themselves, not somewhere in the cloud. This made the whole task quite difficult: models that work on mobile phones have to be smaller and faster than the ones used on servers, so they are much more difficult to train.
At this point you are probably asking yourself what it means to train a model and what such a model actually is. To start with the latter, a model is just a collection of numbers and functions that transform an input (a sentence to be translated) into an output (the translated sentence in another language). This transformation, as with all AI-related techniques, is learned: the model receives lots of sentences in one language, is told how they should be translated, and slowly gets closer to the ideal. This requires a lot of effort, mainly related to data preparation. Everything starts with data collection. The next step is data cleaning and preprocessing: low-quality training data causes many translation issues, so it needs to be removed. Then the team extracts linguistic knowledge from the data by training a large translation model, called the teacher. The teacher is then used to translate hundreds of millions of sentences. The translated sentences are cleaned and mixed with various additions, called augmentations. The teacher’s translations provide consistent training data for a much smaller student model, while the augmentations give the student additional capabilities, such as translating speech recognition output or HTML pages (these are very different types of input, and translating them is not an easy task). The last step after training the student model is compression, which reduces the student to around 40 MB with only a slight decrease in quality compared to its 1.3 GB teacher, making it roughly 30 times smaller. At this point the Quality Assurance team takes over, and the Machine Translation team’s role is to fix any issues it finds.
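The data side of the teacher-student process described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the team's actual pipeline: the cleaning heuristic, the speech-recognition-style augmentation, the function names and the dictionary-based "teacher" are all hypothetical.

```python
import random
from typing import Callable, Iterable

Pair = tuple[str, str]  # (source sentence, target sentence)


def clean(pairs: Iterable[Pair], max_ratio: float = 3.0) -> list[Pair]:
    """Drop empty, duplicate or badly length-mismatched pairs (a toy heuristic)."""
    seen: set[Pair] = set()
    kept: list[Pair] = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt or (src, tgt) in seen:
            continue
        if max(len(src), len(tgt)) / min(len(src), len(tgt)) > max_ratio:
            continue
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


def distill(sources: Iterable[str], teacher: Callable[[str], str]) -> list[Pair]:
    """Let the teacher translate every source sentence."""
    return [(src, teacher(src)) for src in sources]


def augment_asr_style(pair: Pair) -> Pair:
    """Hypothetical augmentation: make the source resemble speech recognition
    output by lowercasing it and removing punctuation; the target is unchanged."""
    src, tgt = pair
    stripped = "".join(ch for ch in src.lower() if ch.isalnum() or ch.isspace())
    return (stripped, tgt)


def build_student_data(sources: Iterable[str], teacher: Callable[[str], str],
                       augment_rate: float = 0.5, seed: int = 0) -> list[Pair]:
    """Teacher translations, cleaned, plus augmented copies of a random subset."""
    rng = random.Random(seed)
    data = clean(distill(sources, teacher))
    augmented = [augment_asr_style(p) for p in data if rng.random() < augment_rate]
    return data + augmented


# Toy "teacher": a dictionary lookup standing in for a large translation model.
TOY_TEACHER = {"Hello, world!": "Witaj, świecie!", "Thank you.": "Dziękuję."}


def toy_teacher(sentence: str) -> str:
    return TOY_TEACHER.get(sentence, "")  # an empty output is removed by clean()


sources = ["Hello, world!", "Thank you.", "Thank you.", "Unknown sentence"]
student_data = build_student_data(sources, toy_teacher, augment_rate=1.0)
```

With `augment_rate=1.0`, every cleaned pair also appears in its speech-recognition-style form, so the student would see both "Hello, world!" and "hello world" paired with the same translation. Training the student on this data and compressing it afterwards are separate steps that this sketch does not cover.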
All of this seems complex enough, but the team had other challenges to face as well. Take Arabic dialects, for instance. Arabic differs from most languages in that its written form, Modern Standard Arabic (MSA), is unlike any of the many spoken varieties (most Arab countries have their own dialects, significantly different from the others). The Polish team was responsible for preparing the Arabic translation model, yet it had practically no Arabic-speaking developers. Almost all available corpora are in MSA, the standardized, literary form of Arabic, and corpora for the dialects were lacking. Preparing a good-quality MSA translation model was straightforward. However, the goal was to cover selected dialects as well, namely those of the Kingdom of Saudi Arabia (KSA) and the United Arab Emirates (UAE). How do you train a model without training data? The absence of standardized written data for the dialects, which can differ vastly from MSA, made the task even more challenging. Many people therefore supported SRPOL’s efforts. Samsung R&D Institute Jordan (SRJO) prepared a few million sentences covering Arabic dialects. The Machine Translation team used almost every available method for improving the translation quality of low-resource languages. Dozens of teacher models were trained and more than one thousand student models were produced before the required quality was achieved. The combined effort was enormous, and in the end the team developed one of the best models for Arabic dialects on the market.
It was of course no coincidence that the Polish Machine Translation team was tasked with the ambitious goal of producing translation models for the Galaxy S24. The team acquired its expertise over many years of hard work on large multilingual translation models for the Bixby Vision service. It collected lots of training data for more than 40 languages and built a very efficient data preparation pipeline on its local computing infrastructure. This experience was essential when the team started working on Galaxy AI. Techniques similar to those used for server models were now used for training teacher models. The data was updated and reprocessed, but the data processing machinery was ready for the task. The team had also been working on embedded translation models for Samsung Browser for a few years. Everything (knowledge, tools and resources) was in place, and the team was prepared for the new challenge. Finally, the Polish team had established a very good collaboration with Machine Translation engineers at Samsung Research and could quickly discuss any issue that needed to be resolved. The teams supported each other throughout the whole process. As a result, a relatively small SRPOL team was able to deliver a substantial number of the languages released with the Galaxy S24, including French (European and Canadian), German, Hindi, Italian, Polish, Portuguese (European and Brazilian), Spanish (European and Mexican) and Arabic (MSA, United Arab Emirates and Kingdom of Saudi Arabia).
We’ve already mentioned the important contribution made by the Data Acquisition Linguistics team, but the importance of linguists in any state-of-the-art AI translation project cannot be overstated. They evaluate translations and spot places where model capabilities can be improved; they validate the corpora used for training, which often contain quite a lot of errors that degrade model quality; and, last but not least, they meticulously create new corpora addressing problems that models have with specific phrases or grammatical constructs. It was the dedication of the linguists that kept the project afloat in the most difficult times.
It’s been a satisfying experience, members of the Machine Translation team say. Having your models on millions of the newest and coolest mobile phones around the world is something to be proud of. Being able to respond quickly and successfully to the needs of the market proves that SRPOL has the qualified people and technological prowess to build the most advanced solutions in the world of Artificial Intelligence. Then there is also the pleasure that comes with fulfilling people’s needs: those who travel a lot or communicate a lot with foreigners are given a reliable solution that works everywhere. It’s also true that supporting a new language makes a good impression on native speakers. It is like saying: “Samsung cares about you and your language.”
“Samsung Galaxy Unpacked was a great day for my team. We had been working tirelessly for many months to deliver the best models possible, so finding out our product was announced as one of the key features made us very proud. I know that it was the result of a combined effort of hundreds of people from all around the world. However, the translation models for around half of the languages came from Samsung R&D Institute Poland. The SRPOL team started its adventure with machine translation in 2012. It has been a unique opportunity to contribute to the development of this remarkable technology throughout all those years. Researchers from various corners of the globe have diligently worked to elevate the quality of machine translation to its current standards. Watching this firsthand and actively contributing has been a fascinating experience for us,” said Paweł Przybysz, Head of the Machine Translation team at SRPOL.