[SR Talks] ④ Interview with a Natural Language Understanding and On-device AI Expert at Samsung Research America AI Center

Q: Can you please briefly introduce yourself, Samsung Research America (SRA), SRA AI Center, and the kind of work that goes on there? What project are you working on?

Samsung Research America (SRA) was founded in 1988 and has witnessed Samsung’s history in the Bay Area. SRA is headquartered in Mountain View, at the heart of Silicon Valley, with offices across the United States and Canada. SRA is at the forefront of cutting-edge technology, creating new businesses and developing core technologies including artificial intelligence, 5G/6G, digital health, enterprise security, and mobile innovation. SRA also plays a key role in providing the infrastructure that supports Samsung’s open innovation and university collaboration activities, with the goal of enhancing the competitiveness of Samsung products and impacting the future.

At the SRA AI Center, we focus on building the next-generation human assistant powered by voice interaction, even with limited device resources. I am Yilin Shen, Head of the Natural Language Understanding and On-Device AI Lab, leading the natural language understanding (NLU) and on-device artificial intelligence (on-device AI) efforts in the SRA AI Center. On one hand, we develop fundamental machine learning technologies to enhance the language understanding capability and accuracy of existing Samsung products. On the other hand, we develop Samsung-differentiating technologies that enable on-device intelligence on resource-constrained Samsung devices, as well as new user experiences such as multi-modal interaction and multi-device environments for future Samsung products.

Q: Please tell me the importance of your research field or technology.

Artificial intelligence (AI) is the field of intelligent agents: systems that perceive their environment and take actions to achieve their goals. Many AI assistants, such as Samsung Bixby, have emerged to improve users’ daily lives. The NLU component plays a critical role in all of these AI assistants, while on-device AI differentiates Samsung’s AI assistant from others.

Natural Language Understanding (NLU): Artificially intelligent voice assistants have become part of our daily lives. Voice assistants provide a number of capabilities, among which NLU plays a critical role in understanding the full variety of user utterances and carrying out users’ intents. Over the past several years, many large language and multimodal models have emerged, trained on enormous amounts of text data. In the SRA AI Center, we leverage these pretrained language models to develop advanced NLU models that significantly improve NLU accuracy while gracefully rejecting unsupported user queries. We also develop multimodal NLU models to enable new user experiences for Samsung devices.
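To make the two core NLU subtasks concrete, here is a toy sketch of intent detection and slot filling. The rule patterns and intent names are purely hypothetical illustrations; production assistants such as Bixby use pretrained-language-model-based classifiers rather than regexes.

```python
# Toy sketch of intent detection and slot filling (hypothetical rules;
# real NLU systems use pretrained language models, not regexes).
import re

INTENT_PATTERNS = {
    "set_alarm": re.compile(r"\b(wake me|set an alarm)\b"),
    "play_music": re.compile(r"\bplay\b"),
}
TIME_SLOT = re.compile(r"\b(\d{1,2}(:\d{2})?\s*(am|pm))\b")

def understand(utterance: str):
    """Return (intent, slots); intent is None for unsupported queries,
    so the assistant can gracefully reject them instead of guessing."""
    text = utterance.lower()
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(text):
            slots = {}
            match = TIME_SLOT.search(text)
            if match:
                slots["time"] = match.group(1)
            return intent, slots
    return None, {}  # no supported intent matched -> graceful rejection
```

For example, "Wake me at 7 am" yields the `set_alarm` intent with a `time` slot, while an out-of-scope request returns `None` rather than a wrong action.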

On-device AI: Most existing AI-enabled services are cloud-based, which brings several disadvantages: long latency between sending data and receiving a response, especially in real-time applications; heavy dependency on an Internet connection; and potential leakage of private and sensitive user data. These disadvantages limit AI’s ability to provide compelling user experiences. Therefore, on-device AI is highly desirable to enable always-available, responsive, and privacy-preserving intelligent services. In the SRA AI Center, our research aims to develop on-device AI models that run solely on edge devices with fast response, enhanced reliability, efficient bandwidth usage, and protected user privacy. We develop model compression technologies for the now-mainstream Transformer models, as well as lightweight Transformer model architectures. We are also pushing the boundary further to make AI models run on chip (namely TinyML), which will go a long way toward saving power in always-on services. In addition, we work with universities to develop more fundamental on-device AI and TinyML technologies.
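As a minimal sketch of one compression building block, the idea behind quantization can be shown with symmetric per-tensor int8 quantization of a weight matrix in pure NumPy. This is only an illustration of the principle; the mixed-precision quantization methods mentioned above are considerably more sophisticated.

```python
# Minimal sketch: symmetric per-tensor int8 quantization of weights,
# one building block of Transformer model compression (illustrative only).
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float32 weights to int8 plus one scale factor (~4x smaller)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for computation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()  # rounding error is bounded by ~scale/2
```

The storage drops from 4 bytes to 1 byte per weight at the cost of a small, bounded reconstruction error, which is the basic trade-off that more advanced per-layer and mixed-precision schemes tune automatically.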

AI Generalization and Commercialization: A key challenge in commercializing and productizing AI is the limited generalizability of statistical AI methods, which are restricted to the narrow set of data they are initially trained on and generalize poorly to unseen data. In practice, real-world user data typically has a different distribution from the limited training data, and users often make personalized requests to satisfy their different needs. In both cases, although a model still performs well on data similar to the “old” training data, it becomes dramatically less useful when it cannot make correct predictions on real-world user data. In the SRA AI Center, we have developed hybrid deep-learning-and-rule methods for both natural language understanding (NLU) and natural language generation (NLG), and we are developing continual learning technology to adapt models to user needs without forgetting previously learned capabilities. We also work with universities to develop more generalizable AI models that tackle real-world data drift and concept drift scenarios in various applications.
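A simple, widely used baseline for recognizing such unfamiliar inputs (far simpler than methods like generalized ODIN, and shown here only to illustrate the idea) is to threshold the model's maximum softmax probability and reject low-confidence predictions as out-of-distribution:

```python
# Illustrative out-of-distribution rejection baseline: threshold the
# maximum softmax probability (real methods like generalized ODIN refine this).
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict_or_reject(logits: np.ndarray, threshold: float = 0.8):
    """Return the predicted class index, or None when the model is not
    confident enough (treated as out-of-distribution / unsupported)."""
    probs = softmax(logits)
    if probs.max() < threshold:
        return None
    return int(probs.argmax())
```

A sharply peaked logit vector is accepted, while a nearly flat one (the model is unsure) is rejected, letting a deployed system fall back to rules or a clarifying question instead of a wrong prediction.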

Q: Can you tell me about your main achievements and the most rewarding moments in your research?

Our research has produced 70+ papers in top-tier conferences and 35+ filed patents, along with multiple external and Samsung internal awards.

Natural Language Understanding (NLU): Our work CRUISE (Cold-Start New Skill Development via Iterative Utterance Generation) introduced the world’s first cold-start natural language generation (NLG) techniques to support ordinary software developers without linguistic expertise. CRUISE reduced human effort by more than 50% while improving skill NLU performance by 27% using automatically generated natural language. In addition, we have developed many more advanced NLU and NLG technologies published in top-tier NLP and AI conferences, such as a joint intent detection and slot filling model (NAACL 2018), a user-information-augmented NLU model (INTERSPEECH 2018), NLU with the capability of learning out-of-vocabulary words (IJCAI 2018), iterative delexicalization for NLU (INTERSPEECH 2019), semantic response generation (EMNLP 2020), and so on. CRUISE received a best demo paper nomination at ACL, the top NLP conference, in 2018.

Combining natural language with visual data, we have developed many advanced language-and-vision multimodal technologies published in top-tier AI conferences, such as HINT, which leverages explanations to improve the grounding capability of vision-and-language models (ICCV 2019), and visual language recommendation (KDD 2019) along with its attribute-enhanced model (ACM MM 2019) and reward-constrained model (NeurIPS 2020). We also participated in the 2019 ICCV challenge “Linguistics Meets Image and Video Retrieval,” where our team won first place.

On-device AI: We focus on developing fundamental on-device Transformer models, and our work has been accepted at top-tier AI conferences. We developed an automatic mixed-precision quantization method (IJCAI 2021), weighted low-rank factorization for language Transformer models (ICLR 2022), the lightweight Transformer architecture DictFormer with a shared dictionary (ICLR 2022), the lightweight multimodal Transformer architecture LiteMDETR (CVPR 2022), and so on. Our earlier work on privacy-preserving machine learning has been published in many top-tier AI and security conferences, including privacy-preserving personalized recommendation (ICDM 2014), private infinite data analysis (CIKM 2015), the practical differentially private framework EpicRec (CCS 2016), the secure neural network SAFENET (ICLR 2021), and so on. Our private infinite data analysis was awarded ACM CIKM Best Paper Runner-Up in 2015.

AI Generalization and Continual Learning: We have developed various continual learning technologies published in many top-tier AI conferences, including SkillBot, which teaches agents to learn skills via user demonstration (NAACL 2019) and across apps (MobiSys 2019), a progressive NLU model that continuously learns new slots (EMNLP 2019), generalized ODIN for detecting out-of-distribution data (CVPR 2020), a generalized NLU model with the capability of rejecting unsupported utterances (ACL 2021), data-free class-incremental learning (ICCV 2021), hyperparameter-free continual learning for NLU (NAACL 2022), and so on.

More importantly, the technologies we develop are commercialized in various Samsung products. Our advanced NLU models have been applied in Samsung’s AI assistant across all Samsung devices.

Q: What is your vision for the future and the goal you want to achieve?

Located in the heart of Silicon Valley, SRA is at the forefront of cutting-edge technology that impacts the future. The SRA AI Center’s vision is to “develop a next-generation AI agent that can approach human-level intelligence, even with limited brain capacity.”

We envision that future AI agents will be equipped with knowledge and cognition to intelligently react, adapt, and respond to human commands and interactions in real-world contexts. Future AI agents can leverage knowledge and explainable AI to enhance visual and voice understanding without the need for large amounts of training data, thereby improving user engagement with, and trust in, AI agents. They can also quickly learn and accomplish a variety of tasks in diverse environments without repeated human teaching and involvement (e.g., autonomous robot navigation, autonomous robot manipulation, etc.).