[SR Talks] ⑥ Interview with a 3D Computer Vision and Augmented Reality Expert at Samsung R&D Institute India-Bangalore

Q: Please briefly introduce yourself, the Samsung R&D Institute India-Bangalore (SRI-B), and the kind of work that goes on there. What projects are you currently working on?

I am Lokesh Boregowda, Head of the AR Vision Labs at Samsung R&D Institute India-Bangalore (SRI-B). I am a Computer Vision and Image Processing engineer with a doctoral degree (obtained in 2002) from the Indian Institute of Science, Bangalore. I have around 25 years of total experience (more than 20 years in industry and more than 5 years in academia), and I have been with Samsung for the past eight years and nine months.

SRI-B, led by Managing Director Dipesh Shah, is one of Samsung's largest Global Research Centers (GRCs). Having celebrated its 25th anniversary last year, it has been the backbone of Samsung Electronics in multiple Centers of Excellence (CoEs), such as Communications (i.e., 5G, network), Multimedia (i.e., camera, vision, augmented reality [AR]), Intelligence (i.e., voice, on-device, data), Internet of Things (IoT) (i.e., Cloud and IoTivity), and Services (i.e., server technology, Ad PF). We have been spearheading Samsung's business in designing, developing, and commercializing smartphone solutions, drawing on our deep expertise in the CoE areas listed above. Over the past 26 years, SRI-B has transformed itself from the software service center it was in the 1990s into the advanced R&D center it has been for the past 6–7 years, providing global product differentiation through cutting-edge solutions and services, as well as Indian business enablement through local insights and partnerships.

Since joining Samsung, I have been driven by the conviction that multi-camera setups are the future of camera and multimedia solutions. Beyond this, the possibility of achieving realism in solutions only through the inclusion of depth as a key entity in vision and multimedia motivated me to pursue deep research into stereo cameras and their use in providing real-life 3D effects. Many successful advanced proposals in this direction, executed with a business focus, have brought me to today's wonderful world of AR. Today, I proudly lead the AR Vision Labs at SRI-B, where cutting-edge research has gone into creating many vision solutions in Samsung smartphones, such as Virtual Shot, panorama, motion panorama, and the AR Zone. This progress over the years has resulted in the development of AR solutions such as AR Measure, AR Doodle (i.e., 3D space drawing, 3D text), and AR Canvas (i.e., object picking, placement, and anchoring). These solutions were made possible by expertise in 3D vision, multi-view geometry, and advanced rendering, enabling consumers to draw in a 3D virtual space.

Ongoing research and development (R&D) includes the design and development of AR vision engines for smartphone AR solutions, as well as graphic rendering for realistic depiction with shadows, occlusion, real-life material effects, and particle/physics effects, enabling consumers to unleash their imagination in a 3D space. Added to this is the new dimension of gesture recognition (i.e., body and hand gestures), which will provide a new modality for immersive, human-centric interactions. While solving challenging problems related to 3D vision, depth, and Digital Human Avatars (DHAs), we have kept alive the urge and fire to deliver consumer-centric solutions that enable true realism.

Q: Please tell me about the importance of your research field or technology.

The 3D vision and 3D graphics competencies built in the AR Vision Labs are the key to creating futuristic, realistic 3D solutions that can provide an immersive experience to the end consumer. AR Canvas is the seed for building large-scale Mixed Reality (MR) / Extended Reality (XR) experiences. Further, bringing in the human factor using body and hand gestures as additional modalities, alongside voice, will pave the way for multimodal experiences similar to human interactions. These are key component technologies for the latest megatrend, the Metaverse, wherein virtual interactions and DHA-type technologies are the basis for an increasingly immersive experience. Given this, and the fact that our expert members are solving several challenges, such as object–object and human–object interactions, that are enablers for building XR experiences, our team is at the forefront of delivering key solutions in the AR–MR–XR space. The AR graphics experts in our team are now solving the next level of technical problems related to physically based rendering, various material effects, physics effects, particle effects, illumination, object shadows, and object occlusion, all of which are critical to enhancing rendering realism. The base technologies include simultaneous localization and mapping (SLAM), head/hand/eye tracking, spatial understanding, stereo correspondence, perspective distortion removal, 3D machine learning, etc., which are relatively complex challenges given the tight performance budgets that must be adhered to on embedded devices.
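As a flavor of the multi-view geometry building blocks mentioned above, the sketch below triangulates a 3D point from a matched pixel pair in two calibrated views using OpenCV. This is only an illustrative example: the intrinsics, the stereo baseline, and the pixel correspondences are hypothetical placeholder values, not details of any actual Samsung pipeline.

```python
# Minimal two-view triangulation sketch (illustrative values throughout).
import numpy as np
import cv2

# Intrinsics of a hypothetical calibrated camera (fx, fy, cx, cy).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Projection matrices P = K [R | t] for two poses: camera 0 at the origin,
# camera 1 offset 0.1 m along x (a stereo baseline).
P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

# Pixel coordinates of the same scene point in each view; in practice these
# come from a stereo-correspondence / feature-matching step.
pt0 = np.array([[300.0], [240.0]])
pt1 = np.array([[268.0], [240.0]])

# Linear triangulation: recover the homogeneous 3D point and dehomogenize.
X_h = cv2.triangulatePoints(P0, P1, pt0, pt1)
X = (X_h[:3] / X_h[3]).ravel()
print("Triangulated 3D point (metres):", X)
```

The disparity between the two pixel observations, together with the known baseline and focal length, is what fixes the point's depth; a full SLAM or spatial-understanding stack repeats this kind of computation across thousands of tracked features per frame.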

Overall, all the aforementioned subcomponent technologies being designed and developed at the AR Vision Labs are expected to deliver cutting-edge, first-of-their-kind, immersive MR experiences to end users in indoor / limited-mobility scenarios.

Q: Can you tell us about the main achievement and most rewarding moment in your research?

One of the most fulfilling moments was the day my first solution, Virtual Shot, was commercialized in Samsung flagship phones. It was my dream to deliver something new and different for end users, and Virtual Shot had all the makings of a totally new, first-of-its-kind 3D vision feature. The trials and travails of designing and developing this solution were quite exhilarating.

However, my greatest moment came when I led the design and development of a multi-view geometry pipeline, along with a rig setup using multiple smartphone cameras, built on 2–3 years of in-depth 3D vision R&D for exploratory work and future research. This is a unique setup using 100 S20 smartphone cameras. What made it even more special was the fact that it was built amid the pandemic, between the first and second waves of COVID-19 in 2020, overcoming several logistical and processing challenges, such as dealing with vendors, gaining approvals, and even spending nights and weekends at the office when the majority of employees were working from home. I really appreciate my team of five members, led by Sujoy Saha, who stood by me through the full journey of building this rig in spite of many roadblocks. We successfully completed the setup and tested it by producing 3D reconstructions of various objects.
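To give a sense of the core computation behind reconstructing a point observed by many rig cameras at once, here is a minimal sketch of N-view linear (DLT) triangulation on a synthetic circular camera array. The eight-camera layout, poses, and intrinsics are invented for illustration and are not details of the actual 100-camera setup.

```python
# N-view DLT triangulation on a synthetic ring of cameras (illustrative only).
import numpy as np

def triangulate_dlt(proj_mats, pixels):
    """Triangulate one 3D point seen in N views by stacking the DLT
    constraints (u*P[2] - P[0], v*P[2] - P[1]) and solving A X = 0 via SVD."""
    rows = []
    for P, (u, v) in zip(proj_mats, pixels):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]                       # null-space vector = homogeneous point
    return X[:3] / X[3]

def look_at(center):
    """World-to-camera rotation R and translation t = -R C for a camera at
    `center` whose optical axis points at the world origin."""
    z = -center / np.linalg.norm(center)
    x = np.cross([0.0, 1.0, 0.0], z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z])
    return R, -R @ center

K = np.array([[800.0,   0.0, 320.0],     # shared intrinsics (hypothetical)
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
X_true = np.array([0.05, -0.02, 0.10])   # ground-truth point to recover

# Eight cameras on a circle of radius 2 m, all looking at the origin.
proj_mats, pixels = [], []
for theta in np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False):
    C = np.array([2.0 * np.cos(theta), 0.5, 2.0 * np.sin(theta)])
    R, t = look_at(C)
    P = K @ np.hstack([R, t[:, None]])
    u, v, w = P @ np.append(X_true, 1.0)  # project the ground-truth point
    proj_mats.append(P)
    pixels.append((u / w, v / w))

print("Recovered point:", triangulate_dlt(proj_mats, pixels))  # ~= X_true
```

Repeating this over dense correspondences across all views is the backbone of the kind of object reconstruction the rig was tested with; the over-determined system (two equations per camera) is also what makes many-camera rigs robust to noise in any single view.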

Q: What is your vision for the future, and what goal would you want to achieve?

I foresee a great future for the current series of exploratory research into 3D vision and AR happening across multiple GRCs within Samsung Electronics. The combined efforts of all the GRCs are bound to enable Samsung to spearhead the technologies and solutions that are key to the Metaverse in the coming decade. Together, the GRCs can be a force to be reckoned with if we join hands and collaborate to form a global XR team, harnessing all our collective expertise and knowledge toward this objective.

As an expert in 3D vision and AR, I strongly feel that the best is yet to come with respect to the creation of the ultimate immersive XR experience for the end user, with the greatest 3D realism. This is possible only through the amalgamation of multiple modalities, such as vision, gesture, voice, touch, and inertial sensors, creating a human-like, human-friendly, near-natural 3D experience on smartphones or smart head-mounted devices (HMDs), such as AR glasses. Imagine a realistic, parallel world full of customizable virtual locations where you can interact with any virtual entity through a suitable HMD, while the real you remains in the real world, assimilating the virtual experience through your own DHA, which can talk, act, and behave exactly like you.