Interview

Time-of-flight (ToF) image sensor for mobile phone applications revolutionizes mobile entertainment content with its capability to accurately capture not only figures and backgrounds, but also body gestures

February 22, 2022

An image sensor based on time-of-flight technology (ToF image sensor) uses a laser beam to accurately measure the distance to the target object. It is already used in some high-end mobile devices for photography features such as auto-focus and background blurring. While it enables the camera to capture objects and backgrounds as well as movements of body parts, it has struggled to make its way into popular-model devices.
Sony Semiconductor Solutions Group (hereafter “the Group”) took up the challenge and sought a solution by developing ground-breaking apps for the ToF image sensor for mobile applications. A large-scale project was launched, connecting teams in Japan and four Chinese cities: Shanghai, Beijing, Shenzhen, and Chengdu. We asked what the project aimed to achieve and how the apps were created across such great distances.

Profile

H.Kamano
Sony Semiconductor Solutions Corporation
Mobile Sensing Business Division
Main line of work: developing new business opportunities for ToF image sensors for mobile applications
Responsibilities: application businesses at headquarters
H.Doi
Sony Semiconductor Solutions Corporation
System Solutions Business Division
Main line of work: development of PoC, SDK, and recognition processes
Responsibilities: SDK development and app technical supervisor
Ivy Yu
Sony Semiconductor Solutions (Shanghai) Limited
Business Development Division
Main line of work: investigation of apps for ToF image sensors for mobile applications, researching use cases, and coordinating with the app development company
Responsibilities: development of business opportunities for ToF image sensors for mobile applications
Pop Zhang
Sony (China) Limited
Creative Design Team 2
Main line of work: creative & communication design
Responsibilities: design producer

The ToF image sensor, capable of recognizing figures, backgrounds, and even gestures, comes with great potential for diverse applications

What is a ToF image sensor?

Kamano:ToF, short for “time of flight”, is a technology that obtains highly accurate depth data by measuring the distance to, say, a person or an object. Typically, it involves a laser and a sensing device, and there are several types of sensing systems.
iToF, or indirect ToF, is one of them. It emits a laser beam toward an object, catches the reflected beam, and then calculates the phase difference between the emitted and reflected light to obtain the distance to that object.
*In the text below, “ToF image sensor” refers to the iToF image sensor.
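
As an editorial illustration of the phase-difference principle described above, here is a minimal sketch in Python. It assumes the common textbook model of a continuous-wave iToF system, in which the distance is d = c × Δφ / (4π × f), where f is the modulation frequency of the emitted light. The function names and the 100 MHz modulation frequency are hypothetical choices made for this example; they are not taken from the interview or from any Sony SDK.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def itof_distance_m(phase_shift_rad: float, modulation_freq_hz: float) -> float:
    """Convert a measured phase shift into a distance (textbook CW-iToF model).

    The light travels to the object and back, so the round trip covers 2 * d.
    That is why the denominator is 4 * pi * f rather than 2 * pi * f.
    """
    if modulation_freq_hz <= 0:
        raise ValueError("modulation frequency must be positive")
    # A phase can only be measured modulo one full cycle (2 * pi).
    wrapped_phase = phase_shift_rad % (2 * math.pi)
    return SPEED_OF_LIGHT_M_PER_S * wrapped_phase / (4 * math.pi * modulation_freq_hz)


def unambiguous_range_m(modulation_freq_hz: float) -> float:
    """Largest distance measurable before the phase wraps around."""
    if modulation_freq_hz <= 0:
        raise ValueError("modulation frequency must be positive")
    return SPEED_OF_LIGHT_M_PER_S / (2 * modulation_freq_hz)


if __name__ == "__main__":
    freq_hz = 100e6  # hypothetical modulation frequency, for illustration only
    print(f"distance at half-cycle shift: {itof_distance_m(math.pi, freq_hz):.4f} m")
    print(f"unambiguous range: {unambiguous_range_m(freq_hz):.4f} m")
```

Because the phase wraps every full cycle, a single modulation frequency can only resolve distances up to the unambiguous range; a lower frequency extends that range, generally at the cost of depth resolution.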

What are ToF image sensors used for?

Kamano:The sensors are used in mobile devices such as smartphones, automotive sensors, and drones.
Their scope of application is now widening into e-commerce, AR effects, and other areas because, on top of simply obtaining depth data, they enable clearer imaging of spaces and objects.
For example, they are used in smartphones for photography features such as auto-focus and background blurring, while in automotive applications they are used for hand-gesture recognition.

Doi:Because the ToF image sensor has an excellent recognition capability for hand gestures, when applied in automotive applications, it can facilitate the operation of devices by hand gestures, like turning a hand clockwise or anti-clockwise to control the audio volume.

What were the issues the ToF image sensor faced despite its high recognition performance?

Kamano:While the contexts in which the technology could be leveraged were steadily growing, there was no definite killer app that people would put to everyday use. This resulted in a chicken-and-egg situation: smartphone manufacturers were not keen to integrate ToF image sensors for lack of killer apps, while app developers had little incentive to develop apps for the sensor because it was not adopted in many smartphones.
Given this situation, we thought that we should encourage the development of apps that leveraged ToF image sensors to incentivize both smartphone manufacturers and app developers.

Doi:There were also obstacles from the development point of view. Laser emission increases power consumption, and so do depth sensing and processing. For the smartphone manufacturers, it also means more space is needed to accommodate the sensor. There are, of course, additional advantages ToF image sensors can bring, but these advantages did not add enough value to extend the scope of application to all smartphone models. This resulted in the current situation, in which the sensor is installed in some high-end models but not in other, more popular ones.
The camera features such as auto-focus and background blurring do not necessarily need a ToF image sensor. Such features may rely on the contrast information in the RGB image to focus on the target object or AI processing to delineate a person in the frame and blur the background. Some people think that such AI technology is sufficient, and this is partly making the situation harder for the ToF image sensor.

Kamano:That is true, but we have smartphone manufacturers who are interested in integrating the ToF image sensor if there are interesting apps to use it. This was our incentive to take up the challenge and develop apps in order to topple the first domino piece to establish and expand an app market for the sensor.

Smartphone manufacturers and app developers—do they think that AI processing is sufficient for their needs?

Doi:I think that they are not necessarily against ToF image sensors. One customer recognized the value of depth sensing and immediately adopted a camera with a ToF image sensor for their smartphones, and we provided them with an SDK (software development kit)*1 to help with their app development for this sensor. However, the app development saw little progress because there were not enough smartphones with ToF image sensors. After that, they switched to using standard cameras and AI processing for most things. Meanwhile, a different customer has adopted the ToF image sensor. They develop their devices and apps together through vertically integrated production. This allows them to decide every detail themselves, from the selection of cameras and sensors to be mounted on their smart devices to the design of dedicated apps. In this way, they simply try to make the most of the ToF image sensor.
There is no doubt that our depth sensing technology has clear advantages in AI processing. It is therefore important for us to continue pursuing its value and to build up both our know-how and a pool of app developers.
*1: Software development kit

Developing apps that cannot be realized without a ToF image sensor and that are fun to use

What did you work on to develop the apps?

Ivy:Our team was responsible for the planning, creative design, and development of the SDK, while our app development partner undertook the release phase, and we worked together. At the beginning, our idea was to create apps that were fun to use so that many people could benefit from the ToF image sensor.
To do this, we started by educating ourselves, with Doi-san and engineering staff in China, about the technical aspects of the sensor, to know what it could do. Then, we researched user trends thoroughly on various platforms, such as TikTok, to find out what was popular and what people were interested in.

Pop:Our design team also spent a lot of time at the beginning studying what ToF image sensors could achieve. It was particularly important to know the best ways to leverage the sensor’s unique characteristics. We thought that the app development should be based on such knowledge.
Another important point was knowing our target users. To come up with apps that appealed to users, we definitely had to know what kinds of apps they liked and in what ways they enjoyed using them.
Luckily, our Design Center in China has an adequate development line, and we aimed to create one-of-a-kind apps that overcame most technical challenges by making the most of the facility to test as much as possible.

How was the project managed between Japan and China?

Kamano:Given that many of the target smartphone manufacturers for this project were China-based companies, the planning phase was mainly led by local talent, such as Ivy-san and Pop-san. Doi-san worked very closely with the development team in China, particularly to develop ideas for the effects to be achieved through the apps.

Doi:What happens is that our development team in Japan builds the programs that are necessary for the app development and packages them into an SDK, and in China, the actual development is undertaken by a specialist company. Due to the various restrictions imposed by today's global situation, all communications inevitably had to take place online. I thought it would be difficult to give technical support to this company directly. So, I arranged for a technical team to be set up at the Design Center in Shanghai and taught them the technical details. The idea was that the members of this team would then provide the app development company with the necessary technical support. The overall progress of the project was managed from Atsugi, Japan, using a task management app, and I would only get involved directly when there were issues they could not resolve. It was my intention to minimize the stress of remote communication on the development team in China.

Were there particular difficulties in collaborating entirely online, without once visiting the site?

Doi:I guess so. For some unknown reason, I came across as a scary man to Ivy-san (chuckle). I was simply giving technical advice, but she felt that I was always angry about something because I kept saying “don’t do this” or “do it more like this.” One difficulty of online communication is that the mood in the room is hard to read from the other side. I also felt the stress of miscommunication that would not happen in face-to-face meetings.

Ivy:Oh, I’m sorry, Doi-san (giggle).
For me, it was very hard to understand the things he told us because I wasn’t so well-versed in technical stuff. I worked very hard to understand him correctly; otherwise, our colleagues in China would not be able to solve their problems. Understanding the technical issues was a difficult challenge.
As he mentioned, I also found it difficult to ensure zero miscommunication in our online meetings, as the project developed simultaneously across five cities: Shanghai, Shenzhen, Beijing, Chengdu, and Atsugi.

Kamano:The effects we try to develop manifest in moving images, but the ideas are communicated and shared in still images until an actual demo is created. So, until then, it is up to each member to fill the gap and imagine how things move in those still images. Naturally, it is impossible for everyone to have exactly the same idea about it.
Once the demo is created, everyone can see the details and give their opinions about some particular aspects, bringing the project forward. So, it was quite often the case that our in-depth discussions had to wait until we saw the actual moving images.

Pop:It was my first time being involved in a project like this. So, to start with, I gathered people from different walks of life and held a workshop. In the workshop, I gave a presentation on what ToF image sensors could do by showing them some conceptual graphics, and we tried to form ideas.
I then prepared draft versions based on these ideas and created imageboards to make proposals. After this, I used Photoshop to develop storyboards that made the movements in each effect easier to understand in specific terms.
Much of my effort went into these storyboards, trying not to miss any detail in each frame, so that others would not imagine something different from them and get confused when they saw the demos.
In terms of the difficulties in communication, I can point out two aspects.
One is language. We had to use three languages in one meeting: Chinese, Japanese, and English. This situation was prone to miscommunication.
Another is the cultural differences. When I found some buzz words or popular jokes on social platforms such as TikTok, these were in Chinese and very difficult to translate into Japanese or English, partly due to the differences in cultural perceptions or customs, and other members could not understand the funny side.

What was the most difficult part in this app development?

Pop:The most difficult part was understanding the technical details. Doi-san provided us with various materials, but it was still difficult to visually understand what kinds of effects could be realized or what the technical limitations of time-of-flight technology were. We went through trial and error over and over again.
Another difficulty was distinguishing between the effects realized through the ToF image sensor and those achievable through RGB imaging. I tried hard to create effects that leveraged the advantages of the sensor, but my ideas tended to be effects that could also be reproduced with RGB imaging. Doi-san had to point out many times that the same could be achieved by RGB color recognition and AI processing. So, it was really challenging to come up with ideas for interesting effects that could only be achieved by ToF image sensors, making the best use of the accurate ranging of target objects and three-dimensional mapping.

Doi:I set high standards for the apps from the planning phase particularly because it would not work if the apps could not make the advantages of the ToF image sensor shine. I think the colleagues at the Design Center in China had to work very hard to absorb the technical knowledge, without which it would be difficult to distinguish what could be done by the ToF image sensor and not by the RGB color recognition and AI processing.

Ivy:At the Design Center, we spent about three months learning the technology. I think we could not have created apps that make the best use of the sensor’s unique properties without that learning.

Apart from the technical understanding, this project also required that the apps were fun to use.

Doi:Yes, but the ideas proposed by the members of the Design Center were basically interesting. So, the main determining point was whether they were based on the ToF image sensor’s unique attributes.

Pop:I had more than 30 ideas for the apps. I then selected about 25, which I developed into storyboards. Five of them made it to release in the end.

Did the app development go smoothly?

Doi:We had some problems. The program development involved the recognition of a finger-snapping gesture, which allowed the app to switch the background image. The first internal release did not recognize the gesture well. The problem was basically due to the ambiguity involved in defining the finger-snapping movement. Recognition is performed by an AI model that learns the specific hand movements of finger snapping from many video clips of the gesture that are fed to it.
I was carrying out this learning process by myself, but I did not realize at the beginning that different people had their own ways of snapping their fingers. I would do it with the palm facing up and the middle and index fingers pointing up, and make the snapping sound with the middle finger and thumb. There are people who do it differently. Some people have the index finger curled in, and others do it with the palm facing downward. The initial model seemed fine in Japan, but this was because the people around me happened to have the same snapping gesture as mine. When the teams in China tried it, we found out that their snapping gestures were quite different.
So, I went back to the learning process and tried every snapping gesture possible, changing angles, fingers, etc. and recorded them over and over again, until my finger joints became swollen like balloons.

Depth information would allow many creative people to enhance their creativity not only in photos and movies, but potentially in more areas, such as music and other expressions of their emotions

What do you have in mind about possible future applications for ToF image sensors?

Kamano:I think the keyword is “bridging between the real and virtual worlds.” Today, for this interview, Doi-san and I met up and sat in the same room. We did this because a lot more can be communicated by meeting in person, including passion and nuance. One day, when these aspects of communication can be conveyed and understood in online meetings, there will be a great many possibilities for enhancing communication beyond temporal and spatial obstacles. For this to be realized, three-dimensional recognition, as opposed to two-dimensional, should be a crucial part, and I believe that ToF image sensors will be a key player in the endeavor.

Doi:I currently work on the development of apps for VTubers. The apps available today use face recognition technology to move avatars. The app we are currently developing will be superior to those because the sensor recognizes not only facial expressions but also the movements of the hands and arms, so the avatar can strike more poses and its facial expressions will be richer. Emotional expression will certainly be enriched as the movements of more parts of the body, like the hands and torso, can be recognized. So, emotional expression will be one of the areas for the ToF image sensor to work on.

Ivy:The metaverse*2 is very popular in China at the moment. People use this virtual space to communicate with each other or operate their businesses. I think that the ToF image sensor would have a great chance to apply its technology in this direction. I believe that the sensor will add more value by enabling more effective interactive communication for users when it is combined with AR and other technologies.

*2: Refers to a three-dimensional virtual space created in cyberspace and its related services.

Pop:So far, the depth information gained by the ToF image sensor has been deployed in visual effects. If the scope is expanded in the future to audio and other modes of emotional expression, this would allow totally new forms of expression, and there is a good chance that more and more people will want to use the sensor to create their own unique ways of expressing their emotions.
For example, many artists are keen to use depth information in the audio AR environment, to know the best position in the sound field to place the sound source.

What will be your next challenges?

Kamano:I believe that our lives will be more convenient if people, things, and spaces are recognized in detail and the data about them are used effectively. I would like to be part of initiatives to enrich people’s lives through the use of ToF image sensors with other imaging technologies, platforms, and content that Sony can offer.
Our customers, too, often mention that they want to see “something more interesting” through combinations of Sony’s various technologies. I take it as their high expectation for us, and it is a realistic possibility that breathtaking new ideas will come into being through collaborations between our technologies and creators across the world.

Doi:I think it is very important that many people have the chance to experience the technology and use their imagination. We must develop and release an easy-to-use SDK. The more developers use it, the more technical data will be accumulated online, and I believe this is how an ecosystem will emerge for the development of apps that leverage depth-sensing technology.
It is like the case of smartphones, where only a few apps were developed in the early days, but as the development environment matured, more and more businesses and creators started developing and releasing various apps. It is the same story for AI: once an open SDK was made available, progress in AI development and deployment gained impetus all at once. So, my challenge will be the development of a good SDK that allows people to utilize depth information easily with their smartphones so that our sensors will become closer to many people.

Ivy:Similar to what Doi-san just said, I would like to work on the development of a depth technology community. Today, many artists are interested in the potential of this technology. So, I would like to contribute to enriching it and expanding the circle of people fascinated by it.

Pop:I would like to significantly improve modes of expressing the advantages of ToF image sensors. I also hope to see the sensor expanding the scope of its application beyond mobile devices. Smart TV, for example, is very popular in China today, and ToF image sensors will significantly enhance what it can do. I would love to explore these possibilities in the future.
