Adding a killer feature to grab the market? The Ali speaker may have taken this step too early

(Original title: Adding a killer feature to grab the market? Ali's speaker may have taken this step too early)

As expected, Ali has released a smart speaker after all.

From Amazon's unintentional hit three years ago to Ali's arrival today, the explosion of the smart speaker market was unexpected, but it did happen.

Echo has sold nearly 20 million units in total. Google, Microsoft, and Apple followed suit. After that, domestic software vendors, hardware manufacturers, and content providers raced to enter the market.

Then, at the beginning of this month, Ali officially released its smart speaker, the Tmall Elf X1. The move was both unexpected and reasonable, and Ali's entry makes the battle for the voice entrance more interesting.

In fact, the day before the launch of the 499-yuan Tmall Elf, Leifeng.com published an article titled [Why China's Echo is not yet available, and whether tomorrow's new AI product can bring surprises].

So what surprises does Alibaba's smart speaker offer compared with similar products?

The eye-catching "surprise"

The media reported earlier that, for the sake of this smart speaker, Alibaba's Ma even halted the Pepper robot project, into which billions of dollars had been put, and reassigned its staff to the artificial intelligence laboratory. Yet the product, for all the money spent, seems essentially indistinguishable from Echo and other speakers. Its functions include playing music, ordering takeout, checking the weather, setting alarms, and controlling smart home appliances.

Judging by Tmall Elf's publicity, one very important selling point is a voiceprint recognition function that even Echo does not have.

Ali says that with voiceprint recognition the speaker can tell apart everyone in the home and push different content according to each person's preferences and settings; it can currently identify up to six individuals. Users can also complete shopping payment verification with their own voice. Echo, by contrast, still needs further steps to obtain a user's personal information before it can tell who is speaking.

What makes Leifeng.com curious is why Amazon has not used such a cool feature in Echo.

Amazon reportedly wanted to apply this technology long ago. But according to Amazon employees, and judging by feedback from hardware and software companies working on voiceprint recognition, getting these voice-controlled devices to distinguish different users' voices is a lot harder than they expected.

"Because the device needs to remove noise, echo, and reverberation, it becomes difficult to identify a person by voice," said Vineet Ganju, vice president of Conexant's voice department.

So can voiceprint recognition really carry the Tmall Elf as its key selling point?

It looks doubtful.

Why is voiceprint recognition in doubt?

First, on the algorithm side, Dr. Chen Xiaoliang, founder of Shengzhi Technology, told Leifeng.com in an interview that voiceprint recognition is still a relatively narrow discipline with relatively few applications. Most current research concerns dynamic real-time detection. Dynamic detection methods naturally draw on the principles of static detection methods, but many other algorithms must be added as well, such as VAD (voice activity detection), noise reduction, and dereverberation. VAD detects whether a sound is a human voice; noise reduction and dereverberation exclude environmental interference.

VAD commonly uses two methods: energy-based detection and LTSD (Long-Term Spectral Divergence), with LTSD now the more widely used. Beyond that, feature extraction also requires dynamic time warping (DTW), vector quantization (VQ), and support vector machines (SVM), while modeling requires the hidden Markov model (HMM) and the Gaussian mixture model (GMM).
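To make the energy-based approach concrete, here is a minimal sketch of an energy-threshold VAD. It is only an illustration, not the method used by any product mentioned in this article. The function names, the 10 ms frame length, the 10 dB margin, and the assumption that the first few frames contain only background noise are all choices made for the example.

```python
import math


def frame_energies_db(samples, frame_len):
    """Return the log energy (in dB) of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        # The small constant avoids log10(0) on silent frames.
        energies.append(10.0 * math.log10(mean_sq + 1e-12))
    return energies


def energy_vad(samples, frame_len=160, margin_db=10.0, noise_frames=5):
    """Flag frames whose energy exceeds the noise floor by margin_db.

    Assumption: the first `noise_frames` frames hold background noise only,
    so their average energy serves as the noise-floor estimate.
    """
    energies = frame_energies_db(samples, frame_len)
    noise_floor = sum(energies[:noise_frames]) / noise_frames
    return [e > noise_floor + margin_db for e in energies]


def make_test_signal(sample_rate=16000):
    """Synthetic input: 0.1 s of weak hum followed by 0.1 s of a loud tone."""
    quiet = [0.01 * math.sin(2 * math.pi * 50 * n / sample_rate)
             for n in range(1600)]
    loud = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate)
            for n in range(1600)]
    return quiet + loud


if __name__ == "__main__":
    flags = energy_vad(make_test_signal())
    print(flags)  # the first ten frames are quiet, the last ten are "voice"
```

A threshold like this is easily fooled by loud non-speech sounds, which is one reason spectral methods such as LTSD are often preferred.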

It is not difficult to see from the models above that voiceprint recognition is still a data-driven pattern recognition problem, so it shares the problems common to all pattern recognition. There are also some physical and computational problems that are not well resolved.

Voiceprints are highly distinctive in principle, but existing equipment and techniques still struggle to tell them apart accurately. In particular, a person's voice is variable and is affected by physical condition, age, emotion, and the like. If the environment is noisy or several speakers are mixed together, voiceprint features are also difficult to extract and model.

Chen Xiaoliang believes that deep learning has greatly improved pattern recognition, and related algorithms are even available as open source. However, research progress in voiceprint recognition remains limited, still constrained by voiceprint acquisition and feature construction.

Dr. Chen Dongpeng, senior scientist at voiceprint recognition provider SpeakIn, said that voiceprint recognition is susceptible to many influences in real environments, including noise, multiple people speaking, physical condition, and emotion, and that these remain genuinely tricky. Some companies, SpeakIn among them, are working hard to mitigate these industry-wide problems through software and hardware algorithms, and with the support of deep learning the whole industry is progressing faster than before. Dr. Chen added that voiceprint recognition is only one link in the chain, and that judging the result depends on factors such as the product itself and the usage scenario.

At the product level, Ximalaya, which has just released its Xiaoya smart speaker, also gave its view. Li Haibo, vice president of Ximalaya, said the company has worked on applying voiceprint recognition for a long time but cannot make it fully accurate. It is still at an experimental stage, and the results are mediocre.

Speaking of Ali's Tmall Elf, he said that far-field speech recognition is usually effective within three to five meters, with noise handling at around 70 dB; when ambient noise and the speaker's own sound exceed that level, the device is hard to wake. Far-field voiceprint recognition is even less stable at the same distance. The living room, the television area, the kitchen, and the bedside are currently the four common scenarios for smart speakers. Except at the bedside, the actual distance in the other three scenarios is generally more than three meters, so the practical value of the Ali speaker's voiceprint recognition remains unknown.

As for why Amazon Echo hasn't used this feature yet, Li Haibo believes the technology is not yet mature: it is dazzling, but very risky.

In addition, Sensory CEO Todd Mozer also believes that it is difficult for far-field voice devices such as Echo to identify who is talking. As the signal-to-noise ratio falls, the performance of the device deteriorates.

"The process of denoising and separating speech from noise has a very big impact on the identification of users. So far, there is no product on the market that handles user identification, far-field speech and noise processing at the same time," said Mozer.
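The link between distance and signal-to-noise ratio can be shown with a small calculation. This is a rough sketch, not a model of any real device. It assumes a free-field inverse-distance law for speech amplitude, a constant noise level, and no reverberation. The function names, tone frequencies, and amplitudes are invented for the example.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a signal."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


def tone(amplitude, freq_hz, n_samples=1600, sample_rate=16000):
    """A pure sine tone standing in for speech or background noise."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def snr_at_distance(distance_m, noise_amplitude=0.1):
    """SNR at the microphone for a talker `distance_m` meters away.

    Assumption: speech amplitude is 1.0 at 1 m and falls as 1/distance,
    while the background noise amplitude stays constant.
    """
    speech = tone(1.0 / distance_m, 200)
    noise = tone(noise_amplitude, 50)
    return snr_db(speech, noise)


if __name__ == "__main__":
    for d in (1, 3, 5):
        print(f"{d} m: {snr_at_distance(d):.1f} dB")
```

Under these assumptions the ratio falls from about 20 dB at one meter to about 6 dB at five meters, before any echo or reverberation is taken into account.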

On the practical application of far-field voiceprint recognition, Liu Bin, of the Chinese Academy of Sciences Institute of Automation and a senior intelligent voice algorithm expert at Limits Yuan, shared his views with Leifeng.com. Dr. Liu said that far-field speech is disturbed by noise, echo, and reverberation, which makes both speech recognition and voiceprint recognition challenging.

At present, the reliable range of far-field speech recognition is about 3 to 5 meters, and voiceprint recognition is harder still. Speech recognition aims to understand the content of the speech signal, and that content is closely tied to the formants (resonance peaks), which are concentrated mainly in the low frequency band. Speech energy in the low band is relatively high, so it is relatively little affected by external interference. Speaker-related features, by contrast, are concentrated more in the high band, where speech energy is relatively low and more easily affected by interference, which makes voiceprint recognition at a distance more challenging. He added that everyone's speech characteristics change with circumstances (pronunciation during a cold, for example, certainly differs from normal), so even near-field voiceprint recognition cannot be called fully mature, and it is certainly not easy to use in far-field conditions. In general, for most users, voiceprint recognition in a smart speaker is not a must-have, and from a technical point of view it is not yet mature.

So why has Ali put far-field voiceprint recognition, which is even less mature than far-field speech recognition, into its speaker?

Besides using the technology to meet users' individual needs and to differentiate itself in the market, Dr. Liu pointed to Ali's accumulated strengths in e-commerce: identity authentication for e-commerce is also a key direction for Ali.

Given the huge resources of Taobao and Tmall, it is not unreasonable for Alibaba to bring voice into the shopping scenario. But judging from Amazon's earlier use of this scenario on Echo, users do not shop by voice often, and the experience is not ideal.

Hu Yu, an executive at iFlytek, said in an interview with Leifeng.com that, looking at the market as a whole, the shopping scenario is still very immature on speakers. Real demand has to come from behavior that users genuinely need. Although Echo is selling well, surveys found that what users actually use most are simple tools such as setting reminders and checking the weather. Amazon's earlier vigorous push of Echo's voice shopping function did not take off: when users try to buy things through voice interaction, the various steps and scenarios turn out to be very troublesome, far less convenient than operating directly on a screen.

This is why many companies have been stressing the combination of voice interaction and visual presentation. Without a visual display, users do not have enough information, and it is very difficult to complete complex operations. Some functions and scenarios were therefore imagined by product makers themselves; once they were put into practical use, users' thinking and behavior turned out not to match the product design.

It follows that if users have not yet formed the habit of shopping by voice, and voiceprint technology itself is still a problem, then adding voiceprint recognition for e-commerce will, at a guess, struggle to stand the test of the market.

Overall, Ali's starting point for incorporating voiceprint recognition into its smart speaker is a good one: in a wave of homogenized products, it plays a feature card that neither Echo nor JD.com holds, using cutting-edge technology to enhance competitiveness.

However, with both the technology and the market still immature, Ali's decision to graft voiceprint recognition onto the speaker may have come a step too early.
