Recently, a speaker verification technology jointly developed by Pensees and the HLT Lab of the National University of Singapore set a new world record on the RSR2015 dataset. Compared with current mainstream algorithms, it still performs well under the dataset's various evaluation protocols while using 50% or less of the training data. This breakthrough demonstrates the growing depth and breadth of Pensees' technology reserves.
01
The Cool Tech of Voiceprint Recognition
Speaker verification, also known as voiceprint recognition, is a technology that identifies a speaker by voice. It identifies an unknown voice by analyzing the characteristics of one or more speech signals. In short, it determines whether a given utterance was spoken by a particular person.
As a biometric identification technology, speaker verification can be used widely in public security, finance, smart homes and smart offices, for purposes such as criminal investigation, telecom fraud prevention, public security prevention and control, identity verification, payment, access control, and conference recording. Compared with face recognition and fingerprint recognition, voiceprint collection requires only a microphone module, making it cheaper, more convenient and more secure. In some specialized fields, speaker verification even has unique advantages.
In practical applications, speaker verification systems usually require users to enroll by recording their voice in advance, so there is strong practical demand for systems that can be trained with less data. In this sense, the new algorithm proposed by Pensees has high practical value, as it achieves high verification accuracy with a significantly reduced amount of training data.
02
New Record Set on RSR2015 Dataset
Collected and published by the Institute for Infocomm Research (I2R) of Singapore's Agency for Science, Technology and Research (A*STAR), the RSR2015 (Robust Speaker Recognition 2015) database is one of the most widely used large-scale speech databases in speaker verification and related research. It is designed to provide data resources for speaker verification and supports several types of evaluation protocols.
The most commonly used evaluation metric in speaker verification is the Equal Error Rate (EER). As the decision threshold is adjusted, the false rejection rate (FRR) and the false acceptance rate (FAR) move in opposite directions; at the threshold where the two rates are equal, their common value is the EER. In general, the lower the EER, the better the verification accuracy of the system.
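To make the EER definition concrete, here is a minimal sketch, not Pensees' evaluation code, of how an EER can be estimated from two lists of similarity scores: one for target (genuine) trials and one for non-target (impostor) trials. It assumes a trial is accepted when its score is at or above the threshold. Because real score lists are finite, FAR and FRR may never be exactly equal, so the sketch picks the candidate threshold where they are closest and reports their average. The function name `compute_eer` is hypothetical.

```python
def compute_eer(target_scores, nontarget_scores):
    """Estimate the Equal Error Rate by sweeping candidate thresholds.

    Assumption: a trial is accepted when score >= threshold.
    FRR = share of target trials rejected (score < threshold)
    FAR = share of non-target trials accepted (score >= threshold)
    Returns (eer, threshold).
    """
    # Every observed score is a candidate threshold; +inf rejects everything.
    candidates = sorted(set(target_scores) | set(nontarget_scores))
    candidates.append(float("inf"))

    best = None  # (gap between FRR and FAR, estimated EER, threshold)
    for t in candidates:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            # Where FAR and FRR are closest, average them as the EER estimate.
            best = (gap, (frr + far) / 2, t)
    return best[1], best[2]


# Toy example: one target trial (0.4) scores below one impostor trial (0.6).
eer, threshold = compute_eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(eer, threshold)  # 0.25 0.6
```

At a threshold of 0.6, one of four target trials is rejected and one of four impostor trials is accepted, so FRR = FAR = 25%. Published toolkits typically interpolate along the ROC or DET curve instead of using this simple sweep, which can give slightly different values on small score sets.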
Table 1 Comparison results under the RSR2015 evaluation protocols and trials [1]
Table 2 Selected results of the mixed-gender evaluation on RSR2015 Part 1
Table 3 Selected results of the evaluation on RSR2015 Part 2
Table 1 shows the evaluation results on four trial subsets of Part 1 of the database. TW (target wrong) refers to trials in which the target speaker says the wrong passphrase; IC (impostor correct) refers to trials in which an impostor says the correct passphrase; IW (impostor wrong) refers to trials in which an impostor says the wrong passphrase.
Of these three categories, IC is the most important, because an impostor who speaks the correct passphrase can be rejected only on the basis of voice characteristics. The new technology proposed by Pensees improves accuracy in this category while maintaining the overall performance of the entire system.
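The trial categories above are defined by two yes/no questions: is the speaker the claimed (target) speaker, and is the spoken passphrase the correct one? The sketch below encodes that mapping. The helper name `trial_type` is hypothetical, and the label "TC" (target correct) for genuine trials is an assumption, since only TW, IC and IW are named here.

```python
def trial_type(same_speaker: bool, same_passphrase: bool) -> str:
    """Map a trial to its category (hypothetical helper).

    same_speaker:    the speaker is the claimed target speaker
    same_passphrase: the spoken phrase matches the enrolled passphrase
    "TC" (target correct) is an assumed label for genuine trials.
    """
    if same_speaker:
        return "TC" if same_passphrase else "TW"
    # Impostor trials: IC is the hard case, since the phrase itself gives nothing away.
    return "IC" if same_passphrase else "IW"


for speaker_ok in (True, False):
    for phrase_ok in (True, False):
        print(speaker_ok, phrase_ok, trial_type(speaker_ok, phrase_ok))
```

Only in the IC cell do the lexical content and the claim agree while the speaker differs, which is why that category isolates the voiceprint itself.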
Table 4 SV and UV performance on the RSR2015 dataset
Speaker verification (SV) and utterance verification (UV) evaluate voiceprint recognition (is this the right speaker?) and passphrase recognition (is this the right phrase?), respectively. Together, the two tasks give a comprehensive picture of a text-dependent voiceprint system's performance. Of the two, performance on the SV task is particularly important.
In all of these comparisons, the vast majority of algorithms use the RSR2015 background and development sets to obtain better results, and some even add data from other datasets to improve accuracy. The new technology proposed by Pensees differs in that it does not rely on these data: it still achieves high accuracy with very little training data. Technical details will be presented in a paper submitted to Interspeech 2020 by Pensees and the HLT Lab of the National University of Singapore. Please stay tuned.
03
A Cutting-edge Technology Breakthrough for Intelligent Security
Speaker verification has important application prospects and huge market demand in public security, and has long been a research focus in the security industry.
As an AI company that focuses on computer vision and IoT and provides people-oriented, comprehensive application solutions, Pensees demonstrates the depth and breadth of its technology reserves with this breakthrough in speaker verification. On the one hand, speaker verification fits the company's long-term vision for intelligent security, in which new algorithms and products are developed around users' needs and the industry's pain points. On the other hand, it extends the company's technology reserves beyond computer vision and makes its technical solutions more complete, preparing it for new industries and application scenarios.
In the future, Pensees' voice technologies, including speaker verification, will gradually be put into practical use in scenarios such as safe cities, smart residential communities, smart parks, smart retail, and intelligent transportation. By combining them with computer vision and IoT, Pensees will provide more effective and reliable products and solutions to advance the industrialization of AI.
<The End>