Digital Hearing Aid Design Based on Chinese Speech Processing

At present, a hot topic in hearing aid research and development abroad centers on China: specifically, the study of Chinese language and speech and the development of related speech recognition technologies and products. Chinese audiology is no exception. Auditory science is a fast-growing, fast-changing discipline centered on human hearing. This article introduces and discusses how scientists and audiologists are increasingly concerned with applying hearing science to the hearing and speech of Chinese speakers.

Chinese is a characteristically tonal language with clear phonetic differences from language families written with alphabetic scripts, such as the Slavic languages. The difference shows up not only in the features of the language but also in its everyday use. Whether the phonetic features of different language families affect speech understanding in hearing-impaired patients, especially when they use hearing aids designed around research on another language, has recently become a hot topic in academic and scientific research. For example, one feature of the domestically developed cochlear implant is that its algorithm was designed with the characteristics of Chinese speech in mind. Foreign hearing aid manufacturers are expected to introduce hearing aids tailored to Chinese speech in the near future. After years of research and experimentation, a Canadian speech laboratory in China used leading digital signal processing (DSP) technology in 2000 to add Chinese speech algorithms to its digital hearing aids and applied for related patents. Its Intelligia is currently the first new digital hearing aid based on Chinese speech processing technology, and it has been well received by subjects in clinical trials. Preliminary evidence indicates that this new type of hearing aid benefits Chinese-speaking patients.

Current research shows that different language families, such as Chinese and English, each have their own characteristics and differ considerably in how they are perceived. English and Chinese differ in important ways in both speech sounds and spoken language. Ming-Xi Tsai et al. (2000) argue that Chinese and English speech differ greatly in their structural features. Chinese characters, words, syllables, and finals carry different levels of information and stand in complex relationships to one another. In spoken language, Chinese pronunciation also varies considerably: under different conversational conditions it is affected by the different levels of information in these structures.

Research on Chinese speech recognition and Chinese tones is reflected in cochlear implant algorithms. The speech processing strategy is the core technology by which a cochlear implant helps patients understand language, and it has been studied extensively. However, few studies address tone and intonation, for example studies based on Chinese. In a recent trial, researchers used an Australian cochlear implant to observe effects on Chinese speech comprehension. The results show that Chinese comprehension was higher with some speech processing strategies than with others. The researchers also suggest that raising the stimulation rate may improve the understanding of speech and tone, and that the choice of speech processing strategy affects the understanding of Chinese. This research again indicates that Chinese calls for a speech processing system suited to the language itself, especially for the hearing impaired.

Michael Qin, a researcher at the Massachusetts Institute of Technology, studied how noise affects the recognition of Mandarin Chinese in his experiment "Identification of Noise Background Pronunciation and Tone". He notes that different languages use different types of tone to give spoken words different meanings. These meaningful tones are degraded in a noisy environment, so he set out to find how Mandarin speakers recognize different tones in noise. The experiment used 6 vowels and the four Mandarin tones (yinping, yangping, shangsheng, and qusheng). The results show that recognition of Chinese tones and vowels deteriorates sharply as the signal-to-noise ratio falls, which in turn reduces speech understanding. The signal-to-noise ratio is therefore an important factor in understanding Chinese. This finding matters for hearing rehabilitation and for designing targeted hearing aids.

At the same time, a multidisciplinary expert research group has been set up in the United States to develop hearing aids suited to Chinese speech. The team includes the world-renowned House Institute and the ENT department of the Chinese University of Hong Kong. Their view is similar to the study above: in tonal languages such as Mandarin, Cantonese, and Thai, where tone is used to recognize speech and meaning, information related to the fundamental frequency may matter more for understanding than it does in other languages. The language characteristics of these patients should therefore be considered when developing hearing aids.

Of particular interest to the author is a recent trial sponsored by the Wellcome Trust, entitled "Chinese Mandarin Conversers Use Brain More Than the English Conversation When Understanding Language." The study used brain imaging to observe and compare brain activity in native Chinese speakers and native English speakers. Dr. Sophie Gault, the psychologist who led the study, found that when English subjects heard English, their left temporal lobe became very active; the researchers believe this area combines speech sounds into individual words. When Chinese subjects heard Mandarin, however, their left and right temporal lobes were active at the same time. Evidently, speakers of different languages use different regions of the brain to decode speech, a finding that has considerably changed our understanding of these theories. The researchers further believe that in Chinese subjects the left temporal lobe processes the speech signal while the right temporal lobe processes tone in order to derive meaning. Speech is a very complex sound, and its meaning must be decoded correctly. To do so, the brain makes full use of the rise and fall of the speaker's tone, turning spoken language into a meaningful signal.

The auditory region of the brain is easily affected by external influences, which can change its ability to distinguish sounds. Once hearing is impaired, rehabilitation is necessary, and the brain has to rebuild its connections and coding. The brain is highly plastic. Understanding how the brain responds to different languages can effectively help hearing-impaired patients regain their understanding of language. Importantly, these studies point clearly toward developing hearing rehabilitation devices tailored to Chinese speech. I remember that at the opening ceremony of the Speech and Hearing Center of Peking University and the China Disabled Persons' Federation in 2002, Mr. Deng Pufang said in his speech that this was the first time he had heard about the influence of Chinese speech processing features on hearing aid users, that he considered it an important topic requiring a great deal of work, and that developing a hearing rehabilitation device tailored to Chinese speech would have important implications. According to the internationally recognized incidence of hearing loss, 10% of China's population, or 130 million people, have some degree of hearing loss. Using Chinese speech processing technology to help hearing-impaired listeners is therefore very important.

one. Chinese speech technology processing principle

English terms used for this field include "Chinese speech processing strategy", "Chinese speech recognition", and "hearing aid algorithm". Of these, the word "algorithm" is used most often, especially in connection with the development of digital hearing aids. The "algorithm" represents the core of a particular technology. It can be viewed simply as a sequence of instructions that implements certain signal processing functions. Chinese speech features can be built in through algorithm research. The digital signal processor and the algorithm together form the DSP circuit of the digital hearing aid, which performs multi-channel dynamic range compression, noise attenuation, and other processing. The main goal in designing a hearing aid algorithm is to use Chinese speech processing technology so that speech remains audible and comfortable even in different listening environments. At the same time, the digital hearing aid is used to improve Chinese intelligibility, so that Chinese patients with hearing loss can understand Chinese more easily.

Chinese is a tonal, monosyllabic language, and tone is one of its important phonetic features. Tone is reflected mainly in the way the fundamental frequency of the voice changes over time. Eady (1982) examined how the fundamental frequency patterns of a tone language (Chinese) differ from those of a stress language (English). In Chinese, tone distinguishes the meanings of words. Everyday experience confirms that tone helps us understand what others say: speech with mixed regional accents (literally "southern tune, northern tone") is often hard to follow, and its meaning is hard to grasp.

For continuous speech, the long-term average positive and negative tremor factors are similar across languages and across male and female speakers. However, negative tremor is always larger than positive tremor and occurs more often. Eady's measurements show that Chinese is spoken more slowly than English. This may be because a Chinese speaker has to work harder to control vocal cord movement on every syllable; that is, laryngeal control of each syllable carries a larger linguistic load in a tone language and therefore takes more time. The result is slower speech.

In summary, tone information resides mainly in the change of the fundamental frequency over time, intensity changes play a compensating role, and the presence or absence of a voiceless consonant has some influence on tone clarity.

1 Principles

This paper introduces a speech processing method that can be applied to digital hearing aids to improve Chinese intelligibility. The goal is to make it easier for hearing-impaired Chinese speakers to understand speech. The idea of enhancing speech intelligibility comes from practical experience. Recall what you do to help a hearing-impaired person understand you: you not only raise your voice but also change the way you pronounce words, speaking more slowly and more clearly. Some studies have shown that speaking nonsense sentences clearly raises word intelligibility by about 17% compared with everyday conversational speech. "Clearer" here refers to cues in the speech signal that take many different forms, such as the duration of a particular segment, the formant positions of a vowel, or the transitions between phonemes.

Not everyone can readily speak "clearly" to patients with hearing loss. Our approach to speech enhancement is therefore to place a processing model between the speaker and the listener. The model emphasizes specific components of an utterance so that it sounds clearer.

Speech sounds can express meaning because individual sounds differ from one another. These differences arise from differences in the articulatory organs and muscle movements, are determined by the activity inside the vocal cavity, and appear as differences in the acoustic characteristics of speech. The speech enhancement method proposed in this paper strengthens these differences by restructuring the speech signal. Restructuring here means identifying the different kinds of signal within speech and processing each in a targeted way, emphasizing the features that matter for human perception in order to improve speech clarity. The method can be summarized as amplifying consonants, strengthening stress, and highlighting tones.

2 Perceived characteristics of Chinese speech signals

2.1 Tone

The perception of tone is based mainly on changes in the fundamental frequency. Changes in pitch may also affect both duration and intensity.

2.2 Stress

Acoustic properties of stressed and unstressed syllables: stress is closely related to the actual sound intensity but is not identical to it; it also depends on tone, pitch, and duration.

Perceptual characteristics: when listeners distinguish stressed from unstressed syllables, sound intensity is often not the decisive factor.

1) Consonant Amplification

Psychological experiments on speech perception confirm the following: listeners differ markedly in how well they perceive the information a speech signal carries about manner of articulation versus place of articulation. In general, people distinguish manner of articulation better than place of articulation, and the clarity of manner cues tracks consonant clarity closely. The perceptually important manner contrasts for Chinese consonants include strong versus weak, voiceless versus voiced, aspirated versus unaspirated, and fricative versus non-fricative. Studies have shown that relatively boosting consonants helps improve speech intelligibility.

Kates describes how to amplify consonants; Figure 1 shows a widely used model. The system splits the signal into several bands, detects the short-term spectral shape in each band, identifies vowels and consonants from that shape, and amplifies the consonants. It should also be noted that Du Limin et al. proposed the concept of guiding features for Chinese speech, which provide an auxiliary matching structure for Chinese automatic speech recognition systems from the perspective of computing and detecting acoustic information.

Figure 1 Consonant enhancement system
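
The band-splitting idea behind consonant amplification can be illustrated with a deliberately crude Python sketch. It is not Kates's system: instead of a multi-band filter bank it uses a two-tap high/low split, and it treats a frame as consonant-like when the high band carries more energy than the low band. The function name, the 16 kHz rate, the 160-sample frame, and the 6 dB gain are all assumptions chosen for illustration.

```python
import math

def amplify_consonant_frames(x, frame_len=160, gain_db=6.0):
    """Boost frames whose energy sits mostly in the high band (assumed consonant cue)."""
    gain = 10.0 ** (gain_db / 20.0)
    out = []
    for start in range(0, len(x), frame_len):
        frame = x[start:start + frame_len]
        low_energy = 0.0
        high_energy = 0.0
        for i, s in enumerate(frame):
            n = start + i
            prev = x[n - 1] if n > 0 else 0.0
            low_energy += (0.5 * (s + prev)) ** 2    # two-tap low-pass (illustrative)
            high_energy += (0.5 * (s - prev)) ** 2   # two-tap high-pass (illustrative)
        # Hypothetical decision rule: high band dominant -> treat as consonant.
        g = gain if high_energy > low_energy else 1.0
        out.extend(s * g for s in frame)
    return out

fs = 16000  # assumed sampling rate for this toy signal
vowel = [math.sin(2 * math.pi * 200 * n / fs) for n in range(160)]
fricative = [0.1 * math.sin(2 * math.pi * 6000 * n / fs) for n in range(160, 320)]
signal = vowel + fricative
enhanced = amplify_consonant_frames(signal)
```

In this toy signal the 200 Hz "vowel" frame passes through unchanged, while the weak 6 kHz "fricative" frame is raised by the assumed gain. A real system would use more bands and a more careful vowel/consonant decision.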


2) Stress

The syllables that make up a stream of speech are not all equal. Some syllables sound more prominent than others in the stream; these are the stressed syllables. Some stress is closely tied to semantics and grammar, such as word stress in Mandarin Chinese. Word stress distinguishes words: the position of the stressed syllable differs as the meaning of the word differs. For example, in the words glossed as "Technology" and "Count", the stress falls on the first and the second syllable, respectively. This semantic difference is expressed by suprasegmental features.

In Chinese, the influence of stress on prosodic parameters has received much attention. Prosodic features in the flow of speech are expressed through changes in pitch, duration, and intensity, that is, through suprasegmental features. On a spectrogram, an expanded pitch range is a clear feature of stress. Gao Mingming studied the acoustic manifestation of emphatic stress in Putonghua sentences and pointed out:

(1) “The rise in pitch is an important prosodic feature that emphasizes accent in Mandarin sentences”.

(2) Pitch and duration play equally important roles in realizing emphatic stress, and the two are complementary.

Experience with speech synthesis tells us that pitch is the most effective means of adjusting stress, so the main way to enhance stress is to raise the pitch.

3) Tone and Intonation

Besides vowels and consonants arranged in time order as a sequence of sound quality units, a syllable must also have a certain pitch, intensity, and duration. In some languages, pitch is as important within the syllable as vowels and consonants. Pitch that distinguishes the meaning of syllables is called tone. By the presence or absence of tone, the world's languages can be divided into two major categories: tonal and non-tonal. Tone is one of the most prominent features of the Sino-Tibetan languages.

The tones of Mandarin Chinese help distinguish words: syllables with the same pinyin spelling can have different meanings depending on tone. A single syllable can take one of four tone patterns. In terms of speech parameters, the tones differ in the trajectory of the pitch (fundamental) frequency. Using rules derived from experimental observation, a syllable can be assigned a tone type when a given parameter of its pitch trajectory exceeds a predetermined threshold. On this basis, the recognition method proposed by Huang Zezhen and Yang Xingjun uses the first and second slopes, valley point, and flatness of the pitch trajectory curve, which discriminate strongly among the four tones. Experiments show that the recognition rate of this algorithm can reach 99%.
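
To make the trajectory features concrete, here is a minimal Python sketch that fits a quadratic to a pitch contour and derives a slope, second slope, valley point, and flatness from it. The source does not define these four quantities or give thresholds, so the definitions below, the thresholds, and the rule-based `classify_tone` function are all hypothetical. This is not the Huang and Yang algorithm, only an illustration of threshold rules on a fitted curve.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_quadratic(t, y):
    """Least-squares fit y ~ a*t^2 + b*t + c via the normal equations (Cramer's rule)."""
    s = [sum(ti ** k for ti in t) for k in range(5)]
    r = [sum(yi * ti ** k for ti, yi in zip(t, y)) for k in range(3)]
    m = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [r[2], r[1], r[0]]
    d = det3(m)
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = rhs[i]
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)  # (a, b, c)

def tone_features(f0):
    n = len(f0)
    t = [i / (n - 1) for i in range(n)]   # duration normalized to [0, 1]
    mean = sum(f0) / n
    y = [v / mean for v in f0]            # pitch normalized by its mean
    a, b, _ = fit_quadratic(t, y)
    return {
        "slope": a + b,                   # fitted y(1) - y(0)
        "second_slope": 2.0 * a,          # constant second derivative of the fit
        "valley": -b / (2.0 * a) if a > 0.05 else None,  # vertex time, upward-opening only
        "flatness": max(y) - min(y),      # assumed definition: normalized pitch range
    }

def classify_tone(f0):
    """Hypothetical threshold rules; returns a tone number 1 to 4."""
    f = tone_features(f0)
    if f["flatness"] < 0.05:
        return 1                          # level contour
    if f["valley"] is not None and 0.2 < f["valley"] < 0.8:
        return 3                          # dip inside the syllable
    return 2 if f["slope"] > 0 else 4     # rising versus falling

steps = [i / 10 for i in range(11)]
contours = [
    [220.0] * 11,                                   # level
    [180.0 + 60.0 * t for t in steps],              # rising
    [200.0 - 160.0 * t + 160.0 * t * t for t in steps],  # dipping
    [260.0 - 100.0 * t for t in steps],             # falling
]
labels = [classify_tone(c) for c in contours]
```

The synthetic contours are idealized; real pitch tracks would first need the smoothing and normalization that the recognition method relies on.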

Lin Maocan pointed out that tone information resides mainly in the main vowel (and its acoustic transitions). A change in pitch may also affect duration and intensity: qusheng (the fourth tone) is the shortest and strongest, shangsheng (the third tone) is the longest and weakest, yinping and yangping fall in between, and yangping is often slightly longer than yinping. Tone enhancement therefore cannot simply amplify the main vowel; different tones need different treatment of pitch and intensity. In practice, we adopt the following strategies:

(1) Increase the intensity of shangsheng (the third tone).

(2) Increase the duration of qusheng (the fourth tone).

(3) No change to Yinping and Yangping.

The four curves shown in Figure 3 depict how the frequency of each of the four tones changes over time.

Figure 3: Acoustic characteristics of Chinese four tones


2. Methodology

The core of the digital hearing aid is the gain calculation. Working in the frequency domain, it defines the gain for each frequency band as a function of that band's instantaneous input energy. As shown in Figure 3, the instantaneous energy of each band is accumulated into a short-term average and a slow long-term average, which are needed for signal identification and classification. The processing steps are:

(1) E_j(n) = a E_j(n-1), where a is a time constant.

(2) The fundamental frequency is extracted with the cepstrum algorithm, using a 512-point FFT, a 40 ms Hamming window, and a 10 ms window shift.

(3) The fundamental frequency measured for each syllable is smoothed with a simple moving average, and values that deviate too far within the smoothed segment are discarded.

(4) Pitch and duration are each normalized.

(5) A quadratic curve is fitted to the pitch trajectory in the minimum mean square error sense, and the slope, second slope, valley point, and flatness of the curve are calculated.

The above algorithm is implemented in assembly language for the TOCCATA instruction set. The 14-bit A/D converter samples at 32 kHz.
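
For readers who want to experiment with steps (2) and (3) outside the TOCCATA environment, the following pure-Python sketch shows cepstral pitch extraction followed by a moving average. It is not the assembly implementation. A 12.8 kHz rate is assumed only so that a 40 ms window spans exactly 512 samples (the text gives 32 kHz for the A/D and does not say how the two figures are reconciled), and the 60 to 400 Hz search range and the function names are likewise assumptions.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def cepstral_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of one frame from its real cepstrum."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]            # Hamming window
    log_mag = [math.log(abs(v) + 1e-12) for v in fft(windowed)]
    # The log magnitude is real and symmetric, so a forward FFT divided by n
    # gives the same values as the inverse FFT.
    cep = [v.real / n for v in fft(log_mag)]
    lo, hi = int(fs / fmax), int(fs / fmin)              # quefrency search range
    peak = max(range(lo, hi + 1), key=lambda q: cep[q])
    return fs / peak

def moving_average(values, width=3):
    """Simple centered moving average with shortened windows at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        seg = values[max(0, i - half):i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

FS = 12800     # assumed rate: 40 ms is then exactly 512 samples
FRAME = 512    # 40 ms window, 512-point FFT
HOP = 128      # 10 ms window shift

# Synthetic voiced signal: one pulse every 64 samples, i.e. a 200 Hz fundamental.
signal = [1.0 if n % 64 == 0 else 0.0 for n in range(FRAME + 4 * HOP)]
track = [cepstral_f0(signal[s:s + FRAME], FS)
         for s in range(0, len(signal) - FRAME + 1, HOP)]
smoothed = moving_average(track)
```

The outlier rejection mentioned in step (3) is omitted here because the source gives no deviation threshold; the smoothed track could feed a quadratic fit as in step (5).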

Figure 3. Chinese speech enhancement system processing structure


1). Speech division (Classification of Phonemes)

Speech sounds have four components: sound quality (timbre), pitch, intensity, and duration. The four play different roles in speech but coexist in time.

Segmental (sound quality) components: divided into segments, such as vowels and consonants.

Suprasegmental components: pitch, intensity, and duration, which attach to a syllable or segment.

Acoustically, pitch is determined from the fundamental frequency, intensity from the amplitude, and duration from time.

2). Principles of Processing (Algorithm Principles)

Chinese speech processing is mainly reflected in:

In the fitting process, the target curve is weighted according to the frequency range covered by the long-term spectrum of Chinese speech, and the speech-frequency portion of the target curve is raised, which improves speech understanding.

In the hearing aid's signal processing program, the compression controller is set so that the attack and release times for high-frequency compression are short, which keeps consonants clear and improves the user's understanding of speech.

In noise reduction, a strategy optimized for Chinese speech was derived from sampling and analysis of Chinese speech in noisy environments. Experiments have confirmed that this strategy can increase the signal-to-noise ratio by 18 dB.

two. Application of Chinese speech processing technology in hearing aids

The following is a specific example of applying Chinese speech technology to designing a hearing aid. This technology uses the world's most advanced DSP digital technology, including low-power digital chips.

1. TOCCATA digital signal processing system

The Toccata™ system is a miniature, ultra-low-power, high-efficiency digital signal processing system. It includes a high-fidelity weighted overlap-add (WOLA) filter bank, a 16-bit DSP core, two 14-bit A/D converters, a 14-bit D/A converter, and other peripherals. Toccata™ technology provides a standard software-programmable DSP development platform and a miniature VLSI chip fabricated in a 0.18 μm process. It supports development not only by manufacturers of audio processing systems but also of other DSP-based miniature, low-power products.

1.1 Hardware Structure

Figure 4 hardware system structure



The TOCCATA system consists of three chips: an "analog" chip (ALPHA), a "digital" chip (DELTA), and an E²PROM chip for non-volatile storage.

1.2 ALPHA chip

The ALPHA chip includes input and output amplifiers, two A/D converters, a D/A converter, and a master clock and power supply system.

1.3 DELTA chip

The DELTA chip includes a 16-bit software-programmable DSP core, a WOLA filter bank coprocessor, a DMA controller (the input/output processor, or IOP), and memory (RAM and ROM). The combination of a programmable core and a flexible filter bank lets the signal be processed in software. The architecture can therefore run a conventional audio processing scheme (for example, dual-channel compression) and, through the DSP core, more powerful schemes as well (for example, compression in 16 or more channels, noise reduction, and feedback suppression).

1.4 DSP core and instruction system (DSP Core)

RCORE is a flexible DSP core that uses a dual Harvard architecture with a single-cycle multiply-accumulate operation and a 40-bit accumulator. Peripherals are accessed through a combination of extended registers, memory-mapped registers, and shared memory.

1.5 signal path

Figure 5. Signal path provided by the Toccata system:



2 Intelligia digital hearing aid structure

The Intelligia all-digital hearing aid is designed around the technical characteristics of the chip described above; its structure is illustrated in Figure 6. Like analog hearing aids, digital hearing aids use a microphone and a receiver as transducers, but inside the digital signal processor the electrical signal is converted to digital code by A/D sampling. The digital code can be handled very flexibly to provide gain, shape the frequency response, or otherwise process the signal to meet the patient's hearing requirements. Once the DSP algorithm has run, the digital code is converted back to an electrical signal by the D/A converter and then to sound by the receiver.

The key to a digital hearing aid is its information processing system. The all-digital Intelligia hearing aid, built on the Toccata™ digital signal processing system, has a unique Chinese speech processing function. In this design, the hearing aid filters the signal into 16 bands and then combines the 16 bands into 10 channels. Each channel independently compresses the signal using input automatic gain control (AGCi), and each channel has two level detectors: a fast detector that tracks rapid signal changes and a slow detector that tracks slower changes, that is, changes at the syllable level. The compression attack and release time constants are matched to the time course of Chinese speech to achieve better hearing results.
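
The channel structure described above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the grouping of 16 bands into 10 channels, the smoothing coefficients, the 50 dB threshold, the 2:1 ratio, and the rule of taking the larger of the fast and slow levels. The source also shows only the recursive term of its energy average, so the input term `(1 - coeff) * energy` is a common choice rather than a documented one.

```python
import math

GROUP_SIZES = [1, 1, 1, 1, 1, 1, 2, 2, 3, 3]   # hypothetical 16-band to 10-channel map

def bands_to_channels(band_energy):
    """Sum adjacent band energies into channels according to GROUP_SIZES."""
    channels, i = [], 0
    for size in GROUP_SIZES:
        channels.append(sum(band_energy[i:i + size]))
        i += size
    return channels

class LevelDetector:
    """One-pole energy smoother; a coeff closer to 1 gives a slower detector."""
    def __init__(self, coeff):
        self.coeff = coeff
        self.level = 0.0

    def update(self, energy):
        # Input term (1 - coeff) * energy is an assumption, see the lead-in.
        self.level = self.coeff * self.level + (1.0 - self.coeff) * energy
        return self.level

def compression_gain_db(level_db, threshold_db=50.0, ratio=2.0):
    """Static input-controlled compression: no gain change below the threshold."""
    over = level_db - threshold_db
    return -over * (1.0 - 1.0 / ratio) if over > 0 else 0.0

def channel_gain_db(fast, slow, energy):
    """Hypothetical rule: drive the compressor with the larger of the two levels."""
    level = max(fast.update(energy), slow.update(energy))
    level_db = 10.0 * math.log10(level + 1e-12)
    return compression_gain_db(level_db)

channels = bands_to_channels([1.0] * 16)
fast, slow = LevelDetector(0.5), LevelDetector(0.99)
gain = channel_gain_db(fast, slow, 1e7)   # sudden loud input in arbitrary energy units
```

With a sudden loud input the fast detector reacts first and pulls the gain down at once, while the slow detector would take over for sustained, syllable-rate level changes. Matching the two coefficients to Chinese speech is the part the source describes but does not quantify.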

Full digital hearing aid technology features:

1) Chinese speech signal processing

After studying the vocal characteristics of Chinese and other tonal languages in depth, we implemented original Chinese speech processing technology, which can greatly improve listening intelligibility in Chinese-language environments.

2) Faster

The 3rd generation digital hearing aid processing system TOCCATA, designed for digital hearing aids, has a powerful computing power that enables fast processing of various voice signals.

3) More power saving

The operating current is less than 1 mA, and the device automatically enters power-saving mode when there is no input signal. This low energy consumption spares the wearer frequent battery changes.

4) Fully programmable

Because the device is programmable, the most suitable hearing compensation program and parameters can be configured for each hearing-impaired user, giving the wearer the best possible listening experience.

5) Multi-channel independent compression

Incoming sound is divided by frequency into multiple bands and channels, and the signal in each band and channel is processed separately so that the wearer hears clearer and more realistic sound.

6) Noise reduction processing

Environmental noise is effectively suppressed and speech discrimination improved, so that the wearer can hear clearly on noisy streets or in busy supermarkets.

7) Directional processing

A directional microphone system and the corresponding software can be fitted to improve noise reduction further, so that the wearer hears clearer, more natural sound.

8) Acoustic feedback suppression

Hearing aids are prone to whistling during use; this is acoustic feedback. Acoustic feedback suppression technology effectively prevents it, giving the wearer a more comfortable sound.

9) Easy to upgrade

Because it is built on TOCCATA, a fully open digital signal processing (DSP) platform, the hearing aid is programmable, fully adaptable, and upgradeable, so wearers can get the latest features right away through a software update. The following compares the technical specifications of this Chinese speech processing technology:

Table 1 Technical comparison of Chinese speech technology for hearing aids and other hearing aids


Preliminary laboratory experiments with digital hearing aids that use the Chinese speech enhancement method show that Chinese speech processing technology can help Chinese-speaking patients understand speech better and improve rehabilitation. In clinical use, patients wearing Intelligia hearing aids do well, especially in noisy environments, where speech intelligibility is enhanced and the patient's ability to understand speech improves. Of course, we must recognize that the use of Chinese speech processing technology in all-digital hearing aids is still at an early stage of research. The author believes that audiologists and hearing aid experts should pursue more in-depth research in the following directions:

English-based and Chinese-based speech processing techniques should be compared in depth, especially in noisy environments, to observe how each technique affects the processing of each language. Ideally, the subjects in such experiments would be bilingual.

Chinese speech processing technology should be combined with the nonlinear hearing aid fitting methods currently in use, to observe whether English-based fitting methods, supported by Chinese speech processing technology, help Chinese-speaking patients improve speech comprehension in daily life more effectively.

Chinese speech processing is currently one of the research hotspots in human-machine dialogue, and its algorithms are complex and diverse. Hearing aid algorithms tailored to Chinese should be studied in depth so that the great potential of digital chips can be fully used.

The application of Chinese speech processing technology to hearing devices has only just begun. It is a very complex subject involving many unsolved technical problems. Nevertheless, the author believes that only hearing aids developed around the phonetic features of Chinese can more effectively help the many hearing-impaired Chinese speakers.
