Acoustic detection prototype using GCC-PHAT on microcontroller


Nguyen Trung Hieu*, Nguyen Hai Nam*, Phan Hoang Anh+
*Posts and Telecommunications Institute of Technology
+VNU University of Engineering and Technology

Correspondence: Nguyen Trung Hieu, email: hieunt@ptit.edu.vn
Manuscript communication: received: 10/03/2020, revised: 11/23/2020, accepted: 11/30/2020.

Abstract: This paper presents the design and development of an acoustic direction detection device based on the Generalized Cross-Correlation PHAse Transform (GCC-PHAT) technique. The primary goal is to build a kit that supports audio processing, consisting of a 4-channel microphone array and a popular microcontroller, the STM32F103c8t6, running the GCC-PHAT algorithm to estimate the direction and intensity of the captured sound source. We have also tested and evaluated the device and compared its performance with a sound source localization device available on the market. The article focuses on data processing and on optimizing the computational methods to enhance performance while working within the limitations of the microcontroller.

Keywords: Microcontroller, STM32, sound source localization, GCC-PHAT, cross-correlation.

I. INTRODUCTION

Several studies have focused on developing human-computer or human-mobile interfaces in intelligent environments to support user tasks and activities. Knowing the location and direction of a voice or sound provides valuable information for understanding user activities and interactions in those environments, such as analyzing group activities or behaviour, deciding who is the active speaker among the participants, or determining who is talking to whom [1].

Alongside the wider scientific and technological revolution, the field of sound processing engineering has developed continuously. A series of robots and AI systems built by research institutes worldwide has been announced and has attracted great attention. The feature that draws this attention is their ability to communicate and talk with people, as in tour-guide robots, Siri (Apple), Cortana (Microsoft), or Sophia (the first robot citizen of the world). From a technical point of view, recognizing sounds and voices is indispensable for communicating with people. The quality of the recognition technique is of utmost importance and directly affects the quality of the entire communication system. Among the many methods and techniques developed to improve speech recognition quality, sound source localization is an essential one.

Many devices capable of locating sound sources have been presented, using different methods and serving a variety of purposes. One purpose is noise source localization, as in aeroacoustic measurements in wind tunnels, on flying aircraft, and on engine testbeds [2]. Because these tests aim to evaluate and verify the efficiency of an aviation vehicle, the methods used in this field place heavy emphasis on accuracy and rely on heavy hardware to achieve the most accurate results. Among them are deconvolution methods such as DAMAS [3], CLEAN [4], and LPD [5], beamforming-based methods such as GIBF [6] and RAB [7], and others such as SEM [8], SODIX [9], and IBIA [10]. Another application of sound source localization is speech recognition and speech separation [11] for human-machine interaction. The most prominent examples in this area might be the ASIMO robot [12] using the HARK framework [13], or the Hadaly robot [14]. Other applications, such as multirotor UAVs [15], use the MUSIC method, or the IHRTF algorithm [16] for robotic heads.

However, the applications mentioned above are all production prototypes with tightly coupled hardware and software. In addition, heavy and expensive hardware is a barrier for new researchers entering this field. Recognizing this, STMicroelectronics released the STEVAL-BCNKT01V1 [17], also known as the BlueCoin kit. At $80, however, this device is not affordable for many people. We therefore set out to create a simple, affordable acoustic direction detection device, suitable for students, engineers, and researchers, to assist the learning and research process. This paper presents the development process and design of a device that detects the sound direction of arrival (DOA) using a simple microcontroller hardware platform and a localization algorithm.

This paper is composed of five sections. Section II summarizes the theoretical basis. The process of optimizing the hardware and software is presented in Section III. Section IV covers system design and evaluation, and conclusions are presented in Section V.

II. THEORETICAL BASIS

One of the most common ways to estimate DOA is to exploit the time-difference-of-arrival (TDOA) of the sound wave across a microphone array. Its popularity is based on the fact that it can be adapted to match the researcher's interest, such as one- or two-dimensional localization of single or multiple sound sources. To understand the basic principle, consider a single pair of microphones.

Figure 1. Utilized feature example

In a free/far-field propagation model, as shown in Fig. 1, a sound source transmits a sound signal to two microphones, 1 and 2. The sound wave approaching the microphone pair is assumed to be planar, which means there is a time difference of arrival (TDOA) between the two microphones; let us call it $t_{delay}$. There is therefore a simple feature-to-location mapping between $t_{delay}$ and the DOA of the source:

$$
\theta = \arccos\left(\frac{l}{d}\right) = \arccos\left(\frac{V_{sound}\, t_{delay}}{d}\right)
$$

where $\theta$ is the angle between the sound source direction and the imaginary line joining the microphone pair; $d$ is the spacing of the microphone pair; and $l$ is the distance difference of arrival between Mic1 and Mic2, obtained by multiplying the speed of sound $V_{sound} = 343.2$ m/s by the TDOA $t_{delay}$.

However, each obtained $t_{delay}$ also corresponds to a second DOA, $\theta'$, which mirrors the actual source. The appearance of this ghost source is known as the ambiguity problem, and it has existed since the beginning of the binaural approach. To avoid it, Section III-A presents a simple square array solution and shows how to extract the incoming direction from it.

TDOA processing, as used in this article, is an effective way to determine the deviation between microphones. Many algorithms, such as GCC, GCC-PHAT, and DFSE, can estimate TDOA effectively. We chose the GCC-PHAT method for this research because the PHAT weighting performs better under Gaussian noise and in real noise environments. GCC-PHAT also requires fewer computational resources, which is essential when executing on a microcontroller.

A. Cross correlation in time domain (CCT)

In signal processing, cross-correlation is a measure of the similarity of two signal series as a function of the displacement of one series relative to the other. It is commonly used for searching a long signal for a shorter, known feature. It has applications in pattern recognition, electron tomography, averaging techniques, decoding, and neurophysiology.

Figure 2. Cross-Correlation example

Consider two real-valued functions, f and g, differing only by an unknown shift along the x-axis. Cross-correlation can be used to find how much g must be shifted along the x-axis to make it identical to f. The formula essentially slides the g function along the x-axis, calculating the integral of the product of the two functions at each position. When the functions match, the value of (f ∗ g) is maximized, because aligned peaks (positive areas) make a large contribution to the integral. Similarly, aligned troughs (negative areas) also make a positive contribution to the integral, because the product of two negative numbers is positive.

For example, in Fig. 2 the two sine wave signals f and g are offset by 20 units in the time domain, and their cross-correlation has its peak at position -20 on the time lag axis. This implies that shifting g earlier by 20 units yields f.

However, this form of cross-correlation has been shown to be very sensitive to reverberation and noise. In these cases the cross-correlation result is "spread" into other TDOA regions, making it difficult to determine the right time difference. In addition, the resulting peaks are quite dull, which lowers the certainty of the estimate.

B. Cross correlation on frequency domain based on GCC-PHAT

Researchers have proposed cross-correlation in the frequency domain (CCF) [18] as an alternative to conventional cross-correlation that addresses the above problem. The result of this technique, $F^{-1}(CCF)$, differs from the time-domain cross-correlation CCT, but it has been shown that the positions of the peaks match.

The primary reason for doing cross-correlation in the frequency domain is that weighting functions can be applied to the result; this technique is called generalized cross-correlation (GCCF). If the weight is equal to the magnitude of the correlation itself, the magnitudes in GCCF are normalized, so that the inverse Fourier transform has peaks approximating a Dirac delta function at the TDOA with high correlation. This normalization leaves the phase information intact, so it is called the phase transform (PHAT).

$$
\text{GCC-PHAT}_F(f) = \frac{Y_0(f)\, Y_1^{*}(f)}{\left| Y_0(f)\, Y_1^{*}(f) \right|}
$$

where $Y_0(f)$ and $Y_1(f)$ are the Fourier transforms of the two microphone signals and $^{*}$ denotes the complex conjugate.
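The two steps described so far, PHAT-weighted correlation followed by the arccos mapping from delay to angle, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' firmware: it is written in plain Python with a small radix-2 FFT so that it is self-contained, and the names `fft`, `gcc_phat_delay`, and `tdoa_to_angle`, the white-noise test signal, and the 5-sample delay are all hypothetical.

```python
import cmath
import math
import random

V_SOUND = 343.2  # speed of sound in m/s, as given in the text


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def gcc_phat_delay(y0, y1, max_lag):
    """Return how many samples y1 lags behind y0, searched within +/- max_lag."""
    n = 1
    while n < len(y0) + len(y1):  # zero-pad to avoid circular wrap-around
        n *= 2
    Y0 = fft(list(y0) + [0.0] * (n - len(y0)))
    Y1 = fft(list(y1) + [0.0] * (n - len(y1)))
    cross = [a * b.conjugate() for a, b in zip(Y0, Y1)]           # Y0(f) Y1*(f)
    phat = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]  # keep phase only
    cc = [v.real for v in fft(phat, inverse=True)]  # scaling does not move the peak
    best = max(range(-max_lag, max_lag + 1), key=lambda lag: cc[lag % n])
    return -best  # with this convention the peak sits at minus the delay of y1


def tdoa_to_angle(t_delay, d):
    """Map a time delay (s) to the angle theta (degrees) for microphone spacing d (m)."""
    ratio = max(-1.0, min(1.0, V_SOUND * t_delay / d))  # clamp against noise/rounding
    return math.degrees(math.acos(ratio))


# Hypothetical test signal: white noise, with y1 equal to y0 delayed by 5 samples.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(261)]
y0, y1 = x[5:], x[:-5]
delay = gcc_phat_delay(y0, y1, max_lag=20)
print("estimated delay in samples:", delay)
```

The sign convention matches the sine-wave example of Fig. 2, where the correlation peak appears at the negative of the offset. Dividing the returned sample count by the sampling frequency gives $t_{delay}$ for `tdoa_to_angle`; the clamp is there because noise can push the arccos argument slightly outside [-1, 1].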
$\text{GCC-PHAT}_F$ is thus a cross-correlation in the frequency domain combined with the phase transform. To put this in perspective, we show the theoretical comparison results (Fig. 4) [19] for two similar input signals, representing an audio source, one of which is intentionally delayed. Both graphs show the same deviation peak, but the peak in graph (b) is clearly sharper and easier to recognize.

Figure 3. GCC-PHAT transformation

Figure 4. Result difference between two methods

A more interesting case arises when the second signal contains an interfering component. Specifically, in the follow-up investigation the signal was subjected to simulated reverberation with a known delay. Fig. 5 shows the cross-correlation results of both methods in this situation.

Figure 5. Result difference between two methods in reverberation environment

In graph (a), conventional cross-correlation fails to separate the delay of the echo from that of the actual signal, which directly degrades the quality of the obtained deviation. In contrast, GCC-PHAT separates the two signals, producing two distinct sharp peaks and showing the exact delay of the signal (Fig. 5(b)). GCC-PHAT can distinguish them because its sharp peaks do not spread into neighbouring peaks as they do in time-domain cross-correlation.

Thus, with the GCC-PHAT technique applied to the system, the noise and echo problems are mitigated, giving a more stable system that can be used in many different environments.

III. PROPOSED DESIGN

A. TDOA to DOA

As mentioned in Section II, if we use only two microphones, or a microphone array arranged in a straight line, it is difficult to determine whether the sound is coming from the front or the back of the array (ambiguity). Therefore, to determine the TDOA on a plane (in this work, the azimuth plane), we need a microphone array made up of multiple microphone pairs forming a plane parallel to the one being detected.

To detect the sound source direction on this plane, we consider a set of four microphones placed at the four corners of a square. Fig. 6 shows the layout of the microphones on the device. The spacing between the microphones of each pair is a constant D, and we assume a single sound source with no noise. With a free/far-field propagation model, the sound wave can be considered planar, and its direction forms two angles, $\alpha$ and $\beta$, with the sides of the square. The values $l_1$ and $l_2$ are the distance differences from the sound source to the microphones of the two pairs. Having obtained these differences, we can calculate the angles $\alpha$ and $\beta$, and hence the DOA of the source:

$$
\alpha = \arccos\left(\frac{l_1}{D}\right) = \arccos\left(\frac{V_{sound}\,\tau_{12}}{f_{sample}\,D}\right) = \arccos\left(\frac{V_{sound}\,\tau_{34}}{f_{sample}\,D}\right)
$$

$$
\beta = \arccos\left(\frac{l_2}{D}\right) = \arccos\left(\frac{V_{sound}\,\tau_{23}}{f_{sample}\,D}\right) = \arccos\left(\frac{V_{sound}\,\tau_{14}}{f_{sample}\,D}\right)
$$

where $\tau_{ij}$ is the delay, in samples, between microphones $i$ and $j$, and $f_{sample}$ is the sampling frequency.

However, if we use only the angle $\alpha$ or only the angle $\beta$, the ambiguity problem mentioned in Section II appears again. The microphone array above allows us to combine the angles obtained from each microphone pair to produce a final result $\mu$, with

$$
\tan(\mu) = \frac{\sin(\alpha)}{\cos(\alpha)} = \frac{\cos(\beta)}{\cos(\alpha)}
$$

$$
\mu = \arctan\left(\frac{\cos(\beta)}{\cos(\alpha)}\right)
$$

Figure 6. Array layout of the proposed device

B. Overall algorithm

After calculating the cross-correlation between two signals as described in Section II-B, the maximum of the cross-correlation function indicates where the signals are best aligned; in other words, the time delay between the two signals is given by the position (the argument) of the maximum of the cross-correlation. With this information, the incoming sound direction can be calculated using a larger microphone array, as described in Section III-A. On this basis, we created the following block diagram of the system's calculation process.

Figure 7. Overall algorithm of the device

Fig. 7 is the block diagram for processing the audio signals captured from the microphones. Each microphone pair in turn is cross-correlated to determine the time deviation between its signals. The xcorr blocks in the figure are cross-correlations. From the array layout in Fig. 6, we can see that the correlation results satisfy $xc_{12} \sim xc_{34}$ and $xc_{23} \sim xc_{14}$. We therefore add each pair of cross-correlations together to obtain $xc_{1234}$ and $xc_{2314}$; the goal is to reduce later calculation and, at the same time, to reduce the error of the cross-correlation. The Max blocks find the highest peaks, from which the corresponding differences $l_{1234}$ and $l_{2314}$ are obtained. From there, using the trigonometric formulas involving the microphone array spacing D, we can calculate the incoming angle of the sound, $\mu$.

C. Methods to enhance processing accuracy

Since the entire algorithm is executed on a microcontroller, we encounter some limitations that affect the overall efficiency of the system. For that reason, during research and development we carried out investigations and tests to identify these limitations and find ways to overcome them, and so improve performance.

1) DC Offset Removal

DC offset removal is a technique for removing the mean amplitude deviation of a signal from the coordinate axis. In other words, it corrects the case where the central horizontal axis of the received audio signal does not coincide with the horizontal axis of the coordinate system (Fig. 8).

Figure 8. The DC Offset problem

This error occurs because a fixed electrical offset is always introduced somewhere in the signal chain before the signal is converted from analog to digital. These offsets are usually too small to be noticeable, but on low-quality hardware they can become large enough to be a problem. There are many ways to solve this problem; the simplest is to subtract from the signal after the ADC its average value, computed with the Moving Average method. As a result, the signal is aligned with the horizontal coordinate axis.

$$
S_t = \begin{cases} Y_1, & t = 1 \\ \alpha Y_t + (1 - \alpha)\, S_{t-1}, & t > 1 \end{cases}
$$

where $Y_t$ is the sample at time $t$, $S_t$ is the running average, and $\alpha$ here denotes the smoothing factor ($0 < \alpha < 1$), not the angle of Section III-A.

2) Signal interpolation
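To make the idea behind this subsection concrete, the sketch below shows one way to obtain a finer lag grid without a separate interpolation stage. It assumes that interpolation is done in the frequency domain: the inverse transform of the PHAT-weighted cross-spectrum is evaluated at fractional lags, which gives the same finer grid as zero-padding the spectrum before the inverse FFT (up to the handling of the Nyquist bin), and only the few lags that are physically possible for the array are evaluated. The names `signed_bin` and `interpolated_peak_lag`, the parameters `max_lag` and `factor`, and the synthetic spectrum are illustrative assumptions, not the authors' implementation.

```python
import cmath


def signed_bin(k, n):
    """Map DFT bin index k (0..n-1) to its signed frequency index (-n/2..n/2-1)."""
    return k if k < n // 2 else k - n


def interpolated_peak_lag(cross_phat, max_lag, factor):
    """Find the lag (in samples, possibly fractional) with the largest correlation.

    cross_phat: PHAT-normalized cross-spectrum in standard DFT bin order.
    max_lag:    largest physically possible lag, in whole samples.
    factor:     interpolation factor; the lag grid has a step of 1/factor sample.
    """
    n = len(cross_phat)
    best_lag, best_val = 0.0, float("-inf")
    steps = int(max_lag * factor)
    for s in range(-steps, steps + 1):
        lag = s / factor
        # Inverse DFT evaluated at a single (fractional) lag.
        acc = 0.0
        for k, g in enumerate(cross_phat):
            acc += (g * cmath.exp(2j * cmath.pi * signed_bin(k, n) * lag / n)).real
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag


# Synthetic PHAT spectrum for a second signal delayed by 1.25 samples:
# every bin has unit magnitude and a phase that grows linearly with frequency.
n, true_delay = 64, 1.25
spectrum = [cmath.exp(2j * cmath.pi * signed_bin(k, n) * true_delay / n) for k in range(n)]
lag = interpolated_peak_lag(spectrum, max_lag=2, factor=4)
print(lag)  # -1.25: the peak sits at minus the delay, now resolved to a quarter sample
```

Restricting the search to `max_lag` keeps the cost proportional to the number of candidate lags rather than to the padded transform length, which is one possible way to respect the memory and time budget of a small microcontroller.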
Another limitation worth mentioning when using lightweight hardware is that the analyzed data is discrete. The resolution of the obtained result therefore depends significantly on the number of samples taken. With limited hardware, however, this number must be carefully controlled to preserve the performance of the device. To increase the resolution of the results without burdening the device, we decided to use interpolation.

Figure 9. Interpolation process

We interpolate immediately after the cross-correlation is completed, in order to reduce the FFT and IFFT computation. Moreover, with the above transform, the inverse Fourier transform of the cross-correlation block cancels against the Fourier transform of the interpolation block.

IV. SYSTEM DESIGN AND EVALUATION

A. System design

Figure 10. System design

The diagram in Fig. 10 describes the general model of the audio direction detection device built on the STM32F103c8t6 microcontroller. The audio signal is sampled by the STM32F103c8t6 microcontroller from four microphones that already incorporate an op-amp. The ADC of the microcontroller samples the signal at a sampling frequency of 12 kHz to obtain a series of digital audio signals. These are stored directly into a buffer (BuffSize) with a capacity of 2048 samples as soon as each ADC conversion is completed. Using DMA shortens the data storage step, allowing the data to be processed immediately. The signal passes through the processing unit, which applies the algorithms described above to produce the incoming angle and intensity. These two values are displayed to the user via the LED Ring module. Figure 11 is a picture of the audio direction detection device.

Figure 11. Acoustic detector prototype

B. Experiment and evaluation

To examine the performance of the device, we ran a test with the sound source placed at ten random angles (Fig. 12); at each angle, the source was played at several different distances. Table I contains the results obtained from the device at each run.

Figure 12. Experiment setup

Analyzing the experimental results, we can see that the device performs relatively uniformly across all angles on the azimuth plane. For sound sources played at a longer distance the error is larger, which is expected. At distances up to 40 cm, the device produces very accurate results, with an average error of 1.47° to 2.40°. At distances of 80 cm to 100 cm, the accuracy of the device is still very high, with an average error of about 3°.
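The average errors quoted above can be recomputed from the measurements in Table 1. The short script below is a sketch that assumes the first column of the table is the true source angle and that the error is the absolute angular difference averaged over the ten angles; the names `angular_error` and `mean_error` are illustrative.

```python
# True source angles (degrees) and measured angles from Table 1,
# keyed by source distance in cm.
TRUE_ANGLES = [35, 70, 105, 140, 175, 210, 245, 280, 315, 350]
MEASURED = {
    10: [34.502, 67.490, 104.32, 139.38, 173.59, 209.39, 245.36, 284.64, 313.39, 351.75],
    20: [35.554, 65.127, 106.61, 139.24, 175.05, 213.98, 247.43, 286.62, 313.43, 348.49],
    40: [36.558, 69.175, 108.64, 137.67, 175.48, 212.34, 247.80, 284.53, 315.22, 351.16],
    80: [38.398, 70.112, 110.15, 138.09, 173.85, 213.87, 248.72, 288.04, 317.69, 350.26],
    100: [38.072, 75.311, 111.05, 138.73, 171.36, 210.27, 248.73, 288.86, 316.94, 346.7],
}


def angular_error(measured, true):
    """Smallest absolute difference between two bearings, in degrees."""
    diff = abs(measured - true) % 360.0
    return min(diff, 360.0 - diff)  # handles wrap-around near 0/360 degrees


def mean_error(distance_cm):
    """Mean absolute angular error over all test angles at one distance."""
    errors = [angular_error(m, t) for m, t in zip(MEASURED[distance_cm], TRUE_ANGLES)]
    return sum(errors) / len(errors)


for distance in sorted(MEASURED):
    print(f"{distance:>3} cm: mean error {mean_error(distance):.2f} deg")
```

Under these assumptions the 10 cm and 20 cm columns reproduce the 1.47° and 2.40° figures quoted in the text. The wrap-around handling in `angular_error` does not change these results but keeps the metric meaningful for bearings close to 0° or 360°.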
To put this in perspective, we compared our device with an audio source localization device available on the market, the ST BlueCoin, developed by STMicroelectronics. The results show the difference in accuracy between the two devices (Fig. 13).

Figure 13. Performance in comparison

At a distance of 10 cm, both devices work well and give accurate results: the two error curves are low and almost identical. The difference begins to appear at distances of 20-40 cm, where the BlueCoin device begins to show discrepancies in calculating the direction of the sound. At distances of 80-100 cm, the BlueCoin device is no longer stable, while the device we developed still retains high accuracy.

Table 1. The results of the experiment (measured angle, in degrees)

Angle (°)   Distances (cm)
            10        20        40        80        100
35          34.502    35.554    36.558    38.398    38.072
70          67.490    65.127    69.175    70.112    75.311
105         104.32    106.61    108.64    110.15    111.05
140         139.38    139.24    137.67    138.09    138.73
175         173.59    175.05    175.48    173.85    171.36
210         209.39    213.98    212.34    213.87    210.27
245         245.36    247.43    247.80    248.72    248.73
280         284.64    286.62    284.53    288.04    288.86
315         313.39    313.43    315.22    317.69    316.94
350         351.75    348.49    351.16    350.26    346.7

V. CONCLUSIONS

We discussed a solution for detecting the sound source direction in 2-dimensional space using a square array of four microphones and the STM32F103c8t6 microcontroller. We used the GCC-PHAT method to improve the quality of the delay estimation algorithm. The GCC-PHAT technique was chosen because it performs well in environments with high noise density. Signal filtering and interpolation were also added to improve accuracy. The experimental results show the relative accuracy of the device and its quality compared with a previously developed device. The device has so far only been designed and presented as a prototype, which opens up many directions for further development. For example, the effective working range of the device could be extended in terms of spatial dimension and distance. The working capabilities of the hardware could also be used more efficiently, redundant details eliminated, and the physical size of the hardware and the complexity of the program reduced. Furthermore, the device can act as a module in a higher-level system, such as audio source tracking, separation, and recognition.

ACKNOWLEDGEMENT

The authors would like to acknowledge the PTIT Team lab of Posts and Telecommunications Institute of Technology, Hanoi, Vietnam for their commitment and support of the conducted research.

REFERENCES

[1] S. Haykin and Z. Chen, "The cocktail party problem," Neural Computation, vol. 17, no. 9, pp. 1875–1902, 2005.
[2] R. Merino-Martínez, P. Sijtsma, M. Snellen, T. Ahlefeldt, J. Antoni, C. J. Bahr, D. Blacodon, D. Ernst, A. Finez, S. Funke et al., "A review of acoustic imaging methods using phased microphone arrays," CEAS Aeronautical Journal, vol. 10, no. 1, pp. 197–230, 2019.
[3] T. F. Brooks and W. M. Humphreys, "A deconvolution approach for the mapping of acoustic sources (DAMAS) determined from phased microphone arrays," Journal of Sound and Vibration, vol. 294, no. 4-5, pp. 856–879, 2006.
[4] P. Sijtsma, "CLEAN based on spatial source coherence," International Journal of Aeroacoustics, vol. 6, no. 4, pp. 357–374, 2007.
[5] R. P. Dougherty, R. C. Ramachandran, and G. Raman, "Deconvolution of sources in aeroacoustic images from phased microphone arrays using linear programming," International Journal of Aeroacoustics, vol. 12, no. 7-8, pp. 699–717, 2013.
[6] T. Suzuki, "L1 generalized inverse beam-forming algorithm resolving coherent/incoherent, distributed and multipole sources," Journal of Sound and Vibration, vol. 330, no. 24, pp. 5835–5851, 2011.
[7] X. Huang, L. Bai, I. Vinogradov, and E. Peers, "Adaptive beamforming for array signal processing in aeroacoustic measurements," The Journal of the Acoustical Society of America, vol. 131, no. 3, pp. 2152–2161, 2012.
[8] D. Blacodon and G. Elias, "Level estimation of extended acoustic sources using a parametric method," Journal of Aircraft, vol. 41, no. 6, pp. 1360–1369, 2004.
[9] H. A. Siller, J. König, S. Funke, S. Oertwig, and L. Hritsevskyy, "Acoustic source localization on a model engine jet with different nozzle configurations and wing installation," International Journal of Aeroacoustics, vol. 16, no. 4-5, pp. 403–417, 2017.
[10] C. J. Bahr, W. M. Humphreys, D. Ernst, T. Ahlefeldt, C. Spehr, A. Pereira, Q. Leclère, C. Picard, R. Porteous, D. Moreau et al., "A comparison of microphone phased array methods applied to the study of airframe noise in wind tunnel testing," in 23rd AIAA/CEAS Aeroacoustics Conference, 2017, p. 3718.
[11] P. Bofill and M. Zibulevsky, "Underdetermined blind source separation using sparse representations," Signal Processing, vol. 81, no. 11, pp. 2353–2362, 2001.
[12] K. Nakadai, H. Nakajima, M. Murase, H. G. Okuno, Y. Hasegawa, and H. Tsujino, "Real-time tracking of multiple sound sources by integration of in-room and robot-embedded microphone arrays," in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2006, pp. 852–859.
[13] K. Nakadai, T. Takahashi, H. G. Okuno, H. Nakajima, Y. Hasegawa, and H. Tsujino, "Design and implementation of robot audition system 'HARK' - open source software for listening to three simultaneous speakers," Advanced Robotics, vol. 24, no. 5-6, pp. 739–761, 2010.
[14] S. Hashimoto, S. Narita, H. Kasahara, K. Shirai, T. Kobayashi, A. Takanishi, S. Sugano, J. Yamaguchi, H. Sawada, H. Takanobu et al., "Humanoid robots in Waseda University - Hadaly-2 and WABIAN," Autonomous Robots, vol. 12, no. 1, pp. 25–38, 2002.
[15] K. Furukawa, K. Okutani, K. Nagira, T. Otsuka, K. Itoyama, K. Nakadai, and H. G. Okuno, "Noise correlation matrix estimation for improving sound source localization by multirotor UAV," in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2013, pp. 3943–3948.
[16] M. Murase, S. Yamamoto, J.-M. Valin, K. Nakadai, K. Yamada, K. Komatani, T. Ogata, and H. G. Okuno, "Multiple moving speaker tracking by microphone array on mobile robot," in Ninth European Conference on Speech Communication and Technology, 2005.
[17] STMicroelectronics, "Bluecoin starter kit." [Online]. Available: bcnkt01v1.html
[18] J. Hassab and R. Boucher, "An experimental comparison of optimum and sub-optimum filters' effectiveness in the generalized correlator," Journal of Sound and Vibration, vol. 76, no. 1, pp. 117–128, 1981.
[19] K. M. Varma, "Time delay estimate based direction of arrival estimation for speech in reverberant environments," Ph.D. dissertation, Virginia Tech, 2002.

SOUND DIRECTION DETECTION USING GCC-PHAT ON A MICROCONTROLLER (translated from the Vietnamese title and abstract)

Abstract: This paper presents the research and design of a sound direction detection device based on the cross-correlation technique combined with the phase transform (GCC-PHAT). The main goal of the research team is to build a kit that supports audio processing, consisting of a 4-channel microphone array and an STM32F103c8t6 microcontroller embedded with the GCC-PHAT algorithm to compute the direction and intensity of the captured sound source. We have also tested, evaluated, and compared its performance with a sound direction detection device available on the market. This paper focuses on data processing and on optimizing the computational methods to improve performance while overcoming the limitations of the microcontroller.

Keywords: Microcontroller, STM32, sound direction detection, GCC-PHAT, cross-correlation.

Nguyen Trung Hieu received the B.S. and M.Sc. degrees in electronics and telecommunications, and the Ph.D. degree in electronics engineering, from Posts and Telecommunications Institute of Technology, Hanoi, Vietnam, in 2006, 2010, and 2018, respectively. Since 2006 he has been with the Faculty of Electronics Engineering, Hanoi, Vietnam. He is currently Head of the Department of Electronics and Computer Engineering. His research interests include coding theory, signal processing, communication systems, IoT devices and systems, and electronics design.

Nguyen Hai Nam received the B.S. degree in electronics engineering from Posts and Telecommunications Institute of Technology (PTiT), Hanoi, Vietnam, in 2020. He is currently team leader of the Signal Processing group, Department of Electronics and Computer Engineering, PTiT. His research interests include signal processing and microcontrollers.

Phan Hoang Anh received the B.S. degree in electronics engineering from Posts and Telecommunications Institute of Technology (PTiT), Hanoi, Vietnam, in 2019. He currently works in the Department of MicroElectro-Mechanical Systems and Micro Systems, VNU University of Engineering and Technology (VNU-UET). His research interests include signal processing, robotics, and automation.