Measuring Smart Speaker Performance


At the core of the smart speaker is the Intelligent Virtual Assistant (IVA), which lets users direct the device by voice to do everything from playing audio content (news, music, podcasts, etc.) to controlling home automation systems or even placing online shopping orders. The same IVA technology, supported by microphones and loudspeakers, is also being added to all sorts of home appliances (thermostats, set-top boxes, refrigerators), enabling voice control and thus turning them into “smart devices.” Most smartphones can also play the role of a smart speaker.

While this section of AP.com and the resources found here focus primarily on smart speaker testing, most of the content applies equally to the broader category of smart devices and the measurement of their audio performance.

Measuring the performance of a smart speaker presents a variety of challenges, whether the testing focuses on a subsystem or the entire device. Many of these challenges stem from the IVA, the complexity of the various subsystems, and the resulting audio signal paths.

Smart Speaker IVAs

An interaction with a smart speaker begins with a specific “wake word” or phrase, followed by a command. In normal operation, a smart speaker sits in a semi-dormant state but is always “listening” for the wake word, which triggers it to capture and process a spoken command. As for speech recognition, the smart speaker itself can recognize only the wake word (or phrase); the more computationally intensive speech recognition and subsequent processing are done by the IVA on a connected server. Depending on the evaluation being performed, the wake word may be an integral part of the test process.
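Because the wake word may be part of the test, a voice-input stimulus is often assembled from a recorded wake word, a short pause, and the test command or signal. The following is a minimal sketch, assuming NumPy, mono float arrays at a common sample rate, and hypothetical default silence durations (`lead_in_s`, `gap_s`); the function name is illustrative only, not part of any AP software. It returns the sample index where the command begins so that later analysis can locate that portion of the captured audio.

```python
import numpy as np


def build_voice_stimulus(wake_word, command, fs, gap_s=0.5, lead_in_s=0.25):
    """Concatenate leading silence, a wake-word clip, a gap, and a test command.

    Hypothetical helper: the silence durations are illustrative defaults,
    not values required by any particular IVA.
    Returns (stimulus, command_start), where command_start is the sample
    index at which the command begins.
    """
    lead = np.zeros(int(round(lead_in_s * fs)))
    gap = np.zeros(int(round(gap_s * fs)))
    stimulus = np.concatenate([lead, np.asarray(wake_word, float), gap,
                               np.asarray(command, float)])
    command_start = len(lead) + len(wake_word) + len(gap)
    return stimulus, command_start
```

In practice the pause would be tuned so the device has finished detecting the wake word before the command starts; the right value depends on the device and is not specified here.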

Audio Subsystems

Smart speakers contain several distinct audio subsystems, including:

  • Microphone array
  • Powered loudspeaker system
  • Signal processing algorithms (front-end processing for beamforming, noise suppression, etc.)

Audio Signal Paths

The primary audio paths for a smart speaker run between the device and the IVA, over the Internet via a Wi-Fi or wired connection. On the input side, a speech signal containing a spoken command is sensed by the device’s microphone array, digitized, and uploaded to the IVA for signal processing and command interpretation. On the output side, digital audio content is transmitted from a web server to the device, where it is converted from digital to analog and finally to an acoustic signal as it is played over the device’s loudspeaker system. Smart speakers may also have several secondary audio paths (e.g., analog input and output jacks, network connections to other smart speakers).

Audio Testing

The audio subsystems of smart speakers have a multitude of components that contribute to overall performance and audio quality. At some stage, each of these components and systems must be tested, followed eventually by an end-to-end performance evaluation of the overall smart speaker system.

Testing a smart speaker’s primary input and output audio paths can be quite challenging for the following reasons:

1. Input to, and output from, a smart speaker are both acoustic, and acoustic testing is by nature more complex than electronic (analog or digital) audio testing. Acoustic tests require calibrated microphones, usually an anechoic test chamber, and a high-quality loudspeaker system to stimulate the microphones of the device under test (DUT).

2. Smart speakers are inherently open-loop devices. On the input side, a signal (typically speech) is captured, digitized, and transmitted to a remote server as a digital audio file. To assess input path performance, the audio file must be retrieved from the server and analyzed in comparison with the signal that was originally generated. On the output side, audio content that originates as an audio file on a server is streamed to the device, where it is converted to analog and played on the device’s loudspeaker system. To assess output path performance, the device’s loudspeaker output must be measured with a measurement microphone and compared with the original signal from the server. That original signal is often encoded (e.g., MP3 or AAC), so it must be decoded before analysis.

3. The device’s A/D and D/A converters run on clocks that are not synchronized with the audio analyzer, so their sample rates will invariably differ from the analyzer’s, if only slightly, requiring some form of compensation during analysis.
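Because the path is open loop (point 2 above), the retrieved or captured signal arrives with an unknown delay relative to the original, and the two must be time-aligned before they can be compared. One common way to do this, shown here as a minimal sketch rather than AP’s method, is to locate the peak of the cross-correlation between the two signals. It assumes NumPy, mono float arrays, and that both signals already share a sample rate; `estimate_delay` and `align` are hypothetical names.

```python
import numpy as np


def estimate_delay(reference, recorded):
    """Estimate the lag, in samples, of `recorded` relative to `reference`.

    Uses FFT-based cross-correlation. A positive result means the reference
    signal appears that many samples later in the recording.
    """
    reference = np.asarray(reference, dtype=float)
    recorded = np.asarray(recorded, dtype=float)
    n = len(reference) + len(recorded) - 1
    nfft = 1 << (n - 1).bit_length()  # power of two >= n, so no circular wrap
    spectrum = np.conj(np.fft.rfft(reference, nfft)) * np.fft.rfft(recorded, nfft)
    xcorr = np.fft.irfft(spectrum, nfft)
    # abs() also finds the peak if the device inverts polarity.
    lag = int(np.argmax(np.abs(xcorr)))
    # Indices past the last positive lag hold negative lags (wrapped around).
    if lag >= len(recorded):
        lag -= nfft
    return lag


def align(reference, recorded):
    """Return the portion of `recorded` that lines up with `reference`."""
    lag = estimate_delay(reference, recorded)
    if lag < 0:
        raise ValueError("recording starts after the reference signal")
    return recorded[lag:lag + len(reference)]
```

The FFT route keeps the cost at O(N log N), which matters for the multi-second recordings typical of speech and music tests.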
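For the sample-rate mismatch in point 3 above, one possible compensation, sketched here under stated assumptions, is to include a pilot tone of known nominal frequency in the test content, measure the frequency at which it actually appears in the analyzer’s capture, and resample the capture by the observed/nominal ratio. The sketch assumes NumPy, a capture at the analyzer’s rate `fs` (treated as the reference clock), and a pilot tone whose offset is small; the function names and block size are hypothetical. The frequency estimator mixes the tone down to near 0 Hz and fits the slope of its phase, and the resampler uses plain linear interpolation for brevity.

```python
import numpy as np


def estimate_tone_frequency(x, fs, f_nominal, block=4800):
    """Estimate the actual frequency of a pilot tone near `f_nominal` Hz.

    The signal is mixed down by f_nominal, averaged in blocks (a crude
    low-pass filter), and the slope of the unwrapped block phase gives the
    frequency offset. Assumes the offset is well under fs / (2 * block).
    """
    n = np.arange(len(x))
    baseband = np.asarray(x, dtype=float) * np.exp(-2j * np.pi * f_nominal * n / fs)
    nblocks = len(x) // block
    phasors = baseband[: nblocks * block].reshape(nblocks, block).mean(axis=1)
    phase = np.unwrap(np.angle(phasors))
    block_centers = (np.arange(nblocks) * block + (block - 1) / 2) / fs
    slope = np.polyfit(block_centers, phase, 1)[0]  # radians per second
    return f_nominal + slope / (2 * np.pi)


def correct_clock_drift(y, ratio):
    """Resample `y` so that a tone observed at ratio * f returns to f.

    Linear interpolation keeps the sketch short; a band-limited resampler
    (e.g. windowed-sinc or polyphase) would be preferable for real analysis.
    """
    positions = np.arange(0, len(y) - 1, 1 / ratio)
    return np.interp(positions, np.arange(len(y)), y)
```

A typical use is `ratio = estimate_tone_frequency(capture, fs, f_pilot) / f_pilot`, followed by `correct_clock_drift(capture, ratio)` before comparing the capture with the original signal.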

AppNote: Smart Speaker Acoustic Measurements

Smart speakers are a relatively new class of consumer audio device with unique characteristics that make testing their audio performance difficult. In this 17-page application note, we provide an overview of smart speaker acoustic measurements with a focus on frequency response – the most important objective measurement of a device’s audio quality.

Technote 138: Transfer Function Measurements with APx500 Audio Analyzers

Technote 138 discusses the Transfer Function measurement added to APx500 audio measurement software in release version 5.0. We provide background information on transfer function measurements in general, followed by practical examples of applying this technique to difficult audio test problems.


One of the key attributes of transfer function analysis is that it can measure the frequency response of a device using any broadband signal, including speech and music. This makes it an ideal choice for analyzing devices used for speech communication (e.g., smart speakers, smartphones, headset microphones). Many of these devices incorporate DSP algorithms that require speech signals, and some are designed to block sinusoidal signals altogether. Transfer function analysis greatly simplifies measuring the frequency response of such devices.
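To illustrate how a frequency response can be obtained from an arbitrary broadband stimulus, here is a minimal sketch of the classic H1 estimator: the averaged cross-spectrum of stimulus and response divided by the averaged auto-spectrum of the stimulus, computed over Hann-windowed segments with 50% overlap. This is one textbook approach, not necessarily how APx500 computes its Transfer Function measurement. It assumes NumPy, that `x` (stimulus) and `y` (response) are already time-aligned and share a sample rate, and a hypothetical segment length `nperseg`.

```python
import numpy as np


def transfer_function_h1(x, y, fs, nperseg=4096):
    """H1 estimate of the frequency response from stimulus x to response y.

    Returns (freqs, H), where H is complex: abs(H) is the magnitude response
    and np.angle(H) the phase response.
    """
    window = np.hanning(nperseg)
    hop = nperseg // 2  # 50 % overlap between segments
    sxx = np.zeros(nperseg // 2 + 1)
    sxy = np.zeros(nperseg // 2 + 1, dtype=complex)
    for start in range(0, min(len(x), len(y)) - nperseg + 1, hop):
        X = np.fft.rfft(window * x[start:start + nperseg])
        Y = np.fft.rfft(window * y[start:start + nperseg])
        sxx += np.abs(X) ** 2        # input auto-spectrum
        sxy += np.conj(X) * Y        # input/output cross-spectrum
    freqs = np.fft.rfftfreq(nperseg, d=1 / fs)
    return freqs, sxy / sxx
```

Because only the ratio of spectra matters, the stimulus need not be flat: speech or music works, provided it has energy at the frequencies of interest. Bins where the stimulus has little energy will give unreliable estimates.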

Smart Speaker Audio Test

This brief video provides an overview of smart speaker testing in the context of AP’s upcoming software release, which will add the ability to use the log-swept sine signal (also called a chirp or continuous sweep) in an open-loop test setup. (Includes a brief audio demonstration at 2:10.)
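For reference, the log-swept sine mentioned above can be generated from a closed-form phase expression. The following is a minimal sketch of the exponential sweep in the form widely used for acoustic measurements, not the signal generated by AP software. It assumes NumPy; the function name and parameters are hypothetical. The phase is chosen so that the instantaneous frequency rises exponentially from `f1` to `f2`, spending equal time in each octave.

```python
import numpy as np


def log_sweep(f1, f2, duration, fs):
    """Exponential (log-swept) sine from f1 Hz to f2 Hz over `duration` seconds.

    Phase: 2*pi*f1*T/ln(f2/f1) * (exp(t*ln(f2/f1)/T) - 1), whose derivative
    gives the instantaneous frequency f(t) = f1 * (f2/f1) ** (t/T).
    """
    t = np.arange(int(round(duration * fs))) / fs
    rate = np.log(f2 / f1)
    phase = 2 * np.pi * f1 * duration / rate * (np.exp(rate * t / duration) - 1)
    return np.sin(phase)
```

For example, `log_sweep(20.0, 20000.0, 2.0, 48000)` produces a 2-second sweep covering the nominal audio band. In an open-loop setup such a sweep would typically be delivered as an audio file, so its start must be located in the captured signal before analysis.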