AI in AV: 3 Big Questions, Answered
Originally published by Crestron®
Making sense of artificial intelligence in audio-visual applications
AI — artificial intelligence — is a hot topic. Media coverage of its possibilities has ranged from positive (“It can act as a wonderful virtual assistant.”) to negative (“Kids are using it to do their homework for them.”) to downright terrifying (“After the robots take our jobs, they’ll kill us all.”).
While the technology holds great potential for both good and bad, it’s best to remember that AI is a tool — a tool that can be put to very good use.
And that’s especially true in the audio-visual systems we use in the modern hybrid workplace.
For our part, Crestron’s AI solutions feature a line of 1 Beyond intelligent cameras and the Crestron Automate VX voice-activated speaker tracking solution. These products deliver an outstanding videoconferencing experience through their application of what’s called “Visual AI,” and they work brilliantly with platforms such as Microsoft Teams® Rooms and Zoom Rooms® software, leveraging all the AI capabilities those platforms bring to the table.
What’s all that mean, exactly? Let’s break it down by answering the three most common questions we hear:
What is “Visual AI,” and is it different from intelligent video?
You may have seen the terms “intelligent video” and “Visual AI” used interchangeably. A more accurate way to frame the two concepts: Visual AI enables the experiences we call intelligent video. The result is a system that can automatically track and frame a presenter in a room based on facial and motion detection — which is incredibly important when a meeting includes remote participants. You want those virtual attendees to see the gestures and expressions of those in the room where the meeting is based. Remote workers remain much more engaged when they can receive all those non-verbal cues.
Crestron’s Rony Sebok, director of product management, intelligent video, explained the power of this technology in an article for the online publication AI for All:
Visual AI can be used to create a variety of experiences, including “group framing” (adjusting the frame to show all participants), “auto-framing” (adjusting the frame as one person speaks), and “presenter tracking” (following a moving presenter around a space). It can further automatically switch between active talkers in the room (“speaker tracking”), provide a composite of more than one view of the room into a single video feed, and more.
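To make that list concrete, here is a rough sketch of the kind of decision a “robot director” might make when choosing among those experiences. It is purely illustrative: the mode names come from the list above, but the room-state fields and the selection rules are assumptions for this sketch, not Crestron’s actual logic or API.

    # Purely illustrative: one way a "robot director" might pick a framing mode.
    # The RoomState fields and the selection rules are assumptions for this
    # sketch, not Crestron's actual logic or API.
    from dataclasses import dataclass

    @dataclass
    class RoomState:
        people_detected: int     # faces/bodies the cameras currently see
        active_talkers: int      # how many people the mics hear speaking
        presenter_moving: bool   # someone talking while walking the room

    def choose_framing_mode(state: RoomState) -> str:
        if state.presenter_moving:
            return "presenter tracking"   # follow the moving presenter
        if state.active_talkers > 1:
            return "speaker tracking"     # cut between the active talkers
        if state.active_talkers == 1:
            return "auto-framing"         # tighten the frame on the one speaker
        return "group framing"            # default: show all participants

    print(choose_framing_mode(RoomState(people_detected=6, active_talkers=1,
                                        presenter_moving=False)))  # auto-framing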
Just like other examples of AI, Visual AI is getting better. “AI has been built into unified communications for a while now, but even more effective ‘robot director in a box’ solutions are being developed,” says Crestron’s Senior Director of Product Marketing Sam Kennedy. AI is being applied to audio solutions, too, gaining the ability to block extraneous noise and even identify people by their voices.
Soon, AI will help these systems “read the room” — in other words, gather a lot more info on the space. “These programs are learning to see if a room has a whiteboard and how the system’s cameras need to adjust to make that board visible for everyone joining remotely,” says Kennedy. “Soon, AI will notice if that board — or even the room itself — needs to be cleaned up for the next meeting.”
These systems will soon be able to gather more environmental info, says Kennedy: “Do the shades need to come down for a presentation? Does the room need to be cooled better when the system senses that the space is full of people?” Ultimately, AI impacts both the remote and the in-room experience.
What do I need hardware-wise?
There are several options. The most basic solutions are often found in video bars — some of which are outfitted with multiple cameras that can cut between speakers. Larger systems — those built for your most impactful meeting spaces — can be driven by cameras with intelligent video capability or combined with a speaker-tracking solution that keys on signals from microphones to follow a presenter or a conversation.
Crestron offers all these options, including our 1 Beyond intelligent PTZ cameras with optical zoom designed to capture every participant in the room — even those up to 60 feet away from the lens. Optical zoom happens within the camera’s physical lens, while digital zoom simply crops and enlarges the captured image. Digital zoom reduces the pixel density of the image, so clarity drops as distance increases, which in turn reduces the camera’s ability to pick up those critical nonverbal cues.
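A quick back-of-the-envelope sketch shows why that distinction matters; the 1080p sensor and the zoom factors below are example numbers for illustration, not specifications of any particular camera.

    # Example numbers only: digital zoom is a crop, so a distant participant is
    # rendered from a shrinking share of the sensor's pixels. Optical zoom keeps
    # the full sensor trained on the subject.
    sensor_w, sensor_h = 1920, 1080   # a hypothetical 1080p sensor

    def digital_crop(zoom_factor: float) -> tuple[int, int]:
        # An Nx digital zoom crops the center 1/N of the frame and scales it up;
        # no new detail is captured.
        return int(sensor_w / zoom_factor), int(sensor_h / zoom_factor)

    for zoom in (1, 2, 3, 4):
        w, h = digital_crop(zoom)
        share = (w * h) / (sensor_w * sensor_h)
        print(f"{zoom}x digital zoom: {w} x {h} source pixels ({share:.0%} of the sensor)")
    # At 3x, only about 11% of the sensor's pixels describe the subject;
    # a 3x optical zoom would still use all of them.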
Another option is the Crestron Automate VX voice-activated speaker tracking solution. This system is best for larger spaces, as you can configure up to 12 cameras to handle high-impact rooms.
The goal is to achieve smooth Visual AI tracking and framing that delivers clear close-ups and multiple angles, creating a superior, broadcast-quality video image for remote participants. The Automate VX solution auto-frames the speaker, centering them in the frame even if they move from the position where a microphone has been sending location data. Participants can move around freely without worrying about “staying in the frame.”
The Automate VX solution also has a “reframing” function that centers people in the shot. AI plays an important role here, as it can discern between large and small movements. “If someone shifts slightly in their seat, for example, the AI doesn’t read that as a need to reframe the shot,” says Kennedy. Reducing all those unnecessary camera movements keeps people from getting queasy from the constant motion.
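One way to picture that logic, purely as an illustration (the 15% threshold and the coordinate scheme are arbitrary choices for this sketch, not Automate VX settings): measure how far the subject has drifted from the center of the current shot, and only reframe when the drift is large.

    # Illustrative sketch of "ignore small movements" reframing logic.
    # The 15% threshold is an arbitrary example, not a Crestron setting.
    REFRAME_THRESHOLD = 0.15   # fraction of the frame width/height

    def should_reframe(subject_center, frame_center) -> bool:
        # Both points are (x, y) in normalized 0..1 frame coordinates.
        dx = abs(subject_center[0] - frame_center[0])
        dy = abs(subject_center[1] - frame_center[1])
        return dx > REFRAME_THRESHOLD or dy > REFRAME_THRESHOLD

    print(should_reframe((0.52, 0.50), (0.50, 0.50)))  # small shift in a seat -> False
    print(should_reframe((0.75, 0.50), (0.50, 0.50)))  # walked across the room -> True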
What do I need to be concerned about with these systems?
Short answer: Privacy and security, and they’re both moving targets.
On the privacy front, Visual AI doesn’t start to raise red flags until it begins to recognize individual people. Those functions extend beyond visually tracking people: once transcripts and summaries are generated, new questions arise. For example, if an AI program identifies your face, is that a violation of privacy? What about the ethics of a program reporting on the “mood” of a meeting? Does AI “get” sarcasm — and can it tell the difference between a joke and a comment that’s meant to be truly negative?
Kennedy says that any system's default setting should be “opt-out.” “I think it’s ethical to ask people if they want to be identified and tracked, especially if we’re talking about generative AI or programs that are referred to as virtual assistants,” he says. “If so, they can click a button and immediately opt-in.” Local laws are addressing this, too. “There are states in the U.S. and places across the globe where it’s simply illegal to identify someone via an AI program in these settings,” says Kennedy.
As far as security is concerned, sending data to the cloud is unacceptable in certain environments. “You don’t want an AI program sending anything outside the room in situations where all that info is classified — by a government or a business,” says Kennedy. “That’s where devices with AI built right into the camera — ‘edge-based’ AI — are the answer.”
Gathering data as a meeting progresses has a huge upside, however. “Suppose one participant is what we’d call the quiet type,” Kennedy explains. “Imagine if the system told the meeting moderator that an individual hadn’t said anything — they may be shy and need that little nudge to share their great ideas.”
“We talk about ‘meeting equity’ — making sure everyone can see and be seen — as a visual thing. But the ability to hear and be heard, to help create and to share — those are just as important.”
For more information on how to integrate Crestron’s AI solutions into your AV systems, contact Communication Company today.