Voicemeeter Banana Replacement in Linux

This post describes how I replaced Voicemeeter Banana running on Windows with a Linux alternative.

Here’s why I needed to do this. My wife, my daughter and I like to watch a certain streaming TV show together. My daughter is in the USA and we’re in the UK. My solution in the Windows world was to use Voicemeeter Banana to provide virtual sound interfaces and mixing. I ran a browser window with the TV show and routed its output to a Voicemeeter virtual device. I ran Jitsi Meet in another browser window (Brave or Vivaldi) and routed its output and input to and from the aux virtual devices in Voicemeeter. My wife and I each wore a USB headset, and each headset had an input and an output channel in Voicemeeter. Then, by the magic of twiddling the right knobs in Voicemeeter, my daughter could hear both of our headsets plus the TV program. My wife could hear my daughter, me and the TV program. And I could hear my daughter and the TV program (there was no need for me to hear my wife because I was wearing an open headset and could hear her well enough without any electrons being involved).

I wanted to replace this with a purely Linux setup, but not because I don’t like Voicemeeter Banana. On the contrary, I think it’s a great piece of software and I gladly paid for a license. It’s because I want to move from the Microsoft-controlled world of Windows to an open-source alternative. I’ve been running Linux on my main desktop and servers for years, so I’m comfortable using it.

Linux sound really is a mess. It is way over-complicated due to its history: if you sat down and designed it from scratch, you wouldn’t design what we now have. I spent literally a year periodically searching for instructions on how to support my usage model. I experimented with different distros and kind-of got it working on some of them, but with unexpected behaviour.

I did eventually get it to work. The working configuration uses ALSA, Pipewire, Pipewire’s emulation of Pulseaudio, Pipewire’s emulation of Jack, and Ardour, all running on Fedora 39 on my HP Envy laptop, with Amazon Gaming and Sennheiser USB headsets.

Here is a quick explanation of what the various sound system components do. ALSA is the interface to the sound hardware. Pulseaudio is what things like browsers know how to connect to in order to make sound. The Pulseaudio server itself is not running in this configuration; Pipewire runs an emulation of it so that existing apps know where to send sound. Jack is a server and set of interfaces that allow Jack-aware applications to route sound to more than just a hardware sound device. Likewise, the Jack server is not actually present; Pipewire runs an emulation of it so that apps that rely on Jack can still use it as if it were. Pipewire also provides the ability to create virtual devices, which both Pulseaudio apps and Jack apps can use, and which Jack apps can route sound to and from.

I messed around with Debian 12, Neon and Ubuntu Studio. On none of these did I get “expected behaviour”, i.e. behaviour matching what other people described. I didn’t try too hard to find out why. It might have been my own stupidity. Fedora 39 running a KDE desktop was the first distro I tried that worked as I expected.

Fedora 39 comes with Pipewire installed. I installed Ardour from the Fedora flatpak repo.

I created a Pipewire configuration file, “jitsi.conf”, in ~/.config/pipewire/pipewire.conf.d:

context.objects = [
    # Virtual sink that receives the browser's sound (the TV show)
    {   factory = adapter
        args = {
            factory.name   = support.null-audio-sink
            node.name      = "from-browser"
            media.class    = Audio/Sink
            object.linger  = true
            audio.position = [ FL FR ]
        }
    }
    # Virtual sink that Jitsi Meet uses as its speaker
    {   factory = adapter
        args = {
            factory.name   = support.null-audio-sink
            node.name      = "from-jitsi"
            media.class    = Audio/Sink
            object.linger  = true
            audio.position = [ FL FR ]
        }
    }

    # Virtual source that Jitsi Meet uses as its microphone
    {   factory = adapter
        args = {
            factory.name   = support.null-audio-sink
            node.name      = "to-jitsi"
            media.class    = Audio/Source/Virtual
            object.linger  = true
            audio.position = [ FL FR ]
        }
    }

]

This configuration is loaded by the Pipewire server whenever it starts, which in practice means whenever the user logs in. It creates two virtual sinks (pretend sound output devices) and one virtual source (pretend microphone).
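One way to confirm that the three virtual devices really exist after logging in is to look for their names in Pipewire's node listing. The following is a minimal sketch, not something from my original setup: the function names are made up for illustration, and it assumes that `pw-cli ls Node` is available and prints each node name in double quotes.

```python
import subprocess

# Node names taken from jitsi.conf.
EXPECTED_NODES = ["from-browser", "from-jitsi", "to-jitsi"]


def missing_nodes(listing, expected=EXPECTED_NODES):
    """Return the expected node names that do not appear (quoted) in the listing text."""
    return [name for name in expected if f'"{name}"' not in listing]


def list_nodes():
    """Return the text printed by `pw-cli ls Node` (assumes pw-cli is on the PATH)."""
    result = subprocess.run(
        ["pw-cli", "ls", "Node"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `missing_nodes(list_nodes())` in a logged-in session returns the names of any virtual devices that were not created; an empty list means all three were found.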

The system sound output device is set to “from-browser” using the KDE volume-control widget. When a browser plays a video, its sound goes to this virtual device. I run Jitsi Meet in a browser window, and select “to-jitsi” as its microphone and “from-jitsi” as its speaker.

Then I created a session in Ardour that does the mixing. Ardour is capable of much, much more than I’m using it for, but it can act as a mixer without the need to record or play back. In Ardour, I created a number of audio busses:

– From Sennheiser, with input connected to the Sennheiser headset
– From Gaming, with input connected to the Amazon Gaming headset
– From Browser, with input connected to the from-browser virtual device
– From Jitsi, with input connected to the from-jitsi virtual device
– To Jitsi, with output connected to the to-jitsi virtual device and inputs connected to the “From Sennheiser” and “From Gaming” busses
– To Gaming, with output connected to the Amazon Gaming headset and inputs connected to the “From Browser”, “From Jitsi” and “From Sennheiser” busses
– To Sennheiser, with output connected to the Sennheiser headset and inputs connected to the “From Browser” and “From Jitsi” busses
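The bus list above is really just a routing table, and it can help to see it written down as one. This is only a sketch for checking who hears what; it has nothing to do with how Ardour stores its sessions, and the function names and the idea of summing plain numeric levels are my own simplifications.

```python
# Each "To ..." bus and the "From ..." busses that feed it, as listed above.
ROUTING = {
    "To Jitsi": ["From Sennheiser", "From Gaming"],
    "To Gaming": ["From Browser", "From Jitsi", "From Sennheiser"],
    "To Sennheiser": ["From Browser", "From Jitsi"],
}


def heard_by(source, routing=ROUTING):
    """Return the output busses that carry a given input bus."""
    return sorted(dest for dest, sources in routing.items() if source in sources)


def mix(to_bus, levels, routing=ROUTING):
    """Sum the levels of the input busses feeding one output bus (unity gain assumed)."""
    return sum(levels[source] for source in routing[to_bus])
```

For example, `heard_by("From Gaming")` returns only `["To Jitsi"]`: the Amazon Gaming microphone goes to the call but not to either headset. Note too that “From Browser” does not feed “To Jitsi”; the TV sound reaches the call through Jitsi's own tab sharing instead.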

In operation the “from” busses are balanced to give similar levels, and the “to” busses are adjusted to give comfortable levels.

I use Jitsi to share the browser tab containing the video source, with “share audio” selected. I believe this might give a different (higher) quality on the shared audio than mixing the browser sound into the “To Jitsi” bus, but I have nothing to back this up.