Thank you for evaluating the ai-coustics SDK! This version offers a collection of neural networks for your evaluation. Our SDK is designed to seamlessly integrate speech enhancement technology into real-time audio systems. Please follow this integration guide, and feel free to contact us if you have any questions or requests.

Integration Guide

Linking

Linux

You can link the library in your CMakeLists.txt file like this:

target_link_libraries(target-name PRIVATE ${CMAKE_SOURCE_DIR}/libaic.a)

To reduce binary size, add the --gc-sections linker flag so that only the models you actually use are linked.

set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--gc-sections")

macOS

You can link the library in your CMakeLists.txt file like this, together with the Foundation system framework:

target_link_libraries(target-name
  PRIVATE
    ${CMAKE_SOURCE_DIR}/libaic.a
    "-framework Foundation")

To reduce binary size, add the -dead_strip linker flag so that only the models you actually use are linked.

set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,-dead_strip")

Windows

You can link the library in your CMakeLists.txt file like this, together with the following system libraries:

target_link_libraries(target-name
  PRIVATE
    ${CMAKE_SOURCE_DIR}/aic.lib
    WS2_32
    bcrypt
    ntdll
    userenv
    dxcore
    d3d12
    directml
    dxgi)

Logging

To receive logs from the SDK, call the function aic_log_init with a logging callback. Call it only once; a second call returns an error. An example logging callback could look like this:

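#include <iostream>
#include <string>

// Prefixes each SDK message with its log level and prints it to stdout.
void log_callback(const char *message, aic::LogLevel level) {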
  std::string log_level;
  switch (level) {
    case aic::LogLevel::Error:
      log_level = "[ERROR]";
      break;
    case aic::LogLevel::Trace:
      log_level = "[TRACE]";
      break;
    case aic::LogLevel::Debug:
      log_level = "[DEBUG]";
      break;
    case aic::LogLevel::Info:
      log_level = "[INFO ]";
      break;
    case aic::LogLevel::Warn:
      log_level = "[WARN ]";
      break;
  }
  std::cout << log_level << " " << message << std::endl;
}
 
aic_log_init(log_callback);

Create and Initialize the Runtime

You can create a new runtime by calling the aic_new_model_* function for the model you wish to use, for example aic_new_model_l. You will find all available options in the C++ header.

The aic_init function must be called before any processing takes place, because the runtime needs to know the audio settings (number of channels, sample rate, and number of frames).

struct AicModel *model = aic_new_model_l();

aic_init(model, NUM_CHANNELS, SAMPLE_RATE, NUM_FRAMES);

The following models are currently included in the library. Note that the latency can be higher than the algorithmic delay listed here if the sample rate or the number of frames of your audio callback differs from the model's native values.

Model S (aic_new_model_s)
- Native number of frames: 512
- Native sample rate: 48000 Hz
- Algorithmic delay: 5.33 ms (256 samples)
Model M (aic_new_model_m)
- Native number of frames: 512
- Native sample rate: 48000 Hz
- Algorithmic delay: 5.33 ms (256 samples)
Model L (aic_new_model_l)
- Native number of frames: 1024
- Native sample rate: 48000 Hz
- Algorithmic delay: 10.67 ms (512 samples)
Model Z (aic_new_model_z)
- Uses a different modeling approach: it makes fewer errors, but is slightly less effective
- Native number of frames: 768
- Native sample rate: 48000 Hz
- Algorithmic delay: 8 ms (384 samples)
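
The delays in milliseconds follow from the sample counts and the native sample rate. A minimal sketch of that conversion, where delay_ms is an illustrative helper and not part of the SDK:

```cpp
#include <cassert>
#include <cstddef>

// Converts a delay given in samples to milliseconds at the given sample rate.
// Illustrative helper only; the SDK does not provide this function.
double delay_ms(std::size_t delay_samples, double sample_rate_hz) {
  assert(sample_rate_hz > 0.0);
  return 1000.0 * static_cast<double>(delay_samples) / sample_rate_hz;
}
```

For example, delay_ms(384, 48000.0) gives 8 ms, and delay_ms(256, 48000.0) gives roughly 5.33 ms.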

Call the Process Function

In your audio thread, you can call the process function. You can select the interleaved or deinterleaved version, depending on how the data is stored in your buffer. The interleaved version is preferred, as it is more efficient.

Interleaved audio data is expected like this: [Ch1, Ch2, Ch1, Ch2, Ch1, Ch2]

auto *bufferInterleaved = new float[NUM_FRAMES * NUM_CHANNELS];

aic_process_interleaved(model, bufferInterleaved, NUM_CHANNELS, NUM_FRAMES);

Deinterleaved audio data is expected like this: [[Ch1, Ch1, Ch1], [Ch2, Ch2, Ch2]]

auto **buffer = new float *[NUM_CHANNELS];
for (int i = 0; i < NUM_CHANNELS; i++) {
  buffer[i] = new float[NUM_FRAMES];
}
aic_process_deinterleaved(model, buffer, NUM_CHANNELS, NUM_FRAMES);
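
If your audio arrives deinterleaved but you would like to use the interleaved call, the two layouts above can be converted into each other. The following is a minimal sketch of the index mapping; interleave is a hypothetical helper and not part of the SDK:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Packs deinterleaved (planar) channels into one interleaved buffer.
// Sample `frame` of channel `ch` lands at index frame * num_channels + ch.
// Hypothetical helper for illustration; not part of the SDK.
std::vector<float> interleave(const std::vector<std::vector<float>> &channels) {
  assert(!channels.empty());
  const std::size_t num_channels = channels.size();
  const std::size_t num_frames = channels[0].size();
  std::vector<float> out(num_frames * num_channels);
  for (std::size_t ch = 0; ch < num_channels; ++ch) {
    assert(channels[ch].size() == num_frames);  // all channels must have equal length
    for (std::size_t frame = 0; frame < num_frames; ++frame) {
      out[frame * num_channels + ch] = channels[ch][frame];
    }
  }
  return out;
}
```

This version allocates a new vector on every call, which is usually not acceptable on a real-time audio thread. In practice you would preallocate the output buffer once and reuse it.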

Changing Parameters

All setters and getters for the parameters are thread-safe, so you can call them from any thread, not only the audio thread.

Currently, we support the following parameters:

Enhancement Strength
- Sets the level of the enhancement from 0.0 to 1.0, essentially a dry/wet control.
- 0.0 is like a bypass; no enhancement will be active, while the algorithmic delay will stay the same.
- 1.0 is maximum enhancement, so you will hear the full processing of the model.
- Default: 1.0
Voice Gain
- Allows adjusting the level of the voice audio while keeping the enhancement strength.
- 1.0 does not change the gain of the voice.
- 2.0 boosts the gain by 6 dB.
- 0.5 lowers the gain by 6 dB.
- Default: 1.0
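
The dB figures for Voice Gain match the usual amplitude convention, dB = 20 * log10(gain). Assuming the parameter is a linear amplitude factor, which the values above suggest, a small sketch for converting in both directions could look like this (gain_to_db and db_to_gain are illustrative names, not SDK functions):

```cpp
#include <cassert>
#include <cmath>

// Converts a linear amplitude factor to decibels (assumes amplitude, not power).
double gain_to_db(double gain) {
  assert(gain > 0.0);
  return 20.0 * std::log10(gain);
}

// Converts decibels back to a linear amplitude factor.
double db_to_gain(double db) {
  return std::pow(10.0, db / 20.0);
}
```

With this convention, a factor of 2.0 corresponds to about +6.02 dB and 0.5 to about -6.02 dB.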

The processing looks like this:

(dry_signal * (1.0 - enhancement_strength)) + (wet_signal * enhancement_strength * voice_gain)
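
To make the formula concrete, here is a per-sample sketch of the same mix. It only illustrates the arithmetic; the SDK performs this step internally, and mix_sample is a hypothetical name:

```cpp
#include <cassert>

// One-sample version of the mix formula quoted above.
// dry: unprocessed input sample, wet: enhanced sample from the model.
float mix_sample(float dry, float wet, float enhancement_strength, float voice_gain) {
  assert(enhancement_strength >= 0.0f && enhancement_strength <= 1.0f);
  return dry * (1.0f - enhancement_strength) +
         wet * enhancement_strength * voice_gain;
}
```

With an enhancement strength of 0.0 the result is the dry sample regardless of voice gain. With 1.0 the result is the wet sample scaled by the voice gain.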

Example of setting and getting a parameter value:

// Set a value
aic_set_enhancement_strength(model, 0.5f);
 
// Get a value
float dry_wet = 1.0f;
aic_get_enhancement_strength(model, &dry_wet);

Finishing Up

Before closing the program, don't forget to free your model(s):

aic_free(model);
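
In C++ code, one way to make sure the free call is not forgotten is to tie it to scope exit with std::unique_ptr and a custom deleter. The sketch below is generic: make_owned is a hypothetical helper, and DemoHandle and demo_free are stand-ins used only to keep the example self-contained. None of them are part of the SDK.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Wraps a raw pointer so that free_fn(ptr) runs automatically at scope exit.
// Hypothetical helper for illustration; not part of the SDK.
template <typename T, typename FreeFn>
std::unique_ptr<T, FreeFn> make_owned(T *ptr, FreeFn free_fn) {
  return std::unique_ptr<T, FreeFn>(ptr, std::move(free_fn));
}

// Stand-in resource and free function, used only to demonstrate the helper.
struct DemoHandle {
  int *free_count;  // incremented when the handle is released
};

inline void demo_free(DemoHandle *handle) {
  assert(handle != nullptr);
  ++*handle->free_count;
  delete handle;
}
```

Assuming aic_free accepts the pointer returned by aic_new_model_l, as in the snippets of this guide, something like auto model = make_owned(aic_new_model_l(), aic_free); would release the model when it goes out of scope. You would then pass model.get() to the other SDK calls.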

Enhancing Performance

To achieve the lowest latency and the best performance, run each model at its native sample rate and with its native number of frames in the audio buffer. These values can be retrieved with the following functions:

size_t num_frames;
aic_get_optimal_num_frames(model, &num_frames);
 
size_t sample_rate;
aic_get_optimal_sample_rate(model, &sample_rate);