Thank you for evaluating the ai-coustics SDK! This version offers a collection of neural networks for your evaluation. Our SDK is designed to seamlessly integrate speech enhancement technology into real-time audio systems. Please follow this integration guide, and feel free to contact us if you have any questions or requests.
See the rendered documentation on our website: https://sdk.ai-coustics.com/.
# Integration Guide

## Contents
This SDK includes:
- Static Library: `libaic.a` or `aic.lib`
- Dynamic Library: `libaic.so`, `libaic.dylib`, or `aic.dll`
- C++ Header File: `aic.h`
These libraries are available for your selected target platform and contain everything required to run our models. No additional files are needed.
## Linking

### Linux

You can link it in your CMakeLists.txt file like this:

```cmake
target_link_libraries(target-name PRIVATE ${CMAKE_SOURCE_DIR}/libaic.a)
```

If you want to reduce binary size, add the `--gc-sections` linker flag; this links only the models you are actually using:

```cmake
set(CMAKE_EXE_LINKER_FLAGS "-Wl,--gc-sections")
```
### macOS

You can link it in your CMakeLists.txt file like this, while providing the following system libraries:

```cmake
target_link_libraries(target-name
    PRIVATE
    ${CMAKE_SOURCE_DIR}/libaic.a
    "-framework Foundation")
```

If you want to reduce binary size, add the `-dead_strip` flag; this links only the models you are actually using:

```cmake
set(CMAKE_EXE_LINKER_FLAGS "-Wl,-dead_strip")
```
### Windows

You can link it in your CMakeLists.txt file like this, while providing the following system libraries:

```cmake
target_link_libraries(target-name
    PRIVATE
    ${CMAKE_SOURCE_DIR}/aic.lib
    WS2_32
    bcrypt
    ntdll
    userenv
    dxcore
    d3d12
    directml
    dxgi)
```
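Putting the pieces together, a minimal top-level CMakeLists.txt for Linux might look like the sketch below. The project and target names are placeholders; swap in the platform-specific library and linker flag from the sections above for macOS or Windows.

```cmake
cmake_minimum_required(VERSION 3.16)
project(aic_example CXX)

# Strip unused model code from the final binary (GNU ld on Linux).
set(CMAKE_EXE_LINKER_FLAGS "-Wl,--gc-sections")

add_executable(aic_example main.cpp)

target_link_libraries(aic_example PRIVATE ${CMAKE_SOURCE_DIR}/libaic.a)
```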
## Logging

To receive logs from our SDK, call the function `aic_log_init` with a logging callback. The function must only be called once; subsequent calls return an error. An example logging callback could look like this:
```cpp
#include <iostream>
#include <string>

// Callback signature assumed to match the SDK's LogCallback type (see aic.h).
void log_callback(aic::LogLevel level, const char *message) {
  std::string log_level;
  switch (level) {
    case aic::LogLevel::Error:
      log_level = "[ERROR]";
      break;
    case aic::LogLevel::Trace:
      log_level = "[TRACE]";
      break;
    case aic::LogLevel::Debug:
      log_level = "[DEBUG]";
      break;
    case aic::LogLevel::Info:
      log_level = "[INFO]";
      break;
    case aic::LogLevel::Warn:
      log_level = "[WARN]";
      break;
  }
  std::cout << log_level << " " << message << std::endl;
}
```
```cpp
uint32_t aic_log_init(LogCallback log_callback);
```

Initializes the SDK logger.
## Create and Initialize the Runtime

You can create a new runtime by calling `aic_new_{model}` with the model you wish to use. You will find all possible options in the C++ header. The `aic_init` function must be called before processing can become active, as the audio settings must be known.

```cpp
struct AicModel *model = aic_new_model_l();
aic_init(model, NUM_CHANNELS, SAMPLE_RATE, NUM_FRAMES);
```
The following models are currently contained in the library. Please be aware that, depending on the sample rate and the number of frames of your audio callback, the latency can be higher than the algorithmic delay listed below.
- Model S (`aic_new_model_s`)
  - Native Number of Frames: 512
  - Native Sample Rate: 48000 Hz
  - Algorithmic Delay: 5.33 ms (256 samples)
- Model M (`aic_new_model_m`)
  - Native Number of Frames: 512
  - Native Sample Rate: 48000 Hz
  - Algorithmic Delay: 5.33 ms (256 samples)
- Model L (`aic_new_model_l`)
  - Native Number of Frames: 1024
  - Native Sample Rate: 48000 Hz
  - Algorithmic Delay: 10.67 ms (512 samples)
- Model Z (`aic_new_model_z`)
  - A model with a different approach: it makes fewer errors, but is slightly less effective.
  - Native Number of Frames: 768
  - Native Sample Rate: 48000 Hz
  - Algorithmic Delay: 8 ms (384 samples)
## Call the Process Function

In your audio thread, you can call the process function. You can select the interleaved or deinterleaved version, depending on how the data is stored in your buffer. The interleaved version is preferred, as it is more efficient.

- Interleaved audio data is expected like this: `[Ch1, Ch2, Ch1, Ch2, Ch1, Ch2]`

```cpp
auto *bufferInterleaved = new float[NUM_FRAMES * NUM_CHANNELS];
aic_process_interleaved(model, bufferInterleaved, NUM_CHANNELS, NUM_FRAMES);
```

- Deinterleaved audio data is expected like this: `[[Ch1, Ch1, Ch1], [Ch2, Ch2, Ch2]]`

```cpp
auto **buffer = new float *[NUM_CHANNELS];
for (int i = 0; i < NUM_CHANNELS; i++) {
  buffer[i] = new float[NUM_FRAMES];
}
aic_process_deinterleaved(model, buffer, NUM_CHANNELS, NUM_FRAMES);
```
## Changing Parameters

All parameter setters and getters are thread-safe, so you don't have to call them on the audio thread.
Currently, we support the following parameters:
- Enhancement Strength
- Sets the level of the enhancement from 0.0 to 1.0, essentially a dry/wet control.
- 0.0 is like a bypass; no enhancement will be active, while the algorithmic delay will stay the same.
- 1.0 is maximum enhancement, so you will hear the full processing of the model.
- Default: 1.0
- Voice Gain
- Allows adjusting the level of the voice audio while keeping the enhancement strength.
- 1.0 does not change the gain of the voice.
- 2.0 boosts the gain by 6 dB.
- 0.5 lowers the gain by 6 dB.
- Default: 1.0
The processing looks like this:
```
(dry_signal * (1.0 - enhancement_strength)) + (wet_signal * enhancement_strength * voice_gain)
```
Example of setting and getting a parameter value:

```cpp
aic_set_enhancement_strength(model, 0.5f);

float dry_wet = 1.0f;
aic_get_enhancement_strength(model, &dry_wet);
```
## Finishing Up

Before closing the program, don't forget to free your model(s) with the corresponding destroy function declared in `aic.h`.
## Enhancing Performance

To achieve the lowest latency and best performance, run the models with their native sample rate and number of frames in the audio buffer. These values can be retrieved with the following functions:

```cpp
size_t num_frames;
aic_get_optimal_num_frames(model, &num_frames);

size_t sample_rate;
aic_get_optimal_sample_rate(model, &sample_rate);
```