Adding E4B audio encoder to larger models

A Reddit user proposed a method for adding the E4B audio encoder to larger models using PyTorch. The outlined steps are: extract the audio encoder from E4B (or E2B), create a new linear projection layer between the encoder's output and the larger model's input, and train only this new layer while keeping all other components frozen.
Because the encoder and the larger model keep their original weights, the larger model's existing text performance should be left intact; only the projection has to learn how to present audio features in a form the text model can consume. The goal is to give a text-based model audio input at a low training cost.
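The freeze-and-project idea described above can be sketched in PyTorch as follows. This is only an illustration under stated assumptions, not the Reddit user's actual code: the `AudioAdapter` class, the dimensions, the stand-in encoder, and the MSE loss against a random target are all hypothetical placeholders. In a real setup the encoder would be the one extracted from E4B or E2B, and the loss would presumably come from the frozen larger model.

```python
import torch
import torch.nn as nn

AUDIO_DIM = 64   # hypothetical width of the audio encoder's output
LLM_DIM = 128    # hypothetical embedding width of the larger model


class AudioAdapter(nn.Module):
    """Frozen audio encoder followed by a trainable linear projection."""

    def __init__(self, audio_encoder: nn.Module, audio_dim: int, llm_dim: int):
        super().__init__()
        self.audio_encoder = audio_encoder
        # Freeze every encoder parameter so only the projection is trained.
        for p in self.audio_encoder.parameters():
            p.requires_grad = False
        self.audio_encoder.eval()
        # The only new, trainable component.
        self.proj = nn.Linear(audio_dim, llm_dim)

    def forward(self, audio_features: torch.Tensor) -> torch.Tensor:
        # No gradients are needed through the frozen encoder.
        with torch.no_grad():
            hidden = self.audio_encoder(audio_features)
        return self.proj(hidden)


torch.manual_seed(0)

# Stand-in for the extracted encoder (assumption: it maps 80-dim frames to AUDIO_DIM).
encoder = nn.Sequential(nn.Linear(80, AUDIO_DIM), nn.GELU())
adapter = AudioAdapter(encoder, AUDIO_DIM, LLM_DIM)

# Hand only the projection's parameters to the optimizer.
optimizer = torch.optim.AdamW(adapter.proj.parameters(), lr=1e-3)

mel = torch.randn(2, 50, 80)            # dummy batch: (batch, frames, features)
target = torch.randn(2, 50, LLM_DIM)    # placeholder training target

encoder_before = [p.detach().clone() for p in encoder.parameters()]
proj_before = adapter.proj.weight.detach().clone()

# One training step: gradients reach the projection only.
audio_embeds = adapter(mel)
loss = nn.functional.mse_loss(audio_embeds, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

trainable = [name for name, p in adapter.named_parameters() if p.requires_grad]
print(trainable)
```

Passing only `adapter.proj.parameters()` to the optimizer is a second safeguard on top of `requires_grad = False`: even if a frozen parameter were accidentally unfrozen later, it would still not be updated.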