- A British research team has introduced a new architectural approach to Vision Transformers (ViTs): an alternative backbone built on core-periphery block-sparse attention. With C core tokens, attention cost scales as O(2NC + C²), mitigating the O(N²) cost of dense self-attention.
- The proposed method, named Elastic Attention Cores (EAC), uses nested dropout during training so that the number of cores can be adjusted dynamically at test time. This flexibility allows a trade-off between inference cost and model performance across resolutions from 256×256 up to 1024×1024.
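To make the scaling concrete, here is a minimal sketch (not the authors' code) of a core-periphery block-sparse attention mask: the first C tokens act as cores that attend to everything, and the remaining periphery tokens attend only to the cores. Counting the allowed pairs recovers the quoted O(2NC + C²) scaling rather than the N² of dense attention.

```python
import numpy as np

def core_periphery_mask(N: int, C: int) -> np.ndarray:
    """Boolean attention mask: True where a query may attend to a key.

    The first C tokens are "core" tokens (a hypothetical layout; the
    paper's actual token ordering is an assumption here).
    """
    mask = np.zeros((N, N), dtype=bool)
    mask[:C, :] = True  # cores attend to all N tokens
    mask[:, :C] = True  # every token attends to the C cores
    return mask

N, C = 16, 4
mask = core_periphery_mask(N, C)
# Allowed pairs: C*N (core rows) + (N-C)*C (periphery -> cores)
# = 2*N*C - C**2 = 112 here, i.e. O(NC), versus N*N = 256 dense pairs.
print(int(mask.sum()), N * N)
```

With C fixed and small relative to N, the attended-pair count grows linearly in N, which is what makes the higher resolutions (larger N) tractable.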
### Takeaways:
- **Efficiency Gain**: The EAC architecture reduces computational complexity, scaling as O(2NC + C²) with C core tokens instead of the O(N²) of traditional dense self-attention.
- **Scalability**: The model maintains high accuracy across resolutions from 256×256 up to 1024×1024 without significant performance degradation.
- **Dynamic Adjustments**: Nested dropout allows the number of core tokens to be chosen at inference time, enabling efficient trade-offs between computation and performance.
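The nested-dropout mechanism can be sketched as follows. This is an assumed illustration of the general technique, not the paper's implementation: during training a prefix length k is sampled and all cores beyond the first k are zeroed out, which pushes the most useful structure into the earliest cores; at test time any prefix length can then be kept.

```python
import random

def sample_active_cores(num_cores: int) -> int:
    """Sample how many leading cores to keep this training step (at least one)."""
    return random.randint(1, num_cores)

def apply_nested_dropout(core_tokens: list, k: int) -> list:
    """Keep the first k core tokens and zero out the rest (nested prefix dropout)."""
    return [tok if i < k else 0.0 for i, tok in enumerate(core_tokens)]

cores = [0.5, -1.2, 0.3, 0.8]  # toy 1-D stand-ins for core token embeddings
print(apply_nested_dropout(cores, 2))  # -> [0.5, -1.2, 0.0, 0.0]
```

At inference, choosing a larger k spends more compute (more active cores in the 2NC term) for higher accuracy, while a smaller k cheapens each attention layer.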
Originally published at reddit.com. Curated by AI Maestro.

![Elastic Attention Cores for Scalable Vision Transformers [R]](https://ai-maestro.online/wp-content/uploads/2026/05/elastic-attention-cores-for-scalable-vision-transformers-r--1024x576.jpg)