- A British research team has introduced a new architectural approach to Vision Transformers (ViTs): an alternative backbone built on core-periphery block-sparse attention. With C core tokens, attention cost scales as O(2NC + C²), mitigating the O(N²) cost of dense self-attention.
- The proposed method, named Elastic Attention Cores (EAC), uses nested dropout during training so that the number of cores can be adjusted dynamically at test time. This flexibility allows a trade-off between inference cost and model performance across resolutions from 256×256 up to 1024×1024.
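To make the scaling concrete, here is a minimal sketch (not the authors' code) of a core-periphery block-sparse attention mask: the first C tokens act as cores that attend to everything, and the remaining periphery tokens attend only to the cores. Counting the allowed pairs recovers the quoted O(2NC + C²) scaling rather than the N² of dense attention.

```python
import numpy as np

def core_periphery_mask(N: int, C: int) -> np.ndarray:
    """Boolean attention mask: True where a query may attend to a key.

    The first C tokens are "core" tokens (a hypothetical layout; the
    paper's actual token ordering is an assumption here).
    """
    mask = np.zeros((N, N), dtype=bool)
    mask[:C, :] = True  # cores attend to all N tokens
    mask[:, :C] = True  # every token attends to the C cores
    return mask

N, C = 16, 4
mask = core_periphery_mask(N, C)
# Allowed pairs: C*N (core rows) + (N-C)*C (periphery -> cores)
# = 2*N*C - C**2 = 112 here, i.e. O(NC), versus N*N = 256 dense pairs.
print(int(mask.sum()), N * N)
```

With C fixed and small relative to N, the attended-pair count grows linearly in N, which is what makes the higher resolutions (larger N) tractable.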
### Takeaways:
- **Efficiency Gain**: The EAC architecture reduces computational complexity, scaling as O(2NC + C²) with C core tokens instead of the O(N²) of traditional dense self-attention.
- **Scalability**: The model maintains high accuracy across resolutions from 256×256 up to 1024×1024 without significant performance degradation.
- **Dynamic Adjustments**: Nested dropout allows the number of core tokens to be chosen at inference time, enabling efficient trade-offs between computation and performance.
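The nested-dropout mechanism can be sketched as follows. This is an assumed illustration of the general technique, not the paper's implementation: during training a prefix length k is sampled and all cores beyond the first k are zeroed out, which pushes the most useful structure into the earliest cores; at test time any prefix length can then be kept.

```python
import random

def sample_active_cores(num_cores: int) -> int:
    """Sample how many leading cores to keep this training step (at least one)."""
    return random.randint(1, num_cores)

def apply_nested_dropout(core_tokens: list, k: int) -> list:
    """Keep the first k core tokens and zero out the rest (nested prefix dropout)."""
    return [tok if i < k else 0.0 for i, tok in enumerate(core_tokens)]

cores = [0.5, -1.2, 0.3, 0.8]  # toy 1-D stand-ins for core token embeddings
print(apply_nested_dropout(cores, 2))  # -> [0.5, -1.2, 0.0, 0.0]
```

At inference, choosing a larger k spends more compute (more active cores in the 2NC term) for higher accuracy, while a smaller k cheapens each attention layer.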
Originally published at reddit.com. Curated by AI Maestro.

![Elastic Attention Cores for Scalable Vision Transformers [R]](https://ai-maestro.online/wp-content/uploads/2026/05/elastic-attention-cores-for-scalable-vision-transformers-r--1024x576.jpg)