Description
Vision transformers (ViTs) have emerged as powerful tools in computer vision. However, ViTs can be resource-intensive because of their reliance on the self-attention mechanism, which incurs $\mathcal{O}(n^2)$ complexity, where $n$ is the sequence length. These challenges become even more pronounced in quantum computing, where handling large-scale models is constrained by limited qubit resources, inefficient data encoding, and the lack of native support for operations such as softmax, necessitating alternative approaches better suited to quantum architectures. Recent advances have introduced ViT architectures in which the quadratic self-attention mechanism is replaced with FFT-based spectral filtering for improved efficiency. Building on this foundation, we propose QFT-ViT, a quantum-compatible extension of the FFT-based ViT that aims to address the computational constraints of quantum hardware. By leveraging the structural similarity between the FFT and the quantum Fourier transform (QFT), QFT-ViT enables efficient global token mixing in quasilinear $\mathcal{O}(n \log n)$ time, avoiding operations such as softmax and dot products that are not natively supported in quantum circuits. The model adaptively filters frequency components in the spectral domain to capture global context with reduced resource overhead. Experimental results on benchmark datasets demonstrate that QFT-ViT achieves competitive accuracy and offers a scalable solution for applying transformer models in quantum machine learning.
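As a rough illustration of the classical FFT-based token mixing that the abstract builds on, the sketch below implements a GFNet-style spectral filter layer in PyTorch. It is a minimal sketch, not the authors' implementation: the module name `SpectralTokenMixer`, the use of a real FFT, and all hyperparameters are illustrative assumptions. On quantum hardware, the FFT/inverse-FFT pair would conceptually be replaced by a QFT circuit and its inverse.

```python
import torch
import torch.nn as nn

class SpectralTokenMixer(nn.Module):
    """Illustrative FFT-based token mixing (hypothetical sketch):
    FFT over the token dimension, an element-wise learnable filter in the
    frequency domain, then an inverse FFT. Costs O(n log n) per channel,
    versus O(n^2) for self-attention."""

    def __init__(self, num_tokens: int, dim: int):
        super().__init__()
        # Learnable complex filter over (rfft frequency bins x channels).
        self.filter = nn.Parameter(
            torch.randn(num_tokens // 2 + 1, dim, dtype=torch.cfloat) * 0.02
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim)
        x_freq = torch.fft.rfft(x, dim=1)            # to the spectral domain
        x_freq = x_freq * self.filter                # adaptive frequency filtering
        return torch.fft.irfft(x_freq, n=x.size(1), dim=1)  # back to token space

# Example usage on a toy patch sequence (shapes are illustrative):
mixer = SpectralTokenMixer(num_tokens=64, dim=192)
tokens = torch.randn(8, 64, 192)
out = mixer(tokens)  # (8, 64, 192)
```

No softmax or dot products appear in this mixing step, which is the property that makes the spectral approach a natural candidate for a QFT-based quantum analogue.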
| Broad physics domain | Quantum Machine Learning |
| --- | --- |
| AI/ML technique(s) to be presented | Quantum ViTs |