Abstract
This article examines RoCR, a Retrieval-Augmented Generation (RAG) framework designed for edge deployment in latency-sensitive settings such as real-time search, product recommendation, and dynamic content generation on eCommerce platforms. RoCR uses Compute-in-Memory (CiM) architectures to enable fast, energy-efficient inference at scale. At its core is the CiM-Retriever, a module optimized for maximum inner product search (MIPS). Two generator architectures are analyzed: a decoder-only variant (RA-T) and an encoder-decoder variant with kNN cross-attention. Both improve accuracy across a range of tasks while scaling to millions of documents. The aim of this study is to analyze the architectural characteristics of RAG systems augmented with external memory modules, with a focus on their applicability to eCommerce-scale tasks that require sub-second response times and contextual relevance. The methodology is a review of recent scientific publications, which supports a detailed examination of the system-level design of memory-augmented RAG solutions. The findings are relevant to AI practitioners and system architects building scalable, high-performance retrieval systems for personalized retail, product search, and user engagement optimization. They are also of interest to hardware-software co-design specialists and to architects of scalable distributed platforms who integrate external memory modules into cognitive and neural network applications.
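To make the retrieval step concrete, the following is a minimal sketch of exact maximum inner product search (MIPS) over document embeddings in plain Python. It is not the CiM-Retriever itself. The function names, the toy two-dimensional vectors, and the optional Gaussian perturbation of stored weights (a crude, hypothetical stand-in for analog device noise in CiM crossbars) are illustrative assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def mips_top_k(
    query: Sequence[float],
    docs: Sequence[Sequence[float]],
    k: int = 1,
    noise_std: float = 0.0,
    seed: Optional[int] = None,
) -> List[Tuple[int, float]]:
    """Return the k (doc_index, score) pairs with the largest inner product.

    Hypothetical noise model (an assumption, not RoCR's): when noise_std > 0,
    each stored document weight is perturbed by zero-mean Gaussian noise
    before scoring, loosely mimicking device variation in an analog crossbar.
    """
    rng = random.Random(seed)
    scored = []
    for idx, doc in enumerate(docs):
        if noise_std > 0:
            doc = [w + rng.gauss(0.0, noise_std) for w in doc]
        scored.append((idx, inner_product(query, doc)))
    # Exhaustive scan: sort all scores in descending order and keep the top k.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
    query = [0.0, 1.0]
    print(mips_top_k(query, docs, k=2))  # [(2, 1.0), (1, 0.8)]
```

This exhaustive scan costs O(N·d) operations per query for N documents of dimension d. At the scale of millions of documents, practical systems either use approximate indexes such as hierarchical navigable small world (HNSW) graphs or compute the inner products directly in memory, which is the role the abstract assigns to the CiM-Retriever.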
Keywords
- Retrieval-Augmented Generation
- Compute-in-Memory
- Edge LLM
- Noise-aware training
- Contrastive learning
- External memory
- Non-volatile crossbars
References
- 1. Shelby L., da Silva R. V. M. A. Retrieval-augmented Generation: Empowering Landscape Architects with Data-driven Design //Journal of Digital Landscape Architecture. 2024. pp. 267-276.
- 2. Sarto S. et al. Towards retrieval-augmented architectures for image captioning //ACM Transactions on Multimedia Computing, Communications and Applications. 2024. Vol. 20 (8). pp. 1-22.
- 3. Qin R. et al. Robust implementation of retrieval-augmented generation on edge-based computing-in-memory architectures //Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design. 2024. pp. 1-9.
- 4. Radford A. et al. Learning transferable visual models from natural language supervision //International Conference on Machine Learning. PMLR, 2021. pp. 8748-8763.
- 5. Malkov Y. A., Yashunin D. A. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs //IEEE Transactions on Pattern Analysis and Machine Intelligence. 2018. Vol. 42 (4). pp. 824-836.
- 6. Gao T., Yao X., Chen D. SimCSE: Simple contrastive learning of sentence embeddings //arXiv preprint arXiv:2104.08821. 2021. pp. 1-9.
- 7. Cornia M., Baraldi L., Cucchiara R. Explaining transformer-based image captioning models: An empirical analysis //AI Communications. 2022. Vol. 35 (2). pp. 111-129.
- 8. Qiu Y. et al. Landscape Architecture Professional Knowledge Abstraction: Accessing, Applying and Disseminating //Land. 2023. Vol. 12 (11). Art. 2061.
- 9. Zhu F. et al. Retrieving and reading: A comprehensive survey on open-domain question answering //arXiv preprint arXiv:2101.00774. 2021. pp. 1-8.
- 10. Fruchard B. et al. User preference and performance using tagging and browsing for image labeling //Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 2023. pp. 1-13.
- 11. Bloomreach Revolutionizes the Future of Ecommerce Search, Powered by NVIDIA NeMo [Electronic resource] Access mode: https://www.bloomreach.com/en?p=48921 (date of request: 05/14/2025).
- 12. Agent Co-Pilot: Wayfair's Gen-AI Assistant for Digital Sales Agents [Electronic resource] Access mode: https://www.aboutwayfair.com/careers/tech-blog/agent-co-pilot-wayfairs-gen-ai-assistant-for-digital-sales-agents (date of request: 05/14/2025).
- 13. Klevu AI Ecommerce Search & Discovery [Electronic resource] Access mode: https://www.klevu.com/ (date of request: 05/14/2025).