How I used RAG, AWS Bedrock, dltHub and Pinecone to Explore UK Property Data
I recently started exploring the idea of building an AI agent focused on UK government property datasets. This stemmed from an evening gathering with a few friends, where we casually discussed UK property prices, property types, and housing shortages.
Creating Power BI visuals from datasets was straightforward for me, but I wanted to experiment with something even simpler: a tool that wouldn’t require me to create or update visuals manually for each new query.
Foundation models were capable of answering many queries, but I wanted to develop a solution specifically tailored to UK property pricing and property statistics, addressing the kinds of questions my friends and I often had.
This led me to explore the RAG (Retrieval-Augmented Generation) architecture: I built a knowledge base from the last 10 years of data so that a foundation model could retrieve the relevant records and answer questions about these specific datasets.
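To make the retrieval step concrete, here is a minimal sketch of the RAG idea in plain Python. It is a simplified illustration rather than the project code: a toy bag-of-words similarity stands in for a real embedding model and a vector store such as Pinecone, the documents are invented placeholders, and the function names (embed, cosine, retrieve, build_prompt) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge-base snippets; placeholders, not real statistics.
DOCUMENTS = [
    "Detached houses in Region A: placeholder average price record for 2020.",
    "Flats in Region B: placeholder average price record for 2021.",
    "Terraced houses in Region C: placeholder sales volume record for 2022.",
]


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=2):
    """Return the top_k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, documents, top_k=2):
    """Augment the user question with retrieved context before calling a model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("average price of flats in Region B", DOCUMENTS, top_k=1))
```

In a real setup, the prompt returned by build_prompt would be sent to a foundation model, and embed would be replaced by a proper embedding model so that retrieval captures meaning rather than exact word overlap.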
I experimented with Amazon Web Services (AWS) Bedrock, Streamlit, DuckDB, dltHub, and Pinecone (the free tier was sufficient for my needs). With just a few code adjustments, I was able to generate visuals (primarily bar charts) and provide explanations based on user queries. It was fascinating to see the results!
Key Learnings:
(1) Context-Aware Data is Essential: Simply feeding raw data into your GenAI app won’t make it effective. You need contextually relevant data, which means understanding the entire data engineering process: ingestion, transformation, data quality checks, and observability. These steps are crucial to building reliable RAG-based GenAI applications.
(2) Semantic Data is Key: Transformed data, or what I call “semantic data” (data aligned with the context of a specific business problem), is critical for success.
(3) Understanding Models and Techniques: A solid understanding of embedding models, tokenization, and the capabilities of various foundation models is essential.
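The first two points can be sketched in code. The example below shows one possible way to turn a raw row into a "semantic" chunk before embedding: validate it, expand coded values, and keep structured metadata for filtering. The field names, the property type codes, and the to_chunk helper are all hypothetical and are not taken from the actual datasets or pipeline.

```python
from dataclasses import dataclass

# Hypothetical mapping of coded values to readable labels.
PROPERTY_TYPES = {
    "D": "detached house",
    "S": "semi-detached house",
    "T": "terraced house",
    "F": "flat",
}


@dataclass
class SaleRecord:
    """A raw row as it might arrive from ingestion (assumed schema)."""
    price_gbp: int
    property_type_code: str
    district: str
    year: int


def to_chunk(record):
    """Validate a raw record and turn it into text plus metadata for embedding."""
    # Basic data quality checks: reject rows that would mislead the model.
    if record.price_gbp <= 0:
        raise ValueError(f"Invalid price: {record.price_gbp}")
    if record.property_type_code not in PROPERTY_TYPES:
        raise ValueError(f"Unknown property type: {record.property_type_code}")

    property_type = PROPERTY_TYPES[record.property_type_code]
    # A readable sentence gives the embedding model context that a bare row lacks.
    text = (
        f"In {record.year}, a {property_type} in {record.district} "
        f"sold for GBP {record.price_gbp:,}."
    )
    # Metadata stays structured so a vector store can filter on it.
    metadata = {
        "year": record.year,
        "district": record.district,
        "property_type": property_type,
    }
    return {"text": text, "metadata": metadata}


if __name__ == "__main__":
    print(to_chunk(SaleRecord(250000, "F", "Exampleshire", 2021)))
```

The design choice here is to keep the readable text and the structured metadata side by side: the text is what gets embedded, while the metadata supports filtering by fields such as year or district at query time.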
These experiments also suggest that generating data visuals is becoming increasingly automated, and I am optimistic that GenAI will make developing complex data visuals much easier in the near future.