Retrieval-Augmented Generation (RAG) App

What is Retrieval-Augmented Generation? RAG is a technique that enhances the accuracy and reliability of generative AI models by incorporating facts from external knowledge bases. It supplements the internal representation of information in large language models (LLMs) by grounding them in external sources.

Synopsis - I built this RAG app strictly as a learning exercise. Easier said than done, right? There were plenty of challenges beyond getting the proof of concept working on my desktop, which was only the initial task. Putting it in the cloud and exposing it to the world took things to the next level. It is not as simple as deploying a service with an endpoint. To make this work, there had to be an exposed endpoint (HTTP on port 80) along with a supporting AI/ML environment (Python scripts using the OpenAI and Pinecone libraries) to generate a natural-language response from the gpt-3.5-turbo chat model augmented with the new RAG data. I settled on an Ubuntu VM to make this all happen. Another challenge was setting up the Pinecone database as an AWS serverless instance, a fairly cost-effective alternative to a hosted database.
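
The request flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the index name "news-rag", the embedding model, and the function names are all assumptions.

```python
def build_prompt(question: str, contexts: list[str]) -> str:
    """Ground the chat model on retrieved passages (the 'augmented' step)."""
    context_block = "\n\n".join(f"- {c}" for c in contexts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )


def answer(question: str, top_k: int = 3) -> str:
    """Embed the question, retrieve from Pinecone, then ask gpt-3.5-turbo."""
    from openai import OpenAI      # reads OPENAI_API_KEY from the environment
    from pinecone import Pinecone  # reads PINECONE_API_KEY from the environment

    oai = OpenAI()
    pc = Pinecone()
    index = pc.Index("news-rag")   # hypothetical index name

    # 1. Vector generation: embed the incoming question with OpenAI.
    vec = oai.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # 2. Vector storage lookup: nearest-neighbor search in Pinecone.
    hits = index.query(vector=vec, top_k=top_k, include_metadata=True)
    contexts = [m.metadata["text"] for m in hits.matches]

    # 3. Generation: gpt-3.5-turbo answers, grounded on the retrieved text.
    chat = oai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": build_prompt(question, contexts)}],
    )
    return chat.choices[0].message.content
```

In a deployment like the one described, `answer()` would sit behind the Nginx-proxied endpoint and be invoked per chat request.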

Behind the scenes

  • OpenAI
    • Vector Generation
    • Chat Model = gpt-3.5-turbo
  • Pinecone Vector Database
    • Vector Storage
  • Cloud Environment (serving up this app)
    • Ubuntu Virtual Machine - running in Azure
      • Supporting Libraries et al. - Pinecone, OpenAI, Supervisor, pip, .NET Core (SDK & Runtime), Nginx reverse proxy, Python 3
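
Creating the serverless Pinecone index on AWS mentioned in the synopsis might look like the sketch below. The index name, dimension, and region are illustrative assumptions, not details from this project.

```python
def create_rag_index(pc, name: str = "news-rag", dimension: int = 1536):
    """Create a serverless Pinecone index in AWS.

    `pc` is a pinecone.Pinecone client; name/region/dimension are
    hypothetical (1536 matches OpenAI's text-embedding-3-small width).
    """
    from pinecone import ServerlessSpec

    pc.create_index(
        name=name,
        dimension=dimension,
        metric="cosine",  # cosine similarity is typical for text embeddings
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
```

The serverless spec is what keeps costs down: capacity scales with usage instead of paying for an always-on hosted pod.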

New topics covered as of 9/30/2025 (what RAG adds to the gpt-3.5-turbo model)

  • Shooting in North Carolina Marina
  • China Mexican Pecan Imports
  • China youth unemployment
  • Healthier school lunches movement
  • Next steps, US-Led Gaza peace plan
  • National Guard deployment in Portland
  • Government shutdown agreement failed
  • SNAP food restrictions considered
