Retrieval-Augmented Generation (RAG) App

What is Retrieval-Augmented Generation? RAG is a technique that enhances the accuracy and reliability of generative AI models by incorporating facts from external knowledge bases. It supplements the internal representation of information in large language models (LLMs) by grounding them in external sources.

Synopsis - I built this RAG app strictly as a learning exercise. Easier said than done, right? There were plenty of challenges beyond getting the proof of concept working on my desktop, which was only the initial task. Putting it in the cloud and exposing it to the world took things to the next level. It is not as simple as deploying a service with an endpoint. To make this work, there had to be an exposed endpoint (HTTP on port 80) along with a supporting AI/ML environment (Python scripts using the OpenAI and Pinecone libraries) to generate a natural-language response from the gpt-3.5-turbo chat model augmented with the new RAG data. I settled on an Ubuntu VM to make this all happen. Another challenge was setting up the Pinecone database as an AWS serverless instance, a fairly cost-effective alternative to a hosted database.
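
The request flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the index name "news-rag", the embedding model, and the function names are all assumptions.

```python
def build_prompt(question: str, contexts: list[str]) -> str:
    """Ground the chat model on retrieved passages (the 'augmented' step)."""
    context_block = "\n\n".join(f"- {c}" for c in contexts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )


def answer(question: str, top_k: int = 3) -> str:
    """Embed the question, retrieve from Pinecone, then ask gpt-3.5-turbo."""
    from openai import OpenAI      # reads OPENAI_API_KEY from the environment
    from pinecone import Pinecone  # reads PINECONE_API_KEY from the environment

    oai = OpenAI()
    pc = Pinecone()
    index = pc.Index("news-rag")   # hypothetical index name

    # 1. Vector generation: embed the incoming question with OpenAI.
    vec = oai.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # 2. Vector storage lookup: nearest-neighbor search in Pinecone.
    hits = index.query(vector=vec, top_k=top_k, include_metadata=True)
    contexts = [m.metadata["text"] for m in hits.matches]

    # 3. Generation: gpt-3.5-turbo answers, grounded on the retrieved text.
    chat = oai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": build_prompt(question, contexts)}],
    )
    return chat.choices[0].message.content
```

In a deployment like the one described, `answer()` would sit behind the Nginx-proxied endpoint and be invoked per chat request.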

Behind the scenes

  • OpenAI
    • Vector Generation
    • Chat Model = gpt-3.5-turbo
  • Pinecone Vector Database
    • Vector Storage
  • Cloud Environment (serving up this app)
    • Ubuntu Virtual Machine - running in Azure
      • Supporting Libraries et al. - Pinecone, OpenAI, Supervisor, pip, .NET Core (SDK & Runtime), Nginx reverse proxy, Python 3
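
Creating the serverless Pinecone index on AWS mentioned in the synopsis might look like the sketch below. The index name, dimension, and region are illustrative assumptions, not details from this project.

```python
def create_rag_index(pc, name: str = "news-rag", dimension: int = 1536):
    """Create a serverless Pinecone index in AWS.

    `pc` is a pinecone.Pinecone client; name/region/dimension are
    hypothetical (1536 matches OpenAI's text-embedding-3-small width).
    """
    from pinecone import ServerlessSpec

    pc.create_index(
        name=name,
        dimension=dimension,
        metric="cosine",  # cosine similarity is typical for text embeddings
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
```

The serverless spec is what keeps costs down: capacity scales with usage instead of paying for an always-on hosted pod.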

New topics covered as of 9/30/2025 (what RAG adds to the gpt-3.5-turbo model)

  • Shooting in North Carolina Marina
  • China Mexican Pecan Imports
  • China youth unemployment
  • Healthier school lunches movement
  • Next steps, US-Led Gaza peace plan
  • National Guard deployment in Portland
  • Government shutdown agreement failed
  • SNAP food restrictions considered
