Categories
Artificial Intelligence Computer Science

Retrieval Augmented Generation (RAG)

“RAG” is a term that comes up frequently in the Artificial Intelligence (AI) field. Before looking at what RAG is, let us first get a brief idea of Generative AI.

Generative AI is an Artificial Intelligence system capable of creating new and original content in the form of text, code, images, audio, and video. Generative models, such as Large Language Models (LLMs), learn patterns from large datasets and apply them to produce contextually relevant outputs.

How does it work?

Training: Deep Learning models are trained on large datasets to learn patterns and relationships.

Tuning: The trained model is fine-tuned with parameter-efficient low-rank adaptation techniques (LoRA/QLoRA) or with Reinforcement Learning from Human Feedback (RLHF).

Generation: The AI responds to user queries and prompts by generating text, images, audio, or video based on the patterns it learned during training.

Text-generating models use the “Transformer” architecture to predict the next token from the preceding context, one token at a time, and thereby produce coherent text.

Below is an example of a code snippet that uses the Hugging Face transformers library to generate a response to a user prompt:

transformer.py
Python
from transformers import pipeline

# Load a pre-trained text generation pipeline.
# "gpt2" is a small, publicly available model on the Hugging Face Hub.
generator = pipeline("text-generation", model="gpt2")

# Generate text based on a prompt
prompt = "In the future, AI will"
result = generator(prompt, max_length=50, num_return_sequences=1)
print(result[0]['generated_text'])

Types of models:

Transformers: Text and code generation. LLMs are built on this architecture, which uses self-attention to capture context.

Diffusion models: Generate high-quality images/audio by iterative denoising.

GANs and VAEs: Image synthesis, style transfer, and data augmentation.

Encoder-Decoder: Translation, Summarization, and Multimodal tasks.
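The self-attention mentioned for Transformers can be sketched in a few lines. This is a minimal illustration in plain Python, assuming tiny hand-written token vectors and leaving out the learned query/key/value projections that real models apply; the function names are chosen here for illustration only.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over a short sequence of vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Score the current token against every token in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # The output is a weighted mix of all value vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, values))
                 for i in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

# Three made-up 2-dimensional token vectors (assumed data).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = self_attention(tokens, tokens, tokens)
for row in context:
    print([round(x, 3) for x in row])
```

Each output row blends information from every token, weighted by similarity, which is how a token's representation comes to reflect its context.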

Generative AI Applications:

Text generation (chatbots, summarization, and code generation), image generation (art, medical images), audio generation (voice synthesis, music creation), and video generation (animation, simulation).

Limitations of Generative AI
  • Generative AI models are prone to hallucinations, i.e., they can produce plausible-sounding but incorrect statements, which reduces accuracy.
  • Generative AI is not real-time. Its knowledge is limited to its training cut-off, i.e., it cannot access newer information until it is retrained.
  • It lacks access to internal and proprietary data (for example, company reports, release notes, etc.).
  • It relies on large models and datasets, so it is resource-intensive in terms of compute and storage, which makes fine-tuning difficult.

These limitations call for an improved methodology and architecture. This is where “RAG” comes into the picture.

Retrieval Augmented Generation (RAG) is a technique that retrieves relevant context from an external knowledge source and adds it to the model's prompt, resulting in more accurate responses.

RAG Architecture
Generative AI vs RAG Comparison:
| Aspect | GenAI | RAG |
|---|---|---|
| Accuracy | Prone to hallucinations | Grounded in retrieved sources |
| Knowledge Freshness | Static, limited to training cutoff | Dynamic, can access real-time data |
| Domain Adaptability | Weak with proprietary/internal data | Strong, integrates custom datasets |
| Resource Needs | High (training/fine-tuning) | Lower (retrieval pipeline setup) |
| Creativity | Strong (novel, diverse outputs) | Moderate (depends on retrieved context) |
| Traceability | Limited (no source attribution) | High (answers linked to documents) |

Knowledge Index – An external knowledge source is a foundation for a RAG system. The knowledge source can be any domain-specific custom dataset, documents, databases, APIs, or structured tables.

Document Loader – The document loader standardizes and normalizes the documents from knowledge index data sources such as local files, web pages, cloud storage, or databases. The text splitter extracts the text, splits the text into chunks, and enriches it with metadata for the embedding phase.
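The splitting step can be sketched as follows. This is a minimal illustration, assuming fixed-size character windows with a small overlap; the function name, the default sizes, and the metadata fields are hypothetical, and production splitters usually cut on sentence or token boundaries instead.

```python
def split_into_chunks(text, chunk_size=40, overlap=10, source="unknown"):
    """Split text into overlapping character windows and attach metadata."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        # Metadata travels with the chunk so answers can cite their source.
        chunks.append({"text": piece, "source": source, "start": start})
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "".join(chr(97 + i % 26) for i in range(100))  # 100 made-up characters
chunks = split_into_chunks(sample, source="sample.txt")
for chunk in chunks:
    print(chunk["start"], chunk["text"])
```

The overlap keeps a sentence that straddles a boundary from being lost to both neighbouring chunks.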

Embedding – The text chunks are converted into numerical vectors by embedding models, so that each vector captures the semantic meaning of its chunk.

Vector Store – The embeddings are stored in a vector database or vector store. The vector database enables fast similarity searches and retrieves relevant context based on the user’s query.

Retriever – The query encoder converts the user input into a vector representation. The retriever then searches the vector database using semantic similarity or other search techniques to fetch the most relevant chunks of information.
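The embedding, vector store, and retriever stages can be sketched together. This is a toy illustration under loud assumptions: the "embedding" is just a word-count vector rather than a neural model, `ToyVectorStore` is a hypothetical in-memory class, and the sample documents are made up. Real systems use a trained embedding model and a dedicated vector database.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a sparse word-count vector (real systems use a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

class ToyVectorStore:
    """Hypothetical in-memory store: keeps (vector, text) pairs in a list."""
    def __init__(self):
        self.entries = []

    def add(self, chunk_text):
        self.entries.append((embed(chunk_text), chunk_text))

    def search(self, query, top_k=2):
        # Encode the query the same way as the chunks, then rank by similarity.
        query_vector = embed(query)
        scored = [(cosine_similarity(query_vector, vec), text)
                  for vec, text in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]

store = ToyVectorStore()
store.add("Release 2.1 adds single sign-on support.")
store.add("The cafeteria is closed on public holidays.")
store.add("Release 2.0 fixed the login timeout bug.")

results = store.search("What does release 2.1 add?", top_k=2)
for score, text in results:
    print(round(score, 3), text)
```

The important design point is that the query and the chunks go through the same `embed` function, so their vectors are comparable.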

Ranker – The ranker carries out deduplication, relevance ranking, and context enrichment on the retrieved chunks. The retrieved and ranked chunks are then combined with the user query to generate a better and more accurate response.
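A ranking step that removes duplicates and orders chunks by relevance might look like the sketch below. It assumes the retriever already returned `(score, text)` pairs; the function name, the scores, and the texts are made up for illustration, and real rankers often re-score candidates with a separate model.

```python
def rank_chunks(candidates, top_n=3):
    """Drop duplicate chunks, then keep the highest-scoring ones."""
    best_by_text = {}
    for score, text in candidates:
        key = " ".join(text.lower().split())  # normalise case and whitespace
        # Keep only the best-scoring copy of each distinct chunk.
        if key not in best_by_text or score > best_by_text[key][0]:
            best_by_text[key] = (score, text)
    ranked = sorted(best_by_text.values(), key=lambda pair: pair[0], reverse=True)
    return ranked[:top_n]

candidates = [
    (0.43, "Release 2.1 adds single sign-on support."),
    (0.61, "release 2.1 adds  single sign-on support."),  # same text, different casing
    (0.29, "Release 2.0 fixed the login timeout bug."),
]
ranked = rank_chunks(candidates, top_n=2)
for score, text in ranked:
    print(score, text)
```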

Generator – The generator is the large language model (LLM) that synthesizes the retrieved context and user query to produce a grounded response. Modern RAG systems may also use the generator for query rewriting, self-evaluation, and corrective re-retrieval.
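How the retrieved context and the user query are combined before reaching the generator can be sketched as plain string assembly. The prompt wording and the `build_prompt` helper are assumptions made for illustration; the resulting string would be passed to whichever LLM the system uses.

```python
def build_prompt(question, chunks):
    """Combine retrieved chunks and the user question into one prompt string."""
    # Number the chunks so the answer can refer back to its sources.
    context_lines = [f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)]
    return (
        "Answer the question using only the context below. "
        "If the context is not enough, say so.\n\n"
        "Context:\n" + "\n".join(context_lines) + "\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What does release 2.1 add?",
    ["Release 2.1 adds single sign-on support.",
     "Release 2.0 fixed the login timeout bug."],
)
print(prompt)
```

Instructing the model to rely only on the supplied context is what grounds the answer, and the numbered chunks make source attribution possible.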

Output response – Output response is a formatted final response that is sent to the user.

Updater (Optional) – Some RAG systems use an updater to refresh and re-embed the data so that the knowledge base remains current. The updater can be driven by an agentic framework that refreshes the knowledge base automatically.
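One simple way to decide what needs re-embedding is to compare content hashes between indexing runs. The sketch below assumes documents are available as an id-to-text mapping and that the hashes from the previous run were saved; the helper names and sample data are hypothetical.

```python
import hashlib

def content_hash(text):
    """Fingerprint a document so changes can be detected cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def find_stale_documents(documents, indexed_hashes):
    """Return ids of documents that are new or changed since the last indexing run."""
    stale = []
    for doc_id, text in documents.items():
        if indexed_hashes.get(doc_id) != content_hash(text):
            stale.append(doc_id)
    return stale

# Hashes recorded during the previous indexing run (made-up data).
indexed_hashes = {"notes.txt": content_hash("v1 of the release notes")}
documents = {
    "notes.txt": "v2 of the release notes",    # changed since last run
    "faq.txt": "How do I reset my password?",  # new document
}
stale = find_stale_documents(documents, indexed_hashes)
print(stale)
```

Only the documents in `stale` would then be split, embedded, and written to the vector store again.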

RAG stands for Retrieval Augmented Generation.

  • Retrieval – Find relevant information.
  • Augmentation – Add the retrieved data to the AI’s prompt as context.
  • Generation – Generate a better and more accurate response.

The purpose of RAG is to add relevant context to AI and generate an accurate response.


Types of AI

Artificial Intelligence (AI) is a trending technology around the world. Let’s understand its types.

Artificial Intelligence (AI) is the capability of a computational system to perform tasks associated with human intelligence, such as learning, reasoning, perception, problem solving, and decision making.

Narrow AI (Weak AI): Narrow AI is designed and trained for a specific task or a narrow range of tasks. Such systems perform their designated tasks well but cannot generalize beyond them. For example, voice assistants (Alexa, Siri), face recognition systems, and recommendation systems like the one used by Netflix.

General AI (Strong AI): General AI refers to machines that can perform any intellectual task a human can, with the ability to learn and adapt across tasks. It remains theoretical and has not yet been achieved. For example, a single system that could drive, diagnose diseases, cook, and write code equally well.

Super AI (Super Intelligent AI): Super AI is a theoretical concept in which AI surpasses human intelligence. Such systems could make decisions and solve problems on their own. For example, an AI that outperforms humans in all fields, including creative work and decision-making, which raises ethical and control concerns.

The following classification is based on how AI handles data, memory, and decision-making in different scenarios.

1. Reactive Machines

Reactive machines purely operate based on the present data and do not store any previous experiences or learn from past actions. These systems respond to specific inputs with fixed outputs and are unable to adapt. Examples: AI Chess Bots, Pattern Recognition AI.

2. Limited Memory in AI

Limited Memory AI uses past data to make better decisions and predictions, but lacks long-term memory. Most modern AI applications belong to this type. Examples: Self-driving cars, Chatbots.
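The difference between a reactive system and a limited-memory system can be sketched with a deliberately simple example. The thermostat scenario, the threshold of 20, and the window of three readings are all assumptions chosen for illustration, not examples from a real product.

```python
from collections import deque

def reactive_thermostat(temperature):
    """Reactive: the same input always produces the same output; nothing is stored."""
    return "heat_on" if temperature < 20 else "heat_off"

class LimitedMemoryThermostat:
    """Limited memory: keeps only the last few readings and decides on their average."""
    def __init__(self, window=3):
        self.recent = deque(maxlen=window)  # older readings fall out automatically

    def decide(self, temperature):
        self.recent.append(temperature)
        average = sum(self.recent) / len(self.recent)
        return "heat_on" if average < 20 else "heat_off"

readings = [22, 22, 17]
reactive_decisions = [reactive_thermostat(t) for t in readings]

agent = LimitedMemoryThermostat(window=3)
memory_decisions = [agent.decide(t) for t in readings]

print(reactive_decisions)
print(memory_decisions)
```

The reactive version switches the heating on as soon as a single low reading arrives, while the limited-memory version takes the recent history into account and does not react to the one-off dip.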

3. Theory of Mind

Theory of Mind AI tries to understand human emotions, beliefs, and intentions, enabling more sophisticated and responsive interactions. Examples: Human-Robot interface detecting emotions, Collaborative Robots in Healthcare.

4. Self-Aware AI

Self-Aware AI is a hypothetical, advanced form of AI that would possess consciousness, enabling it to understand emotions and have self-awareness like humans. No such system exists today. Examples: Fully autonomous moral decision-making systems, robots that are aware of themselves and their environment.

The following classification is generally based on what the AI can do in real-world systems.

1. Generative AI (Gen AI)

Gen AI creates new content like text, images, audio, or code by learning patterns from data. It uses deep learning models like transformers. Example: Chatbots generating answers, AI image generators, and code generation tools.

2. Agentic AI

Agentic AI acts autonomously to achieve goals, making choices and executing tasks without constant human input. It can plan, execute, and adapt. Example: AI that books tickets after comparing prices, Task automation agents, and multi-step problem-solving systems.
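The plan, execute, and adapt cycle can be sketched with the ticket-booking example. Everything here is a stand-in: `fetch_prices` and `book` are hypothetical tool functions with made-up data, and a real agent would typically let an LLM choose which tool to call next.

```python
def fetch_prices():
    """Stand-in for a price lookup tool; the data is made up for illustration."""
    return {"AirA": 320, "AirB": 275, "AirC": 290}

def book(carrier, sold_out):
    """Stand-in for a booking tool; fails when the carrier is sold out."""
    return carrier not in sold_out

def booking_agent(budget, sold_out=()):
    """Try to book the cheapest available ticket within the budget."""
    log = []
    # Plan: rank the options from cheapest to most expensive.
    options = sorted(fetch_prices().items(), key=lambda item: item[1])
    for carrier, price in options:
        if price > budget:
            log.append(f"stop: {carrier} at {price} exceeds budget")
            return None, log
        # Execute: try to book the current best option.
        if book(carrier, sold_out):
            log.append(f"booked {carrier} at {price}")
            return carrier, log
        # Adapt: the attempt failed, so move on to the next option.
        log.append(f"{carrier} sold out, trying next option")
    return None, log

choice, steps = booking_agent(budget=300, sold_out={"AirB"})
print(choice)
for step in steps:
    print(step)
```

The agent is given a goal and a budget rather than step-by-step instructions, and it recovers from a failed booking without human input.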

3. Natural Language Processing (NLP)

NLP allows machines to understand, interpret, and communicate using human language. It works with both text and speech. Example: Chatbots, Language translation, Sentiment analysis.

4. Computer Vision

Computer Vision allows machines to analyse, recognize, and interpret images and videos. It detects objects, faces, and patterns from visuals. Example: Face recognition, medical image analysis, and self-driving car vision systems.


Understanding Computer Architecture

The Basics – You Must Know

A computer is an electronic machine or a programmable device that can store, retrieve and process data.

The von Neumann Architecture

The CPU is the brain of the computer, consisting of:

Arithmetic Logic Unit (ALU): Performs arithmetic and logical operations.

Control Unit (CU): Directs the flow of data and instructions.

Registers: Small, high-speed storage for temporary data.

Example: In an Intel Core i7 processor, the ALUs handle integer arithmetic and logic (floating-point calculations run on dedicated floating-point units), while the control logic manages instruction sequencing.
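How the ALU, control unit, and registers cooperate can be sketched with a toy fetch-decode-execute loop. The four-instruction set below is invented for illustration and does not correspond to any real processor.

```python
def run(program):
    """Execute a tiny made-up instruction set and return the final registers."""
    registers = {"R0": 0, "R1": 0, "R2": 0}   # small, fast storage
    pc = 0  # program counter, maintained by the control unit
    while pc < len(program):
        opcode, *operands = program[pc]       # fetch and decode
        pc += 1
        if opcode == "LOAD":                  # LOAD reg, constant
            reg, value = operands
            registers[reg] = value
        elif opcode == "ADD":                 # ALU: arithmetic operation
            dest, a, b = operands
            registers[dest] = registers[a] + registers[b]
        elif opcode == "AND":                 # ALU: logical operation
            dest, a, b = operands
            registers[dest] = registers[a] & registers[b]
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return registers

program = [
    ("LOAD", "R0", 6),
    ("LOAD", "R1", 3),
    ("ADD", "R2", "R0", "R1"),
    ("AND", "R0", "R0", "R1"),
    ("HALT",),
]
final_registers = run(program)
print(final_registers)  # {'R0': 2, 'R1': 3, 'R2': 9}
```

The `while` loop and program counter play the role of the control unit, the `ADD` and `AND` branches play the role of the ALU, and the dictionary stands in for the register file.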

Memory stores data and instructions for processing. It is organized in a hierarchy:

Primary Memory: RAM (volatile) and cache (fast access).

Secondary Memory: Hard drives, SSDs (non-volatile).

Virtual Memory: Extends RAM using disk space.

Example: A system with 16GB RAM and 512GB SSD uses paging to manage virtual memory.
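Paging can be illustrated with a small address-translation sketch. It assumes 4 KiB pages and a made-up page table; in a real system the translation is done by hardware (the MMU) with page tables maintained by the operating system.

```python
PAGE_SIZE = 4096  # 4 KiB pages, a common choice; assumed here

def translate(virtual_address, page_table):
    """Map a virtual address to a physical one, or return None on a page fault."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table.get(page_number)
    if frame is None:
        return None  # page fault: the OS would load the page from disk
    return frame * PAGE_SIZE + offset

# Virtual page number -> physical frame number (made-up mapping).
page_table = {0: 5, 1: 2}

print(translate(4100, page_table))      # page 1, offset 4 -> 8196
print(translate(3 * 4096, page_table))  # page 3 is not mapped -> None
```

The offset within the page is unchanged by translation; only the page number is swapped for a frame number.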

The input/output (I/O) system handles communication between the CPU and external devices.

I/O Interfaces: Memory-mapped or isolated I/O.

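Interrupts & DMA: Interrupts let a device signal the CPU when it needs attention, and Direct Memory Access (DMA) moves data between a device and memory without the CPU copying each byte.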

Example: A keyboard sends interrupts to the CPU, while a disk uses DMA for bulk data transfer.

Buses are communication pathways for data, addresses, and control signals.

Data Bus: Transfers actual data.

Address Bus: Specifies memory locations.

Control Bus: Manages read/write operations.

Example: PCIe bus connects GPUs to the CPU for high-speed data exchange.

Pipelining improves performance by overlapping instruction execution stages.

Instruction Level Parallelism (ILP) and Branch Prediction reduce delays.

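Example: A classic 5-stage pipeline (fetch, decode, execute, memory access, write-back) keeps up to five instructions in flight at once, each in a different stage. Modern CPUs such as the ARM Cortex-A family build on the same idea, often with deeper pipelines.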
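The benefit of overlapping stages can be estimated with simple arithmetic. The sketch below assumes an idealised pipeline in which every stage takes one cycle and there are no stalls or branch mispredictions, so the numbers are an upper bound rather than a measurement.

```python
def cycles_without_pipeline(instructions, stages):
    """Each instruction passes through all stages before the next one starts."""
    return instructions * stages

def cycles_with_pipeline(instructions, stages):
    """The first instruction fills the pipeline, then one completes per cycle."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)

n, k = 100, 5  # 100 instructions on a 5-stage pipeline (assumed workload)
sequential = cycles_without_pipeline(n, k)
pipelined = cycles_with_pipeline(n, k)
print(sequential, pipelined, round(sequential / pipelined, 2))  # 500 104 4.81
```

As the instruction count grows, the speedup approaches the number of stages, which is why stalls and mispredicted branches matter so much in practice.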

The Instruction Set Architecture (ISA) defines the CPU’s commands, addressing modes, and data formats.

Reduced Instruction Set Computer (RISC): Simple, fast instructions (e.g., ARM).

Complex Instruction Set Computer (CISC): Complex instructions (e.g., x86).

Example: ARM ISA powers most smartphones for energy efficiency.