Categories
Artificial Intelligence Computer Science

Retrieval Augmented Generation (RAG)

“RAG” is a term currently trending in the Artificial Intelligence (AI) field. Before looking at what RAG is, let us briefly review Generative AI.

Generative AI is an Artificial Intelligence system capable of creating new and original content in the form of text, code, images, audio, and video. Models such as Large Language Models (LLMs) learn patterns from large datasets and apply them to produce contextually relevant outputs.

How does it work?

Training: Deep Learning models are trained on large datasets to learn patterns and relationships.

Tuning: The AI model is fine-tuned with parameter-efficient techniques such as LoRA/QLoRA (low-rank adaptation) or with Reinforcement Learning from Human Feedback (RLHF).

Generation: The AI responds to user queries and prompts by generating text, images, audio, or video based on the patterns learned during training.

The generative models use “Transformers” to predict the next tokens based on context and produce coherent text.

Below is an example of a code snippet that uses transformers to generate the response to a user query:

transformer.py
Python
from transformers import pipeline
# Load a pre-trained text generation pipeline ("gpt2" is a small model available on the Hugging Face Hub)
generator = pipeline("text-generation", model="gpt2")
# Generate text based on a prompt
prompt = "In the future, AI will"
result = generator(prompt, max_length=50, num_return_sequences=1)
print(result[0]['generated_text'])
Types of models:

Transformers: Text/code generation; the architecture behind LLMs, using self-attention to capture context.

Diffusion models: Generate high-quality images/audio by iterative denoising.

GANs and VAEs: Image synthesis, style transfer, data augmentation

Encoder-Decoder: Translation, Summarization, and Multimodal tasks.

Generative AI Applications:

Text-generation (chatbots, summarization, and code generation), Image-generation (Art, medical images), Audio-generation (voice synthesis, music creation), Video-generation (animation, simulation).

Limitations of Generative AI
  • Generative AI models are prone to hallucinations, i.e., they can produce plausible but incorrect statements.
  • Generative AI is not real-time. It is limited to its training cut-off, i.e., it does not access updated information until it is retrained.
  • It lacks access to internal and proprietary data (for example, company reports, release notes, etc.).
  • It works with large models and datasets, so it is resource-intensive with respect to compute and storage, which makes fine-tuning difficult.

These limitations call for an improved methodology and architecture. This is where “RAG” comes into the picture.

Retrieval Augmented Generation (RAG) is a technique that retrieves relevant context from an external knowledge source and adds it to the AI’s prompt, resulting in more accurate responses.

RAG Architecture
Generative AI vs RAG Comparison:
Aspect | GenAI | RAG
Accuracy | Prone to hallucinations | Grounded in retrieved sources
Knowledge Freshness | Static, limited to training cutoff | Dynamic, can access real-time data
Domain Adaptability | Weak with proprietary/internal data | Strong, integrates custom datasets
Resource Needs | High (training/fine-tuning) | Lower (retrieval pipeline setup)
Creativity | Strong (novel, diverse outputs) | Moderate (depends on retrieved context)
Traceability | Limited (no source attribution) | High (answers linked to documents)

Knowledge Index – An external knowledge source is a foundation for a RAG system. The knowledge source can be any domain-specific custom dataset, documents, databases, APIs, or structured tables.

Document Loader – The document loader standardizes and normalizes the documents from knowledge index data sources such as local files, web pages, cloud storage, or databases. The text splitter extracts the text, splits the text into chunks, and enriches it with metadata for the embedding phase.

Embedding – The text chunks are converted into numerical vectors by embedding models that capture their semantic meaning.

Vector Store – The embeddings are stored in a vector database or vector store. The vector database enables fast similarity searches and retrieves relevant context based on the user’s query.

Retriever – The query encoder converts the user input into a vector representation. The retriever then searches the vector database using semantic similarity or other search techniques to fetch the most relevant chunks of information.

Ranker – The ranker carries out deduplication, relevance ranking, and context enrichment on the retrieved chunks. The retrieved and ranked chunks are then combined with the user query to generate a better and more accurate response.

Generator – The generator is the large language model (LLM) that synthesizes the retrieved context and user query to produce a grounded response. Modern RAG systems may also use the generator for query rewriting, self-evaluation, and corrective re-retrieval.

Output response – Output response is a formatted final response that is sent to the user.

Updater (Optional) – Some RAG systems use an updater to refresh and re-embed the data so that the knowledge base remains current. The updater can be equipped with an agentic framework to refresh the knowledge base automatically.
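The components above can be sketched end to end in a few lines. This is only a toy illustration under stated assumptions: the "embedding" is a bag-of-words term-frequency vector rather than a real embedding model, the vector store is a plain Python list, and all function names and sample chunks are hypothetical. The final prompt is what would be handed to the generator (LLM), which is not called here.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding (assumption): a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_vector_store(chunks):
    """Vector store stand-in: a list of (chunk, vector) pairs."""
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query, store, top_k=2):
    """Retriever + ranker: return the top_k chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(store,
                    key=lambda item: cosine_similarity(query_vector, item[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

def build_prompt(query, context_chunks):
    """Augmentation: combine the retrieved chunks with the user query."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical knowledge index (for example, internal release notes)
chunks = [
    "Release 2.1 adds single sign-on support.",
    "The cafeteria opens at nine in the morning.",
    "Release 2.0 removed the legacy export API.",
]
store = build_vector_store(chunks)
query = "Which release adds single sign-on support?"
top = retrieve(query, store, top_k=1)
prompt = build_prompt(query, top)
print(prompt)
```

A production system would replace `embed` with a trained embedding model and the list with a vector database, but the flow of retrieve, augment, and generate stays the same.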

RAG stands for Retrieval Augmented Generation.

  • Retrieval – Find relevant information.
  • Augmentation – Add the retrieved information to the prompt as context for the AI.
  • Generation – Generate a better and more accurate response.

The purpose of RAG is to add relevant context to AI and generate an accurate response.

Categories
Computer Science

Solving Problem: Count Elements Greater Than the Previous Average

Given an array of positive integers, return the number of elements that are strictly greater than the average of all previous elements. Skip the first element.

Example

Input

responseTimes = [100, 200, 150, 300]

Output

2
responsetimes_regressions.py
Python
def countResponseTimeRegressions(responseTimes):
    # Count elements strictly greater than the average of all previous elements
    count = 0
    for i in range(1, len(responseTimes)):
        if responseTimes[i] > sum(responseTimes[:i]) / i:
            count += 1
    return count

if __name__ == '__main__':
    responseTimes_count = int(input().strip())
    responseTimes = []
    for _ in range(responseTimes_count):
        responseTimes_item = int(input().strip())
        responseTimes.append(responseTimes_item)
    result = countResponseTimeRegressions(responseTimes)
    print(result)
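The solution above re-sums the prefix on every iteration, which is O(n^2) overall. A possible O(n) variant keeps a running sum instead. The function name below is hypothetical, and the comparison is rewritten as a multiplication so that no floating-point division is needed.

```python
def count_regressions_running_sum(response_times):
    """Count elements strictly greater than the average of all previous elements."""
    count = 0
    running_sum = 0  # sum of the elements seen so far
    for i, value in enumerate(response_times):
        # value > running_sum / i  is equivalent to  value * i > running_sum for i > 0
        if i > 0 and value * i > running_sum:
            count += 1
        running_sum += value
    return count

print(count_regressions_running_sum([100, 200, 150, 300]))  # 2
```

For the example input, 200 > 100 and 300 > 150 count, while 150 is not strictly greater than the average 150 of the first two elements.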

Compiler Message

Success

Input (stdin)

1

100

Output (stdout)

0

Expected Output

0

Count Elements Greater Than Previous Average | HackerRank

Categories
Coding Computer Science Python

Solving Problem: Sum of Multiples

The sum of multiples of k below n is:

Formula:

$$S_k = k \cdot \frac{m(m+1)}{2}$$

where $m = \left\lfloor \frac{n-1}{k} \right\rfloor$.

Find the sum of multiples of 3 or 5 below N.

For example:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6, and 9. The sum of these multiples is 23.
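As a worked check for N = 10, write $S_k$ for the sum of the multiples of k below N and m for the number of such multiples. Multiples of 15 are counted in both $S_3$ and $S_5$, so they are subtracted once:

```latex
\begin{aligned}
S_3 &= 3 \cdot \frac{3 \cdot 4}{2} = 18 && (m = \lfloor 9/3 \rfloor = 3) \\
S_5 &= 5 \cdot \frac{1 \cdot 2}{2} = 5 && (m = \lfloor 9/5 \rfloor = 1) \\
S_{15} &= 15 \cdot \frac{0 \cdot 1}{2} = 0 && (m = \lfloor 9/15 \rfloor = 0) \\
S_3 + S_5 - S_{15} &= 18 + 5 - 0 = 23
\end{aligned}
```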

Input Format

The first line contains  T, which denotes the number of test cases. This is followed by T lines, each containing an integer, N.

Constraints

  • 1 <= T <= 10^5
  • 1 <= N <= 10^9

Output Format

For each test case, print an integer denoting the sum of all the multiples of 3 or 5 below N.

Sample Input 0

2
10
100

Sample Output 0

23
2318

Explanation 0

For N = 10, if we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6, and 9. The sum of these multiples is 23.

Similarly, for N=100, we get 2318.

sum_of_multiples.py
Python
#!/bin/python3

def sum_of_multiples(k, limit):
    # Sum of the multiples of k strictly below limit: k * (1 + 2 + ... + m)
    m = (limit - 1) // k
    return k * m * (m + 1) // 2

t = int(input().strip())
if 1 <= t <= 10**5:
    for a0 in range(t):
        n = int(input().strip())
        s3 = sum_of_multiples(3, n)
        s5 = sum_of_multiples(5, n)
        s15 = sum_of_multiples(15, n)
        # Multiples of 15 are counted in both s3 and s5, so subtract them once
        total = s3 + s5 - s15
        print(total)

Input (stdin)

  • 2
  • 10
  • 100

Your Output (stdout)

  • 23
  • 2318

Expected Output

  • 23
  • 2318
sum_of_multiples1.py
Python
t = int(input().strip())
if 1 <= t <= 10**5:
    for a0 in range(t):
        n = int(input().strip())
        total = 0
        if 1 <= n <= 10**9:
            # Brute force: build the full list of multiples below n, then sum it
            li = [i for i in range(1, n) if (i % 3 == 0 or i % 5 == 0)]
            total = sum(li)
        print(total)

The above brute-force code takes O(n) time per test case and stores up to O(n) multiples in a list, so it fails under the memory constraints for large N. With the arithmetic formula, the time complexity becomes O(1) per test case.
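One way to gain confidence in the arithmetic formula is to compare it against the brute-force sum for small values of N. The sketch below does that; the helper names `closed_form` and `brute_force` are hypothetical.

```python
def sum_of_multiples(k, limit):
    """Sum of the multiples of k strictly below limit."""
    m = (limit - 1) // k
    return k * m * (m + 1) // 2

def closed_form(n):
    """O(1): multiples of 3 plus multiples of 5, minus the double-counted multiples of 15."""
    return sum_of_multiples(3, n) + sum_of_multiples(5, n) - sum_of_multiples(15, n)

def brute_force(n):
    """O(n): check every natural number below n."""
    return sum(i for i in range(1, n) if i % 3 == 0 or i % 5 == 0)

# The two approaches should agree for every small n
for n in range(1, 1000):
    assert closed_form(n) == brute_force(n)

print(closed_form(10), closed_form(100))  # 23 2318
```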

Contests | HackerRank

Categories
Coding Computer Science Python

Solving Problem: Incorrect Regex

(Python)

You are given a string S.
Your task is to check whether S is a valid regex.

Input Format

The first line contains an integer T, the number of test cases.
The next T lines each contain a string S.

Constraints

0 < T < 100

Output Format

Print “True” or “False” for each test case without quotes.

Sample Input

2
.*\+
.*+

Sample Output

True
False

Explanation

.*\+ : Valid regex.
.*+: Has the error multiple repeat. Hence, it is invalid.

validate_regex.py
Python
import re

# Validate a regex: a pattern is valid if re.compile() accepts it
def is_valid_regex(pattern):
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

if __name__ == "__main__":
    # User input
    T = int(input())
    for _ in range(T):
        S = input()
        # Note: Python 3.11+ supports possessive quantifiers, so ".*+" compiles there
        print(is_valid_regex(S))

Incorrect Regex | HackerRank

Categories
Coding Computer Science Python

Solving Problem: Combinations

(Python)

You are given a string S.
Your task is to print all possible combinations, up to size k, of the string in lexicographically sorted order.

Input Format

A single line containing the string S and the integer value k separated by a space.

Constraints

0 < k <= len(S)
The string contains only UPPERCASE characters.

Output Format

Print the different combinations of the string on separate lines.

Sample Input

HACK 2

Sample Output

A
C
H
K
AC
AH
AK
CH
CK
HK
combinations.py
Python
from itertools import combinations

# Input from user
inp = input().split()
S = inp[0]
k = int(inp[1])
li1 = []
# Create a list with the combinations of every size from 1 to k
for i in range(1, k+1):
    li1.extend(list(combinations(S, i)))
# Sort the characters within each combination and join them into a string
for i in range(0, len(li1)):
    li1[i] = ''.join(sorted(li1[i]))
# Sort the list by length first, then alphabetically ascending
li1 = sorted(li1, key=lambda s: (len(s), s.lower()))
# Print each combination
for i in range(0, len(li1)):
    print(li1[i])
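An alternative sketch: `itertools.combinations` emits tuples in lexicographic order of the input positions, so sorting the string once up front should make the later sorting steps unnecessary. The function name `combinations_up_to` is hypothetical.

```python
from itertools import combinations

def combinations_up_to(s, k):
    """All combinations of the characters of s with sizes 1..k, shortest first."""
    letters = sorted(s)  # sort once so every combination comes out in order
    result = []
    for size in range(1, k + 1):
        result.extend("".join(combo) for combo in combinations(letters, size))
    return result

for combo in combinations_up_to("HACK", 2):
    print(combo)
```

For the sample input `HACK 2`, this prints the same ten lines as the sample output.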

itertools.combinations() | HackerRank

Categories
Computer Science

BFS and DFS

(DSA – Tree Traversal)

Parameter | BFS | DFS
Full Form | Breadth First Search | Depth First Search
Definition | BFS (Breadth First Search) is a graph traversal technique where all nodes on the same level are visited before moving to the next level. | DFS (Depth First Search) is a graph traversal technique where each branch is followed in depth until a node with no unvisited neighbours is reached.
Data Structure | Queue | Stack
Concept | Tree builds level by level. | Tree builds sub-tree by sub-tree.
Approach | First In First Out (FIFO) | Last In First Out (LIFO)
Source | Better when the target is closer to the given source. | Better when the target is farther from the given source.
Applications | Bipartite Graphs, Shortest Path, etc. | Acyclic Graphs, Finding Strongly Connected Components, etc.
Python
# DFS and BFS
from collections import deque

# Define the graph as an adjacency list
graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D', 'E'],
    'C': ['A', 'F'],
    'D': ['B'],
    'E': ['B', 'F'],
    'F': ['C', 'E']
}

# BFS - Breadth First Search
def bfs(graph, start):
    visited = {start}
    queue = deque([start])  # FIFO: First In First Out
    while queue:
        node = queue.popleft()  # remove and return the leftmost element
        print(node, end=' ')
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)

# DFS - Depth First Search
def dfs(graph, start):
    visited = set()
    stack = [start]  # LIFO: Last In First Out
    while stack:
        node = stack.pop()  # remove and return the rightmost element
        if node not in visited:
            print(node, end=' ')
            visited.add(node)
            # Push neighbours in reverse so the first neighbour is visited first
            stack.extend(reversed(graph[node]))

print("BFS Traversal:")
bfs(graph, 'A')  # Output: A B C D E F
print("\nDFS Traversal:")
dfs(graph, 'A')  # Output: A B D E F C

V = number of vertices (nodes)

E = number of edges

Time and Space Complexity:

  • BFS: Complete, finds the shortest path in unweighted graphs, O(V+E) time, O(V) space due to the queue.
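  • DFS: Not always complete without safeguards (for example, on cyclic graphs without a visited set), may not find the shortest path, O(V+E) time, O(h) space for a recursion of maximum depth h or O(V) for an iterative stack.
  • Both run in O(V+E) time, but BFS usually requires more memory than DFS on wide graphs, because the queue can hold an entire level at once.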
  • Choice depends on graph size, depth, and whether shortest path or memory efficiency is the priority.
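To illustrate the shortest-path property of BFS on an unweighted graph, the sketch below records each node's parent during the traversal and rebuilds the path once the goal is reached. The function name `shortest_path` is hypothetical; the graph is the same one used in the traversal example.

```python
from collections import deque

graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D', 'E'],
    'C': ['A', 'F'],
    'D': ['B'],
    'E': ['B', 'F'],
    'F': ['C', 'E']
}

def shortest_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None if unreachable."""
    parents = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parents to rebuild the path
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None

print(shortest_path(graph, 'A', 'F'))  # ['A', 'C', 'F']
```

Because BFS explores level by level, the first time the goal is dequeued it has been reached through the fewest possible edges.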

Categories
Computer Science

Solving Problem: Leap Year

(If-elif-else Control Flow)

Check if a year is a leap year.

As per the Gregorian calendar (introduced in 1582), the following rule is used to determine the kind of year:

  • If the year number isn’t divisible by four, it’s a common year.
  • Otherwise, if the year number isn’t divisible by 100, it’s a leap year.
  • Otherwise, if the year number isn’t divisible by 400, it’s a common year.
  • Otherwise, it’s a leap year.

The task is to determine if the given year is a leap year or not.

Output messages:

“Leap year.” if the year is a leap year.

“Common year.” if the year is common.

“Not within the Gregorian era.” If the year falls out of the Gregorian era.

Input: 2000

Output: Leap year.

Input: 1999

Output: Common year.

Input: 1996

Output: Leap year.

Input: 1500

Output: Not within the Gregorian era.

Python
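year = int(input("Enter a year: "))

# The Gregorian calendar was introduced in 1582; that year is treated as inside the era here
if year >= 1582:
    # Leap year: divisible by 4 but not by 100, or divisible by 400
    if (year % 4 == 0 and year % 100 != 0) or year % 400 == 0:
        print("Leap year.")
    else:
        print("Common year.")
else:
    print("Not within the Gregorian era.")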
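The four-step rule can also be written as a small function that follows the list one step at a time. The function name `is_leap_year` is hypothetical; the sketch cross-checks it against the standard library's `calendar.isleap`, and century years such as 1900 are the cases that separate a correct rule from a plain divisible-by-4 test.

```python
import calendar

def is_leap_year(year):
    """Apply the Gregorian rule step by step."""
    if year % 4 != 0:
        return False      # not divisible by 4: common year
    if year % 100 != 0:
        return True       # divisible by 4 but not by 100: leap year
    return year % 400 == 0  # century years are leap only if divisible by 400

for year in (2000, 1999, 1996, 1900):
    print(year, "Leap year." if is_leap_year(year) else "Common year.")

# Cross-check against the standard library over a range of Gregorian years
assert all(is_leap_year(y) == calendar.isleap(y) for y in range(1582, 2401))
```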


Categories
Computer Science

Solving Problem: Symmetric Difference

(Sets)

Given 2 sets of integers M and N, print their symmetric difference in ascending order. The term symmetric difference indicates those values that exist in either M or N but do not exist in both.

Python
# Enter your code here. Read input from STDIN. Print output to STDOUT
M = int(input())  # size of the first set (read to consume the line)
li_M = list(map(int, input().split()))
N = int(input())  # size of the second set (read to consume the line)
li_N = list(map(int, input().split()))
sM = set(li_M)
sN = set(li_N)
sD = sM.symmetric_difference(sN)
sorted_S = sorted(sD)
for i in sorted_S:
    print(i)
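As a small illustration of the definition, the symmetric difference can also be written with the `^` operator, or as the union of the two one-sided differences. The sample values below are made up for the example.

```python
# Hypothetical sample sets
set_m = {2, 4, 5, 9}
set_n = {2, 4, 11, 12}

by_method = set_m.symmetric_difference(set_n)
by_operator = set_m ^ set_n                       # operator form of the same operation
by_definition = (set_m - set_n) | (set_n - set_m)  # in either set, but not in both

print(sorted(by_operator))  # [5, 9, 11, 12]
```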

https://www.hackerrank.com/challenges/symmetric-difference/problem

Categories
Computer Science

Solving Problem: Trigonometry

(Python)

ABC is a right triangle, with the 90° angle at B.
Therefore, ∠ABC = 90°.

Point M is the midpoint of the hypotenuse AC.

You are given the lengths AB and BC.
Your task is to find ∠MBC (angle θ, as shown in the figure) in degrees.

Input Format

The first line contains the length of side AB.
The second line contains the length of the side BC.

Constraints

  • 0 < AB <= 100
  • 0 < BC <= 100
  • Lengths  AB and  BC are natural numbers.

Output Format

Output ∠MBC in degrees.

Note: Round the angle to the nearest integer.

Examples:
If the angle is 56.5000001°, then output 57°.
If the angle is 56.5000000°, then output 57°.
If the angle is 56.4999999°, then output 56°.

0° < θ° < 90°

Sample Input

10
10

Sample Output

45°
Python
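# Enter your code here. Read input from STDIN. Print output to STDOUT
import math
AB = int(input())
BC = int(input())
# M is the midpoint of the hypotenuse, so angle MBC equals angle ACB = arctan(AB/BC)
angle = math.degrees(math.atan2(AB, BC))
# Round half up (56.5 -> 57); the built-in round() rounds halves to the nearest even integer
theta = math.floor(angle + 0.5)
#print(f"{theta}\u00B0")
print(str(theta)+chr(176))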

Note:

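∠A + ∠B + ∠C = 180°
AC = sqrt(AB^2 + BC^2)
MA = MB = MC – midpoint theorem (the midpoint of the hypotenuse is equidistant from all three vertices)
θ = ∠MBC
∠MCB = ∠MBC – triangle MBC is isosceles because MB = MC
sin(θ) = AB/AC
cos(θ) = BC/AC
tan(θ) = AB/BC
θ = arctan(AB/BC)

arctan is the inverse tangent: it returns the angle whose tangent is the given ratio.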
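The reasoning in the note can be checked numerically. The sketch below assumes a coordinate placement that is not part of the problem statement: B at the origin, A on the y-axis, and C on the x-axis. It computes the midpoint M directly and compares the angle of BM against arctan(AB/BC). The function names are hypothetical.

```python
import math

def angle_mbc_from_coordinates(ab, bc):
    """Angle MBC in degrees, computed from explicit coordinates."""
    a = (0.0, float(ab))   # A on the y-axis
    c = (float(bc), 0.0)   # C on the x-axis, B is the origin
    m = ((a[0] + c[0]) / 2, (a[1] + c[1]) / 2)  # midpoint of the hypotenuse AC
    # Angle between ray BM and ray BC (the positive x-axis)
    return math.degrees(math.atan2(m[1], m[0]))

def angle_mbc_from_formula(ab, bc):
    """Angle MBC in degrees, using theta = arctan(AB/BC)."""
    return math.degrees(math.atan(ab / bc))

for ab, bc in [(10, 10), (3, 4), (100, 1)]:
    assert math.isclose(angle_mbc_from_coordinates(ab, bc),
                        angle_mbc_from_formula(ab, bc))

print(round(angle_mbc_from_coordinates(10, 10)))  # 45
```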

You can use numpy arctan or math atan.

Refer:

numpy.arctan2 — NumPy v2.4 Manual

math — Mathematical functions — Python 3.14.4 documentation

Find Angle MBC | HackerRank

Categories
Computer Science

Solving Problem: Tax Calculator

(If-elif control flow)

Once upon a time, there was a country inhabited by happy and prosperous people. The people paid taxes, of course – their happiness had limits. The most important tax, called the Personal Income Tax (PIT), had to be paid yearly and was evaluated using the following rule:

  • If the citizen’s income was not higher than 85,528 INR, the tax was equal to 18% of the income minus 556 INR and 2 paisa (this was the so-called tax relief)
  • If the income was higher than this amount, the tax was equal to 14,839 INR and 2 paisa, plus 32% of the surplus over 85,528 INR.

Your task is to write a tax calculator.

  • It should accept one floating-point value: the income.
  • Next, it should print the calculated tax, rounded to the full INR. There’s a function named round() which will do the rounding for you – you’ll find it in the skeleton code in the editor.

Note: This happy country never returns money to its citizens. If the calculated tax is less than zero, it means there is no tax (the tax is zero). Take this into consideration during your calculations.

Sample input: 10000

Expected output: The tax is: 1244.0 INR

Sample input: 100000

Expected output: The tax is: 19470.0 INR

Sample input: 1000

Expected output: The tax is: 0.0 INR

Sample input: -100

Expected output: The tax is: 0.0 INR

Python
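income = float(input("Enter the annual income: "))

if income <= 85528:
    # 18% of the income minus the tax relief of 556 INR and 2 paisa (556.02 INR)
    tax = 0.18 * income - 556.02
else:
    # 14,839 INR and 2 paisa (14839.02 INR) plus 32% of the surplus over 85,528 INR
    tax = 14839.02 + 0.32 * (income - 85528)

# The country never returns money: a negative tax means no tax
if tax < 0:
    tax = 0.0

tax = round(tax, 0)
print("The tax is:", tax, "INR")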
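To check the rule against the four sample inputs, the calculation can be wrapped in a function. This is a sketch with hypothetical names (`compute_tax` and the constants); it assumes 100 paisa to the rupee, so "2 paisa" is written as 0.02 INR.

```python
THRESHOLD = 85528.0   # income limit between the two tax brackets (INR)
RELIEF = 556.02       # tax relief: 556 INR and 2 paisa
BASE_TAX = 14839.02   # fixed part of the upper bracket: 14,839 INR and 2 paisa

def compute_tax(income):
    """Return the tax rounded to the full INR, never below zero."""
    if income <= THRESHOLD:
        tax = 0.18 * income - RELIEF
    else:
        tax = BASE_TAX + 0.32 * (income - THRESHOLD)
    return round(max(tax, 0.0), 0)

for income in (10000, 100000, 1000, -100):
    print("The tax is:", compute_tax(income), "INR")
```

Clamping with `max(tax, 0.0)` covers both the low-income case, where the relief exceeds 18% of the income, and negative inputs.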