Categories
Artificial Intelligence Computer Science

Retrieval Augmented Generation (RAG)

“RAG” is a trending term in the Artificial Intelligence (AI) field. Before looking at what RAG is, let us get a brief idea of Generative AI.

Generative AI is an Artificial Intelligence system capable of creating new and original content in the form of text, code, images, audio, and video. Models such as Large Language Models (LLMs) learn patterns from large datasets and apply them to produce contextually relevant outputs.

How does it work?

Training: Deep Learning models are trained on large datasets to learn patterns and relationships.

Tuning: The AI model is fine-tuned with low-rank adaptation techniques (LoRA/QLoRA) or Reinforcement Learning from Human Feedback (RLHF).

Generation: The AI responds to user queries and prompts by generating text, images, audio, or video based on the patterns it learned during training.

Generative text models use the “Transformer” architecture to predict the next token from the preceding context and produce coherent text.

Below is an example of a code snippet that uses transformers to generate the response to a user query:

transformer.py
Python
from transformers import pipeline

# Load a pre-trained text generation pipeline.
# "gpt2" is a small, publicly available checkpoint on the Hugging Face Hub.
generator = pipeline("text-generation", model="gpt2")

# Generate text based on a prompt
prompt = "In the future, AI will"
result = generator(prompt, max_length=50, num_return_sequences=1)

# Each result is a dict; the generated text includes the original prompt
print(result[0]['generated_text'])
Types of models:

Transformers: Text and code generation. This is the architecture behind LLMs, and it uses self-attention to capture context.

Diffusion models: Generate high-quality images/audio by iterative denoising.

GANs and VAEs: Image synthesis, style transfer, and data augmentation.

Encoder-Decoder: Translation, Summarization, and Multimodal tasks.

Generative AI Applications:

Text-generation (chatbots, summarization, and code generation), Image-generation (Art, medical images), Audio-generation (voice synthesis, music creation), Video-generation (animation, simulation).

Limitations of Generative AI
  • Generative AI models are prone to hallucinations, so their answers can be inaccurate.
  • Generative AI is not real-time. It is limited to its training cut-off, i.e., it cannot access updated information until it is retrained.
  • It lacks access to internal and proprietary data (for example, company reports, release notes, etc.).
  • It works with large models and datasets, so it is resource-intensive with respect to compute and storage. This makes fine-tuning difficult.

These limitations call for an improved methodology and architecture. This is where “RAG” comes into the picture.

Retrieval Augmented Generation (RAG) is a technique that retrieves relevant information from an external knowledge source and adds it to the model’s prompt as context, resulting in more accurate responses.

RAG Architecture
Generative AI vs RAG Comparison:
| Aspect | GenAI | RAG |
|---|---|---|
| Accuracy | Prone to hallucinations | Grounded in retrieved sources |
| Knowledge Freshness | Static, limited to training cutoff | Dynamic, can access real-time data |
| Domain Adaptability | Weak with proprietary/internal data | Strong, integrates custom datasets |
| Resource Needs | High (training/fine-tuning) | Lower (retrieval pipeline setup) |
| Creativity | Strong (novel, diverse outputs) | Moderate (depends on retrieved context) |
| Traceability | Limited (no source attribution) | High (answers linked to documents) |

Knowledge Index – An external knowledge source is a foundation for a RAG system. The knowledge source can be any domain-specific custom dataset, documents, databases, APIs, or structured tables.

Document Loader – The document loader standardizes and normalizes the documents from knowledge index data sources such as local files, web pages, cloud storage, or databases. The text splitter extracts the text, splits the text into chunks, and enriches it with metadata for the embedding phase.
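The splitting step can be sketched in a few lines. This is only a minimal illustration, assuming fixed-size character windows with a small overlap; the function name split_into_chunks, the parameter values, and the metadata fields are hypothetical, and real splitters usually respect sentence or paragraph boundaries.

```python
def split_into_chunks(text, chunk_size=200, overlap=50, source="unknown"):
    """Split text into overlapping character windows with simple metadata."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # how far the window moves each time
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        # Metadata (hypothetical fields) travels with the chunk to the embedding phase
        chunks.append({"text": piece, "source": source, "start": start})
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


document = "a" * 450
chunks = split_into_chunks(document, chunk_size=200, overlap=50, source="notes.txt")
print(len(chunks), [c["start"] for c in chunks])
```

The overlap means a sentence cut at a chunk boundary still appears whole in one of the neighbouring chunks.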

Embedding – The text chunks are converted into numerical vectors using embedding models that capture semantic meaning.

Vector Store – The embeddings are stored in a vector database or vector store. The vector database enables fast similarity searches and retrieves relevant context based on the user’s query.

Retriever – The query encoder converts the user input into a vector representation. The retriever then searches the vector database using semantic similarity or other search techniques to fetch the most relevant chunks of information.
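To make the embedding, vector store, and retriever steps concrete, here is a toy sketch. It assumes a bag-of-words count vector in place of a real embedding model and a plain Python list in place of a vector database; the names embed, cosine, and ToyVectorStore are hypothetical and only illustrate the idea of similarity search.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Keeps (text, vector) pairs in memory and searches them by similarity."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, top_k=2):
        query_vec = embed(query)  # the query is encoded the same way as the chunks
        scored = [(cosine(query_vec, vec), text) for text, vec in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = ToyVectorStore()
for doc in [
    "the release notes describe the new login flow",
    "bioplastics are made from renewable biomass",
    "the quarterly report covers revenue and costs",
]:
    store.add(doc)

results = store.search("what changed in the release notes", top_k=1)
print(results[0][1])
```

A production system would replace embed with a trained embedding model and the list scan with an approximate nearest-neighbour index, but the flow (encode the query, score the stored vectors, return the best chunks) stays the same.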

Ranker – The ranker carries out deduplication, relevance ranking, and context enrichment on the retrieved chunks. The retrieved and ranked chunks are then combined with the user query to generate a better and more accurate response.
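As a rough sketch of the deduplication and ranking part, assume the retriever hands over a list of (score, text) pairs. The function name rank_chunks and the normalisation rule (lowercasing and collapsing whitespace) are assumptions made for this example; real rankers often use a separate re-ranking model.

```python
def rank_chunks(scored_chunks, top_k=3):
    """Drop near-identical chunks and keep the top_k highest-scoring ones.

    scored_chunks: list of (score, text) pairs.
    """
    best = {}
    for score, text in scored_chunks:
        key = " ".join(text.lower().split())  # normalise case and whitespace
        # Keep only the best-scoring copy of each duplicate
        if key not in best or score > best[key][0]:
            best[key] = (score, text)
    ranked = sorted(best.values(), key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k]


retrieved = [(0.9, "Login flow changed"), (0.5, "login  flow changed"), (0.7, "Revenue grew")]
print(rank_chunks(retrieved, top_k=2))
```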

Generator – The generator is the large language model (LLM) that synthesizes the retrieved context and user query to produce a grounded response. Modern RAG systems may also use the generator for query rewriting, self-evaluation, and corrective re-retrieval.
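How the retrieved context reaches the generator can be illustrated with a small prompt-building helper. This is a sketch only: the function name build_prompt and the prompt wording are hypothetical, and the call to the LLM itself is left out.

```python
def build_prompt(question, chunks):
    """Combine the ranked chunks and the user question into one prompt string."""
    # Number each chunk so the model can point back to its source
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What changed in the login flow?",
    ["The release notes describe the new login flow.", "Login now requires two steps."],
)
print(prompt)
```

Numbering the chunks is one simple way to get the traceability listed in the comparison table, because the answer can refer to [1] or [2].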

Output response – Output response is a formatted final response that is sent to the user.

Updater (Optional) – Some RAG systems use an updater to refresh and re-embed the data so that the knowledge base remains current. The updater can be equipped with an agentic framework to refresh the knowledge base automatically.

RAG stands for Retrieval Augmented Generation.

  • Retrieval – Find relevant information.
  • Augmentation – Add the retrieved information to the prompt as context.
  • Generation – Generate a better and more accurate response.

The purpose of RAG is to add relevant context to AI and generate an accurate response.

Categories
ecosystem environment sustainability

Bioplastic vs Plastic

A bioplastic is a material that is derived from renewable biological sources.

Key Properties

Biodegradability: Many bioplastics can decompose naturally through microbial action, turning into water, carbon dioxide, and biomass. This property helps mitigate plastic pollution.

Mechanical Properties: Bioplastics exhibit a range of mechanical properties, including flexibility, strength, and durability. Polylactic acid (PLA) is known for its rigidity, while polyhydroxyalkanoates (PHA) offer flexibility and toughness.

Thermal Properties: Bioplastics can have varying heat resistance, which affects their processing and application. Some bioplastics, such as PLA, have lower melting points, making them suitable for certain applications but limiting their use in high-temperature environments.

Barrier Properties: Certain bioplastics provide excellent barrier properties against gases and moisture, making them suitable for food packaging and other applications where preservation is crucial.

Sustainability: By utilizing renewable resources and often requiring less energy to produce than conventional plastics, bioplastics can contribute to a lower carbon footprint. They can also be produced from waste materials, further enhancing their sustainability.

| Property | Bioplastic | Plastic |
|---|---|---|
| Durability | Less durable | More durable |
| Biodegradability | Biodegradable (many types) | Non-biodegradable |
| Composition | Made of renewable biomass sources like corn starch, sugarcane, potato starch, algae, vegetable oils, etc. | Made of fossil fuels such as crude oil and natural gas |
| Production process | Fermentation, enzymatic reactions, chemical synthesis from biomass | Polymerization of petrochemicals |
| Environmental impact | Lower carbon footprint, potential soil benefits | High pollution, long-lasting waste harmful to land and waterbodies |
| Cost | Higher (2x to 3x) | Low |
| Applications | Sustainable packaging, eco-friendly products, and disposable items | Packaging, consumer goods, and automotive |

Although traditional plastics are widely used in day-to-day products, they are very harmful to nature (land, soil, waterbodies, plants, and animals) if disposed of carelessly.

Moreover, there is little awareness among people about how a simple polyethene bag or packet of chips thrown on the streets creates a nuisance in the surroundings. Stray animals on the streets and in the surrounding areas are unaware of the toxic material and can consume it, which is a serious health hazard. People should take responsibility for their own actions and for the nature in which they live.

Plastic pollution is a serious issue, as plastic can take hundreds of years to decompose (about 500 years is a commonly quoted estimate). Moreover, it releases toxic chemicals and microplastics as it breaks down. The rivers, the oceans, the forests, and public places are getting clogged with plastic waste. Mountains of waste are forming in landfills, which, if untreated, cause groundwater pollution and, if openly incinerated, cause air pollution. All of this comes from poor waste management, ignorance, and a lack of knowledge and awareness.

It is time to stop producing additional plastic and start recycling and treating existing plastic. It is time to switch to more sustainable materials and products – for our own good and the future of humanity.

I hope everyone understands the seriousness of pollution and starts adopting eco-friendly products and an eco-friendly lifestyle.

Categories
Computer Science

Solving Problem: Count Elements Greater Than the Previous Average

Given an array of positive integers, return the number of elements that are strictly greater than the average of all previous elements. Skip the first element.

Example

Input

responseTimes = [100, 200, 150, 300]

Output

2
responsetimes_regressions.py
Python
def countResponseTimeRegressions(responseTimes):
    count = 0
    running_sum = 0  # sum of all elements before index i
    for i in range(1, len(responseTimes)):
        running_sum += responseTimes[i - 1]
        # Compare against the average of the previous i elements
        if responseTimes[i] > running_sum / i:
            count += 1
    return count


if __name__ == '__main__':
    responseTimes_count = int(input().strip())
    responseTimes = []
    for _ in range(responseTimes_count):
        responseTimes_item = int(input().strip())
        responseTimes.append(responseTimes_item)
    result = countResponseTimeRegressions(responseTimes)
    print(result)

Compiler Message

Success

Input (stdin)

1

100

Output (stdout)

0

Expected Output

0

Count Elements Greater Than Previous Average | HackerRank

Categories
Coding Computer Science Python

Solving Problem: Sum of Multiples

The sum of multiples of k below n is:

Formula:

$$S_k = k \cdot \frac{m(m+1)}{2}$$

where $m = \left\lfloor \frac{n-1}{k} \right\rfloor$.

Find the sum of multiples of 3 or 5 below N.

For example:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6, and 9. The sum of these multiples is 23.

Input Format

The first line contains  T, which denotes the number of test cases. This is followed by T lines, each containing an integer, N.

Constraints

  • 1 <= T <= 10^5
  • 1 <= N <= 10^9

Output Format

For each test case, print an integer denoting the sum of all the multiples of 3 or 5 below N.

Sample Input 0

2
10
100

Sample Output 0

23
2318

Explanation 0

For N=10, if we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6, and 9. The sum of these multiples is 23.

Similarly, for N=100, we get 2318.

sum_of_multiples.py
Python
#!/bin/python3

def sum_of_multiples(k, limit):
    # Number of multiples of k strictly below limit
    m = (limit - 1) // k
    # k + 2k + ... + mk = k * m * (m + 1) / 2
    return k * m * (m + 1) // 2


t = int(input().strip())
for a0 in range(t):
    n = int(input().strip())
    s3 = sum_of_multiples(3, n)
    s5 = sum_of_multiples(5, n)
    # Multiples of 15 are counted in both s3 and s5, so subtract them once
    s15 = sum_of_multiples(15, n)
    total = s3 + s5 - s15
    print(total)

Input (stdin)

  • 2
  • 10
  • 100

Your Output (stdout)

  • 23
  • 2318

Expected Output

  • 23
  • 2318
sum_of_multiples1.py
Python
t = int(input().strip())
if 1 <= t <= pow(10, 5):
    for a0 in range(t):
        n = int(input().strip())
        total = 0
        if 1 <= n <= pow(10, 9):
            # Brute force: build a list of every multiple of 3 or 5 below n
            li = [i for i in range(1, n) if (i % 3 == 0 or i % 5 == 0)]
            total = sum(li)
        print(total)

The brute-force code above takes O(n) time per test case and builds a list that grows with n, so it exceeds the time and memory limits for N up to 10^9. With the arithmetic formula, each test case takes O(1) time.
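A quick way to gain confidence in the formula is to compare it with the brute-force sum on small inputs. The sketch below does that; the helper names sum_3_or_5_below and brute_force are introduced only for this check.

```python
def sum_of_multiples(k, n):
    """Sum of the multiples of k strictly below n, using the arithmetic series."""
    m = (n - 1) // k
    return k * m * (m + 1) // 2


def sum_3_or_5_below(n):
    # Inclusion-exclusion: multiples of 15 are counted twice, so subtract them once
    return sum_of_multiples(3, n) + sum_of_multiples(5, n) - sum_of_multiples(15, n)


def brute_force(n):
    # Generator expression: no intermediate list is built
    return sum(i for i in range(1, n) if i % 3 == 0 or i % 5 == 0)


for n in (1, 10, 100, 1000):
    assert sum_3_or_5_below(n) == brute_force(n)

print(sum_3_or_5_below(10), sum_3_or_5_below(100))
```

The brute-force version is only practical for small n; the closed form gives the same value with a handful of integer operations.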

Contests | HackerRank

Categories
Coding Computer Science Python

Solving Problem: Incorrect Regex

(Python)

You are given a string S.
Your task is to check whether S is a valid regex.

Input Format

The first line contains an integer T, the number of test cases.
Each of the next T lines contains a string S.

Constraints

0 < T < 100

Output Format

Print “True” or “False” for each test case without quotes.

Sample Input

2
.*\+
.*+

Sample Output

True
False

Explanation

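.*\+ : Valid regex.
.*+ : Raises the error “multiple repeat” on Python versions before 3.11. Hence, it is invalid there. (From Python 3.11 onward, *+ is accepted as a possessive quantifier, so the same pattern compiles.)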

validate_regex.py
Python
import re


def is_valid_regex(pattern):
    """Return True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # re.compile raises re.error for an invalid pattern
        return False


if __name__ == "__main__":
    # First line: number of test cases; then one pattern per line
    T = int(input())
    for _ in range(T):
        S = input()
        print(is_valid_regex(S))

Incorrect Regex | HackerRank

Categories
Coding Computer Science Python

Solving Problem: Combinations

(Python)

You are given a string S.
Your task is to print all possible combinations, up to size k, of the string in lexicographically sorted order.

Input Format

A single line containing the string S and integer value k separated by a space.

Constraints

0 < k <= len(S)
The string contains only UPPERCASE characters.

Output Format

Print the different combinations of the string on separate lines.

Sample Input

HACK 2

Sample Output

A
C
H
K
AC
AH
AK
CH
CK
HK
combinations.py
Python
from itertools import combinations

# Input from user: the string and the maximum combination size
inp = input().split()
S = inp[0]
k = int(inp[1])

# Sorting the letters first makes combinations() yield each size
# in lexicographic order, so no second sort is needed
letters = sorted(S)

# Print combinations of size 1, then size 2, ... up to size k
for size in range(1, k + 1):
    for combo in combinations(letters, size):
        print(''.join(combo))

itertools.combinations() | HackerRank

Categories
Computer Science

BFS and DFS

(DSA – Tree Traversal)

| Parameter | BFS | DFS |
|---|---|---|
| Full Form | Breadth First Search | Depth First Search |
| Definition | A graph traversal in which all nodes on the same level are visited before moving to the next level. | A graph traversal that goes as deep as possible along each branch until it reaches a node with no unvisited neighbours, then backtracks. |
| Data Structure | Queue | Stack |
| Concept | Tree builds level by level. | Tree builds sub-tree by sub-tree. |
| Approach | First In First Out (FIFO) | Last In First Out (LIFO) |
| Source | Better when the target is closer to the given source. | Better when the target is farther from the given source. |
| Applications | Bipartite graph checking, shortest path in unweighted graphs, etc. | Cycle detection, topological sorting, finding strongly connected components, etc. |
Python
# DFS and BFS
from collections import deque

# Define the graph
graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D', 'E'],
    'C': ['A', 'F'],
    'D': ['B'],
    'E': ['B', 'F'],
    'F': ['C', 'E']
}

# BFS - Breadth First Search
def bfs(graph, start):
    visited = {start}
    queue = deque([start])  # FIFO: First In First Out
    while queue:
        node = queue.popleft()  # remove and return the leftmost element
        print(node, end=' ')
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)

# DFS - Depth First Search
def dfs(graph, start):
    visited = set()
    stack = [start]  # LIFO: Last In First Out
    while stack:
        node = stack.pop()  # remove and return the rightmost element
        if node not in visited:
            print(node, end=' ')
            visited.add(node)
            stack.extend(reversed(graph[node]))

print("BFS Traversal:")
bfs(graph, 'A')  # Output: A B C D E F
print("\nDFS Traversal:")
dfs(graph, 'A')  # Output: A B D E F C

V = number of vertices (nodes)

E = number of edges

Time and Space Complexity:

  • BFS: Complete, finds the shortest path in unweighted graphs, O(V+E) time, O(V) space due to the queue.
  • DFS: Not always complete without safeguards (for example, on infinite or very deep graphs), may not find the shortest path, O(V+E) time, O(h) space for recursion depth h or O(V) for an iterative stack.
  • Both run in O(V+E) time, but BFS often needs more memory than DFS on wide graphs because the queue can hold an entire level at once.
  • Choice depends on graph size, depth, and whether shortest path or memory efficiency is the priority.
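The shortest-path property of BFS can be shown with a short sketch that records each node's parent and walks back from the target. The function name bfs_shortest_path is introduced here for illustration, and the graph is the same adjacency list used in the traversal code.

```python
from collections import deque


def bfs_shortest_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None if unreachable."""
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parents to rebuild the path
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D', 'E'],
    'C': ['A', 'F'],
    'D': ['B'],
    'E': ['B', 'F'],
    'F': ['C', 'E'],
}

print(bfs_shortest_path(graph, 'A', 'F'))
```

Because BFS explores level by level, the first time the target is dequeued it has been reached by a path with the minimum number of edges. This holds only when every edge has the same weight.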

Categories
Computer Science

Solving Problem: Leap Year

(If-elif-else Control Flow)

Check if a year is a leap year.

As per the Gregorian calendar (introduced in 1582), the following rule is used to determine the kind of year:

  • If the year number isn’t divisible by four, it’s a common year.
  • Otherwise, if the year number isn’t divisible by 100, it’s a leap year.
  • Otherwise, if the year number isn’t divisible by 400, it’s a common year.
  • Otherwise, it’s a leap year.

The task is to determine if the given year is a leap year or not.
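The four rules map directly onto an if-elif-else chain. The sketch below wraps them in a function so they can be checked against the standard library; the name classify_year is hypothetical, and treating 1582 itself as inside the Gregorian era is an assumption made for this example.

```python
import calendar


def classify_year(year):
    """Apply the Gregorian leap-year rules in the order they are listed."""
    if year < 1582:  # assumption: 1582 counts as within the Gregorian era
        return "Not within the Gregorian era."
    if year % 4 != 0:
        return "Common year."
    elif year % 100 != 0:
        return "Leap year."
    elif year % 400 != 0:
        return "Common year."
    else:
        return "Leap year."


# Cross-check against calendar.isleap, which implements the same rules
for y in range(1582, 2401):
    assert (classify_year(y) == "Leap year.") == calendar.isleap(y)

for y in (2000, 1999, 1996, 1900, 1500):
    print(y, classify_year(y))
```

Century years such as 1900 are the case worth testing explicitly: they are divisible by 4 and by 100 but not by 400, so they are common years.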

Output messages:

“Leap year.” if the year is a leap year.

“Common year.” if the year is common.

“Not within the Gregorian era.” if the year falls outside the Gregorian era.

Input: 2000

Output: Leap year.

Input: 1999

Output: Common year.

Input: 1996

Output: Leap year.

Input: 1500

Output: Not within the Gregorian era.

Python
year = int(input("Enter a year: "))

# The Gregorian calendar was introduced in 1582
if year >= 1582:
    # Leap year: divisible by 4 and not by 100, or divisible by 400
    if (year % 4 == 0 and year % 100 != 0) or year % 400 == 0:
        print("Leap year.")
    else:
        print("Common year.")
else:
    print("Not within the Gregorian era.")


Categories
Computer Science

Solving Problem: Symmetric Difference

(Sets)

Given 2 sets of integers M and N, print their symmetric difference in ascending order. The symmetric difference is the set of values that exist in either M or N but not in both.
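A small worked example may help here. The values below are made up for illustration; the ^ operator on sets is equivalent to the symmetric_difference method, and the result also equals the union minus the intersection.

```python
M = {2, 4, 5, 9}
N = {2, 4, 11, 12}

# Values in exactly one of the two sets
sym_diff = sorted(M ^ N)

# The same result expressed as union minus intersection
same_result = sorted((M | N) - (M & N))

for value in sym_diff:
    print(value)
```

Here 2 and 4 appear in both sets, so they are dropped, leaving 5, 9, 11, and 12.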

Python
# Enter your code here. Read input from STDIN. Print output to STDOUT
M = int(input())
li_M = list(map(int, input().split()))
N = int(input())
li_N = list(map(int, input().split()))
sM = set(li_M)
sN = set(li_N)
sD = sM.symmetric_difference(sN)
sorted_S = sorted(sD)
for i in sorted_S:
    print(i)

https://www.hackerrank.com/challenges/symmetric-difference/problem

Categories
Computer Science

Solving Problem: Trigonometry

(Python)

ABC is a right triangle with the right angle at B.
Therefore, ∠ABC = 90°.

Point M is the midpoint of the hypotenuse AC.

You are given the lengths AB and BC.
Your task is to find ∠MBC (angle θ, as shown in the figure) in degrees.

Input Format

The first line contains the length of side AB.
The second line contains the length of the side BC.

Constraints

  • 0 < AB <= 100
  • 0 < BC <= 100
  • Lengths  AB and  BC are natural numbers.

Output Format

Output ∠MBC in degrees.

Note: Round the angle to the nearest integer.

Examples:
If the angle is 56.5000001°, then output 57°.
If the angle is 56.5000000°, then output 57°.
If the angle is 56.4999999°, then output 56°.

0° < θ° < 90°

Sample Input

10
10

Sample Output

45°
Python
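# Enter your code here. Read input from STDIN. Print output to STDOUT
import math

AB = int(input())
BC = int(input())

# Hypotenuse length (note: ** is the power operator; ^ is bitwise XOR in Python)
AC = math.sqrt(AB**2 + BC**2)

# M is the midpoint of the hypotenuse, so MB = MC and angle MBC = angle MCB,
# which gives tan(theta) = AB / BC
theta = round(math.degrees(math.atan(AB / BC)))

# chr(176) is the degree sign
print(str(theta) + chr(176))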

Note:

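∠A + ∠B + ∠C = 180°
AC = sqrt(AB^2 + BC^2)
MA = MB = MC (the midpoint of the hypotenuse of a right triangle is equidistant from all three vertices)
θ = ∠MBC
∠MCB = ∠MBC (triangle MBC is isosceles because MB = MC)
sin(θ) = AB/AC
cos(θ) = BC/AC
tan(θ) = AB/BC
θ = arctan(AB/BC)

arctan is the inverse tangent function: given a ratio, it returns the angle whose tangent equals that ratio.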

You can use numpy arctan or math atan.
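The result can also be checked with coordinates instead of the midpoint argument. The sketch below places B at the origin, A on the y-axis, and C on the x-axis, then measures the angle between BM and BC directly. The function name angle_mbc is hypothetical, and math.floor(x + 0.5) is used as one way to round halves upward, as the problem's examples require (the built-in round() rounds exact halves to the nearest even integer).

```python
import math


def angle_mbc(ab, bc):
    """Angle MBC in whole degrees for a right triangle with the right angle at B."""
    # Coordinates: B = (0, 0), A = (0, ab), C = (bc, 0); M is the midpoint of AC
    mx, my = bc / 2, ab / 2
    # BC lies along the x-axis, so angle MBC is the direction angle of vector BM
    theta = math.degrees(math.atan2(my, mx))
    return math.floor(theta + 0.5)  # round half up


print(str(angle_mbc(10, 10)) + chr(176))
```

Since atan2(ab / 2, bc / 2) is the same angle as arctan(ab / bc), this agrees with θ = arctan(AB/BC) from the notes.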

Refer:

numpy.arctan2 — NumPy v2.4 Manual

math — Mathematical functions — Python 3.14.4 documentation

Find Angle MBC | HackerRank