Author Information#

Author: Zeynab Teymoori

  • Bachelor’s Degree in Computer Engineering, Ferdowsi University of Mashhad, Razavi Khorasan, Iran (2020-2024)

Supervisor: Prof. Hadi Sadoghi Yazdi

Laboratory: Pattern Recognition Laboratory, Department of Engineering

Address: Ferdowsi University of Mashhad, Razavi Khorasan, Iran

Email: zeynab.teymoori@mail.um.ac.ir

What is an LLM#

LLM stands for Large Language Model. It refers to a type of artificial intelligence (AI) model that is trained on a massive amount of text data to generate language outputs that are coherent and natural-sounding.

LLMs are typically based on deep learning architectures, such as transformer models, and are trained on vast amounts of text data, often in the order of billions or even trillions of words. This training enables the model to learn patterns, relationships, and structures of language, allowing it to generate text that is often indistinguishable from human-written text.

Different types of LLMs#

Here’s a table that discusses the three main types of large language models:

| Type | Function | Training Objective | Strengths | Challenges | Example |
| --- | --- | --- | --- | --- | --- |
| Autoregressive Language Models | Generate text by predicting the next word based on preceding words | Maximize likelihood of generating the correct next word given context | Excel at generating coherent and contextually relevant text | Computationally expensive; prone to repetitive or irrelevant responses | GPT-3 |
| Transformer-based Models | Utilize the transformer architecture to process and generate text | Capture long-range dependencies and contextual information | Effective at processing and generating text with rich contextual understanding | Require large amounts of data and computational resources for training; may suffer from model bias | RoBERTa (Robustly Optimized BERT Pretraining Approach) by Facebook AI |
| Encoder-Decoder Models | Used for machine translation, summarization, and question-answering tasks | Encode input into a fixed-length representation for output generation | Versatile across various NLP tasks | Complex to train; can struggle with very long sequences and maintaining context across them | MarianMT (Marian Neural Machine Translation) by the University of Edinburgh |

LLM Architecture#

“The architecture of Large Language Model primarily consists of multiple layers of neural networks, like recurrent layers, feedforward layers, embedding layers, and attention layers. These layers work together to process the input text and generate output predictions.”

Source: Analytics Vidhya

  • The embedding layer in neural networks transforms discrete input data into dense vector representations. By converting categories, such as words in text, into continuous vectors, the embedding layer helps the model learn and retain the relationships and similarities between these categories. This dense representation allows the model to process and understand the data more effectively, making it a crucial component in tasks like sentiment analysis, machine translation, and recommendation systems. Through training, the embeddings capture nuanced patterns and contextual information, enhancing the model’s ability to make accurate predictions and generate relevant outputs.
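As a toy sketch of this idea (the vocabulary, dimension, and values below are hypothetical, not taken from any real model), an embedding layer is essentially a learned lookup table that maps each discrete token to a dense vector:

```python
import random

random.seed(0)

# Hypothetical toy vocabulary; a real model covers tens of thousands of tokens.
vocab = {"cat": 0, "dog": 1, "car": 2}
embedding_dim = 4

# The embedding "layer" is a trainable table: one dense vector per token.
embedding_table = [
    [random.uniform(-1.0, 1.0) for _ in range(embedding_dim)]
    for _ in vocab
]

def embed(word):
    """Look up the dense vector for a word (the embedding layer's forward pass)."""
    return embedding_table[vocab[word]]

sentence_vectors = [embed(w) for w in ["cat", "dog"]]
```

During training, gradients update the table so that related tokens end up with similar vectors.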

  • The recurrent layer processes sequences of data, such as sentences, by interpreting each word one at a time. It maintains a hidden state that updates with each new word, enabling the model to understand the context and relationships between words over time. This sequential processing allows the recurrent layer to capture dependencies and patterns within the text, making it essential for tasks like language modeling, speech recognition, and sequence prediction. Through its ability to maintain and update context, the recurrent layer enhances the model’s capability to generate coherent and contextually relevant outputs.
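A minimal sketch of the recurrent update (scalar toy values and hand-picked weights, not a trained network): each step mixes the current input with the previous hidden state, so earlier words influence how later ones are interpreted:

```python
import math

def rnn_step(x, h_prev, w_x=1.0, w_h=0.5, b=0.0):
    """One recurrent update: new hidden state from current input and old state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

h = 0.0  # initial hidden state
for x in [0.5, -0.2, 0.9]:  # a toy "sentence" of scalar word features
    h = rnn_step(x, h)
# h now summarizes the whole sequence, in order.
```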

  • The attention mechanism empowers a language model to concentrate on specific segments of the input text that are important for the current task. By utilizing this layer, the model can produce highly precise results.
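A minimal scaled dot-product attention sketch in plain Python (toy vectors with hypothetical values): the query is scored against every key, and the resulting weights decide how much each value contributes to the output:

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)  # how much to "concentrate" on each position
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
```

Because the query aligns with the first key, the output is pulled toward the first value vector.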

RAG#

Working with large language models (LLMs) presents numerous challenges, including domain knowledge gaps, issues with factual accuracy, and potential generation of incorrect information. Retrieval Augmented Generation (RAG) emerges as an effective solution to address these challenges by enhancing LLMs with external knowledge sources like databases. RAG proves especially valuable in scenarios demanding substantial knowledge or specific domain expertise that requires constant updating. An inherent advantage of RAG is its ability to adapt without the need for extensive retraining for task-specific applications. Recently, RAG has gained popularity for its utilization in conversational agents.

Here is an implementation of RAG for the question-answering task, especially for interacting with documents.

Imports#

The notebook requires Hugging Face sentence_transformers and PyPDF2 as additional dependencies. If you have not already installed them, you can use these commands:

python -m pip install sentence_transformers
python -m pip install PyPDF2

Or, run the next cell to install them directly within the notebook:

Necessary requirements#

!pip install sentence_transformers
!pip install PyPDF2

Create Prompt#

Retrieve-and-augment phase

Note#

This function is used to extract text from a PDF file uploaded by the user.

Guide1#

The clean_text function cleans the extracted text by removing non-ASCII characters, control characters, and image tags. It is used to preprocess the extracted text before feeding it into the LLM.
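A plausible sketch of such a cleaning function (the `[image: …]` tag pattern below is an assumption about how the extractor marks images, not the notebook's actual format):

```python
import re

def clean_text(text):
    """Remove non-ASCII characters, control characters, and image tags."""
    text = text.encode("ascii", errors="ignore").decode()  # drop non-ASCII
    text = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", " ", text)  # strip control characters
    text = re.sub(r"\[image:[^\]]*\]", " ", text)          # hypothetical image-tag format
    return re.sub(r"\s+", " ", text).strip()               # collapse whitespace
```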

Guide2#

This function splits the cleaned text into sentences and then groups them into fixed-size chunks (e.g., three sentences per chunk). It is used to prepare the text for encoding with the embedding models.
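A simple sketch of sentence-based chunking (using a naive regex sentence splitter; a real implementation might instead use a tokenizer such as nltk's sent_tokenize):

```python
import re

def chunk_text_by_sentences(text, sentences_per_chunk=3):
    """Split text into sentences, then group them into fixed-size chunks."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]
```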

Generate Phase#

This function initializes four models:

  1. model_summary: a SentenceTransformer model for generating summary embeddings.

  2. model_hypothetical: a SentenceTransformer model for generating hypothetical embeddings.

  3. model_full_text: a SentenceTransformer model for generating full-text embeddings.

  4. qa_pipeline: a question-answering pipeline using the EleutherAI/gpt-neo-2.7B model.

These models are used for encoding the text chunks and the user’s query.

Main#

This function orchestrates the entire RAG system. Here’s an overview of what the main function does:

  1. Uploads a PDF file and extracts its text using extract_text_from_pdf.

  2. Cleans and chunks the text using clean_text and chunk_text_by_sentences.

  3. Encodes the text chunks using the three SentenceTransformer models.

  4. Concatenates and normalizes the embeddings.

  5. Encodes the user’s query using the same three SentenceTransformer models.

  6. Calculates the cosine similarity between the query embedding and the text chunk embeddings.

  7. Retrieves the top-k relevant documents based on the similarity scores.

  8. Uses the question-answering pipeline to generate an answer to the user’s query based on the most relevant document.

The main function is called when the script is run, and it interacts with the user to upload a PDF file and input a query.

The system extracts relevant text from the uploaded PDF file, encodes the text and the query, calculates the similarity scores, and generates an answer to the query based on the most relevant document.
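Steps 6 and 7 of the pipeline can be sketched in plain Python (toy embeddings below; the notebook itself works with SentenceTransformer vectors):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_indices(query_emb, chunk_embs, k=2):
    """Return indices of the k chunks most similar to the query."""
    ranked = sorted(
        range(len(chunk_embs)),
        key=lambda i: cosine_similarity(query_emb, chunk_embs[i]),
        reverse=True,
    )
    return ranked[:k]
```

The top-ranked chunks are then passed to the question-answering pipeline as context for generating the final answer.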