Langchain, Bedrock, Llama, GPT: Building a PDF Image Chat Bot
Pratik Raj
9 min read · Jul 29, 2024

Over the past year, since the introduction of ChatGPT, the advancement of Natural Language Processing (NLP) has been nothing short of remarkable. With its capability to generate text that closely resembles human language, developers globally have been experimenting and innovating, leveraging this technology to create a diverse array of applications. Beyond the developer community, individuals ranging from content creators to customer service representatives have embraced ChatGPT to enhance their workflows, ultimately delivering improved experiences for their customers. The widespread adoption of this technology underscores its transformative impact across various domains.

Ever wondered how to create a super cool chatbot that can actually read info from images in PDFs? Well, we’re about to dive into an exciting journey where we’ll build just that! Our chatbot will be like a tech wizard, using tools like Amazon Textract, Amazon Bedrock, the Llama 2 and GPT-3.5 models, Langchain, and the FAISS Vector DB. Imagine a chatbot that not only understands text but also extracts gems from images in PDFs. It’s like magic, right? 🌟 Stick around as we explore this amazing fusion of tech to make your chatbot a real NLP superhero!

Note -: Click here to go through the notebook.

Getting Started -:

  • An OpenAI account with an API key.
  • An AWS account with access to Bedrock models (e.g. Llama 2 Chat 13B).
  • Familiarity with Natural Language Processing (NLP) concepts and techniques.
  • Familiarity with the Langchain library.
  • Jupyter Notebook.

Understanding the File -:

For the sake of this blog, we will be using a floor plan PDF of a building, with the floor plan of each flat unit on a separate page. (PDF link)

In floor plan PDFs, each page contains both text and a floor plan image. While tools like PyPDF excel in extracting text, they often overlook embedded images. This limitation means a direct parsing approach might miss these vital visual elements in the floor plans.
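
To see this limitation concretely, here is a minimal sketch, assuming the pypdf package is installed, of what a plain text-extraction pass returns for the first page:

# Hedged sketch: a plain text-extraction pass with pypdf reads the text layer only;
# the embedded floor plan drawings are silently skipped.
from pypdf import PdfReader

reader = PdfReader("./floor-plans.pdf")
first_page_text = reader.pages[0].extract_text()
print(first_page_text)  # prints the page's text, but nothing from the floor plan image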

Here’s how we tackle the challenge of not missing any data from the floor plan PDFs: First, we take each page of the PDF and transform it into an image. This is done using a handy tool called pdf2image, which neatly stores all these page-images in a folder. Now, the trick is getting the text out of these images. For that, we use something called OCR – Optical Character Recognition. It’s like teaching the computer to read the text in the images. We’ve got some cool tools for this, like Amazon Textract and UnstructuredImageLoader, which are great at picking out all the text details we need.

# Convert PDF pages to images
import os
from pdf2image import convert_from_path

def pdf_to_image(file_path, image_dir):
    # Render every page of the PDF as a JPG image inside image_dir.
    os.makedirs(image_dir, exist_ok=True)
    images = convert_from_path(file_path, fmt='jpg', output_folder=image_dir, output_file="page")
    print(len(images))

pdf_to_image("./floor-plans.pdf", "./floor-plan-images")

import glob

def fetch_pages(image_dir):
    '''Iterate through all the page images and return the path of each image.'''
    image_files_path = glob.glob(f'{image_dir}/*.jpg')
    return image_files_path

image_files_path = fetch_pages('./floor-plan-images')
print(len(image_files_path))


Following the steps outlined, I’ve successfully converted each page of the PDF into individual images. These images are then meticulously stored, and I maintain a list of their file paths. This list serves as a handy reference for easy access and processing in subsequent steps.

Extracting Data from Images -:

At this vital juncture, we focus on extracting and structuring the data, a key step before feeding it to a Large Language Model (LLM). Langchain supports two effective methods for this purpose:

  • Amazon Textract: Known for its precision, it’s ideal for detailed image data extraction.
  • UnstructuredImageLoader: Offers versatility in handling various image formats.

Both options yield similar quality results with our Image Data. Depending on your images’ specific attributes, testing each method will help you choose the most suitable one for your data.

For reference, I am including the extraction of data from images using both options.

  1. Using Amazon Textract
from langchain.document_loaders import AmazonTextractPDFLoader
import boto3

def get_pages_data(textract_client, image_files_path):
    '''Amazon Textract uses OCR to extract the text from each page image.'''
    all_pages = []
    count = 1
    for path in image_files_path:
        loader = AmazonTextractPDFLoader(path, client=textract_client)
        documents = loader.load()
        all_pages.extend(documents)
        print(f"Extracted Text from Page No. - {count}")
        count += 1
    return all_pages

pages_data = get_pages_data(boto3.client("textract", region_name="us-east-1"), image_files_path)
print(len(pages_data))
pages_data
  2. Using Langchain's In-Built UnstructuredImageLoader

from langchain.document_loaders.image import UnstructuredImageLoader

def get_pages_data(image_files_path):
    pages_data = []
    count = 1

    for path in image_files_path:
        print(f"Processing image: {path}")

        try:
            loader = UnstructuredImageLoader(path)
            documents = loader.load()
            pages_data.extend(documents)
            print(f"Extracted Text from Page No. - {count}")

        except Exception as e:
            print(f"Error processing image {path}: {e}")
            continue

        count += 1

    print("All Pages of the Document Loaded")
    return pages_data

pages_data = get_pages_data(image_files_path)
print(f"Number of pages processed: {len(pages_data)}")
pages_data

Vector Store vs Prompting Data:

With the data extraction complete, we’re ready to move on to the pivotal phase of putting the newly extracted data to work with our Large Language Model (LLM). There are two main strategies for feeding this data to the model:

  1. Vectorization Approach: Here, we convert the extracted data into vectors using embedding models. These vectors are then stored in a vector database, setting the stage for efficient retrieval at query time.
  2. Prompt-Based Approach: Alternatively, we can craft prompts directly from the extracted data and feed them straight into the LLM, providing a more immediate, if less scalable, approach (see the sketch at the end of this section).

Each method offers unique advantages, and the choice largely depends on the specific requirements and goals of our LLM project.

For larger documents, it’s recommended to opt for the Vector Database approach. This strategy not only helps circumvent token limit errors but also plays a significant role in minimising LLM query costs. By efficiently managing large volumes of data through vectorization, we ensure smoother and more cost-effective operations of the Large Language Model.
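
For illustration, here is a minimal sketch of the prompt-based approach, assuming the combined extracted text fits within the model’s context window. It uses the same Bedrock Llama 2 model that we initialise later in this post, and the prompt wording is only an example.

from langchain.llms import Bedrock

# Hedged sketch of the prompt-based approach -- only viable while the
# combined extracted text stays well under the model's context window.
bedrock_llm = Bedrock(
    credentials_profile_name="default",
    model_id="meta.llama2-13b-chat-v1",
    region_name="us-east-1",
)

# Concatenate the text extracted from every page image.
context = "\n\n".join(doc.page_content for doc in pages_data)

prompt = (
    "You are answering questions about a building's floor plans.\n\n"
    f"Floor plan data:\n{context}\n\n"
    "Question: What is the total area of X03?"
)
print(bedrock_llm(prompt))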

Embedding the Extracted Data -:

To effectively store our extracted data as vectors in a vector database, we need to follow a couple of key steps:

  1. Embedding the Data: The first step involves embedding the extracted data into vectors. This process transforms our textual data into a numerical vector format, making it suitable for efficient storage and retrieval in vector databases.
  2. Selecting a Vector Database: Next, we need to choose an appropriate vector database. This database will house our embedded vectors, so it’s important to select one that aligns with our data requirements and offers the desired performance and scalability.

We have several options for embedding our extracted data, like Amazon Titan Embedding Model, OpenAI’s Text-Embedding-Ada-002, and Cohere-Embed-English-V3. For our project, we’re using the Amazon Titan Embedding Model via AWS Bedrock. This model efficiently embeds our data, aligning well with our needs. For storing these vectors, we’ve chosen FAISS, known for its speedy handling of large vector datasets.

from langchain.embeddings import BedrockEmbeddings

embedding_model = BedrockEmbeddings(
    credentials_profile_name="default", region_name="us-east-1", model_id='amazon.titan-embed-text-v1'
)

from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS

def embed_data(all_pages, embedding_model):
    print(all_pages[1])
    print(len(all_pages))

    # Split the extracted page documents into overlapping chunks.
    text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=200)
    documents = text_splitter.split_documents(all_pages)

    # Embed the chunks and index them in an in-memory FAISS vector store.
    vectordb = FAISS.from_documents(
        documents,
        embedding=embedding_model,
    )
    return vectordb

vector_db = embed_data(pages_data, embedding_model)

Initializing Langchain Chain

Langchain offers a variety of chains suited for different applications. As our goal is to develop a conversational bot, we’re specifically opting for the ConversationalRetrievalChain. This choice aligns perfectly with our objective of creating a responsive and interactive conversational experience.

To integrate an LLM into our Chain, we turn to Bedrock, which offers a wide array of Large Language Models. For our project, we’ve selected the Llama 2 Chat 13B model. However, Bedrock’s diverse options allow for experimentation, and I encourage exploring different models to find the one that best suits your specific needs.

from langchain.llms import Bedrock

bedrock_llm = Bedrock(
    credentials_profile_name="default", model_id="meta.llama2-13b-chat-v1", region_name='us-east-1'
)

from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

# Conversation memory for the chain; output_key tells the memory which of the
# chain's outputs to store, since the chain also returns source documents.
memory_chat_qa = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="answer",
)

def create_chain(vector_db, memory_chat_qa):
    qa_chain = ConversationalRetrievalChain.from_llm(
        bedrock_llm,
        retriever=vector_db.as_retriever(search_kwargs={"k": 7}),
        return_source_documents=True,
        memory=memory_chat_qa,
        verbose=False,
    )
    return qa_chain

qa_chain_chat = create_chain(vector_db, memory_chat_qa)

It’s amazing how effortlessly Langchain enables the initialization and setup of a Conversational Chain. With this powerful tool at our disposal, querying our documents becomes a breeze, allowing us to swiftly extract relevant information with precision.

Interacting with the Conversational Chain is simple and intuitive. Just type in your query in a natural, conversational manner. Whether it’s a specific question, a keyword-focused inquiry, or a broader request for information, the Chain is adept at interpreting and providing accurate responses based on your documents.

result = qa_chain_chat({"question":"What is the total area of X03?"})
print(result["answer"])
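
Because the chain keeps the conversation in memory, follow-up questions can refer back to the previous answer. The wording below is purely illustrative:

# A follow-up question that leans on the chat history stored in the chain's memory.
follow_up = qa_chain_chat({"question": "And how many bedrooms does that unit have?"})
print(follow_up["answer"])

# Since return_source_documents=True, the retrieved pages are available as well.
for doc in follow_up["source_documents"]:
    print(doc.metadata)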

Agents

To further elevate the efficacy of our Conversational ChatBot, we can incorporate ‘Agents’ from Langchain. These Agents are an advanced tool that harness the power of AI and Large Language Models (LLMs). They intelligently determine the next steps in a conversation and craft queries accordingly. This integration not only streamlines the chatbot’s responses but also ensures a more dynamic and intuitive interaction experience.

from langchain.agents import Tool

tool_desc = """Use this tool to answer user questions about the floor plan data. This tool can also be used for follow-up questions from the user."""
tools = [Tool(
    func=qa_chain_chat,  # The Conversational Chain we initialised earlier
    description=tool_desc,
    name='builder_bot'
)]

from langchain.memory import ConversationBufferMemory

def initialize_memory():
    memory = ConversationBufferMemory(
        memory_key="chat_history",
        return_messages=True,
    )
    return memory

memory_agent = initialize_memory()

from langchain.agents import initialize_agent
from langchain.agents import AgentType

conversational_agent = initialize_agent(
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    tools=tools,
    llm=ChatOpenAI(temperature=0.1, model_name="gpt-3.5-turbo"),  # the GPT-3.5 model; requires an OpenAI API key
    verbose=False,
    memory=memory_agent,
)

sys_msg = "You are a helpful assistant for a construction company, and strictly answer user queries related to floor plans of flat units and buildings."
prompt = conversational_agent.agent.create_prompt(
    system_message=sys_msg,
    tools=tools
)
conversational_agent.agent.llm_chain.prompt = prompt

conversational_agent.run({'input': "What is the total area of the Unit X03?"})

And that’s all it takes! Just a few simple steps to integrate our Chain into the Langchain Agent, and we’re ready to start querying again. This time, however, we have the added advantage of a Large Language Model (LLM) assisting in decision-making. This powerful combination elevates our chatbot’s capabilities, making it more responsive, intelligent, and effective in handling queries.

Conclusion

We’ve achieved it! We’ve successfully created a Conversational ChatBot capable of extracting and interpreting data from images within PDFs. This process opens up exciting possibilities, such as building chatbots that can process images directly, or creating databases from the extracted image data. Looking ahead, there are numerous opportunities to further refine data structuring and processing techniques. These enhancements will undoubtedly boost our chatbot’s performance, making it even more efficient and effective in handling diverse and complex queries.
