Building a Document Summary Agent with LlamaIndex and TypeScript
Over the past year, interest in Generative AI has grown rapidly, but much of the tooling has been Python-first. If you are a developer rooted in the JavaScript ecosystem, having to switch to Python can hold you back from building AI-first applications.
Recognizing this learning curve, popular frameworks like LangChain and LlamaIndex have introduced TypeScript equivalents of their Python libraries. In this post, we are going to build a Document Summary Agent using LlamaIndex and TypeScript. The agent will read a directory of files, index the content, and answer questions that summarize parts of a document or the entire document.
Install Dependencies and Project Setup
Currently, LlamaIndex supports Node.js versions 18, 20, and 22. Let's install the llamaindex and dotenv packages via npm. (LlamaIndex expects OPENAI_API_KEY to be available as an environment variable; you can export it before running instead of using dotenv.)
mkdir rag-agent-typescript && cd rag-agent-typescript
npm init
npm install --save llamaindex dotenv
We will create index.ts in the root directory, import the necessary functions from llamaindex, and set up dotenv. Make sure you create a .env file in the project directory and add your OPENAI_API_KEY value.
import { OpenAIAgent, QueryEngineTool, SimpleDirectoryReader, SummaryIndex } from 'llamaindex';
import * as dotenv from 'dotenv';
dotenv.config({ path: __dirname + '/.env' });
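For reference, the .env file contains just the one key (the value below is a placeholder, not a real key):

```
OPENAI_API_KEY=your-openai-api-key
```

Keep this file out of version control; dotenv loads it into process.env, which is where LlamaIndex looks for the key.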
SimpleDirectoryReader: Reads and retrieves files from a given directory.
SummaryIndex: Builds a summary (list) index over the content of the fetched local files; its query engine answers questions by reading through the indexed nodes, which makes it well suited to summarization.
QueryEngineTool: This creates a query tool from the indexed data, which will be passed to the OpenAI agent for performing specific queries or operations.
Loading Data from Documents
We create a directory called data in the root of the project containing a single PDF file, a research paper titled "ChatGPT is Bullshit". We are only using one file for the sake of this example. We will use LlamaIndex to query and summarize the content of this paper, building an index and a query engine from the document content with the previously imported functions.
const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: './data',
});
const documentSummaryIndex = await SummaryIndex.fromDocuments(documents);
const documentSummaryEngine = documentSummaryIndex.asQueryEngine();
Create the Summary Agent
We will now create a query engine tool that will be provided to the OpenAI agent for performing specific tasks with the help of the LLM. A tool can carry text instructions for specific tasks, or wrap plain functions to perform operations like math.
const summaryEngineTool = new QueryEngineTool({
  queryEngine: documentSummaryEngine,
  metadata: {
    name: 'document_summary_engine',
    description: 'Use this for summarizing the document content.',
  },
});
const agent = new OpenAIAgent({
  tools: [summaryEngineTool],
  verbose: true,
});
We can create and pass multiple tools to the OpenAI agent, and it will pick one based on the instructions provided. Setting the verbose property to true will give us detailed information on the execution.
Chat with the Agent
const response = await agent.chat({
  message: 'What is the intention of the author in the paper ChatGPT is bullshit?',
});
console.log(response.response);
Finally, to run the code, we will use ts-node.
$ npx ts-node index.ts
Starting step(id, 98103b30-42cf-42e4-a58d-67348ee157ea).
Enqueueing output for step(id, 98103b30-42cf-42e4-a58d-67348ee157ea).
Tool document_summary_engine (remote:document_summary_engine) succeeded.
Output: "The intention of the authors in the paper \"ChatGPT is bullshit\" is to argue that the inaccuracies produced by ChatGPT and similar large language models (LLMs) should not be described as \"hallucinations\" or \"lies.\" Instead, they propose that these inaccuracies are better understood as \"bullshit\" in the Frankfurtian sense. The authors contend that LLMs are indifferent to the truth of their outputs and are designed to produce text that appears truth-apt without any actual concern for truth…."
Finished step(id, 98103b30-42cf-42e4-a58d-67348ee157ea).
Starting step(id, 83626421-b781-46c7-9ea0-2400b55af336).
Enqueueing output for step(id, 83626421-b781-46c7-9ea0-2400b55af336).
Finished step(id, 83626421-b781-46c7-9ea0-2400b55af336).
Final step(id, 83626421-b781-46c7-9ea0-2400b55af336) reached, closing task.
{
raw: {
id: 'chatcmpl-9bmmvMecVxqtrFfbuIw5BgFG7b97H',
object: 'chat.completion',
created: 1718792705,
model: 'gpt-4o-2024-05-13',
choices: [ … ],
usage: { prompt_tokens: 720, completion_tokens: 385, total_tokens: 1105 },
system_fingerprint: 'fp_5e6c71d4a8'
},
message: {
content: 'The intention of the authors in the paper "ChatGPT is bullshit" is to argue that the inaccuracies produced by ChatGPT and similar large language models (LLMs) should not be described as "hallucinations" or "lies." Instead, they propose that these inaccuracies are better understood as "bullshit" in the Frankfurtian sense. The authors contend that LLMs are indifferent to the truth of their outputs and are designed to produce text that appears truth-apt without any actual concern for truth…..',
role: 'assistant',
options: {}
}
}
Done
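One note on running this: the top-level await in index.ts requires the file to execute as an ES module, and in ESM the __dirname value used for the dotenv path is not defined. If ts-node complains, one possible setup, assuming ts-node's ESM mode (the exact option values here are an assumption; adjust to your project), is a tsconfig.json like:

```
{
  "compilerOptions": {
    "module": "ESNext",
    "moduleResolution": "node",
    "target": "ES2022",
    "esModuleInterop": true
  }
}
```

With "type": "module" set in package.json you can run npx ts-node --esm index.ts, and derive a __dirname replacement from import.meta.url (for example, path.dirname(fileURLToPath(import.meta.url))).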
Conclusion
By offering TypeScript versions of their Python frameworks, tools like LlamaIndex and LangChain are lowering the barrier to entry for developers from the JavaScript ecosystem to start building AI-powered applications. More power to Gen AI!