Company Knowledge Bases – Built with Large Language Models (LLMs)

 

Most companies are leery of letting employees use ChatGPT, fearing that intellectual property and trade secrets could be exposed. There have already been multiple instances of this happening. So what can be done?

OpenAI has an enterprise tier, and while they don’t give upfront quotes, it’s safe to assume pricing will be per user, per month, with a term contract.

So, what can small to mid-size companies do? They can use AI tools from multiple vendors, including OpenAI, to create custom knowledge bases from their own documents and put a front-end on the data. I want to give an overview of the process for doing this. This is not theoretical: I built a solution for a non-profit to prove it out.

Overview of the process:

  1. Assemble a list of documents, text files, etc., that you want people to chat with.
  2. Load the documents using one of the various document loaders found in LangChain or LlamaIndex.
  3. Use a text splitter to break the documents into chunks.
  4. Create embeddings from the chunks.
    1. Note: I used OpenAI’s embedding API, which does not store the embeddings or the data used to create them.
  5. Load the embeddings into a vector database such as FAISS, Chroma, or Pinecone.
    1. In my case, I used Pinecone, which is hosted, but the others will work as well.
  6. You still need an LLM to access your vector database and respond to queries. I used OpenAI, but there are now options from multiple vendors.
    1. You can browse the Models section on Hugging Face to pick a model to use.
  7. You need to add the following features or add-ons that make querying the knowledge base easier.
    1. A Q&A retrieval component – this retrieves the chunks most relevant to a question and passes them to the LLM to generate an answer.
    2. A Conversation Memory component – this keeps context so you can ask follow-up questions related to your original question.
    3. Most recently, Smart Agent components have come into use. These are specialized and can sift through candidate answers to return the one best suited to the question. Depending on how they are configured, they can reduce the noise returned.
  8. Lastly, you need a front-end for the knowledge base. I have used Gradio and Streamlit, which both have pros and cons, and others are coming out regularly.
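The steps above can be sketched end to end in plain Python. The snippet below is a minimal, self-contained illustration of the chunk → embed → store → retrieve flow; the bag-of-words "embedding" and in-memory store are stand-ins I made up for this sketch, not what I actually used. In a real build you would swap in LangChain's text splitters, OpenAI's embedding API, and a Pinecone (or FAISS/Chroma) index.

```python
import math
from collections import Counter

def split_into_chunks(text, chunk_size=40, overlap=10):
    """Step 3: split a document into overlapping word-based chunks."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(words):
            break
    return chunks

def embed(text):
    """Step 4: stand-in for a real embedding API -- a bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Similarity between two sparse vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Step 5: stand-in for Pinecone/FAISS/Chroma -- an in-memory index."""
    def __init__(self):
        self.items = []

    def add(self, chunk):
        self.items.append((embed(chunk), chunk))

    def query(self, question, k=2):
        """Step 7a: Q&A retrieval -- return the k most relevant chunks."""
        q = embed(question)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]

# Steps 1-2: assemble and "load" the documents (hard-coded here).
docs = [
    "Employees accrue fifteen days of paid vacation per year.",
    "Expense reports must be filed within thirty days of travel.",
]
store = VectorStore()
for doc in docs:
    for chunk in split_into_chunks(doc, chunk_size=8, overlap=2):
        store.add(chunk)

# Step 6 would pass the retrieved chunks to an LLM as context for the answer.
print(store.query("How many vacation days do employees get?")[0])
```

In production, step 7's conversation memory is just this same query loop with prior question/answer turns prepended to the prompt, which is what LangChain's memory classes manage for you.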

I am currently working on deploying this to AWS so it’s scalable and secure. I will do a post about that in the near future.

You will come across multiple apps that let you upload files and documents and then query them online. The approach I covered here gives an organization complete control of its knowledge base(s). I say bases because you can have separate HR, Finance, and Legal knowledge bases in your org, each with different access lists and features based on requirements.
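As a minimal sketch of that per-department separation (all index and group names here are hypothetical), each knowledge base can map to its own vector index with its own access list, and the front-end checks membership before routing a query:

```python
# Hypothetical registry: each departmental knowledge base gets its own
# vector index name and its own access list.
KNOWLEDGE_BASES = {
    "hr":      {"index": "kb-hr",      "allowed_groups": {"hr-staff", "managers"}},
    "finance": {"index": "kb-finance", "allowed_groups": {"finance-staff"}},
    "legal":   {"index": "kb-legal",   "allowed_groups": {"legal-staff", "executives"}},
}

def can_query(user_groups, kb_name):
    """True if the user belongs to any group allowed on this knowledge base."""
    kb = KNOWLEDGE_BASES.get(kb_name)
    return kb is not None and bool(set(user_groups) & kb["allowed_groups"])

print(can_query({"managers"}, "hr"))       # a manager may query the HR base
print(can_query({"managers"}, "finance"))  # but not the Finance base
```

Keeping one index per department (rather than one shared index with filters) also means an access mistake in one base cannot leak another department's documents.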

One final thing to note: AI is changing at an unbelievable pace, and it can be a challenge to keep up. Companies must plan to stay abreast of the changes or get left behind.

 
