Check out Model Depot
Are you using a Windows/Linux x86 machine?
llmware provides a unified framework for building LLM-based applications (e.g., RAG, Agents), using small, specialized models that can be deployed privately, integrated with enterprise knowledge sources safely and securely, and cost-effectively tuned and adapted for any business process.
llmware has two main components:
RAG Pipeline - integrated components for the full lifecycle of connecting knowledge sources to generative AI models; and
50+ small, specialized models fine-tuned for key tasks in enterprise process automation, including fact-based question-answering, classification, summarization, and extraction.
By bringing together both of these components, along with integrating leading open source models and underlying technologies, llmware offers a comprehensive set of tools to rapidly build knowledge-based enterprise LLM applications.
Most of our examples can be run without a GPU server - get started right away on your laptop.
Join us on Discord | Watch Youtube Tutorials | Explore our Model Families on Huggingface
New to Agents? Check out the Agent Fast Start series
New to RAG? Check out the Fast Start video series
Multi-Model Agents with SLIM Models - Intro-Video
Intro to SLIM Function Call Models
Can't wait? Get SLIMs right away:
from llmware.models import ModelCatalog
ModelCatalog().get_llm_toolkit() # get all SLIM models, delivered as small, fast quantized tools
ModelCatalog().tool_test_run("slim-sentiment-tool") # see the model in action with test script included
Writing code with llmware is based on a few main concepts:
# 150+ Models in Catalog with 50+ RAG-optimized BLING, DRAGON and Industry BERT models
# Full support for GGUF, HuggingFace, Sentence Transformers and major API-based models
# Easy to extend to add custom models - see examples
from llmware.models import ModelCatalog
from llmware.prompts import Prompt
# all models accessed through the ModelCatalog
models = ModelCatalog().list_all_models()
# to use any model in the ModelCatalog - "load_model" method and pass the model_name parameter
my_model = ModelCatalog().load_model("llmware/bling-phi-3-gguf")
output = my_model.inference("what is the future of AI?", add_context="Here is the article to read")
# to integrate model into a Prompt
prompter = Prompt().load_model("llmware/bling-tiny-llama-v0")
response = prompter.prompt_main("what is the future of AI?", context="Insert Sources of information")
from llmware.library import Library
# to parse and text chunk a set of documents (pdf, pptx, docx, xlsx, txt, csv, md, json/jsonl, wav, png, jpg, html)
# step 1 - create a library, which is the 'knowledge-base container' construct
# - libraries have both text collection (DB) resources, and file resources (e.g., llmware_data/accounts/{library_name})
# - embeddings and queries are run against a library
lib = Library().create_new_library("my_library")
# step 2 - add_files is the universal ingestion function - point it at a local file folder with mixed file types
# - files will be routed by file extension to the correct parser, parsed, text chunked and indexed in text collection DB
lib.add_files("/folder/path/to/my/files")
# to install an embedding on a library - pick an embedding model and vector_db
lib.install_new_embedding(embedding_model_name="mini-lm-sbert", vector_db="milvus", batch_size=500)
# to add a second embedding to the same library (mix-and-match models + vector db)
lib.install_new_embedding(embedding_model_name="industry-bert-sec", vector_db="chromadb", batch_size=100)
# easy to create multiple libraries for different projects and groups
finance_lib = Library().create_new_library("finance_q4_2023")
finance_lib.add_files("/finance_folder/")
hr_lib = Library().create_new_library("hr_policies")
hr_lib.add_files("/hr_folder/")
# pull library card with key metadata - documents, text chunks, images, tables, embedding record
lib_card = Library().get_library_card("my_library")
# see all libraries
all_my_libs = Library().get_all_library_cards()
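The routing step described for add_files can be pictured with a small standalone sketch. This is not llmware's implementation: the parser family names and the extension table below are hypothetical, chosen only to illustrate how a folder of mixed file types might be grouped by extension before parsing.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical extension -> parser-family table (illustrative only, not llmware internals)
PARSER_BY_EXTENSION = {
    ".pdf": "pdf",
    ".pptx": "office", ".docx": "office", ".xlsx": "office",
    ".txt": "text", ".csv": "text", ".md": "text", ".json": "text", ".jsonl": "text",
    ".wav": "voice",
    ".png": "image", ".jpg": "image",
    ".html": "website",
}


def route_files(file_names):
    """Group file names by the (hypothetical) parser family that would handle them."""
    routed = defaultdict(list)
    for name in file_names:
        # compare extensions case-insensitively so "REPORT.PDF" is treated like "report.pdf"
        ext = PurePosixPath(name).suffix.lower()
        routed[PARSER_BY_EXTENSION.get(ext, "unsupported")].append(name)
    return dict(routed)


routed = route_files(["q4_report.PDF", "notes.md", "deck.pptx", "call.wav", "archive.zip"])
print(routed)
```

Files with an extension that is not in the table land in an "unsupported" bucket rather than raising, which keeps a mixed folder from failing on a single stray file.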
from llmware.retrieval import Query
from llmware.library import Library
# step 1 - load the previously created library
lib = Library().load_library("my_library")
# step 2 - create a query object and pass the library
q = Query(lib)
# step 3 - run lots of different queries (many other options in the examples)
# basic text query
results1 = q.text_query("text query", result_count=20, exact_mode=False)
# semantic query
results2 = q.semantic_query("semantic query", result_count=10)
# combining a text query restricted to only certain documents in the library and "exact" match to the query
results3 = q.text_query_with_document_filter("new query", {"file_name": "selected file name"}, exact_mode=True)
# to apply a specific embedding (if multiple on library), pass the names when creating the query object
q2 = Query(lib, embedding_model_name="mini-lm-sbert", vector_db="milvus")
results4 = q2.semantic_query("new semantic query")
from llmware.prompts import Prompt
from llmware.retrieval import Query
from llmware.library import Library
# build a prompt
prompter = Prompt().load_model("llmware/bling-tiny-llama-v0")
# add a file -> file is parsed, text chunked, filtered by query, and then packaged as model-ready context,
# including in batches, if needed, to fit the model context window
source = prompter.add_source_document("/folder/to/one/doc/", "filename", query="fast query")
# attach query results (from a Query) into a Prompt
my_lib = Library().load_library("my_library")
results = Query(my_lib).query("my query")
source2 = prompter.add_source_query_results(results)
# run a new query against a library and load directly into a prompt
source3 = prompter.add_source_new_query(my_lib, query="my new query", query_type="semantic", result_count=15)
# to run inference with 'prompt with sources'
responses = prompter.prompt_with_source("my query")
# to run fact-checks - post inference
fact_check = prompter.evidence_check_sources(responses)
# to view source materials (batched 'model-ready' and attached to prompt)
source_materials = prompter.review_sources_summary()
# to see the full prompt history
prompt_history = prompter.get_current_history()
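The comments above mention that sources are packaged "in batches, if needed, to fit the model context window". A minimal sketch of that idea follows, assuming a simple greedy packing and using character counts as a stand-in for tokens. The function name batch_chunks and the max_chars parameter are hypothetical and are not part of the llmware API.

```python
def batch_chunks(chunks, max_chars):
    """Greedily pack text chunks into batches whose joined length stays within max_chars.

    A single chunk longer than max_chars is placed in a batch of its own.
    """
    batches, current, current_len = [], [], 0
    for chunk in chunks:
        # +1 accounts for the newline that joins this chunk to the previous one
        added = len(chunk) + (1 if current else 0)
        if current and current_len + added > max_chars:
            # current batch is full: close it and start a new one with this chunk
            batches.append("\n".join(current))
            current, current_len = [], 0
            added = len(chunk)
        current.append(chunk)
        current_len += added
    if current:
        batches.append("\n".join(current))
    return batches


chunks = ["a" * 40, "b" * 40, "c" * 40, "d" * 10]
batches = batch_chunks(chunks, max_chars=100)
print([len(b) for b in batches])
```

Each batch could then be sent to the model as a separate context, which is why prompt_with_source returns a list of responses rather than a single one.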
""" This 'Hello World' example demonstrates how to get started using local BLING models with provided context, using both
Pytorch and GGUF versions. """
import time
from llmware.prompts import Prompt
def hello_world_questions():
    test_list = [
{"query": "What is the total amount of the invoice?",
"answer": "$22,500.00",
"context": "Services Vendor Inc. \n100 Elm Street Pleasantville, NY \nTO Alpha Inc. 5900 1st Street "
"Los Angeles, CA \nDescription Front End Engineering Service $5000.00 \n Back End Engineering"
" Service $7500.00 \n Quality Assurance Manager $10,000.00 \n Total Amount $22,500.00 \n"
"Make all checks payable to Services Vendor Inc. Payment is due within 30 days."
"If you have any questions concerning this invoice, contact Bia Hermes. "
"THANK YOU FOR YOUR BUSINESS! INVOICE INVOICE # 0001 DATE 01/01/2022 FOR Alpha Project P.O. # 1000"},
{"query": "What was the amount of the trade surplus?",
"answer": "62.4 billion yen ($416.6 million)",
"context": "Japan’s September trade balance swings into surplus, surprising expectations"
"Japan recorded a trade surplus of 62.4 billion yen ($416.6 million) for September, "
"beating expectations from economists polled by Reuters for a trade deficit of 42.5 "
"billion yen. Data from Japan’s customs agency revealed that exports in September "
"increased 4.3% year on year, while imports slid 16.3% compared to the same period "
"last year. According to FactSet, exports to Asia fell for the ninth straight month, "
"which reflected ongoing China weakness. Exports were supported by shipments to "
"Western markets, FactSet added. - Lim Hui Jie"},
{"query": "When did the LISP machine market collapse?",
"answer": "1987.",
"context": "The attendees became the leaders of AI research in the 1960s."
" They and their students produced programs that the press described as 'astonishing': "
"computers were learning checkers strategies, solving word problems in algebra, "
"proving logical theorems and speaking English. By the middle of the 1960s, research in "
"the U.S. was heavily funded by the Department of Defense and laboratories had been "
"established around the world. Herbert Simon predicted, 'machines will be capable, "
"within twenty years, of doing any work a man can do'. Marvin Minsky agreed, writing, "
"'within a generation ... the problem of creating 'artificial intelligence' will "
"substantially be solved'. They had, however, underestimated the difficulty of the problem. "
"Both the U.S. and British governments cut off exploratory research in response "
"to the criticism of Sir James Lighthill and ongoing pressure from the US Congress "
"to fund more productive projects. Minsky's and Papert's book Perceptrons was understood "
"as proving that artificial neural networks approach would never be useful for solving "
"real-world tasks, thus discrediting the approach altogether. The 'AI winter', a period "
"when obtaining funding for AI projects was difficult, followed. In the early 1980s, "
"AI research was revived by the commercial success of expert systems, a form of AI "
"program that simulated the knowledge and analytical skills of human experts. By 1985, "
"the market for AI had reached over a billion dollars. At the same time, Japan's fifth "
"generation computer project inspired the U.S. and British governments to restore funding "
"for academic research. However, beginning with the collapse of the Lisp Machine market "
"in 1987, AI once again fell into disrepute, and a second, longer-lasting winter began."},
{"query": "What is the current rate on 10-year treasuries?",
"answer": "4.58%",
"context": "Stocks rallied Friday even after the release of stronger-than-expected U.S. jobs data "
"and a major increase in Treasury yields. The Dow Jones Industrial Average gained 195.12 points, "
"or 0.76%, to close at 31,419.58. The S&P 500 added 1.59% at 4,008.50. The tech-heavy "
"Nasdaq Composite rose 1.35%, closing at 12,299.68. The U.S. economy added 438,000 jobs in "
"August, the Labor Department said. Economists polled by Dow Jones expected 273,000 "
"jobs. However, wages rose less than expected last month. Stocks posted a stunning "
"turnaround on Friday, after initially falling on the stronger-than-expected jobs report. "
"At its session low, the Dow had fallen as much as 198 points; it surged by more than "
"500 points at the height of the rally. The Nasdaq and the S&P 500 slid by 0.8% during "
"their lowest points in the day. Traders were unclear of the reason for the intraday "
"reversal. Some noted it could be the softer wage number in the jobs report that made "
"investors rethink their earlier bearish stance. Others noted the pullback in yields from "
"the day’s highs. Part of the rally may just be to do a market that had gotten extremely "
"oversold with the S&P 500 at one point this week down more than 9% from its high earlier "
"this year. Yields initially surged after the report, with the 10-year Treasury rate trading "
"near its highest level in 14 years. The benchmark rate later eased from those levels, but "
"was still up around 6 basis points at 4.58%. 'We’re seeing a little bit of a give back "
"in yields from where we were around 4.8%. [With] them pulling back a bit, I think that’s "
"helping the stock market,' said Margaret Jones, chief investment officer at Vibrant Industries "
"Capital Advisors. 'We’ve had a lot of weakness in the market in recent weeks, and potentially "
"some oversold conditions.'"},
{"query": "Is the expected gross margin greater than 70%?",
"answer": "Yes, between 71.5% and 72.5%",
"context": "Outlook NVIDIA’s outlook for the third quarter of fiscal 2024 is as follows:"
"Revenue is expected to be $16.00 billion, plus or minus 2%. GAAP and non-GAAP "
"gross margins are expected to be 71.5% and 72.5%, respectively, plus or minus "
"50 basis points. GAAP and non-GAAP operating expenses are expected to be "
"approximately $2.95 billion and $2.00 billion, respectively. GAAP and non-GAAP "
"other income and expense are expected to be an income of approximately $100 "
"million, excluding gains and losses from non-affiliated investments. GAAP and "
"non-GAAP tax rates are expected to be 14.5%, plus or minus 1%, excluding any discrete items."
"Highlights NVIDIA achieved progress since its previous earnings announcement "
"in these areas: Data Center Second-quarter revenue was a record $10.32 billion, "
"up 141% from the previous quarter and up 171% from a year ago. Announced that the "
"NVIDIA® GH200 Grace™ Hopper™ Superchip for complex AI and HPC workloads is shipping "
"this quarter, with a second-generation version with HBM3e memory expected to ship "
"in Q2 of calendar 2024. "},
{"query": "What is Bank of America's rating on Target?",
"answer": "Buy",
"context": "Here are some of the tickers on my radar for Thursday, Oct. 12, taken directly from "
"my reporter’s notebook: It’s the one-year anniversary of the S&P 500’s bear market bottom "
"of 3,577. Since then, as of Wednesday’s close of 4,376, the broad market index "
"soared more than 22%. Hotter than expected September consumer price index, consumer "
"inflation. The Social Security Administration issues announced a 3.2% cost-of-living "
"adjustment for 2024. Chipotle Mexican Grill (CMG) plans price increases. Pricing power. "
"Cites consumer price index showing sticky retail inflation for the fourth time "
"in two years. Bank of America upgrades Target (TGT) to buy from neutral. Cites "
"risk/reward from depressed levels. Traffic could improve. Gross margin upside. "
"Merchandising better. Freight and transportation better. Target to report quarter "
"next month. In retail, the CNBC Investing Club portfolio owns TJX Companies (TJX), "
"the off-price juggernaut behind T.J. Maxx, Marshalls and HomeGoods. Goldman Sachs "
"tactical buy trades on Club names Wells Fargo (WFC), which reports quarter Friday, "
"Humana (HUM) and Nvidia (NVDA). BofA initiates Snowflake (SNOW) with a buy rating."
"If you like this story, sign up for Jim Cramer’s Top 10 Morning Thoughts on the "
"Market email newsletter for free. Barclays cuts price targets on consumer products: "
"UTZ Brands (UTZ) to $16 per share from $17. Kraft Heinz (KHC) to $36 per share from "
"$38. Cyclical drag. J.M. Smucker (SJM) to $129 from $160. Secular headwinds. "
"Coca-Cola (KO) to $59 from $70. Barclays cut PTs on housing-related stocks: Toll Brothers"
"(TOL) to $74 per share from $82. Keeps underweight. Lowers Trex (TREX) and Azek"
"(AZEK), too. Goldman Sachs (GS) announces sale of fintech platform and warns on "
"third quarter of 19-cent per share drag on earnings. The buyer: investors led by "
"private equity firm Sixth Street. Exiting a mistake. Rise in consumer engagement for "
"Spotify (SPOT), says Morgan Stanley. The analysts hike price target to $190 per share "
"from $185. Keeps overweight (buy) rating. JPMorgan loves elf Beauty (ELF). Keeps "
"overweight (buy) rating but lowers price target to $139 per share from $150. "
"Sees “still challenging” environment into third-quarter print. The Club owns shares "
"in high-end beauty company Estee Lauder (EL). Barclays upgrades First Solar (FSLR) "
"to overweight from equal weight (buy from hold) but lowers price target to $224 per "
"share from $230. Risk reward upgrade. Best visibility of utility scale names."},
{"query": "What was the rate of decline in 3rd quarter sales?",
"answer": "20% year-on-year.",
"context": "Nokia said it would cut up to 14,000 jobs as part of a cost cutting plan following "
"third quarter earnings that plunged. The Finnish telecommunications giant said that "
"it will reduce its cost base and increase operation efficiency to “address the "
"challenging market environment.” The substantial layoffs come after Nokia reported "
"third-quarter net sales declined 20% year-on-year to 4.98 billion euros. Profit over "
"the period plunged by 69% year-on-year to 133 million euros."},
{"query": "What is a list of the key points?",
"answer": "•Stocks rallied on Friday with stronger-than-expected U.S jobs data and increase in "
"Treasury yields;\n•Dow Jones gained 195.12 points;\n•S&P 500 added 1.59%;\n•Nasdaq Composite rose "
"1.35%;\n•U.S. economy added 438,000 jobs in August, better than the 273,000 expected;\n"
"•10-year Treasury rate trading near the highest level in 14 years at 4.58%.",
"context": "Stocks rallied Friday even after the release of stronger-than-expected U.S. jobs data "
"and a major increase in Treasury yields. The Dow Jones Industrial Average gained 195.12 points, "
"or 0.76%, to close at 31,419.58. The S&P 500 added 1.59% at 4,008.50. The tech-heavy "
"Nasdaq Composite rose 1.35%, closing at 12,299.68. The U.S. economy added 438,000 jobs in "
"August, the Labor Department said. Economists polled by Dow Jones expected 273,000 "
"jobs. However, wages rose less than expected last month. Stocks posted a stunning "
"turnaround on Friday, after initially falling on the stronger-than-expected jobs report. "
"At its session low, the Dow had fallen as much as 198 points; it surged by more than "
"500 points at the height of the rally. The Nasdaq and the S&P 500 slid by 0.8% during "
"their lowest points in the day. Traders were unclear of the reason for the intraday "
"reversal. Some noted it could be the softer wage number in the jobs report that made "
"investors rethink their earlier bearish stance. Others noted the pullback in yields from "
"the day’s highs. Part of the rally may just be to do a market that had gotten extremely "
"oversold with the S&P 500 at one point this week down more than 9% from its high earlier "
"this year. Yields initially surged after the report, with the 10-year Treasury rate trading "
"near its highest level in 14 years. The benchmark rate later eased from those levels, but "
"was still up around 6 basis points at 4.58%. 'We’re seeing a little bit of a give back "
"in yields from where we were around 4.8%. [With] them pulling back a bit, I think that’s "
"helping the stock market,' said Margaret Jones, chief investment officer at Vibrant Industries "
"Capital Advisors. 'We’ve had a lot of weakness in the market in recent weeks, and potentially "
"some oversold conditions.'"}
    ]
    return test_list
# this is the main script to be run
def bling_meets_llmware_hello_world(model_name):
    t0 = time.time()
    # load the questions
    test_list = hello_world_questions()
    print(f"\n > Loading Model: {model_name}...")
    # load the model
    prompter = Prompt().load_model(model_name)
    t1 = time.time()
    print(f"\n > Model {model_name} load time: {t1-t0} seconds")
    for i, entries in enumerate(test_list):
        print(f"\n{i+1}. Query: {entries['query']}")
        # run the prompt
        output = prompter.prompt_main(entries["query"], context=entries["context"],
                                      prompt_name="default_with_context", temperature=0.30)
        # print out the results
        llm_response = output["llm_response"].strip("\n")
        print(f"LLM Response: {llm_response}")
        print(f"Gold Answer: {entries['answer']}")
        print(f"LLM Usage: {output['usage']}")
    t2 = time.time()
    print(f"\nTotal processing time: {t2-t1} seconds")
    return 0
if __name__ == "__main__":
    # list of 'rag-instruct' laptop-ready small bling models on HuggingFace
    pytorch_models = ["llmware/bling-1b-0.1",  # most popular
                      "llmware/bling-tiny-llama-v0",  # fastest
                      "llmware/bling-1.4b-0.1",
                      "llmware/bling-falcon-1b-0.1",
                      "llmware/bling-cerebras-1.3b-0.1",
                      "llmware/bling-sheared-llama-1.3b-0.1",
                      "llmware/bling-sheared-llama-2.7b-0.1",
                      "llmware/bling-red-pajamas-3b-0.1",
                      "llmware/bling-stable-lm-3b-4e1t-v0",
                      "llmware/bling-phi-3"  # most accurate (and newest)
                      ]
    # Quantized GGUF versions generally load faster and run nicely on a laptop with at least 16 GB of RAM
    gguf_models = ["bling-phi-3-gguf", "bling-stablelm-3b-tool", "dragon-llama-answer-tool",
                   "dragon-yi-answer-tool", "dragon-mistral-answer-tool"]
    # try model from either pytorch or gguf model list
    # the newest (and most accurate) is 'bling-phi-3-gguf'
    bling_meets_llmware_hello_world(gguf_models[0])
    # check out the model card on Huggingface for RAG benchmark test performance results and other useful information
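The hello world script prints each LLM response next to its gold answer and leaves the comparison to the reader. One way to automate a rough check is sketched below. It assumes a simple normalized substring match, which is only a heuristic; the helper names normalize and matches_gold are hypothetical and not part of llmware.

```python
import re


def normalize(text):
    """Lowercase, drop thousands separators, collapse whitespace, and strip trailing periods."""
    text = text.lower().strip()
    text = text.replace(",", "")
    text = re.sub(r"\s+", " ", text)
    return text.rstrip(".")


def matches_gold(llm_response, gold_answer):
    """Return True if the normalized gold answer appears inside the normalized response."""
    return normalize(gold_answer) in normalize(llm_response)


print(matches_gold("The total amount is $22,500.00.", "$22,500.00"))
print(matches_gold("1987", "1987."))
print(matches_gold("Hold", "Buy"))
```

A substring match of this kind will miss paraphrased answers (for example, the list-of-key-points question), so it is best treated as a first-pass filter before human review.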
from llmware.configs import LLMWareConfig
# to set the collection database - mongo, sqlite, postgres
LLMWareConfig().set_active_db("mongo")
# to set the vector database (or declare when installing)
# --options: milvus, pg_vector (postgres), redis, qdrant, faiss, pinecone, mongo atlas
LLMWareConfig().set_vector_db("milvus")
# for fast start - no installations required
LLMWareConfig().set_active_db("sqlite")
LLMWareConfig().set_vector_db("chromadb") # try also faiss and lancedb
# for single postgres deployment
LLMWareConfig().set_active_db("postgres")
LLMWareConfig().set_vector_db("postgres")
# to install mongo, milvus, postgres - see the docker-compose scripts as well as examples
from llmware.agents import LLMfx
text = ("Tesla stock fell 8% in premarket trading after reporting fourth-quarter revenue and profit that "
"missed analystsā estimates. The electric vehicle company also warned that vehicle volume growth in "
"2024 'may be notably lower' than last yearās growth rate. Automotive revenue, meanwhile, increased "
"just 1% from a year earlier, partly because the EVs were selling for less than they had in the past. "
"Tesla implemented steep price cuts in the second half of the year around the world. In a Wednesday "
"presentation, the company warned investors that itās 'currently between two major growth waves.'")
# create an agent using LLMfx class
agent = LLMfx()
# load text to process
agent.load_work(text)
# load 'models' as 'tools' to be used in analysis process
agent.load_tool("sentiment")
agent.load_tool("extract")
agent.load_tool("topics")
agent.load_tool("boolean")
# run function calls using different tools
agent.sentiment()
agent.topics()
agent.extract(params=["company"])
agent.extract(params=["automotive revenue growth"])
agent.xsum()
agent.boolean(params=["is 2024 growth expected to be strong? (explain)"])
# at end of processing, show the report that was automatically aggregated by key
report = agent.show_report()
# displays a summary of the activity in the process
activity_summary = agent.activity_summary()
# list of the responses gathered
for i, entries in enumerate(agent.response_list):
print("update: response analysis: ", i, entries)
output = {"report": report, "activity_summary": activity_summary, "journal": agent.journal}
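The agent report above is described as "automatically aggregated by key". The standalone sketch below illustrates what such an aggregation could look like. The response shape (one dictionary of key to list of values per tool call) is an assumption made for illustration, the sample values are hand-written stand-ins rather than real model output, and aggregate_by_key is a hypothetical helper, not an LLMfx method.

```python
def aggregate_by_key(responses):
    """Merge a list of {key: [values]} dictionaries into one report, de-duplicating values."""
    report = {}
    for response in responses:
        for key, values in response.items():
            merged = report.setdefault(key, [])
            for value in values:
                # keep first-seen order and skip values already recorded for this key
                if value not in merged:
                    merged.append(value)
    return report


# hand-written stand-ins for tool outputs (assumed shape, not actual model responses)
responses = [
    {"sentiment": ["negative"]},
    {"topics": ["tesla stock"]},
    {"company": ["Tesla"]},
    {"company": ["Tesla"], "automotive revenue growth": ["1%"]},
]
report = aggregate_by_key(responses)
print(report)
```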
# This example illustrates a simple contract analysis
# using a RAG-optimized LLM running locally
import os
import re
from llmware.prompts import Prompt, HumanInTheLoop
from llmware.setup import Setup
from llmware.configs import LLMWareConfig
def contract_analysis_on_laptop(model_name):
    # In this scenario, we will:
    # -- download a set of sample contract files
    # -- create a Prompt and load a BLING LLM model
    # -- parse each contract, extract the relevant passages, and pass questions to a local LLM
    # Main loop - Iterate thru each contract:
    #
    # 1. parse the document in memory (convert from PDF file into text chunks with metadata)
    # 2. filter the parsed text chunks with a "topic" (e.g., "governing law") to extract relevant passages
    # 3. package and assemble the text chunks into a model-ready context
    # 4. ask three key questions for each contract to the LLM
    # 5. print to the screen
    # 6. save the results in both json and csv for further processing and review.
    # Load the llmware sample files
    print(f"\n > Loading the llmware sample files...")
    sample_files_path = Setup().load_sample_files()
    contracts_path = os.path.join(sample_files_path, "Agreements")
    # Query list - these are the 3 main topics and questions that we would like the LLM to analyze for each contract
    query_list = {"executive employment agreement": "What are the names of the two parties?",
                  "base salary": "What is the executive's base salary?",
                  "vacation": "How many vacation days will the executive receive?"}
    # Load the selected model by name that was passed into the function
    print(f"\n > Loading model {model_name}...")
    prompter = Prompt().load_model(model_name, temperature=0.0, sample=False)
    # Main loop
    for i, contract in enumerate(os.listdir(contracts_path)):
        # excluding Mac file artifact (annoying, but fact of life in demos)
        if contract != ".DS_Store":
            print("\nAnalyzing contract: ", str(i+1), contract)
            print("LLM Responses:")
            for key, value in query_list.items():
                # step 1 + 2 + 3 above - contract is parsed, text-chunked, filtered by topic key,
                # ... and then packaged into the prompt
                source = prompter.add_source_document(contracts_path, contract, query=key)
                # step 4 above - calling the LLM with 'source' information already packaged into the prompt
                responses = prompter.prompt_with_source(value, prompt_name="default_with_context")
                # step 5 above - print out to screen
                for r, response in enumerate(responses):
                    print(key, ":", re.sub("[\n]", " ", response["llm_response"]).strip())
                # done with this query - clear the source from the prompt before the next one
                prompter.clear_source_materials()
    # step 6 above - saving the analysis to jsonl and csv
    # Save jsonl report to the /prompt_history folder
    print("\nPrompt state saved at: ", os.path.join(LLMWareConfig.get_prompt_path(), prompter.prompt_id))
    prompter.save_state()
    # Save csv report that includes the model, response, prompt, and evidence for human-in-the-loop review
    csv_output = HumanInTheLoop(prompter).export_current_interaction_to_csv()
    print("csv output saved at: ", csv_output)
if __name__ == "__main__":
    # use local cpu model - try the newest - RAG finetune of Phi-3 quantized and packaged in GGUF
    model = "bling-phi-3-gguf"
    contract_analysis_on_laptop(model)
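Step 6 of the contract analysis saves results in both jsonl and csv. For readers who want to post-process results outside of llmware, the sketch below shows the same two-format idea using only the Python standard library. The save_results helper, the row fields, and the placeholder answers are all hypothetical; the actual files written by save_state and HumanInTheLoop may have a different layout.

```python
import csv
import json
import os
import tempfile


def save_results(rows, out_dir, base_name="contract_analysis"):
    """Write a list of flat dictionaries to <base_name>.jsonl and <base_name>.csv in out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    jsonl_path = os.path.join(out_dir, base_name + ".jsonl")
    csv_path = os.path.join(out_dir, base_name + ".csv")

    # jsonl: one JSON object per line
    with open(jsonl_path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

    # csv: column order taken from the first row
    fieldnames = list(rows[0].keys()) if rows else []
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    return jsonl_path, csv_path


# placeholder rows (hypothetical fields; "<model answer>" stands in for a real response)
rows = [
    {"contract": "example.pdf", "topic": "base salary", "llm_response": "<model answer>"},
    {"contract": "example.pdf", "topic": "vacation", "llm_response": "<model answer>"},
]
jsonl_path, csv_path = save_results(rows, tempfile.mkdtemp())
print(jsonl_path, csv_path)
```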
Qwen2 Models for RAG, Function Calling, and Chat
Start using Qwen2 models quickly with resources for Retrieval-Augmented Generation (RAG), function calling, and chat functionalities.
Phi-3 Function Calling Models
Get started in minutes with Phi-3 models designed for function calling.
BizBot: RAG + SQL Local Chatbot
Implement a local chatbot for business intelligence using RAG and SQL.
Lecture Tool
Enables Q&A on voice recordings for education and lecture analysis.
Web Services for Financial Research
An end-to-end example demonstrating web services with agent calls for financial research.
Voice Transcription with WhisperCPP
Start transcription projects with WhisperCPP, featuring tools for sample file usage and famous speeches.
Natural Language Query to CSV
Convert natural language queries to CSV with Slim-SQL, supporting custom Postgres tables.
OCR Embedded Document Images
Extract text systematically from images embedded in documents for enhanced document processing.
Enhanced Document Parsing for PDFs, Word, PowerPoint, and Excel
Improved text-chunking controls, table extraction, and content parsing.
Agent Inference Server
Set up an inference server for multi-model agents to optimize deployments.
Optimizing Accuracy of RAG Prompts
Tutorials for tuning RAG prompt settings for increased accuracy.
Step 1 - Install llmware - pip3 install llmware or pip3 install 'llmware[full]'
End-to-End Scenario - Function Calls with SLIM Extract and Web Services for Financial Research
Analyzing Voice Files - Great Speeches with LLM Query and Extract
New to LLMWare - Fast Start tutorial series
Getting Setup - Getting Started
SLIM Examples - SLIM Models
Example | Detail |
---|---|
1. BLING models fast start (code / video) | Get started with fast, accurate, CPU-based models - question-answering, key-value extraction, and basic summarization. |
2. Parse and Embed 500 PDF Documents (code) | End-to-end example for Parsing, Embedding and Querying UN Resolution documents with Milvus |
3. Hybrid Retrieval - Semantic + Text (code) | Using 'dual pass' retrieval to combine best of semantic and text search |
4. Multiple Embeddings with PG Vector (code / video) | Comparing Multiple Embedding Models using Postgres / PG Vector |
5. DRAGON GGUF Models (code / video) | State-of-the-Art 7B RAG GGUF Models. |
6. RAG with BLING (code / video) | Using contract analysis as an example, experiment with RAG for complex document analysis and text extraction using llmware's BLING ~1B parameter GPT model running on your laptop. |
7. Master Service Agreement Analysis with DRAGON (code / video) | Analyzing MSAs using DRAGON YI 6B Model. |
8. Streamlit Example (code) | Ask questions to Invoices with UI run inference. |
9. Integrating LM Studio (code / video) | Integrating LM Studio Models with LLMWare |
10. Prompts With Sources (code) | Attach wide range of knowledge sources directly into Prompts. |
11. Fact Checking (code) | Explore the full set of evidence methods in this example script that analyzes a set of contracts. |
12. Using 7B GGUF Chat Models (code) | Using 4 state of the art 7B chat models in minutes running locally |
Check out: llmware examples
Check out these videos to get started quickly:
The llmware repo can be pulled locally to get access to all the examples, or to work directly with the latest version of the llmware code.
git clone git@github.com:llmware-ai/llmware.git
We have provided a welcome_to_llmware automation script in the root of the repository folder. After cloning:
.\welcome_to_llmware_windows.sh
sh ./welcome_to_llmware.sh
Alternatively, if you prefer to complete setup without the welcome automation script, then the next steps include:
install requirements.txt - inside the /llmware path - e.g., pip3 install -r llmware/requirements.txt
install requirements_extras.txt - inside the /llmware path - e.g., pip3 install -r llmware/requirements_extras.txt
(Depending upon your use case, you may not need all or any of these installs, but some of these will be used in the examples.)
run examples - copy one or more of the example .py files into the root project path. (We have seen several IDEs that will attempt to run interactively from the nested /example path, and then not have access to the /llmware module - the easy fix is to just copy the example you want to run into the root path).
install vector db - no-install vector db options include milvus lite, chromadb, faiss and lancedb - which do not require a server install, but do require that you install the python sdk library for that vector db, e.g., pip3 install pymilvus, or pip3 install chromadb. If you look in examples/Embedding, you will see examples for getting started with various vector DBs, and in the root of the repo, you will see easy-to-get-started docker compose scripts for installing milvus, postgres/pgvector, mongo, qdrant, neo4j, and redis.
Pytorch 2.3 note: We have recently seen issues with Pytorch==2.3 on some platforms - if you run into any issues, we have seen that uninstalling Pytorch and downleveling to Pytorch==2.1 usually solves the problem.
Numpy 2.0 note: we have seen issues with numpy 2.0, which many libraries do not yet support. Our pip install setup will accept numpy 2.0 (to avoid pip conflicts), but if you pull from repo, we restrict numpy to versions <2. If you run into issues with numpy, we have found that they can be fixed by downgrading numpy to <2, e.g., 1.26.4. To use WhisperCPP, you should downlevel to numpy <2.
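To check which side of the numpy 2.0 boundary an environment is on, a small standard-library check can help. This is a sketch only: the helper names are hypothetical, and it inspects the installed package metadata without importing numpy.

```python
from importlib import metadata


def major_version(version_string):
    """Return the leading integer of a version string such as '1.26.4'."""
    return int(version_string.split(".")[0])


def numpy_needs_downgrade():
    """Return True if numpy >= 2 is installed, False if numpy < 2, and None if numpy is absent."""
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None
    return major_version(installed) >= 2


status = numpy_needs_downgrade()
if status:
    print("numpy >= 2 detected; if you hit issues, consider: pip3 install 'numpy<2'")
elif status is None:
    print("numpy is not installed in this environment")
else:
    print("numpy < 2 is installed")
```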
from llmware.configs import LLMWareConfig
LLMWareConfig().set_active_db("sqlite")
LLMWareConfig().set_vector_db("chromadb")
curl -o docker-compose.yaml https://raw.githubusercontent.com/llmware-ai/llmware/main/docker-compose.yaml
docker compose up -d
from llmware.configs import LLMWareConfig
LLMWareConfig().set_active_db("mongo")
LLMWareConfig().set_vector_db("milvus")
curl -o docker-compose.yaml https://raw.githubusercontent.com/llmware-ai/llmware/main/docker-compose-pgvector.yaml
docker compose up -d
from llmware.configs import LLMWareConfig
LLMWareConfig().set_active_db("postgres")
LLMWareConfig().set_vector_db("postgres")
# scripts to deploy other options
curl -o docker-compose.yaml https://raw.githubusercontent.com/llmware-ai/llmware/main/docker-compose-redis-stack.yaml
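The three database pairings shown above can be kept in one place and selected by name. The sketch below assumes hypothetical profile names ("fast_start", "scale", "postgres") and a hypothetical resolve_profile helper; only the database names themselves come from the configuration calls above.

```python
# The three pairings shown above, keyed by hypothetical profile names:
# profile -> (active_db, vector_db)
PROFILES = {
    "fast_start": ("sqlite", "chromadb"),
    "scale": ("mongo", "milvus"),
    "postgres": ("postgres", "postgres"),
}


def resolve_profile(name):
    """Return the (active_db, vector_db) pair for a profile, or raise ValueError if unknown."""
    try:
        return PROFILES[name]
    except KeyError:
        raise ValueError(f"unknown profile {name!r}; expected one of {sorted(PROFILES)}") from None


active_db, vector_db = resolve_profile("fast_start")
print(active_db, vector_db)

# With llmware installed, the resolved values would be passed to the calls shown above:
#   LLMWareConfig().set_active_db(active_db)
#   LLMWareConfig().set_vector_db(vector_db)
```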
LLMWare is an open platform and supports a wide range of open source and proprietary models. To use LLMWare, you do not need to use any proprietary LLM - we would encourage you to experiment with SLIM, BLING, DRAGON, Industry-BERT, the GGUF examples, along with bringing in your favorite models from HuggingFace and Sentence Transformers.
If you would like to use a proprietary model, you will need to provide your own API Keys. API keys and secrets for models, aws, and pinecone can be set-up for use in environment variables or passed directly to method calls.
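The two options mentioned above (environment variable or direct argument) can be combined in a small lookup helper. This is a sketch under stated assumptions: get_api_key is a hypothetical helper, and EXAMPLE_PROVIDER_API_KEY is a made-up variable name used only for the demonstration, not a name llmware reads.

```python
import os


def get_api_key(env_var, explicit_key=None):
    """Prefer a key passed directly; otherwise read it from the named environment variable."""
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"no API key found: pass one directly or set {env_var}")
    return key


# hypothetical variable name and dummy values, for demonstration only
os.environ["EXAMPLE_PROVIDER_API_KEY"] = "demo-key-from-env"
key_from_env = get_api_key("EXAMPLE_PROVIDER_API_KEY")
key_direct = get_api_key("EXAMPLE_PROVIDER_API_KEY", explicit_key="demo-key-direct")
print(key_from_env, key_direct)
```

Reading keys from the environment keeps secrets out of source files, while the explicit argument is convenient for short-lived scripts and tests.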
Like our models, we aspire for llmware to be "small, but mighty" - easy to use and get started, but packing a powerful punch!
Interested in contributing to llmware? Information on ways to participate can be found in our Contributors Guide. As with all aspects of this project, contributing is governed by our Code of Conduct.
Questions and discussions are welcome in our github discussions.
See also additional deployment/install release notes in wheel_archives
Friday, November 8 - v0.3.9
Sunday, October 27 - v0.3.8
Sunday, October 6 - v0.3.7
Tuesday, October 1 - v0.3.6
Monday, August 26 - v0.3.5
For complete history of release notes, please open the Change log tab.
Supported Operating Systems: MacOS (Metal - M1/M2/M3), Linux (x86), and Windows
Supported Vector Databases: Milvus, Postgres (PGVector), Neo4j, Redis, LanceDB, ChromaDB, Qdrant, FAISS, Pinecone, Mongo Atlas Vector Search
Supported Text Index Databases: MongoDB, Postgres, SQLite
To enable the OCR parsing capabilities, install Tesseract v5.3.3 and Poppler v23.10.0 native packages.
Monday, July 29 - v0.3.4
Monday, July 8 - v0.3.3
Saturday, June 29 - v0.3.2
Saturday, June 22 - v0.3.1
Tuesday, June 4 - v0.3.0
Install options: pip3 install llmware, which will support most use cases, and a larger install, pip3 install 'llmware[full]', with other commonly-used libraries.
Wednesday, May 22 - v0.2.15
Saturday, May 18 - v0.2.14
Sunday, May 12 - v0.2.13
Sunday, May 5 - v0.2.12 Update
Monday, April 29 - v0.2.11 Update
Monday, April 22 - v0.2.10 Update
Tuesday, April 16 - v0.2.9 Update
Tuesday, April 9 - v0.2.8 Update
Wednesday, April 3 - v0.2.7 Update
Friday, March 22 - v0.2.6 Update
Thursday, March 14 - v0.2.5 Update
Wednesday, February 28 - v0.2.4 Update
Friday, February 16 - v0.2.3 Update
Latest Updates - 19 Jan 2024 - llmware v0.2.0
Latest Updates - 15 Jan 2024: llmware v0.1.15
Latest Updates - 30 Dec 2023: llmware v0.1.14
Latest Updates - 22 Dec 2023: llmware v0.1.13
Added 3 new vector databases - Postgres (PG Vector), Redis, and Qdrant
Improved support for integrating sentence transformers directly in the model catalog
Improvements in the model catalog attributes
Multiple new Examples in Models & Embeddings, including GGUF, Vector database, and model catalog
17 Dec 2023: llmware v0.1.12
8 Dec 2023: llmware v0.1.11
30 Nov 2023: llmware v0.1.10
24 Nov 2023: llmware v0.1.9
17 Nov 2023: llmware v0.1.8
14 Nov 2023: llmware v0.1.7
03 Nov 2023: llmware v0.1.6
27 Oct 2023: llmware v0.1.5
20 Oct 2023: llmware v0.1.4
13 Oct 2023: llmware v0.1.3
06 Oct 2023: llmware v0.1.1
02 Oct 2023: llmware v0.1.0 - Initial release of llmware to open source!!
Revolutionizing AI Deployment: Unleashing AI Acceleration with Intel's AI PCs and Model HQ by LLMWare - AI PC Model HQ.pdf
Revolutionizing AI Deployment (Intel Abstract Version) - LNL White paper (Abstract Version) final.pdf
Accelerating AI Powered Productivity with AI PCs - Laptop.Performance.WP.Final (10).pdf
Privacy Policy - AI BLOKS PRIVACY POLICY (1.2.25).docx
Terms of Service - AI Bloks Terms of Service 1.3.25.docx
Acceptable Use Policy - Acceptable Use Policy for Model HQ by AI BLOKS LLC.docx