Nomadic is an enterprise-grade toolkit by NomadicML focused on parameter search for ML teams to continuously optimize compound AI systems, from pre- to post-production. Rapidly experiment and keep hyperparameters, prompts, and every other part of your system production-ready. Teams use Nomadic to understand which levers most improve their AI system's performance as it scales.
Join our Discord!
You can install Nomadic with pip (Python 3.9+ required):
pip install nomadic
Full documentation can be found here: https://docs.nomadicml.com.
Please check it out for the most up-to-date tutorials, cookbooks, SDK references, and other resources!
Follow the instructions below to get started on local development of the Nomadic SDK. Afterwards, select the produced Python .venv environment in your IDE of choice.
make setup_dev_environment
source .venv/bin/activate
Coming soon!
Run:
source .venv/bin/activate # If .venv isn't already activated
make build
For other quickstarts tailored to your application, including LLM safety, advanced RAG, transcription/summarization (across fintech, support, and healthcare), and especially compound AI systems (multiple components rather than a single monolithic model), check out our 🍴Cookbooks.
import os
# Import relevant Nomadic libraries
from nomadic.experiment import Experiment
from nomadic.model import OpenAIModel
from nomadic.tuner import tune
from nomadic.experiment.base import retry_with_exponential_backoff
from nomadic.experiment.rag import (
run_rag_pipeline,
run_retrieval_pipeline,
run_inference_pipeline,
obtain_rag_inputs,
save_run_results,
load_run_results,
get_best_run_result,
create_retrieval_heatmap,
create_inference_heatmap
)
import pandas as pd
pd.set_option('display.max_colwidth', None)
import json
# Insert your OPENAI_API_KEY below
os.environ["OPENAI_API_KEY"]= <YOUR_OPENAI_KEY>
Say we want to explore (all of!) the following hyperparameters and search spaces to optimize RAG performance:
| Parameter | Supported values | Pipeline Stage |
|---|---|---|
| chunk_size | 128, 256, 512 | Retrieval |
| top_k | 1, 3, 5 | Retrieval |
| overlap | 50, 100, 150 | Retrieval |
| similarity_threshold | 0.5, 0.7, 0.9 | Retrieval |
| embedding_model | "text-embedding-ada-002", "text-embedding-curie-001" | Retrieval |
| model_name | "gpt-3.5-turbo", "gpt-4" | Both |
| temperature | 0.3, 0.7, 0.9 | Inference |
| max_tokens | 300, 500, 700 | Inference |
| retrieval_strategy | "sentence-window", "full-document" | Retrieval |
| reranking_model | true, false | Inference |
| query_transformation | "rephrasing", "HyDE", "Advanced contextual refinement" | Retrieval |
| reranking_step | "BM25-based reranking", "dense passage retrieval (DPR)", "cross-encoder" | Inference |
| reranking_model_type | "BM25", "DPR", "ColBERT", "cross-encoder" | Retrieval |
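# Define the hyperparameter search space for this demo with tune.choice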
chunk_size = tune.choice([256, 512])
temperature = tune.choice([0.1, 0.9])
overlap = tune.choice([25])
similarity_threshold = tune.choice([50])
top_k = tune.choice([1, 2])
max_tokens = tune.choice([100, 200])
model_name = tune.choice(["gpt-3.5-turbo", "gpt-4o"])
embedding_model = tune.choice(["text-embedding-ada-002", "text-embedding-curie-001"])
retrieval_strategy = tune.choice(["sentence-window", "auto-merging"])
eval_json = {
"queries": {
"query1": "Describe the architecture of convolutional neural networks.",
"query2": "What are the ethical implications of AI in healthcare?",
},
"responses": {
"query1": "Convolutional neural networks consist of an input layer, convolutional layers, activation functions, pooling layers, fully connected layers, and an output layer.",
"query2": "Ethical implications include issues of privacy, autonomy, and the potential for bias, which must be carefully managed to avoid harm.",
}
}
pdf_url = "https://www.dropbox.com/scl/fi/sbko6nyzsuw00f2nhxa38/CS229_Lecture_Notes.pdf?rlkey=pebhb2qrdh08bnyxtus8qm11v&st=yha4ikm2&dl=1"
Nomadic supports reranking models to enhance the retrieval stage of the RAG pipeline. Reranking models, such as cross-encoders, can significantly improve the relevance of retrieved documents by scoring and reordering them based on their contextual relevance to the query. This process ensures that the most pertinent documents are provided to the language model for generating accurate and contextually appropriate responses.
To enable reranking in your experiments, specify a reranking_model in the hyperparameters and include it in the retrieval pipeline. You can experiment with different reranking models to find the one that best suits your use case. Currently supported options are: BM25, DPR, ColBERT, and Cross-encoder.
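If you're curious what the cross-encoder option does conceptually, here is a minimal standalone sketch using the sentence-transformers library and the same cross-encoder/ms-marco-MiniLM-L-6-v2 checkpoint passed to the inference experiment below. The rerank_chunks helper is purely illustrative; Nomadic wires this step into the pipeline for you when a reranking_model is set.

```python
# Illustrative only: score and reorder retrieved chunks with a cross-encoder.
from sentence_transformers import CrossEncoder

def rerank_chunks(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Score each (query, chunk) pair and return the top_k chunks by relevance."""
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    # Higher score = more relevant, so sort descending and keep the best chunks
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]
```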
In this demo, we use specialized evaluation metrics that are particularly well suited to the retrieval and inference stages of a RAG pipeline:
- **BM25 Scoring:** lexical relevance of each retrieved chunk to the query, computed with BM25.
- **Average Retrieval Score:** mean relevance score of the retrieved chunks across the evaluation queries.
- **Retrieval Time (in milliseconds):** latency of the retrieval step.
- **Hallucination Score:** measures how much of the generated answer is unsupported (lower is better):

  Hallucination Score = 1 - (Matching Tokens / Total Predicted Tokens)
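As a rough illustration of the formula above, a whitespace-tokenized version might look like the sketch below. The hallucination_score helper is hypothetical: it assumes "matching" means simple token overlap with a source text (for example the retrieved context or the reference answer), whereas Nomadic's internal implementation may tokenize and match differently.

```python
# Hypothetical helper illustrating the hallucination score formula.
def hallucination_score(predicted: str, source_text: str) -> float:
    pred_tokens = predicted.lower().split()
    source_tokens = set(source_text.lower().split())
    if not pred_tokens:
        return 0.0
    matching = sum(1 for tok in pred_tokens if tok in source_tokens)
    return 1 - matching / len(pred_tokens)

# Example: 3 of the 6 predicted tokens are unsupported -> score of 0.5
print(hallucination_score(
    "CNNs use convolutional pooling and magic",
    "Convolutional neural networks consist of convolutional layers and pooling layers",
))
```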
# Obtain RAG inputs
docs, eval_qs, ref_response_strs = obtain_rag_inputs(pdf_url=pdf_url, eval_json=eval_json)
# Set up the retrieval experiment
experiment_retrieval = Experiment(
param_fn=run_retrieval_pipeline,
param_dict={
"top_k": top_k,
"model_name": model_name,
"retrieval_strategy": retrieval_strategy,
"embedding_model": embedding_model
},
fixed_param_dict={
"docs": docs,
"eval_qs": eval_qs[:10],
"ref_response_strs": ref_response_strs[:10],
},
)
# Run the retrieval experiment over the search space and save the results
retrieval_results = experiment_retrieval.run(param_dict={
"top_k": top_k,
"model_name": model_name,
"retrieval_strategy": retrieval_strategy,
"embedding_model": embedding_model
})
save_run_results(retrieval_results, "run_results.json")
# Load the saved results and get the best run result
loaded_results = load_run_results("run_results.json")
best_run_result = get_best_run_result(loaded_results)
best_retrieval_results = best_run_result['metadata'].get("best_retrieval_results", [])
# Set up the inference experiment
experiment_inference = Experiment(
param_fn=run_inference_pipeline,
params={"temperature","model_name", "max_tokens", "reranking_model", "similarity_threshold"},
fixed_param_dict={
"best_retrieval_results": best_run_result['metadata'].get("best_retrieval_results", []),
"ref_response_strs": ref_response_strs[:10], # Make sure this matches the number of queries used in retrieval
},
)
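# Run the inference experiment with the selected parameter values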
inference_results = experiment_inference.run(param_dict={
"model_name": model_name,
"temperature": temperature,
"max_tokens": max_tokens,
"reranking_model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
"similarity_threshold": 0.7,
})
Now we visualize the retrieval scores (for the best run result) along with the inference scores for different configurations.
create_retrieval_heatmap(retrieval_results)
Here are the results using the best-performing parameter configuration:
create_inference_heatmap(inference_results)
Interested in contributing? Contributions to Nomadic, including new integrations, are welcome and highly encouraged! Send questions in our Discord.