Rankify

🔥 Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation 🔥

If you like our framework, don't hesitate to ⭐ star this repository ⭐. It helps us improve the framework and scale it to more models and methods 🤗.

A modular and efficient retrieval, re-ranking, and RAG framework designed to work with state-of-the-art models for retrieval, ranking, and RAG tasks.

Rankify is a Python toolkit designed for unified retrieval, re-ranking, and retrieval-augmented generation (RAG) research. Our toolkit integrates 40 pre-retrieved benchmark datasets and supports 7 retrieval techniques, 24 state-of-the-art re-ranking models, and multiple RAG methods. Rankify provides a modular and extensible framework, enabling seamless experimentation and benchmarking across retrieval pipelines. Comprehensive documentation, open-source implementation, and pre-built evaluation tools make Rankify a powerful resource for researchers and practitioners in the field.

🚀 Demo

To run the demo locally:

# Make sure Rankify is installed
pip install streamlit

# Then run the demo
streamlit run demo.py

https://github.com/user-attachments/assets/13184943-55db-4f0c-b509-fde920b809bc

🔧 Installation

Set up the virtual environment

First, create and activate a conda environment with Python 3.10:

conda create -n rankify python=3.10
conda activate rankify

Install PyTorch 2.5.1

We recommend using Rankify with PyTorch 2.5.1. Refer to the PyTorch installation page for platform-specific installation commands.

If you have access to GPUs, we recommend installing a CUDA 12.4 or 12.6 build of PyTorch, as many of the evaluation metrics are optimized for GPU use.

To install PyTorch 2.5.1 built for CUDA 12.4, run the following command:

pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
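
After installing, it can be worth confirming that PyTorch actually sees your GPU. The following is a minimal sketch and not part of Rankify; the helper name `torch_cuda_summary` is our own, and it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
def torch_cuda_summary() -> str:
    """Return a one-line summary of the local PyTorch/CUDA setup."""
    try:
        import torch  # imported lazily so the check also works before installation
    except ImportError:
        return "PyTorch is not installed"
    # torch.version.cuda is None for CPU-only builds
    cuda_build = torch.version.cuda or "CPU-only build"
    gpu = "available" if torch.cuda.is_available() else "not available"
    return f"torch {torch.__version__} (CUDA build: {cuda_build}), GPU {gpu}"


print(torch_cuda_summary())
```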

Basic Installation

To install Rankify, simply use pip (requires Python 3.10+):

pip install rankify

This will install the base functionality required for retrieval, re-ranking, and retrieval-augmented generation (RAG).

Recommended Installation

For full functionality, we recommend installing Rankify with all dependencies:

pip install "rankify[all]"

This ensures you have all necessary modules, including retrieval, re-ranking, and RAG support.

Optional Dependencies

If you prefer to install only specific components, choose from the following:

# Install dependencies for retrieval only (BM25, DPR, ANCE, etc.)
pip install "rankify[retriever]"

# Install base re-ranking with vLLM support for `FirstModelReranker`, `LiT5ScoreReranker`, `LiT5DistillReranker`, `VicunaReranker`, and `ZephyrReranker`.
pip install "rankify[reranking]"

Or, to install from GitHub for the latest development version:

git clone https://github.com/DataScienceUIBK/rankify.git
cd rankify
pip install -e .
# For full functionality we recommend installing Rankify with all dependencies:
pip install -e ".[all]"
# Install dependencies for retrieval only (BM25, DPR, ANCE, etc.)
pip install -e ".[retriever]"
# Install base re-ranking with vLLM support for `FirstModelReranker`, `LiT5ScoreReranker`, `LiT5DistillReranker`, `VicunaReranker`, and `ZephyrReranker`.
pip install -e ".[reranking]"
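
To check which Rankify version ended up in your environment, you can query the installed package metadata. This is a small sketch using only the standard library; `installed_version` is a hypothetical helper name, and we assume the distribution is registered under the name `rankify`, as in the `pip install rankify` command above.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("rankify")
print(f"rankify {version}" if version else "rankify is not installed in this environment")
```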

Using ColBERT Retriever

If you want to use ColBERT Retriever, follow these additional setup steps:

# Install GCC and required libraries
conda install -c conda-forge gcc=9.4.0 gxx=9.4.0
conda install -c conda-forge libstdcxx-ng

# Export necessary environment variables
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
export CC=gcc
export CXX=g++
export PATH=$CONDA_PREFIX/bin:$PATH

# Clear cached torch extensions
rm -rf ~/.cache/torch_extensions/*
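
If ColBERT still fails to build its extensions, it may help to confirm that the variables exported above are visible to Python. The sketch below is our own diagnostic, not a Rankify utility; `colbert_env_report` is a hypothetical name, and it only inspects `CC`, `CXX`, `LD_LIBRARY_PATH`, `CONDA_PREFIX`, and `PATH`.

```python
import os
import shutil


def colbert_env_report(env=None):
    """Summarise the settings that the ColBERT setup steps are meant to establish."""
    env = dict(os.environ) if env is None else env
    prefix = env.get("CONDA_PREFIX")
    conda_lib = os.path.join(prefix, "lib") if prefix else None
    ld_paths = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return {
        "CC": env.get("CC"),
        "CXX": env.get("CXX"),
        # True when $CONDA_PREFIX/lib appears on LD_LIBRARY_PATH
        "conda_lib_on_ld_library_path": conda_lib is not None and conda_lib in ld_paths,
        # True when a gcc executable can be found on the given PATH
        "gcc_found": shutil.which("gcc", path=env.get("PATH")) is not None,
    }


for key, value in colbert_env_report().items():
    print(f"{key}: {value}")
```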

🚀 Quick Start

1️⃣ Pre-retrieved Datasets

We provide 1,000 pre-retrieved documents per dataset, which you can download from:

🔗 Hugging Face Dataset Repository

Dataset Format

The pre-retrieved documents are structured as follows:

[
    {
        "question": "...",
        "answers": ["...", "...", ...],
        "ctxs": [
            {
                "id": "...",         // Passage ID from database TSV file
                "score": "...",      // Retriever score
                "has_answer": true|false  // Whether the passage contains the answer
            }
        ]
    }
]
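
To make the format concrete, here is a minimal sketch that parses one entry with the standard `json` module and finds the first passage flagged `has_answer`. The sample values (question, IDs, scores) are invented for illustration, and `first_hit_rank` is a hypothetical helper, not a Rankify function.

```python
import json

# A tiny, made-up file in the format described above
sample = """
[
  {
    "question": "Who wrote Hamlet?",
    "answers": ["Shakespeare"],
    "ctxs": [
      {"id": "101", "score": "12.5", "has_answer": false},
      {"id": "202", "score": "11.9", "has_answer": true}
    ]
  }
]
"""


def first_hit_rank(entry):
    """Return the 1-based rank of the first passage marked has_answer, or None."""
    for rank, ctx in enumerate(entry["ctxs"], start=1):
        if ctx["has_answer"]:
            return rank
    return None


entries = json.loads(sample)
for entry in entries:
    print(entry["question"], "-> first answer-bearing passage at rank", first_hit_rank(entry))
```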

Access Datasets in Rankify

You can easily download and use pre-retrieved datasets through Rankify.

List Available Datasets

To see all available datasets:

from rankify.dataset.dataset import Dataset 

# Display available datasets
Dataset.avaiable_dataset()

Retriever Datasets

from rankify.dataset.dataset import Dataset
# Download BM25-retrieved documents for nq-dev
dataset = Dataset(retriever="bm25", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Download BGE-retrieved documents for nq-dev
dataset = Dataset(retriever="bge", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Download ColBERT-retrieved documents for nq-dev
dataset = Dataset(retriever="colbert", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Download MSS-DPR-retrieved documents for nq-dev
dataset = Dataset(retriever="mss-dpr", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Download MSS-retrieved documents for nq-dev
dataset = Dataset(retriever="mss", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Download Contriever-retrieved documents for nq-dev
dataset = Dataset(retriever="contriever", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
# Download ANCE-retrieved documents for nq-dev
dataset = Dataset(retriever="ance", dataset_name="nq-dev", n_docs=100)
documents = dataset.download(force_download=False)
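
Since the calls above differ only in the `retriever` argument, they can be folded into a loop. The sketch below assumes nothing beyond the constructor and `download(force_download=False)` call shown above; `download_all` is a hypothetical helper, and the dataset class is passed in as a parameter so the function itself does not depend on Rankify being importable.

```python
# Retriever names taken from the examples above
RETRIEVERS = ["bm25", "bge", "colbert", "mss-dpr", "mss", "contriever", "ance"]


def download_all(dataset_cls, dataset_name="nq-dev", n_docs=100, retrievers=RETRIEVERS):
    """Download the pre-retrieved documents of one dataset for several retrievers."""
    results = {}
    for name in retrievers:
        dataset = dataset_cls(retriever=name, dataset_name=dataset_name, n_docs=n_docs)
        results[name] = dataset.download(force_download=False)
    return results


# Usage with Rankify installed:
#   from rankify.dataset.dataset import Dataset
#   documents_by_retriever = download_all(Dataset)
```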

Load Pre-retrieved Dataset from File

If you have already downloaded a dataset, you can load it directly:

from rankify.dataset.dataset import Dataset

# Load pre-downloaded BM25 dataset for WebQuestions
documents = Dataset.load_dataset('./tests/out-datasets/bm25/web_questions/test.json', 100)

Now, you can integrate retrieved documents with re-ranking and RAG workflows! 🚀

Feature Comparison for Pre-Retrieved Datasets

The following table provides an overview of the availability of different retrieval methods (BM25, DPR, ColBERT, ANCE, BGE, Contriever) for each dataset.

✅ Completed ⏳ Part Completed, Pending other Parts 🕒 Pending

Dataset	BM25	DPR	ColBERT	ANCE	BGE	Contriever
2WikimultihopQA	✅	🕒	🕒	🕒	🕒	🕒
ArchivialQA	✅	🕒	🕒	🕒	🕒	🕒
ChroniclingAmericaQA	✅	🕒	🕒	🕒	🕒	🕒
EntityQuestions	✅	🕒	🕒	🕒	🕒	🕒
AmbigQA	✅	🕒	✅	🕒	🕒	🕒
ARC	✅	🕒	🕒	🕒	🕒	🕒
ASQA	✅	🕒	🕒	🕒	🕒	🕒
MS MARCO	🕒	🕒	🕒	🕒	🕒	🕒
AY2	✅	🕒	🕒	🕒	🕒	🕒
Bamboogle	✅	🕒	🕒	🕒	🕒	🕒
BoolQ	✅	🕒	✅	🕒	✅	🕒
CommonSenseQA	✅	🕒	✅	🕒	✅	🕒
CuratedTREC	✅	🕒	✅	⏳	✅	🕒
ELI5	✅	🕒	🕒	🕒	🕒	🕒
FERMI	✅	🕒	✅	⏳	✅	🕒
FEVER	✅	🕒	🕒	🕒	🕒	🕒
HellaSwag	✅	🕒	🕒	🕒	🕒	🕒
HotpotQA	✅	🕒	🕒	🕒	🕒	🕒
MMLU	✅	🕒	🕒	🕒	🕒	🕒
Musique	✅	🕒	🕒	🕒	🕒	🕒
NarrativeQA	✅	🕒	✅	⏳	✅	🕒
NQ	✅	🕒	✅	⏳	✅	🕒
OpenbookQA	✅	🕒	🕒	🕒	🕒	🕒
PIQA	✅	🕒	✅	🕒	🕒	🕒
PopQA	✅	🕒	✅	⏳	✅	🕒
Quartz	✅	🕒	🕒	🕒	🕒	🕒
SIQA	✅	🕒	✅	🕒	✅	🕒
StrategyQA	✅	🕒	🕒	🕒	🕒	🕒
TREX	✅	🕒	🕒	🕒	🕒	🕒
TriviaQA	✅	🕒	✅	⏳	✅	🕒
TruthfulQA	✅	🕒	🕒	🕒	🕒	🕒
WebQ	✅	🕒	✅	⏳	✅	🕒
WikiQA	✅	🕒	✅	⏳	✅	🕒
WikiAsp	✅	🕒	🕒	🕒	🕒	🕒
WikiPassageQA	✅	🕒	✅	⏳	✅	🕒
WNED	✅	🕒	🕒	🕒	🕒	🕒
WoW	✅	🕒	🕒	🕒	🕒	🕒
Zsre	✅	🕒	🕒	🕒	🕒	🕒

2️⃣ Running Retrieval

To perform retrieval using Rankify, you can choose from various retrieval methods such as BM25, DPR, ANCE, Contriever, ColBERT, and BGE.

Example: Running Retrieval on Sample Queries

from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.retrievers.retriever import Retriever

# Sample Documents
documents = [
    Document(question=Question("the cast of a good day to die hard?"), answers=Answer([
            "Jai Courtney",
            "Sebastian Koch",
            "Radivoje Bukvić",
            "Yuliya Snigir",
            "Sergei Kolesnikov",
            "Mary Elizabeth Winstead",
            "Bruce Willis"
        ]), contexts=[]),
    Document(question=Question("Who wrote Hamlet?"), answers=Answer(["Shakespeare"]), contexts=[])
]

# BM25 retrieval on Wikipedia
bm25_retriever_wiki = Retriever(method="bm25", n_docs=5, index_type="wiki")

# BM25 retrieval on MS MARCO
bm25_retriever_msmarco = Retriever(method="bm25", n_docs=5, index_type="msmarco")


# DPR (multi-encoder) retrieval on Wikipedia
dpr_multi_retriever_wiki = Retriever(method="dpr", model="dpr-multi", n_docs=5, index_type="wiki")

# DPR (multi-encoder) retrieval on MS MARCO
dpr_multi_retriever_msmarco = Retriever(method="dpr", model="dpr-multi", n_docs=5, index_type="msmarco")

# DPR (single-encoder) retrieval on Wikipedia
dpr_single_retriever_wiki = Retriever(method="dpr", model="dpr-single", n_docs=5, index_type="wiki")

# DPR (single-encoder) retrieval on MS MARCO
dpr_single_retriever_msmarco = Retriever(method="dpr", model="dpr-single", n_docs=5, index_type="msmarco")

# ANCE retrieval on Wikipedia
ance_retriever_wiki = Retriever(method="ance", model="ance-multi", n_docs=5, index_type="wiki")

# ANCE retrieval on MS MARCO
ance_retriever_msmarco = Retriever(method="ance", model="ance-multi", n_docs=5, index_type="msmarco")


# Contriever retrieval on Wikipedia
contriever_retriever_wiki = Retriever(method="contriever", model="facebook/contriever-msmarco", n_docs=5, index_type="wiki")

# Contriever retrieval on MS MARCO
contriever_retriever_msmarco = Retriever(method="contriever", model="facebook/contriever-msmarco", n_docs=5, index_type="msmarco")


# ColBERT retrieval on Wikipedia
colbert_retriever_wiki = Retriever(method="colbert", model="colbert-ir/colbertv2.0", n_docs=5, index_type="wiki")

# ColBERT retrieval on MS MARCO
colbert_retriever_msmarco = Retriever(method="colbert", model="colbert-ir/colbertv2.0", n_docs=5, index_type="msmarco")


# BGE retrieval on Wikipedia
bge_retriever_wiki = Retriever(method="bge", model="BAAI/bge-large-en-v1.5", n_docs=5, index_type="wiki")

# BGE retrieval on MS MARCO
bge_retriever_msmarco = Retriever(method="bge", model="BAAI/bge-large-en-v1.5", n_docs=5, index_type="msmarco")


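# HyDE retrieval needs an OpenAI API key; here it is read from the environment
import os
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")

# HyDE retrieval on Wikipedia
hyde_retriever_wiki = Retriever(method="hyde", n_docs=5, index_type="wiki", api_key=OPENAI_API_KEY)

# HyDE retrieval on MS MARCO
hyde_retriever_msmarco = Retriever(method="hyde", n_docs=5, index_type="msmarco", api_key=OPENAI_API_KEY)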

Running Retrieval

After defining the retriever, you can retrieve documents using:

retrieved_documents = bm25_retriever_wiki.retrieve(documents)

for i, doc in enumerate(retrieved_documents):
    print(f"\nDocument {i+1}:")
    print(doc)
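
For intuition about what a sparse retriever such as BM25 computes, here is a toy, self-contained scoring sketch. It is not Rankify's implementation (which runs over prebuilt indices); the whitespace tokenizer, the passages, and the parameter values `k1=1.2` and `b=0.75` are our own assumptions for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def bm25_rank(query, passages, k1=1.2, b=0.75):
    """Return passage indices ordered by descending BM25 score (non-empty passages assumed)."""
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of passages each term occurs in
    doc_freq = Counter(term for d in docs for term in set(d))
    scored = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scored.append((score, idx))
    # Highest score first; ties keep the original passage order
    return [idx for score, idx in sorted(scored, key=lambda s: (-s[0], s[1]))]


passages = [
    "Paris is the capital of France",
    "Bruce Willis stars in A Good Day to Die Hard",
    "Hamlet is a tragedy written by William Shakespeare",
]
print(bm25_rank("hamlet shakespeare", passages))
```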

3️⃣ Running Reranking

Rankify provides support for multiple reranking models. Below are examples of how to use each model.

Example: Reranking a Document

from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.models.reranking import Reranking

# Sample document setup
question = Question("When did Thomas Edison invent the light bulb?")
answers = Answer(["1879"])
contexts = [
    Context(text="Lightning strike at Seoul National University", id=1),
    Context(text="Thomas Edison tried to invent a device for cars but failed", id=2),
    Context(text="Coffee is good for diet", id=3),
    Context(text="Thomas Edison invented the light bulb in 1879", id=4),
    Context(text="Thomas Edison worked with electricity", id=5),
]
document = Document(question=question, answers=answers, contexts=contexts)

# Initialize the reranker
reranker = Reranking(method="monot5", model_name="monot5-base-msmarco")

# Apply reranking
reranker.rank([document])

# Print reordered contexts
for context in document.reorder_contexts:
    print(f"  - {context.text}")
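
Conceptually, a reranker scores each (question, context) pair and sorts the contexts by that score. The sketch below illustrates the idea with a toy word-overlap scorer standing in for a neural model such as MonoT5; `Passage`, `overlap_score`, and `rerank` are hypothetical names and not part of Rankify.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    id: int
    text: str


def overlap_score(question, passage_text):
    """Fraction of question words that also occur in the passage (toy scorer)."""
    q_words = set(question.lower().replace("?", "").split())
    p_words = set(passage_text.lower().split())
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def rerank(question, passages, score_fn=overlap_score):
    """Return passages sorted by descending score; ties keep their original order."""
    return sorted(passages, key=lambda p: score_fn(question, p.text), reverse=True)


passages = [
    Passage(1, "Lightning strike at Seoul National University"),
    Passage(2, "Thomas Edison tried to invent a device for cars but failed"),
    Passage(3, "Coffee is good for diet"),
    Passage(4, "Thomas Edison invented the light bulb in 1879"),
    Passage(5, "Thomas Edison worked with electricity"),
]
reranked = rerank("When did Thomas Edison invent the light bulb?", passages)
for passage in reranked:
    print(f"  - {passage.text}")
```

Any scoring function with the same signature can be passed as `score_fn`, which is the slot a learned model would fill.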

Examples of Using Different Reranking Models

# UPR
model = Reranking(method='upr', model_name='t5-base')

# API-Based Rerankers
model = Reranking(method='apiranker', model_name='voyage', api_key='your-api-key')
model = Reranking(method='apiranker', model_name='jina', api_key='your-api-key')
model = Reranking(method='apiranker', model_name='mixedbread.ai', api_key='your-api-key')

# Blender Reranker
model = Reranking(method='blender_reranker', model_name='PairRM')

# ColBERT Reranker
model = Reranking(method='colbert_ranker', model_name='Colbert')

# EchoRank
model = Reranking(method='echorank', model_name='flan-t5-large')

# First Ranker
model = Reranking(method='first_ranker', model_name='base')

# FlashRank
model = Reranking(method='flashrank', model_name='ms-marco-TinyBERT-L-2-v2')

# InContext Reranker
model = Reranking(method='incontext_reranker', model_name='llamav3.1-8b')

# InRanker
model = Reranking(method='inranker', model_name='inranker-small')

# ListT5
model = Reranking(method='listt5', model_name='listt5-base')

# LiT5 Distill
model = Reranking(method='lit5distill', model_name='LiT5-Distill-base')

# LiT5 Score
model = Reranking(method='lit5score', model_name='LiT5-Distill-base')

# LLM Layerwise Ranker
model = Reranking(method='llm_layerwise_ranker', model_name='bge-multilingual-gemma2')

# LLM2Vec
model = Reranking(method='llm2vec', model_name='Meta-Llama-31-8B')

# MonoBERT
model = Reranking(method='monobert', model_name='monobert-large')

# MonoT5
model = Reranking(method='monot5', model_name='monot5-base-msmarco')

# RankGPT
model = Reranking(method='rankgpt', model_name='llamav3.1-8b')

# RankGPT API
model = Reranking(method='rankgpt-api', model_name='gpt-3.5', api_key="gpt-api-key")
model = Reranking(method='rankgpt-api', model_name='gpt-4', api_key="gpt-api-key")
model = Reranking(method='rankgpt-api', model_name='llamav3.1-8b', api_key="together-api-key")
model = Reranking(method='rankgpt-api', model_name='claude-3-5', api_key="claude-api-key")

# RankT5
model = Reranking(method='rankt5', model_name='rankt5-base')

# Sentence Transformer Reranker
model = Reranking(method='sentence_transformer_reranker', model_name='all-MiniLM-L6-v2')
model = Reranking(method='sentence_transformer_reranker', model_name='gtr-t5-base')
model = Reranking(method='sentence_transformer_reranker', model_name='sentence-t5-base')
model = Reranking(method='sentence_transformer_reranker', model_name='distilbert-multilingual-nli-stsb-quora-ranking')
model = Reranking(method='sentence_transformer_reranker', model_name='msmarco-bert-co-condensor')

# SPLADE
model = Reranking(method='splade', model_name='splade-cocondenser')

# Transformer Ranker
model = Reranking(method='transformer_ranker', model_name='mxbai-rerank-xsmall')
model = Reranking(method='transformer_ranker', model_name='bge-reranker-base')
model = Reranking(method='transformer_ranker', model_name='bce-reranker-base')
model = Reranking(method='transformer_ranker', model_name='jina-reranker-tiny')
model = Reranking(method='transformer_ranker', model_name='gte-multilingual-reranker-base')
model = Reranking(method='transformer_ranker', model_name='nli-deberta-v3-large')
model = Reranking(method='transformer_ranker', model_name='ms-marco-TinyBERT-L-6')
model = Reranking(method='transformer_ranker', model_name='msmarco-MiniLM-L12-en-de-v1')

# TwoLAR
model = Reranking(method='twolar', model_name='twolar-xl')

# Vicuna Reranker
model = Reranking(method='vicuna_reranker', model_name='rank_vicuna_7b_v1')

# Zephyr Reranker
model = Reranking(method='zephyr_reranker', model_name='rank_zephyr_7b_v1_full')

4️⃣ Using Generator Module

Rankify provides a Generator Module to facilitate retrieval-augmented generation (RAG) by integrating retrieved documents into generative models for producing answers. Below is an example of how to use different generator methods.

from rankify.dataset.dataset import Document, Question, Answer, Context
from rankify.generator.generator import Generator

# Define question and answer
question = Question("What is the capital of France?")
answers = Answer(["Paris"])
contexts = [
    Context(id=1, title="France", text="The capital of France is Paris.", score=0.9),
    Context(id=2, title="Germany", text="Berlin is the capital of Germany.", score=0.5)
]

# Construct document
doc = Document(question=question, answers=answers, contexts=contexts)

# Initialize Generator (e.g., Meta Llama)
generator = Generator(method="in-context-ralm", model_name='meta-llama/Llama-3.1-8B')

# Generate answer
generated_answers = generator.generate([doc])
print(generated_answers)  # Output: ["Paris"]
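
In-context retrieval-augmented generation broadly works by placing retrieved passages in front of the question in the model prompt. The sketch below shows one way such a prompt could be assembled; the template and the `build_ralm_prompt` helper are hypothetical, and the exact prompt Rankify builds internally may differ.

```python
def build_ralm_prompt(question, contexts, max_contexts=2):
    """Concatenate the highest-scoring contexts, then append the question."""
    top = sorted(contexts, key=lambda c: c["score"], reverse=True)[:max_contexts]
    blocks = [f"Title: {c['title']}\nPassage: {c['text']}" for c in top]
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


contexts = [
    {"title": "Germany", "text": "Berlin is the capital of Germany.", "score": 0.5},
    {"title": "France", "text": "The capital of France is Paris.", "score": 0.9},
]
print(build_ralm_prompt("What is the capital of France?", contexts))
```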

5️⃣ Evaluating with Metrics

Rankify provides built-in evaluation metrics for retrieval, re-ranking, and retrieval-augmented generation (RAG). These metrics help assess the quality of retrieved documents, the effectiveness of ranking models, and the accuracy of generated answers.

Evaluating Generated Answers

You can evaluate the quality of retrieval-augmented generation (RAG) results by comparing generated answers with ground-truth answers.

from rankify.metrics.metrics import Metrics
from rankify.dataset.dataset import Dataset
from rankify.generator.generator import Generator

# Load dataset
dataset = Dataset('bm25', 'nq-test', 100)
documents = dataset.download(force_download=False)

# Initialize Generator
generator = Generator(method="in-context-ralm", model_name='meta-llama/Llama-3.1-8B')

# Generate answers
generated_answers = generator.generate(documents)

# Evaluate generated answers
metrics = Metrics(documents)
print(metrics.calculate_generation_metrics(generated_answers))
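
Exact match (EM) is a common way to compare generated answers with ground-truth answers. The following is a minimal sketch of EM with the usual lowercase, punctuation, and article normalization; it is our own illustration, and the normalization applied inside `calculate_generation_metrics` may differ.

```python
import re
import string


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions, gold_lists):
    """Fraction of predictions that exactly match one of their gold answers."""
    hits = sum(exact_match(p, g) for p, g in zip(predictions, gold_lists))
    return hits / len(predictions) if predictions else 0.0


print(em_score(["The Paris.", "Berlin"], [["Paris"], ["Rome"]]))
```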

Evaluating Retrieval Performance

# Calculate retrieval metrics before reranking
metrics = Metrics(documents)
before_ranking_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10, 20, 50, 100], use_reordered=False)

print(before_ranking_metrics)

Evaluating Reranked Results

# Calculate retrieval metrics after reranking
after_ranking_metrics = metrics.calculate_retrieval_metrics(ks=[1, 5, 10, 20, 50, 100], use_reordered=True)
print(after_ranking_metrics)
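
To illustrate what a Top-K retrieval metric measures, the sketch below computes the fraction of questions with at least one answer-bearing passage among the first `k` contexts. It works on plain lists of `has_answer` flags and is a hypothetical stand-in, not the code behind `calculate_retrieval_metrics`.

```python
def top_k_accuracy(has_answer_lists, ks=(1, 5, 10)):
    """Map each k to the share of questions answered within the top-k contexts."""
    total = len(has_answer_lists)
    result = {}
    for k in ks:
        hits = sum(any(flags[:k]) for flags in has_answer_lists)
        result[k] = hits / total if total else 0.0
    return result


# One list of has_answer flags per question, in ranked order (made-up data)
flags_per_question = [
    [False, True],
    [False, False],
    [True],
    [False, False, True],
]
print(top_k_accuracy(flags_per_question, ks=(1, 2, 3)))
```

Running the same function on the original and the reordered contexts gives a before/after comparison in the spirit of the two calls above.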

📜 Supported Models

1️⃣ Retrievers

✅ BM25
✅ DPR
✅ ColBERT
✅ ANCE
✅ BGE
✅ Contriever
✅ BPR
✅ HYDE
🕒 RepLlama
🕒 coCondenser
🕒 Spar
🕒 Dragon
🕒 Hybrid

2️⃣ Rerankers

✅ Cross-Encoders
✅ RankGPT
✅ RankGPT-API
✅ MonoT5
✅ MonoBERT
✅ RankT5
✅ ListT5
✅ LiT5Score
✅ LiT5Dist
✅ Vicuna Reranker
✅ Zephyr Reranker
✅ Sentence Transformer-based
✅ FlashRank Models
✅ API-Based Rerankers
✅ ColBERT Reranker
✅ LLM Layerwise Ranker
✅ Splade Reranker
✅ UPR Reranker
✅ Inranker Reranker
✅ Transformer Reranker
✅ FIRST Reranker
✅ Blender Reranker
✅ LLM2VEC Reranker
✅ ECHO Reranker
✅ Incontext Reranker
🕒 DynRank
🕒 ASRank
🕒 RankLlama

3️⃣ Generators

✅ Fusion-in-Decoder (FiD) with T5
✅ In-Context Learning RALM

✨ Features

🔥 Unified Framework: Combines retrieval, re-ranking, and retrieval-augmented generation (RAG) into a single modular toolkit.
📚 Rich Dataset Support: Includes 40+ benchmark datasets with pre-retrieved documents for seamless experimentation.
🧲 Diverse Retrieval Methods: Supports BM25, DPR, ANCE, BPR, ColBERT, BGE, and Contriever for flexible retrieval strategies.
🎯 Powerful Re-Ranking: Implements 24 advanced models with 41 sub-methods to optimize ranking performance.
🏗️ Prebuilt Indices: Provides Wikipedia and MS MARCO corpora, eliminating indexing overhead and speeding up retrieval.
🔮 Seamless RAG Integration: Works with GPT, LLAMA, T5, and Fusion-in-Decoder (FiD) models for retrieval-augmented generation.
🛠 Extensible & Modular: Easily integrates custom datasets, retrievers, ranking models, and RAG pipelines.
📊 Built-in Evaluation Suite: Includes retrieval, ranking, and RAG metrics for robust benchmarking.
📖 User-Friendly Documentation: Access detailed 📖 online docs, example notebooks, and tutorials for easy adoption.

🔍 Roadmap

Rankify is still under development, and this is our first release (v0.1.0). While it already supports a wide range of retrieval, re-ranking, and RAG techniques, we are actively enhancing its capabilities by adding more retrievers, rankers, datasets, and features.

🛠 Planned Improvements

Retrievers

✅ Supports: BM25, DPR, ANCE, BPR, ColBERT, BGE, Contriever
✨ ⏳ Coming Soon: Spar, MSS, MSS-DPR
✨ ⏳ Custom Index Loading for user-defined retrieval corpora

Re-Rankers

✅ 24 models & 41 sub-methods
✨ ⏳ Expanding with more ranking models

Datasets

✅ 40 benchmark datasets
✨ ⏳ Adding new datasets & custom dataset integration

Retrieval-Augmented Generation (RAG)

✅ Works with: GPT, LLAMA, T5
✨ ⏳ Expanding to more generative models

Evaluation & Usability

✅ Standard metrics: Top-K, EM, Recall
✨ ⏳ Adding advanced metrics: NDCG, MAP for retrievers
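
For readers unfamiliar with the planned metrics, here is a short sketch of NDCG@k and average precision over a ranked list of relevance labels. It is an illustration under our own assumptions (the ideal ranking is computed from the retrieved list only), not the implementation Rankify will ship. MAP is the mean of `average_precision` over all questions.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked items."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def average_precision(relevances):
    """Mean of precision@rank taken at each relevant item in the ranking."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


ranking = [0, 1, 1]  # relevance of the passages at ranks 1, 2, 3 (made-up)
print(round(ndcg_at_k(ranking, 3), 4), round(average_precision(ranking), 4))
```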

Pipeline Integration

✨ ⏳ Introducing a pipeline module for end-to-end retrieval, ranking, and RAG workflows
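
Until that module exists, the three stages can be chained by hand. The sketch below is a hypothetical `run_pipeline` helper of our own, not the planned API; it assumes only the calls shown in the Quick Start (`retrieve`, `rank`, and `generate`) and that `rank` updates the documents in place, as the reranking example suggests.

```python
def run_pipeline(documents, retriever, reranker, generator):
    """Retrieve contexts, rerank them, then generate one answer per document."""
    documents = retriever.retrieve(documents)  # as in "Running Retrieval"
    reranker.rank(documents)                   # as in "Running Reranking"
    return generator.generate(documents)       # as in "Using Generator Module"


# Usage with Rankify installed (objects created as in the Quick Start):
#   answers = run_pipeline(documents, bm25_retriever_wiki, reranker, generator)
```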

📖 Documentation

For full API documentation, visit the Rankify Docs.

💡 Contributing

Follow these steps to get involved:

1. Fork this repository to your GitHub account.
2. Create a new branch for your feature or fix:
```
git checkout -b feature/YourFeatureName
```
3. Make your changes and commit them:
```
git commit -m "Add YourFeatureName"
```
4. Push the changes to your branch:
```
git push origin feature/YourFeatureName
```
5. Submit a Pull Request to propose your changes.

Thank you for helping make this project better!

🌐 Community Contributions

Chinese community resources available!

Special thanks to Xiumao for writing two exceptional Chinese blog posts about Rankify:

📘 Introduction to Rankify

📘 Deep dive into re-ranking models in Rankify

These articles were crafted with high-traffic optimization in mind and are widely recommended in Chinese academic and developer circles.

We updated the 中文版本 to reflect these blog contributions while keeping original content intact—thank you Xiumao for your continued support!

🔖 License

Rankify is licensed under the Apache-2.0 License - see the LICENSE file for details.

🙏 Acknowledgments

We would like to express our gratitude to the following libraries, which have greatly contributed to the development of Rankify:

Rerankers – A powerful Python library for integrating various reranking methods.
🔗 GitHub Repository
Pyserini – A toolkit for supporting BM25-based retrieval and integration with sparse/dense retrievers.
🔗 GitHub Repository
FlashRAG – A modular framework for Retrieval-Augmented Generation (RAG) research.
🔗 GitHub Repository

🌟 Citation

Please cite our paper if it helps your research:

@article{abdallah2025rankify,
  title={Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Ali, Mohammed and Jatowt, Adam},
  journal={arXiv preprint arXiv:2502.02464},
  year={2025}
}
