The fastest open-source tool for replicating databases to Apache Iceberg or a data lakehouse. ⚡ Efficient, quick, and scalable data ingestion for real-time analytics, starting with MongoDB. Visit olake.io/docs for the full documentation and benchmarks.
The connector ecosystem for OLake. The tables below summarize what OLake connectors currently support.
Functionality | MongoDB | Postgres | MySQL |
---|---|---|---|
Full Refresh Sync Mode | ✅ | ✅ | ✅ |
Incremental Sync Mode | ❌ | ❌ | ❌ |
CDC Sync Mode | ✅ | ✅ | ✅ |
Full Parallel Processing | ✅ | ✅ | ✅ |
CDC Parallel Processing | ✅ | ❌ | ❌ |
Resumable Full Load | ✅ | ✅ | ✅ |
CDC Heart Beat | ❌ | ❌ | ❌ |
We have additionally planned the following sources: AWS S3 and Kafka.
Functionality | Local Filesystem | AWS S3 | Apache Iceberg |
---|---|---|---|
Flattening & Normalization (L1) | ✅ | ✅ | |
Partitioning | ✅ | ✅ | |
Schema Changes | ✅ | ✅ | |
Schema Evolution | ✅ | ✅ | |
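As a rough illustration of the "Flattening & Normalization (L1)" row, here is a minimal sketch that assumes "L1" means level-1 flattening: top-level fields become columns and nested values are kept as JSON text. The README does not spell out the exact rule, so the function name `flatten_level_one` and the sample record are hypothetical, not OLake's implementation.

```python
import json
from typing import Any, Dict


def flatten_level_one(record: Dict[str, Any]) -> Dict[str, Any]:
    """Promote top-level keys to columns; keep nested values as JSON text."""
    row: Dict[str, Any] = {}
    for key, value in record.items():
        if isinstance(value, (dict, list)):
            # Assumption: nested structures are not expanded beyond level 1.
            row[key] = json.dumps(value, sort_keys=True)
        else:
            row[key] = value
    return row


# Hypothetical source document with one nested field.
record = {"_id": "a1", "qty": 3, "address": {"city": "Pune", "zip": "411001"}}
row = flatten_level_one(record)
print(row)
```

Scalar fields pass through unchanged, so the resulting row has one column per top-level key regardless of how deeply the source document nests.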
Catalog | Status |
---|---|
Glue Catalog | WIP |
Hive Meta Store | Upcoming |
JDBC Catalog | Upcoming |
REST Catalog - Nessie | Upcoming |
REST Catalog - Polaris | Upcoming |
REST Catalog - Unity | Upcoming |
REST Catalog - Gravitino | Upcoming |
Azure Purview | Not Planned, submit a request |
BigLake Metastore | Not Planned, submit a request |
Core, or the framework, is the logic that has been abstracted out of the connectors to follow DRY (don't repeat yourself). It includes the base CLI commands, state logic, validation logic, type detection for unstructured data, handling of the Config, State, Catalog, and Writer config files, logging, and so on.
Core includes an HTTP server that directly exposes live stats about the running sync.
Core handles the commands used to interact with a driver:
- spec command: Returns a render-able JSON Schema that can be consumed by rjsf libraries in the frontend
- check command: Performs all necessary checks on the Config, Catalog, State, and Writer config
- discover command: Returns all streams and their schema
- sync command: Extracts data out of the source and writes it into destinations

Find more about how OLake works here.
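The four commands above could be wired up roughly as follows. This is only a sketch of the dispatch pattern, not OLake's actual CLI: the program name `driver`, the handler functions, and every returned value are hypothetical placeholders, and real handlers would take config files as input.

```python
import argparse
import json
from typing import Any, Callable, Dict, List

Handler = Callable[[argparse.Namespace], Dict[str, Any]]


def run_spec(_: argparse.Namespace) -> Dict[str, Any]:
    # Placeholder JSON Schema that a form library could render.
    return {"type": "object", "properties": {"host": {"type": "string"}}}


def run_check(_: argparse.Namespace) -> Dict[str, Any]:
    # A real check would validate the Config, Catalog, State, and Writer config.
    return {"status": "ok"}


def run_discover(_: argparse.Namespace) -> Dict[str, Any]:
    # Placeholder stream list with one made-up stream and schema.
    return {"streams": [{"name": "orders", "schema": {"_id": "string"}}]}


def run_sync(_: argparse.Namespace) -> Dict[str, Any]:
    # A real sync would read from the source and write to the destination.
    return {"synced": ["orders"]}


HANDLERS: Dict[str, Handler] = {
    "spec": run_spec,
    "check": run_check,
    "discover": run_discover,
    "sync": run_sync,
}


def main(argv: List[str]) -> str:
    parser = argparse.ArgumentParser(prog="driver")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in HANDLERS:
        subparsers.add_parser(name)
    args = parser.parse_args(argv)
    # Dispatch to the handler registered for the chosen subcommand.
    return json.dumps(HANDLERS[args.command](args))


print(main(["discover"]))
```

Keeping the handlers in one table means each connector only has to supply its own four functions, while the argument parsing stays in shared code, which matches the DRY motivation described for Core.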
Check out the GitHub Project Roadmap and the Upcoming OLake Roadmap to track and influence the way we build it. If you have any ideas, questions, or feedback, please share them on our GitHub Discussions or raise an issue.
We ❤️ contributions, big or small. Check out our Bounty Program. As always, thanks to our amazing contributors!