LanceDB is an AI-native multimodal lakehouse built on the open-source Lance columnar format. It unifies vector embeddings, structured metadata, and raw source data — text, images, video, and audio — in a single embedded retrieval engine. The platform scales from a local Python or Rust process to enterprise-grade distributed deployments handling 100 billion+ rows, and powers vector, full-text, and hybrid search for AI applications.

Founded in 2021 and headquartered in the US, LanceDB has become a widely adopted open-source vector database for retrieval-augmented generation (RAG), agent memory, recommendation systems, and multimodal training pipelines. Projects such as AnythingLLM and several large LLM tooling stacks integrate LanceDB directly, and AWS has published a reference architecture for billion-scale vector search built on LanceDB and Amazon S3.

In June 2025, LanceDB closed a $30 million Series A led by Theory Ventures, with participation from CRV, Y Combinator, Databricks Ventures, RunwayML, Zero Prime, and Swift. The funding supported the rollout of LanceDB Cloud and LanceDB Enterprise, fully managed serverless offerings that eliminate infrastructure management, scale compute on demand, and let customers pay only for the storage they use. The Multimodal Lakehouse, part of LanceDB Enterprise, extends this with managed pipelines that turn raw files into production-ready training and retrieval features.

Developers can use LanceDB as an embedded library, a self-hosted server, or a managed cloud service, all backed by the same open-source format. Its hybrid search combines dense-vector similarity with keyword matching and metadata filters, and native support for multimodal data makes it a strong choice for vision-language and multimodal RAG systems. With its open-source roots, serverless cloud, and growing enterprise offering, LanceDB competes with Pinecone, Weaviate, Qdrant, and Milvus, while differentiating itself through its lakehouse approach and the tight coupling between training data and retrieval.
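
To make the idea of hybrid search concrete, here is a minimal, library-free Python sketch. It is not LanceDB's implementation or API: the toy records, the helper names (`cosine`, `keyword_score`, `hybrid_search`), and the choice of reciprocal rank fusion to merge the two rankings are all assumptions made for illustration. It applies a metadata filter, ranks the surviving records once by vector similarity and once by keyword overlap, and then fuses the two rankings.

```python
import math

# Toy corpus (hypothetical data): each record keeps its embedding,
# raw text, and metadata side by side.
DOCS = [
    {"id": 1, "vector": [0.9, 0.1], "text": "red running shoes", "kind": "image"},
    {"id": 2, "vector": [0.8, 0.2], "text": "blue running jacket", "kind": "image"},
    {"id": 3, "vector": [0.1, 0.9], "text": "running shoes review", "kind": "text"},
    {"id": 4, "vector": [0.95, 0.05], "text": "red dress", "kind": "video"},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Crude keyword relevance: total occurrences of query terms in the text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def hybrid_search(query_vector, query_text, docs, where=None, k=3, rrf_k=60):
    """Filter by metadata, rank by vector and by keyword, fuse with RRF."""
    # 1. Metadata filter: keep only records that satisfy the predicate.
    candidates = [d for d in docs if where is None or where(d)]

    # 2. Two independent rankings over the same candidates.
    by_vector = sorted(
        candidates, key=lambda d: cosine(query_vector, d["vector"]), reverse=True
    )
    by_keyword = sorted(
        candidates, key=lambda d: keyword_score(query_text, d["text"]), reverse=True
    )

    # 3. Reciprocal rank fusion: each ranking contributes 1 / (rrf_k + rank).
    fused = {}
    for ranking in (by_vector, by_keyword):
        for rank, doc in enumerate(ranking, start=1):
            fused[doc["id"]] = fused.get(doc["id"], 0.0) + 1.0 / (rrf_k + rank)

    ranked = sorted(candidates, key=lambda d: fused[d["id"]], reverse=True)
    return [d["id"] for d in ranked[:k]]


if __name__ == "__main__":
    # Search for "red shoes" near the vector [1, 0], excluding video records.
    print(hybrid_search([1.0, 0.0], "red shoes", DOCS,
                        where=lambda d: d["kind"] != "video"))
```

Fusing ranks rather than raw scores avoids having to put cosine similarities and keyword counts on a common scale, which is why rank-based fusion is a common choice for this kind of sketch; a production engine would also replace the linear scans with vector and full-text indexes.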