1. Install LanceDB
Install LanceDB in your client SDK.
Python pre-release builds
To pick up the latest features and bug fixes before the next stable release, install a pre-release from LanceDB’s Fury index.
Pre-release builds receive the same level of testing as stable releases, but their availability is not guaranteed for more than 6 months after release. For real-world workloads, we recommend using the latest stable release wherever possible.
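For Python, the two install paths might look like the commands below. This is a sketch: the Fury index URL is written here as we understand it, so confirm it against LanceDB's installation docs before relying on it.

```shell
# Stable release from PyPI
pip install lancedb

# Pre-release build: --pre allows pre-release versions, and
# --extra-index-url adds LanceDB's Fury index (URL assumed, verify in the docs)
pip install --pre --extra-index-url https://pypi.fury.io/lancedb/ lancedb
```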
2. Connect to a LanceDB database
LanceDB supports several URI patterns for connecting to a database:
- A local filesystem path (when using LanceDB as an embedded library)
- A db://... URI (when using LanceDB Enterprise)
- An object storage URI: s3://..., gs://..., or az://... (when connecting directly from the client SDK)
Connect via local directory path
The simplest way to begin is to use LanceDB as an embedded library. Import LanceDB in your client SDK of choice and point to a local directory path.
Connect via object storage URIs
You can also connect directly to object storage from the client SDK. For credentials, endpoints, and provider-specific options, see Configuring storage.
Connect to LanceDB Enterprise
If you’re using LanceDB Enterprise, you can connect to the remote database using the db:// URI along with the API key, region, and cluster endpoint you received from the
LanceDB team. Pass the cluster endpoint via host_override so the client routes
requests to your deployment.
host_override is the full URL of your cluster endpoint, including the scheme
(https://) and a port if your deployment listens on a non-default one
(e.g. https://your-enterprise-endpoint.com:443). If you don’t have the
endpoint, contact the LanceDB team.
For RemoteTable semantics and how Enterprise differs operationally from embedded LanceDB, see the Enterprise overview.
3. Create a new table
Let’s create a small table of characters from the kingdom of Camelot. Each row stores source text, metadata, structured fields, and a vector embedding in the same LanceDB table.
The embeddings we use in this example are synthetic and for demonstration purposes only. In a real AI data workflow, you would generate them from text, images, audio, or video using an embedding model of choice.
4. Semantic search
Search is a useful capability for all kinds of AI data pipelines. In this step, we run a vector similarity search for samples similar to a “wise magical advisor” (transforming the natural language query into an embedding) and project only the columns needed by the next step.
Search (which requires random access) is a ubiquitous access pattern across workloads, whether you’re building a RAG or recommendation system, serving agent memory, or curating a training dataset.
In Python, query results can be converted to a Polars DataFrame. Depending on your language, you can collect query results as a list/array of objects or as DataFrames for use downstream in your application.
Pandas users in Python can get results as a Pandas DataFrame
Use the to_pandas() method to convert query results into a Pandas DataFrame.
5. Curation
Searching for relevant results can be more useful when combined with metadata filters. In this tiny example, we filter to examples with high magic stats.
When working with large datasets, it’s common to use the same pattern to filter on quality labels,
train/eval splits, numeric fields, categorical values, timestamp windows, or generated tags and labels.
6. Add a derived feature
Feature engineering is the process of cleaning up your data and creating new signals that help your model learn and make better predictions, or help your agent retrieve more useful information. In the example below, we add a power_score column from the structured stats fields.
Lance supports data evolution, so you can add new columns without rewriting the entire table.
Next, you can query a compact view of the new feature:
| name | role | power_score |
|---|---|---|
| King Arthur | King | 3.5 |
| Merlin | Wizard | 4.0 |
| Sir Lancelot | Knight | 3.0 |
7. Store multimodal data
Multimodal data is a first-class citizen in LanceDB. Binary data (image, audio, video, etc.) is stored as blobs or inline Arrow binary types in a LanceDB column, and it benefits from the same table operations and data versioning semantics as other data types. All the data is governed in the same table, so you can search, filter, and retrieve multimodal records together with structured fields, metadata, and embeddings. In this example, the lancedb/magical_kingdom dataset stores
character images, descriptions, structured stats, image embeddings, and text embeddings together.
Say we downloaded the image for Sir Lancelot from that dataset locally. You can read the image bytes
in your client SDK and store them in a LanceDB column. The image bytes can be used for downstream tasks
like retrieval, evaluation, or training.

image column:
For more examples, see the multimodal data section.
Code
See the full code for these examples (including helper functions) in the quickstart file for the appropriate client language, in the files provided in the repo.
What’s next?
You’ve learned how to install LanceDB, connect, create one table for AI data, retrieve related examples, curate with metadata, add a derived feature, and represent multimodal records. These same primitives apply across the AI data lifecycle, from data preparation and feature engineering to retrieval, evaluation, and training. Continue to the table and search guides to build on this example with schema options, appends, updates, versioning, indexing, full-text search, hybrid search, and reranking.
Basic table operations
Build on this quickstart with table creation, updates, and schema tips.
Build a RAG App
Learn how to build Retrieval-Augmented Generation (RAG) applications using LanceDB.
Indexing
Create vector, full-text, and scalar indexes to speed up queries on larger datasets.
Data loading and shuffles
Use LanceDB for projected, shuffled, random-access reads in training workflows.
