we use sqlite database to index books for orris right now but the data is growing. we use sqlite ecause of fts5 and bm25 ranking algorithm but i dont know if that's good for scale or not and we want to make agent to sql queries only and use bm25 algorithm only and all benifits of fts5 as well so tell me whether the current approach is good only or we should think of something else like postgres or mysql or managed sqlite but sqlite is i think designed for local processing only so i dont know if there are any managed service for that or not and are they any good. i dont know what to do?

Searching the Web

SQLite FTS5 BM25 scaling limitations vs PostgreSQL full text search 2024 2025

Searching the Web

managed SQLite services cloud Turso LiteFS 2024 production scale

Searching the Web

PostgreSQL full text search BM25 pg_bm25 ParadeDB vs SQLite FTS5 comparison

Good, I have solid data to give you a thorough, opinionated answer. Here it is:

Should You Stay on SQLite FTS5 or Move?

Short answer: SQLite FTS5 is genuinely good for your use case, but it has a specific ceiling you need to understand. Here is the honest breakdown of every option.

Why SQLite FTS5 is Actually Strong for This

SQLite FTS5 has a meaningful advantage over vanilla PostgreSQL for your exact use case:
  • Real BM25 out of the box. PostgreSQL's native FTS does NOT implement BM25. It cannot, because it does not maintain corpus-wide term frequency statistics. SQLite FTS5 does maintain those statistics, so its BM25 scores are legitimate relevance scores. This is a real difference, not a minor one.
  • Fast for read-heavy workloads. At small-to-medium scale, SQLite FTS5 is measurably faster than PostgreSQL FTS - there are benchmarks showing SQLite at 4ms vs PostgreSQL at 32ms for the same FTS query on the same data.
  • Zero infrastructure. One file, no server process, no connection pooling, no network overhead.
  • Agent-to-SQL queries work perfectly. FTS5 is standard SQL with MATCH, rank, bm25(), highlight(), and snippet() - all callable from plain SQL, which is exactly what an agent needs.

Where SQLite Hits a Wall

ProblemDetails
Single writerOnly one write at a time. If you ever need concurrent indexing, you will get lock contention.
File-based = local onlyBy design, it is a library not a server. Multiple machines cannot share one SQLite file directly.
No built-in replicationIf your indexing process and your query process are on different machines, you need external tooling.
No horizontal read scalingYou cannot add read replicas natively.
Large DB file performanceFTS5 can start degrading at very high row counts (tens of millions of paragraphs), though for a book library this is unlikely to be a problem anytime soon.
The key question is: what does "data is growing" mean for you? More books added regularly (write side)? More agents querying simultaneously (read side)? Or both?

Your Real Options

1. Stay on SQLite + Managed Service (Turso)

Turso is a managed SQLite-compatible platform built on libSQL (a fork of SQLite). It is the most relevant option for your case.
  • Fully managed, hosted on AWS globally
  • Supports FTS5, BM25, all SQLite extensions
  • Embedded replicas: your agent gets a local SQLite file that syncs from the Turso primary - sub-5ms reads
  • Branching, point-in-time recovery, encryption
  • Works with plain SQL - agent queries stay unchanged
  • Handles read scaling via edge replicas
Cloudflare D1 is another option if you are on Cloudflare's stack. It is managed SQLite with global replication, reached GA in April 2024. FTS5 is supported.
Verdict: If you want the least migration work and to keep BM25 + FTS5 as-is, Turso is the strongest move. Your agent SQL queries need zero changes.

2. PostgreSQL (native FTS)

Do not use this if BM25 matters to you. PostgreSQL's built-in FTS (tsvector/tsquery) does not implement BM25. It uses a simpler ranking function (ts_rank) that lacks corpus statistics. You would be trading away the primary algorithmic benefit of your current setup.
PostgreSQL FTS also degrades on tables with tens of millions of rows. Supabase and ParadeDB both document this.
Verdict: Wrong choice if BM25 is a requirement.

3. PostgreSQL + ParadeDB / pg_search

ParadeDB's pg_search extension embeds Tantivy (a Rust-based Lucene alternative) inside PostgreSQL. It adds a proper BM25 index type, fuzzy search, relevance tuning, and faceting - all queryable via SQL.
  • You get real BM25 inside a proper client-server database
  • Full PostgreSQL ecosystem: ACID, MVCC, concurrent writes, horizontal read replicas via standard Postgres tooling
  • SQL interface - agent queries work
  • ParadeDB is available as a managed service or self-hosted
Verdict: Best path if you need multi-writer concurrency or your data grows to truly large scale (hundreds of millions of rows). Higher operational complexity than Turso/SQLite.

4. MySQL

MySQL's FTS does not implement BM25 either. Skip it for this use case.

5. Dedicated Search Engine (Typesense, Meilisearch, OpenSearch)

These are excellent at search but they are not SQL databases. Your agent would need a non-SQL API to query them, which breaks the "agent queries SQL only" requirement unless you add a translation layer. Not recommended given your constraints.

What I Would Actually Recommend

For now: Stay on SQLite FTS5. It is not broken. Your BM25 scores are real, your SQL queries work, and the setup is simple.
For managed/hosted: Move to Turso when you need replication or multi-machine access. It is SQLite-compatible, your existing schema and queries move over as-is, and you gain replication + managed backups. LiteFS Cloud was shut down (October 2024), so Turso is the clearest managed SQLite path today.
If you hit write contention or truly massive scale: Evaluate ParadeDB on PostgreSQL. It is the only PostgreSQL option that gives you proper BM25, and it keeps the SQL interface your agent depends on.
Summary table:
OptionBM25SQL AgentManagedWrite ScaleNotes
SQLite FTS5 (current)YesYesNoSingle writerGood, free, simple
TursoYesYesYesSingle primaryBest migration path
Cloudflare D1YesYesYesSingle primaryIf on CF stack
PostgreSQL native FTSNoYesYesFullWrong choice for BM25
ParadeDB (pg_search)YesYesYesFullBest at large scale
MySQL FTSNoYesYesFullSkip
Typesense/MeilisearchYesNoYesFullBreaks SQL requirement
This is a shared conversation. Sign in to Orris to start your own chat.