Engineering
How Euterria works under the hood.
Euterria isn't a directory of profiles. It's a system for turning the documents a community already has into a typed, reviewable, searchable graph, then ranking across that graph using structured metadata, document-level evidence, and semantic relevance. Here's how that works, end to end.
Start with the shape of the problem
Almost everything a climate community knows is already written down somewhere: in grant reports, program one-pagers, decks, and the occasional sixty-page toolkit. The trouble is that this knowledge is scattered across files and inboxes, and it is deeply relational. A program belongs to an organization. A person has skills and sits on a coalition. A report references three partners and a funding source.
Keyword search over a folder of PDFs misses all of that. It can match a filename; it can't tell you who else is working on urban heat, or which toolkit a buried paragraph belongs to. Euterria's job is to give that knowledge a structure first, and then make the entire structure searchable as a single thing.
A directory tells you who exists. Euterria is built to tell you who can help.
A typed graph, rather than a pile of docs
The foundation is a small set of connected entity types. Organizations, people, programs, resources, networks, and partnerships each have their own schema of named fields that describe what they are and how they relate to everything around them.
Long documents don't just sit on a single record. They are split into sections, each given a short, grounded overview, and every entity carries a vector embedding so it can be found by meaning rather than exact wording. The graph, not the file, is the unit Euterria reasons about.
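Here is a minimal TypeScript sketch of a typed graph. It assumes each entity has an id, a type, and a set of named fields, and that relationships are stored as typed edges. The `Entity`, `Edge`, and `EdgeKind` types and the `Graph` class are hypothetical illustrations. They are not Euterria's actual schema, which keeps content and relationships in Supabase.

```typescript
// Hypothetical entity types mirroring the six described above.
type EntityType =
  | "organization"
  | "person"
  | "program"
  | "resource"
  | "network"
  | "partnership";

interface Entity {
  id: string;
  type: EntityType;
  name: string;
  fields: Record<string, string | string[]>; // the type's named fields
  embedding?: number[]; // vector used for search by meaning
}

// Hypothetical relationship kinds, e.g. a program belongs to an organization.
type EdgeKind = "belongs_to" | "member_of" | "references" | "funded_by";

interface Edge {
  from: string;
  to: string;
  kind: EdgeKind;
}

class Graph {
  private entities = new Map<string, Entity>();
  private edges: Edge[] = [];

  addEntity(entity: Entity): void {
    this.entities.set(entity.id, entity);
  }

  addEdge(edge: Edge): void {
    // Refuse edges whose endpoints are not in the graph.
    if (!this.entities.has(edge.from) || !this.entities.has(edge.to)) {
      throw new Error(`unknown endpoint in edge ${edge.from} -> ${edge.to}`);
    }
    this.edges.push(edge);
  }

  // Entities linked to `id` in either direction, optionally via one edge kind only.
  neighbors(id: string, kind?: EdgeKind): Entity[] {
    const result: Entity[] = [];
    for (const edge of this.edges) {
      if (kind !== undefined && edge.kind !== kind) continue;
      const other = edge.from === id ? edge.to : edge.to === id ? edge.from : undefined;
      if (other === undefined) continue;
      const entity = this.entities.get(other);
      if (entity) result.push(entity);
    }
    return result;
  }
}
```

Storing relationships as explicit edges turns a question like “which programs belong to this organization?” into a lookup rather than a text match.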
Getting knowledge in
Two front doors lead into the graph. Upload a file and it is parsed into clean markdown. Point Euterria at a website and it maps the site and scrapes the pages that matter. Either way, a model then reads the content and extracts it into strict, named fields, never a free-form paragraph.
Figure: structured record.
The two sources fill in different shapes. A document yields a title, description, type, issue areas, geographic scope, key contacts, contributors, languages, sections, a grounded overview, and any program it describes. A website yields an organization's mission, constituency, contact details, programs, staff, partners, and networks, each tied back to the page it came from.
This is the structure layer. Because every extraction is constrained to a schema, the output is a reviewable record with named fields rather than free text. That constraint makes the knowledge auditable going in and searchable coming out.
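One way to read “constrained to a schema” is sketched below. It assumes the model is asked to return JSON and that its output is checked before it becomes a draft. The field names are a hypothetical subset of the document fields listed above, and `DOCUMENT_SCHEMA` and `parseDraft` are illustrative names, not Euterria's code. The check rejects unknown keys as well as missing or mistyped ones, so a free-form summary can't slip in as an extra field.

```typescript
// Hypothetical subset of the document schema described above.
interface DocumentDraft {
  title: string;
  description: string;
  type: string;
  issueAreas: string[];
  geographicScope: string;
  languages: string[];
}

type FieldKind = "string" | "string[]";

// One entry per named field; anything outside this map is rejected.
const DOCUMENT_SCHEMA: Record<keyof DocumentDraft, FieldKind> = {
  title: "string",
  description: "string",
  type: "string",
  issueAreas: "string[]",
  geographicScope: "string",
  languages: "string[]",
};

function matchesKind(value: unknown, kind: FieldKind): boolean {
  if (kind === "string") return typeof value === "string";
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

// Parse raw model output and accept it only if it fits the schema exactly.
function parseDraft(raw: string): { draft?: DocumentDraft; errors: string[] } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (_err) {
    return { errors: ["output is not valid JSON"] };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { errors: ["output is not a JSON object"] };
  }
  const record = parsed as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of Object.keys(record)) {
    // Own-property check so inherited names like "toString" are not treated as fields.
    if (!Object.prototype.hasOwnProperty.call(DOCUMENT_SCHEMA, key)) {
      errors.push(`unexpected field: ${key}`);
    }
  }
  for (const [key, kind] of Object.entries(DOCUMENT_SCHEMA)) {
    if (!matchesKind(record[key], kind)) errors.push(`missing or mistyped field: ${key}`);
  }
  return errors.length === 0 ? { draft: record as unknown as DocumentDraft, errors } : { errors };
}
```

Hosted structured-output features can also enforce a schema at generation time. A check like this still gives the review step a precise list of what went wrong.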
Nothing publishes itself
Extraction produces a draft. You review and edit it, correcting fields, linking or creating the programs it mentions, and confirming contacts. Only when you publish does Euterria write the final records and their relationships, and trigger the embedding and index sync that makes them findable.
A record can't go live until it passes every publish check.
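The publish gate can be sketched as a list of checks that must all pass before anything is written. The `PublishDraft` shape, the specific checks, and the `publish` helper are hypothetical. They mirror the review steps described above: filled-in fields, linked or created programs, and confirmed contacts. The callback stands in for writing the final records and triggering embedding and index sync.

```typescript
// Hypothetical draft shape: only the fields these checks need.
interface PublishDraft {
  title: string;
  description: string;
  contacts: { name: string; confirmed: boolean }[];
  mentionedPrograms: { name: string; linkedId?: string }[];
}

// Each check returns a problem message, or null when the draft passes it.
type Check = (d: PublishDraft) => string | null;

const PUBLISH_CHECKS: Check[] = [
  (d) => (d.title.trim() ? null : "title is empty"),
  (d) => (d.description.trim() ? null : "description is empty"),
  (d) => (d.contacts.every((c) => c.confirmed) ? null : "some contacts are unconfirmed"),
  (d) => (d.mentionedPrograms.every((p) => p.linkedId) ? null : "some programs are not linked or created"),
];

// Collect every failing check so the reviewer sees all problems at once.
function publishBlockers(d: PublishDraft): string[] {
  return PUBLISH_CHECKS.map((check) => check(d)).filter((m): m is string => m !== null);
}

// All-or-nothing: any failing check blocks the write entirely.
function publish(d: PublishDraft, writeRecords: (d: PublishDraft) => void): void {
  const blockers = publishBlockers(d);
  if (blockers.length > 0) throw new Error(`cannot publish: ${blockers.join("; ")}`);
  writeRecords(d); // stand-in for writing records and triggering embedding/index sync
}
```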
Finding it again
Once the graph exists, the hard part is search. People don't ask database-table questions; they ask ecosystem questions. “Who's working on urban heat in Oakland?” should return a person, a program, an organization, and a report in one ranked list, even though those live in completely different indices.
Search is Algolia-first and Supabase-backed. Supabase stays the source of truth, while live retrieval runs through Algolia and then a layer of application-side ranking. Two retrieval paths run at once, keyword matching and semantic similarity, and their candidates blend into a single pool before anything is ranked.
Figure: keyword retrieval (Algolia) and semantic retrieval (vectors). One question fans out to keyword and semantic retrieval at once. The candidates, across every entity type, blend into a single pool, and a reranker decides the final order by meaning. That is hybrid search: keyword and semantic retrieval combined, then reranked.
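The blending step might look like the minimal sketch below. It assumes each retrieval path returns candidates with an id that is unique across entity types, plus that path's own score. `Candidate`, `Retriever`, `blendPools`, and `hybridRetrieve` are hypothetical names, and the two retriever functions stand in for the Algolia query and the vector-similarity query. Both paths run concurrently, and a candidate found by both keeps both scores for the later ranking stages.

```typescript
interface Candidate {
  id: string; // assumed unique across entity types, e.g. "program:42"
  type: string;
  keywordScore?: number; // from the keyword (Algolia) path
  semanticScore?: number; // e.g. cosine similarity from the vector path
}

// Hypothetical retriever signature for either path.
type Retriever = (query: string) => Promise<Candidate[]>;

// Merge both lists into one pool keyed by id. A candidate found by both
// paths keeps both scores so later stages can use either signal.
function blendPools(keyword: Candidate[], semantic: Candidate[]): Candidate[] {
  const pool = new Map<string, Candidate>();
  for (const c of keyword.concat(semantic)) {
    const existing = pool.get(c.id);
    pool.set(c.id, {
      id: c.id,
      type: c.type,
      keywordScore: c.keywordScore ?? existing?.keywordScore,
      semanticScore: c.semanticScore ?? existing?.semanticScore,
    });
  }
  return Array.from(pool.values());
}

// Run both retrieval paths concurrently, then blend before any ranking.
async function hybridRetrieve(
  query: string,
  keywordSearch: Retriever,
  vectorSearch: Retriever,
): Promise<Candidate[]> {
  const [keyword, semantic] = await Promise.all([keywordSearch(query), vectorSearch(query)]);
  return blendPools(keyword, semantic);
}
```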
Before any of that, the query itself is read. A natural-language question is parsed into structured intent: a type, a topic, a city, and the raw keywords. Short queries skip the model entirely and stay literal, so a quick lookup is never over-thought.
Figure: parsing “Who is working on urban heat islands in Oakland?” into structured intent.
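Here is a sketch of the short-query shortcut, assuming a simple word-count cutoff. The three-word threshold, the `IntentParser` type, and `understandQuery` are all hypothetical. The model call is passed in as a function, so the sketch does not depend on any particular OpenAI or OpenRouter client.

```typescript
interface QueryIntent {
  type?: string; // e.g. "person", "program", "organization"
  topic?: string;
  city?: string;
  keywords: string[]; // the raw query words, always kept
}

// Hypothetical model call that extracts type, topic, and city from a question.
type IntentParser = (query: string) => Promise<Omit<QueryIntent, "keywords">>;

// Hypothetical cutoff: queries this short are treated as literal lookups.
const SHORT_QUERY_MAX_WORDS = 3;

function tokenize(query: string): string[] {
  return query
    .toLowerCase()
    .replace(/[?!.,;:"]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

function isShortQuery(query: string): boolean {
  return tokenize(query).length <= SHORT_QUERY_MAX_WORDS;
}

async function understandQuery(query: string, parseWithModel: IntentParser): Promise<QueryIntent> {
  const keywords = tokenize(query);
  // Short query: skip the model entirely and search the words as typed.
  if (keywords.length <= SHORT_QUERY_MAX_WORDS) return { keywords };
  // Longer question: let the model fill in structure, keeping the raw keywords too.
  const parsed = await parseWithModel(query);
  return { ...parsed, keywords };
}
```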
Retrieval also reaches inside documents. A long report isn't only searchable by its title and description; its body is split into chunks and indexed separately. A single passage buried deep in a toolkit can surface and lift the resource it belongs to.
Figure: passage-level relevance to “urban heat” inside the Climate Resilience Toolkit.
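Chunking can be as simple as overlapping word windows. Each window is tagged with the id of its parent resource, so a matching passage can be traced back to the document it came from. This is a common approach, not necessarily Euterria's; splitting on the section boundaries described earlier would also work. `chunkDocument` and its default window and overlap sizes are hypothetical.

```typescript
interface Chunk {
  parentId: string; // the resource this passage belongs to
  index: number;
  text: string;
}

// Split a long body into overlapping word windows. The default size and
// overlap are hypothetical, not Euterria's settings.
function chunkDocument(parentId: string, body: string, size = 200, overlap = 40): Chunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("need size > 0 and 0 <= overlap < size");
  }
  const words = body.split(/\s+/).filter((w) => w.length > 0);
  const chunks: Chunk[] = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push({ parentId, index: chunks.length, text: words.slice(start, start + size).join(" ") });
    if (start + size >= words.length) break; // this window already reaches the end
  }
  return chunks;
}
```

Each chunk is then indexed on its own. A hit on any chunk carries `parentId`, which is what lets a buried paragraph lift its parent resource.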
Deciding what's relevant
Relevance isn't a single score. Candidates pass through a stack of signals, each doing a specific job: fast retrieval narrows the field, structured filters and heuristics shape it, document evidence reinforces it, and a semantic reranker has the final say on order.
1. Algolia retrieval: fast first-stage candidates across every index.
2. Hard filters: constraints the query implies, like a city, a type, or an issue area.
3. Filter boosts: soft preferences that nudge relevant matches upward.
4. Heuristic scoring: application-side signals about field matches and completeness.
5. Chunk evidence: body passages from long documents boost their parent resource.
6. Semantic reranking: a reranking model reorders the shortlist by meaning.
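Stages 2 through 5 can be sketched as one scoring pass over the first-stage candidates. Everything here is an assumption for illustration: the `RankInput` and `Intent` shapes, the choice of type and city as hard filters with issue area as a soft boost, and the weight constants. `baseScore` stands in for stage 1, and stage 6 would then reorder the resulting shortlist.

```typescript
interface RankInput {
  id: string;
  type: string;
  city?: string;
  issueAreas: string[];
  baseScore: number; // first-stage retrieval score
  filledFields: number; // how many schema fields are populated
  totalFields: number;
}

interface Intent {
  type?: string;
  city?: string;
  issueArea?: string;
}

// Hypothetical weights; a real system would tune these against test queries.
const ISSUE_AREA_BOOST = 0.2;
const COMPLETENESS_WEIGHT = 0.1;
const CHUNK_WEIGHT = 0.3;

function rankCandidates(
  candidates: RankInput[],
  intent: Intent,
  chunkHits: Map<string, number>, // parent id -> best chunk relevance in [0, 1]
): RankInput[] {
  return candidates
    // Stage 2: hard filters drop anything that contradicts the query.
    .filter((c) => !intent.type || c.type === intent.type)
    .filter((c) => !intent.city || c.city === intent.city)
    .map((c) => {
      let score = c.baseScore;
      // Stage 3: soft boost for matching the requested issue area.
      if (intent.issueArea && c.issueAreas.includes(intent.issueArea)) score += ISSUE_AREA_BOOST;
      // Stage 4: heuristic favouring more complete records.
      score += COMPLETENESS_WEIGHT * (c.filledFields / Math.max(1, c.totalFields));
      // Stage 5: a strong passage inside a document lifts its parent resource.
      score += CHUNK_WEIGHT * (chunkHits.get(c.id) ?? 0);
      return { c, score };
    })
    .sort((a, b) => b.score - a.score)
    .map(({ c }) => c);
}
```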
That last step matters most. Each finalist is turned into a compact semantic document — its title, type, organization details, metadata, description, and resource snippets — and a reranking model reorders the shortlist by what each result actually means, placing the most relevant answer first regardless of its keyword density.
Figure: the keyword shortlist, reordered by the semantic reranker. Keyword scores get candidates in the door; the reranker reads what each one actually means and decides the final order, so the most relevant person, program, or org wins, whatever its type.
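Here is a sketch of the final stage. It assumes the reranker takes a query and a list of text documents and returns one relevance score per document. The `Finalist` shape, the `toSemanticDocument` layout, and the `Reranker` type are hypothetical. The real call would go to a hosted reranking model (the stack below lists Voyage), and that model's actual API is not reproduced here.

```typescript
interface Finalist {
  id: string;
  title: string;
  type: string;
  organization?: { name: string; mission?: string };
  metadata: Record<string, string>;
  description: string;
  snippets: string[]; // matching passages from long resources
}

// Flatten a finalist into the compact text a reranking model reads.
// Field order and the snippet cap are hypothetical choices.
function toSemanticDocument(f: Finalist, maxSnippets = 2): string {
  const lines = [`Title: ${f.title}`, `Type: ${f.type}`];
  if (f.organization) {
    const mission = f.organization.mission ? ` (${f.organization.mission})` : "";
    lines.push(`Organization: ${f.organization.name}${mission}`);
  }
  for (const [key, value] of Object.entries(f.metadata)) lines.push(`${key}: ${value}`);
  lines.push(`Description: ${f.description}`);
  for (const s of f.snippets.slice(0, maxSnippets)) lines.push(`Excerpt: ${s}`);
  return lines.join("\n");
}

// Hypothetical reranker interface: one relevance score per document, same order.
type Reranker = (query: string, documents: string[]) => Promise<number[]>;

async function rerank(query: string, finalists: Finalist[], reranker: Reranker): Promise<Finalist[]> {
  const scores = await reranker(query, finalists.map((f) => toSemanticDocument(f)));
  if (scores.length !== finalists.length) throw new Error("reranker returned wrong number of scores");
  // Order purely by the reranker's scores; keyword scores no longer matter here.
  return finalists
    .map((f, i) => ({ f, score: scores[i] }))
    .sort((a, b) => b.score - a.score)
    .map(({ f }) => f);
}
```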
The embeddings behind all of this are deliberately curated. They are built from intentionally chosen text rather than raw database rows: an organization leads with what it needs and offers, a program with its goals and the populations it serves, a person with their skills and role.
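Curated embedding text might be built from a per-type template that puts the most useful fields first. The `EmbeddableEntity` shapes and the `embeddingText` templates are hypothetical and based only on the examples in the paragraph above. The resulting string, rather than the raw row, would be passed to the embedding model.

```typescript
// Hypothetical shapes: only the fields the embedding text draws on.
type EmbeddableEntity =
  | { kind: "organization"; name: string; needs: string[]; offers: string[]; mission: string }
  | { kind: "program"; name: string; goals: string[]; populationsServed: string[]; description: string }
  | { kind: "person"; name: string; skills: string[]; role: string };

// Lead with what matters most for each type instead of serializing the whole row.
function embeddingText(e: EmbeddableEntity): string {
  switch (e.kind) {
    case "organization":
      return [`Needs: ${e.needs.join(", ")}`, `Offers: ${e.offers.join(", ")}`, `${e.name}. ${e.mission}`].join("\n");
    case "program":
      return [`Goals: ${e.goals.join(", ")}`, `Serves: ${e.populationsServed.join(", ")}`, `${e.name}. ${e.description}`].join("\n");
    case "person":
      return [`Skills: ${e.skills.join(", ")}`, `Role: ${e.role}`, e.name].join("\n");
  }
}
```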
Grounded, governed, private
Three guarantees run underneath everything. Extraction is grounded in your source material, so records reflect what the documents actually say. Publishing is gated by explicit checks, so incomplete records never go live. And row-level security governs who can read what.
The stack that runs it
Supabase: source of truth for all content and relationships.
Algolia: first-stage retrieval across every entity index.
Voyage: semantic reranking and stored embeddings.
OpenAI / OpenRouter: structured extraction and query understanding.
LlamaParse: turns uploaded documents into clean markdown.
Firecrawl: maps and scrapes organization websites.
Vercel: runs the Next.js application.
Infisical: manages production secrets.
What this makes possible
All of this exists to answer the questions a flat directory can't:
Who else is working on this?
Surface the organizations, people, and programs already tackling your issue.
What already exists?
Find reports, toolkits, and resources you can reuse instead of rebuilding.
Who can help?
Locate the person with the skill, the partner, or the org that has done it before.
What can we build on?
See the networks and partnerships that connect the ecosystem together.
Each one is a question about the edges of the graph: who connects to whom, and what builds on what. Answering it well is the entire point.