Engineering

How Euterria works under the hood.

Euterria isn't a directory of profiles. It's a system for turning the documents a community already has into a typed, reviewable, searchable graph, then ranking across that graph using structured metadata, document-level evidence, and semantic relevance. Here's how that works, end to end.

By the Engineering Team · ~9 min read

Start with the shape of the problem

Almost everything a climate community knows is already written down somewhere: in annual reports, program one-pagers, decks, and the occasional sixty-page toolkit. The trouble is that this knowledge is scattered across files and inboxes, and it is deeply relational. A program belongs to an organization. A person has skills and sits on a coalition. A report references three partners and a funding source.

Keyword search over a folder of PDFs misses all of that. It can match a filename or an exact phrase; it can't tell you who else is working on urban heat, or which toolkit a buried paragraph belongs to. Euterria's job is to give that knowledge a structure first, and then make the entire structure searchable as a single thing.

A directory tells you who exists. Euterria is built to tell you who can help.

A typed graph, rather than a pile of docs

The foundation is a small set of connected entity types. Organizations, people, programs, resources, networks, and partnerships each have their own schema of named fields that describe what they are and how they relate to everything around them.

Six entity types, connected. Most real questions are really questions about the edges between them.

Long documents don't just sit on a single record. They are split into sections, each given a short, grounded overview, and every entity carries a vector embedding so it can be found by meaning rather than exact wording. The graph, not the file, is the unit Euterria reasons about.
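As a rough illustration of that shape, here is a minimal sketch of a typed graph. It is not Euterria's actual schema: the class names, the `attrs` field, the relation labels, the ids, and the `neighbors` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntityType(Enum):
    ORGANIZATION = "organization"
    PERSON = "person"
    PROGRAM = "program"
    RESOURCE = "resource"
    NETWORK = "network"
    PARTNERSHIP = "partnership"


@dataclass
class Entity:
    id: str
    type: EntityType
    attrs: dict[str, str]                                   # named fields from the type's schema
    embedding: list[float] = field(default_factory=list)    # empty until one is computed


@dataclass(frozen=True)
class Edge:
    source_id: str
    relation: str      # hypothetical labels such as "belongs_to"
    target_id: str


def neighbors(edges: list[Edge], entity_id: str) -> list[tuple[str, str]]:
    """Return (relation, other_id) for every edge that touches entity_id."""
    out = []
    for e in edges:
        if e.source_id == entity_id:
            out.append((e.relation, e.target_id))
        elif e.target_id == entity_id:
            out.append((e.relation, e.source_id))
    return out


# Illustrative ids and relations only.
program = Entity("prog-canopy", EntityType.PROGRAM, {"title": "Urban Canopy Equity Program"})
edges = [
    Edge("prog-canopy", "belongs_to", "org-greenline"),
    Edge("person-maya", "contact_for", "prog-canopy"),
]
print(neighbors(edges, program.id))
```

Keeping edges as first-class records, rather than burying them in free text, is what lets a question about "who else" become a lookup over relationships.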

Getting knowledge in

Two front doors lead into the graph. Upload a file and it is parsed into clean markdown. Point Euterria at a website and it maps the site and scrapes the pages that matter. Either way, a model then reads the content and extracts it into strict, named fields, never a free-form paragraph.

GreenLine_2025_Report.pdf, extracted into a structured record:

Title: Urban Canopy Equity Program
Type: Program
Issue areas: Urban heat, Tree equity, Air quality
Key contact: Dr. Maya Okonkwo
Timeframe: 2024 – 2026

An annual report becomes a structured record: title, type, issue areas, contacts, and dates, each one a field you can review.

The two sources fill in different shapes. A document yields a title, description, type, issue areas, geographic scope, key contacts, contributors, languages, sections, a grounded overview, and any program it describes. A website yields an organization's mission, constituency, contact details, programs, staff, partners, and networks, each tied back to the page it came from.

The structure layer. Because every extraction is constrained to a schema, the output is a reviewable record with named fields. That constraint is what makes the knowledge auditable going in and searchable coming out.
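One way to picture that constraint is a parser that refuses anything outside the schema. This is a sketch under assumptions: it supposes the model returns a JSON object, and the `ResourceDraft` fields and the `parse_extraction` helper are hypothetical names that only mirror the example record above.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ResourceDraft:
    title: str
    type: str
    issue_areas: list[str]
    key_contacts: list[str]
    timeframe: str


def parse_extraction(raw_json: str) -> ResourceDraft:
    """Turn model output into a draft, rejecting anything that is off-schema."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(ResourceDraft)}
    unknown = set(data) - expected
    missing = expected - set(data)
    if unknown or missing:
        raise ValueError(
            f"schema mismatch: unknown={sorted(unknown)} missing={sorted(missing)}"
        )
    return ResourceDraft(**data)


raw = json.dumps({
    "title": "Urban Canopy Equity Program",
    "type": "Program",
    "issue_areas": ["Urban heat", "Tree equity", "Air quality"],
    "key_contacts": ["Dr. Maya Okonkwo"],
    "timeframe": "2024 – 2026",
})
draft = parse_extraction(raw)
print(draft.title)
```

A free-form `summary` key, or a missing `title`, fails loudly here instead of slipping into the record, which is the property that keeps the draft reviewable field by field.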

Nothing publishes itself

Extraction produces a draft. You review and edit it, correcting fields, linking or creating the programs it mentions, and confirming contacts. Only when you publish does Euterria write the final records and their relationships, and trigger the embedding and index sync that makes them findable.

A record can't go live until it passes six checks: has a title, has a description, has an overview, has a publishable type, has a file or external link, and has a valid program choice.
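A publish gate like this can be sketched as a function that returns the checks still failing. The `Draft` fields and the `publish_blockers` name are hypothetical, and the reading of "valid program choice" (no program selected, or one that exists) is an assumption rather than the product's documented rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Draft:
    title: str = ""
    description: str = ""
    overview: str = ""
    type: str = ""
    file_path: Optional[str] = None
    external_link: Optional[str] = None
    program_id: Optional[str] = None


def publish_blockers(
    draft: Draft, publishable_types: set[str], known_program_ids: set[str]
) -> list[str]:
    """Return the names of the checks that still fail; an empty list means publishable."""
    checks = [
        ("Has a title", bool(draft.title.strip())),
        ("Has a description", bool(draft.description.strip())),
        ("Has an overview", bool(draft.overview.strip())),
        ("Publishable type", draft.type in publishable_types),
        ("File or external link", bool(draft.file_path or draft.external_link)),
        # Assumption: choosing no program is valid; a chosen program must exist.
        (
            "Valid program choice",
            draft.program_id is None or draft.program_id in known_program_ids,
        ),
    ]
    return [name for name, ok in checks if not ok]
```

Returning the list of failures, rather than a single boolean, lets a review screen show exactly which fields still need attention.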

Finding it again

Once the graph exists, the hard part is search. People don't ask database-table questions; they ask ecosystem questions. “Who's working on urban heat in Oakland?” should return a person, a program, an organization, and a report in one ranked list, even though those live in completely different indices.

Search is Algolia-first and Supabase-backed. Supabase stays the source of truth, while live retrieval runs through Algolia and then a layer of application-side ranking. Two retrieval paths run at once, keyword matching and semantic similarity, and their candidates blend into a single pool before anything is ranked.

Query: “Who's working on urban heat in Oakland?”

Hybrid retrieval
Keyword · Algolia: Heat & Health Report, Oakland Shade Plan
Semantic · vectors: Dr. Maya Okonkwo, GreenLine Collaborative, “…tree-canopy gaps in flatland heat”

Blend & rerank (reranking by meaning): Heat & Health Report, Oakland Shade Plan, Dr. Maya Okonkwo, GreenLine Collaborative, “…tree-canopy gaps in flatland heat”

Ranked answer
1. Dr. Maya Okonkwo
2. Oakland Shade Plan
3. GreenLine Collaborative

Legend: Organization · Person · Program · Resource · Doc chunk

Hybrid retrieval: one question fans out to keyword and semantic retrieval at once. The candidates, across every entity type, blend into a single pool, and a reranker decides the final order by meaning. That's retrieval-augmented, hybrid search.
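The blending step can be pictured as a de-duplicating merge that remembers which path found each candidate. This is a minimal sketch, not the production code: the `Candidate` shape, the `blend` function, and the ids in the usage lines are invented for illustration, and the two hit lists stand in for real Algolia and vector results.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    id: str
    entity_type: str
    sources: set[str] = field(default_factory=set)   # which retrieval paths found it


def blend(
    keyword_hits: list[tuple[str, str]],
    semantic_hits: list[tuple[str, str]],
) -> list[Candidate]:
    """Merge (id, entity_type) hits from both paths into one de-duplicated pool."""
    pool: dict[str, Candidate] = {}
    for source, hits in (("keyword", keyword_hits), ("semantic", semantic_hits)):
        for hit_id, entity_type in hits:
            candidate = pool.setdefault(hit_id, Candidate(hit_id, entity_type))
            candidate.sources.add(source)
    return list(pool.values())


# Illustrative ids and type labels only.
pool = blend(
    keyword_hits=[("res-1", "resource"), ("prog-1", "program")],
    semantic_hits=[("person-1", "person"), ("prog-1", "program")],
)
for c in pool:
    print(c.id, c.entity_type, sorted(c.sources))
```

No ordering is decided here. The pool only records what was found and by which path, and the scoring and reranking stages decide the order later.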

Before any of that, the query itself is read. A natural-language question is parsed into structured intent: a type, a topic, a city, and the raw keywords. Short queries skip the model entirely and stay literal, so a quick lookup is never over-thought.

“Who is working on urban heat islands in Oakland?”

type: person · topic: urban heat islands · city: Oakland · keywords
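The short-query bypass is easy to sketch. Everything named here is an assumption: the `QueryIntent` shape, the `understand` function, and especially the two-word threshold, which is a placeholder rather than a documented value. The model call is passed in as a function so the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QueryIntent:
    keywords: list[str]
    type: Optional[str] = None
    topic: Optional[str] = None
    city: Optional[str] = None


SHORT_QUERY_MAX_WORDS = 2   # assumed threshold, for illustration only


def understand(
    query: str, parse_with_model: Callable[[str], QueryIntent]
) -> QueryIntent:
    """Keep short queries literal; send longer ones to the model-backed parser."""
    words = query.split()
    if len(words) <= SHORT_QUERY_MAX_WORDS:
        return QueryIntent(keywords=words)    # no model call at all
    return parse_with_model(query)
```

With this split, a lookup like "GreenLine" costs no model latency, while a full question still gets a type, a topic, and a city pulled out of it.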

Retrieval also reaches inside documents. A long report isn't only searchable by its title and description; its body is split into chunks and indexed separately. A single passage buried deep in a toolkit can surface and lift the resource it belongs to.

Climate Resilience Toolkit, split into chunks:
Cooling centers now serve 12k residents…
Outreach partners across three CBOs…
Tree-canopy gaps mapped across flatland heat zones…
Budget, grant acknowledgements, and credits…
Appendix: survey methodology and sources…

+ evidence for the resource Climate Resilience Toolkit: relevance to “urban heat”

Evidence from inside a document. One relevant passage raises the relevance of its parent resource.
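A simple way to express that lift is to add a weighted share of each resource's best chunk score to the resource itself. This is a sketch under assumptions: the function name, the `weight` of 0.5, and the choice of the best chunk rather than a sum are illustrative, not Euterria's actual formula.

```python
def apply_chunk_evidence(
    resource_scores: dict[str, float],
    chunk_hits: list[tuple[str, float]],
    weight: float = 0.5,
) -> dict[str, float]:
    """Raise each parent resource's score by its best-matching chunk.

    chunk_hits holds (parent_resource_id, chunk_score) pairs.
    """
    best: dict[str, float] = {}
    for parent_id, score in chunk_hits:
        best[parent_id] = max(score, best.get(parent_id, 0.0))

    lifted = dict(resource_scores)            # leave the input untouched
    for parent_id, score in best.items():
        lifted[parent_id] = lifted.get(parent_id, 0.0) + weight * score
    return lifted
```

Using the best chunk instead of the total keeps a very long document from outranking a short one simply because it has more passages.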

Deciding what's relevant

Relevance isn't a single score. Candidates pass through a stack of signals, each doing a specific job: fast retrieval narrows the field, structured filters and heuristics shape it, document evidence reinforces it, and a semantic reranker has the final say on order.

1. Algolia retrieval. Fast first-stage candidates across every index.
2. Hard filters. Constraints the query implies, like a city, a type, or an issue area.
3. Filter boosts. Soft preferences that nudge relevant matches upward.
4. Heuristic scoring. Application-side signals about field matches and completeness.
5. Chunk evidence. Body passages from long documents boost their parent resource.
6. Semantic reranking. A reranking model reorders the shortlist by true meaning.
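Stages 2 through 4 of the stack above can be sketched in a few lines. The weights, the field names, and the idea of representing candidates as plain dictionaries are assumptions chosen for the example, not the application's real scoring.

```python
def hard_filter(candidates: list[dict], required: dict) -> list[dict]:
    """Stage 2: drop anything that contradicts a constraint the query implies."""
    return [c for c in candidates if all(c.get(k) == v for k, v in required.items())]


def score(candidate: dict, preferred: dict, boost: float = 1.0) -> float:
    """Stages 3 and 4: start from the retrieval score, then add soft signals."""
    s = candidate["retrieval_score"]
    # Filter boosts: a soft preference nudges matches upward but excludes nothing.
    s += boost * sum(1 for k, v in preferred.items() if candidate.get(k) == v)
    # Heuristic scoring: a small bonus for each populated field (assumed weight).
    s += 0.1 * sum(1 for f in ("title", "description") if candidate.get(f))
    return s


def shortlist(
    candidates: list[dict], required: dict, preferred: dict, limit: int = 20
) -> list[dict]:
    """Filter, score, and keep the top candidates for the reranker."""
    kept = hard_filter(candidates, required)
    return sorted(kept, key=lambda c: score(c, preferred), reverse=True)[:limit]
```

The distinction worth noticing is that a hard filter removes candidates outright, while a boost only changes their position, so a wrong guess about a soft preference never hides a result.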

That last step matters most. Each finalist is turned into a compact semantic document — its title, type, organization details, metadata, description, and resource snippets — and a reranking model reorders the shortlist by what each result actually means, placing the most relevant answer first regardless of its keyword density.

Keyword shortlist, passed through the semantic reranker (rank numbers as shown in the figure):
1 Heat & Health Report
4 Dr. Maya Okonkwo
3 Oakland Shade Plan
5 GreenLine Collaborative
2 Citywide Tree Census

The reranker at work: the same shortlist, reordered. Keyword scores get candidates in the door; the reranker reads what each one actually means and decides the final order, so the most relevant person, program, or org wins, whatever its type.
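To make the reranking step concrete, here is a sketch that flattens each finalist into a compact text document and orders the shortlist with a pluggable relevance function. The field names and both helpers are hypothetical, and the `relevance` callable stands in for a call to a real reranking model, which this example deliberately does not make.

```python
from typing import Callable

# Assumed field order, loosely following the list in the text above.
FIELD_ORDER = ("title", "type", "organization", "metadata", "description", "snippets")


def to_semantic_document(record: dict) -> str:
    """Flatten a record into the compact text a reranker would read."""
    parts = []
    for name in FIELD_ORDER:
        value = record.get(name)
        if not value:
            continue                       # skip empty fields entirely
        if isinstance(value, (list, tuple)):
            value = "; ".join(str(v) for v in value)
        parts.append(f"{name}: {value}")
    return "\n".join(parts)


def rerank(
    query: str,
    records: list[dict],
    relevance: Callable[[str, str], float],
) -> list[dict]:
    """Order records by relevance(query, document), highest first."""
    scored = [(relevance(query, to_semantic_document(r)), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]
```

Because every entity type is flattened into the same kind of document, a person and a report can be compared by one model on equal terms.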

The embeddings behind all of this are curated. They are built from deliberately chosen text rather than raw database rows: an organization leads with what it needs and offers, a program with its goals and the populations it serves, a person with their skills and role.
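One way to express that choice is a per-type list of fields that feed the embedding, in priority order. The field names below are guesses that follow the sentence above, and `embedding_text` is a hypothetical helper, not the real pipeline.

```python
# Assumed field names, one priority-ordered tuple per entity type.
EMBEDDING_FIELDS = {
    "organization": ("needs", "offers"),
    "program": ("goals", "populations_served"),
    "person": ("skills", "role"),
}


def embedding_text(entity_type: str, record: dict) -> str:
    """Build the text to embed from the chosen fields only, skipping empty ones."""
    lines = []
    for name in EMBEDDING_FIELDS[entity_type]:
        value = record.get(name)
        if value:
            lines.append(f"{name.replace('_', ' ')}: {value}")
    return "\n".join(lines)
```

Anything not named in the table, such as internal ids or timestamps, never reaches the embedding, so the vector reflects what the entity is about rather than how it is stored.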

Grounded, governed, private

Three guarantees run underneath everything. Extraction is grounded in your source material, so records reflect what the documents actually say. Publishing is gated by explicit checks, so incomplete records never go live. And row-level security governs who can read what.

The stack that runs it

Supabase: Source of truth for all content and relationships
Algolia: First-stage retrieval across every entity index
Voyage: Semantic reranking and stored embeddings
OpenAI / OpenRouter: Structured extraction and query understanding
LlamaParse: Turns uploaded documents into clean markdown
Firecrawl: Maps and scrapes organization websites
Vercel: Runs the Next.js application
Infisical: Manages production secrets

What this makes possible

All of this exists to answer the questions a flat directory can't:

Who else is working on this?

Surface the organizations, people, and programs already tackling your issue.

What already exists?

Find reports, toolkits, and resources you can reuse instead of rebuilding.

Who can help?

Locate the person with the skill, the partner, or the org that has done it before.

What can we build on?

See the networks and partnerships that connect the ecosystem together.

Each one is a question about the edges of the graph: who connects to whom, and what builds on what. Answering it well is the entire point.
