# Inspector Context Stack
This page explains how inspector data moves from raw profiling into the higher-level context that AI and catalog surfaces consume.
## The short version
The inspector does not produce a single "context" object. It builds context in layers:
- live schema and profiling data
- persisted table profiles in `target/inspect.json`
- cross-table enrichments such as relationships and fanout risk
- description enrichments from dbt metadata
- AI-focused formatting into compact schema context text
Each layer serves a different consumer and intentionally keeps the boundaries clean.
## Stack overview
```mermaid
flowchart TD
    A["Live database schema<br/>InspectConnection"] --> B["Raw table profile<br/>TableInspector / TableInspection"]
    B --> C["Artifact cache<br/>target/inspect.json"]
    C --> D["Catalog enrichments<br/>relationships, join profiles, fanout risk"]
    C --> E["Description enrichments<br/>dbt_descriptions"]
    D --> F["AI formatting<br/>format_table_context()"]
    E --> F
    C --> F
    F --> G["Schema context string<br/>get_schema_context()"]
    C --> H["MCP catalog() and inspect dashboards"]
```
## Layer 1: Live schema and profiling
The lowest layer is direct database access:
- `InspectConnection` handles dialect-specific metadata queries and table access.
- `TableInspector.inspect_table()` runs the profiling pipeline for one table.
This is where Dataface computes the raw per-column facts:
- nulls
- distinct counts
- min/max
- numeric stats
- top values
- enum values
- semantic type detection
- quality flags
- primary date column
- grain candidate
This layer is table-local. It knows a lot about one table, but not yet about the wider catalog.
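The per-column computation can be sketched as follows. The function and field names here are illustrative assumptions, not the real `TableInspector` internals:

```python
# Hypothetical sketch of the per-column facts Layer 1 computes.
# Field names are illustrative, not the real TableInspector output.
def profile_column(values):
    """Compute raw, table-local facts for one column of values."""
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    counts = {}
    for v in non_null:
        counts[v] = counts.get(v, 0) + 1
    top = sorted(counts.items(), key=lambda kv: -kv[1])[:3]
    return {
        "null_count": len(values) - len(non_null),
        "distinct_count": len(distinct),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "top_values": top,
        # a column with few distinct values is a candidate enum
        "is_enum_candidate": 0 < len(distinct) <= 10,
    }

facts = profile_column(["a", "b", "a", None, "a"])
```

Everything here is computable from one table's own rows, which is exactly why this layer stays table-local.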
## Layer 2: Raw profile contract
The output of a profiling run is a `TableInspection`. That object serializes into the stable profiler contract via `to_dict()` / `to_json_dict()`.
Important characteristics of this layer:
- it is the canonical raw profile representation
- it is designed to be reused across CLI, IDE, and API surfaces
- it remains close to the underlying facts rather than AI prompt formatting
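A minimal sketch of what such a contract object might look like, assuming a dataclass-style `TableInspection`; the real fields and method names may differ:

```python
# Minimal sketch of a stable raw-profile contract. This is an
# assumption-laden stand-in for TableInspection, not its real shape.
from dataclasses import dataclass, field, asdict


@dataclass
class TableInspectionSketch:
    table: str
    row_count: int
    columns: list = field(default_factory=list)
    contract_version: int = 1  # bump only on breaking changes

    def to_json_dict(self):
        # Stay close to the raw facts: no prompt formatting here.
        return asdict(self)


snapshot = TableInspectionSketch("orders", 1200, [{"name": "id"}]).to_json_dict()
```

Keeping the contract a plain serializable dict is what lets CLI, IDE, and API surfaces all reuse it without caring how it was produced.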
The main contract reference lives in:
The broader artifact shape and field semantics are documented in:
## Layer 3: Persisted artifact in `target/inspect.json`
`InspectionStorage` writes table profiles into a single artifact: `target/inspect.json`.
This is the shared cache for inspector-driven features. It gives Dataface a stable, queryable representation of the catalog without having to re-profile on every request.
Key properties:
- one file for the whole catalog
- one entry per table
- merges new profiles into existing data
- acts as the handoff point to downstream consumers
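The merge behavior can be sketched as a simple upsert; `merge_profile` is a hypothetical helper, not the real `InspectionStorage` API:

```python
# Sketch of the cache-merge semantics described above: one artifact,
# one entry per table, new profiles merged into existing data.
def merge_profile(artifact: dict, table: str, profile: dict) -> dict:
    """Upsert one table's profile into the single shared artifact."""
    tables = artifact.setdefault("tables", {})
    tables[table] = {**tables.get(table, {}), **profile}  # new facts win
    return artifact


cache = {"tables": {"orders": {"row_count": 100}}}
merge_profile(cache, "orders", {"row_count": 120})
merge_profile(cache, "users", {"row_count": 7})
# cache now holds one entry per table, with "orders" refreshed
```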
## Layer 4: Cross-table enrichment
Some context only makes sense once multiple tables have been profiled. That work happens after the raw table profiles are saved.
### Relationships
`InspectionStorage.update_relationships()` reconstructs table profiles from the cache and runs cross-table relationship inference.
That pipeline is intentionally cache-based:
- it does not hit the database again
- it reasons over already-profiled metadata
- it produces deterministic catalog-level edges
The main signals are:
- foreign-key naming conventions such as `customer_id`
- `key_role` classifications
- uniqueness ratios
- FK range containment within PK range
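A hedged sketch of how these signals might combine; the `looks_like_fk` helper, its thresholds, and its field names are all assumptions rather than the real inference code:

```python
# Sketch of cache-based FK inference: combine a naming-convention
# signal, PK uniqueness, and FK-range containment within the PK range.
# All names and rules here are illustrative assumptions.
def looks_like_fk(col: dict, pk: dict) -> bool:
    name_signal = col["name"].endswith("_id")            # customer_id style
    unique_pk = pk["distinct_count"] == pk["row_count"]  # PK fully unique
    contained = pk["min"] <= col["min"] and col["max"] <= pk["max"]
    return name_signal and unique_pk and contained


customer_pk = {"name": "id", "row_count": 3, "distinct_count": 3, "min": 1, "max": 3}
order_col = {"name": "customer_id", "min": 1, "max": 3}
```

Note that every input comes from already-profiled metadata, which is why this step never has to touch the database again.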
### Join multiplicity and fanout risk
Detected relationships are then enriched with:
- `join_profile`
- `fanout_risk`
This is what allows downstream consumers to distinguish:
- safe dimension lookups
- one-to-many joins
- risky many-to-many patterns
That enrichment is important because it turns a guessed relationship into a usable modeling hint.
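The classification can be sketched from key uniqueness on each side of the join; the labels and the helper function are illustrative, not the real enrichment code:

```python
# Sketch of fanout-risk classification from key uniqueness on each
# side of a detected relationship (labels are assumptions).
def fanout_risk(left_unique: bool, right_unique: bool) -> str:
    if left_unique and right_unique:
        return "one_to_one"    # safe dimension lookup
    if left_unique or right_unique:
        return "one_to_many"   # rows multiply on one side only
    return "many_to_many"      # risky: joins can explode row counts
```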
## Layer 5: Description enrichment
Descriptions are currently baked from dbt metadata into cached profiles via `InspectionStorage.update_descriptions()`.
That step parses dbt metadata and stores matched descriptions under `dbt_descriptions`.
This is deliberately provenance-preserving:
- dbt descriptions are stored with source metadata
- they do not overwrite profiler facts
- higher layers can choose how to merge or prioritize descriptions
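A sketch of what provenance-preserving storage means in practice; the exact shape and helper name are hypothetical:

```python
# Sketch of provenance-preserving description storage: descriptions
# live beside profiler facts, never over them. Shape is an assumption.
def attach_dbt_description(profile: dict, text: str, source_file: str) -> dict:
    profile.setdefault("dbt_descriptions", {})["table"] = {
        "text": text,
        "source": source_file,  # keep where it came from
    }
    return profile  # profiler facts remain untouched


profile = {"row_count": 42}
attach_dbt_description(profile, "All customer orders", "models/schema.yml")
```

Because the source file rides along with the text, higher layers can later decide how much to trust a dbt description versus other candidates.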
## Layer 6: AI context shaping
The AI-facing layer lives in `dataface/ai/schema_context.py`.
This layer is not about discovering new facts. It is about taking the cached and enriched facts and turning them into something an LLM can consume efficiently.
There are two important functions:
- `format_table_context(table)` produces a structured AI payload for one table
- `get_schema_context()` produces a compact multi-table schema summary string
### What `format_table_context()` adds
It takes a table profile and returns:
- `ai_context_version`
- `formatted`
- `selected_description`
- `selected_source`
- `description_candidates`
- `column_descriptions`
This is a different contract from the raw profiler contract. The profiler contract is about profiling output. The AI context contract is about prompt-safe consumption.
The AI contract reference lives in:
### Description merging
`format_table_context()` supports a generalized `description_candidates` model and resolves it through the description merge engine.
The current priority stack is:
1. `dbt_schema_yml`
2. `database_comment`
3. `curated`
4. `inferred`
Today, much of the persisted inspector enrichment is still stored as `dbt_descriptions` in the cache artifact, while the AI layer is already built to support the more general `description_candidates` contract. That is worth knowing when reasoning about where a specific description came from.
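The merge resolution can be sketched as a first-match walk over that priority stack; `resolve_description` is a hypothetical name for illustration only:

```python
# Sketch of the priority-stack merge: pick the first source with a
# non-empty candidate. Function name and shapes are assumptions.
PRIORITY = ["dbt_schema_yml", "database_comment", "curated", "inferred"]


def resolve_description(candidates: dict):
    """Return (text, source) for the highest-priority non-empty candidate."""
    for source in PRIORITY:
        text = candidates.get(source)
        if text:
            return text, source
    return None, None
```

A first-match walk keeps resolution deterministic: a dbt description always beats a database comment, which always beats an inferred guess.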
## Layer 7: Consumer-specific views
Once the stack above exists, different surfaces take different slices of it.
### `catalog()`

`catalog()` exposes an AI-friendly browsing surface:
- table listing uses cached profiles when available
- cache misses fall back to live schema introspection
- single-table deep profiling is opt-in via `force_refresh=True`
The list response is intentionally slimmer than the raw profile contract. It keeps only the fields that help with exploration.
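The slimming step can be sketched as plain field selection; the fields kept here are assumptions about what "helps with exploration", not the real list-response schema:

```python
# Sketch of slimming a raw profile into a browse-friendly listing
# entry. The kept field names are illustrative assumptions.
LIST_FIELDS = ("name", "row_count", "description")


def to_list_entry(profile: dict) -> dict:
    """Project a full profile down to exploration-relevant fields."""
    return {k: profile[k] for k in LIST_FIELDS if k in profile}
```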
### Inspect dashboards
Inspect dashboards use the cached artifact and inspect templates to render a UI over the same underlying profile data.
### AI prompts and playground
`get_schema_context()` produces a compact text summary of the schema. It prefers cached profiles, but if a table has never been profiled it can still include the live column list so AI flows do not fail hard on a cold start.
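The cold-start fallback can be sketched as follows; the function names, cache shape, and output format are all assumptions, not the real `get_schema_context()` implementation:

```python
# Sketch of the cold-start fallback: prefer the cached profile,
# otherwise fall back to a live column list so AI flows still get
# something usable. Names and shapes here are assumptions.
def schema_context_for(table: str, cache: dict, live_columns) -> str:
    profile = cache.get(table)
    if profile:
        cols = ", ".join(c["name"] for c in profile["columns"])
        return f"{table} ({profile['row_count']} rows): {cols}"
    # never profiled: still emit the live column list
    return f"{table} (unprofiled): {', '.join(live_columns(table))}"


cache = {"orders": {"row_count": 3, "columns": [{"name": "id"}]}}
```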
## Boundary rules
The main design rule is that each layer owns a different job:
- inspector/profile layer owns factual table metadata
- storage layer owns persistence and catalog-wide baking
- relationship layer owns cross-table reasoning
- description layer owns provenance
- AI context layer owns compact formatting and merge presentation
That separation is what keeps the system extensible. If a new consumer needs the raw contract, it can stop at the artifact layer. If it needs LLM-friendly text, it can consume the AI context layer instead.