Inspector Context Stack

This page explains how inspector data moves from raw profiling into the higher-level context that AI and catalog surfaces consume.

The short version

The inspector does not produce a single "context" object. It builds context in layers:

  1. live schema and profiling data
  2. persisted table profiles in target/inspect.json
  3. cross-table enrichments such as relationships and fanout risk
  4. description enrichments from dbt metadata
  5. AI-focused formatting into compact schema context text

Each layer serves a different consumer, and the boundaries between layers are kept deliberately clean.

Stack overview

flowchart TD
    A["Live database schema<br/>InspectConnection"] --> B["Raw table profile<br/>TableInspector / TableInspection"]
    B --> C["Artifact cache<br/>target/inspect.json"]
    C --> D["Catalog enrichments<br/>relationships, join profiles, fanout risk"]
    C --> E["Description enrichments<br/>dbt_descriptions"]
    D --> F["AI formatting<br/>format_table_context()"]
    E --> F
    C --> F
    F --> G["Schema context string<br/>get_schema_context()"]
    C --> H["MCP catalog() and inspect dashboards"]

Layer 1: Live schema and profiling

The lowest layer is direct database access:

  • InspectConnection handles dialect-specific metadata queries and table access.
  • TableInspector.inspect_table() runs the profiling pipeline for one table.

This is where Dataface computes the raw per-column facts:

  • nulls
  • distinct counts
  • min/max
  • numeric stats
  • top values
  • enum values
  • semantic type detection
  • quality flags
  • primary date column
  • grain candidate

This layer is table-local. It knows a lot about one table, but not yet about the wider catalog.
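
As a rough sketch, those per-column facts boil down to simple table-local statistics. The helper below is illustrative only (it is not the real TableInspector code, and the name is invented); the actual pipeline additionally covers semantic types, quality flags, date columns, and grain candidates:

```python
from collections import Counter

# Hypothetical sketch of the kind of per-column facts Layer 1 computes.
# profile_column() is an invented name, not a Dataface API.

def profile_column(values):
    non_null = [v for v in values if v is not None]
    return {
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        # frequency-ordered top values, a common profiling signal
        "top_values": Counter(non_null).most_common(3),
    }
```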

Layer 2: Raw profile contract

The output of a profiling run is a TableInspection. That object serializes into the stable profiler contract via to_dict() / to_json_dict().

Important characteristics of this layer:

  • it is the canonical raw profile representation
  • it is designed to be reused across CLI, IDE, and API surfaces
  • it remains close to the underlying facts rather than AI prompt formatting

The main contract reference lives in:

dataface/core/inspect/CONTRACT.md

The broader artifact shape and field semantics are documented in:

dataface/core/inspect/inspector_schema.md
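
A minimal sketch of what such a contract object can look like, assuming a dataclass-style shape; the field names here are illustrative, and the real field set is defined by the contract files, not by this example:

```python
from dataclasses import dataclass, field, asdict

# Illustrative stand-in for TableInspection; the real contract lives in
# dataface/core/inspect/CONTRACT.md, not in this sketch.

@dataclass
class TableInspectionSketch:
    table: str
    row_count: int
    columns: dict = field(default_factory=dict)  # column name -> raw facts

    def to_dict(self):
        # One stable, fact-oriented shape reused by CLI, IDE, and API surfaces.
        return asdict(self)
```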

Layer 3: Persisted artifact in target/inspect.json

InspectionStorage writes table profiles into a single artifact:

target/inspect.json

This is the shared cache for inspector-driven features. It gives Dataface a stable, queryable representation of the catalog without having to re-profile on every request.

Key properties:

  • one file for the whole catalog
  • one entry per table
  • merges new profiles into existing data
  • acts as the handoff point to downstream consumers
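
The merge behavior can be sketched as a read-modify-write over one JSON file; the exact artifact shape and function name below are assumptions, not the real InspectionStorage code:

```python
import json
from pathlib import Path

# Hypothetical merge: one artifact, one entry per table, new profiles
# merged over existing data without touching other tables' entries.

def save_profile(artifact_path, table_name, profile):
    path = Path(artifact_path)
    data = json.loads(path.read_text()) if path.exists() else {"tables": {}}
    data["tables"][table_name] = profile  # replace or add this table only
    path.write_text(json.dumps(data, indent=2))
    return data
```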

Layer 4: Cross-table enrichment

Some context only makes sense once multiple tables have been profiled. That work happens after the raw table profiles are saved.

Relationships

InspectionStorage.update_relationships() reconstructs table profiles from the cache and runs cross-table relationship inference.

That pipeline is intentionally cache-based:

  • it does not hit the database again
  • it reasons over already-profiled metadata
  • it produces deterministic catalog-level edges

The main signals are:

  • foreign-key naming conventions such as customer_id
  • key_role classifications
  • uniqueness ratios
  • FK range containment within PK range
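
Combined, those signals can be sketched as a single cache-only check. Everything below (the function name, the thresholds, the naive `rstrip`-based singularization) is invented for illustration and is not the real inference rule set:

```python
# Hedged sketch: decide whether fk_col plausibly references pk_col using
# only already-profiled metadata, with no database access.

def infer_relationship(fk_col, pk_table, pk_col):
    # Naming convention: e.g. orders.customer_id -> customers
    name_match = fk_col["name"] == f"{pk_table.rstrip('s')}_id"
    # Uniqueness ratio: the candidate PK should be (nearly) unique
    pk_unique = pk_col["distinct_count"] / max(pk_col["row_count"], 1) >= 0.99
    # Range containment: FK values fall inside the PK's min/max range
    contained = pk_col["min"] <= fk_col["min"] and fk_col["max"] <= pk_col["max"]
    return name_match and pk_unique and contained
```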

Join multiplicity and fanout risk

Detected relationships are then enriched with:

  • join_profile
  • fanout_risk

This is what allows downstream consumers to distinguish:

  • safe dimension lookups
  • one-to-many joins
  • risky many-to-many patterns

That enrichment is important because it turns a guessed relationship into a usable modeling hint.
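
One way to picture the classification, assuming uniqueness ratios for each join key are already in the cache; the labels and cutoff below are illustrative, not the real join_profile or fanout_risk values:

```python
# Sketch: classify join multiplicity from per-side key uniqueness.

def classify_join(left_uniqueness, right_uniqueness, cutoff=0.99):
    left_unique = left_uniqueness >= cutoff
    right_unique = right_uniqueness >= cutoff
    if left_unique and right_unique:
        return {"join_profile": "one_to_one", "fanout_risk": "low"}
    if left_unique or right_unique:
        # Safe dimension lookup / ordinary one-to-many join
        return {"join_profile": "one_to_many", "fanout_risk": "medium"}
    # Neither key unique: the join can multiply rows on both sides
    return {"join_profile": "many_to_many", "fanout_risk": "high"}
```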

Layer 5: Description enrichment

Descriptions are currently baked from dbt metadata into cached profiles via InspectionStorage.update_descriptions().

That step parses:

models/**/schema.yml
models/**/schema.yaml

and stores matched descriptions under dbt_descriptions.

This is deliberately provenance-preserving:

  • dbt descriptions are stored with source metadata
  • they do not overwrite profiler facts
  • higher layers can choose how to merge or prioritize descriptions
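
Provenance-preserving enrichment can be sketched like this; the dbt_descriptions key comes from this page, while the function name and nested shape are assumed:

```python
# Sketch: bake dbt descriptions into a cached profile without touching
# profiler facts. The shape under dbt_descriptions is invented.

def bake_descriptions(cached_profile, dbt_docs):
    enriched = dict(cached_profile)  # profiler facts stay untouched
    enriched["dbt_descriptions"] = {
        "source": "dbt_schema_yml",
        "table": dbt_docs.get("description"),
        "columns": dbt_docs.get("columns", {}),
    }
    return enriched
```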

Layer 6: AI context shaping

The AI-facing layer lives in dataface/ai/schema_context.py.

This layer is not about discovering new facts. It is about taking the cached and enriched facts and turning them into something an LLM can consume efficiently.

There are two important functions:

  • format_table_context(table) produces a structured AI payload for one table
  • get_schema_context() produces a compact multi-table schema summary string

What format_table_context() adds

It takes a table profile and returns:

  • ai_context_version
  • formatted
  • selected_description
  • selected_source
  • description_candidates
  • column_descriptions

This is a different contract from the raw profiler contract. The profiler contract is about profiling output. The AI context contract is about prompt-safe consumption.
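
As a rough illustration of that split, the AI payload might be assembled like this; the top-level keys come from this page, and every value below is a placeholder:

```python
# Illustrative only: the real payload is specified by
# dataface/ai/AI_CONTEXT_CONTRACT.md, not by this sketch.

def format_table_context_sketch(profile):
    return {
        "ai_context_version": 1,
        "formatted": f"table {profile['table']}: ...",  # prompt-safe text
        "selected_description": profile.get("description"),
        "selected_source": None,
        "description_candidates": [],
        "column_descriptions": {},
    }
```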

The AI contract reference lives in:

dataface/ai/AI_CONTEXT_CONTRACT.md

Description merging

format_table_context() supports a generalized description_candidates model and resolves it through the description merge engine.

The current priority stack is:

  1. dbt_schema_yml
  2. database_comment
  3. curated
  4. inferred
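
The resolution itself can be sketched as "first source in the priority list that has non-empty text wins"; the function below is a hypothetical stand-in for the merge engine, with an assumed candidate shape:

```python
# Priority order taken from this page; everything else is illustrative.
PRIORITY = ["dbt_schema_yml", "database_comment", "curated", "inferred"]

def resolve_description(candidates):
    """candidates: list of {"source": ..., "text": ...} dicts."""
    by_source = {c["source"]: c["text"] for c in candidates if c.get("text")}
    for source in PRIORITY:
        if source in by_source:
            return by_source[source], source
    return None, None
```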

Today, much of the persisted inspector enrichment is still stored as dbt_descriptions in the cache artifact, while the AI layer already supports the more general description_candidates contract. Keep that asymmetry in mind when tracing where a specific description came from.

Layer 7: Consumer-specific views

Once the stack above exists, different surfaces take different slices of it.

catalog()

catalog() exposes an AI-friendly browsing surface:

  • table listing uses cached profiles when available
  • cache misses fall back to live schema introspection
  • single-table deep profiling is opt-in via force_refresh=True

The list response is intentionally slimmer than the raw profile contract. It keeps only the fields that help with exploration.
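
The cache-first behavior can be sketched as a simple fallback chain; load_cached_profile and introspect_live are hypothetical stand-ins for the storage and connection layers:

```python
# Sketch: prefer the cached profile, fall back to live introspection.

def catalog_entry(table, load_cached_profile, introspect_live):
    profile = load_cached_profile(table)
    if profile is not None:
        return {"table": table, "source": "cache", "profile": profile}
    # Cache miss: live schema lookup only; deep profiling stays opt-in
    return {"table": table, "source": "live", "profile": introspect_live(table)}
```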

Inspect dashboards

Inspect dashboards use the cached artifact and inspect templates to render a UI over the same underlying profile data.

AI prompts and playground

get_schema_context() produces a compact text summary of the schema. It prefers cached profiles, but if a table has never been profiled it can still include the live column list so AI flows do not fail hard on a cold start.
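
That cold-start behavior might look roughly like this; the summary format and names below are invented for illustration:

```python
# Sketch: compact one-line-per-table summary, with never-profiled tables
# included from the live column list so nothing fails hard.

def schema_context_sketch(cached, live_columns):
    lines = []
    for table, profile in cached.items():
        cols = ", ".join(profile["columns"])
        lines.append(f"{table}({cols}) rows~{profile['row_count']}")
    for table, cols in live_columns.items():
        if table not in cached:  # never profiled: columns only
            lines.append(f"{table}({', '.join(cols)}) [unprofiled]")
    return "\n".join(lines)
```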

Boundary rules

The main design rule is that each layer owns a different job:

  • inspector/profile layer owns factual table metadata
  • storage layer owns persistence and catalog-wide baking
  • relationship layer owns cross-table reasoning
  • description layer owns provenance
  • AI context layer owns compact formatting and merge presentation

That separation is what keeps the system extensible. If a new consumer needs the raw contract, it can stop at the artifact layer. If it needs LLM-friendly text, it can consume the AI context layer instead.