Cabinet of Signals · pilot 3

One image, every way we can read it.

An image is not one thing. It is a depth map; a set of regions; a list of faces; a palette; a caption; a salience field; a style; a frequency spectrum. Each of these is a different way to read the same pixels — a different signal extracted for a different purpose.

The Cabinet of Signals collects every reading our system can produce, against one shared image. Open cells below are wired into Gateway today; closed cells are signals we know we want and will add over time. The room visibly accumulates as each tool lands.

The shared subject

The 2014 Oscars group selfie used as the cabinet's shared subject.

Source. Photo by Bradley Cooper, 2014 Academy Awards.

Readings — 8 of 14 signals available

A grayscale map of how far each pixel is from the camera.

spatial · Dp

Depth

Per-object masks. Each region is one thing the model considers coherent.

spatial · Sm

Segmentation

Bounding boxes + six landmark points per face.

spatial · Fd

Face detection

The dominant colours — extracted by quantisation, frequency-ordered.

compositional · Pl

Palette

The image, expressed as text. Searchable, queryable, editable.

semantic · Cap

Caption

Pure structure, no texture — the skeleton of composition.

spatial · Edg

Edges

Skeletal keypoints: shoulders, elbows, hands, gaze direction.

spatial · Po

Body pose

compositional · Frq

Frequency

spatial

Obj

Object detection

coming

spatial

Surface normals

coming

semantic

Tag

Depth

A grayscale map of how far each pixel is from the camera.

What it sees here

Cooper's face is brightest — closest to the camera. Ellen and Lawrence sit at mid-grey. The back row and the stage behind them recede into near-black. The depth map reads the photograph as a 3D scene rather than a flat plane.

Tool card · /lab/elements/dp →

spatial · Sm

Segmentation

Per-object masks. Each region is one thing the model considers coherent.

What it sees here

SAM 2 returns ten masks at default settings. They cover only ~12% of the image — the most prominent objects (some clothing, partial faces) get masked; the rest of the frame is left untouched. A first lesson: SAM in automatic mode is asking what the obvious things are, not how to tile every pixel.

Tool card · /lab/elements/sm →

spatial · Fd

Face detection

Bounding boxes + six landmark points per face.

What it sees here

At MediaPipe's empirical sweet spot (confidence 0.2) we get six faces — Cooper, Lawrence, Ellen, Lupita's brother, Brad Pitt, and Kevin Spacey area. The back row is reachable here in a way it isn't at the default 0.5 threshold. Each detection carries six landmarks (eyes, nose, mouth, ears) — anchorable, alignable, lip-syncable.

Tool card · /lab/elements/fd →

compositional · Pl

Palette

The dominant colours — extracted by quantisation, frequency-ordered.

What it sees here

Six swatches: deep black (the tuxedo / venue shadow), warm beige (skin tones), near-white (Ellen's blazer / stage light), two near-black variants (depth of background), and a peach mid-tone (more skin / blush). The palette is what you'd reach for to colour-grade against, or to tell another image to match this one's mood.

semantic · Cap

Caption

The image, expressed as text. Searchable, queryable, editable.

What it sees here

Claude vision describes the photograph in 2-3 sentences — subjects, setting, lighting, mood, composition. The default prompt is tuneable: a caller can ask "describe only the mood", "what's missing from this composition?", "be exhaustive". Same Node, same contract, different question — that's the primary creative knob. The caption gives Producer the *language* of an image to compose against when reasoning about a brief.

Tool card · /lab/elements/cap →

spatial · Edg

Edges

Pure structure, no texture — the skeleton of composition.

What it sees here

A two-tier Canny gives strong edges (full black) and weak ones (grey). On the Cooper selfie: facial features, jawlines, suit collars, hairline transitions all register cleanly. The result is sparse — paper showing through — because edge maps respect the source's natural blank surfaces. Useful as ControlNet conditioning input, as a structural-similarity signal, and as the starting point for line-art stylisation.

Tool card · /lab/elements/edg →

spatial · Po

Body pose

Skeletal keypoints: shoulders, elbows, hands, gaze direction.

What it sees here

MediaPipe Pose Lite finds the people whose torsos are visible enough to fit a 33-keypoint skeleton. The Cooper selfie is a hard case — most subjects are heads/shoulders only — so just three poses register at the multi-subject confidence threshold. On a richer image (a stage scene, a dance class, a sports moment), this Node would catch every person and trace each body's full pose. Each detected person gets a distinct hue.

Tool card · /lab/elements/po →

compositional · Frq

Frequency

Low / mid / high frequency bands of the image — different scales of structure.

What it sees here

A Gaussian-pyramid decomposition splits the image into three bands. *Low*: the squinted version — big tonal masses, where bright/dark regions live. *Mid*: shapes and forms at medium scale. *High*: fine detail and texture. The four panels show the source plus each band so they can be compared at a glance. The frequency view is useful for compositional balance reasoning, stylisation pipelines, and quality / noise assessment.

Pilot 3 · cabinet of signals · Alt Shift Lab

The cabinet is an active room — new cells are added as we wire each decomposition Node into Gateway's decompose_full() Graph. Closed cells today; open cells next session.