We are building an image-processing module in the ASL ecosystem that services different apps — including CoNoggin's content-authoring app and the virtual guest simulator internally called Alt TV.
We are exploring how to combine generative AI with existing deterministic tools to edit and generate images in original and controllable ways.
This experiment tests the mechanics of the process: can we isolate elements of an image and edit them according to a chosen criterion?
The experiment
Take a source image, identify and extract its subjects, map their depth in the scene, and use that information to affect each object according to its position relative to an anchor point.
The effect we went for: the farther an object is from the anchor point, the more blurred it becomes.
Source image
The 2014 Oscars group selfie — twelve faces at varied depths.

Iteration 1
Process: Applied SAM (Meta's segmentation model) to isolate areas, mapped depth, picked the largest detected object as the anchor, and applied blur to every object, scaled by its 3D distance from the anchor.
Result: SAM was not picking out regions correctly. The ten detected objects covered only 12% of the image; the remaining 88% had no detection and therefore received no effect.
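A coverage figure like the 12% above can be checked with a few lines. This is a minimal sketch, assuming the segmentation step hands back one boolean mask per detected object; `mask_coverage` and the toy masks are illustrative names and data, not the module's actual code.

```python
import numpy as np

def mask_coverage(masks):
    """Fraction of image pixels covered by at least one boolean mask."""
    if not masks:
        return 0.0
    union = np.zeros_like(masks[0], dtype=bool)
    for m in masks:
        union |= m.astype(bool)
    return float(union.mean())

# Toy 10x10 "image" with two small detected regions (made-up data).
a = np.zeros((10, 10), dtype=bool)
a[0:2, 0:3] = True  # 6 pixels
b = np.zeros((10, 10), dtype=bool)
b[5:7, 5:8] = True  # 6 pixels

coverage = mask_coverage([a, b])
print(f"covered: {coverage:.0%}, untouched: {1 - coverage:.0%}")
```

Any pixel outside the union of masks never receives an effect, which is why low coverage made the per-object approach unworkable here.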

Changes made: Focused on SAM alone to see whether we could get better object identification.
Iteration 2
Process: Increased SAM's sampling density and relaxed its thresholds.
Result: Too granular. Sunglasses split from faces, bow ties from collars. The fragments could not be treated as whole objects.
But what if we mapped against z coordinates?
Iteration 3
Process: Stopped iterating over SAM regions. Used the depth map directly: for every pixel in the image, computed a strength value from its depth and its 2D distance to the anchor, then applied blur proportional to that strength.
Result: Coverage problem solved: every pixel is now modulated. New problem exposed: the largest detected object is Bradley Cooper's shoulder and shirt, not his face. The gradient is smooth, but the anchor is on the wrong thing.
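The per-pixel approach in this iteration can be sketched roughly as follows. The write-up does not give the exact strength formula, so the weighted sum below (`w_depth`, `w_xy`) is an assumption. The box blur and the blend between the original and one blurred copy are also stand-ins for whatever blur the module really applies.

```python
import numpy as np

def strength_map(depth, anchor_xy, w_depth=0.5, w_xy=0.5):
    """Per-pixel strength in [0, 1]: 0 at the anchor, growing with distance.

    Assumed formula: weighted sum of normalized depth difference and
    normalized 2D distance to the anchor.
    """
    h, w = depth.shape
    ax, ay = anchor_xy
    ys, xs = np.mgrid[0:h, 0:w]
    dist_xy = np.hypot(xs - ax, ys - ay)
    dist_xy = dist_xy / max(dist_xy.max(), 1e-9)
    dz = np.abs(depth - depth[ay, ax])
    dz = dz / max(dz.max(), 1e-9)
    return np.clip(w_depth * dz + w_xy * dist_xy, 0.0, 1.0)

def box_blur(img, radius):
    """Simple box blur for a 2D image, using edge padding."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def blur_by_strength(img, strength, radius=3):
    """Blend each pixel between the original and a blurred copy."""
    blurred = box_blur(img, radius)
    return (1.0 - strength) * img + strength * blurred

# Toy data: random grayscale image, depth increasing left to right.
rng = np.random.default_rng(0)
img = rng.random((8, 8))
depth = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))
anchor = (2, 3)  # (x, y) in pixels
strength = strength_map(depth, anchor)
out = blur_by_strength(img, strength, radius=1)
```

Because the strength is defined for every pixel, nothing depends on segmentation coverage. The anchor pixel itself has strength 0 and is left unchanged.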

Since the image is made up mainly of faces, what if we used face detection instead of SAM?
Iteration 4
Process: Added MediaPipe face detection (a 1 MB model that runs on a laptop CPU in 100 ms and returns a bounding box plus six landmarks per face). Picked the largest detected face as anchor.

Result: Anchor lands on the intended face. The blur radiates from the right point.
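Choosing the anchor from face boxes might look like the sketch below. It assumes each detection arrives as a box normalized to the image size; the `Face` class and `anchor_from_faces` are hypothetical names for illustration, not MediaPipe's API, and the box values are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Face:
    # Normalized box: fractions of image width and height (assumption).
    xmin: float
    ymin: float
    width: float
    height: float

    def area(self) -> float:
        return self.width * self.height

def anchor_from_faces(faces, image_w, image_h):
    """Return the pixel centre (x, y) of the largest face, or None if no faces."""
    if not faces:
        return None
    f = max(faces, key=Face.area)
    cx = (f.xmin + f.width / 2) * image_w
    cy = (f.ymin + f.height / 2) * image_h
    return (round(cx), round(cy))

faces = [
    Face(0.10, 0.20, 0.10, 0.15),
    Face(0.40, 0.30, 0.20, 0.30),  # largest box
    Face(0.75, 0.25, 0.08, 0.12),
]
anchor = anchor_from_faces(faces, image_w=1000, image_h=500)
print(anchor)
```

Ranking by face box rather than by segmented region avoids the earlier failure, where a shoulder and shirt outranked the face they belonged to.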

Anchor variations
With detection separated from selection, swapping which face the image is anchored on became a one-argument change. The expensive analysis (segmentation, depth, face detection) runs once per image and is cached.
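One way to get this "analyse once, select many times" behaviour is to memoize the analysis and keep selection as a cheap function of its result. The sketch below is an assumption about the shape of such a design: `analyze`, `pick_anchor`, the `which` argument and the hard-coded boxes are all hypothetical placeholders.

```python
from functools import lru_cache

CALLS = {"analyze": 0}  # counts how often the expensive step really runs

@lru_cache(maxsize=None)
def analyze(image_path):
    """Placeholder for segmentation, depth and face detection (runs once per image)."""
    CALLS["analyze"] += 1
    # Made-up face boxes as (x, y, w, h) in pixels.
    return ((40, 30, 20, 20), (100, 50, 60, 60), (200, 40, 30, 30))

def pick_anchor(image_path, which="largest"):
    """Select the anchor face; changing `which` does not re-run the analysis."""
    faces = analyze(image_path)
    if which == "largest":
        face = max(faces, key=lambda f: f[2] * f[3])
    elif which == "leftmost":
        face = min(faces, key=lambda f: f[0])
    elif isinstance(which, int):
        face = faces[which]
    else:
        raise ValueError(f"unknown selector: {which!r}")
    x, y, w, h = face
    return (x + w / 2, y + h / 2)

print(pick_anchor("selfie.jpg"))                    # largest face
print(pick_anchor("selfie.jpg", which="leftmost"))  # same cached analysis
print(pick_anchor("selfie.jpg", which=2))           # face by index
print(CALLS["analyze"])                             # analysis ran once
```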




A different effect
Finally, rather than blurring, we tried a pencil-sketch tool — a known image-processing recipe (colour-dodge a grayscale copy with its own blurred negative). Same pipeline, same anchor; only the per-pixel operation changed.
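The colour-dodge recipe mentioned above can be sketched in NumPy as follows. The box blur here is a stand-in (the recipe is often done with a Gaussian blur), and the function names and `radius` parameter are illustrative assumptions.

```python
import numpy as np

def box_blur(img, radius):
    """Simple box blur for a 2D image, using edge padding."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def pencil_sketch(gray, radius=2):
    """Colour-dodge a grayscale image (0..255) with its own blurred negative."""
    gray = gray.astype(float)
    blurred_negative = box_blur(255.0 - gray, radius)
    denom = np.maximum(255.0 - blurred_negative, 1e-6)  # avoid division by zero
    return np.clip(gray * 255.0 / denom, 0, 255).astype(np.uint8)

# A flat region dodges to pure white; an edge leaves a dark line.
flat = np.full((6, 6), 100, dtype=np.uint8)
edge = np.full((6, 8), 50, dtype=np.uint8)
edge[:, 4:] = 200
flat_sketch = pencil_sketch(flat, radius=2)
edge_sketch = pencil_sketch(edge, radius=1)
```

Only local contrast survives: uniform areas wash out to white while the dark side of each edge stays dark, which gives the pencil-line look.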

We then modulated the same effect with a ring parameter, so the sketch only applies inside a chosen band of distances — leaving both the anchor itself and the far edges of the frame untouched.
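A ring of this kind can be expressed as a mask over distances from the anchor. The sketch below assumes the band is measured in 2D pixel distance and has hard edges; the write-up does not specify either, and `ring_mask`, `inner` and `outer` are hypothetical names.

```python
import numpy as np

def ring_mask(shape, anchor_xy, inner, outer):
    """True where the pixel's distance to the anchor lies in [inner, outer]."""
    h, w = shape
    ax, ay = anchor_xy
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.hypot(xs - ax, ys - ay)
    return (d >= inner) & (d <= outer)

def apply_in_ring(original, effect, mask):
    """Use the effect image inside the ring and the original elsewhere."""
    return np.where(mask, effect, original)

original = np.zeros((21, 21))
effect = np.ones((21, 21))  # stands in for the sketched image
mask = ring_mask(original.shape, anchor_xy=(10, 10), inner=3, outer=6)
out = apply_in_ring(original, effect, mask)
```

With these toy values the anchor pixel and the corners keep the original image, and only pixels between 3 and 6 pixels from the anchor take the effect.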

Conclusion
Mechanics work. The tool call is available to ASL apps.
But can AI pick the right tool for the right purpose?
We have a hypothesis about how to do that, and we will be testing it soon.
Pilot 1 · polyglot patch system · Alt Shift Lab