Part 1: Foundations
Essay 4

From Temporal Cuts to Spatial Composition

The cut migrates from the timeline to the body.


I. The Amoeba

Before I understood montage as a theory, I was living inside two different versions of it every week, and the difference between them taught me more about spatial composition than any seminar.

In the pit orchestra, montage was vertical and conducted. Nine musicians, a conductor on a raised platform, a score that determined every note. I held one musical line while simultaneously hearing seven others, tracking the conductor's tempo while feeling the drummer's kick in my sternum, counting measures until my next entrance while the horn section played a melodic line that would resolve into the chord I was about to strike. Multiple streams held in simultaneous relationship, and the meaning (the emotional impact on the audience) arising from the collision of those streams. This is Eisenstein's vertical montage: the simultaneous interplay of elements at any given moment, the collision that produces ideas neither element contains alone.

But the bluegrass session circle was a different montage entirely. No conductor. No score. No hierarchy of chairs. The session is an amoeba: self-regulating, leaderless, the roles distributed across the bodies in the circle like organs in a single creature. The bass holds the 1 and the 3. The mandolin chops on the upbeats. The guitar's strum braids those two together. The fiddle provides pads or lead. And the banjo, the perpetual motion machine, braids subdivisions of rhythm through harmonic and melodic information, a continuous stream that holds the organism together.

In the pit, the conductor determines what collides with what. The montage is designed: composed, scored, rehearsed. In the session circle, the montage self-assembles. Nobody decides that the banjo's roll will collide with the fiddle's sustain at this particular moment. The collision emerges from the coupling between bodies that have internalized the same musical DNA (the same core repertoire, the same harmonic vocabulary, the same rhythmic conventions) and are improvising within that shared constraint. The meaning arises not from a designer's intention but from the organism's self-organization.

These are two topologies of montage. The pit is top-down: a single intelligence (the conductor, the score) organizing the collision of simultaneous streams. The session circle is bottom-up: multiple intelligences negotiating collisions in real time, with no central authority. Eisenstein's montage theory, developed for cinema, assumes the first topology: an editor controlling the sequence and timing of collisions. But the session circle suggests a second kind of montage, one where the collisions are emergent rather than composed, and the compositional intelligence is distributed across the ensemble rather than concentrated in a single author.

XR can build both. But the second, the emergent, self-organizing montage of the session circle, is what the medium uniquely enables. In cinema, the editor has total control. In XR, the participant moves freely through a field of simultaneous elements, and the collisions they produce are not determined by a designer but negotiated in real time between the participant's body and the environment. The participant is not receiving a montage. They are performing one, the way I perform one every time I sit in a session circle and let the amoeba assemble itself around me.

II. The Oldest Montage Technology

Peter Meineck's Theatrocracy argues that the Theater of Dionysus was precisely this kind of simultaneous collision engine: the original spatial montage technology, more than two thousand years before Eisenstein had the concept. Fifteen thousand Athenian citizens gathered in a bowl-shaped amphitheater, looking down at masked performers who enacted stories of moral crisis drawn from the same mythic tradition I was studying at Columbia. The architecture of the theater was designed so that every spectator could see not only the performance but also the other spectators. You watched the play and you watched Athens watching the play. You heard the chorus's voice reverberating off stone and you felt the crowd's breath around you. You processed the narrative and you processed the collective processing of the narrative. Multiple streams, simultaneously.

The masks were montage elements. A single actor wearing the mask of Agamemnon carried in his body the simultaneous presence of the performer and the character: two realities occupying the same coordinates, the audience's perception oscillating between them. The mask did not hide the performer. It created a collision between the human face you knew was underneath and the archetypal face you saw. The meaning (the specific emotional and political charge of the performance) was produced by the friction between those two layers. Neither the actor nor the mask alone produced it. The collision did.

This is the thaumatrope at civic scale. The performer on one side of the disc, the mask on the other, and the character (the thing the audience experiences) produced by the spinning between them. Meineck understands that the theater's power was not representational but experiential: the space restructured the bodies that entered it through synchronized spectatorship, shared acoustics, the collective experience of watching and being watched watching. The proscenium arch, which arrived centuries later, destroyed this by separating audience from performer and audience from audience. The montage collapsed into monologue. The vertical became horizontal.

III. Eisenstein's Collision

Sergei Eisenstein's montage theory begins with a simple, radical claim: meaning arises from the collision of elements, not from the elements themselves. Shot A means nothing. Shot B means nothing. But Shot A followed by Shot B produces an idea that neither shot contains. This is not additive: A plus B does not equal AB. It is dialectical: A colliding with B produces C, something qualitatively new.

Eisenstein identified five types of montage, each producing a different kind of collision. Metric montage cuts according to absolute time intervals, creating rhythm. Rhythmic montage follows the visual rhythm within the frame rather than the clock. Tonal montage works through the emotional tone of the image. Overtonal montage layers multiple tonal qualities. And intellectual montage juxtaposes concepts, the most famous example being the sequence in October that intercuts images of a mechanical peacock with footage of Kerensky ascending a staircase, producing the idea of vanity without either image stating it.

What matters for our purposes is not the taxonomy itself but the underlying principle: montage is a compositional practice. It composes meaning from the relationships among elements. The editor's art is not choosing the right shots but choosing the right collisions: the adjacencies that produce ideas neither element contains alone.

The Kuleshov effect demonstrates this experimentally. The same footage of an actor's neutral face, intercut with a bowl of soup, a dead child, and an attractive woman, produced three different emotional readings: hunger, grief, desire. The face did not change. The collision did. The audience's experience was not in the image. It was in the cut.

The Talmud has been doing this for fifteen hundred years. A page of Talmud is a designed montage surface: the Mishnah at the center, the Gemara surrounding it, Rashi's commentary running down one margin and the Tosafot down the other, later commentators crowding the edges. The reader's eye performs cuts between these voices, and the meaning of any passage is produced not by what any single commentator says but by the collision between their readings. The Mishnah states a law. The Gemara questions it. Rashi explains. The Tosafot disagree with Rashi. The reader holds all four voices simultaneously, and the understanding (the thing that actually forms in the reader's mind) exists in none of them individually. It is produced by the collision. Eisenstein's intellectual montage, stated as hermeneutic architecture, running on parchment instead of celluloid.

Gilles Deleuze, in Cinema 1 and Cinema 2, reframes Eisenstein's montage through a more fundamental distinction. The movement-image organizes perception through sensorimotor links: a situation provokes an action, the action transforms the situation, and the cycle continues. This is paratactic montage: this happens, then that happens, then this happens. It works through sequence and causation.

The time-image emerges when those sensorimotor links break. Characters find themselves in situations they cannot act on. Time is no longer subordinated to movement; it surfaces directly. The time-image does not show what happens next; it shows time itself (duration, memory, anticipation) as a force the characters and the audience must inhabit rather than traverse.

The time-image is hypotactic. It holds multiple temporal layers in suspension (this moment, the memory it evokes, the future it anticipates) and asks the viewer to feel the relationships among them rather than follow a sequential chain. Deleuze's distinction maps onto Alison's parataxis and hypotaxis: the movement-image is paratactic (additive, sequential, one-thing-after-another), while the time-image is hypotactic (embedded, recursive, the-part-contains-the-whole). The pit orchestra is hypotactic. The Talmud page is hypotactic. The Theater of Dionysus is hypotactic. Every moment contains all the other moments folded inside it.

IV. The Cut Migrates to the Body

In cinema, the cut is the editor's tool. It happens between frames, at a precise moment chosen by someone who controls the timeline. The viewer receives the cut passively; it structures their attention without their participation.

In XR, the cut migrates to the body.

The participant's head turn is a cut: a transition from one visual field to another, initiated by the participant at a moment of their choosing. A step forward is a cut: a shift in perspective, in scale, in what's visible and what's occluded. A reaching gesture is a cut: a decision to engage with one element rather than another, to foreground this and background that. Every movement is an editorial decision, whether the participant knows it or not.

This means that in XR, the participant is simultaneously the viewer and the editor. They are experiencing montage and performing it at the same time. The designer does not control the sequence of cuts; they compose the environment in which cuts will occur. This is composition, not direction. It is closer to what a composer does (writing the score that the orchestra will perform) than to what a film editor does.

Abraham Burickson, in his work on frames, provides the crucial concept for understanding how this composition works in practice. Frames are "the lines we draw around a thing so we know what it is and what it is not." The proscenium arch is a frame. A museum wall is a frame. A change in lighting from one room to the next is a frame. And each frame structures a different kind of attention: it tells the body what kind of experience it is having, what responses are appropriate, what meanings are available.

In XR, frames are spatial. A doorway is a frame. A change in ambient sound is a frame. A shift in floor texture is a frame. A boundary between biomes is a frame that the participant crosses with their whole body, not just their eyes. And each frame crossing is a cut: a collision between the experiential state on one side and the experiential state on the other. The designer of a spatial montage is, fundamentally, a designer of frames. They are deciding where the cuts can happen, what kinds of collision are possible at each transition, and how thick or porous each frame boundary should be.

A thick frame (a dramatic threshold, a long corridor, a blackout between scenes) creates a hard cut. The collision is sharp, the juxtaposition stark. A thin frame (a gradual shift in lighting, a slow crossfade of ambient sound, a permeable scrim) creates a dissolve. The participant may not even register the transition consciously; they simply find themselves in a different experiential state, the way you find yourself in a different mood without noticing the moment it shifted.
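The thick/thin distinction can be sketched as a single blend parameter. This is a minimal illustration, not an engine API. The function `transition_weight` and its `porosity` argument are hypothetical names. Positions are assumed to be meters along a walking path, and the linear ramp is only one possible easing curve. A porosity of zero models the thick frame: the experiential state flips at the threshold, which is a hard cut. A larger porosity models a thin, permeable frame whose two states mix over that many meters, which is a dissolve.

```python
def transition_weight(position: float, threshold: float, porosity: float) -> float:
    """Weight of the experiential state on the far side of a frame.

    Returns 0.0 while the participant is fully in the old state and 1.0 once
    they are fully in the new one. Names and units are hypothetical.
    """
    if porosity <= 0:
        # Thick frame: no mixing zone, so the state switches at the threshold (hard cut).
        return 1.0 if position >= threshold else 0.0
    # Thin frame: a linear ramp centered on the threshold, `porosity` meters wide (dissolve).
    t = (position - (threshold - porosity / 2)) / porosity
    return min(1.0, max(0.0, t))


# Walk past a threshold at 5 m, once through a thick frame and once through a 2 m porous one.
for x in (3.5, 4.5, 5.0, 5.5, 6.5):
    print(x, transition_weight(x, 5.0, 0.0), transition_weight(x, 5.0, 2.0))
```

In practice the returned weight might drive a lighting crossfade or an audio mix between the two rooms. Frame design then becomes a choice of where each threshold sits and how porous it is.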

V. The Folk Corpus as Modular Montage

Before I understood any of this theoretically, I was performing it musically, not just in the pit but in the folk tradition itself. The Appalachian murder ballad corpus, which I discuss at length in Essay 11, is a montage system at civilizational scale. The same lyric fragments ("down by the river," "she knelt down beside him," "the flowers they grew over her grave") migrate between songs, and each time a fragment appears in a new song, it carries the ghost of every other song that used it. The listener who knows the tradition hears the collision: this phrase in this context, but also this phrase in that other context, and the meaning produced by the friction between the two appearances.

This is Eisenstein's intellectual montage stated as folk practice. The lyric fragment is the shot. The song is the sequence. The listener's knowledge of the tradition is the editorial intelligence that perceives the collision. And Murder Ballad: The Game, the card-based generative songwriting system I built from this insight, is a montage engine. Each card carries a lyric fragment. The spatial arrangement of cards on the table produces the narrative. The player is simultaneously the performer and the editor, assembling collisions from a finite set of modular elements. The tarot spread as editing room. The folk tradition as montage archive.

VI. The Viewer Who Is Also the Editor

Here is the fundamental challenge of spatial montage: the participant is both viewer and editor, and the designer cannot control their cuts.

In film, the director has absolute control over the sequence and timing of cuts. Hitchcock's genius, as Hasson's neural coupling research shows, is precisely this control: the ability to orchestrate the audience's brain activity by determining exactly what they see and when. In XR, the designer gives up this control. The participant chooses where to look, when to move, how long to dwell. The designer can influence these choices through spatial composition, lighting, sound, and the placement of attention magnets, but they cannot determine them.

This is not a limitation. It is the medium's specific power.

Jesse Schell, writing about theme park design, calls it "the art of the indirect." The guest at a theme park is not told where to go or what to look at. The designer shapes the environment so that the guest's natural curiosity, their embodied tendencies (to follow light, to approach sound, to explore openings, to move toward the novel), lead them through a composed sequence of experiences. The composition is real. The editorial control is indirect. The guest feels free while navigating a designed field of possibilities.

Alison, writing about fiction, notes that the reader of a novel travels "not just through places conjured in the story, but through the narrative itself." Cognitive psychologist Rolf Zwaan's work shows that readers construct spatial models of narrative environments in working memory: they "walk" through the story in a way that activates motor and spatial processing areas. The path each reader traces through the text is unique, shaped by reading speed, attention, rereading, skipping, dwelling. In XR, this movement is not virtual. The body actually moves through space, and the shape it traces is visible, recorded in tracking data, inscribed in the path the participant's feet describe on the floor.

The designer's job is not to determine this shape but to compose the environment so that the shapes available to the participant are all meaningful ones. To design a space in which every path through it produces a coherent montage: not the same montage, but a montage that coheres. Fifteen rooms can be visited in more than a trillion different orders (15! is about 1.3 trillion), and every ordering tells a story. This is the dependency graph problem that Essay 7 addresses, and it is also the Seder's solution: a fixed architecture that generates infinite personal experiences because the architecture is designed for traversal, not for a single correct path.
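The arithmetic behind counts like this is a factorial. If a participant visits each of n rooms exactly once, in any order, there are n! possible sequences. The sketch below also uses brute force to show how ordering constraints, of the kind a dependency graph imposes, shrink that space. The `must_precede` pairs are hypothetical examples and are not drawn from Essay 7. Brute-force enumeration is only practical for a handful of rooms.

```python
import math
from itertools import permutations


def unconstrained_orderings(n_rooms: int) -> int:
    """Every room visited exactly once, in any order: n! sequences."""
    return math.factorial(n_rooms)


def constrained_orderings(n_rooms: int, must_precede: list[tuple[int, int]]) -> int:
    """Count visit orders that respect every (earlier, later) room pair.

    Enumerates all n! permutations, so keep n small.
    """
    count = 0
    for order in permutations(range(n_rooms)):
        position = {room: i for i, room in enumerate(order)}
        if all(position[a] < position[b] for a, b in must_precede):
            count += 1
    return count


print(unconstrained_orderings(13))  # 6227020800
print(unconstrained_orderings(15))  # 1307674368000
print(constrained_orderings(4, [(0, 1)]))          # 12
print(constrained_orderings(4, [(0, 1), (1, 2)]))  # 4
```

Roughly six billion orderings corresponds to thirteen rooms, and fifteen rooms give roughly 1.3 trillion. Each added precedence constraint removes a share of those orderings, which is why the design question shifts from "which path?" to "which constraints?"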

VII. Spatial Montage Techniques

With the theoretical framework in place, here are specific techniques for composing spatial montage in XR:

Proximity-triggered reveals align discovery with locomotion. Approaching an object causes related layers to fade in (context, history, associated memories). The participant's curiosity becomes the editorial impulse: moving toward something is choosing to cut to it. The collision between the object as initially perceived and the object as revealed through proximity creates the montage effect.
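A proximity-triggered reveal reduces to mapping distance onto opacity. In this sketch, each hypothetical `RevealLayer` starts fading in at one radius and is fully visible at a smaller one. Nested radii then produce a staged reveal: context first, then history, then memories. The radii are placeholder values in meters. In a real XR runtime, the participant's position would come from head tracking every frame.

```python
import math
from dataclasses import dataclass


@dataclass
class RevealLayer:
    name: str
    start_radius: float  # meters: the layer begins to fade in inside this distance
    full_radius: float   # meters: the layer is fully opaque inside this distance


def layer_opacity(layer: RevealLayer, participant: tuple[float, float],
                  anchor: tuple[float, float]) -> float:
    """Opacity in [0, 1] for one layer, given participant and object floor positions."""
    d = math.dist(participant, anchor)
    if d >= layer.start_radius:
        return 0.0
    if d <= layer.full_radius:
        return 1.0
    # Linear fade between the two radii: closer means more revealed.
    return (layer.start_radius - d) / (layer.start_radius - layer.full_radius)


# Hypothetical nested layers around one object placed at the origin.
LAYERS = [
    RevealLayer("context", start_radius=4.0, full_radius=2.5),
    RevealLayer("history", start_radius=2.5, full_radius=1.5),
    RevealLayer("memories", start_radius=1.5, full_radius=0.75),
]

for layer in LAYERS:
    print(layer.name, layer_opacity(layer, (2.0, 0.0), (0.0, 0.0)))
```

At two meters, the context layer is fully present, history is half revealed, and memories are still hidden. Each step closer is a cut to a deeper layer.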

Gaze-directed branching uses eye tracking to route attention. What you look at grows; what you ignore fades. The participant edits the experience with their gaze, but unlike a film editor, they are making cuts they may not be consciously aware of. The gaze as unconscious editorial intelligence: the body choosing before the mind decides.
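One way to read "what you look at grows; what you ignore fades" is as a per-element salience value, updated each frame from the gaze target. The sketch assumes the eye tracker has already resolved gaze to an element id, or to `None`. The growth and fade rates, the floor, and the commit threshold are invented parameters. Committing to a branch once salience crosses a threshold is only one possible branching rule.

```python
from typing import Optional


def update_salience(salience: dict[str, float], gazed: Optional[str], dt: float,
                    grow_rate: float = 0.5, fade_rate: float = 0.1,
                    floor: float = 0.1) -> dict[str, float]:
    """Return updated salience: the gazed element grows, every other element fades."""
    updated = {}
    for element, value in salience.items():
        if element == gazed:
            value = min(1.0, value + grow_rate * dt)
        else:
            # Ignored elements fade but never vanish entirely, so they can be rediscovered.
            value = max(floor, value - fade_rate * dt)
        updated[element] = value
    return updated


def committed_branch(salience: dict[str, float], threshold: float = 0.95) -> Optional[str]:
    """Return the most salient element if it has crossed the threshold, else None."""
    best = max(salience, key=salience.get)
    return best if salience[best] >= threshold else None


state = {"portrait": 0.5, "window": 0.5, "clock": 0.5}
for _ in range(5):  # five one-second frames spent looking at the portrait
    state = update_salience(state, "portrait", dt=1.0)
print(state, committed_branch(state))
```

The participant never presses a button. Lingering attention alone pushes the portrait over the threshold, which is the "body choosing before the mind decides" in miniature.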

Ambient layering creates vertical montage through environmental sound, light, and atmospheric effects that shift as the participant moves through space. Unlike triggered events, which are paratactic (this happens, then that happens), ambient layers are hypotactic: they create a continuous field of simultaneous information that the participant samples through their position and orientation. This is the pit orchestra as environmental design.
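Ambient layering can be sketched as a position-dependent mix. Every layer plays all the time, and the participant's location sets each layer's share. Inverse-distance weighting is an assumption chosen here for brevity, since spatial audio engines use their own attenuation models. Orientation is ignored, and the layer names and coordinates are hypothetical.

```python
import math


def ambient_mix(position: tuple[float, float],
                sources: dict[str, tuple[float, float]],
                power: float = 2.0, eps: float = 1e-6) -> dict[str, float]:
    """Normalized gain for each ambient layer; the gains always sum to 1."""
    weights = {}
    for name, source_pos in sources.items():
        d = math.dist(position, source_pos)
        # Nearer layers weigh more; eps avoids dividing by zero at a source.
        weights[name] = 1.0 / (d ** power + eps)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


SOURCES = {"river": (-2.0, 0.0), "choir": (2.0, 0.0), "wind": (0.0, 3.0)}
print(ambient_mix((0.0, 0.0), SOURCES))  # every layer audible, in different proportions
print(ambient_mix((1.8, 0.2), SOURCES))  # the choir dominates, the others persist
```

Because the gains are normalized rather than gated, no layer ever switches off outright. The participant samples a continuous vertical chord instead of triggering discrete events.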

Simultaneous events place multiple unfolding narratives in the same space, as in Punchdrunk's Sleep No More. The participant cannot see everything; their path through the space creates a unique editorial sequence from a field of simultaneous possibilities. The meaning is not in any single narrative thread but in the specific montage each participant's body composes by moving through the field. Six floors, dozens of rooms, multiple storylines unfolding in parallel: the participant is Eisenstein's editor, cutting between them with their feet.
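The claim that each participant's path composes a unique editorial sequence can be made concrete with a toy model. Each room has its own parallel scene schedule, and the participant's path is sampled as (minute, room) pairs. The schedule below is invented for illustration and does not describe the actual Sleep No More production. The point is only that one field of simultaneous events yields a different cut list for each path.

```python
from typing import Optional

SCHEDULE = {  # hypothetical: room -> [(start_minute, scene)], sorted by start time
    "ballroom": [(0, "banquet"), (20, "dance"), (40, "murder")],
    "chapel": [(0, "vigil"), (25, "wedding")],
    "study": [(0, "letter"), (30, "confession")],
}


def scene_at(room: str, minute: float) -> Optional[str]:
    """Scene playing in `room` at `minute`, or None before the first scene starts."""
    current = None
    for start, scene in SCHEDULE[room]:
        if start > minute:
            break
        current = scene
    return current


def witnessed_sequence(path: list[tuple[float, str]]) -> list[tuple[str, Optional[str]]]:
    """Collapse (minute, room) samples into the cuts the participant actually made."""
    sequence = []
    for minute, room in path:
        shot = (room, scene_at(room, minute))
        if not sequence or sequence[-1] != shot:  # a new (room, scene) pair counts as a cut
            sequence.append(shot)
    return sequence


path_a = [(0, "ballroom"), (10, "ballroom"), (22, "chapel"), (35, "study"), (45, "ballroom")]
path_b = [(0, "study"), (26, "chapel"), (41, "ballroom")]
print(witnessed_sequence(path_a))
print(witnessed_sequence(path_b))
```

Both participants end at the same murder. Participant A reaches it after a vigil and a confession, while participant B reaches it after a letter and a wedding, so the same final image collides with different predecessors.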

Threshold composition uses Burickson's frames as structural cuts. Each transition between spaces (each doorway, each change in floor texture, each shift in lighting) is a designed collision between experiential states. The designer composes these thresholds the way a film editor composes cuts: for rhythm, for contrast, for the specific idea that the collision between adjacent states will produce.

VIII. Measurement: Did the Montage Work?

How do we evaluate spatial montage?

Path topology can be analyzed: what shape does the participant's movement describe? Linear, paratactic, sequential? Or complex, hypotactic, returning, spiraling? More complex paths generally indicate deeper engagement: the participant is performing more cuts, creating more collisions, exploring more adjacencies.
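Path topology can be approximated from tracked floor positions with simple geometric measures. The two used here are assumptions, not established engagement metrics:

- Tortuosity is path length divided by straight-line displacement. It is 1.0 for a straight walk and unbounded when the path ends where it began.
- Total absolute turning accumulates for spirals and loops.

Positions are hypothetical (x, z) floor coordinates in meters, sampled in time order.

```python
import math


def path_metrics(points: list[tuple[float, float]]) -> dict[str, float]:
    """Summarize the shape of a walked path from floor samples (x, z) in meters."""
    length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    displacement = math.dist(points[0], points[-1])
    # Tortuosity: 1.0 for a straight line, larger for winding paths, inf for closed loops.
    tortuosity = length / displacement if displacement > 0 else math.inf
    # Heading of each segment, skipping samples where the participant stood still.
    headings = [math.atan2(b[1] - a[1], b[0] - a[0])
                for a, b in zip(points, points[1:]) if a != b]
    turning = 0.0
    for h0, h1 in zip(headings, headings[1:]):
        # Wrap the heading change into [-pi, pi) before taking its magnitude.
        delta = (h1 - h0 + math.pi) % (2 * math.pi) - math.pi
        turning += abs(delta)
    return {"length": length, "displacement": displacement,
            "tortuosity": tortuosity, "turning_rad": turning}


straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
loop = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
print(path_metrics(straight))
print(path_metrics(loop))
```

The straight walk scores the minimum on both measures. The closed square scores infinite tortuosity and three right-angle turns, the kind of returning, hypotactic shape the paragraph above describes.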

Collision density measures how many meaningful juxtapositions the participant encounters. A collision occurs whenever the participant's movement or gaze creates an adjacency between elements that produce a third meaning through their juxtaposition. Higher collision density suggests a richer montage: the environment is producing more ideas per unit of traversal.
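Collision density depends on something only the designer can supply: which adjacencies are meant to be productive. This sketch makes three assumptions:

- The designer has tagged those pairs in advance.
- Two elements count as juxtaposed when the participant attends to both within a time window.
- Density is the number of distinct tagged collisions divided by meters walked.

The element names, the 10-second window, and the per-meter normalization are all illustrative choices.

```python
def collision_density(events: list[tuple[float, str]],
                      meaningful_pairs: set[frozenset[str]],
                      distance_m: float, window_s: float = 10.0) -> float:
    """Distinct designer-tagged collisions encountered per meter traversed.

    events: (time_s, element) attention events in time order, e.g. gaze hits.
    """
    found = set()
    for i, (t_a, a) in enumerate(events):
        for t_b, b in events[i + 1:]:
            if t_b - t_a > window_s:
                break  # events are time-ordered, so later ones are even further away
            pair = frozenset((a, b))
            if pair in meaningful_pairs:
                found.add(pair)
    return len(found) / distance_m if distance_m > 0 else 0.0


PAIRS = {frozenset({"peacock", "staircase"}), frozenset({"soup", "face"}),
         frozenset({"peacock", "face"})}
events = [(0.0, "peacock"), (4.0, "staircase"), (30.0, "soup"), (33.0, "face")]
print(collision_density(events, PAIRS, distance_m=20.0))
```

In this log, the peacock and the face are both tagged as a productive pair, but they are seen too far apart in time to collide. Widening the window would count that pair too, so the window is itself a design decision about how long an image stays "live."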

Dwell patterns reveal what the participant found compelling enough to stop for. In temporal montage, the editor controls pacing. In spatial montage, the participant controls pacing through their dwelling behavior. Dwell patterns are the participant's intuitive assessment of where the collisions are richest.

Return frequency measures how often the participant revisits spaces or elements. Returns are the spatial equivalent of rereading a passage: they indicate that the participant sensed depth, unresolved meaning, something worth circling back to. Returns are hypotactic behavior: the participant is folding earlier experience into later experience, creating the recursive structure that Deleuze's time-image demands.
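Dwell patterns and return frequency can both be read off the same zone log. This sketch assumes tracking data has already been reduced to time-ordered (time_s, zone) samples, for example by testing the headset position against hypothetical zone volumes. Dwell is the time from one sample to the next. A return is any re-entry into a zone after leaving it.

```python
from collections import defaultdict


def dwell_and_returns(samples: list[tuple[float, str]]):
    """Total dwell seconds per zone, and how many times each zone was re-entered."""
    dwell: dict[str, float] = defaultdict(float)
    for (t0, zone), (t1, _) in zip(samples, samples[1:]):
        dwell[zone] += t1 - t0  # the final sample has no successor, so no duration
    entries: dict[str, int] = defaultdict(int)
    previous = None
    for _, zone in samples:
        if zone != previous:  # crossing into a zone counts as an entry
            entries[zone] += 1
            previous = zone
    returns = {zone: n - 1 for zone, n in entries.items()}
    return dict(dwell), returns


log = [(0, "hall"), (5, "altar"), (25, "hall"), (30, "garden"),
       (40, "altar"), (55, "altar"), (60, "exit")]
dwell, returns = dwell_and_returns(log)
print(dwell)    # seconds per zone
print(returns)  # re-entries per zone
```

Here the altar draws both the longest dwell and a return. Under this reading, that marks it as the place where the participant sensed the richest collisions.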

And retrospective coherence asks: when the participant looks back at the experience, do they perceive a unified composition or a random walk? Does the "numinous shape" that Alison attributes to reading (the trace left in the mind by the movement through the text) emerge from the spatial montage? The ultimate measure of montage is whether the collisions compose into something the participant carries with them after the experience ends.

IX. What the Pit Taught Me

The pit orchestra dissolved the boundary between performing and experiencing. You were inside the music and you were the music. Your body was the instrument and the audience's body was the resonating chamber. The montage happened in the space between all of us (musicians, conductor, actors, audience), and nobody controlled it. Not the conductor. Not the composer. Not any single performer. The montage was an emergent property of all those streams colliding in a shared space at a shared moment.

I did not know, sitting in that pit, that I was learning the principles of spatial composition. I did not know that the simultaneous layering of musical streams, the vertical montage of melody and harmony and rhythm, and the felt sense of multiple temporal layers held in the body at once would become the foundations of my understanding of how to design experiences in XR.

But the body remembers what the mind does not name. And when I first put on a VR headset and moved through a designed space, hearing sounds from multiple directions, seeing events unfold at different distances, feeling the environment respond to my movement, I recognized the pit.

The cut had migrated from the timeline to the body. The montage had migrated from the editing room to the space. And the designer, like the conductor, like the composer, was not determining the experience but composing the conditions for experience to emerge from the collision of simultaneous streams.

This is the thaumatrope once more. The editor on one side of the disc, the viewer on the other, and the participant (the body that is both at once, cutting and being cut, composing and being composed) produced by the spinning between them. Eisenstein wanted this. Meineck showed it had already existed. The pit taught my body what both of them described. XR is the medium that finally lets us build it.