Visual memory in real-world scenes
The world does not fit inside isolated objects
Before entering the article, we can return to the body for a moment.
Eyes.
Breathing.
Neck.
Chest.
Feet.
Now, remember a scene.
A street.
A room.
A school.
A kitchen.
A face in the corner of the image.
An open window.
Light coming in.
An object on the table.
We do not remember the world as a collection of colored squares floating in emptiness.
We remember scenes.
Relations.
Depth.
Direction.
Texture.
Contrast.
Environment.
Possibility of action.
The article “A Population Vector Model of Visual Working Memory for Real-World Scenes”, by John E. Kiat and Steven J. Luck, enters exactly at this point: visual working memory needs to be understood in real-world scenes, not only in simplified artificial objects.
For BrainLatam2026, this article is a direct bridge to the ideas of internal 3D, attention, representational spaces, and 5D Body-Territory.
Because the lived world does not reach the body as an isolated object.
It arrives as a field.
The original question of the article
The central question of the article can be stated like this:
how does visual working memory represent complex real-world scenes?
Visual working memory is fundamental for navigating and interacting with complex environments.
The problem is that much of the traditional research in this area has used simple stimuli: discrete objects, isolated colors, easily separable shapes, and attributes defined by a single value.
Kiat and Luck begin precisely from this limit.
Classical models have been very useful for experimental control, but they struggle to explain how the brain temporarily stores complex natural scenes, similar to photographs, full of contours, continuous gradients, and spatial relations.
The question is valuable because it moves visual memory from an abstract laboratory toward a world closer to life.
After all, in daily life, we do not memorize only “a red object.”
We memorize:
a red object on a table,
in a dimly lit room,
near a person,
in front of a door,
inside an environment that may feel safe, strange, beautiful, threatening, or familiar.
Visual working memory does not only store items.
It temporarily sustains a world.
What the article actually investigated
The article proposes a population vector model to explain how real-world scenes can be represented in visual working memory.
The proposal is to represent a scene as a noisy vector of neural firing rates in one or more areas of the ventral visual pathway, estimated through a deep neural network model.
With this, the authors seek to bring behavior, brain activity, and computational modeling closer together in tasks that require storing naturalistic scenes in working memory.
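The idea of a scene held as a noisy vector can be made concrete with a minimal sketch in Python. This is not the authors' implementation. The feature vectors are random stand-ins for deep-network activations, and the names encode_scene, noise_sd, and n_units are hypothetical, chosen only for illustration. The sketch assumes that a remembered scene is its feature vector plus Gaussian noise, and that a probe is compared with the memory by cosine similarity.

```python
import numpy as np


def encode_scene(features, noise_sd, rng):
    """Return a noisy copy of a scene's feature vector (the memory trace)."""
    return features + rng.normal(0.0, noise_sd, size=features.shape)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
n_units = 512  # hypothetical number of model units

# Random non-negative values standing in for deep-network activations
# ("firing rates") evoked by two different scenes.
scene_a = rng.random(n_units)
scene_b = rng.random(n_units)

# Store scene A in memory as a noisy population vector.
memory = encode_scene(scene_a, noise_sd=0.2, rng=rng)

# Compare the memory trace with the original scene and with a different one.
sim_same = cosine_similarity(memory, scene_a)
sim_diff = cosine_similarity(memory, scene_b)
print(f"memory vs. same scene:      {sim_same:.3f}")
print(f"memory vs. different scene: {sim_diff:.3f}")
```

In this toy version, raising noise_sd brings the two similarities closer together, which is one simple way to think about memory imprecision for a whole scene rather than for a single attribute.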
The scientific substance of the article rests on five main elements:
naturalistic scenes;
behavior in memory tasks;
EEG;
representational similarity analysis;
deep neural networks as a computational approximation of visual representations.
This is an important point.
The article is not only arguing, philosophically, that real scenes are more complex.
It proposes a quantitative way to model that complexity.
And here lies its strength: bringing visual working memory closer to the real world without abandoning formal modeling.
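Representational similarity analysis, one of the five elements listed above, can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the pipeline used in the article: the data are simulated, the simulated EEG is a noisy linear mixture of the model features, and the helper names rdm, upper_triangle, and spearman are hypothetical. The logic is to build one dissimilarity matrix per representation and then ask how well the two matrices agree.

```python
import numpy as np


def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between rows."""
    return 1.0 - np.corrcoef(patterns)


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries as a flat vector."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def spearman(x, y):
    """Spearman correlation via ranks (assumes no ties in x or y)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


rng = np.random.default_rng(1)
n_scenes, n_units, n_channels = 20, 100, 32  # hypothetical sizes

# Simulated model features: one row per scene.
model_features = rng.normal(size=(n_scenes, n_units))

# Simulated EEG patterns: a random linear mixture of the features plus noise.
mixing = rng.normal(size=(n_units, n_channels))
eeg_patterns = model_features @ mixing + rng.normal(
    scale=5.0, size=(n_scenes, n_channels)
)

# Compare the geometry of the two representations.
model_vec = upper_triangle(rdm(model_features))
eeg_vec = upper_triangle(rdm(eeg_patterns))
rsa_score = spearman(model_vec, eeg_vec)
print(f"model-EEG RSA (Spearman): {rsa_score:.3f}")
```

The design choice worth noting is that the comparison happens between dissimilarity structures, not between raw values, so a network layer and a set of EEG channels can be compared even though they have different dimensions.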
The strength of the article
The strength of this article lies in confronting a deep methodological problem:
how can we study visual memory without impoverishing the visual world too much?
Many studies gained precision by using simple stimuli.
That is useful.
But the price was high.
Real scenes have spatial relations, depth, texture, semantics, salience, continuity, and context.
An isolated glass is not the same as a glass inside a kitchen.
An isolated chair is not the same as an empty chair in a hospital.
An isolated door is not the same as an open door at the end of a corridor.
The article by Kiat and Luck is strong because it attempts to update formal models so they can handle this complexity.
It seeks to predict both behavior and brain activity during the storage of naturalistic scenes.
For BrainLatam2026, this opens a larger question:
what if visual working memory is less like a drawer of objects and more like a temporary architecture of a world?
The local optimum of the article
The local optimum of the article lies in the computational and neural modeling of visual working memory for real-world scenes.
It is strong when it shows that complex scenes can be formalized without being reduced to a single simple attribute.
It is also strong when it uses deep neural networks and EEG as bridges between image, behavior, and brain activity.
But through the BrainLatam2026 lens, we can expand the question.
The article models how scenes are represented.
BrainLatam2026 asks:
where do these scenes live inside the body-territory?
And further:
when does a scene stop being an image and become bodily possibility?
Because a real scene is not only visual.
It is spatial.
Affective.
Motor.
Historical.
Territorial.
An image of a square may activate childhood.
A dark street may activate threat.
A classroom may activate shame.
A kitchen may activate belonging.
A landscape may activate enjoyment.
A hospital may activate fear.
The article measures visual memory.
BrainLatam2026 asks about the internal world that this memory organizes.
5D Body-Territory: scenes as internal spaces
In the 5D Body-Territory model, perception is a spatial abstraction produced by the transduction of stimuli.
A visual scene enters through the eyes, but it does not remain only as a retinal image.
It is transduced.
Reorganized.
Associated.
Prioritized.
Transformed into internal space.
This space has:
3D, movement, and qualia.
Internal 3D
The article deepens the 3D dimension because it works with real-world scenes.
A scene has depth.
Foreground and background.
Center and periphery.
Right and left.
Objects in relation.
Possible paths.
Barriers.
Texture.
Perspective.
The visual memory of a scene is not merely “storing pixels.”
It is temporarily sustaining an internal architecture.
BrainLatam2026 would ask:
which parts of the scene occupy the center of the body-territory?
what remains peripheral?
what feels near?
what feels distant?
which elements create depth, opening, or closure?
does the scene expand the field of action or narrow the body?
This is the point at which visual memory stops being only cognitive.
It becomes spatial.
Movement
The remembered scene also moves.
Even when the image remains still on the screen, the body-territory moves inside it.
Attention travels through the image.
It goes to the light.
Returns to the face.
Shifts toward the door.
Searches for threat.
Recognizes a path.
Compares with a previous memory.
Activates expectation.
Then the scene disappears.
But it leaves a trace.
A recently activated element returns more easily.
A salient region may dominate the field.
A threatening detail may hijack lived time.
In the BrainLatam2026 model, there is no separate axis of time.
Lived time is derived from the movement of internal spaces.
In visual memory, this means:
the time of the scene is the movement of attention inside the represented architecture.
An image may last seconds in the experiment.
But inside the body-territory, it may open childhood, fear, desire, disgust, longing, or future.
The clock measures exposure.
The body lives movement.
Qualia
A scene is never neutral for the body.
It has qualia.
It may be beautiful.
Heavy.
Familiar.
Strange.
Safe.
Threatening.
Confusing.
Wide.
Suffocating.
Desirable.
The same scene may be processed as a visual configuration, but lived as an affective world.
A classroom may be only “an interior with chairs” for one participant.
For another, it may be a memory of humiliation.
A street may be only an “urban scene.”
For another, it may be bodily vigilance.
A forest may be nature.
For another, it may be Tekoha.
The article works with visual representation.
BrainLatam2026 adds:
every represented scene carries a possibility of qualia.
And qualia change space.
Attention: what gains the right to exist in the field
Visual working memory does not store everything with the same strength.
Attention selects.
Prioritizes.
Organizes.
Erases.
Reactivates.
When the article critiques models based on simple objects, it touches something BrainLatam2026 considers decisive: the real world does not arrive in perfectly cut units.
Natural scenes have complex contours, continuous gradients, and spatial relations that challenge simplified models of memory.
In the body-territory, attention works as an internal diplomacy.
It decides:
what enters,
what remains,
what disappears,
what returns,
what threatens,
what calms,
what becomes a path,
what becomes noise.
But attention is not neutral.
It is modulated by hunger, sleep, trauma, desire, culture, language, fear, belonging, algorithm, school, family, and State.
A real scene is not seen by the eye alone.
It is seen by the whole body.
APUS: scene as field of action
APUS is extended proprioception.
It is territory entering through body position, space, architecture, distance, posture, and field of action.
A real scene does not ask only:
what am I seeing?
It asks:
where can I go?
can I pass through?
can I hide?
can I sit?
can I escape?
can I touch?
can I approach?
can I inhabit?
The visual memory of a real scene sustains possibilities for action.
It is a spatial preparation of the body.
That is why BrainLatam2026 would say:
visual memory of real-world scenes is temporary APUS.
A visual space maintained for a few seconds can guide movement, decision, search, avoidance, planning, and safety.
The article does not need to use this language to open this path.
It already gives us the foundation: real-world scenes require more ecological models of memory.
Tekoha: scene as internal state
Tekoha is extended interoception.
It is territory entering through the body’s internal states.
A scene can change breathing.
Accelerate the heart.
Relax the shoulders.
Tighten the jaw.
Produce belonging.
Produce vigilance.
A photograph of home can warm the chest.
An image of violence can close the body.
A forest scene can expand breathing.
A hospital scene can anticipate pain.
Visual working memory, in this sense, is not only storage.
It is momentary regulation of the body before a represented world.
That is why a BrainLatam2026 proposal would ask:
which Tekoha does each scene activate?
Does the scene expand safety or activate threat?
Does it open Zone 2 or hijack the body into Zone 3?
Does it produce enjoyment or vigilance?
Jiwasa: scenes are never only individual
A scene can also carry the collective.
A classroom carries school.
A church carries religion.
A stadium carries the crowd.
A square carries the city.
A house carries family.
A hospital carries State, care, and fear.
A football field carries shirt, opponent, belonging, and dispute.
An image of an urban periphery can be read by one body as home and by another as threat, depending on the Jiwasa that organized perception.
Visual memory of real-world scenes is also memory of belonging.
The article investigates representations of naturalistic scenes in visual memory.
BrainLatam2026 asks:
which Jiwasa is already inside the scene before the person even remembers it?
Because no social scene arrives empty.
It arrives with collective history.
With codes.
With inequality.
With racism.
With desire.
With advertising.
With architecture.
With algorithm.
With politics.
DNA Intelligence and Artificial Intelligence
This article also allows us to discuss DNA Intelligence and Artificial Intelligence.
DNA Intelligence is information lived in the body.
It is the body learning scenes.
Recognizing places.
Differentiating safety from risk.
Storing paths.
Creating internal maps.
Remembering faces, corners, doors, lights, threats, hiding places, openings.
It is life organizing space in order to continue living.
Artificial Intelligence appears as part of the methodological path: deep neural networks help estimate visual representations and bring computational models closer to human memory for real-world scenes.
This is powerful.
But BrainLatam2026 maintains the question:
is AI helping us understand DNA Intelligence, or is it replacing the body with a model?
A neural network can estimate patterns.
Compare images.
Model representations.
But it does not feel belonging.
It does not feel fear of the street.
It does not feel longing for home.
It does not feel the body relaxing before a landscape.
It does not live the cost of being in a territory.
AI organizes visual traces.
DNA Intelligence lives scenes.
The science of the future needs to let the two converse without confusing one with the other.
Generous decolonial critique
Like every study situated in a specific scientific context, this article opens space to ask how visual working memory for real-world scenes appears in Latin American, collective, and non-WEIRD contexts.
Which scenes do we use in experiments?
Scenes from where?
Produced by whom?
For which bodies?
A North American kitchen, a European street, a university room, a Brazilian favela, a Latin American public school, an Indigenous village, and a bus terminal are not equivalent from a body-territorial point of view.
They may all be “real-world scenes.”
But they do not activate the same 5D spaces.
They do not carry the same Jiwasas.
They do not produce the same qualia.
They do not organize the same Tekoha.
BrainLatam2026 does not ask this to diminish the article.
It asks this to expand its strength.
If we want to study real-world scenes, we need to ask:
real for whom?
real in which territory?
real for which body?
BrainLatam2026 experimental proposal
From this article, BrainLatam2026 could propose an experiment:
How do real-world scenes from different territories reorganize visual working memory, the 5D body-territory, and the sense of belonging?
Possible design:
participants from different social and territorial contexts;
scenes of home, school, street, hospital, square, church, work, transportation, and nature;
familiar and unfamiliar scenes;
safe and threatening scenes;
scenes with high and low social density;
comparison between standardized images and images collected in the participants’ own territories.
Measures:
EEG for fast dynamics of visual representation and attention;
fNIRS for prefrontal hemodynamics during scene maintenance in memory;
eye-tracking for salience, visual search, and attentional return;
HRV/RMSSD for autonomic regulation;
breathing for rhythm and lived time;
GSR for alertness;
facial/jaw EMG for tension;
phenomenological report for qualia, Tekoha, and belonging;
computational analysis of images with artificial vision models;
analysis of the Jiwasa associated with each scene.
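Among the measures above, RMSSD has a simple standard definition: the square root of the mean of the squared differences between successive RR intervals. A minimal sketch in Python follows. The function name rmssd and the RR values are hypothetical, and a real recording would need artifact and ectopic-beat correction before this step.

```python
import numpy as np


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    rr = np.asarray(rr_intervals_ms, dtype=float)
    if rr.size < 2:
        raise ValueError("need at least two RR intervals")
    successive_diffs = np.diff(rr)
    return float(np.sqrt(np.mean(successive_diffs ** 2)))


# Hypothetical RR intervals (ms) recorded while one scene is held in memory.
rr_during_scene = [800, 810, 790, 805, 795]
print(f"RMSSD: {rmssd(rr_during_scene):.2f} ms")
```

Computed per scene or per condition, a value like this could be compared between familiar and unfamiliar scenes, always as one signal among the others and never as a direct reading of Tekoha.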
The question would not be only:
which scene was remembered better?
The question would be:
which scene occupied more space in the body-territory?
And further:
which scenes expand Zone 2?
which scenes hijack the body into Zone 3?
which scenes produce belonging?
which scenes activate vigilance?
which scenes return more easily after an attentional shift?
DANA and care with images
This type of research also requires DANA.
Images are data.
But images are also territory.
A scene of an urban periphery can become scientific data.
But it can also become stereotype.
An image of a public school can become an experimental stimulus.
But it can also carry inequality.
A scene of home can reveal intimacy.
An image of a street can expose vulnerability.
DANA asks:
who chooses the images?
who authorizes them?
who interprets them?
who benefits?
does the image care for the body-territory or capture its world?
In the study of real-world scenes, ethics is not only consent.
It is visual diplomacy.
Closing
The article by Kiat and Luck matters because it tries to bring visual working memory closer to the world as it appears: complex, continuous, relational, and full of scenes.
It proposes a population vector model for real-world scenes and opens a path for thinking about how behavior, brain activity, and computational modeling can enter into dialogue when the stimulus stops being an isolated object and becomes a visual world.
For BrainLatam2026, this article opens an essential path:
the world does not fit inside isolated objects.
Memory does not either.
Perception does not either.
Consciousness does not either.
Real-world scenes are representational spaces.
They enter the body as 3D.
They move attention and lived time.
They carry qualia.
They activate APUS.
They reorganize Tekoha.
They summon Jiwasa.
And they can be approached by AI, as long as DNA Intelligence remains at the center.
The question that remains is:
if visual memory stores scenes, what worlds are we placing inside bodies when we choose what science calls a stimulus?
Highlighted reference
Commented article:
Kiat, J. E., & Luck, S. J. (2026). A Population Vector Model of Visual Working Memory for Real-World Scenes. Journal of Experimental Psychology: General, 155(5), 1257–1281. DOI: 10.1037/xge0001921.
This article is the main basis for this BrainLatam2026 commentary. From its proposal of a population vector model for visual working memory in real-world scenes, we expand the discussion toward internal 3D, attention, representational spaces, 5D Body-Territory, visual APUS, Tekoha of scenes, Jiwasa of images, DNA Intelligence, and the question of how to study visual memory without reducing the lived world to isolated objects.