Since the early days of western philosophy, sight (vision) has often been understood as the noblest and highest of the five senses (the others are hearing, smell, taste, touch). Plato, for instance, wrote in his Timaeus (1997, 47a) that “our sight has indeed proved to be a source of supreme benefit to us” because of its close connection to the soul (the mind’s eye), intellectual activity, and the ways in which we inquire into and understand the world around us.
For example, we usually use the expression ‘I see’ to express ‘understanding’ and in many ways regard the sense of vision as the most important. It is therefore no surprise that today’s society is increasingly becoming visually stimulated and determined. Think, for example, of all the pictures and (moving) images you encounter on a daily basis in a host of media: from paintings, drawings, and cinema to the visuals in advertisements and those on ubiquitous social media platforms such as Instagram and Snapchat. On Tinder, a dating app, one decides the romantic fate of a whole person by swiping their image either to the left, relegating them to the dust heap of history, or to the right, signalling interest. This move toward the social rule of images was not lost on the German philosopher Martin Heidegger (1889-1976), who stated in his essay “The Age of the World Picture” (1977, 130) that we live in a “world conceived and grasped as picture”. William J. T. Mitchell, a contemporary theorist of visual representation, has called this the visual or “pictorial turn”, stating that “The fantasy of a pictorial turn, of a culture dominated by images, has now become a real possibility on a global scale” (Picture Theory, 1994, 15).
In philosophy, as in daily life, seeing is intricately linked to understanding, with images (visual objects) functioning as preferred carriers of meaning and understanding. However, a closer look at visual practices demonstrates that the connection between sight, visual imagery, and understanding and interpreting them, is significantly more complex than one might think. Hence, a number of questions come to the fore that require answers, such as: do all people see images in exactly the same way? Are sight and the interpretation of its objects universally encoded with the same significance? When it comes to the anatomy of vision, can neural patterns change (significantly) or are they fixed? To examine these questions, this essay delves into a number of visual examples and explores their link to several important and recent concepts of seeing and understanding images.
Duck or Rabbit?
A cartoon by Paul Noth for a 2014 issue of The New Yorker magazine illustrates the extent to which a global, uniformly shared response to visual stimuli is at least questionable:
What do you see here? The head of a duck or a rabbit?
The duck-rabbit image is not new. Ludwig Wittgenstein famously used it many years ago in his Philosophical Investigations (2009, Part II, xi.125; PI), writing: “I see two pictures, with the duck–rabbit surrounded by rabbits in one, by ducks in the other. I don’t notice that they are the same. Does it follow from this that I see something different in the two cases?” Wittgenstein’s answer (PI, xi.129-30) was that you could indeed see them both, by quickly oscillating between one and the other, but never simultaneously. Thus, you would indeed be left with two different interpretations of what is seen. Wittgenstein called this visual experience ‘seeing as’ (also known as ‘aspect perception’): depending on which aspect of the image you are noticing or seeing now (the rabbit’s ears or the duck’s bill), you interpret it either as a rabbit or as a duck, while the image itself stays physically unchanged. Thus, the same unaltered image can look quite different despite being physically one and the same.
One may, however, wonder about images that don’t play upon optical illusions or are deemed ambiguous, like the duck-rabbit image. Do we see and interpret them differently, too? Art historians such as Erwin Panofsky (1892-1968) were especially interested in the ways we see images (or, indeed, culturally ‘coded’ artefacts and general performances) and adapt methods of visual interpretation accordingly. In his essay “Iconography & Iconology: An Introduction to the Study of Renaissance Art” (1955), Panofsky distinguished between pre-iconographical description, iconographical analysis, and iconological interpretation of visuals in art and lived experience. To illustrate this point, Panofsky offered the following example of an everyday experience: imagine taking a walk and seeing a man on the other side of the street tipping his hat towards you. By recognizing him as a man who performs an action, namely, taking off his hat, you have already taken the first descriptive, pre-iconographic step. Suppose, further, that you are European and familiar with many of its typical cultural customs and practices. In that case, you interpret the meaning behind the gesture of raising one’s hat as a salute and an indication of the other’s presumable friendliness and politeness towards you. These latter interpretations exemplify the levels of iconographical/iconological understanding.
The same progression seems to apply to images. In Panofsky’s view, the interpretations of images rely on recognizing the lines and colours as figures and objects (pre-iconographic level) and identifying them as specific figures and objects, based upon the viewer’s background knowledge of possible literary sources (e.g., biblical, mythological) and contexts (time, place, society, politics, etc.). Panofsky’s central message was that you need to be aware of the contexts related to images in order to see and properly decode their conceptual or representational/symbolic content and cultural codes.
Wittgenstein and Panofsky seem to share something here in their thinking. For both, seeing, understanding and interpreting an image involve possible contrary interpretations, influenced by differing aspects, contexts, or background knowledge. To illustrate this point further, let us consider the following emoji that is used in the world of WeChat, a highly popular Chinese instant messaging app:
How would you interpret this emoji?
This emoji is a stylized representation of a typical and frequent gesture. In China, the emoji’s caption reads ‘Salute’, suggesting, at various times, a person’s gratefulness for another person’s help or collaboration, thanking somebody, and indicating respect. In non-Chinese cultures, however, performing a salute in this fashion is uncommon, hence not a part of everyone’s background knowledge. Moreover, because of one’s unfamiliarity with the performance being represented here, one might even interpret the emoji in a way diametrically opposed to the Chinese usage, as threatening violence. Such an interpretation might occur to someone from a Western tradition, where a similar gesture is used to communicate the threat of conflict. The difference in backgrounds thus transforms the way the emoji is perceived and understood: instead of signalling collaboration, respect, and thanks, one sees the opposite. Moreover, with the broad validation of the fist bump during COVID, this gesture is again changing, further signalling the need for contextualization when attempting to decode what we see. This, then, brings us back to Panofsky’s point: knowing and recognizing the contexts (here, culture-based practices and uses of emoticons) guides one’s vision and allows one to grasp the representational and conceptual content of an image or cultural performance. It is context and background knowledge that enable us to see and understand their meanings.
What Colour is that Dress: Gold/White or Black/Blue?
Consider also the following example taken from the world of cyberspace.
In February 2015, the above image was posted on Facebook in order to help decide once and for all the question of the colour of the dress depicted, as there had been different interpretations earlier. The image went viral, and in the days following the post much of the virtual world heatedly debated the dress’s colour, leading to a global discussion about images and sight. Twitter and Buzzfeed were flooded with tweets and posts, with two factions bitterly opposing each other: those who saw the dress as gold and white and those who saw it as black and blue. In the end, Adobe, one of whose programmes had been used to render the image, stepped in and stated that the picture had indeed depicted one of the two colour combinations: the dress itself was black and blue. This, however, did not end the discussion; rather, the issue now moved on to technical inquiries of image usage in creating and maintaining communities of meaning. The example then clearly illustrates that when it comes to vision, most people will readily have an answer for inquiries about what they (think they) see. And indeed, communal and consensual living requires a minimal consensus of what people see. Yet, this example also demonstrates that, to many people’s surprise, this consensus is far from assured. Still today, the debate, formally known as “Dressgate,” rages on.
What is particularly interesting here is that the different ways of seeing the identical image of “The dress” have little to do with Wittgenstein’s concept of ‘aspect perception’: irrespective of the actual colour of the dress in reality, you cannot flip back and forth between two different perceptions in order to see both colour combinations. Similarly, Panofsky’s approach is of little help here: knowing and recognizing the underlying concepts/contexts doesn’t guide you in viewing the dress as either white/gold or black/blue. The differences in how the dress is perceived are not sensitive to such aspectual or cultural factors.
Furthermore, the perception of colours can be approached from various disciplinary angles, such as art history, medicine, philosophy, physics, and psychology. Many physicists, for instance, don’t think that physical objects are coloured in the ways we believe we see them. As the Australian philosopher Barry Maund (‘Color,’ 2022) put it: “Oceans and skies are not blue in the way that we naively think, nor are apples red (nor green)”. Instead, the colours of the objects we see depend upon the light source and the way in which light interacts with objects, as well as upon our eyes and brains. Some philosophers also argue that colours are response-dependent properties. In this view, what colour an object is depends not only on its intrinsic physical features, but also on how viewers, given specific circumstances, respond to this object. For example, an object is perceived as having a certain colour, say red, when a “normal” perceiver (as opposed to someone who wears colour-tinted glasses or is colour-blind), under certain circumstances, say natural daylight, responds to the object with the sensation of redness. Seeing colours is, hence, a relational phenomenon that connects perceivers (with specific capabilities) to an object (with specific properties) in a certain environment. If even one of the three related variables changes, so does the perception of colours (see Jackson and Pettit, “Response-Dependence without Tears,” 2002, for an account of this).
“The dress” might be a good case in point here: the image of the dress is framed in a particular way, in which the environmental light is ambiguous. Because that image frames the dress in such a way that we cannot unambiguously determine whether the room as a whole is dark or well-lit, we may see the dress as differently coloured. In much the same way that white objects appear a pale blue at night, our visual system may interpret the saturation and brightness of the dress as indicative of white and gold in a dark room. The photograph might also have been overexposed. This allows normal perceivers to react differently to it, seeing the dress as white/gold or black/blue. Thus, even in a case which, on the surface, appears to be clear-cut and as unambiguous as perceiving colours, things turn out to be made up of sophisticated and contingent or conditional relationships. The colours we see turn out to be a consequence of the ambient light in our environment, the way objects in this environment absorb, reflect, and refract this light, and the manner in which our visual system integrates or responds to this information.
Do We All See Images in Exactly the Same Way?
To return to our introductory question, in the light of the above, the answer is: probably not. Given the evidence and its philosophical discussion, it seems highly unlikely that humans share something like “stable” sight or “pure” vision that would allow them to see and interpret the world in exactly the same way. Or, to put it another way, we don’t see the world through naïve eyes (what the media theorist Friedrich Kittler called the “unarmed eye”); seeing an object is not sufficient for understanding it. Instead, our discussion of the visual examples above has revealed how differently one and the same image can be seen. The same image can invite different visual understandings and interpretations: aspectual, contextual, or relational (to highlight only a few). Awareness that our sight and the interpretation of the content and meaning of visuals are neither universally encoded nor as simple as we think is a first step towards navigating and understanding our image-laden world.
Susanne Beiweis is adjunct assistant professor at Mount Allison University. Holger Briel is Professor in the Division of Culture and Creativity at Beijing Normal University-United International College.