Summary: Object Recognition

Prepared by Group 4

Kevin Driscoll and Seth Stocking [additions by Frawley]

To understand the mind's perception of objects and space, one must look into edge detection, surfaces, the concept of geons, and a priori spatial knowledge.

How does the brain recognize objects?


Two main schools of thought

1.  Template matching 

2.  Structural description

What is template matching?

Template matching is the process by which the mind identifies objects by comparison to [a particular kind of] stored mental representation. According to the theory, the mind carries around a vast storehouse of images that can be compared with visual input. An object is identified by "matching" the mental image. [Thus, in classic form, template matching involves comparing the input with a pictorial, global, or non-decomposed representation of objects.]
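As a rough illustration of the idea, the sketch below stores a few whole-image templates and identifies an input by picking the template that agrees with it on the most pixels. The template names, the tiny 3x3 patterns, and the functions match_score and recognize are all hypothetical; nothing here is a claim about how the mind actually stores or compares images.

```python
# Toy template matcher. Every name and pattern here is hypothetical.
# Images are tiny binary grids written as strings ('#' = dark, '.' = light).

TEMPLATES = {
    "T": ["###",
          ".#.",
          ".#."],
    "L": ["#..",
          "#..",
          "###"],
    "O": ["###",
          "#.#",
          "###"],
}


def match_score(image, template):
    """Fraction of pixels on which the image and the template agree."""
    pixels = [(a, b)
              for row_i, row_t in zip(image, template)
              for a, b in zip(row_i, row_t)]
    return sum(a == b for a, b in pixels) / len(pixels)


def recognize(image, templates=TEMPLATES):
    """Return (label, score) for the best-matching stored template."""
    scores = {label: match_score(image, t) for label, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


seen = ["###",
        ".#.",
        ".#."]
print(recognize(seen))  # ('T', 1.0)
```

Note that the comparison is global and non-decomposed: the input is compared pixel by pixel against every stored picture, with no notion of parts.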

Problems with the template method

1.  Large amount of time required to scan mind for match

2.  Different template required for different viewpoints

3.  Match may not occur if object is changed (example: template of Coke can [might] fail to match a Coke can with a large dent in the side).

4.  Where do the templates come from?  Are they innate (Platonic ideals) or learned?

5.  If templates are learned, how are the objects recognized in the first place? (chicken-and-egg argument)
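Problems 2 and 3 can be made concrete with a small sketch. It assumes a single stored picture of a can, a pixel-agreement score, and an arbitrary acceptance threshold of 0.9; the grids, the threshold, and the function name are invented for illustration only.

```python
# Hypothetical 5x5 "can" template and two altered views of the same object.
CAN = [".###.",
       ".###.",
       ".###.",
       ".###.",
       ".###."]

DENTED = [".###.",
          ".##..",
          ".#...",
          ".##..",
          ".###."]

SHIFTED = ["###..",   # same can, moved one pixel to the left
           "###..",
           "###..",
           "###..",
           "###.."]


def match_score(image, template):
    """Fraction of pixels on which the image and the template agree."""
    agree = sum(a == b
                for row_i, row_t in zip(image, template)
                for a, b in zip(row_i, row_t))
    return agree / (len(template) * len(template[0]))


THRESHOLD = 0.9  # assumed acceptance criterion, chosen only for illustration

for name, view in [("intact", CAN), ("dented", DENTED), ("shifted", SHIFTED)]:
    score = match_score(view, CAN)
    print(f"{name:8s} score={score:.2f} recognized={score >= THRESHOLD}")
```

Under these assumptions only the intact view clears the threshold: the dent and the one-pixel shift both push the score below it, so a pure template scheme would need extra templates for each such variation.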


What is structural description?

The structural description theory of object recognition states that the mind breaks objects down into simple three-dimensional shapes and recognizes them by how these primitives (features that allow specific recognition) are put together.
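One way to picture a structural description is as a set of parts plus the relations between them, rather than a stored picture. The sketch below is only that: the part names, relation names, and the three stored objects are hypothetical, and exact set equality stands in for whatever matching process the theory actually requires.

```python
# Hypothetical structural descriptions: each object is a set of
# (part, relation, part) triples rather than a stored picture.
STORED = {
    "cup":  frozenset({("curved_handle", "attached_to_side_of", "cylinder")}),
    "pail": frozenset({("curved_handle", "attached_to_top_of", "cylinder")}),
    "lamp": frozenset({("cone", "on_top_of", "cylinder")}),
}


def recognize(description, stored=STORED):
    """Return the labels whose stored description equals the input description."""
    wanted = frozenset(description)
    return [label for label, desc in stored.items() if desc == wanted]


# The same two parts in a different arrangement name a different object.
print(recognize({("curved_handle", "attached_to_side_of", "cylinder")}))  # ['cup']
print(recognize({("curved_handle", "attached_to_top_of", "cylinder")}))   # ['pail']
```

Because the description mentions only parts and relations, it does not change with viewpoint or size, which is the advantage claimed over templates. How the parts are extracted from the image in the first place is left open here.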

Problems with Structural Description

1.  How detailed is the decomposition of objects?

2.  Where do the primitives come from?  Are they innate?  Are all the distortions of objects learned?

Regardless of how the final stage of object recognition works, there appears to be a reliable decompositional process by which visual input is transformed into a 3-D mental representation.

First Stage- Edge Detection (1 dimensional)

Humans are very good at recognizing definite boundaries. An edge can be defined as an area where a sharp change in brightness occurs (texture, color, etc. can be the same). Visual input, in the form of light waves, is received through the eye, filtered, and then enhanced. These processes can be represented mathematically, but may well occur in a deeper or more complicated way. The end result is a sharp difference between two areas of brightness. [More technically, edge detection is carried out by a mechanism that can be modeled by the calculation of zero-crossings, points in a regularized distribution of brightness values where the values pass through neutral (zero).]

Note: this system is evidenced by Mach bands, regions of a uniform surface that appear brighter or darker due to proximity to an edge. [More technically, Mach bands result from the lateral inhibition of edge-detection cells. Thus cells right at the boundary of a brightness change will not receive as much inhibition from one side.]
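Both ideas in this stage can be sketched on a one-dimensional brightness profile. The code below is a minimal sketch under stated assumptions: a [1, 2, 1]/4 smoothing kernel stands in for the filtering (regularizing) step, a discrete second difference stands in for the enhancement step, and the inhibition strength k = 0.25 is arbitrary. The function names and the brightness values are invented for the example.

```python
def smooth(signal):
    """Weighted 3-point average ([1, 2, 1] / 4), a crude stand-in for filtering."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i - 1] + 2 * padded[i] + padded[i + 1]) / 4
            for i in range(1, len(padded) - 1)]


def second_difference(signal):
    """Discrete second derivative; the two endpoints are set to 0."""
    d2 = [0.0] * len(signal)
    for i in range(1, len(signal) - 1):
        d2[i] = signal[i - 1] - 2 * signal[i] + signal[i + 1]
    return d2


def zero_crossings(values):
    """Indices i where values[i] and values[i + 1] have opposite signs."""
    return [i for i in range(len(values) - 1) if values[i] * values[i + 1] < 0]


def lateral_inhibition(signal, k=0.25):
    """Each cell's output is its input minus k times each neighbour's input."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [padded[i] - k * (padded[i - 1] + padded[i + 1])
            for i in range(1, len(padded) - 1)]


# A dark region (10) next to a bright region (50): one step edge.
brightness = [10, 10, 10, 10, 10, 50, 50, 50, 50, 50]

edges = zero_crossings(second_difference(smooth(brightness)))
print(edges)  # [4] -> the edge lies between positions 4 and 5

print(lateral_inhibition(brightness))
# [5.0, 5.0, 5.0, 5.0, -5.0, 35.0, 25.0, 25.0, 25.0, 25.0]
```

In the second output the cell on the dark side of the edge responds below its uniform neighbours (-5 versus 5) and the cell on the bright side responds above its neighbours (35 versus 25), which is the pattern described for Mach bands: each boundary cell receives unequal inhibition from its two sides.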

Second Stage- 2 1/2 Dimensional Sketch

Edges are combined to form an intermediate representation [of a surface]. According to Marr and Nishihara, this occurs in the form of a 2 1/2 dimensional sketch. Surfaces are oriented and their distance is defined, but they are not unified into a true 3D view. [More particularly, the intersection of edges makes a surface, and discontinuities in the surface produce a representation of a curved surface, a surface with depth cues: hence 2 1/2 D.]
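A 2 1/2 D sketch can be pictured as a viewer-centered grid that records, for each visible point, its distance, the local orientation of the surface, and whether a depth discontinuity lies next to it. The sketch below assumes a ready-made depth map and uses simple forward differences for orientation; the depth values, the jump threshold, and the function name are hypothetical.

```python
# Hypothetical viewer-centred depth map: distance from the eye at each image
# point. Columns 0-2 are a surface slanting away; columns 3-4 are a far wall.
depth = [
    [2.0, 3.0, 4.0, 9.0, 9.0],
    [2.0, 3.0, 4.0, 9.0, 9.0],
    [2.0, 3.0, 4.0, 9.0, 9.0],
]


def two_and_a_half_d_sketch(depth, jump=2.0):
    """Per-point depth, local slope (dz/dx, dz/dy), and a discontinuity flag."""
    rows, cols = len(depth), len(depth[0])
    sketch = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Forward differences; the last row/column reuses its own value.
            dx = depth[r][min(c + 1, cols - 1)] - depth[r][c]
            dy = depth[min(r + 1, rows - 1)][c] - depth[r][c]
            row.append({
                "depth": depth[r][c],
                "slope": (dx, dy),
                "discontinuity": abs(dx) > jump or abs(dy) > jump,
            })
        sketch.append(row)
    return sketch


sketch = two_and_a_half_d_sketch(depth)
print(sketch[0][0])  # slanted surface: slope (1.0, 0.0), no discontinuity
print(sketch[0][2])  # depth jumps from 4.0 to 9.0: discontinuity flagged
```

Everything stored here is relative to the viewer. Nothing in the grid says which points belong to the same object or what the object's hidden side looks like, which is why the representation falls short of a full 3D description.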

Third Stage- 3D Image

The boundaries of the initial edge detection and the curvature defined in the 2 1/2 D sketch are combined to create a 3D representation of the image. This is not a homuncular image your mind sees, but rather a cognitively impenetrable mental representation. This representation is identified by the processes discussed above. [More particularly, the 2 1/2D sketch feeds the construction of idealized 3D objects -- idealized cones, volumes swept around axes, geons -- that underlie full object recognition.]

What is a geon?

A geon is a three-dimensional shape that is mentally represented and used to construct mental representations of objects. The number of geons is finite and could be quite small if certain distortions (stretching, bounding, etc.) are built into their construction rules. [More particularly, Biederman has identified a small set of volumetric primitives that are said to underlie 3D object recognition. These are derived from a finite set of atoms and procedures. Their presence in experimental stimuli facilitates recognition; geons appear to give essential object-centered information. Problem: while 1D and 2 1/2 D seem reasonable enough computationally and physically, the status of geons as computational or neurocognitive objects seems less certain.]
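The claim that a small, finite set of geons can be derived from a finite set of atoms and procedures can be sketched as a simple enumeration. The attribute names and values below follow the way Biederman's scheme is commonly summarized (a cross-section swept along an axis); the exact labels are assumptions made for illustration, and the dictionary representation is hypothetical.

```python
from itertools import product

# "Atoms": a few properties of a cross-section swept along an axis.
# Labels are illustrative assumptions, not a definitive rendering of the theory.
ATTRIBUTES = {
    "cross_section_edge": ["straight", "curved"],
    "cross_section_symmetry": ["rotational_and_reflective", "reflective", "asymmetric"],
    "cross_section_size": ["constant", "expanding", "expanding_then_contracting"],
    "axis": ["straight", "curved"],
}


def enumerate_geons(attributes=ATTRIBUTES):
    """'Procedure': take every combination of one value per attribute."""
    names = list(attributes)
    return [dict(zip(names, combo)) for combo in product(*attributes.values())]


geons = enumerate_geons()
print(len(geons))  # 36

# On this reading, a cylinder is a constant, curved cross-section on a straight axis.
cylinder = {
    "cross_section_edge": "curved",
    "cross_section_symmetry": "rotational_and_reflective",
    "cross_section_size": "constant",
    "axis": "straight",
}
print(cylinder in geons)  # True
```

Four attributes with 2, 3, 3, and 2 values give 2 x 3 x 3 x 2 = 36 combinations, which shows how a handful of atoms can yield a small but usable vocabulary of volumes. Whether the visual system computes anything like this is exactly the open problem noted above.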

A Priori Spatial Knowledge

The commonly held belief, based on the research of Piaget, that babies learn the basic concepts of object permanence, physical knowledge, physical reasoning, and search through incremental processes may very well be wrong. [Piaget argues that cognition comes to overtake perception as children develop more and more abstract notions of the world and undergo mental revolutions as they construct their abstract knowledge in interaction with the world: certain stages precede other stages, and the knowledge available at the later stage is not available at the earlier one. For example, at some point, children do not have object permanence: out of sight, out of mind.] However, research by Baillargeon [and Spelke, and others] counters this view, and indicates that much of children's spatial knowledge may be innate [or, at least, present in full form much earlier than Piaget claimed]. For example, infants as young as 3.5 months have been shown to have a concept of object permanence. It seems reasonable, then, that this knowledge is a priori, that we are born with it. [A more detailed description of Piaget/Baillargeon contrasts will soon be posted.]