Thursday, August 20, 2009

Psychology and Vision: Relevancy

References:
[1] Rosch, E., Mervis, C., Gray, W., Johnson, D., and Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8(3):382-439.
[2] Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2):115-147.

Which paper is more relevant to Vision?

The paper by Rosch et al. defines, and argues for the existence of, “basic-level” categories: the categories that carry the most information, have the highest category cue validity, and are the most differentiated from one another. Categories one level more abstract are called superordinate categories; their members share only a few attributes with each other. Categories below the basic level are called subordinate categories; they contain many attributes that overlap with those of other categories. Their experiments on visual detection and priming of classification further show that the basic level is the most abstract level at which perceptual identification of an object can be aided.
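The notion of category cue validity can be made concrete with a small calculation. The sketch below assumes the usual reading in which the cue validity of an attribute for a category is the conditional probability P(category | attribute), and the cue validity of the whole category is the sum of those values over the category's attributes. The objects, attributes, and function names are invented for illustration and are not data from the paper.

```python
# Hypothetical toy data: (category, attributes) pairs, not taken from the paper.
OBJECTS = [
    ("chair", {"legs", "seat", "back"}),
    ("chair", {"legs", "seat", "back"}),
    ("table", {"legs", "flat_top"}),
    ("table", {"legs", "flat_top"}),
    ("lamp", {"base", "bulb"}),
]


def cue_validity(cue, category, objects):
    """P(category | cue): share of the objects showing the cue that are in category."""
    with_cue = [cat for cat, attrs in objects if cue in attrs]
    if not with_cue:
        return 0.0
    return with_cue.count(category) / len(with_cue)


def category_cue_validity(category, objects):
    """Sum of the cue validities of every attribute observed in the category."""
    cues = set().union(*(attrs for cat, attrs in objects if cat == category))
    return sum(cue_validity(cue, category, objects) for cue in cues)


if __name__ == "__main__":
    for name in ("chair", "table", "lamp"):
        print(name, category_cue_validity(name, OBJECTS))
```

In this toy data, "legs" is shared by chairs and tables, so it is a weak cue for either one, while "seat" and "back" point only to chairs.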

The second paper, by Biederman, proposes a theory of human image understanding, Recognition-by-Components (RBC), which is built on geometrical ions (geons): simple volumetric primitives that can be derived from edge properties in the image. Biederman further argues that the human visual system first parses the image at regions of concavity to determine the primitive components, and then matches their arrangement against a stored representation to identify the object. He also argues that contour-based features are more efficient than color and texture for most categories.
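The matching step of RBC can be sketched in a few lines. The code below is only a toy illustration, not Biederman's model: it assumes an object has already been parsed into (component, relation, component) triples, and it scores stored models by set overlap (Jaccard similarity), which is a stand-in chosen here for simplicity. The model table and all names are hypothetical.

```python
# Hypothetical stored models: each object is a set of
# (component, relation, component) triples. Labels are illustrative only.
MODELS = {
    "cup": {("cylinder", "side-attached", "curved_handle")},
    "pail": {("cylinder", "top-attached", "curved_handle")},
}


def jaccard(a, b):
    """Overlap between two sets of triples, from 0.0 (disjoint) to 1.0 (equal)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recognize(observed, models=MODELS):
    """Return (label, score) of the stored model that best matches the input."""
    scored = ((name, jaccard(observed, parts)) for name, parts in models.items())
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    seen = {("cylinder", "side-attached", "curved_handle")}
    print(recognize(seen))
```

The two models use the same components and differ only in the relation between them, which is why the arrangement, and not just the list of parts, has to be recovered.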

Both papers provide results and conclusions that give insight into the perception and recognition processes of the human brain. In Computer Vision (CV), one of the central problems is recognizing the objects in an image. Since the human visual system, and the reasoning it uses to recognize objects, is highly developed and efficient, it makes sense to find out how it works and what steps are involved. At the same time, a vision system is usually interested in recognizing a particular category or object type, not everything that exists in the world. Because the purpose of a CV system is to help humans automate a certain process, we are interested in building a detector or recognizer for a particular object, specific to that process. So knowing that a basic level exists does not help much, except to tell us that recognizing a basic-level category will be easier than recognizing a superordinate or subordinate category. For example, it will be harder to recognize either “furniture” or “a sleeping chair” than simply “a chair”. This does not reduce the complexity of a given problem; it only explains why some object recognition tasks are more difficult than others.

In my opinion, the second paper is, to some extent, more relevant to CV, because the proposed theory can be used, at least in some contexts, to build an object recognition system. Building a recognizer for primitive components, and then identifying a particular object from their arrangement and edge properties, may work well in some cases. But before this can be done, matching individual components and finding the relationships between them is itself a difficult problem. While the 3-D to 2-D transformation produces a unique image, the 2-D to 3-D direction admits multiple possible arrangements in 3-D: a given image can come from different object arrangements and viewpoints. RBC can succeed only if we are able to recover the full arrangement of, and relations among, the components, and solving this has itself been challenging. Although we can build some of this knowledge into the vision system, it may fail when exceptions occur. We humans perform well even in exceptional cases, thanks to our other senses and our ability to use and relate past experience, but for a vision system this is not trivial. More interestingly, even for humans, recognizing objects in images is more difficult than in the 3-D world. For example, there are many image-based optical illusions that confuse even the human brain, whereas I can recall only a few illusions that arise in the 3-D world. We have to admit that 2-D image formation loses information, so any attempt to infer objects in the 3-D world from an image can be tricked. But we are interested in the average performance of a vision system, mostly in non-exceptional scenarios, and achieving that should not be impossible. The second paper also proposes that cues based on the primal sketch are more important than color and texture in many cases. This is useful when designing a vision system, because it lets us prioritize the features or cues used.
But the bottom line is that unless a CV system has sufficient knowledge of the 3-D world and of the physical principles governing image formation, it will be difficult for it to mimic the human visual system and perception.
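The loss of information in image formation can be shown with a minimal sketch. The code assumes an ideal pinhole camera at the origin looking along the z axis; the function name, the focal length, and the two points are chosen purely for illustration.

```python
def project(point, focal_length=1.0):
    """Project a 3-D point (x, y, z) onto the image plane of an ideal pinhole camera."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# Two different 3-D points that lie on the same ray through the camera center.
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)

if __name__ == "__main__":
    print(project(near), project(far))
```

Both points map to the same image coordinates, so depth cannot be recovered from that single image point without extra knowledge about the scene.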
In sum, both papers address some basic questions in psychology and vision: why is the recognition of certain objects difficult, and how do humans perceive visual information? But unless we have a system with the other capabilities of the human brain, it will be difficult to make full use of these theories.
