Computer Vision and Machine Learning: Shape Context Vs FORMS

References:

[1] Serge Belongie, Jitendra Malik and Jan Puzicha. Shape Matching and Object Recognition Using Shape Contexts. PAMI, 24(4):509-522, April 2002.

[2] Zhu and Yuille. FORMS: A Flexible Object Recognition and Modelling System.

D’Arcy Thompson’s vision of modeling the shapes of objects, and of comparing related forms in the precise language of mathematics, can be seen to some degree in both papers. The Shape Context (SC) approach [1] represents a shape as a discrete set of points sampled from the internal and external contours of the object, whereas FORMS [2] argues that object shapes are hierarchical in nature and can be modeled at three levels of granularity.
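
To make the point-level representation concrete, here is a minimal pure-Python sketch of a shape context descriptor: for each sampled point, the other points are counted in a log-polar histogram centred on it. It follows my reading of [1] (5 log-radius bins, 12 angle bins, distances normalised by the mean pairwise distance), but the function name `shape_contexts` and the radial limits `r_inner`/`r_outer` are illustrative assumptions, not the paper's code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def shape_contexts(points: List[Point], n_r: int = 5, n_theta: int = 12,
                   r_inner: float = 0.125, r_outer: float = 2.0) -> List[List[int]]:
    """One flattened log-polar histogram (n_r * n_theta bins) per sampled point.

    Hypothetical helper; bin limits are assumptions chosen for illustration.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Mean distance over all ordered pairs; dividing by it makes the descriptor scale-invariant.
    dists = [math.dist(points[i], points[j]) for i in range(n) for j in range(n) if i != j]
    mean_d = sum(dists) / len(dists)
    if mean_d == 0:
        raise ValueError("points must not all coincide")
    # Radial bin edges, evenly spaced in log(r) between r_inner and r_outer.
    lo, hi = math.log(r_inner), math.log(r_outer)
    edges = [math.exp(lo + k * (hi - lo) / n_r) for k in range(n_r + 1)]

    hists = []
    for i, (xi, yi) in enumerate(points):
        h = [0] * (n_r * n_theta)
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            r = math.hypot(xj - xi, yj - yi) / mean_d
            if r < edges[0] or r >= edges[-1]:
                continue  # point falls outside the log-polar window
            r_bin = max(k for k in range(n_r) if edges[k] <= r)
            theta = math.atan2(yj - yi, xj - xi) % (2 * math.pi)  # angle in [0, 2*pi)
            t_bin = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
            h[r_bin * n_theta + t_bin] += 1
        hists.append(h)
    return hists


# Usage: five points sampled from some contour (made-up coordinates).
pts = [(0.0, 0.0), (1.0, 0.2), (2.1, 1.3), (0.7, 2.2), (-0.4, 1.1)]
hists = shape_contexts(pts)
print(len(hists), len(hists[0]))  # 5 60
```

Because only relative offsets are used and radii are divided by the mean distance, translating or uniformly scaling the point set leaves the histograms unchanged; rotation invariance is not handled in this sketch.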

The first thing to note is that we are interested in the “comparison of related forms”, so this vision does not seem unrealistic: in mathematics, too, we compare members of a family of similar curves, and the analogy carries over. The difference (and hence the difficulty) is that curves and basic shapes such as circles and ellipses have well-defined equations whose parameters determine their properties, whereas the contours of natural objects have no well-defined local structure. The good news is that related forms do look similar globally, and hence one can be transformed into the other.

Any model that relaxes the “rigidity” imposed by exact equations for exact shapes, and allows small local deformations, will match better. FORMS does this by first decomposing an object into primitives and allowing deformations at that level (bottom-up). In my opinion, however, SC will be more robust to variation across related forms, or to deformation during matching, because it allows this flexibility at the level of individual points, which is the most basic level possible. SC also uses many more descriptors than FORMS (one per sampled point), so even when part of the object is occluded, the remaining points can still be matched. The price is that matching is more time-consuming in SC.
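
The extra matching cost is easy to see in a sketch. Each point's descriptor is compared with every descriptor on the other shape using a χ² histogram distance, giving an n × n cost matrix, and then a one-to-one correspondence with minimum total cost is chosen. [1] solves this assignment with a Hungarian-style bipartite matching solver; to stay dependency-free, the hypothetical `match_shapes` below swaps in exhaustive search over permutations, which is only usable for a handful of points. Histograms are assumed to be non-empty.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def chi2_cost(g: Sequence[float], h: Sequence[float]) -> float:
    """Chi-squared distance between two histograms after normalising each to sum 1."""
    sg, sh = sum(g), sum(h)
    if sg <= 0 or sh <= 0:
        raise ValueError("histograms must be non-empty")
    cost = 0.0
    for a, b in zip(g, h):
        a, b = a / sg, b / sh
        if a + b > 0:  # bins empty in both histograms contribute nothing
            cost += (a - b) ** 2 / (a + b)
    return 0.5 * cost


def match_shapes(P: List[Sequence[float]],
                 Q: List[Sequence[float]]) -> Tuple[float, Tuple[int, ...]]:
    """Minimum-cost one-to-one matching of descriptors P[i] -> Q[perm[i]].

    Exhaustive search over permutations: a toy stand-in for the Hungarian
    method, only practical for very small n.
    """
    n = len(P)
    if n != len(Q):
        raise ValueError("both shapes need the same number of points")
    # n * n descriptor comparisons, before any assignment work is done.
    C = [[chi2_cost(p, q) for q in Q] for p in P]

    def total(perm: Tuple[int, ...]) -> float:
        return sum(C[i][perm[i]] for i in range(n))

    best = min(permutations(range(n)), key=total)
    return total(best), best


# Usage: Q holds the same three descriptors as P, shuffled and rescaled.
P = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
Q = [[0, 0, 2], [3, 0, 0], [0, 1, 0]]
print(match_shapes(P, Q))  # (0.0, (1, 2, 0))
```

With hundreds of sampled points per shape, even the n² cost matrix is sizeable, which is why an efficient assignment solver matters in practice.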

My initial intuition was that SC would be more suitable for modeling the shapes of plant leaves and FORMS more suitable for matching animate objects (animals, etc.). The primitives used for leaves may be very coarse, so fine variations along the periphery may not be captured properly; comparing leaves (or, in general, shapes with minute variations) may be better handled by SC. The notion of primitives makes more sense to me for animate objects, and even if we neglect slight variations there, the matching will still be consistent.

Clothes or other external occlusions may also be a problem for both methods, though I would expect SC to perform better. Foldable parts of animate objects seem to need a different approach: the model should capture the continuous range of folding states, or at least be aware of this phenomenon and account for it during matching, because an image may show the part in any of those states. Images of foldable objects should also contain inner edges; otherwise matching will be difficult. Both methods apply only to objects with well-represented 2-D silhouettes, and different views of the same object can produce very different silhouettes, so data for multiple views should be available.

In sum, both approaches seem to work very well in specific scenarios (handwritten digits for SC, well-defined 2-D silhouettes of animate objects for FORMS) but may fail in other settings. Both methods have their own limitations, so I consider the problem of modeling shapes “perfectly” (as is usual in the mathematical sciences, though here only in an approximate sense) and matching them to be still open.

Friday, February 5, 2010