Friday, February 5, 2010

Conditional Independence Assumptions - How Far Can We Take Them?

References:

[1] J. Yamato, J. Ohya, and K. Ishii, “Recognizing Human Action in Time-Sequential Images Using Hidden Markov Model,” CVPR ’92, pages 379-385.

[2] D. Crandall, P. Felzenszwalb, and D. Huttenlocher, “Object Recognition by Combining Appearance and Geometry.”


The paper by Crandall et al. [2] proposes a model called k-fans for part-based object recognition. The authors discuss the trade-offs that arise from the conditional independence assumptions made between the parts of an object. At one extreme, models capture the spatial dependencies between all pairs of parts. With these models, exact detection and localization become computationally intractable, so they must rely on search heuristics. At the other extreme, models assume the parts are independent of one another, which makes detection and localization much easier. These models yield computationally tractable recognition and learning procedures, but they cannot accurately represent multi-part objects because they capture no relative spatial information. The authors settle on a family of models between these two extremes, defined by making particular conditional independence assumptions.
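The trade-off above can be illustrated with the simplest case that still captures relative geometry: a 1-fan, i.e. a star model with a single reference part. This is a minimal sketch that assumes a hypothetical toy problem: parts at discrete 1-D locations, random appearance costs, and a quadratic spatial cost relative to the reference part. All names and numbers are invented for illustration and are not taken from [2]. Costs stand in for negative log-probabilities. Fixing the reference location makes the other parts conditionally independent, so each can be placed on its own. The search then needs about h * (n - 1) * h evaluations instead of the h^n needed for brute force. The sketch checks that both searches reach the same minimum cost.

```python
import itertools
import random

# Hypothetical toy problem (not from [2]): n parts, each placed at one of h
# discrete 1-D locations. Costs stand in for negative log-probabilities.
random.seed(0)
N_PARTS, N_LOCS = 4, 6
APPEARANCE = [[random.random() for _ in range(N_LOCS)] for _ in range(N_PARTS)]
OFFSETS = [0, 1, -1, 2]  # hypothetical ideal offset of each part from part 0


def spatial_cost(ref_loc, loc, offset):
    """Quadratic penalty for deviating from the ideal offset to the reference part."""
    return (loc - ref_loc - offset) ** 2 / 10.0


def part_cost(i, ref_loc, loc):
    """Appearance plus spatial cost of placing non-reference part i at loc."""
    return APPEARANCE[i][loc] + spatial_cost(ref_loc, loc, OFFSETS[i])


def total_cost(config):
    """Cost of a full configuration; part 0 is the single reference part."""
    ref = config[0]
    return APPEARANCE[0][ref] + sum(part_cost(i, ref, config[i]) for i in range(1, N_PARTS))


def brute_force():
    """Exhaustive search over all h**n joint configurations."""
    return min(itertools.product(range(N_LOCS), repeat=N_PARTS), key=total_cost)


def star_search():
    """For each reference location, place every other part independently.

    This is exact: given the reference location, the non-reference parts are
    conditionally independent, so their costs can be minimized separately.
    """
    best = None
    for ref in range(N_LOCS):
        config = (ref,) + tuple(
            min(range(N_LOCS), key=lambda loc: part_cost(i, ref, loc))
            for i in range(1, N_PARTS)
        )
        if best is None or total_cost(config) < total_cost(best):
            best = config
    return best


print("brute force:", brute_force(), round(total_cost(brute_force()), 4))
print("star search:", star_search(), round(total_cost(star_search()), 4))
```

In a fully connected model this decomposition is unavailable. Every part's best location then depends on every other part, which is why such models fall back on search heuristics.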
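The other paper [1] also relies on conditional independence. It uses a hidden Markov model (HMM) for human action recognition. An HMM assumes that each observation depends only on the current hidden state, and that each state depends only on the previous one.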
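To see how these assumptions are used for recognition, consider scoring a sequence with one HMM per action class and picking the class with the highest likelihood. This is a minimal sketch in the spirit of [1], not its exact method. It assumes each frame has already been converted into a discrete symbol (0, 1 or 2). The model parameters and action names (`action_A`, `action_B`) are hypothetical values chosen for illustration. The forward algorithm computes each likelihood efficiently because of the Markov and output-independence assumptions.

```python
# Hypothetical discrete HMMs, one per action class (values invented for illustration).
# Each frame is assumed to have already been quantized into a symbol 0, 1 or 2.
START = [1.0, 0.0, 0.0]
TRANS = [
    [0.6, 0.4, 0.0],  # left-to-right: stay in a state or move to the next one
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
]
MODELS = {
    "action_A": (START, TRANS, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]),
    "action_B": (START, TRANS, [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1]]),
}


def forward_likelihood(obs, start, trans, emit):
    """P(obs | HMM) computed with the forward algorithm.

    The recursion relies on the HMM's conditional independence assumptions:
    the next state depends only on the current state, and each symbol
    depends only on the state that emits it.
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)


def classify(obs):
    """Return the action whose HMM assigns the sequence the highest likelihood."""
    return max(MODELS, key=lambda action: forward_likelihood(obs, *MODELS[action]))


print(classify([0, 0, 1, 1, 2, 2]))  # prints action_A
print(classify([2, 2, 1, 1, 0, 0]))  # prints action_B
```

The forward recursion runs in time linear in the sequence length only because of those assumptions. If a frame depended on frames further back than the current state, this factorization would no longer be exact.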


In my opinion, however, even this intermediate model cannot recognize every object category. If the parts of an object have a large degree of freedom with respect to each other, localization will still be difficult. The authors propose that a set of reference parts with spatial priors captures the geometric relationships. The remaining non-reference parts are assumed conditionally independent given the reference parts, which keeps the search tractable. The first problem I see is that learning such a model will be difficult, because the search space for the maximum likelihood model can become large for high values of k. Finding the optimal value of k also seems difficult to me for some objects. They show results on the motorbike and airplane datasets, where the object parts are in fixed arrangements. There are no results for categories whose parts are not fixed.

For human action recognition, the conditional independence assumptions between time-sequential images raise similar issues of learning and accurate localization. For actions whose frames do not follow the assumed dependencies, this model will not work.
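The concern that cost grows with k can be made concrete with a rough count. This is a minimal sketch that assumes a naive search: it enumerates every placement of the k reference parts, then places each remaining part independently over h candidate locations. The function names and problem sizes are hypothetical. The count ignores any speed-ups, such as distance transforms, that a practical implementation may use. It counts detection-time evaluations only and does not model the cost of learning or of choosing k.

```python
def brute_force_evaluations(n_parts, n_locs):
    """Joint configurations an exhaustive search over all parts must score."""
    return n_locs ** n_parts


def naive_kfan_evaluations(n_parts, n_locs, k):
    """Part-placement evaluations for a naive (hypothetical) search over a k-fan.

    Enumerate all n_locs**k placements of the k reference parts; for each,
    scan all n_locs locations for every one of the n_parts - k other parts.
    """
    if not 0 <= k < n_parts:
        raise ValueError("k must satisfy 0 <= k < n_parts")
    return n_locs ** k * (n_parts - k) * n_locs


# Hypothetical sizes: 6 parts, 1000 candidate locations per part.
for k in range(4):
    print(k, naive_kfan_evaluations(6, 1000, k))
print("brute force", brute_force_evaluations(6, 1000))
```

Under these assumptions, each increment of k multiplies the count by roughly h. That is consistent with the worry above about large k. Even k = 3 still stays far below exhaustive search over all six parts.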

In general, I think conditional independence assumptions and the models built on them suit some kinds of problems, such as speech recognition and online handwriting recognition. How far these assumptions hold, however, depends on the nature of the problem. They can work well in some scenarios and fail completely in others. Their validity should therefore be evaluated properly before they are applied to any recognition problem. For example, a particular model of spatial/temporal dependencies between object parts may not be correct for every object/action recognition task.
