References:
[1] O. Boiman, E. Shechtman and M. Irani. In Defense of Nearest-Neighbor Based Image Classification.
When handling data from vision problems, the first obstacle most machine learning techniques face is the high dimensionality of the feature space. Much of the effort goes into reducing that dimension to one where computation is feasible and tractable. Such dimensionality reduction is essential for many learning-based classifiers, but it reduces discriminative power and degrades classification accuracy. As Boiman et al. [1] point out, dimensionality reduction is particularly harmful for non-parametric classification because there is no training phase to compensate for the loss of information. They further explain that quantizing long-tail descriptors hurts NN-based classification.
Coming up with a good distance measure is sometimes difficult. Boiman et al. [1] argue that while image-to-image distance is central to kernel-based methods, it is not a good fit for non-parametric classifiers like NN. This limitation is more severe for classes with large diversity. Also, when the number of classes is large (e.g., image classification), many learning algorithms that were originally designed as binary classifiers (for example, Support Vector Machines) need to be extended or applied multiple times to produce multiclass results. This becomes infeasible as the number of classes grows.
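To make the distance argument concrete, here is a minimal sketch of the image-to-class idea that Boiman et al. [1] propose as the alternative to image-to-image distance: each local descriptor of the query image is matched to its nearest descriptor in the pooled set of a whole class, and the squared distances are summed. The function names, the toy 2-D descriptors and the brute-force search are my own assumptions for illustration; real descriptors are much higher-dimensional and the search would be approximate.

```python
# Sketch of image-to-class nearest-neighbor classification.
# All names and the toy data are illustrative assumptions, not code from [1].

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def image_to_class_distance(query_descriptors, class_descriptors):
    """Sum, over the query's descriptors, of the squared distance to the
    nearest descriptor pooled from all training images of one class."""
    return sum(
        min(sq_dist(d, c) for c in class_descriptors)
        for d in query_descriptors
    )


def classify(query_descriptors, descriptors_by_class):
    """Return the label whose pooled descriptors are closest to the query."""
    return min(
        descriptors_by_class,
        key=lambda label: image_to_class_distance(
            query_descriptors, descriptors_by_class[label]
        ),
    )


# Toy example: two classes, each with a small pool of 2-D descriptors.
descriptors_by_class = {
    "cat": [(0, 0), (1, 0)],
    "dog": [(5, 5), (6, 5)],
}
query = [(0.9, 0.1), (0.1, 0.0)]
predicted = classify(query, descriptors_by_class)
```

Note that nothing is trained per class here, so adding a class only means adding its descriptors to the pool. That is quite different from having to train one more binary classifier for every new class.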
In the paper by Viola and Jones [2], the authors address the problem of detecting faces in an image at a very fast rate (15 frames per second). Making a system work in real time like this usually takes extra effort. They came up with a novel image representation called the “Integral Image”, with which features can be computed and evaluated very quickly. More generally, some additional work on representing the data or handling its scale is frequently required before a suitable machine learning technique can be applied. Features used for training should also be robust to rotation, translation and scale changes, which are very common in vision problems. Viola and Jones further use a combination of weak classifiers (AdaBoost) to pick out the important features among the large number available. This makes me think about another issue: most learning-based classifiers that give equal weight to all features cannot, by themselves, select the best features to improve their computation time or performance. An explicit cascade is required to get the best features available.
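The integral image is easy to sketch. Each entry stores the sum of all pixels above and to the left of it, so the sum over any rectangle takes only four lookups regardless of its size. The code below is a plain-Python illustration under my own naming (integral_image, rect_sum, two_rect_feature), with an extra zero row and column to avoid boundary checks; it is not the authors' implementation.

```python
# Sketch of an integral image (summed-area table) and a rectangle feature.
# Function names and the padding convention are illustrative assumptions.

def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of
    img[0:y][0:x]. Row 0 and column 0 are zero padding."""
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom = top + height
    right = left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Two-rectangle feature: left half minus right half.
    Assumes width is even."""
    half = width // 2
    left_part = rect_sum(ii, top, left, height, half)
    right_part = rect_sum(ii, top, left + half, height, half)
    return left_part - right_part


img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 3, 3)        # whole image
corner = rect_sum(ii, 1, 1, 2, 2)       # bottom-right 2x2 block
feature = two_rect_feature(ii, 0, 0, 2, 2)
```

The table is built in a single pass over the image, after which every rectangle feature costs a constant number of operations. This is what makes it practical to evaluate a very large number of candidate features per detection window.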
Another problem, not specific to vision but inherent to machine learning techniques, is over-fitting, which can happen frequently. Because of the high dimensionality, it is harder to handle in vision problems; several experiments are usually needed to obtain good generalization.
In sum, I would say that although machine learning has greatly influenced vision applications through its ability to learn model parameters automatically, some preprocessing of the data is always required to make it suitable for the chosen technique, along with some workaround to handle the issues mentioned above.
