Rate of Convergence of Image Classifiers Based on Convolutional Neural Networks

Benjamin Walter

In this thesis, the rate of convergence of image classifiers based on convolutional neural networks is investigated. It is shown that plug-in classifiers defined by least squares estimators achieve a rate of convergence, for the difference between the misclassification risk of the estimate and the optimal misclassification risk, which does not depend on the input dimension, and therefore circumvent the curse of dimensionality. This analysis provides a theoretical explanation for the usefulness of convolutional neural network components in image classification, theoretical guidance for an appropriate choice of the network parameters, and a theoretical indication of the advantage of these architectures over other classification methods.
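
In standard notation (a sketch of the usual plug-in setting, not the thesis's exact formulation), a least squares estimate \hat{\eta}_n of the a-posteriori probability \eta(x) = P(Y = 1 | X = x) defines the plug-in classifier, and the quantity bounded is the excess misclassification risk:

    \hat{f}_n(x) = 1_{\{\hat{\eta}_n(x) \geq 1/2\}}, \qquad \mathbf{E}\,\mathbf{P}\{\hat{f}_n(X) \neq Y \mid \mathcal{D}_n\} - \min_{f} \mathbf{P}\{f(X) \neq Y\} \leq c \cdot n^{-\kappa},

where the exponent \kappa > 0 is a placeholder that depends only on smoothness and structural parameters of the a-posteriori probability, and in particular not on the input dimension.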

In previous work on regression estimation, it has been shown that neural network estimators achieve a rate of convergence which does not depend on the input dimension, provided the regression function satisfies suitable compositional assumptions. However, these results do not yet provide a theoretical justification for the superiority of convolutional neural networks over other network architectures in image classification applications. To obtain such a justification, this approach is applied to image classification by formulating structural and smoothness assumptions on the a-posteriori probability. Based on these assumptions, three statistical models for image classification are introduced, and the convergence behavior of suitable classifiers within these models is investigated.
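
The compositional assumptions referred to here are of the following flavor (an illustrative sketch with generic notation, not the thesis's precise definitions): the a-posteriori probability is a hierarchical composition of smooth functions, each of which depends on only a few variables, e.g.

    \eta(x) = g\big( h_1(x_{I_1}), \dots, h_k(x_{I_k}) \big),

where g and h_1, \dots, h_k are (p, C)-smooth, each h_j acts only on the pixels of the image x indexed by a small set I_j, and the resulting rate of convergence is governed by the smoothness p and the maximal number of arguments of these functions rather than by the dimension of x.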

The first model captures the following basic observations about image classification: first, the class of an image depends on the presence of specific objects that may be much smaller than the entire image area, and second, subparts of an image can be hierarchically composed of neighboring smaller subparts. The second model extends the first by the aspect that only approximate relative distances between the features of an object matter. The convolutional neural network architectures introduced for the second model include, in particular, local pooling layers. For the third model, a more general framework is introduced in which images are considered as random variables with values in a function space, and the observed sample consists of discretizations of such random variables. A model for the functional a-posteriori probability is introduced which covers classification problems in which rotations of objects by arbitrary angles are irrelevant for correct classification. For this model, a rate of convergence independent of the input dimension is achieved if a resolution-dependent error term is neglected.
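
As a concrete illustration of the kind of architecture meant for the second model, the following is a minimal sketch in Python/PyTorch (with hypothetical layer sizes and depths; it is not one of the network classes defined in the thesis) of a convolutional network with local max-pooling layers whose output serves as an estimate of the a-posteriori probability for a plug-in classifier.

    import torch
    import torch.nn as nn

    # Hypothetical architecture: convolutional layers detect local features,
    # local max-pooling makes the network insensitive to small shifts of these
    # features, and the final sigmoid output estimates the a-posteriori
    # probability eta(x) = P(Y = 1 | X = x) used by the plug-in classifier.
    class PlugInCNN(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),          # local pooling layer
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),          # local pooling layer
            )
            self.head = nn.Sequential(
                nn.Flatten(),
                nn.LazyLinear(1),         # linear output layer fitted by least squares
                nn.Sigmoid(),
            )

        def forward(self, x):
            # Returns an estimate of the a-posteriori probability for each image.
            return self.head(self.features(x))

    net = PlugInCNN()
    eta_hat = net(torch.rand(8, 1, 32, 32))   # batch of 8 grayscale 32x32 images
    prediction = (eta_hat >= 0.5).long()      # plug-in decision rule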

To prove these results, approximation properties of convolutional neural networks are derived and the complexity of the classes of these network architectures is bounded.

Finally, the finite sample size behavior of the introduced image classifiers is analyzed. For this purpose, the classifiers are applied to both simulated and real images, and the results are compared with those of alternative classification methods.

https://tuprints.ulb.tu-darmstadt.de/24333