Title:

Statistical analysis and machine learning algorithms for RF breast cancer screening

Details:

McGill University
Department of Electrical Engineering
Dr. Milicia Popovich

Abstract:

This thesis explores statistical and machine learning methods for anomaly detection in a novel low-power microwave breast cancer screening system. The reported dielectric contrast between healthy and malignant breast tissue in the microwave frequency range is the main motivation for designing a time-domain radar-based prototype for safe breast screening. The microwave radar does not aim to produce a three-dimensional image of the breast interior. Instead, it is intended for frequent (monthly) screenings that could detect a departure from the normal, increasing the chance of early detection and, in turn, successful treatment. The data used to develop the algorithms was obtained either in controlled laboratory experiments on tissue-mimicking phantoms or in a clinical setting. Because the data is preliminary and scarce, the conclusions may be limited; nevertheless, the algorithmic development takes into account the nature of the signals and how they were generated in this very new application. The following methods were adapted and applied to the data sets: simple statistical analysis to illustrate the differences between the data sets investigated in this work; discrete Fourier transform, short-time Fourier transform, empirical mode decomposition and ad hoc time-domain analysis to derive effective feature extraction strategies for the radio-frequency radar scans; high-dimensional statistical hypothesis tests to investigate the characteristics of the extracted time-frequency features; random search, random walk, simulated annealing, genetic algorithm and particle swarm derivative-free optimization algorithms to improve the computational efficiency of an ensemble cost-sensitive support vector machine classifier based on previous literature; and a forward step-wise ensemble selection algorithm to improve the predictive performance of the classifier.
For each method, the results are discussed in light of the limitations of the collected data sets. Older data sets were found to have high signal amplitudes on average. Statistically significant differences between features extracted from scans with anomalies and scans without anomalies were observed only for scans of subjects with higher average permittivity. The time-frequency analysis features yielded better predictive performance than features extracted using dimensionality reduction by principal component analysis. The computational efficiency of the classifier was improved by a factor of at least 3.8 when optimization algorithms were used for hyperparameter selection instead of an exhaustive grid search. With the data available, the forward step-wise selection algorithm did not improve the predictive performance as had been anticipated.
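
The efficiency comparison between exhaustive grid search and derivative-free optimization can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the function `validation_cost` is a hypothetical stand-in for the cross-validated error of the classifier, the two hyperparameters (`log_c`, `log_gamma`) and their ranges are invented, and the evaluation budget of 50 is arbitrary. The sketch only shows why a fixed-budget random search needs fewer cost evaluations than a full grid; it does not reproduce the reported factor of 3.8.

```python
import itertools
import math
import random


def validation_cost(log_c, log_gamma):
    """Hypothetical stand-in for a cross-validated classification cost."""
    return (log_c - 1.3) ** 2 + 0.5 * (log_gamma + 2.1) ** 2


def grid_search(cost, c_values, gamma_values):
    """Evaluate the cost at every grid point; return (best, evaluations)."""
    evaluations = 0
    best = (math.inf, None)
    for log_c, log_gamma in itertools.product(c_values, gamma_values):
        evaluations += 1
        value = cost(log_c, log_gamma)
        if value < best[0]:
            best = (value, (log_c, log_gamma))
    return best, evaluations


def random_search(cost, c_range, gamma_range, budget, seed=0):
    """Evaluate the cost at `budget` uniformly sampled points."""
    rng = random.Random(seed)
    best = (math.inf, None)
    for _ in range(budget):
        log_c = rng.uniform(*c_range)
        log_gamma = rng.uniform(*gamma_range)
        value = cost(log_c, log_gamma)
        if value < best[0]:
            best = (value, (log_c, log_gamma))
    return best, budget


# Assumed search space: a 15 x 15 grid in log10 units (illustrative only).
log_c_grid = [-3 + 0.5 * i for i in range(15)]
log_gamma_grid = [-5 + 0.5 * i for i in range(15)]

grid_best, grid_evals = grid_search(validation_cost, log_c_grid, log_gamma_grid)
rand_best, rand_evals = random_search(validation_cost, (-3, 4), (-5, 2), budget=50)

print("grid search evaluations:  ", grid_evals)
print("random search evaluations:", rand_evals)
```

In practice each cost evaluation would involve training and validating the classifier, so the number of evaluations dominates the run time; whether the cheaper search also finds comparably good hyperparameters depends on the data and has to be checked empirically.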

Full Document:

Download full thesis PDF