Using a CNN to Predict the Presence of Lung Cancer

  • Final project for EECS349: Machine Learning

    Exploring the parameters of convolutional neural networks to create an accurate image classifier.

    Final GitHub Repo: EECS349_Project

    Contributors: Adam Pollack, Chainatee Tanakulrungson, Nate Kaiser

    Summary

    The objective of this project was to predict the presence of lung cancer given a 40×40 pixel image snippet extracted from the LUNA2016 medical image database. This problem is exciting because it has direct implications for the future of healthcare, for machine learning applications that affect personal decisions, and for computer vision in general. Medicine is a field where machine learning is likely to thrive, as regulations increasingly allow sharing of anonymized patient data in the interest of better care. The field is also young enough that a project like ours can work with methods at the forefront of the technology.

    Approach

    Due to the complex nature of our task, most machine learning algorithms are not well-suited to this project. There are currently two prominent approaches to machine learning on image data: either extract features using conventional computer vision techniques and learn on those feature sets, or apply convolution directly using a convolutional neural network (CNN). In the past few years, CNNs have far outpaced traditional computer vision methods on difficult, visually ambiguous tasks such as cancer detection, so we decided to implement a CNN in TensorFlow, Google's machine learning framework.
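    To make the second approach concrete, here is a minimal sketch of the convolution-plus-pooling building block in TensorFlow's Keras API. The kernel size and filter count are illustrative placeholders, not our tuned values.

    ```python
    import tensorflow as tf

    # One grayscale 40x40 snippet, shaped [batch, height, width, channels].
    snippet = tf.random.normal([1, 40, 40, 1])

    # Convolution learns local filters; max pooling halves the spatial size.
    conv = tf.keras.layers.Conv2D(filters=16, kernel_size=5,
                                  padding="same", activation="relu")
    pool = tf.keras.layers.MaxPooling2D(pool_size=2)

    features = pool(conv(snippet))
    print(features.shape)  # (1, 20, 20, 16): 16 feature maps at half resolution
    ```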

    Cancerous Images

    Figure 1: Examples of cancerous images

    Non-cancerous Images

    Figure 2: Examples of non-cancerous images

    Because we collectively had limited experience with convolutional neural networks, we decided to first explore the hyperparameters of a CNN. We designed an experiment that varied the kernel size and number of filters of each convolutional layer, along with the dropout rate, for a total of 108 models. For this study, we kept the network architecture constant.

    Hyperparameter Permutations

    Attribute                                   Values Tested
    Kernel Size (Convolutional Layer 1)         3x3, 5x5, 7x7
    Kernel Size (Convolutional Layer 2)         3x3, 5x5, 7x7
    Number of Filters (Convolutional Layer 1)   16, 32
    Number of Filters (Convolutional Layer 2)   32, 64
    Dropout Rate                                0.1, 0.2, 0.3

    Each model was trained on 2,064 images (batch size of 104); validation was run every 10 epochs on a separate set of 442 images, and a final test was run after 500 epochs on another 442 images.
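    The full grid is simply the Cartesian product of the values in the table above, which is where the 108 comes from (3 × 3 × 2 × 2 × 3). A minimal sketch of the enumeration, with variable names of our own choosing:

    ```python
    from itertools import product

    # Hyperparameter values from the table above.
    kernel_1 = [3, 5, 7]       # kernel size, convolutional layer 1
    kernel_2 = [3, 5, 7]       # kernel size, convolutional layer 2
    filters_1 = [16, 32]       # number of filters, convolutional layer 1
    filters_2 = [32, 64]       # number of filters, convolutional layer 2
    dropout = [0.1, 0.2, 0.3]  # dropout rate

    grid = list(product(kernel_1, kernel_2, filters_1, filters_2, dropout))
    print(len(grid))  # 108 = 3 * 3 * 2 * 2 * 3

    for k1, k2, f1, f2, d in grid:
        pass  # build and train one model per combination (training loop omitted)
    ```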

    Results

    After determining the best set of hyperparameters based on average peak validation accuracy, we tested six new architectures built around those hyperparameters. The structure of each architecture was based on the principles described in the Stanford CS231n course notes [1]. After running the six architectures for 500 epochs, we found the inflection point of the loss to be around 250 epochs, so we re-ran each architecture for 250 epochs and recorded its final test accuracy. The best of the six achieved a test accuracy of 96.38%.

    Final Network Architecture
    Input → [Conv Layer 1 → ReLU] → Max Pool Layer 1 → [Conv Layer 2 → ReLU] → Max Pool Layer 2 → [Conv Layer 3 → ReLU] → Max Pool Layer 3 → [Fully-Connected Layer 1 → Dropout] → Fully-Connected Layer 2 → Output Classes [0 or 1]
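    The sketch below expresses this architecture in TensorFlow's Keras API. The layer ordering follows the diagram above; the specific kernel sizes, filter counts, and dense-layer width are illustrative stand-ins, since the tuned values are detailed in the report.

    ```python
    import tensorflow as tf
    from tensorflow.keras import layers

    model = tf.keras.Sequential([
        layers.Input(shape=(40, 40, 1)),                           # 40x40 grayscale snippet
        layers.Conv2D(16, 5, padding="same", activation="relu"),   # Conv Layer 1 + ReLU
        layers.MaxPooling2D(2),                                    # Max Pool Layer 1
        layers.Conv2D(32, 5, padding="same", activation="relu"),   # Conv Layer 2 + ReLU
        layers.MaxPooling2D(2),                                    # Max Pool Layer 2
        layers.Conv2D(64, 3, padding="same", activation="relu"),   # Conv Layer 3 + ReLU
        layers.MaxPooling2D(2),                                    # Max Pool Layer 3
        layers.Flatten(),
        layers.Dense(128, activation="relu"),                      # Fully-Connected Layer 1
        layers.Dropout(0.2),                                       # Dropout
        layers.Dense(2, activation="softmax"),                     # Fully-Connected Layer 2 -> classes 0/1
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    ```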
    Final Model Accuracy Graph

    Figure 3: Tensorboard Graph of Accuracy for Final Model at 500 epochs (Orange Line = Training Dataset, Blue Line = Validation Dataset)

    Final Model Loss Graph

    Figure 4: Tensorboard Graph of Loss for Final Model at 500 epochs (Orange Line = Training Dataset, Blue Line = Validation Dataset)

    Conclusion

    After finding our best model, we ran further analysis on the final test results, extracting a confusion matrix and the misclassified images, to determine why the test accuracy was not closer to 100%.

    Confusion Matrix

                         Predicted Positive   Predicted Negative
    Actual Positive      226                  12
    Actual Negative      4                    200
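    The headline numbers follow directly from this matrix; a quick check in Python:

    ```python
    # Counts taken from the confusion matrix above.
    tp, fn = 226, 12  # actual positives: classified correctly / missed
    fp, tn = 4, 200   # actual negatives: false alarms / classified correctly

    total = tp + fn + fp + tn         # 442 test images
    accuracy = (tp + tn) / total      # ~0.9638, the 96.38% reported above
    sensitivity = tp / (tp + fn)      # ~0.950: recall on cancerous snippets
    specificity = tn / (tn + fp)      # ~0.980: recall on non-cancerous snippets
    print(f"{accuracy:.4f} {sensitivity:.4f} {specificity:.4f}")
    ```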
    Misclassified Images

    Figure 5: Examples of misclassified images from the test dataset

    Our model classified more examples as negative when they should have been positive (12) than vice versa (4). We believe this is due to the nature of some of the positive examples: the first four misclassified images above are all positive examples of cancer, even though two of them have almost no distinct features. It would likely be just as difficult for a human, even a trained doctor, to classify those images. We also cannot guarantee that the dataset itself is perfectly labeled; some images may be mislabeled.

    Future Work

    We plan to test our model on entire lung scans by extracting 40×40 images from each image slice of the lung. Sliding a window with a stride of around 20 would produce a large set of images to test for cancer, but with a pre-trained model this would be relatively cheap to do (see the sketch below). Our hope is that this method would let us determine whether or not cancer is present anywhere in a lung, rather than only in a predetermined section. We would also like to implement one or more well-known convolutional neural network architectures such as AlexNet [2] or Inception [3].
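    A minimal sketch of the proposed sliding-window extraction over one slice, assuming a hypothetical 2-D array `ct_slice` and the 40×40 window and stride-20 values discussed above:

    ```python
    import numpy as np

    def extract_patches(ct_slice: np.ndarray, size: int = 40, stride: int = 20):
        """Slide a size x size window across one scan slice with the given stride."""
        patches, coords = [], []
        h, w = ct_slice.shape
        for y in range(0, h - size + 1, stride):
            for x in range(0, w - size + 1, stride):
                patches.append(ct_slice[y:y + size, x:x + size])
                coords.append((y, x))
        return np.stack(patches), coords

    # A 512x512 slice yields 24 * 24 = 576 candidate snippets to feed the model.
    patches, coords = extract_patches(np.zeros((512, 512), dtype=np.float32))
    print(patches.shape)  # (576, 40, 40)
    ```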

    Final Report

    If you are interested in learning more about the details of this project, please read our report.

    References

    [1] Stanford CS231n Course Notes, "Convolutional Neural Networks": http://cs231n.github.io/convolutional-networks/
    [2] Krizhevsky, Sutskever, and Hinton, "ImageNet Classification with Deep Convolutional Neural Networks" (AlexNet): http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
    [3] Szegedy et al., "Going Deeper with Convolutions" (Inception, by Google): https://arxiv.org/abs/1409.4842