Software Tools
For the hackathon, the lowest-order goal is to run a simple Convolutional Neural Network (CNN), likely with not-good-enough results, just to go through all the motions. This will demonstrate the "Findable" and "Accessible" parts of the FAIR principles. For this we will download a dataset (under 1 GB, consisting of DICOM images with two labels), upload it to Google Drive, and build and run the CNN on Google Colab with a free TPU. Alternatively, the same code can be run on other dedicated resources that participants may have access to.
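Before a CNN can be trained, each DICOM file has to be paired with its label and the data split into training and validation sets. The sketch below assumes a hypothetical layout in which each of the two labels has its own subfolder of `.dcm` files (`<root>/<label>/*.dcm`); the actual dataset may instead ship its labels in a separate file, in which case only the indexing function would change. The function names are illustrative and not part of any library. In Colab, a Google Drive folder is typically made visible with `from google.colab import drive` followed by `drive.mount('/content/drive')`, after which `root` would point somewhere under `/content/drive`.

```python
import random
from pathlib import Path


def index_dicom_folders(root):
    """Collect (path, label_index) pairs, assuming a <root>/<label>/*.dcm layout.

    Returns the pairs and the sorted list of label names, so that
    label_index i corresponds to class_names[i].
    """
    root = Path(root)
    # Every subdirectory of root is treated as one label (assumed layout).
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    items = []
    for idx, name in enumerate(class_names):
        for f in sorted((root / name).glob("*.dcm")):
            items.append((f, idx))
    return items, class_names


def train_val_split(items, val_fraction=0.2, seed=42):
    """Shuffle reproducibly and hold out a fraction of the items for validation.

    Caveat: if one patient contributes several images, splitting by image
    rather than by patient can leak information between the two sets.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = int(round(len(items) * val_fraction))
    return items[n_val:], items[:n_val]
```

For example, `items, classes = index_dicom_folders(root)` followed by `train, val = train_val_split(items)` yields path/label lists that a data loader (for instance one that reads pixel data with a DICOM library such as pydicom) could feed to the CNN. Fixing the seed keeps the split identical across Colab sessions, which makes results comparable between runs.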
The more exciting part is to come up with problems to solve using the (much larger) datasets provided, which include DICOM images, metadata, and even some datacubes. Given the short duration of the hackathon, it is neither expected nor likely that any such problem will be solved during the event. The hope is that this will start a brainstorming effort to define the problems, associate specific datasets with them, and outline a path toward solving them. Handling the resulting workflows will also require larger computing resources than are available at the hackathon.
Envisioned problems:
- Using a dataset for some task that the PIs had not originally considered (Reusable from FAIR)
- Combining two (or more) datasets (and their metadata) to approach a more complex problem (Interoperable from FAIR)
- Transfer learning of a solution from one dataset to another (Interoperable and Reusable)
- Zero-shot learning: transfer learning with no additional training (Interoperable and Reusable)
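The core mechanics of transfer learning can be illustrated without committing to a framework. The NumPy sketch below keeps a feature extractor frozen and fits only a new classification head on the target data. The extractor is a stand-in (a fixed random projection followed by ReLU), not a network actually trained on another dataset, and the target data are synthetic; both are assumptions for illustration only. In Keras the freezing step corresponds to setting `layer.trainable = False` on the pretrained layers, and in PyTorch to calling `requires_grad_(False)` on their parameters. Zero-shot use would skip `train_head` entirely and apply the source model's outputs directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a feature extractor learned on a *source* dataset. Here it is a
# fixed random projection + ReLU (an assumption for illustration); its weights
# are never updated below, which is the "frozen" part of transfer learning.
W_FROZEN = rng.normal(size=(64, 16)) / 8.0


def extract_features(x):
    """Frozen extractor: maps inputs of shape (n, 64) to features of shape (n, 16)."""
    return np.maximum(x @ W_FROZEN, 0.0)


def train_head(features, labels, lr=0.5, epochs=300):
    """Fit a new logistic-regression head by full-batch gradient descent.

    Only the head parameters (w, b) change; W_FROZEN is left untouched.
    """
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        err = probs - labels  # gradient of the mean log-loss w.r.t. the logits
        w -= lr * features.T @ err / len(labels)
        b -= lr * err.mean()
    return w, b


def predict(features, w, b):
    """Predict class 1 where the head's logit is positive, else class 0."""
    return (features @ w + b > 0).astype(int)


# Hypothetical synthetic "target" dataset: class 1 is shifted by +1 in every input.
y_target = rng.integers(0, 2, size=400)
x_target = rng.normal(size=(400, 64)) + y_target[:, None]

feats = extract_features(x_target)
w, b = train_head(feats, y_target)
accuracy = (predict(feats, w, b) == y_target).mean()
print(f"training accuracy of the new head: {accuracy:.2f}")
```

Training only the head is cheap and needs relatively few labeled target examples, which is what makes the approach attractive when a second dataset is small; with real models, some of the later extractor layers are often unfrozen afterwards for fine-tuning.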
To enable such possibilities, the three datasets on offer all relate to a single organ (the breast) and come with associated control samples. The hope is to make similarly well-curated and documented datasets available for other organs as well.
The following tools will be useful:
- The Python programming language
- Google Colab, a notebook environment with limited free resources
- TensorFlow, a deep learning library
- PyTorch, another deep learning library
- Conda, a package and environment manager for Python (with R support)
- Jupyter, a notebook environment for Python (and R, Julia, ...)
- GitHub, a platform for version control and collaborative coding
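Before running anything, it can help to confirm which of these tools are present in the current runtime, whether a Colab session or a local Conda environment. The sketch below uses only the Python standard library: `importlib.util.find_spec` checks whether a module could be imported without actually importing it, and `shutil.which` looks for a command-line program on the `PATH`. The module and command names are assumptions about typical names (`torch` for PyTorch, `google.colab` inside Colab); `pydicom`, a commonly used DICOM reader, is included even though it is not on the list above. GitHub itself is a web service, so the sketch checks for the `git` client instead.

```python
import importlib.util
import shutil

# Assumed import names of the Python libraries of interest.
PYTHON_MODULES = ("tensorflow", "torch", "pydicom", "google.colab")
# Command-line tools that are not Python modules (git is the client used with GitHub).
COMMANDS = ("conda", "git", "jupyter")


def module_available(name):
    """Return True if `name` could be imported, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ModuleNotFoundError, ValueError):
        # Raised for dotted names such as "google.colab" when the parent
        # package is missing.
        return False


def check_environment(modules=PYTHON_MODULES, commands=COMMANDS):
    """Map each module or command name to whether it is available here."""
    report = {m: module_available(m) for m in modules}
    report.update({c: shutil.which(c) is not None for c in commands})
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:14s} {'found' if ok else 'missing'}")
```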
Additional/alternative tools:
- The R programming language
- DICOM tags
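DICOM tags carry the patient, study, and acquisition metadata that several of the envisioned problems would need to combine across datasets. Each tag is identified by a (group, element) pair and has a standard keyword, for example (0010,0020) `PatientID`, (0008,0060) `Modality`, and (0028,0010) `Rows`. The sketch below assumes the third-party `pydicom` library (not named in the list above), reads only the file headers, and tabulates a few keywords; the helper names and the choice of keywords are illustrative. Missing tags become `None`, since not every file is guaranteed to carry every attribute.

```python
# Keywords to tabulate: an illustrative choice, to be adjusted per dataset.
KEYWORDS = ("PatientID", "StudyDate", "Modality", "Laterality", "Rows", "Columns")


def header_row(ds, keywords=KEYWORDS):
    """Return {keyword: value} for one DICOM dataset, None where a tag is absent.

    pydicom Dataset objects expose standard tags as attributes and raise
    AttributeError for absent ones, so getattr with a default handles gaps.
    """
    return {kw: getattr(ds, kw, None) for kw in keywords}


def read_headers(paths, keywords=KEYWORDS):
    """Read only the headers of DICOM files (pixel data is not loaded)."""
    import pydicom  # third-party; imported here so header_row works without it

    rows = []
    for path in paths:
        ds = pydicom.dcmread(path, stop_before_pixels=True)
        row = header_row(ds, keywords)
        row["path"] = str(path)
        rows.append(row)
    return rows
```

`read_headers(paths)` returns one dictionary per file, which could be turned into a table (for example with `pandas.DataFrame(rows)`) to explore the metadata or to join records from different datasets on shared keys. Skipping the pixel data keeps this fast even for large collections.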