AI and Biostatistics Glossary of Terms

A select glossary that may be helpful for attendees of the hackathon at the Cancer Biomarkers AI and Bioinformatics Workshop, 2024.

This document introduces many common terms used in computer science, statistics, and in connection with cancer biomarkers. Some of these terms have emerged or evolved differently across these fields, and our attempt here has been to bring out the nuances of these overloaded terms so that cross-disciplinary collaborators can watch out for the alternate meanings.

Algorithm

Artificial Intelligence (AI)

Bias

Classifier

Data Mining

Feature

Machine Learning

Model

Normalization

Sensitivity and Specificity

Validation

Here are some additional terms specifically from computer science:

  1. Generative AI: A subset of artificial intelligence that focuses on creating new content, such as text, images, music, and more, by learning patterns from existing data. Generative AI models are trained on large datasets and use this training to generate outputs that are similar to the data they were trained on, but not identical.
  2. Deep Learning: A subset of machine learning that uses neural networks with many layers (deep neural networks) to learn complex patterns in data. It is particularly effective for tasks such as image and speech recognition.
  3. Neural Network: A computational model made up of interconnected layers of nodes ("neurons") that learns to recognize underlying relationships in a set of data, loosely mimicking the way the human brain operates (a minimal numpy sketch follows this list).
  4. Large Language Models (LLMs): Large Language Models are a type of AI model designed to understand and generate human language. They are built using transformer architectures and are trained on vast amounts of text data, allowing them to generate coherent and contextually relevant text based on the input they receive. LLMs are capable of a wide range of natural language processing tasks, including text generation, translation, summarization, and question answering.
  5. Natural Language Processing (NLP): A field of AI that focuses on the interaction between computers and humans through natural language. It involves enabling computers to understand, interpret, and generate human language.
  6. Supervised Learning: A type of machine learning where the model is trained on labeled data, meaning each input comes paired with the correct output (illustrated in the scikit-learn sketch after this list).
  7. Unsupervised Learning: A type of machine learning where the model is trained on unlabeled data and must find patterns and relationships in the data on its own (also illustrated in the sketch after this list).
  8. Reinforcement Learning: A type of machine learning where an agent learns to make decisions by performing actions in an environment and receiving rewards or penalties.
  9. Big Data: Datasets so large and complex that traditional data processing software cannot handle them. Such datasets are often used to train AI models.
  10. Computer Vision: A field of AI that trains computers to interpret and understand the visual world. It uses digital images and video, together with deep learning models, to identify and classify objects accurately.
  11. Robotics: A branch of engineering that involves the conception, design, manufacture, and operation of robots. AI in robotics enables robots to perform tasks autonomously.
  12. Cognitive Computing: A term used to describe AI systems that aim to simulate human thought processes in a computerized model. These systems rely on self-learning algorithms that draw on data mining, pattern recognition, and natural language processing.
  13. Turing Test: A test of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. Proposed by Alan Turing in 1950.
  14. Ethics in AI: The field of study that examines the moral implications and responsibilities of creating and using AI technologies. It includes issues like privacy, security, and the impact on employment.
  15. Fuzzy Logic: A form of logic used in AI that allows for reasoning about imprecise or uncertain information, similar to how humans make decisions.
  16. Predictive Analytics: Techniques that use historical data to predict future outcomes, combining statistical algorithms with machine learning.
  17. Chatbot: An AI program designed to simulate conversation with human users, especially over the internet.
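
A couple of minimal Python sketches may help make some of the terms above more concrete. The first is a toy forward pass through a two-layer neural network, written with numpy; the layer sizes, random weights, and sigmoid activation are arbitrary illustrative choices, not a recommended architecture.

```python
import numpy as np

# Toy forward pass through a two-layer neural network.
# All sizes, weights, and the activation function are arbitrary
# illustrative choices, not tuned for any real task.

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One input sample with 4 features (e.g. four biomarker measurements)
x = rng.normal(size=(1, 4))

# Layer 1: 4 inputs -> 3 hidden "neurons"
W1 = rng.normal(size=(4, 3))
b1 = np.zeros(3)

# Layer 2: 3 hidden units -> 1 output score
W2 = rng.normal(size=(3, 1))
b2 = np.zeros(1)

hidden = sigmoid(x @ W1 + b1)       # first layer activations
output = sigmoid(hidden @ W2 + b2)  # final prediction score in (0, 1)

print(output)
```

Deep learning stacks many such layers, and training (not shown here) adjusts the weights so that the outputs match known targets.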
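
The second sketch contrasts supervised and unsupervised learning on a small synthetic dataset. It assumes the scikit-learn library is installed; the generated data and the particular models (logistic regression, k-means) are illustrative choices, not recommendations for biomarker work.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

# Synthetic data: 200 samples, 5 features, 2 classes
# (stand-ins for, say, biomarker measurements and case/control labels).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Supervised learning: the model is shown features *and* correct labels.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("supervised accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Unsupervised learning: the model is shown only the features and must
# find structure (here, two clusters) without any labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", [int((km.labels_ == k).sum()) for k in range(2)])
```

In the supervised case the labels y guide the fit; in the unsupervised case the clustering is driven entirely by patterns in X.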