AI and Biostatistics Glossary of Terms
A select glossary that may be helpful for attendees of the hackathon at the Cancer Biomarkers AI and Bioinformatics Workshop, 2024.
This document introduces many common terms used in computer science, statistics, and in connection with cancer biomarkers. Some of these terms have emerged and/or evolved differently across these fields, and our attempt here has been to bring out the nuances of these overloaded terms so that cross-disciplinary collaborators can watch out for the alternate meanings.
Algorithm
- CS: A set of rules or steps designed to perform a specific task or solve a specific problem.
- Stats: Often refers to a step-by-step procedure for calculations in data analysis.
Artificial Intelligence (AI)
- General: The simulation of human intelligence in machines that are programmed to think and learn like humans, and the application of such machines to tasks that require human-like intelligence.
Bias
- CS: Systematic error introduced by an algorithm that affects learning and outcomes in AI systems.
- Stats: Deviation of the expected value of a statistical estimate from the true value.
- Cancer Biomarker: In experimental design, bias can affect clinical trial outcomes and often arises from sample selection or data collection methods.
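For the Stats sense above, a minimal formal statement: the bias of an estimator of a parameter is the gap between the estimator's expected value and the true value.

```latex
\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta
```

An estimator is called unbiased when this quantity is zero; the classic example is the sample variance, which is unbiased only when the sum of squared deviations is divided by n - 1 rather than n.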
Classifier
- CS/Stats: An algorithm that assigns data points to one of a set of predefined classes.
- Cancer Biomarker: Sometimes used to refer to a test or a method that distinguishes between patients with different prognoses or treatment responses based on specific biomarkers.
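To make the CS/Stats sense concrete, here is a minimal sketch of a classifier, assuming scikit-learn and NumPy are available; the "biomarker" matrix and labels below are synthetic and purely illustrative.

```python
# Minimal classifier sketch (illustrative; assumes scikit-learn and NumPy).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # 100 patients x 3 hypothetical biomarker values
# Synthetic labels: 0 = benign, 1 = malignant, driven by the first two features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=100) > 0).astype(int)

clf = LogisticRegression().fit(X, y)   # learn a decision boundary from the data
print(clf.predict(X[:5]))              # predicted class for the first 5 patients
print(clf.predict_proba(X[:5])[:, 1])  # predicted probability of class 1
```

In the Cancer Biomarker sense, the same object would be described as a test that maps a patient's biomarker profile to a predicted prognosis or treatment-response group.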
Data Mining
- CS: The process of discovering patterns and knowledge from large sets of data.
- Stats: Often overlaps with statistical analysis, but focuses more on exploring large data sets to find patterns or relationships.
Feature
- CS: An individual measurable property or characteristic of a phenomenon being observed.
- Stats: Known as a variable, attribute, or covariate; these terms are used interchangeably in data analysis contexts.
Machine Learning
- CS: A subset of AI focused on the idea that systems can learn from data, identify patterns, and make decisions with minimal human intervention.
- Stats: Often considered a family of statistical methods for building models from data.
Model
- CS: In machine learning, a model represents what is learned by a computer algorithm.
- Stats: A formal representation of the relationship between variables.
- Cancer Biomarker: Can refer to a statistical model or a theoretical model used to predict disease progression.
Normalization
- CS/Stats: A process of adjusting data to a standard or common scale.
- Cancer Biomarker: In laboratory methods, it can refer to adjusting measurements to control for variability in samples.
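A minimal sketch of two common normalization schemes, assuming NumPy; the assay values are made up for illustration.

```python
# Two common normalizations (illustrative; assumes NumPy).
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical assay values

# z-score normalization: shift and scale to zero mean, unit standard deviation
z = (x - x.mean()) / x.std()

# min-max normalization: rescale linearly to the [0, 1] interval
m = (x - x.min()) / (x.max() - x.min())

print(z.round(2))  # [-1.5 -0.5 -0.5 -0.5  0.   0.   1.   2. ]
print(m.round(2))  # values between 0 and 1
```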
Sensitivity and Specificity
- General: Measures of the performance of a classification test, where sensitivity refers to the test’s ability to correctly identify positives, and specificity refers to its ability to correctly identify negatives.
- Cancer Biomarker: Used to evaluate the efficacy of diagnostic tests or biomarkers in correctly identifying patients with or without a disease.
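In terms of true positives (TP), false negatives (FN), true negatives (TN), and false positives (FP), sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A minimal sketch with made-up counts:

```python
# Sensitivity and specificity from hypothetical confusion-matrix counts
# (the numbers are illustrative, not from any real study).
TP, FN = 90, 10   # diseased patients the test correctly flags / misses
TN, FP = 160, 40  # healthy patients the test correctly clears / falsely flags

sensitivity = TP / (TP + FN)  # true positive rate -> 0.90
specificity = TN / (TN + FP)  # true negative rate -> 0.80

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```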
Validation
- CS: The process of evaluating how well an AI or machine learning model performs on new, unseen data.
- Stats: The process of confirming the reliability and accuracy of a model.
- Cancer Biomarker: Often refers to the confirmation of the clinical relevance of a biomarker through additional studies.
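A minimal sketch of validation in the CS sense (hold-out evaluation on unseen data), assuming scikit-learn and NumPy; the data are synthetic.

```python
# Hold-out validation sketch (illustrative; assumes scikit-learn and NumPy).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] - X[:, 2] > 0).astype(int)  # synthetic labels

# Keep 25% of the data aside; the model never sees it during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

model = LogisticRegression().fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))  # score on unseen data
```

Note that this hold-out check is distinct from the Cancer Biomarker sense above, where validation means confirming clinical relevance in additional studies.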
Here are some additional terms specifically from computer science:
- Generative AI: A subset of artificial intelligence that focuses on creating new content, such as text, images, music, and more, by learning patterns from existing data. Generative AI models are trained on large datasets and use this training to generate outputs that are similar to the data they were trained on, but not identical.
- Deep Learning: A subset of machine learning that uses neural networks with many layers (deep neural networks) to analyze various factors of data. It is particularly effective for image and speech recognition.
- Neural Network: A computational model made up of layers of interconnected nodes ("neurons") that learns to recognize underlying relationships in a set of data; its design is loosely inspired by the way the human brain operates.
- Large Language Models (LLMs): A type of AI model designed to understand and generate human language. LLMs are built using transformer architectures and trained on vast amounts of text data, allowing them to generate coherent and contextually relevant text based on the input they receive. They are capable of a wide range of natural language processing tasks, including text generation, translation, summarization, and question answering.
- Natural Language Processing (NLP): A field of AI that focuses on the interaction between computers and humans through natural language. It involves enabling computers to understand, interpret, and generate human language.
- Supervised Learning: A type of machine learning where the model is trained on labeled data, which means the input comes with the correct output.
- Unsupervised Learning: A type of machine learning where the model is trained on unlabeled data and must find patterns and relationships in the data on its own (a minimal sketch contrasting the supervised and unsupervised settings appears after this list).
- Reinforcement Learning: A type of machine learning where an agent learns to make decisions by performing actions in an environment and receiving rewards or penalties.
- Big Data: Large and complex datasets that traditional data processing software cannot handle effectively. Such datasets are often used to train AI models.
- Computer Vision: A field of AI that trains computers to interpret and understand the visual world. It uses images from cameras and videos and deep learning models to accurately identify and classify objects.
- Robotics: A branch of engineering that involves the conception, design, manufacture, and operation of robots. AI in robotics enables robots to perform tasks autonomously.
- Cognitive Computing: A term used to describe AI systems that aim to simulate human thought processes in a computerized model. These systems rely on self-learning algorithms that draw on data mining, pattern recognition, and natural language processing.
- Turing Test: A test of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. Proposed by Alan Turing in 1950.
- Ethics in AI: The field of study that examines the moral implications and responsibilities of creating and using AI technologies. It includes issues like privacy, security, and the impact on employment.
- Fuzzy Logic: A form of logic used in AI that allows for reasoning about imprecise or uncertain information, similar to how humans make decisions.
- Predictive Analytics: Techniques that use historical data to predict future outcomes. It involves statistical algorithms and machine learning techniques.
- Chatbot: An AI program designed to simulate conversation with human users, especially over the internet.
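The sketch referenced in the Supervised and Unsupervised Learning entries above: a minimal contrast of the two settings, assuming scikit-learn and NumPy; the two-group data are synthetic.

```python
# Supervised vs. unsupervised learning on the same synthetic data
# (illustrative; assumes scikit-learn and NumPy).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 2)),   # group A around (0, 0)
               rng.normal(4, 1, (50, 2))])  # group B around (4, 4)
y = np.array([0] * 50 + [1] * 50)           # labels exist only in the supervised case

# Supervised: the model is shown the labels during training
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict([[4.0, 4.0]]))  # -> [1], the labeled class near (4, 4)

# Unsupervised: the model must discover the two groups on its own
km = KMeans(n_clusters=2, n_init=10, random_state=2).fit(X)
print(km.labels_[:5], km.labels_[-5:])  # cluster IDs are arbitrary (0/1 may swap)
```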