This is a curated list of medical data for machine learning. This list is provided for informational purposes only; please make sure
you respect any and all usage restrictions for any of the data listed
here.
1. Medical Imaging Data
The National Library of Medicine presents the MedPix® database of 53,000 medical images from 13,000 patients with annotations. Requires registration. Information: https://medpix.nlm.nih.gov/home
ABIDE: The Autism Brain Imaging Data Exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Functional MRI images for 539 individuals suffering from ASD and 573
typical controls. These 1112 datasets are composed of structural and
resting state functional MRI data along with an extensive array of
phenotypic information. Requires registration. Paper: http://www.ncbi.nlm.nih.gov/pubmed/23774715 Information: http://fcon_1000.projects.nitrc.org/indi/abide/ Preprocessed version: http://preprocessed-connectomes-project.org/abide/
AMRG Cardiac Atlas: The AMRG Cardiac MRI Atlas is a complete labelled MRI image set of a
normal patient's heart acquired with the Auckland MRI Research Group's
Siemens Avanto scanner. The atlas aims to provide university and school
students, MR technologists, clinicians...
Congenital Heart Disease (CHD) Atlas: The Congenital Heart Disease (CHD) Atlas represents MRI data sets,
physiologic clinical data and computer models from adults and children
with various congenital heart defects. The data have been acquired from
several clinical centers including Rady...
DETERMINE: Defibrillators to Reduce Risk by Magnetic Resonance Imaging Evaluation
is a prospective, multicenter, randomized clinical trial in patients
with coronary artery disease and mild-to-moderate left ventricular
dysfunction. The primary objective...
MESA: The Multi-Ethnic Study of Atherosclerosis is a large-scale cardiovascular
population study (>6,500 participants) conducted in six centres in
the USA. It aims to investigate the manifestation of subclinical to
clinical cardiovascular disease before...
OASIS: The Open Access Series of Imaging Studies (OASIS) is a project aimed at
making MRI data sets of the brain freely available to the scientific
community. Two datasets are available: a cross-sectional and a
longitudinal set.
Cross-sectional MRI Data in Young, Middle Aged, Nondemented and
Demented Older Adults: This set consists of a cross-sectional collection
of 416 subjects aged 18 to 96. For each subject, 3 or 4 individual
T1-weighted MRI scans obtained in single scan sessions are included.
The subjects are all right-handed and include both men and women. 100
of the included subjects over the age of 60 have been clinically
diagnosed with very mild to moderate Alzheimer’s disease (AD).
Additionally, a reliability data set is included containing 20
nondemented subjects imaged on a subsequent visit within 90 days of
their initial session.
Longitudinal MRI Data in Nondemented and Demented Older Adults: This
set consists of a longitudinal collection of 150 subjects aged 60 to
96. Each subject was scanned on two or more visits, separated by at
least one year for a total of 373 imaging sessions. For each subject, 3
or 4 individual T1-weighted MRI scans obtained in single scan sessions
are included. The subjects are all right-handed and include both men and
women. 72 of the subjects were characterized as nondemented throughout
the study. 64 of the included subjects were characterized as demented at
the time of their initial visits and remained so for subsequent scans,
including 51 individuals with mild to moderate Alzheimer’s disease.
Another 14 subjects were characterized as nondemented at the time of
their initial visit and were subsequently characterized as demented at a
later visit.
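The subject counts quoted for the longitudinal set can be checked against each other with a few lines of arithmetic. The sketch below is only a consistency check on the numbers stated above; the variable and function names are illustrative and are not part of any OASIS tooling.

```python
# Figures quoted in the OASIS longitudinal description above.
total_subjects = 150
nondemented_throughout = 72
demented_from_first_visit = 64
converted_to_demented = 14
imaging_sessions = 373


def summarize_longitudinal_counts():
    """Return (sum of the three subject groups, mean sessions per subject)."""
    group_sum = (
        nondemented_throughout
        + demented_from_first_visit
        + converted_to_demented
    )
    mean_sessions = imaging_sessions / total_subjects
    return group_sum, mean_sessions


group_sum, mean_sessions = summarize_longitudinal_counts()
print(group_sum == total_subjects)  # True: 72 + 64 + 14 = 150
print(round(mean_sessions, 2))      # 2.49 sessions per subject on average
```

The three groups account for all 150 subjects, and the average of roughly 2.5 sessions per subject is consistent with each subject being scanned on two or more visits.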
SCMR Consensus Data: The SCMR Consensus Dataset is a set of 15 cardiac MRI studies of mixed
pathologies (5 healthy, 6 myocardial infarction, 2 heart failure and 2
hypertrophy), which were acquired from different MR machines (4 GE, 5
Siemens, 6 Philips). The main objectives...
Sunnybrook Cardiac Data: The Sunnybrook Cardiac Data (SCD), also known as the 2009 Cardiac MR
Left Ventricle Segmentation Challenge data, consist of 45 cine-MRI
images from a mix of patients and pathologies: healthy, hypertrophy,
heart failure with infarction and heart...
Preliminary clinical studies have shown that spiral CT scanning of
the lungs can improve early detection of lung cancer in high-risk
individuals. Image processing algorithms have the potential to assist in
lesion detection on spiral CT studies, and to assess the stability or
change in lesion size on serial CT studies. The use of such
computer-assisted algorithms could significantly enhance the sensitivity
and specificity of spiral CT lung screening, as well as lower costs by
reducing physician time needed for interpretation.
The intent of the Lung Imaging Database Consortium (LIDC) initiative
was to support a consortium of institutions to develop consensus
guidelines for a spiral CT lung image resource and to construct a
database of spiral CT lung images. The investigators funded under this
initiative created a set of guidelines and metrics for database use and
for developing a database as a test-bed and showcase for those methods.
The database is available to researchers and users through the Internet
and has wide utility as a research, teaching, and training resource.
Specifically, the aims of the LIDC initiative were to provide:
a reference database for the relative evaluation of image processing or CAD algorithms and
a flexible query system that will provide investigators the
opportunity to evaluate a wide range of technical parameters and
de-identified clinical information within this database that may be
important for research applications.
This resource will stimulate further database development for image
processing and CAD evaluation for applications that include cancer
screening, diagnosis, image-guided intervention, and treatment.
Therefore, the NCI encourages investigator-initiated grant applications
that utilize the database in their research. NCI also encourages
investigator-initiated grant applications that provide tools or
methodology that may improve or complement the mission of the LIDC.
Cancer imaging data sets across various cancer types (e.g. carcinoma,
lung cancer, myeloma) and various imaging modalities.
The image data in The Cancer Imaging Archive (TCIA) is organized into
purpose-built collections of subjects. The subjects typically have a
cancer type and/or anatomical site (lung, brain, etc.) in common. Each
link in the table below contains information concerning the scientific
value of a collection, information about how to obtain any supporting
non-image data which may be available, and links to view or download the
imaging data. To support reproducibility in scientific research, TCIA
supports Digital Object Identifiers (DOIs) which allow users to share
subsets of TCIA data referenced in a research manuscript.
Tuberculosis (TB) is a major public health problem in Belarus.
Recently the situation has been complicated by the emergence and development
of MDR/XDR TB and HIV/TB, which require long-term treatment. Many of the
most severe cases are usually dispersed across the country to different
TB dispensaries. The ability of leading Belarus TB specialists to follow
such patients will be greatly improved by using a common database
containing patients’ radiological images, lab work and clinical data.
This will also significantly improve adherence to the treatment protocol
and result in a better record of the treatment outcomes.
Criteria for including clinical cases in the portal database:
patients admitted to the MDR-TB department of the RSPC of Pulmonology and
Tuberculosis with diagnosed or suspected MDR-TB who underwent a CT
study (± 2 months from the date of registration).
The Belarus dataset has both chest X-rays and CT scans of the same patient.
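The inclusion window stated above (a CT study within ± 2 months of the registration date) can be expressed as a simple date check. This is a minimal sketch, not the portal's actual logic: the function names are hypothetical, and it assumes "2 months" means whole calendar months, with the day of the month clamped when the target month is shorter.

```python
import calendar
from datetime import date


def shift_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = d.year * 12 + (d.month - 1) + months
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def ct_within_window(registration: date, ct_study: date, months: int = 2) -> bool:
    """Hypothetical check: is the CT study within +/- `months` of registration?"""
    earliest = shift_months(registration, -months)
    latest = shift_months(registration, months)
    return earliest <= ct_study <= latest


registration = date(2015, 3, 15)
print(ct_within_window(registration, date(2015, 5, 15)))  # True (exactly +2 months)
print(ct_within_window(registration, date(2015, 5, 16)))  # False (one day past)
```

If the portal counts the window in days instead, the two bounds could be replaced with a fixed `timedelta`; the source does not say which interpretation applies.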