Machine learning for drug discovery
Introduction: Machine learning (ML) algorithms and software are now being
developed and used at all stages of drug discovery and development, including
clinical trials. They are used to identify novel targets, provide stronger
evidence for target–disease associations, improve small-molecule compound design
and optimization, increase understanding of disease mechanisms and of disease
and non-disease phenotypes, develop new biomarkers for prognosis, progression
and drug efficacy, improve analysis of biometric and other data from patient
monitoring and wearable devices, enhance digital pathology imaging and
extract high-content information from images at all levels of resolution.
Model: The aim of a good ML model is to
generalize well from the training data to the test data at hand. Generalization
refers to how well the concepts learned by the model apply to data not seen by
the model during training. Within each technique, several methods exist,
which vary in their prediction accuracy, training speed and the number of
variables they can handle. Algorithms must be chosen carefully to ensure that
they are suitable for the problem at hand and the amount and type of data
available. The amount of parameter tuning needed and how well the method
separates signal from noise are also important considerations.
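The idea of generalization can be illustrated with a small hold-out experiment. The sketch below is only an illustration under stated assumptions: the synthetic one-dimensional data, the 20% label noise and the 1-nearest-neighbour model are all made up for this example and do not come from the reviewed paper. A model that memorizes its training data scores perfectly on it, but noticeably worse on data it has not seen.

```python
import random


def make_data(n, rng, noise=0.2):
    """Synthetic 1-D data: the label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # label noise that the model should NOT learn
        data.append((x, y))
    return data


def predict_1nn(train, x):
    """Return the label of the training point closest to x (1-nearest neighbour)."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def accuracy(train, data):
    """Fraction of points in `data` whose label is predicted correctly."""
    return sum(predict_1nn(train, x) == y for x, y in data) / len(data)


rng = random.Random(0)        # fixed seed so the run is repeatable
train = make_data(200, rng)   # data seen during training
test = make_data(200, rng)    # data not seen during training

train_acc = accuracy(train, train)
test_acc = accuracy(train, test)
print(f"training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The gap between the two numbers is a simple measure of how poorly the memorized noise carries over to new data; a method that separates signal from noise better would narrow it.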
Deep learning (DL) is a modern reincarnation of artificial neural networks
from the 1980s and 1990s and uses sophisticated, multi-level deep neural
networks (DNNs) to create systems that can perform feature detection from
massive amounts of unlabelled or labelled training data.
Applications: The first step in target identification is establishing a
causal association between the target and the disease. Establishing causality requires
demonstration that modulation of a target affects disease from either naturally
occurring (genetic) variation or carefully designed experimental intervention.
However, ML can be used to analyse large data sets with information on the
function of a putative target to make predictions about potential causality,
driven, for instance, by the properties of known true targets. ML methods have
been applied in this way across several aspects of the target identification
field. Costa et al. built a decision tree-based meta-classifier trained
on network topology of protein–protein, metabolic and transcriptional
interactions, as well as tissue expression and subcellular localization, to
predict genes associated with morbidity that are also druggable.
Li et al. conducted a case study using standard-of-care drugs in which they
first built models for drug sensitivity to erlotinib and sorafenib. When they
applied the models to stratify patients, they demonstrated that the
models were predictive and drug-specific. The model-derived biomarker
genes were shown to be reflective of the mechanism of action of each drug, and
when combined with globally normalized public domain data from various cancer
types, the model predicted sensitivities of cancer types to each drug that were
consistent with their FDA-approved indications. Such predictions are useful
because they can help choose an alternative drug for cancers that do not
respond to a given therapy.
Machine learning for mining DNA sequence data
Introduction: The amount of biological data is estimated to double
approximately every 18 months. In 1982, GenBank's nucleic acid sequence
database held only 606 sequences containing 680,000 nucleotide bases (Bilofsky
et al., 1986). By February 2013, it held 162 million sequences containing
150 billion nucleotide bases. How to mine knowledge from such large data sets
and use it to guide biological research is an important research topic in
bioinformatics.
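As a quick sanity check on the growth rate, the two GenBank figures quoted above can be turned into an implied doubling time. This is only a back-of-the-envelope sketch: the start date is an assumption (the text gives just the year 1982, so the end of 1982 is used here), and the base counts are the rounded values from the text.

```python
import math

bases_1982 = 680_000              # nucleotide bases in 1982 (from the text)
bases_2013 = 150_000_000_000      # nucleotide bases in February 2013 (from the text)

# Assumption: measured from the end of 1982 to February 2013.
months_elapsed = 30 * 12 + 2

# Number of times the database doubled, and the average time per doubling.
doublings = math.log2(bases_2013 / bases_1982)
doubling_time_months = months_elapsed / doublings

print(f"{doublings:.1f} doublings, about one every {doubling_time_months:.1f} months")
```

Under these assumptions the doubling time comes out at roughly 20 months, which is in the same range as the 18 months quoted above.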
The method:
1. Data cleaning. Because of the increasing amount of heterogeneous data,
data sets often contain missing and inconsistent data. Low data quality has a
serious negative impact on the information extraction process. Therefore,
deleting incomplete or inconsistent data is the first step in data mining;
2. Data integration. If the data to be studied come from different sources,
they must be aggregated consistently;
3. Data selection. Accurately select relevant data based on
the research content;
4. Data conversion. Transform or merge data into a form
suitable for mining, and add new attributes or features useful for the
data mining process;
5. Data mining. Select the appropriate model according to
the problem and make subsequent improvements;
6. Model evaluation. After acquiring knowledge from the
data, select appropriate indicators to evaluate the model.
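The six steps above can be sketched end to end on toy DNA data. Everything in this example is hypothetical and chosen only for illustration: the sequences, the class labels `gc_rich` and `at_rich`, the 2-mer frequency features and the nearest-centroid classifier are assumptions, not the methods used in the reviewed paper.

```python
from collections import Counter
from itertools import product

VALID_BASES = set("ACGT")
KMERS = ["".join(p) for p in product("ACGT", repeat=2)]  # all 16 possible 2-mers


def clean(records):
    """Step 1: drop records with an empty sequence, a missing label or non-ACGT symbols."""
    return [(seq.upper(), label) for seq, label in records
            if seq and label is not None and set(seq.upper()) <= VALID_BASES]


def integrate(*sources):
    """Step 2: merge several sources, keeping the first copy of each duplicate."""
    seen, merged = set(), []
    for source in sources:
        for record in source:
            if record not in seen:
                seen.add(record)
                merged.append(record)
    return merged


def select(records, min_len=8):
    """Step 3: keep only the sequences relevant to the study (here: long enough)."""
    return [r for r in records if len(r[0]) >= min_len]


def convert(seq, k=2):
    """Step 4: turn a sequence into a vector of k-mer frequencies."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return [counts[kmer] / total for kmer in KMERS]


def fit_centroids(records):
    """Step 5: 'mine' the data with a nearest-centroid model (mean vector per label)."""
    sums, n = {}, Counter()
    for seq, label in records:
        vec = convert(seq)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        n[label] += 1
    return {label: [v / n[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, seq):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    vec = convert(seq)
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, centroids[lab])))


def evaluate(centroids, records):
    """Step 6: evaluate the model with a chosen indicator (here: accuracy)."""
    return sum(predict(centroids, s) == y for s, y in records) / len(records)


# Two hypothetical sources with typical quality problems.
source_a = [("GCGCGGCCGC", "gc_rich"), ("ATATTAATAT", "at_rich"),
            ("GCNNGCGC", "gc_rich"), ("", "at_rich")]
source_b = [("atattaatat", "at_rich"), ("CCGGCGCGCC", "gc_rich"),
            ("TTATAATTAA", None), ("GCGC", "gc_rich")]
held_out = [("GGCCGCGGCG", "gc_rich"), ("TAATATTATA", "at_rich")]

records = select(integrate(clean(source_a), clean(source_b)))
centroids = fit_centroids(records)
held_out_accuracy = evaluate(centroids, held_out)
print(len(records), "records kept; held-out accuracy:", held_out_accuracy)
```

In practice each step is far more involved, but the order is the same: the model only ever sees data that has already been cleaned, merged, selected and converted into features.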
Challenges:
1. Large data sets are the key to machine learning. At
present, the magnitude of most biological data sets is still too small to meet
the requirements of machine learning algorithms. Although the total amount of
biological data is huge and increasing day by day, the collection of data comes
from different platforms. Due to the differences in technology and biology
itself, it is very difficult to integrate different data sets;
2. Due to the differences in the biological data themselves,
machine learning models trained on one data set may not generalize well to
other data sets. If the new data differ significantly from the training
data, the analysis results of the machine learning model are likely to be
unreliable;
3. The black-box nature of machine learning models brings
new challenges to biological applications. It is usually very difficult to
interpret the output of a given model from a biological point of view, which
limits the application of the model.
References:
1. The BATTLE Trial: Personalizing Therapy for Lung Cancer. Edward
S. Kim, Roy S. Herbst, Ignacio I. Wistuba, J. Jack Lee, George R. Blumenschein,
Jr., Anne Tsao, David J. Stewart, Marshall E. Hicks, Jeremy Erasmus, Jr., Sanjay
Gupta, Christine M. Alden, Suyu Liu, Ximing Tang, Fadlo R. Khuri, Hai T. Tran,
Bruce E. Johnson, John V. Heymach, Li Mao, Frank Fossella, Merrill S. Kies,
Vassiliki Papadimitrakopoulou, Suzanne E. Davis, Scott M. Lippman, Waun K. Hong
2. Review on the Application of Machine Learning Algorithms in
the Sequence Data Mining of DNA. Aimin Yang, Wei Zhang, Jiahao Wang, Ke Yang,
Yang Han and Limin Zhang
3. Applications of machine learning in drug discovery and
development. Jessica Vamathevan, Dominic Clark, Paul Czodrowski, Ian Dunham,
Edgardo Ferran, George Lee, Bin Li, Anant Madabhushi, Parantu Shah, Michaela
Spitzer and Shanrong Zhao