Wednesday, 31 May 2023

 

Machine learning for drug discovery

 

    Introduction: ML algorithms and software are now being developed and used at every stage of drug discovery and development, including clinical trials. They are applied to identify novel targets, provide stronger evidence for target–disease associations, improve small-molecule compound design and optimization, increase understanding of disease mechanisms and of disease and non-disease phenotypes, develop new biomarkers for prognosis, progression and drug efficacy, improve analysis of biometric and other data from patient monitoring and wearable devices, enhance digital pathology imaging, and extract high-content information from images at all levels of resolution.

 

    Model: The aim of a good ML model is to generalize well from the training data to the test data at hand. Generalization refers to how well the concepts learned by the model apply to data the model did not see during training. Within each ML technique, several methods exist, which vary in their prediction accuracy, training speed and the number of variables they can handle. Algorithms must be chosen carefully to ensure that they suit the problem and the amount and type of data available. The amount of parameter tuning needed and how well the method separates signal from noise are also important considerations.
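To make the idea of generalization concrete, here is a minimal sketch of a hold-out evaluation. Everything in it is an assumption for illustration: the data are synthetic, the "model" is a simple nearest-centroid rule, and the names `make_data`, `fit_centroids` and `accuracy` are hypothetical helpers, not part of any library.

```python
import random

def make_data(n, rng):
    """Synthetic 1-D, two-class data standing in for real measurements."""
    data = []
    for i in range(n):
        label = i % 2                        # balanced classes
        x = rng.gauss(4.0 * label, 1.0)      # class 1 is centred at 4, class 0 at 0
        data.append((x, label))
    return data

def fit_centroids(train):
    """'Train' the model: compute the mean feature value of each class."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for x, y in train:
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in sums}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def accuracy(centroids, data):
    """Fraction of examples whose predicted class matches the label."""
    return sum(predict(centroids, x) == y for x, y in data) / len(data)

rng = random.Random(0)                       # fixed seed for repeatability
data = make_data(200, rng)
rng.shuffle(data)

split = len(data) * 7 // 10                  # 70% train, 30% held-out test
train, test = data[:split], data[split:]

model = fit_centroids(train)
train_acc = accuracy(model, train)
test_acc = accuracy(model, test)             # estimate of generalization
print(f"train accuracy={train_acc:.2f}, test accuracy={test_acc:.2f}")
```

The number that matters is the test accuracy: it is measured on examples the model never saw, so a large gap between the two figures would suggest the model fits the training data but does not generalize.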

Deep learning (DL) is a modern reincarnation of the artificial neural networks of the 1980s and 1990s. It uses sophisticated, multi-level deep neural networks (DNNs) to create systems that can perform feature detection from massive amounts of unlabelled or labelled training data.

 

    Applications: The first step in target identification is establishing a causal association between the target and the disease. Establishing causality requires demonstrating that modulation of a target affects disease, through either naturally occurring (genetic) variation or carefully designed experimental intervention. However, ML can be used to analyse large data sets with information on the function of a putative target and to make predictions about potential causality, driven, for instance, by the properties of known true targets. ML methods have been applied in this way across several aspects of the target identification field. Costa et al. built a decision tree-based meta-classifier trained on network topology of protein–protein, metabolic and transcriptional interactions, as well as tissue expression and subcellular localization, to predict genes associated with morbidity that are also druggable.

Li et al. conducted a case study using standard-of-care drugs in which they first built models for drug sensitivity to erlotinib and sorafenib. When they applied the models to stratify patients, they demonstrated that the models were predictive and drug-specific. The model-derived biomarker genes were shown to reflect the mechanism of action of each drug, and when combined with globally normalized public domain data from various cancer types, the model predicted sensitivities of cancer types to each drug that were consistent with their FDA-approved indications. Such predictions are useful because they can help in treating cancers that appear unaffected by certain drugs.





 

Machine learning for mining DNA sequence data

 

    Introduction: The amount of biological data is estimated to double approximately every 18 months. In 1982, the first GenBank nucleic acid sequence database held only 606 sequences containing 680,000 nucleotide bases (Bilofsky et al., 1986). By February 2013, it contained 162 million sequences comprising 150 billion nucleotide bases. How to mine knowledge from this volume of data and use it to guide biological research is an important research topic in bioinformatics.
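As a quick sanity check on the growth figures above, the implied doubling time can be worked out from the two GenBank data points. This is a back-of-the-envelope sketch: the base counts come from the text, while the elapsed time of 31 years (1982 to early 2013) is a rounding assumption.

```python
import math

bases_1982 = 680_000            # nucleotide bases in 1982 (from the text)
bases_2013 = 150_000_000_000    # nucleotide bases in February 2013 (from the text)
months_elapsed = 31 * 12        # assumption: roughly 31 years between the two counts

# Number of times the database size doubled over the period.
doublings = math.log2(bases_2013 / bases_1982)

# Average time per doubling under steady exponential growth.
doubling_time_months = months_elapsed / doublings

print(f"doublings: {doublings:.2f}")
print(f"implied doubling time: {doubling_time_months:.1f} months")
```

Under these assumptions the database doubled between 17 and 18 times, which works out to a doubling time of about 21 months, the same order of magnitude as the 18 months quoted.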

 

 

    The method: 

1. Data cleaning. Because of the increasing amount of heterogeneous data, data sets often have missing and inconsistent data. Low data quality will have a serious negative impact on the information extraction process. Therefore, deleting incomplete or inconsistent data is the first step in data mining;

2. Data integration. If the source of the data to be studied is different, it must be aggregated consistently;

3. Data selection. Accurately select relevant data based on the research content;

4. Data conversion. Transform or merge data into a form suitable for mining, and integrate new attributes or functions useful for the data mining process;

5. Data mining. Select the appropriate model according to the problem and make subsequent improvements;

6. Model evaluation. After acquiring knowledge from the data, select appropriate metrics to evaluate the model.
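The six steps above can be sketched end to end on toy DNA records. This is an illustrative sketch only: the two sources, the labels, the minimum length, the GC-fraction feature and the threshold rule are all assumptions chosen to keep the example small, and the function names are hypothetical.

```python
# Hypothetical toy records from two sources, as (sequence, label) pairs.
source_a = [("ATGCGC", "gc_rich"), ("atatat", "at_rich"),
            ("GCGCNN", "gc_rich"), (None, "at_rich")]
source_b = [("GGCCGC", "gc_rich"), ("TTAATA", "at_rich"),
            ("ATGCGC", "gc_rich"), ("AT", "at_rich")]

def clean(records):
    """Step 1: drop missing sequences and any with symbols other than A, C, G, T."""
    return [(s, y) for s, y in records if s and set(s.upper()) <= set("ACGT")]

def integrate(*sources):
    """Step 2: merge sources into one set with consistent casing and no duplicates."""
    merged = {}
    for records in sources:
        for seq, label in records:
            merged[seq.upper()] = label
    return list(merged.items())

def select(records, min_len=4):
    """Step 3: keep only sequences relevant to the (assumed) research question."""
    return [(s, y) for s, y in records if len(s) >= min_len]

def convert(records):
    """Step 4: turn each sequence into a numeric feature, here its GC fraction."""
    return [((s.count("G") + s.count("C")) / len(s), y) for s, y in records]

def mine(features):
    """Step 5: fit a one-parameter model, the midpoint between the class means."""
    means = {}
    for cls in ("gc_rich", "at_rich"):
        values = [x for x, y in features if y == cls]
        means[cls] = sum(values) / len(values)
    return (means["gc_rich"] + means["at_rich"]) / 2

def evaluate(threshold, features):
    """Step 6: accuracy of the threshold rule on the given features."""
    hits = sum(("gc_rich" if x > threshold else "at_rich") == y
               for x, y in features)
    return hits / len(features)

selected = select(integrate(clean(source_a), clean(source_b)))
features = convert(selected)
threshold = mine(features)
accuracy = evaluate(threshold, features)   # measured on the training data only
print(len(selected), round(threshold, 3), accuracy)
```

Note that the accuracy here is computed on the same records used to fit the threshold, so it says nothing about generalization; a real evaluation would hold out separate data.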

    Challenges:

1. Large data sets are the key to machine learning. At present, most biological data sets are still too small to meet the requirements of machine learning algorithms. Although the total amount of biological data is huge and increasing day by day, the data are collected on different platforms. Because of differences in technology and in the biology itself, it is very difficult to integrate different data sets;

2. Because biological data sets differ from one another, machine learning models trained on one data set may not generalize well to other data sets. If the new data are significantly different from the training data, the results produced by the model are likely to be wrong;

3. The black-box nature of machine learning models brings new challenges to biological applications. It is usually very difficult to interpret the output of a given model from a biological point of view, which limits the application of the model.
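The second challenge can be illustrated with a deliberately tiny example. The numbers below are invented: data set B is the same as data set A except for a constant offset, which stands in for a systematic difference between measurement platforms. The helper names are hypothetical.

```python
def fit_threshold(data):
    """Fit a 1-D rule: the midpoint between the means of class 0 and class 1."""
    class0 = [x for x, y in data if y == 0]
    class1 = [x for x, y in data if y == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2

def accuracy(threshold, data):
    """Fraction of examples classified correctly by 'x > threshold means class 1'."""
    return sum((1 if x > threshold else 0) == y for x, y in data) / len(data)

# Data set A: the platform the model is trained on (invented values).
dataset_a = [(0.0, 0), (1.0, 0), (2.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]

# Data set B: same biology, but every reading is shifted by an assumed offset.
platform_offset = 3.5
dataset_b = [(x + platform_offset, y) for x, y in dataset_a]

threshold = fit_threshold(dataset_a)       # learned only from data set A
acc_a = accuracy(threshold, dataset_a)
acc_b = accuracy(threshold, dataset_b)     # same rule applied to shifted data
print(f"threshold={threshold}, accuracy on A={acc_a}, accuracy on B={acc_b}")
```

The rule separates data set A perfectly, but on data set B it labels every example as class 1 and is right only half the time, even though the classes in B are just as separable as in A.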

 

 

The BATTLE Trial: Personalizing Therapy for Lung Cancer. Edward S. Kim, Roy S. Herbst, Ignacio I. Wistuba, J. Jack Lee, George R. Blumenschein Jr., Anne Tsao, David J. Stewart, Marshall E. Hicks, Jeremy Erasmus Jr., Sanjay Gupta, Christine M. Alden, Suyu Liu, Ximing Tang, Fadlo R. Khuri, Hai T. Tran, Bruce E. Johnson, John V. Heymach, Li Mao, Frank Fossella, Merrill S. Kies, Vassiliki Papadimitrakopoulou, Suzanne E. Davis, Scott M. Lippman and Waun K. Hong

 

Review on the Application of Machine Learning Algorithms in the Sequence Data Mining of DNA. Aimin Yang, Wei Zhang, Jiahao Wang, Ke Yang, Yang Han and Limin Zhang

 

Applications of machine learning in drug discovery and development. Jessica Vamathevan, Dominic Clark, Paul Czodrowski, Ian Dunham, Edgardo Ferran, George Lee, Bin Li, Anant Madabhushi, Parantu Shah, Michaela Spitzer and Shanrong Zhao
