Description

Book Synopsis

A comprehensive overview of high-performance pattern recognition techniques and approaches to Computational Molecular Biology

This book surveys the development of pattern recognition techniques and approaches in Computational Molecular Biology. Providing broad coverage of the field, the authors present fundamental and technical information on these techniques and approaches and discuss their related problems. The text consists of twenty-nine chapters, organized into seven parts: Pattern Recognition in Sequences, Pattern Recognition in Secondary Structures, Pattern Recognition in Tertiary Structures, Pattern Recognition in Quaternary Structures, Pattern Recognition in Microarrays, Pattern Recognition in Phylogenetic Trees, and Pattern Recognition in Biological Networks.

  • Surveys the development of techniques and approaches for pattern recognition in biomolecular data
  • Discusses pattern recognit

    Table of Contents

    LIST OF CONTRIBUTORS xxi

    PREFACE xxvii

    I PATTERN RECOGNITION IN SEQUENCES 1

    1 COMBINATORIAL HAPLOTYPING PROBLEMS 3
    Giuseppe Lancia

    1.1 Introduction / 3

    1.2 Single Individual Haplotyping / 5

    1.2.1 The Minimum Error Correction Model / 8

    1.2.2 Probabilistic Approaches and Alternative Models / 10

    1.3 Population Haplotyping / 12

    1.3.1 Clark’s Rule / 14

    1.3.2 Pure Parsimony / 15

    1.3.3 Perfect Phylogeny / 19

    1.3.4 Disease Association / 21

    1.3.5 Other Models / 22

    References / 23

    2 ALGORITHMIC PERSPECTIVES OF THE STRING BARCODING PROBLEMS 28
    Sima Behpour and Bhaskar DasGupta

    2.1 Introduction / 28

    2.2 Summary of Algorithmic Complexity Results for Barcoding Problems / 32

    2.2.1 Average Length of Optimal Barcodes / 33

    2.3 Entropy-Based Information Content Technique for Designing Approximation Algorithms for String Barcoding Problems / 34

    2.4 Techniques for Proving Inapproximability Results for String Barcoding Problems / 36

    2.4.1 Reductions from Set Covering Problem / 36

    2.4.2 Reduction from Graph-Coloring Problem / 38

    2.5 Heuristic Algorithms for String Barcoding Problems / 39

    2.5.1 Entropy-Based Method with a Different Measure for Information Content / 39

    2.5.2 Balanced Partitioning Approach / 40

    2.6 Conclusion / 40

    Acknowledgments / 41

    References / 41

    3 ALIGNMENT-FREE MEASURES FOR WHOLE-GENOME COMPARISON 43
    Matteo Comin and Davide Verzotto

    3.1 Introduction / 43

    3.2 Whole-Genome Sequence Analysis / 44

    3.2.1 Background on Whole-Genome Comparison / 44

    3.2.2 Alignment-Free Methods / 45

    3.2.3 Average Common Subword / 46

    3.2.4 Kullback–Leibler Information Divergence / 47

    3.3 Underlying Approach / 47

    3.3.1 Irredundant Common Subwords / 48

    3.3.2 Underlying Subwords / 49

    3.3.3 Efficient Computation of Underlying Subwords / 50

    3.3.4 Extension to Inversions and Complements / 53

    3.3.5 A Distance-Like Measure Based on Underlying Subwords / 53

    3.4 Experimental Results / 54

    3.4.1 Genome Data sets and Reference Taxonomies / 54

    3.4.2 Whole-Genome Phylogeny Reconstruction / 56

    3.5 Conclusion / 61

    Author’s Contributions / 62

    Acknowledgments / 62

    References / 62

    4 A MAXIMUM LIKELIHOOD FRAMEWORK FOR MULTIPLE SEQUENCE LOCAL ALIGNMENT 65
    Chengpeng Bi

    4.1 Introduction / 65

    4.2 Multiple Sequence Local Alignment / 67

    4.2.1 Overall Objective Function / 67

    4.2.2 Maximum Likelihood Model / 68

    4.3 Motif Finding Algorithms / 70

    4.3.1 DEM Motif Algorithm / 70

    4.3.2 WEM Motif Finding Algorithm / 70

    4.3.3 Metropolis Motif Finding Algorithm / 72

    4.3.4 Gibbs Motif Finding Algorithm / 73

    4.3.5 Pseudo-Gibbs Motif Finding Algorithm / 74

    4.4 Time Complexity / 75

    4.5 Case Studies / 75

    4.5.1 Performance Evaluation / 76

    4.5.2 CRP Binding Sites / 76

    4.5.3 Multiple Motifs in Helix–Turn–Helix Protein Structure / 78

    4.6 Conclusion / 80

    References / 81

    5 GLOBAL SEQUENCE ALIGNMENT WITH A BOUNDED NUMBER OF GAPS 83
    Carl Barton, Tomáš Flouri, Costas S. Iliopoulos, and Solon P. Pissis

    5.1 Introduction / 83

    5.2 Definitions and Notation / 85

    5.3 Problem Definition / 87

    5.4 Algorithms / 88

    5.5 Conclusion / 94

    References / 95

    II PATTERN RECOGNITION IN SECONDARY STRUCTURES 97

    6 A SHORT REVIEW ON PROTEIN SECONDARY STRUCTURE PREDICTION METHODS 99
    Renxiang Yan, Jiangning Song, Weiwen Cai, and Ziding Zhang

    6.1 Introduction / 99

    6.2 Representative Protein Secondary Structure Prediction Methods / 102

    6.2.1 Chou–Fasman / 103

    6.2.2 GOR / 104

    6.2.3 PHD / 104

    6.2.4 PSIPRED / 104

    6.2.5 SPINE-X / 105

    6.2.6 PSSpred / 105

    6.2.7 Meta Methods / 105

    6.3 Evaluation of Protein Secondary Structure Prediction Methods / 106

    6.3.1 Measures / 106

    6.3.2 Benchmark / 106

    6.3.3 Performances / 107

    6.4 Conclusion / 110

    Acknowledgments / 110

    References / 111

    7 A GENERIC APPROACH TO BIOLOGICAL SEQUENCE SEGMENTATION PROBLEMS: APPLICATION TO PROTEIN SECONDARY STRUCTURE PREDICTION 114
    Yann Guermeur and Fabien Lauer

    7.1 Introduction / 114

    7.2 Biological Sequence Segmentation / 115

    7.3 MSVMpred / 117

    7.3.1 Base Classifiers / 117

    7.3.2 Ensemble Methods / 118

    7.3.3 Convex Combination / 119

    7.4 Postprocessing with a Generative Model / 119

    7.5 Dedication to Protein Secondary Structure Prediction / 120

    7.5.1 Biological Problem / 121

    7.5.2 MSVMpred2 / 121

    7.5.3 Hidden Semi-Markov Model / 122

    7.5.4 Experimental Results / 122

    7.6 Conclusions and Ongoing Research / 125

    Acknowledgments / 126

    References / 126

    8 STRUCTURAL MOTIF IDENTIFICATION AND RETRIEVAL: A GEOMETRICAL APPROACH 129
    Virginio Cantoni, Marco Ferretti, Mirto Musci, and Nahumi Nugrahaningsih

    8.1 Introduction / 129

    8.2 A Few Basic Concepts / 130

    8.2.1 Hierarchy of Protein Structures / 130

    8.2.2 Secondary Structure Elements / 131

    8.2.3 Structural Motifs / 132

    8.2.4 Available Sources for Protein Data / 134

    8.3 State of the Art / 135

    8.3.1 Protein Structure Motif Search / 135

    8.3.2 Promotif / 136

    8.3.3 Secondary-Structure Matching / 137

    8.3.4 Multiple Structural Alignment by Secondary Structures / 138

    8.4 A Novel Geometrical Approach to Motif Retrieval / 138

    8.4.1 Secondary Structures Cooccurrences / 138

    8.4.2 Cross Motif Search / 143

    8.4.3 Complete Cross Motif Search / 146

    8.5 Implementation Notes / 149

    8.5.1 Optimizations / 149

    8.5.2 Parallel Approaches / 150

    8.6 Conclusions and Future Work / 151

    Acknowledgment / 152

    References / 152

    9 GENOME-WIDE SEARCH FOR PSEUDOKNOTTED NONCODING RNAs: A COMPARATIVE STUDY 155
    Meghana Vasavada, Kevin Byron, Yang Song, and Jason T.L. Wang

    9.1 Introduction / 155

    9.2 Background / 156

    9.2.1 Noncoding RNAs and Their Secondary Structures / 156

    9.2.2 Pseudoknotted ncRNA Search Tools / 157

    9.3 Methodology / 157

    9.4 Results and Interpretation / 161

    9.5 Conclusion / 162

    References / 163

    III PATTERN RECOGNITION IN TERTIARY STRUCTURES 165

    10 MOTIF DISCOVERY IN PROTEIN 3D-STRUCTURES USING GRAPH MINING TECHNIQUES 167
    Wajdi Dhifli and Engelbert Mephu Nguifo

    10.1 Introduction / 167

    10.2 From Protein 3D-Structures to Protein Graphs / 169

    10.2.1 Parsing Protein 3D-Structures into Graphs / 169

    10.3 Graph Mining / 172

    10.4 Subgraph Mining / 173

    10.5 Frequent Subgraph Discovery / 173

    10.5.1 Problem Definition / 174

    10.5.2 Candidates Generation / 176

    10.5.3 Frequent Subgraph Discovery Approaches / 177

    10.5.4 Variants of Frequent Subgraph Mining: Closed and Maximal Subgraphs / 178

    10.6 Feature Selection / 179

    10.6.1 Relevance of a Feature / 179

    10.7 Feature Selection for Subgraphs / 180

    10.7.1 Problem Statement / 180

    10.7.2 Mining Top-k Subgraphs / 180

    10.7.3 Clustering-Based Subgraph Selection / 181

    10.7.4 Sampling-Based Approaches / 181

    10.7.5 Approximate Subgraph Mining / 181

    10.7.6 Discriminative Subgraph Selection / 182

    10.7.7 Other Significant Subgraph Selection Approaches / 182

    10.8 Discussion / 183

    10.9 Conclusion / 185

    Acknowledgments / 185

    References / 186

    11 FUZZY AND UNCERTAIN LEARNING TECHNIQUES FOR THE ANALYSIS AND PREDICTION OF PROTEIN TERTIARY STRUCTURES 190
    Chinua Umoja, Xiaxia Yu, and Robert Harrison

    11.1 Introduction / 190

    11.2 Genetic Algorithms / 192

    11.2.1 GA Model Selection in Protein Structure Prediction / 196

    11.2.2 Common Methodology / 198

    11.3 Supervised Machine Learning Algorithm / 201

    11.3.1 Artificial Neural Networks / 201

    11.3.2 ANNs in Protein Structure Prediction / 202

    11.3.3 Support Vector Machines / 203

    11.4 Fuzzy Application / 204

    11.4.1 Fuzzy Logic / 204

    11.4.2 Fuzzy SVMs / 204

    11.4.3 Adaptive-Network-Based Fuzzy Inference Systems / 205

    11.4.4 Fuzzy Decision Trees / 206

    11.5 Conclusion / 207

    References / 208

    12 PROTEIN INTER-DOMAIN LINKER PREDICTION 212
    Maad Shatnawi, Paul D. Yoo, and Sami Muhaidat

    12.1 Introduction / 212

    12.2 Protein Structure Overview / 213

    12.3 Technical Challenges and Open Issues / 214

    12.4 Prediction Assessment / 215

    12.5 Current Approaches / 216

    12.5.1 DomCut / 216

    12.5.2 Scooby-Domain / 217

    12.5.3 FIEFDom / 218

    12.5.4 Chatterjee et al. (2009) / 219

    12.5.5 Drop / 219

    12.6 Domain Boundary Prediction Using Enhanced General Regression Network / 220

    12.6.1 Multi-Domain Benchmark Data Set / 220

    12.6.2 Compact Domain Profile / 221

    12.6.3 The Enhanced Semi-Parametric Model / 222

    12.6.4 Training, Testing, and Validation / 225

    12.6.5 Experimental Results / 226

    12.7 Inter-Domain Linkers Prediction Using Compositional Index and Simulated Annealing / 227

    12.7.1 Compositional Index / 228

    12.7.2 Detecting the Optimal Set of Threshold Values Using Simulated Annealing / 229

    12.7.3 Experimental Results / 230

    12.8 Conclusion / 232

    References / 233

    13 PREDICTION OF PROLINE CIS–TRANS ISOMERIZATION 236
    Paul D. Yoo, Maad Shatnawi, Sami Muhaidat, Kamal Taha, and Albert Y. Zomaya

    13.1 Introduction / 236

    13.2 Methods / 238

    13.2.1 Evolutionary Data Set Construction / 238

    13.2.2 Protein Secondary Structure Information / 239

    13.2.3 Method I: Intelligent Voting / 239

    13.2.4 Method II: Randomized Meta-Learning / 241

    13.2.5 Model Validation and Testing / 242

    13.2.6 Parameter Tuning / 242

    13.3 Model Evaluation and Analysis / 243

    13.4 Conclusion / 245

    References / 245

    IV PATTERN RECOGNITION IN QUATERNARY STRUCTURES 249

    14 PREDICTION OF PROTEIN QUATERNARY STRUCTURES 251
    Akbar Vaseghi, Maryam Faridounnia, Soheila Shokrollahzade, Samad Jahandideh, and Kuo-Chen Chou

    14.1 Introduction / 251

    14.2 Protein Structure Prediction / 255

    14.2.1 Secondary Structure Prediction / 255

    14.2.2 Modeling of Tertiary Structure / 256

    14.3 Template-Based Predictions / 257

    14.3.1 Homology Modeling / 257

    14.3.2 Threading Methods / 257

    14.3.3 Ab initio Modeling / 257

    14.4 Critical Assessment of Protein Structure Prediction / 258

    14.5 Quaternary Structure Prediction / 258

    14.6 Conclusion / 261

    Acknowledgments / 261

    References / 261

    15 COMPARISON OF PROTEIN QUATERNARY STRUCTURES BY GRAPH APPROACHES 266
    Sheng-Lung Peng and Yu-Wei Tsay

    15.1 Introduction / 266

    15.2 Similarity in the Graph Model / 268

    15.2.1 Graph Model for Proteins / 270

    15.3 Measuring Structural Similarity via MCES / 272

    15.3.1 Problem Formulation / 273

    15.3.2 Constructing P-Graphs / 274

    15.3.3 Constructing Line Graphs / 276

    15.3.4 Constructing Modular Graphs / 276

    15.3.5 Maximum Clique Detection / 277

    15.3.6 Experimental Results / 277

    15.4 Protein Comparison via Graph Spectra / 279

    15.4.1 Graph Spectra / 279

    15.4.2 Matrix Selection / 281

    15.4.3 Graph Cospectrality and Similarity / 283

    15.4.4 Cospectral Comparison / 283

    15.4.5 Experimental Results / 284

    15.5 Conclusion / 287

    References / 287

    16 STRUCTURAL DOMAINS IN PREDICTION OF BIOLOGICAL PROTEIN–PROTEIN INTERACTIONS 291
    Mina Maleki, Michael Hall, and Luis Rueda

    16.1 Introduction / 291

    16.2 Structural Domains / 293

    16.3 The Prediction Framework / 293

    16.4 Feature Extraction and Prediction Properties / 294

    16.4.1 Physicochemical Properties / 296

    16.4.2 Domain-Based Properties / 298

    16.5 Feature Selection / 299

    16.5.1 Filter Methods / 299

    16.5.2 Wrapper Methods / 301

    16.6 Classification / 301

    16.6.1 Linear Dimensionality Reduction / 301

    16.6.2 Support Vector Machines / 303

    16.6.3 k-Nearest Neighbor / 303

    16.6.4 Naive Bayes / 304

    16.7 Evaluation and Analysis / 304

    16.8 Results and Discussion / 304

    16.8.1 Analysis of the Prediction Properties / 304

    16.8.2 Analysis of Structural DDIs / 307

    16.9 Conclusion / 309

    References / 310

    V PATTERN RECOGNITION IN MICROARRAYS 315

    17 CONTENT-BASED RETRIEVAL OF MICROARRAY EXPERIMENTS 317
    Hasan Oğul

    17.1 Introduction / 317

    17.2 Information Retrieval: Terminology and Background / 318

    17.3 Content-Based Retrieval / 320

    17.4 Microarray Data and Databases / 322

    17.5 Methods for Retrieving Microarray Experiments / 324

    17.6 Similarity Metrics / 327

    17.7 Evaluating Retrieval Performance / 329

    17.8 Software Tools / 330

    17.9 Conclusion and Future Directions / 331

    Acknowledgment / 332

    References / 332

    18 EXTRACTION OF DIFFERENTIALLY EXPRESSED GENES IN MICROARRAY DATA 335
    Tiratha Raj Singh, Brigitte Vannier, and Ahmed Moussa

    18.1 Introduction / 335

    18.2 From Microarray Image to Signal / 336

    18.2.1 Signal from Oligo DNA Array Image / 336

    18.2.2 Signal from Two-Color cDNA Array / 337

    18.3 Microarray Signal Analysis / 337

    18.3.1 Absolute Analysis and Replicates in Microarrays / 338

    18.3.2 Microarray Normalization / 339

    18.4 Algorithms for DE Gene Selection / 339

    18.4.1 Within–Between DE Gene (WB-DEG) Selection Algorithm / 340

    18.4.2 Comparison of the WB-DEGs with Two Classical DE Gene Selection Methods on Latin Square Data / 341

    18.5 Gene Ontology Enrichment and Gene Set Enrichment Analysis / 343

    18.6 Conclusion / 345

    References / 345

    19 CLUSTERING AND CLASSIFICATION TECHNIQUES FOR GENE EXPRESSION PROFILE PATTERN ANALYSIS 347
    Emanuel Weitschek, Giulia Fiscon, Valentina Fustaino, Giovanni Felici, and Paola Bertolazzi

    19.1 Introduction / 347

    19.2 Transcriptome Analysis / 348

    19.3 Microarrays / 349

    19.3.1 Applications / 349

    19.3.2 Microarray Technology / 350

    19.3.3 Microarray Workflow / 350

    19.4 RNA-Seq / 351

    19.5 Benefits and Drawbacks of RNA-Seq and Microarray Technologies / 353

    19.6 Gene Expression Profile Analysis / 356

    19.6.1 Data Definition / 356

    19.6.2 Data Analysis / 357

    19.6.3 Normalization and Background Correction / 357

    19.6.4 Genes Clustering / 359

    19.6.5 Experiment Classification / 361

    19.6.6 Software Tools for Gene Expression Profile Analysis / 362

    19.7 Real Case Studies / 364

    19.8 Conclusions / 367

    References / 368

    20 MINING INFORMATIVE PATTERNS IN MICROARRAY DATA 371
    Li Teng

    20.1 Introduction / 371

    20.2 Patterns with Similarity / 373

    20.2.1 Similarity Measurement / 374

    20.2.2 Clustering / 376

    20.2.3 Biclustering / 379

    20.2.4 Types of Biclusters / 380

    20.2.5 Measurement of the Homogeneity / 383

    20.2.6 Biclustering Algorithms with Different Searching Schemes / 387

    20.3 Conclusion / 391

    References / 391

    21 ARROW PLOT AND CORRESPONDENCE ANALYSIS MAPS FOR VISUALIZING THE EFFECTS OF BACKGROUND CORRECTION AND NORMALIZATION METHODS ON MICROARRAY DATA 394
    Carina Silva, Adelaide Freitas, Sara Roque, and Lisete Sousa

    21.1 Overview / 394

    21.1.1 Background Correction Methods / 395

    21.1.2 Normalization Methods / 396

    21.1.3 Literature Review / 397

    21.2 Arrow Plot / 399

    21.2.1 DE Genes Versus Special Genes / 399

    21.2.2 Definition and Properties of the ROC Curve / 400

    21.2.3 AUC and Degenerate ROC Curves / 401

    21.2.4 Overlapping Coefficient / 402

    21.2.5 Arrow Plot Construction / 403

    21.3 Significance Analysis of Microarrays / 404

    21.4 Correspondence Analysis / 405

    21.4.1 Basic Principles / 405

    21.4.2 Interpretation of CA Maps / 406

    21.5 Impact of the Preprocessing Methods / 407

    21.5.1 Class Prediction Context / 408

    21.5.2 Class Comparison Context / 408

    21.6 Conclusions / 412

    Acknowledgments / 413

    References / 413

    VI PATTERN RECOGNITION IN PHYLOGENETIC TREES 417

    22 PATTERN RECOGNITION IN PHYLOGENETICS: TREES AND NETWORKS 419
    David A. Morrison

    22.1 Introduction / 419

    22.2 Networks and Trees / 420

    22.3 Patterns and Their Processes / 424

    22.4 The Types of Patterns / 427

    22.5 Fingerprints / 431

    22.6 Constructing Networks / 433

    22.7 Multi-Labeled Trees / 435

    22.8 Conclusion / 436

    References / 437

    23 DIVERSE CONSIDERATIONS FOR SUCCESSFUL PHYLOGENETIC TREE RECONSTRUCTION: IMPACTS FROM MODEL MISSPECIFICATION, RECOMBINATION, HOMOPLASY, AND PATTERN RECOGNITION 439
    Diego Mallo, Agustín Sánchez-Cobos, and Miguel Arenas

    23.1 Introduction / 440

    23.2 Overview on Methods and Frameworks for Phylogenetic Tree Reconstruction / 440

    23.2.1 Inferring Gene Trees / 441

    23.2.2 Inferring Species Trees / 442

    23.3 Influence of Substitution Model Misspecification on Phylogenetic Tree Reconstruction / 445

    23.4 Influence of Recombination on Phylogenetic Tree Reconstruction / 446

    23.5 Influence of Diverse Evolutionary Processes on Species Tree Reconstruction / 447

    23.6 Influence of Homoplasy on Phylogenetic Tree Reconstruction: The Goals of Pattern Recognition / 449

    23.7 Concluding Remarks / 449

    Acknowledgments / 450

    References / 450

    24 AUTOMATED PLAUSIBILITY ANALYSIS OF LARGE PHYLOGENIES 457
    David Dao, Tomáš Flouri, and Alexandros Stamatakis

    24.1 Introduction / 457

    24.2 Preliminaries / 459

    24.3 A Naïve Approach / 462

    24.4 Toward a Faster Method / 463

    24.5 Improved Algorithm / 467

    24.5.1 Preprocessing / 467

    24.5.2 Computing Lowest Common Ancestors / 468

    24.5.3 Constructing the Induced Tree / 468

    24.5.4 Final Remarks / 471

    24.6 Implementation / 473

    24.6.1 Preprocessing / 473

    24.6.2 Reconstruction / 473

    24.6.3 Extracting Bipartitions / 474

    24.7 Evaluation / 474

    24.7.1 Test Data Sets / 474

    24.7.2 Experimental Results / 475

    24.8 Conclusion / 479

    Acknowledgment / 481

    References / 481

    25 A NEW FAST METHOD FOR DETECTING AND VALIDATING HORIZONTAL GENE TRANSFER EVENTS USING PHYLOGENETIC TREES AND AGGREGATION FUNCTIONS 483
    Dunarel Badescu, Nadia Tahiri, and Vladimir Makarenkov

    25.1 Introduction / 483

    25.2 Methods / 485

    25.2.1 Clustering Using Variability Functions / 485

    25.2.2 Other Variants of Clustering Functions Implemented in the Algorithm / 487

    25.2.3 Description of the New Algorithm / 488

    25.2.4 Time Complexity / 491

    25.3 Experimental Study / 491

    25.3.1 Implementation / 491

    25.3.2 Synthetic Data / 491

    25.3.3 Real Prokaryotic (Genomic) Data / 495

    25.4 Results and Discussion / 501

    25.4.1 Analysis of Synthetic Data / 501

    25.4.2 Analysis of Prokaryotic Data / 502

    25.5 Conclusion / 502

    References / 503

    VII PATTERN RECOGNITION IN BIOLOGICAL NETWORKS 505

    26 COMPUTATIONAL METHODS FOR MODELING BIOLOGICAL INTERACTION NETWORKS 507
    Christos Makris and Evangelos Theodoridis

    26.1 Introduction / 507

    26.2 Measures/Metrics / 508

    26.3 Models of Biological Networks / 511

    26.4 Reconstructing and Partitioning Biological Networks / 511

    26.5 PPI Networks / 513

    26.6 Mining PPI Networks—Interaction Prediction / 517

    26.7 Conclusions / 519

    References / 519

    27 BIOLOGICAL NETWORK INFERENCE AT MULTIPLE SCALES: FROM GENE REGULATION TO SPECIES INTERACTIONS 525
    Andrej Aderhold, V Anne Smith, and Dirk Husmeier

    27.1 Introduction / 525

    27.2 Molecular Systems / 528

    27.3 Ecological Systems / 528

    27.4 Models and Evaluation / 529

    27.4.1 Notations / 529

    27.4.2 Sparse Regression and the LASSO / 530

    27.4.3 Bayesian Regression / 530

    27.4.4 Evaluation Metric / 531

    27.5 Learning Gene Regulation Networks / 532

    27.5.1 Nonhomogeneous Bayesian Regression / 533

    27.5.2 Gradient Estimation / 534

    27.5.3 Simulated Bio-PEPA Data / 534

    27.5.4 Real mRNA Expression Profile Data / 535

    27.5.5 Method Evaluation and Learned Networks / 536

    27.6 Learning Species Interaction Networks / 540

    27.6.1 Regression Model of Species Interactions / 540

    27.6.2 Multiple Global Change-Points / 541

    27.6.3 Mondrian Process Change-Points / 542

    27.6.4 Synthetic Data / 544

    27.6.5 Simulated Population Dynamics / 544

    27.6.6 Real World Plant Data / 546

    27.6.7 Method Evaluation and Learned Networks / 546

    27.7 Conclusion / 550

    References / 550

    28 DISCOVERING CAUSAL PATTERNS WITH STRUCTURAL EQUATION MODELING: APPLICATION TO TOLL-LIKE RECEPTOR SIGNALING PATHWAY IN CHRONIC LYMPHOCYTIC LEUKEMIA 555
    Athina Tsanousa, Stavroula Ntoufa, Nikos Papakonstantinou, Kostas Stamatopoulos, and Lefteris Angelis

    28.1 Introduction / 555

    28.2 Toll-Like Receptors / 557

    28.2.1 Basics / 557

    28.2.2 Structure and Signaling of TLRs / 558

    28.2.3 TLR Signaling in Chronic Lymphocytic Leukemia / 559

    28.3 Structural Equation Modeling / 560

    28.3.1 Methodology of SEM Modeling / 560

    28.3.2 Assumptions / 561

    28.3.3 Estimation Methods / 562

    28.3.4 Missing Data / 562

    28.3.5 Goodness-of-Fit Indices / 563

    28.3.6 Other Indications of a Misspecified Model / 565

    28.4 Application / 566

    28.5 Conclusion / 580

    References / 581

    29 ANNOTATING PROTEINS WITH INCOMPLETE LABEL INFORMATION 585
    Guoxian Yu, Huzefa Rangwala, and Carlotta Domeniconi

    29.1 Introduction / 585

    29.2 Related Work / 587

    29.3 Problem Formulation / 589

    29.3.1 The Algorithm / 591

    29.4 Experimental Setup / 592

    29.4.1 Data sets / 592

    29.4.2 Comparative Methods / 593

    29.4.3 Experimental Protocol / 594

    29.4.4 Evaluation Criteria / 594

    29.5 Experimental Analysis / 596

    29.5.1 Replenishing Missing Functions / 596

    29.5.2 Predicting Unlabeled Proteins / 600

    29.5.3 Component Analysis / 604

    29.5.4 Run Time Analysis / 604

    29.6 Conclusions / 605

    Acknowledgments / 606

    References / 606

    INDEX 609

Pattern Recognition in Computational Molecular


    A Hardback by Mourad Elloumi, Costas Iliopoulos, Jason T. L. Wang


      Publisher: John Wiley & Sons Inc
      Publication Date: 09/02/2016
      ISBN13: 978-1118893685
      ISBN10: 1118893689


        References / 208

        12 PROTEIN INTER-DOMAIN LINKER PREDICTION 212
        Maad Shatnawi, Paul D. Yoo, and Sami Muhaidat

        12.1 Introduction / 212

        12.2 Protein Structure Overview / 213

        12.3 Technical Challenges and Open Issues / 214

        12.4 Prediction Assessment / 215

        12.5 Current Approaches / 216

        12.5.1 DomCut / 216

        12.5.2 Scooby-Domain / 217

        12.5.3 FIEFDom / 218

        12.5.4 Chatterjee et al. (2009) / 219

        12.5.5 Drop / 219

        12.6 Domain Boundary Prediction Using Enhanced General Regression Network / 220

        12.6.1 Multi-Domain Benchmark Data Set / 220

        12.6.2 Compact Domain Profile / 221

        12.6.3 The Enhanced Semi-Parametric Model / 222

        12.6.4 Training, Testing, and Validation / 225

        12.6.5 Experimental Results / 226

        12.7 Inter-Domain Linkers Prediction Using Compositional Index and Simulated Annealing / 227

        12.7.1 Compositional Index / 228

        12.7.2 Detecting the Optimal Set of Threshold Values Using Simulated Annealing / 229

        12.7.3 Experimental Results / 230

        12.8 Conclusion / 232

        References / 233

        13 PREDICTION OF PROLINE CIS–TRANS ISOMERIZATION 236
        Paul D. Yoo, Maad Shatnawi, Sami Muhaidat, Kamal Taha, and Albert Y. Zomaya

        13.1 Introduction / 236

        13.2 Methods / 238

        13.2.1 Evolutionary Data Set Construction / 238

        13.2.2 Protein Secondary Structure Information / 239

        13.2.3 Method I: Intelligent Voting / 239

        13.2.4 Method II: Randomized Meta-Learning / 241

        13.2.5 Model Validation and Testing / 242

        13.2.6 Parameter Tuning / 242

        13.3 Model Evaluation and Analysis / 243

        13.4 Conclusion / 245

        References / 245

        IV PATTERN RECOGNITION IN QUATERNARY STRUCTURES 249

        14 PREDICTION OF PROTEIN QUATERNARY STRUCTURES 251
        Akbar Vaseghi, Maryam Faridounnia, Soheila Shokrollahzade, Samad Jahandideh, and Kuo-Chen Chou

        14.1 Introduction / 251

        14.2 Protein Structure Prediction / 255

        14.2.1 Secondary Structure Prediction / 255

        14.2.2 Modeling of Tertiary Structure / 256

        14.3 Template-Based Predictions / 257

        14.3.1 Homology Modeling / 257

        14.3.2 Threading Methods / 257

        14.3.3 Ab initio Modeling / 257

        14.4 Critical Assessment of Protein Structure Prediction / 258

        14.5 Quaternary Structure Prediction / 258

        14.6 Conclusion / 261

        Acknowledgments / 261

        References / 261

        15 COMPARISON OF PROTEIN QUATERNARY STRUCTURES BY GRAPH APPROACHES 266
        Sheng-Lung Peng and Yu-Wei Tsay

        15.1 Introduction / 266

        15.2 Similarity in the Graph Model / 268

        15.2.1 Graph Model for Proteins / 270

        15.3 Measuring Structural Similarity via MCES / 272

        15.3.1 Problem Formulation / 273

        15.3.2 Constructing P-Graphs / 274

        15.3.3 Constructing Line Graphs / 276

        15.3.4 Constructing Modular Graphs / 276

        15.3.5 Maximum Clique Detection / 277

        15.3.6 Experimental Results / 277

        15.4 Protein Comparison via Graph Spectra / 279

        15.4.1 Graph Spectra / 279

        15.4.2 Matrix Selection / 281

        15.4.3 Graph Cospectrality and Similarity / 283

        15.4.4 Cospectral Comparison / 283

        15.4.5 Experimental Results / 284

        15.5 Conclusion / 287

        References / 287

        16 STRUCTURAL DOMAINS IN PREDICTION OF BIOLOGICAL PROTEIN–PROTEIN INTERACTIONS 291
        Mina Maleki, Michael Hall, and Luis Rueda

        16.1 Introduction / 291

        16.2 Structural Domains / 293

        16.3 The Prediction Framework / 293

        16.4 Feature Extraction and Prediction Properties / 294

        16.4.1 Physicochemical Properties / 296

        16.4.2 Domain-Based Properties / 298

        16.5 Feature Selection / 299

        16.5.1 Filter Methods / 299

        16.5.2 Wrapper Methods / 301

        16.6 Classification / 301

        16.6.1 Linear Dimensionality Reduction / 301

        16.6.2 Support Vector Machines / 303

        16.6.3 k-Nearest Neighbor / 303

        16.6.4 Naive Bayes / 304

        16.7 Evaluation and Analysis / 304

        16.8 Results and Discussion / 304

        16.8.1 Analysis of the Prediction Properties / 304

        16.8.2 Analysis of Structural DDIs / 307

        16.9 Conclusion / 309

        References / 310

        V PATTERN RECOGNITION IN MICROARRAYS 315

        17 CONTENT-BASED RETRIEVAL OF MICROARRAY EXPERIMENTS 317
        Hasan Oğul

        17.1 Introduction / 317

        17.2 Information Retrieval: Terminology and Background / 318

        17.3 Content-Based Retrieval / 320

        17.4 Microarray Data and Databases / 322

        17.5 Methods for Retrieving Microarray Experiments / 324

        17.6 Similarity Metrics / 327

        17.7 Evaluating Retrieval Performance / 329

        17.8 Software Tools / 330

        17.9 Conclusion and Future Directions / 331

        Acknowledgment / 332

        References / 332

        18 EXTRACTION OF DIFFERENTIALLY EXPRESSED GENES IN MICROARRAY DATA 335
        Tiratha Raj Singh, Brigitte Vannier, and Ahmed Moussa

        18.1 Introduction / 335

        18.2 From Microarray Image to Signal / 336

        18.2.1 Signal from Oligo DNA Array Image / 336

        18.2.2 Signal from Two-Color cDNA Array / 337

        18.3 Microarray Signal Analysis / 337

        18.3.1 Absolute Analysis and Replicates in Microarrays / 338

        18.3.2 Microarray Normalization / 339

        18.4 Algorithms for DE Gene Selection / 339

        18.4.1 Within–Between DE Gene (WB-DEG) Selection Algorithm / 340

        18.4.2 Comparison of the WB-DEGs with Two Classical DE Gene Selection Methods on Latin Square Data / 341

        18.5 Gene Ontology Enrichment and Gene Set Enrichment Analysis / 343

        18.6 Conclusion / 345

        References / 345

        19 CLUSTERING AND CLASSIFICATION TECHNIQUES FOR GENE EXPRESSION PROFILE PATTERN ANALYSIS 347
        Emanuel Weitschek, Giulia Fiscon, Valentina Fustaino, Giovanni Felici, and Paola Bertolazzi

        19.1 Introduction / 347

        19.2 Transcriptome Analysis / 348

        19.3 Microarrays / 349

        19.3.1 Applications / 349

        19.3.2 Microarray Technology / 350

        19.3.3 Microarray Workflow / 350

        19.4 RNA-Seq / 351

        19.5 Benefits and Drawbacks of RNA-Seq and Microarray Technologies / 353

        19.6 Gene Expression Profile Analysis / 356

        19.6.1 Data Definition / 356

        19.6.2 Data Analysis / 357

        19.6.3 Normalization and Background Correction / 357

        19.6.4 Genes Clustering / 359

        19.6.5 Experiment Classification / 361

        19.6.6 Software Tools for Gene Expression Profile Analysis / 362

        19.7 Real Case Studies / 364

        19.8 Conclusions / 367

        References / 368

        20 MINING INFORMATIVE PATTERNS IN MICROARRAY DATA 371
        Li Teng

        20.1 Introduction / 371

        20.2 Patterns with Similarity / 373

        20.2.1 Similarity Measurement / 374

        20.2.2 Clustering / 376

        20.2.3 Biclustering / 379

        20.2.4 Types of Biclusters / 380

        20.2.5 Measurement of the Homogeneity / 383

        20.2.6 Biclustering Algorithms with Different Searching Schemes / 387

        20.3 Conclusion / 391

        References / 391

        21 ARROW PLOT AND CORRESPONDENCE ANALYSIS MAPS FOR VISUALIZING THE EFFECTS OF BACKGROUND CORRECTION AND NORMALIZATION METHODS ON MICROARRAY DATA 394
        Carina Silva, Adelaide Freitas, Sara Roque, and Lisete Sousa

        21.1 Overview / 394

        21.1.1 Background Correction Methods / 395

        21.1.2 Normalization Methods / 396

        21.1.3 Literature Review / 397

        21.2 Arrow Plot / 399

        21.2.1 DE Genes Versus Special Genes / 399

        21.2.2 Definition and Properties of the ROC Curve / 400

        21.2.3 AUC and Degenerate ROC Curves / 401

        21.2.4 Overlapping Coefficient / 402

        21.2.5 Arrow Plot Construction / 403

        21.3 Significance Analysis of Microarrays / 404

        21.4 Correspondence Analysis / 405

        21.4.1 Basic Principles / 405

        21.4.2 Interpretation of CA Maps / 406

        21.5 Impact of the Preprocessing Methods / 407

        21.5.1 Class Prediction Context / 408

        21.5.2 Class Comparison Context / 408

        21.6 Conclusions / 412

        Acknowledgments / 413

        References / 413

        VI PATTERN RECOGNITION IN PHYLOGENETIC TREES 417

        22 PATTERN RECOGNITION IN PHYLOGENETICS: TREES AND NETWORKS 419
        David A. Morrison

        22.1 Introduction / 419

        22.2 Networks and Trees / 420

        22.3 Patterns and Their Processes / 424

        22.4 The Types of Patterns / 427

        22.5 Fingerprints / 431

        22.6 Constructing Networks / 433

        22.7 Multi-Labeled Trees / 435

        22.8 Conclusion / 436

        References / 437

        23 DIVERSE CONSIDERATIONS FOR SUCCESSFUL PHYLOGENETIC TREE RECONSTRUCTION: IMPACTS FROM MODEL MISSPECIFICATION, RECOMBINATION, HOMOPLASY, AND PATTERN RECOGNITION 439
        Diego Mallo, Agustín Sánchez-Cobos, and Miguel Arenas

        23.1 Introduction / 440

        23.2 Overview on Methods and Frameworks for Phylogenetic Tree Reconstruction / 440

        23.2.1 Inferring Gene Trees / 441

        23.2.2 Inferring Species Trees / 442

        23.3 Influence of Substitution Model Misspecification on Phylogenetic Tree Reconstruction / 445

        23.4 Influence of Recombination on Phylogenetic Tree Reconstruction / 446

        23.5 Influence of Diverse Evolutionary Processes on Species Tree Reconstruction / 447

        23.6 Influence of Homoplasy on Phylogenetic Tree Reconstruction: The Goals of Pattern Recognition / 449

        23.7 Concluding Remarks / 449

        Acknowledgments / 450

        References / 450

        24 AUTOMATED PLAUSIBILITY ANALYSIS OF LARGE PHYLOGENIES 457
        David Dao, Tomáš Flouri, and Alexandros Stamatakis

        24.1 Introduction / 457

        24.2 Preliminaries / 459

        24.3 A Naïve Approach / 462

        24.4 Toward a Faster Method / 463

        24.5 Improved Algorithm / 467

        24.5.1 Preprocessing / 467

        24.5.2 Computing Lowest Common Ancestors / 468

        24.5.3 Constructing the Induced Tree / 468

        24.5.4 Final Remarks / 471

        24.6 Implementation / 473

        24.6.1 Preprocessing / 473

        24.6.2 Reconstruction / 473

        24.6.3 Extracting Bipartitions / 474

        24.7 Evaluation / 474

        24.7.1 Test Data Sets / 474

        24.7.2 Experimental Results / 475

        24.8 Conclusion / 479

        Acknowledgment / 481

        References / 481

        25 A NEW FAST METHOD FOR DETECTING AND VALIDATING HORIZONTAL GENE TRANSFER EVENTS USING PHYLOGENETIC TREES AND AGGREGATION FUNCTIONS 483
        Dunarel Badescu, Nadia Tahiri, and Vladimir Makarenkov

        25.1 Introduction / 483

        25.2 Methods / 485

        25.2.1 Clustering Using Variability Functions / 485

        25.2.2 Other Variants of Clustering Functions Implemented in the Algorithm / 487

        25.2.3 Description of the New Algorithm / 488

        25.2.4 Time Complexity / 491

        25.3 Experimental Study / 491

        25.3.1 Implementation / 491

        25.3.2 Synthetic Data / 491

        25.3.3 Real Prokaryotic (Genomic) Data / 495

        25.4 Results and Discussion / 501

        25.4.1 Analysis of Synthetic Data / 501

        25.4.2 Analysis of Prokaryotic Data / 502

        25.5 Conclusion / 502

        References / 503

        VII PATTERN RECOGNITION IN BIOLOGICAL NETWORKS 505

        26 COMPUTATIONAL METHODS FOR MODELING BIOLOGICAL INTERACTION NETWORKS 507
        Christos Makris and Evangelos Theodoridis

        26.1 Introduction / 507

        26.2 Measures/Metrics / 508

        26.3 Models of Biological Networks / 511

        26.4 Reconstructing and Partitioning Biological Networks / 511

        26.5 PPI Networks / 513

        26.6 Mining PPI Networks—Interaction Prediction / 517

        26.7 Conclusions / 519

        References / 519

        27 BIOLOGICAL NETWORK INFERENCE AT MULTIPLE SCALES: FROM GENE REGULATION TO SPECIES INTERACTIONS 525
        Andrej Aderhold, V Anne Smith, and Dirk Husmeier

        27.1 Introduction / 525

        27.2 Molecular Systems / 528

        27.3 Ecological Systems / 528

        27.4 Models and Evaluation / 529

        27.4.1 Notations / 529

        27.4.2 Sparse Regression and the LASSO / 530

        27.4.3 Bayesian Regression / 530

        27.4.4 Evaluation Metric / 531

        27.5 Learning Gene Regulation Networks / 532

        27.5.1 Nonhomogeneous Bayesian Regression / 533

        27.5.2 Gradient Estimation / 534

        27.5.3 Simulated Bio-PEPA Data / 534

        27.5.4 Real mRNA Expression Profile Data / 535

        27.5.5 Method Evaluation and Learned Networks / 536

        27.6 Learning Species Interaction Networks / 540

        27.6.1 Regression Model of Species Interactions / 540

        27.6.2 Multiple Global Change-Points / 541

        27.6.3 Mondrian Process Change-Points / 542

        27.6.4 Synthetic Data / 544

        27.6.5 Simulated Population Dynamics / 544

        27.6.6 Real World Plant Data / 546

        27.6.7 Method Evaluation and Learned Networks / 546

        27.7 Conclusion / 550

        References / 550

        28 DISCOVERING CAUSAL PATTERNS WITH STRUCTURAL EQUATION MODELING: APPLICATION TO TOLL-LIKE RECEPTOR SIGNALING PATHWAY IN CHRONIC LYMPHOCYTIC LEUKEMIA 555
        Athina Tsanousa, Stavroula Ntoufa, Nikos Papakonstantinou, Kostas Stamatopoulos, and Lefteris Angelis

        28.1 Introduction / 555

        28.2 Toll-Like Receptors / 557

        28.2.1 Basics / 557

        28.2.2 Structure and Signaling of TLRs / 558

        28.2.3 TLR Signaling in Chronic Lymphocytic Leukemia / 559

        28.3 Structural Equation Modeling / 560

        28.3.1 Methodology of SEM Modeling / 560

        28.3.2 Assumptions / 561

        28.3.3 Estimation Methods / 562

        28.3.4 Missing Data / 562

        28.3.5 Goodness-of-Fit Indices / 563

        28.3.6 Other Indications of a Misspecified Model / 565

        28.4 Application / 566

        28.5 Conclusion / 580

        References / 581

        29 ANNOTATING PROTEINS WITH INCOMPLETE LABEL INFORMATION 585
        Guoxian Yu, Huzefa Rangwala, and Carlotta Domeniconi

        29.1 Introduction / 585

        29.2 Related Work / 587

        29.3 Problem Formulation / 589

        29.3.1 The Algorithm / 591

        29.4 Experimental Setup / 592

        29.4.1 Data Sets / 592

        29.4.2 Comparative Methods / 593

        29.4.3 Experimental Protocol / 594

        29.4.4 Evaluation Criteria / 594

        29.5 Experimental Analysis / 596

        29.5.1 Replenishing Missing Functions / 596

        29.5.2 Predicting Unlabeled Proteins / 600

        29.5.3 Component Analysis / 604

        29.5.4 Run Time Analysis / 604

        29.6 Conclusions / 605

        Acknowledgments / 606

        References / 606

        INDEX 609
