Description

Book Synopsis

A comprehensive overview of high-performance pattern recognition techniques and approaches to Computational Molecular Biology

This book surveys the developments of techniques and approaches on pattern recognition related to Computational Molecular Biology. Providing a broad coverage of the field, the authors cover fundamental and technical information on these techniques and approaches, as well as discussing their related problems. The text consists of twenty nine chapters, organized into seven parts: Pattern Recognition in Sequences, Pattern Recognition in Secondary Structures, Pattern Recognition in Tertiary Structures, Pattern Recognition in Quaternary Structures, Pattern Recognition in Microarrays, Pattern Recognition in Phylogenetic Trees, and Pattern Recognition in Biological Networks.

  • Surveys the development of techniques and approaches on pattern recognition in biomolecular data
  • Discusses pattern recognit

    Table of Contents

    LIST OF CONTRIBUTORS xxi

    PREFACE xxvii

    I PATTERN RECOGNITION IN SEQUENCES 1

    1 COMBINATORIAL HAPLOTYPING PROBLEMS 3
    Giuseppe Lancia

    1.1 Introduction / 3

    1.2 Single Individual Haplotyping / 5

    1.2.1 The Minimum Error Correction Model / 8

    1.2.2 Probabilistic Approaches and Alternative Models / 10

    1.3 Population Haplotyping / 12

    1.3.1 Clark’s Rule / 14

    1.3.2 Pure Parsimony / 15

    1.3.3 Perfect Phylogeny / 19

    1.3.4 Disease Association / 21

    1.3.5 Other Models / 22

    References / 23

    2 ALGORITHMIC PERSPECTIVES OF THE STRING BARCODING PROBLEMS 28
    Sima Behpour and Bhaskar DasGupta

    2.1 Introduction / 28

    2.2 Summary of Algorithmic Complexity Results for Barcoding Problems / 32

    2.2.1 Average Length of Optimal Barcodes / 33

    2.3 Entropy-Based Information Content Technique for Designing

    Approximation Algorithms for String Barcoding Problems / 34

    2.4 Techniques for Proving Inapproximability Results for String Barcoding Problems / 36

    2.4.1 Reductions from Set Covering Problem / 36

    2.4.2 Reduction from Graph-Coloring Problem / 38

    2.5 Heuristic Algorithms for String Barcoding Problems / 39

    2.5.1 Entropy-Based Method with a Different Measure for Information Content / 39

    2.5.2 Balanced Partitioning Approach / 40

    2.6 Conclusion / 40

    Acknowledgments / 41

    References / 41

    3 ALIGNMENT-FREE MEASURES FOR WHOLE-GENOME COMPARISON 43
    Matteo Comin and Davide Verzotto

    3.1 Introduction / 43

    3.2 Whole-Genome Sequence Analysis / 44

    3.2.1 Background on Whole-Genome Comparison / 44

    3.2.2 Alignment-Free Methods / 45

    3.2.3 Average Common Subword / 46

    3.2.4 Kullback–Leibler Information Divergence / 47

    3.3 Underlying Approach / 47

    3.3.1 Irredundant Common Subwords / 48

    3.3.2 Underlying Subwords / 49

    3.3.3 Efficient Computation of Underlying Subwords / 50

    3.3.4 Extension to Inversions and Complements / 53

    3.3.5 A Distance-Like Measure Based on Underlying Subwords / 53

    3.4 Experimental Results / 54

    3.4.1 Genome Data sets and Reference Taxonomies / 54

    3.4.2 Whole-Genome Phylogeny Reconstruction / 56

    3.5 Conclusion / 61

    Author’s Contributions / 62

    Acknowledgments / 62

    References / 62

    4 A MAXIMUM LIKELIHOOD FRAMEWORK FOR MULTIPLE SEQUENCE LOCAL ALIGNMENT 65
    Chengpeng Bi

    4.1 Introduction / 65

    4.2 Multiple Sequence Local Alignment / 67

    4.2.1 Overall Objective Function / 67

    4.2.2 Maximum Likelihood Model / 68

    4.3 Motif Finding Algorithms / 70

    4.3.1 DEM Motif Algorithm / 70

    4.3.2 WEM Motif Finding Algorithm / 70

    4.3.3 Metropolis Motif Finding Algorithm / 72

    4.3.4 Gibbs Motif Finding Algorithm / 73

    4.3.5 Pseudo-Gibbs Motif Finding Algorithm / 74

    4.4 Time Complexity / 75

    4.5 Case Studies / 75

    4.5.1 Performance Evaluation / 76

    4.5.2 CRP Binding Sites / 76

    4.5.3 Multiple Motifs in Helix–Turn–Helix Protein Structure / 78

    4.6 Conclusion / 80

    References / 81

    5 GLOBAL SEQUENCE ALIGNMENT WITH A BOUNDED NUMBER OF GAPS 83
    Carl Barton, Tomáš Flouri, Costas S. Iliopoulos, and Solon P. Pissis

    5.1 Introduction / 83

    5.2 Definitions and Notation / 85

    5.3 Problem Definition / 87

    5.4 Algorithms / 88

    5.5 Conclusion / 94

    References / 95

    II PATTERN RECOGNITION IN SECONDARY STRUCTURES 97

    6 A SHORT REVIEW ON PROTEIN SECONDARY STRUCTURE PREDICTION METHODS 99
    Renxiang Yan, Jiangning Song, Weiwen Cai, and Ziding Zhang

    6.1 Introduction / 99

    6.2 Representative Protein Secondary Structure Prediction Methods / 102

    6.2.1 Chou–Fasman / 103

    6.2.2 GOR / 104

    6.2.3 PHD / 104

    6.2.4 PSIPRED / 104

    6.2.5 SPINE-X / 105

    6.2.6 PSSpred / 105

    6.2.7 Meta Methods / 105

    6.3 Evaluation of Protein Secondary Structure Prediction Methods / 106

    6.3.1 Measures / 106

    6.3.2 Benchmark / 106

    6.3.3 Performances / 107

    6.4 Conclusion / 110

    Acknowledgments / 110

    References / 111

    7 A GENERIC APPROACH TO BIOLOGICAL SEQUENCE SEGMENTATION PROBLEMS: APPLICATION TO PROTEIN SECONDARY STRUCTURE PREDICTION 114
    Yann Guermeur and Fabien Lauer

    7.1 Introduction / 114

    7.2 Biological Sequence Segmentation / 115

    7.3 MSVMpred / 117

    7.3.1 Base Classifiers / 117

    7.3.2 Ensemble Methods / 118

    7.3.3 Convex Combination / 119

    7.4 Postprocessing with A Generative Model / 119

    7.5 Dedication to Protein Secondary Structure Prediction / 120

    7.5.1 Biological Problem / 121

    7.5.2 MSVMpred2 / 121

    7.5.3 Hidden Semi-Markov Model / 122

    7.5.4 Experimental Results / 122

    7.6 Conclusions and Ongoing Research / 125

    Acknowledgments / 126

    References / 126

    8 STRUCTURAL MOTIF IDENTIFICATION AND RETRIEVAL: A GEOMETRICAL APPROACH 129
    Virginio Cantoni, Marco Ferretti, Mirto Musci, and Nahumi Nugrahaningsih

    8.1 Introduction / 129

    8.2 A Few Basic Concepts / 130

    8.2.1 Hierarchy of Protein Structures / 130

    8.2.2 Secondary Structure Elements / 131

    8.2.3 Structural Motifs / 132

    8.2.4 Available Sources for Protein Data / 134

    8.3 State of the Art / 135

    8.3.1 Protein Structure Motif Search / 135

    8.3.2 Promotif / 136

    8.3.3 Secondary-Structure Matching / 137

    8.3.4 Multiple Structural Alignment by Secondary Structures / 138

    8.4 A Novel Geometrical Approach to Motif Retrieval / 138

    8.4.1 Secondary Structures Cooccurrences / 138

    8.4.2 Cross Motif Search / 143

    8.4.3 Complete Cross Motif Search / 146

    8.5 Implementation Notes / 149

    8.5.1 Optimizations / 149

    8.5.2 Parallel Approaches / 150

    8.6 Conclusions and Future Work / 151

    Acknowledgment / 152

    References / 152

    9 GENOME-WIDE SEARCH FOR PSEUDOKNOTTED NONCODING RNAs: A COMPARATIVE STUDY 155
    Meghana Vasavada, Kevin Byron, Yang Song, and Jason T.L. Wang

    9.1 Introduction / 155

    9.2 Background / 156

    9.2.1 Noncoding RNAs and Their Secondary Structures / 156

    9.2.2 Pseudoknotted ncRNA Search Tools / 157

    9.3 Methodology / 157

    9.4 Results and Interpretation / 161

    9.5 Conclusion / 162

    References / 163

    III PATTERN RECOGNITION IN TERTIARY STRUCTURES 165

    10 MOTIF DISCOVERY IN PROTEIN 3D-STRUCTURES USING GRAPH MINING TECHNIQUES 167
    Wajdi Dhifli and Engelbert Mephu Nguifo

    10.1 Introduction / 167

    10.2 From Protein 3D-Structures to Protein Graphs / 169

    10.2.1 Parsing Protein 3D-Structures into Graphs / 169

    10.3 Graph Mining / 172

    10.4 Subgraph Mining / 173

    10.5 Frequent Subgraph Discovery / 173

    10.5.1 Problem Definition / 174

    10.5.2 Candidates Generation / 176

    10.5.3 Frequent Subgraph Discovery Approaches / 177

    10.5.4 Variants of Frequent Subgraph Mining: Closed and Maximal Subgraphs / 178

    10.6 Feature Selection / 179

    10.6.1 Relevance of a Feature / 179

    10.7 Feature Selection for Subgraphs / 180

    10.7.1 Problem Statement / 180

    10.7.2 Mining Top-k Subgraphs / 180

    10.7.3 Clustering-Based Subgraph Selection / 181

    10.7.4 Sampling-Based Approaches / 181

    10.7.5 Approximate Subgraph Mining / 181

    10.7.6 Discriminative Subgraph Selection / 182

    10.7.7 Other Significant Subgraph Selection Approaches / 182

    10.8 Discussion / 183

    10.9 Conclusion / 185

    Acknowledgments / 185

    References / 186

    11 FUZZY AND UNCERTAIN LEARNING TECHNIQUES FOR THE ANALYSIS AND PREDICTION OF PROTEIN TERTIARY STRUCTURES 190
    Chinua Umoja, Xiaxia Yu, and Robert Harrison

    11.1 Introduction / 190

    11.2 Genetic Algorithms / 192

    11.2.1 GA Model Selection in Protein Structure Prediction / 196

    11.2.2 Common Methodology / 198

    11.3 Supervised Machine Learning Algorithm / 201

    11.3.1 Artificial Neural Networks / 201

    11.3.2 ANNs in Protein Structure Prediction / 202

    11.3.3 Support Vector Machines / 203

    11.4 Fuzzy Application / 204

    11.4.1 Fuzzy Logic / 204

    11.4.2 Fuzzy SVMs / 204

    11.4.3 Adaptive-Network-Based Fuzzy Inference Systems / 205

    11.4.4 Fuzzy Decision Trees / 206

    11.5 Conclusion / 207

    References / 208

    12 PROTEIN INTER-DOMAIN LINKER PREDICTION 212
    Maad Shatnawi, Paul D. Yoo, and Sami Muhaidat

    12.1 Introduction / 212

    12.2 Protein Structure Overview / 213

    12.3 Technical Challenges and Open Issues / 214

    12.4 Prediction Assessment / 215

    12.5 Current Approaches / 216

    12.5.1 DomCut / 216

    12.5.2 Scooby-Domain / 217

    12.5.3 FIEFDom / 218

    12.5.4 Chatterjee et al. (2009) / 219

    12.5.5 Drop / 219

    12.6 Domain Boundary Prediction Using Enhanced General Regression Network / 220

    12.6.1 Multi-Domain Benchmark Data Set / 220

    12.6.2 Compact Domain Profile / 221

    12.6.3 The Enhanced Semi-Parametric Model / 222

    12.6.4 Training, Testing, and Validation / 225

    12.6.5 Experimental Results / 226

    12.7 Inter-Domain Linkers Prediction Using Compositional Index and Simulated Annealing / 227

    12.7.1 Compositional Index / 228

    12.7.2 Detecting the Optimal Set of Threshold Values Using Simulated Annealing / 229

    12.7.3 Experimental Results / 230

    12.8 Conclusion / 232

    References / 233

    13 PREDICTION OF PROLINE CIS–TRANS ISOMERIZATION 236
    Paul D. Yoo, Maad Shatnawi, Sami Muhaidat, Kamal Taha, and Albert Y. Zomaya

    13.1 Introduction / 236

    13.2 Methods / 238

    13.2.1 Evolutionary Data Set Construction / 238

    13.2.2 Protein Secondary Structure Information / 239

    13.2.3 Method I: Intelligent Voting / 239

    13.2.4 Method II: Randomized Meta-Learning / 241

    13.2.5 Model Validation and Testing / 242

    13.2.6 Parameter Tuning / 242

    13.3 Model Evaluation and Analysis / 243

    13.4 Conclusion / 245

    References / 245

    IV PATTERN RECOGNITION IN QUATERNARY STRUCTURES 249

    14 PREDICTION OF PROTEIN QUATERNARY STRUCTURES 251
    Akbar Vaseghi, Maryam Faridounnia, Soheila Shokrollahzade, Samad Jahandideh, and Kuo-Chen Chou

    14.1 Introduction / 251

    14.2 Protein Structure Prediction / 255

    14.2.1 Secondary Structure Prediction / 255

    14.2.2 Modeling of Tertiary Structure / 256

    14.3 Template-Based Predictions / 257

    14.3.1 Homology Modeling / 257

    14.3.2 Threading Methods / 257

    14.3.3 Ab initio Modeling / 257

    14.4 Critical Assessment of Protein Structure Prediction / 258

    14.5 Quaternary Structure Prediction / 258

    14.6 Conclusion / 261

    Acknowledgments / 261

    References / 261

    15 COMPARISON OF PROTEIN QUATERNARY STRUCTURES BY GRAPH APPROACHES 266
    Sheng-Lung Peng and Yu-Wei Tsay

    15.1 Introduction / 266

    15.2 Similarity in the Graph Model / 268

    15.2.1 Graph Model for Proteins / 270

    15.3 Measuring Structural Similarity VIA MCES / 272

    15.3.1 Problem Formulation / 273

    15.3.2 Constructing P-Graphs / 274

    15.3.3 Constructing Line Graphs / 276

    15.3.4 Constructing Modular Graphs / 276

    15.3.5 Maximum Clique Detection / 277

    15.3.6 Experimental Results / 277

    15.4 Protein Comparison VIA Graph Spectra / 279

    15.4.1 Graph Spectra / 279

    15.4.2 Matrix Selection / 281

    15.4.3 Graph Cospectrality and Similarity / 283

    15.4.4 Cospectral Comparison / 283

    15.4.5 Experimental Results / 284

    15.5 Conclusion / 287

    References / 287

    16 STRUCTURAL DOMAINS IN PREDICTION OF BIOLOGICAL PROTEIN–PROTEIN INTERACTIONS 291
    Mina Maleki, Michael Hall, and Luis Rueda

    16.1 Introduction / 291

    16.2 Structural Domains / 293

    16.3 The Prediction Framework / 293

    16.4 Feature Extraction and Prediction Properties / 294

    16.4.1 Physicochemical Properties / 296

    16.4.2 Domain-Based Properties / 298

    16.5 Feature Selection / 299

    16.5.1 Filter Methods / 299

    16.5.2 Wrapper Methods / 301

    16.6 Classification / 301

    16.6.1 Linear Dimensionality Reduction / 301

    16.6.2 Support Vector Machines / 303

    16.6.3 k-Nearest Neighbor / 303

    16.6.4 Naive Bayes / 304

    16.7 Evaluation and Analysis / 304

    16.8 Results and Discussion / 304

    16.8.1 Analysis of the Prediction Properties / 304

    16.8.2 Analysis of Structural DDIs / 307

    16.9 Conclusion / 309

    References / 310

    V PATTERN RECOGNITION IN MICROARRAYS 315

    17 CONTENT-BASED RETRIEVAL OF MICROARRAY EXPERIMENTS 317
    Hasan O¢gul

    17.1 Introduction / 317

    17.2 Information Retrieval: Terminology and Background / 318

    17.3 Content-Based Retrieval / 320

    17.4 Microarray Data and Databases / 322

    17.5 Methods for Retrieving Microarray Experiments / 324

    17.6 Similarity Metrics / 327

    17.7 Evaluating Retrieval Performance / 329

    17.8 Software Tools / 330

    17.9 Conclusion and Future Directions / 331

    Acknowledgment / 332

    References / 332

    18 EXTRACTION OF DIFFERENTIALLY EXPRESSED GENES IN MICROARRAY DATA 335
    Tiratha Raj Singh, Brigitte Vannier, and Ahmed Moussa

    18.1 Introduction / 335

    18.2 From Microarray Image to Signal / 336

    18.2.1 Signal from Oligo DNA Array Image / 336

    18.2.2 Signal from Two-Color cDNA Array / 337

    18.3 Microarray Signal Analysis / 337

    18.3.1 Absolute Analysis and Replicates in Microarrays / 338

    18.3.2 Microarray Normalization / 339

    18.4 Algorithms for De Gene Selection / 339

    18.4.1 Within–Between DE Gene (WB-DEG) Selection Algorithm / 340

    18.4.2 Comparison of the WB-DEGs with Two Classical DE Gene Selection Methods on Latin Square Data / 341

    18.5 Gene Ontology Enrichment and Gene Set Enrichment Analysis / 343

    18.6 Conclusion / 345

    References / 345

    19 CLUSTERING AND CLASSIFICATION TECHNIQUES FOR GENE EXPRESSION PROFILE PATTERN ANALYSIS 347
    Emanuel Weitschek, Giulia Fiscon, Valentina Fustaino, Giovanni Felici, and Paola Bertolazzi

    19.1 Introduction / 347

    19.2 Transcriptome Analysis / 348

    19.3 Microarrays / 349

    19.3.1 Applications / 349

    19.3.2 Microarray Technology / 350

    19.3.3 Microarray Workflow / 350

    19.4 RNA-Seq / 351

    19.5 Benefits and Drawbacks of RNA-Seq and Microarray Technologies / 353

    19.6 Gene Expression Profile Analysis / 356

    19.6.1 Data Definition / 356

    19.6.2 Data Analysis / 357

    19.6.3 Normalization and Background Correction / 357

    19.6.4 Genes Clustering / 359

    19.6.5 Experiment Classification / 361

    19.6.6 Software Tools for Gene Expression Profile Analysis / 362

    19.7 Real Case Studies / 364

    19.8 Conclusions / 367

    References / 368

    20 MINING INFORMATIVE PATTERNS IN MICROARRAY DATA 371
    Li Teng

    20.1 Introduction / 371

    20.2 Patterns with Similarity / 373

    20.2.1 Similarity Measurement / 374

    20.2.2 Clustering / 376

    20.2.3 Biclustering / 379

    20.2.4 Types of Biclusters / 380

    20.2.5 Measurement of the Homogeneity / 383

    20.2.6 Biclustering Algorithms with Different Searching Schemes / 387

    20.3 Conclusion / 391

    References / 391

    21 ARROW PLOT AND CORRESPONDENCE ANALYSIS MAPS FOR VISUALIZING THE EFFECTS OF BACKGROUND CORRECTION AND NORMALIZATION METHODS ON MICROARRAY DATA 394
    Carina Silva, Adelaide Freitas, Sara Roque, and Lisete Sousa

    21.1 Overview / 394

    21.1.1 Background Correction Methods / 395

    21.1.2 Normalization Methods / 396

    21.1.3 Literature Review / 397

    21.2 Arrow Plot / 399

    21.2.1 DE Genes Versus Special Genes / 399

    21.2.2 Definition and Properties of the ROC Curve / 400

    21.2.3 AUC and Degenerate ROC Curves / 401

    21.2.4 Overlapping Coefficient / 402

    21.2.5 Arrow Plot Construction / 403

    21.3 Significance Analysis of Microarrays / 404

    21.4 Correspondence Analysis / 405

    21.4.1 Basic Principles / 405

    21.4.2 Interpretation of CA Maps / 406

    21.5 Impact of the Preprocessing Methods / 407

    21.5.1 Class Prediction Context / 408

    21.5.2 Class Comparison Context / 408

    21.6 Conclusions / 412

    Acknowledgments / 413

    References / 413

    VI PATTERN RECOGNITION IN PHYLOGENETIC TREES 417

    22 PATTERN RECOGNITION IN PHYLOGENETICS: TREES AND NETWORKS 419
    David A. Morrison

    22.1 Introduction / 419

    22.2 Networks and Trees / 420

    22.3 Patterns and Their Processes / 424

    22.4 The Types of Patterns / 427

    22.5 Fingerprints / 431

    22.6 Constructing Networks / 433

    22.7 Multi-Labeled Trees / 435

    22.8 Conclusion / 436

    References / 437

    23 DIVERSE CONSIDERATIONS FOR SUCCESSFUL PHYLOGENETIC TREE RECONSTRUCTION: IMPACTS FROM MODEL MISSPECIFICATION, RECOMBINATION, HOMOPLASY, AND PATTERN RECOGNITION 439
    Diego Mallo, Agustín Sánchez-Cobos, and Miguel Arenas

    23.1 Introduction / 440

    23.2 Overview on Methods and Frameworks for Phylogenetic Tree Reconstruction / 440

    23.2.1 Inferring Gene Trees / 441

    23.2.2 Inferring Species Trees / 442

    23.3 Influence of Substitution Model Misspecification on Phylogenetic Tree Reconstruction / 445

    23.4 Influence of Recombination on Phylogenetic Tree Reconstruction / 446

    23.5 Influence of Diverse Evolutionary Processes on Species Tree Reconstruction / 447

    23.6 Influence of Homoplasy on Phylogenetic Tree Reconstruction: The Goals of Pattern Recognition / 449

    23.7 Concluding Remarks / 449

    Acknowledgments / 450

    References / 450

    24 AUTOMATED PLAUSIBILITY ANALYSIS OF LARGE PHYLOGENIES 457
    David Dao, Tomáš Flouri, and Alexandros Stamatakis

    24.1 Introduction / 457

    24.2 Preliminaries / 459

    24.3 A Naïve Approach / 462

    24.4 Toward a Faster Method / 463

    24.5 Improved Algorithm / 467

    24.5.1 Preprocessing / 467

    24.5.2 Computing Lowest Common Ancestors / 468

    24.5.3 Constructing the Induced Tree / 468

    24.5.4 Final Remarks / 471

    24.6 Implementation / 473

    24.6.1 Preprocessing / 473

    24.6.2 Reconstruction / 473

    24.6.3 Extracting Bipartitions / 474

    24.7 Evaluation / 474

    24.7.1 Test Data Sets / 474

    24.7.2 Experimental Results / 475

    24.8 Conclusion / 479

    Acknowledgment / 481

    References / 481

    25 A NEW FAST METHOD FOR DETECTING AND VALIDATING HORIZONTAL GENE TRANSFER EVENTS USING PHYLOGENETIC TREES AND AGGREGATION FUNCTIONS 483
    Dunarel Badescu, Nadia Tahiri, and Vladimir Makarenkov

    25.1 Introduction / 483

    25.2 Methods / 485

    25.2.1 Clustering Using Variability Functions / 485

    25.2.2 Other Variants of Clustering Functions Implemented in the Algorithm / 487

    25.2.3 Description of the New Algorithm / 488

    25.2.4 Time Complexity / 491

    25.3 Experimental Study / 491

    25.3.1 Implementation / 491

    25.3.2 Synthetic Data / 491

    25.3.3 Real Prokaryotic (Genomic) Data / 495

    25.4 Results and Discussion / 501

    25.4.1 Analysis of Synthetic Data / 501

    25.4.2 Analysis of Prokaryotic Data / 502

    25.5 Conclusion / 502

    References / 503

    VII PATTERN RECOGNITION IN BIOLOGICAL NETWORKS 505

    26 COMPUTATIONAL METHODS FOR MODELING BIOLOGICAL INTERACTION NETWORKS 507
    Christos Makris and Evangelos Theodoridis

    26.1 Introduction / 507

    26.2 Measures/Metrics / 508

    26.3 Models of Biological Networks / 511

    26.4 Reconstructing and Partitioning Biological Networks / 511

    26.5 PPI Networks / 513

    26.6 Mining PPI Networks—Interaction Prediction / 517

    26.7 Conclusions / 519

    References / 519

    27 BIOLOGICAL NETWORK INFERENCE AT MULTIPLE SCALES: FROM GENE REGULATION TO SPECIES INTERACTIONS 525
    Andrej Aderhold, V Anne Smith, and Dirk Husmeier

    27.1 Introduction / 525

    27.2 Molecular Systems / 528

    27.3 Ecological Systems / 528

    27.4 Models and Evaluation / 529

    27.4.1 Notations / 529

    27.4.2 Sparse Regression and the LASSO / 530

    27.4.3 Bayesian Regression / 530

    27.4.4 Evaluation Metric / 531

    27.5 Learning Gene Regulation Networks / 532

    27.5.1 Nonhomogeneous Bayesian Regression / 533

    27.5.2 Gradient Estimation / 534

    27.5.3 Simulated Bio-PEPA Data / 534

    27.5.4 Real mRNA Expression Profile Data / 535

    27.5.5 Method Evaluation and Learned Networks / 536

    27.6 Learning Species Interaction Networks / 540

    27.6.1 Regression Model of Species interactions / 540

    27.6.2 Multiple Global Change-Points / 541

    27.6.3 Mondrian Process Change-Points / 542

    27.6.4 Synthetic Data / 544

    27.6.5 Simulated Population Dynamics / 544

    27.6.6 Real World Plant Data / 546

    27.6.7 Method Evaluation and Learned Networks / 546

    27.7 Conclusion / 550

    References / 550

    28 DISCOVERING CAUSAL PATTERNS WITH STRUCTURAL EQUATION MODELING: APPLICATION TO TOLL-LIKE RECEPTOR SIGNALING PATHWAY IN CHRONIC LYMPHOCYTIC LEUKEMIA 555
    Athina Tsanousa, Stavroula Ntoufa, Nikos Papakonstantinou, Kostas Stamatopoulos, and Lefteris Angelis

    28.1 Introduction / 555

    28.2 Toll-Like Receptors / 557

    28.2.1 Basics / 557

    28.2.2 Structure and Signaling of TLRs / 558

    28.2.3 TLR Signaling in Chronic Lymphocytic Leukemia / 559

    28.3 Structural Equation Modeling / 560

    28.3.1 Methodology of SEM Modeling / 560

    28.3.2 Assumptions / 561

    28.3.3 Estimation Methods / 562

    28.3.4 Missing Data / 562

    28.3.5 Goodness-of-Fit Indices / 563

    28.3.6 Other Indications of a Misspecified Model / 565

    28.4 Application / 566

    28.5 Conclusion / 580

    References / 581

    29 ANNOTATING PROTEINS WITH INCOMPLETE LABEL INFORMATION 585
    Guoxian Yu, Huzefa Rangwala, and Carlotta Domeniconi

    29.1 Introduction / 585

    29.2 Related Work / 587

    29.3 Problem Formulation / 589

    29.3.1 The Algorithm / 591

    29.4 Experimental Setup / 592

    29.4.1 Data sets / 592

    29.4.2 Comparative Methods / 593

    29.4.3 Experimental Protocol / 594

    29.4.4 Evaluation Criteria / 594

    29.5 Experimental Analysis / 596

    29.5.1 Replenishing Missing Functions / 596

    29.5.2 Predicting Unlabeled Proteins / 600

    29.5.3 Component Analysis / 604

    29.5.4 Run Time Analysis / 604

    29.6 Conclusions / 605

    Acknowledgments / 606

    References / 606

    INDEX 609

Pattern Recognition in Computational Molecular

Product form

£109.76

Includes FREE delivery

RRP £121.95 – you save £12.19 (9%)

Order before 4pm today for delivery by Tue 20 Jan 2026.

A Hardback by Mourad Elloumi, Costas Iliopoulos, Jason T. L. Wang

10 in stock


    View other formats and editions of Pattern Recognition in Computational Molecular by Mourad Elloumi

    Publisher: John Wiley & Sons Inc
    Publication Date: 09/02/2016
    ISBN13: 9781118893685, 978-1118893685
    ISBN10: 1118893689

    Description

    Book Synopsis

    A comprehensive overview of high-performance pattern recognition techniques and approaches to Computational Molecular Biology

    This book surveys the developments of techniques and approaches on pattern recognition related to Computational Molecular Biology. Providing a broad coverage of the field, the authors cover fundamental and technical information on these techniques and approaches, as well as discussing their related problems. The text consists of twenty nine chapters, organized into seven parts: Pattern Recognition in Sequences, Pattern Recognition in Secondary Structures, Pattern Recognition in Tertiary Structures, Pattern Recognition in Quaternary Structures, Pattern Recognition in Microarrays, Pattern Recognition in Phylogenetic Trees, and Pattern Recognition in Biological Networks.

    • Surveys the development of techniques and approaches on pattern recognition in biomolecular data
    • Discusses pattern recognit

      Table of Contents

      LIST OF CONTRIBUTORS xxi

      PREFACE xxvii

      I PATTERN RECOGNITION IN SEQUENCES 1

      1 COMBINATORIAL HAPLOTYPING PROBLEMS 3
      Giuseppe Lancia

      1.1 Introduction / 3

      1.2 Single Individual Haplotyping / 5

      1.2.1 The Minimum Error Correction Model / 8

      1.2.2 Probabilistic Approaches and Alternative Models / 10

      1.3 Population Haplotyping / 12

      1.3.1 Clark’s Rule / 14

      1.3.2 Pure Parsimony / 15

      1.3.3 Perfect Phylogeny / 19

      1.3.4 Disease Association / 21

      1.3.5 Other Models / 22

      References / 23

      2 ALGORITHMIC PERSPECTIVES OF THE STRING BARCODING PROBLEMS 28
      Sima Behpour and Bhaskar DasGupta

      2.1 Introduction / 28

      2.2 Summary of Algorithmic Complexity Results for Barcoding Problems / 32

      2.2.1 Average Length of Optimal Barcodes / 33

      2.3 Entropy-Based Information Content Technique for Designing

      Approximation Algorithms for String Barcoding Problems / 34

      2.4 Techniques for Proving Inapproximability Results for String Barcoding Problems / 36

      2.4.1 Reductions from Set Covering Problem / 36

      2.4.2 Reduction from Graph-Coloring Problem / 38

      2.5 Heuristic Algorithms for String Barcoding Problems / 39

      2.5.1 Entropy-Based Method with a Different Measure for Information Content / 39

      2.5.2 Balanced Partitioning Approach / 40

      2.6 Conclusion / 40

      Acknowledgments / 41

      References / 41

      3 ALIGNMENT-FREE MEASURES FOR WHOLE-GENOME COMPARISON 43
      Matteo Comin and Davide Verzotto

      3.1 Introduction / 43

      3.2 Whole-Genome Sequence Analysis / 44

      3.2.1 Background on Whole-Genome Comparison / 44

      3.2.2 Alignment-Free Methods / 45

      3.2.3 Average Common Subword / 46

      3.2.4 Kullback–Leibler Information Divergence / 47

      3.3 Underlying Approach / 47

      3.3.1 Irredundant Common Subwords / 48

      3.3.2 Underlying Subwords / 49

      3.3.3 Efficient Computation of Underlying Subwords / 50

      3.3.4 Extension to Inversions and Complements / 53

      3.3.5 A Distance-Like Measure Based on Underlying Subwords / 53

      3.4 Experimental Results / 54

      3.4.1 Genome Data sets and Reference Taxonomies / 54

      3.4.2 Whole-Genome Phylogeny Reconstruction / 56

      3.5 Conclusion / 61

      Author’s Contributions / 62

      Acknowledgments / 62

      References / 62

      4 A MAXIMUM LIKELIHOOD FRAMEWORK FOR MULTIPLE SEQUENCE LOCAL ALIGNMENT 65
      Chengpeng Bi

      4.1 Introduction / 65

      4.2 Multiple Sequence Local Alignment / 67

      4.2.1 Overall Objective Function / 67

      4.2.2 Maximum Likelihood Model / 68

      4.3 Motif Finding Algorithms / 70

      4.3.1 DEM Motif Algorithm / 70

      4.3.2 WEM Motif Finding Algorithm / 70

      4.3.3 Metropolis Motif Finding Algorithm / 72

      4.3.4 Gibbs Motif Finding Algorithm / 73

      4.3.5 Pseudo-Gibbs Motif Finding Algorithm / 74

      4.4 Time Complexity / 75

      4.5 Case Studies / 75

      4.5.1 Performance Evaluation / 76

      4.5.2 CRP Binding Sites / 76

      4.5.3 Multiple Motifs in Helix–Turn–Helix Protein Structure / 78

      4.6 Conclusion / 80

      References / 81

      5 GLOBAL SEQUENCE ALIGNMENT WITH A BOUNDED NUMBER OF GAPS 83
      Carl Barton, Tomáš Flouri, Costas S. Iliopoulos, and Solon P. Pissis

      5.1 Introduction / 83

      5.2 Definitions and Notation / 85

      5.3 Problem Definition / 87

      5.4 Algorithms / 88

      5.5 Conclusion / 94

      References / 95

      II PATTERN RECOGNITION IN SECONDARY STRUCTURES 97

      6 A SHORT REVIEW ON PROTEIN SECONDARY STRUCTURE PREDICTION METHODS 99
      Renxiang Yan, Jiangning Song, Weiwen Cai, and Ziding Zhang

      6.1 Introduction / 99

      6.2 Representative Protein Secondary Structure Prediction Methods / 102

      6.2.1 Chou–Fasman / 103

      6.2.2 GOR / 104

      6.2.3 PHD / 104

      6.2.4 PSIPRED / 104

      6.2.5 SPINE-X / 105

      6.2.6 PSSpred / 105

      6.2.7 Meta Methods / 105

      6.3 Evaluation of Protein Secondary Structure Prediction Methods / 106

      6.3.1 Measures / 106

      6.3.2 Benchmark / 106

      6.3.3 Performances / 107

      6.4 Conclusion / 110

      Acknowledgments / 110

      References / 111

      7 A GENERIC APPROACH TO BIOLOGICAL SEQUENCE SEGMENTATION PROBLEMS: APPLICATION TO PROTEIN SECONDARY STRUCTURE PREDICTION 114
      Yann Guermeur and Fabien Lauer

      7.1 Introduction / 114

      7.2 Biological Sequence Segmentation / 115

      7.3 MSVMpred / 117

      7.3.1 Base Classifiers / 117

      7.3.2 Ensemble Methods / 118

      7.3.3 Convex Combination / 119

      7.4 Postprocessing with A Generative Model / 119

      7.5 Dedication to Protein Secondary Structure Prediction / 120

      7.5.1 Biological Problem / 121

      7.5.2 MSVMpred2 / 121

      7.5.3 Hidden Semi-Markov Model / 122

      7.5.4 Experimental Results / 122

      7.6 Conclusions and Ongoing Research / 125

      Acknowledgments / 126

      References / 126

      8 STRUCTURAL MOTIF IDENTIFICATION AND RETRIEVAL: A GEOMETRICAL APPROACH 129
      Virginio Cantoni, Marco Ferretti, Mirto Musci, and Nahumi Nugrahaningsih

      8.1 Introduction / 129

      8.2 A Few Basic Concepts / 130

      8.2.1 Hierarchy of Protein Structures / 130

      8.2.2 Secondary Structure Elements / 131

      8.2.3 Structural Motifs / 132

      8.2.4 Available Sources for Protein Data / 134

      8.3 State of the Art / 135

      8.3.1 Protein Structure Motif Search / 135

      8.3.2 Promotif / 136

      8.3.3 Secondary-Structure Matching / 137

      8.3.4 Multiple Structural Alignment by Secondary Structures / 138

      8.4 A Novel Geometrical Approach to Motif Retrieval / 138

      8.4.1 Secondary Structures Cooccurrences / 138

      8.4.2 Cross Motif Search / 143

      8.4.3 Complete Cross Motif Search / 146

      8.5 Implementation Notes / 149

      8.5.1 Optimizations / 149

      8.5.2 Parallel Approaches / 150

      8.6 Conclusions and Future Work / 151

      Acknowledgment / 152

      References / 152

      9 GENOME-WIDE SEARCH FOR PSEUDOKNOTTED NONCODING RNAs: A COMPARATIVE STUDY 155
      Meghana Vasavada, Kevin Byron, Yang Song, and Jason T.L. Wang

      9.1 Introduction / 155

      9.2 Background / 156

      9.2.1 Noncoding RNAs and Their Secondary Structures / 156

      9.2.2 Pseudoknotted ncRNA Search Tools / 157

      9.3 Methodology / 157

      9.4 Results and Interpretation / 161

      9.5 Conclusion / 162

      References / 163

      III PATTERN RECOGNITION IN TERTIARY STRUCTURES 165

      10 MOTIF DISCOVERY IN PROTEIN 3D-STRUCTURES USING GRAPH MINING TECHNIQUES 167
      Wajdi Dhifli and Engelbert Mephu Nguifo

      10.1 Introduction / 167

      10.2 From Protein 3D-Structures to Protein Graphs / 169

      10.2.1 Parsing Protein 3D-Structures into Graphs / 169

      10.3 Graph Mining / 172

      10.4 Subgraph Mining / 173

      10.5 Frequent Subgraph Discovery / 173

      10.5.1 Problem Definition / 174

      10.5.2 Candidates Generation / 176

      10.5.3 Frequent Subgraph Discovery Approaches / 177

      10.5.4 Variants of Frequent Subgraph Mining: Closed and Maximal Subgraphs / 178

      10.6 Feature Selection / 179

      10.6.1 Relevance of a Feature / 179

      10.7 Feature Selection for Subgraphs / 180

      10.7.1 Problem Statement / 180

      10.7.2 Mining Top-k Subgraphs / 180

      10.7.3 Clustering-Based Subgraph Selection / 181

      10.7.4 Sampling-Based Approaches / 181

      10.7.5 Approximate Subgraph Mining / 181

      10.7.6 Discriminative Subgraph Selection / 182

      10.7.7 Other Significant Subgraph Selection Approaches / 182

      10.8 Discussion / 183

      10.9 Conclusion / 185

      Acknowledgments / 185

      References / 186

      11 FUZZY AND UNCERTAIN LEARNING TECHNIQUES FOR THE ANALYSIS AND PREDICTION OF PROTEIN TERTIARY STRUCTURES 190
      Chinua Umoja, Xiaxia Yu, and Robert Harrison

      11.1 Introduction / 190

      11.2 Genetic Algorithms / 192

      11.2.1 GA Model Selection in Protein Structure Prediction / 196

      11.2.2 Common Methodology / 198

      11.3 Supervised Machine Learning Algorithm / 201

      11.3.1 Artificial Neural Networks / 201

      11.3.2 ANNs in Protein Structure Prediction / 202

      11.3.3 Support Vector Machines / 203

      11.4 Fuzzy Application / 204

      11.4.1 Fuzzy Logic / 204

      11.4.2 Fuzzy SVMs / 204

      11.4.3 Adaptive-Network-Based Fuzzy Inference Systems / 205

      11.4.4 Fuzzy Decision Trees / 206

      11.5 Conclusion / 207

      References / 208

      12 PROTEIN INTER-DOMAIN LINKER PREDICTION 212
      Maad Shatnawi, Paul D. Yoo, and Sami Muhaidat

      12.1 Introduction / 212

      12.2 Protein Structure Overview / 213

      12.3 Technical Challenges and Open Issues / 214

      12.4 Prediction Assessment / 215

      12.5 Current Approaches / 216

      12.5.1 DomCut / 216

      12.5.2 Scooby-Domain / 217

      12.5.3 FIEFDom / 218

      12.5.4 Chatterjee et al. (2009) / 219

      12.5.5 Drop / 219

      12.6 Domain Boundary Prediction Using Enhanced General Regression Network / 220

      12.6.1 Multi-Domain Benchmark Data Set / 220

      12.6.2 Compact Domain Profile / 221

      12.6.3 The Enhanced Semi-Parametric Model / 222

      12.6.4 Training, Testing, and Validation / 225

      12.6.5 Experimental Results / 226

      12.7 Inter-Domain Linkers Prediction Using Compositional Index and Simulated Annealing / 227

      12.7.1 Compositional Index / 228

      12.7.2 Detecting the Optimal Set of Threshold Values Using Simulated Annealing / 229

      12.7.3 Experimental Results / 230

      12.8 Conclusion / 232

      References / 233

      13 PREDICTION OF PROLINE CIS–TRANS ISOMERIZATION 236
      Paul D. Yoo, Maad Shatnawi, Sami Muhaidat, Kamal Taha, and Albert Y. Zomaya

      13.1 Introduction / 236

      13.2 Methods / 238

      13.2.1 Evolutionary Data Set Construction / 238

      13.2.2 Protein Secondary Structure Information / 239

      13.2.3 Method I: Intelligent Voting / 239

      13.2.4 Method II: Randomized Meta-Learning / 241

      13.2.5 Model Validation and Testing / 242

      13.2.6 Parameter Tuning / 242

      13.3 Model Evaluation and Analysis / 243

      13.4 Conclusion / 245

      References / 245

      IV PATTERN RECOGNITION IN QUATERNARY STRUCTURES 249

      14 PREDICTION OF PROTEIN QUATERNARY STRUCTURES 251
      Akbar Vaseghi, Maryam Faridounnia, Soheila Shokrollahzade, Samad Jahandideh, and Kuo-Chen Chou

      14.1 Introduction / 251

      14.2 Protein Structure Prediction / 255

      14.2.1 Secondary Structure Prediction / 255

      14.2.2 Modeling of Tertiary Structure / 256

      14.3 Template-Based Predictions / 257

      14.3.1 Homology Modeling / 257

      14.3.2 Threading Methods / 257

      14.3.3 Ab initio Modeling / 257

      14.4 Critical Assessment of Protein Structure Prediction / 258

      14.5 Quaternary Structure Prediction / 258

      14.6 Conclusion / 261

      Acknowledgments / 261

      References / 261

      15 COMPARISON OF PROTEIN QUATERNARY STRUCTURES BY GRAPH APPROACHES 266
      Sheng-Lung Peng and Yu-Wei Tsay

      15.1 Introduction / 266

      15.2 Similarity in the Graph Model / 268

      15.2.1 Graph Model for Proteins / 270

      15.3 Measuring Structural Similarity VIA MCES / 272

      15.3.1 Problem Formulation / 273

      15.3.2 Constructing P-Graphs / 274

      15.3.3 Constructing Line Graphs / 276

      15.3.4 Constructing Modular Graphs / 276

      15.3.5 Maximum Clique Detection / 277

      15.3.6 Experimental Results / 277

      15.4 Protein Comparison VIA Graph Spectra / 279

      15.4.1 Graph Spectra / 279

      15.4.2 Matrix Selection / 281

      15.4.3 Graph Cospectrality and Similarity / 283

      15.4.4 Cospectral Comparison / 283

      15.4.5 Experimental Results / 284

      15.5 Conclusion / 287

      References / 287

      16 STRUCTURAL DOMAINS IN PREDICTION OF BIOLOGICAL PROTEIN–PROTEIN INTERACTIONS 291
      Mina Maleki, Michael Hall, and Luis Rueda

      16.1 Introduction / 291

      16.2 Structural Domains / 293

      16.3 The Prediction Framework / 293

      16.4 Feature Extraction and Prediction Properties / 294

      16.4.1 Physicochemical Properties / 296

      16.4.2 Domain-Based Properties / 298

      16.5 Feature Selection / 299

      16.5.1 Filter Methods / 299

      16.5.2 Wrapper Methods / 301

      16.6 Classification / 301

      16.6.1 Linear Dimensionality Reduction / 301

      16.6.2 Support Vector Machines / 303

      16.6.3 k-Nearest Neighbor / 303

      16.6.4 Naive Bayes / 304

      16.7 Evaluation and Analysis / 304

      16.8 Results and Discussion / 304

      16.8.1 Analysis of the Prediction Properties / 304

      16.8.2 Analysis of Structural DDIs / 307

      16.9 Conclusion / 309

      References / 310

      V PATTERN RECOGNITION IN MICROARRAYS 315

      17 CONTENT-BASED RETRIEVAL OF MICROARRAY EXPERIMENTS 317
      Hasan O¢gul

      17.1 Introduction / 317

      17.2 Information Retrieval: Terminology and Background / 318

      17.3 Content-Based Retrieval / 320

      17.4 Microarray Data and Databases / 322

      17.5 Methods for Retrieving Microarray Experiments / 324

      17.6 Similarity Metrics / 327

      17.7 Evaluating Retrieval Performance / 329

      17.8 Software Tools / 330

      17.9 Conclusion and Future Directions / 331

      Acknowledgment / 332

      References / 332

      18 EXTRACTION OF DIFFERENTIALLY EXPRESSED GENES IN MICROARRAY DATA 335
      Tiratha Raj Singh, Brigitte Vannier, and Ahmed Moussa

      18.1 Introduction / 335

      18.2 From Microarray Image to Signal / 336

      18.2.1 Signal from Oligo DNA Array Image / 336

      18.2.2 Signal from Two-Color cDNA Array / 337

      18.3 Microarray Signal Analysis / 337

      18.3.1 Absolute Analysis and Replicates in Microarrays / 338

      18.3.2 Microarray Normalization / 339

      18.4 Algorithms for De Gene Selection / 339

      18.4.1 Within–Between DE Gene (WB-DEG) Selection Algorithm / 340

      18.4.2 Comparison of the WB-DEGs with Two Classical DE Gene Selection Methods on Latin Square Data / 341

      18.5 Gene Ontology Enrichment and Gene Set Enrichment Analysis / 343

      18.6 Conclusion / 345

      References / 345

      19 CLUSTERING AND CLASSIFICATION TECHNIQUES FOR GENE EXPRESSION PROFILE PATTERN ANALYSIS 347
      Emanuel Weitschek, Giulia Fiscon, Valentina Fustaino, Giovanni Felici, and Paola Bertolazzi

      19.1 Introduction / 347

      19.2 Transcriptome Analysis / 348

      19.3 Microarrays / 349

      19.3.1 Applications / 349

      19.3.2 Microarray Technology / 350

      19.3.3 Microarray Workflow / 350

      19.4 RNA-Seq / 351

      19.5 Benefits and Drawbacks of RNA-Seq and Microarray Technologies / 353

      19.6 Gene Expression Profile Analysis / 356

      19.6.1 Data Definition / 356

      19.6.2 Data Analysis / 357

      19.6.3 Normalization and Background Correction / 357

      19.6.4 Genes Clustering / 359

      19.6.5 Experiment Classification / 361

      19.6.6 Software Tools for Gene Expression Profile Analysis / 362

      19.7 Real Case Studies / 364

      19.8 Conclusions / 367

      References / 368

      20 MINING INFORMATIVE PATTERNS IN MICROARRAY DATA 371
      Li Teng

      20.1 Introduction / 371

      20.2 Patterns with Similarity / 373

      20.2.1 Similarity Measurement / 374

      20.2.2 Clustering / 376

      20.2.3 Biclustering / 379

      20.2.4 Types of Biclusters / 380

      20.2.5 Measurement of the Homogeneity / 383

      20.2.6 Biclustering Algorithms with Different Searching Schemes / 387

      20.3 Conclusion / 391

      References / 391

      21 ARROW PLOT AND CORRESPONDENCE ANALYSIS MAPS FOR VISUALIZING THE EFFECTS OF BACKGROUND CORRECTION AND NORMALIZATION METHODS ON MICROARRAY DATA 394
      Carina Silva, Adelaide Freitas, Sara Roque, and Lisete Sousa

      21.1 Overview / 394

      21.1.1 Background Correction Methods / 395

      21.1.2 Normalization Methods / 396

      21.1.3 Literature Review / 397

      21.2 Arrow Plot / 399

      21.2.1 DE Genes Versus Special Genes / 399

      21.2.2 Definition and Properties of the ROC Curve / 400

      21.2.3 AUC and Degenerate ROC Curves / 401

      21.2.4 Overlapping Coefficient / 402

      21.2.5 Arrow Plot Construction / 403

      21.3 Significance Analysis of Microarrays / 404

      21.4 Correspondence Analysis / 405

      21.4.1 Basic Principles / 405

      21.4.2 Interpretation of CA Maps / 406

      21.5 Impact of the Preprocessing Methods / 407

      21.5.1 Class Prediction Context / 408

      21.5.2 Class Comparison Context / 408

      21.6 Conclusions / 412

      Acknowledgments / 413

      References / 413

      VI PATTERN RECOGNITION IN PHYLOGENETIC TREES 417

      22 PATTERN RECOGNITION IN PHYLOGENETICS: TREES AND NETWORKS 419
      David A. Morrison

      22.1 Introduction / 419

      22.2 Networks and Trees / 420

      22.3 Patterns and Their Processes / 424

      22.4 The Types of Patterns / 427

      22.5 Fingerprints / 431

      22.6 Constructing Networks / 433

      22.7 Multi-Labeled Trees / 435

      22.8 Conclusion / 436

      References / 437

      23 DIVERSE CONSIDERATIONS FOR SUCCESSFUL PHYLOGENETIC TREE RECONSTRUCTION: IMPACTS FROM MODEL MISSPECIFICATION, RECOMBINATION, HOMOPLASY, AND PATTERN RECOGNITION 439
      Diego Mallo, Agustín Sánchez-Cobos, and Miguel Arenas

      23.1 Introduction / 440

      23.2 Overview on Methods and Frameworks for Phylogenetic Tree Reconstruction / 440

      23.2.1 Inferring Gene Trees / 441

      23.2.2 Inferring Species Trees / 442

      23.3 Influence of Substitution Model Misspecification on Phylogenetic Tree Reconstruction / 445

      23.4 Influence of Recombination on Phylogenetic Tree Reconstruction / 446

      23.5 Influence of Diverse Evolutionary Processes on Species Tree Reconstruction / 447

      23.6 Influence of Homoplasy on Phylogenetic Tree Reconstruction: The Goals of Pattern Recognition / 449

      23.7 Concluding Remarks / 449

      Acknowledgments / 450

      References / 450

      24 AUTOMATED PLAUSIBILITY ANALYSIS OF LARGE PHYLOGENIES 457
      David Dao, Tomáš Flouri, and Alexandros Stamatakis

      24.1 Introduction / 457

      24.2 Preliminaries / 459

      24.3 A Naïve Approach / 462

      24.4 Toward a Faster Method / 463

      24.5 Improved Algorithm / 467

      24.5.1 Preprocessing / 467

      24.5.2 Computing Lowest Common Ancestors / 468

      24.5.3 Constructing the Induced Tree / 468

      24.5.4 Final Remarks / 471

      24.6 Implementation / 473

      24.6.1 Preprocessing / 473

      24.6.2 Reconstruction / 473

      24.6.3 Extracting Bipartitions / 474

      24.7 Evaluation / 474

      24.7.1 Test Data Sets / 474

      24.7.2 Experimental Results / 475

      24.8 Conclusion / 479

      Acknowledgment / 481

      References / 481

      25 A NEW FAST METHOD FOR DETECTING AND VALIDATING HORIZONTAL GENE TRANSFER EVENTS USING PHYLOGENETIC TREES AND AGGREGATION FUNCTIONS 483
      Dunarel Badescu, Nadia Tahiri, and Vladimir Makarenkov

      25.1 Introduction / 483

      25.2 Methods / 485

      25.2.1 Clustering Using Variability Functions / 485

      25.2.2 Other Variants of Clustering Functions Implemented in the Algorithm / 487

      25.2.3 Description of the New Algorithm / 488

      25.2.4 Time Complexity / 491

      25.3 Experimental Study / 491

      25.3.1 Implementation / 491

      25.3.2 Synthetic Data / 491

      25.3.3 Real Prokaryotic (Genomic) Data / 495

      25.4 Results and Discussion / 501

      25.4.1 Analysis of Synthetic Data / 501

      25.4.2 Analysis of Prokaryotic Data / 502

      25.5 Conclusion / 502

      References / 503

      VII PATTERN RECOGNITION IN BIOLOGICAL NETWORKS 505

      26 COMPUTATIONAL METHODS FOR MODELING BIOLOGICAL INTERACTION NETWORKS 507
      Christos Makris and Evangelos Theodoridis

      26.1 Introduction / 507

      26.2 Measures/Metrics / 508

      26.3 Models of Biological Networks / 511

      26.4 Reconstructing and Partitioning Biological Networks / 511

      26.5 PPI Networks / 513

      26.6 Mining PPI Networks—Interaction Prediction / 517

      26.7 Conclusions / 519

      References / 519

      27 BIOLOGICAL NETWORK INFERENCE AT MULTIPLE SCALES: FROM GENE REGULATION TO SPECIES INTERACTIONS 525
      Andrej Aderhold, V Anne Smith, and Dirk Husmeier

      27.1 Introduction / 525

      27.2 Molecular Systems / 528

      27.3 Ecological Systems / 528

      27.4 Models and Evaluation / 529

      27.4.1 Notations / 529

      27.4.2 Sparse Regression and the LASSO / 530

      27.4.3 Bayesian Regression / 530

      27.4.4 Evaluation Metric / 531

      27.5 Learning Gene Regulation Networks / 532

      27.5.1 Nonhomogeneous Bayesian Regression / 533

      27.5.2 Gradient Estimation / 534

      27.5.3 Simulated Bio-PEPA Data / 534

      27.5.4 Real mRNA Expression Profile Data / 535

      27.5.5 Method Evaluation and Learned Networks / 536

      27.6 Learning Species Interaction Networks / 540

      27.6.1 Regression Model of Species interactions / 540

      27.6.2 Multiple Global Change-Points / 541

      27.6.3 Mondrian Process Change-Points / 542

      27.6.4 Synthetic Data / 544

      27.6.5 Simulated Population Dynamics / 544

      27.6.6 Real World Plant Data / 546

      27.6.7 Method Evaluation and Learned Networks / 546

      27.7 Conclusion / 550

      References / 550

      28 DISCOVERING CAUSAL PATTERNS WITH STRUCTURAL EQUATION MODELING: APPLICATION TO TOLL-LIKE RECEPTOR SIGNALING PATHWAY IN CHRONIC LYMPHOCYTIC LEUKEMIA 555
      Athina Tsanousa, Stavroula Ntoufa, Nikos Papakonstantinou, Kostas Stamatopoulos, and Lefteris Angelis

      28.1 Introduction / 555

      28.2 Toll-Like Receptors / 557

      28.2.1 Basics / 557

      28.2.2 Structure and Signaling of TLRs / 558

      28.2.3 TLR Signaling in Chronic Lymphocytic Leukemia / 559

      28.3 Structural Equation Modeling / 560

      28.3.1 Methodology of SEM Modeling / 560

      28.3.2 Assumptions / 561

      28.3.3 Estimation Methods / 562

      28.3.4 Missing Data / 562

      28.3.5 Goodness-of-Fit Indices / 563

      28.3.6 Other Indications of a Misspecified Model / 565

      28.4 Application / 566

      28.5 Conclusion / 580

      References / 581

      29 ANNOTATING PROTEINS WITH INCOMPLETE LABEL INFORMATION 585
      Guoxian Yu, Huzefa Rangwala, and Carlotta Domeniconi

      29.1 Introduction / 585

      29.2 Related Work / 587

      29.3 Problem Formulation / 589

      29.3.1 The Algorithm / 591

      29.4 Experimental Setup / 592

      29.4.1 Data sets / 592

      29.4.2 Comparative Methods / 593

      29.4.3 Experimental Protocol / 594

      29.4.4 Evaluation Criteria / 594

      29.5 Experimental Analysis / 596

      29.5.1 Replenishing Missing Functions / 596

      29.5.2 Predicting Unlabeled Proteins / 600

      29.5.3 Component Analysis / 604

      29.5.4 Run Time Analysis / 604

      29.6 Conclusions / 605

      Acknowledgments / 606

      References / 606

      INDEX 609

    Recently viewed products

    © 2026 Book Curl

      • American Express
      • Apple Pay
      • Diners Club
      • Discover
      • Google Pay
      • Maestro
      • Mastercard
      • PayPal
      • Shop Pay
      • Union Pay
      • Visa

      Login

      Forgot your password?

      Don't have an account yet?
      Create account