Creative Biolabs bridges the gap between strain phenotype and molecular mechanism. Our service utilizes Whole Genome Sequencing (WGS) and advanced bioinformatic mining to identify Biosynthetic Gene Clusters (BGCs)—RiPPs, NRPS, PKS, and more—predicting the structure and prioritizing the cryptic metabolites responsible for your strain's efficacy.
A common bottleneck in Live Biotherapeutic Product (LBP) development is the "Phenotype-Genotype Gap." Researchers often isolate a strain with a confirmed phenotype (such as pathogen inhibition, immunomodulation, or gut barrier enhancement) but remain unaware of the specific bioactive molecules driving these effects. Without identifying the causative agent, mechanism of action (MoA) determination, intellectual property protection, and safety assessments (e.g., cytotoxicity screening) remain incomplete.
Microbial secondary metabolites are synthesized by colocalized groups of genes known as Biosynthetic Gene Clusters (BGCs). However, many of these clusters are "cryptic" or silent under standard laboratory conditions. Creative Biolabs addresses this by employing a rigorous in silico mining approach. We move beyond simple annotation to predict the chemical classes (e.g., Polyketides, Non-ribosomal peptides) and structures of potential metabolites, providing a prioritized roadmap for downstream metabolomic validation or heterologous expression.
| BGC Type | Enzymatic Machinery | Potential Bioactivity | Drug Relevance |
|---|---|---|---|
| NRPS (Non-ribosomal Peptide Synthetases) | Modular megasynthases independent of ribosomes | Antibiotics, Immunosuppressants, Siderophores | Vancomycin, Cyclosporine |
| PKS (Polyketide Synthases) | Type I, II, and III synthases using acyl-CoA precursors | Antibiotics, Antifungals, Antitumor agents | Erythromycin, Doxorubicin |
| RiPPs (Ribosomally Synthesized and Post-translationally Modified Peptides) | Ribosomal synthesis followed by extensive modification | Bacteriocins, Lantibiotics, Lasso peptides | Nisin, Thiostrepton |
| Terpenes | Terpene synthases/cyclases | Anti-inflammatory, Antimicrobial | Artemisinin, Taxol |
From raw genomic data to prioritized chemical candidates, our workflow is designed to uncover the hidden chemical arsenal of your probiotic strains.
Accurate BGC mining requires high-quality genome assemblies. Fragmented genomes often split large gene clusters (such as NRPS/PKS clusters, which can exceed 50 kb) across separate contigs, making reliable identification difficult or impossible. We perform Whole Genome Sequencing (WGS) using hybrid approaches (short-read Illumina for accuracy and long-read PacBio/Nanopore for continuity) to generate scaffold-level or complete circularized genomes that preserve full cluster integrity.
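To illustrate why assembly continuity matters, the following minimal sketch flags predicted regions that sit close to a contig end, since such regions may have been truncated by the assembly. The `Region` record, the function name `flag_edge_regions`, the 1 kb margin, and the example coordinates are all illustrative assumptions and do not reflect any particular tool's output format.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A predicted BGC region on a contig (0-based start, exclusive end)."""
    contig: str
    start: int
    end: int


def flag_edge_regions(regions, contig_lengths, margin=1000):
    """Return regions that begin or end within `margin` bp of a contig end.

    Such regions may be truncated by the assembly and deserve manual review.
    """
    flagged = []
    for r in regions:
        length = contig_lengths[r.contig]
        if r.start < margin or length - r.end < margin:
            flagged.append(r)
    return flagged


# Hypothetical example data
contig_lengths = {"contig_1": 250_000, "contig_2": 42_000}
regions = [
    Region("contig_1", 80_000, 140_000),  # well inside the contig
    Region("contig_2", 5_000, 41_800),    # ends 200 bp from the contig end
]
for r in flag_edge_regions(regions, contig_lengths):
    print(f"{r.contig}:{r.start}-{r.end} may be truncated")
```

On a complete circularized genome there are no linear contig ends of this kind, which is one reason closed assemblies are preferred for mining.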
We scan the genome with established mining tools, including antiSMASH, PRISM, and BAGEL4. Our pipeline identifies core biosynthetic genes and auxiliary tailoring enzymes (e.g., halogenases, glycosyltransferases). We screen for diverse cluster types, including NRPS, PKS, RiPPs, and terpenes.
Identification is only the first step. We perform in silico structural prediction to estimate the final chemical scaffold. By analyzing the specificity-conferring domains (e.g., Adenylation domains in NRPS, Acyltransferase domains in PKS), we predict the monomer incorporation sequence. We also run KnownClusterBlast-style comparisons of identified BGCs against the MIBiG database to determine whether a cluster likely produces a known compound (dereplication) or a novel analog.
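The monomer-order idea can be sketched as follows. It assumes the cluster obeys the collinearity rule (modules act in the order they appear along the assembly line), which many but not all NRPS/PKS systems follow. The `Module` record and the substrate labels are hypothetical inputs, such as might be extracted from a mining report.

```python
from typing import NamedTuple, Optional


class Module(NamedTuple):
    order: int                 # position of the module along the assembly line
    kind: str                  # "NRPS" or "PKS"
    substrate: Optional[str]   # predicted substrate, or None if no call was made


def predicted_monomers(modules):
    """List predicted building blocks in assembly order; 'X' marks unknowns."""
    ordered = sorted(modules, key=lambda m: m.order)
    return [m.substrate if m.substrate else "X" for m in ordered]


# Hypothetical modules, deliberately given out of order
modules = [
    Module(2, "NRPS", "Leu"),
    Module(1, "NRPS", "Val"),
    Module(3, "PKS", "malonyl-CoA"),
    Module(4, "NRPS", None),
]
print(" -> ".join(predicted_monomers(modules)))
```

Keeping unknown positions as an explicit placeholder, rather than dropping them, preserves the predicted chain length for later mass estimates.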
Not all BGCs are active or relevant. We deliver a ranked "Target List" based on:
1. Novelty Score: Degree of divergence from known clusters (lower similarity indicates higher novelty).
2. Completeness: Presence of all essential transport and regulatory genes.
3. Safety Profile: Screening for known toxin-associated domains (e.g., colibactin-like).
4. Link-to-Phenotype Likelihood: Bioinformatic correlation with observed strain bioactivity (e.g., antimicrobial, immunomodulatory).
5. Detectability: Feasibility of downstream LC-MS capture based on predicted physicochemical properties (e.g., expected m/z values).
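A ranking over the five criteria above could be sketched as a simple weighted sum. The weights, the 0 to 1 score scale, and the example values below are illustrative assumptions only; they are not the scoring scheme actually used in the service.

```python
# Hypothetical weights for the five criteria; they sum to 1.0.
WEIGHTS = {
    "novelty": 0.25,
    "completeness": 0.20,
    "safety": 0.20,
    "phenotype_link": 0.25,
    "detectability": 0.10,
}


def priority_score(scores, weights=WEIGHTS):
    """Weighted sum of per-criterion scores, each expected in [0, 1]."""
    for name in weights:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return sum(weights[name] * scores[name] for name in weights)


def rank_clusters(candidates, weights=WEIGHTS):
    """Return (cluster_id, score) pairs sorted from highest to lowest score."""
    scored = [(cid, priority_score(s, weights)) for cid, s in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical per-cluster scores
candidates = {
    "region_1": {"novelty": 0.9, "completeness": 0.8, "safety": 1.0,
                 "phenotype_link": 0.7, "detectability": 0.6},
    "region_2": {"novelty": 0.2, "completeness": 1.0, "safety": 1.0,
                 "phenotype_link": 0.4, "detectability": 0.9},
}
for cid, score in rank_clusters(candidates):
    print(f"{cid}: {score:.2f}")
```

A weighted sum is only one option; a stricter scheme might treat a safety flag as a hard filter rather than as one term among several.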
Our service provides a structured, actionable data package designed to accelerate your R&D decision-making.
A comprehensive list of all identified clusters including genomic coordinates, cluster type (RiPPs, PKS, etc.), completeness status, and key biosynthetic gene annotations.
Predicted chemical class and structural scaffolds. Includes homology analysis against known compounds to distinguish novel candidates from known analogs.
Ranking of candidates based on multi-dimensional scoring: Novelty, Safety, Link-to-Phenotype, and Detectability/LC-MS suitability.
Recommendations for next steps, including specific metabolomic targets (m/z values) and genetic knockout/heterologous expression strategies for verification.
A streamlined path from biological sample to predicted chemical structure.
Submit gDNA or live bacterial culture. QC performed to ensure purity and integrity.
Sequencing and de novo assembly into high-continuity scaffolds suitable for mining.
Application of rule-based and ML-based algorithms (antiSMASH, DeepBGC) to locate clusters.
Domain analysis, substrate prediction, and comparative homology search (Dereplication).
Delivery of BGC inventory, structural hypotheses, and recommended validation assays.
Our analysis pipeline incorporates state-of-the-art detection logic referenced in leading genomic studies. For example, modern iterations of antiSMASH (as detailed in Nucleic Acids Research) have significantly improved the detection of thiopeptides, NRPS/PKS hybrids, and RiPPs through updated profile Hidden Markov Models (pHMMs).
The visualization demonstrates the complexity of BGC domain architecture. In the analysis of the kirromycin cluster (pictured), specific modules responsible for loading, extension, and termination are clearly delineated.
This level of modular resolution allows us to predict the stepwise assembly of the metabolite. By mapping the specificity of Adenylation (A) and Acyltransferase (AT) domains, we can infer the amino acid or acyl-CoA building blocks, generating a hypothetical chemical structure even before the compound is physically isolated in the lab. This "genome-first" approach is critical for prioritizing strains that possess the genetic capacity for novel antimicrobial or immunomodulatory production.
Fig. 1 The NRPS/PKS domain view of the kirromycin biosynthetic gene cluster. [1,2]
We cross-reference findings against MIBiG, DoBISCUIT, and proprietary internal datasets to ensure high-confidence dereplication and novelty assessment.
Our integrated wet-lab capability lets us generate the long-read data needed to assemble complex, repetitive PKS/NRPS clusters that short-read data alone often fails to resolve.
We don't just list genes; we predict chemical structures. This allows you to target specific masses in LC-MS validation, turning "big data" into a precise analytical search.
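As a rough illustration of how a predicted structure becomes an LC-MS target, the sketch below computes the expected m/z for a protonated, unmodified linear peptide from a predicted monomer list. The residue table covers only a few standard amino acids, and the function name `linear_peptide_mz` is an assumption for this example. Real products are often cyclized or tailored, so a value like this is a starting hypothesis rather than a confirmed mass.

```python
# Monoisotopic residue masses (Da) for a few standard amino acids.
RESIDUE_MASS = {
    "Gly": 57.02146,
    "Ala": 71.03711,
    "Ser": 87.03203,
    "Val": 99.06841,
    "Leu": 113.08406,
    "Phe": 147.06841,
}
WATER = 18.01056   # added once for the free N- and C-termini
PROTON = 1.00728   # mass of the charge-carrying proton


def linear_peptide_mz(monomers, charge=1):
    """m/z of the [M+zH]z+ ion of an unmodified linear peptide."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    neutral = sum(RESIDUE_MASS[m] for m in monomers) + WATER
    return (neutral + charge * PROTON) / charge


# Hypothetical predicted monomer sequence
print(f"{linear_peptide_mz(['Val', 'Leu', 'Phe']):.4f}")
```

Searching for several adducts and charge states around the predicted value is usually more robust than relying on a single m/z.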
Identify the specific bacteriocin or small molecule responsible for pathogen exclusion or immune modulation. Linking a specific BGC to a phenotype strengthens regulatory submissions and patent claims.
Proactively screen candidate strains for BGCs encoding known toxins (e.g., colibactin, tilivalline) or antibiotic resistance mechanisms. This "safety-by-design" approach prevents costly failures in late-stage preclinical toxicity studies.
Mine the microbiome for novel antimicrobials effective against multi-drug resistant pathogens. Our structural prediction helps identify scaffolds with novel mechanisms of action distinct from existing antibiotic classes.
Rapidly determine if your bioactive strain is producing a known, off-patent compound or a novel chemical entity (NCE). This early insight is crucial for making "Go/No-Go" decisions on commercial development.
A "silent" or "cryptic" BGC is a gene cluster that is present in the genome but not expressed (transcribed/translated) under standard laboratory culture conditions. The bacterium has the genetic potential to make the compound but is not currently producing it. Mining identifies this potential. To activate such clusters, we may need to alter culture conditions (OSMAC approach), apply stress triggers, or use heterologous expression hosts.
Mining a short-read-only assembly is possible, but it is risky. BGCs, especially PKS and NRPS clusters, are often large (>50 kb) and contain repetitive sequences. Short-read sequencing often breaks these clusters across multiple contigs, leading to incomplete annotation and failed structure prediction. We strongly recommend hybrid sequencing (short plus long reads) or high-depth long-read sequencing to generate scaffold-level or complete genomes for reliable BGC mining.
We use a process called "Dereplication." By comparing the sequence of your identified BGC against comprehensive databases like MIBiG (Minimum Information about a Biosynthetic Gene cluster), we calculate a similarity score. High homology to a known cluster suggests the production of a known compound. Low homology, particularly in the core biosynthetic genes, suggests a potential novel analog or a new chemical entity.
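The dereplication logic described above can be sketched as a two-step calculation: a similarity score, here taken as the fraction of query genes with a hit in a reference cluster, followed by a threshold-based call. The cutoffs (0.8 and 0.4), the labels, and the gene identifiers are illustrative assumptions, not fixed values from any database or tool.

```python
def gene_similarity(query_genes, genes_with_hits):
    """Fraction of query genes that have a hit in the reference cluster."""
    if not query_genes:
        raise ValueError("query cluster has no genes")
    matched = sum(1 for g in query_genes if g in genes_with_hits)
    return matched / len(query_genes)


def classify(similarity, known=0.8, related=0.4):
    """Map a similarity score in [0, 1] to a coarse dereplication call."""
    if similarity >= known:
        return "likely known compound"
    if similarity >= related:
        return "possible analog"
    return "potentially novel"


# Hypothetical query cluster: 5 genes, 2 of which match the reference
query = ["g1", "g2", "g3", "g4", "g5"]
hits = {"g1", "g3"}
score = gene_similarity(query, hits)
print(f"{score:.2f} -> {classify(score)}")
```

Because low similarity in the core biosynthetic genes is more informative than low similarity in accessory genes, a fuller version might weight core genes more heavily.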
Our pipeline covers the major classes of microbial secondary metabolites, including Polyketides (Type I, II, III), Non-ribosomal peptides (NRPs), Ribosomally synthesized and post-translationally modified peptides (RiPPs such as lanthipeptides, bacteriocins, and lasso peptides), Terpenes, Siderophores, and Saccharides.
This specific service focuses on the in silico prediction and prioritization (Genomics). However, Creative Biolabs offers downstream "wet-lab" services including fermentation optimization, metabolomics (LC-MS/MS), and compound purification. The BGC mining report serves as the blueprint to guide these downstream isolation efforts efficiently.
For Research Use Only. Not intended for use in food manufacturing or medical procedures (diagnostics or therapeutics). Do Not Use in Humans.
Copyright © 2026 Creative Biolabs. All Rights Reserved.