Open Phage Data Sheet

This data sheet is generated and cached from the original Open Phage Google Sheet. Please contribute if you can! Every bit helps! https://open.phage.directory/sheet
(Original Google sheets link: https://docs.google.com/spreadsheets/d/1fhBigiisdCc8-YWD4K8U6yvXK47p67KzA7JfAWsh-Iw/edit#gid=1571383366 ).

To access the data programmatically (for bioinformatics, scraping, LLMs, etc.), use https://open.phage.directory/api. The results are cached and refreshed periodically.
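As a minimal sketch of working with the data, the Python below groups tool names by category. The column names (`Name`, `Category`, `URL`) come from the table header on this page. The helper names are hypothetical, and the assumption that each row arrives as a flat JSON object with those keys is not confirmed here, so inspect the real response from the API before relying on it.

```python
import json
import urllib.request
from collections import defaultdict

API_URL = "https://open.phage.directory/api"  # endpoint named on this page


def fetch_json(url=API_URL, timeout=30):
    """Download and decode the endpoint's JSON. The response shape is not
    documented on this page, so inspect the result before using it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def group_by_category(records):
    """Group tool names by 'Category'. Assumes each record is a dict keyed by
    the sheet's column names; rows missing either field are skipped."""
    groups = defaultdict(list)
    for row in records:
        name, category = row.get("Name"), row.get("Category")
        if name and category:
            groups[category].append(name)
    return dict(groups)


# Offline sample built from three rows of the table below (no network needed).
sample = [
    {"_id": 63, "Name": "WIsH", "Category": "Host prediction",
     "URL": "https://github.com/soedinglab/WIsH"},
    {"_id": 65, "Name": "shiver", "Category": "Host Removal Tool",
     "URL": "https://github.com/ChrisHIV/shiver"},
    {"_id": 39, "Name": "CHERRY", "Category": "Host prediction",
     "URL": "https://github.com/KennthShang/CHERRY"},
]
print(group_by_category(sample))
```

`fetch_json()` is deliberately not called at import time. Once you have checked where the row list sits in the real payload, pass that list to `group_by_category`.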

Code can be found at: https://github.com/janzheng/openphage. This project relies on Hono and SpreadAPI to work.

Table of Contents

Bioinformatics

_id | Last Modified | Name | Category | Notes | Author | URL | Citation | Source | Image | Version | Developed By | License | Language/Framework | Input Format | Output Format | Date of Release
22023-11-13T11:12:18.883ZdrVMAssembly TooldrVM: a new tool for efficient genome assembly of known eukaryotic viruses from metagenomeshttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5466706/Rob Edwards' Viral Bioinfo Tools
32023-11-13T11:12:18.883ZGenomeDetective VirusAssembly Toolhttps://www.genomedetective.com/app/typingtool/virus/Rob Edwards' Viral Bioinfo Tools
42023-11-13T11:12:18.883ZIVAAssembly Toolde-novo assembly, needs to be incorporated in pipeline with host sequence removal, e.g., shiverhttp://sanger-pathogens.github.io/iva/Rob Edwards' Viral Bioinfo Tools
52023-11-13T11:12:18.883ZIVARAssembly ToolDesigned for mapping-based "assembly" of amplicon sequencing datahttps://github.com/andersen-lab/ivarRob Edwards' Viral Bioinfo Tools
62023-11-13T11:12:18.883ZmetaViralSpadesAssembly Toolhttps://academic.oup.com/bioinformatics/article-abstract/36/14/4126/5837667Rob Edwards' Viral Bioinfo Tools
72023-11-13T11:12:18.883ZrnaViralSpadesAssembly Toolhttps://www.biorxiv.org/content/10.1101/2020.07.28.224584v1Rob Edwards' Viral Bioinfo Tools
82023-11-13T11:12:18.883ZsavageAssembly Toolhttps://bitbucket.org/jbaaijens/savage/src/master/Rob Edwards' Viral Bioinfo Tools
92023-11-13T11:12:18.883Zv-pipeAssembly Toolhttps://github.com/cbg-ethz/V-pipe/tree/Rob Edwards' Viral Bioinfo Tools
102023-11-13T11:12:18.883ZvicunaAssembly Toolhttps://www.broadinstitute.org/viral-genomics/vicunaRob Edwards' Viral Bioinfo Tools
112023-11-13T11:12:18.883ZVIPAssembly ToolPhage VIrion Protein classification based on chaos game representation and Vision Transformer; Both |https://github.com/KennthShang/PhaVIP; https://github.com/keylabivdc/VIP/https://www.nature.com/articles/srep23774Rob Edwards' Viral Bioinfo Tools
122023-11-13T11:12:18.883Zviral-ngsAssembly Toolhttps://viral-ngs.readthedocs.io/en/latest/Rob Edwards' Viral Bioinfo Tools
132023-11-13T11:12:18.883ZVirusTAPAssembly ToolWEBSERVER - No option to registerhttps://gph.niid.go.jp/virustap/system_inRob Edwards' Viral Bioinfo Tools
142023-11-13T11:12:18.883ZChoice of assembly software has a critical impact on virome characterisationBenchmarkPhage assembly benchmarkhttps://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-019-0626-5/tables/1Rob Edwards' Viral Bioinfo Tools
152023-11-13T11:12:18.883ZEvaluation of computational phage detection tools for metagenomic datasetsBioinformaticsPhage detection in metagenomes tools benchmarkhttps://www.frontiersin.org/articles/10.3389/fmicb.2023.1078760/fullRob Edwards' Viral Bioinfo Tools
162023-11-13T11:12:18.883ZMaGplotRCRISPRVirus | CRISPR Screens |https://github.com/alematia/MaGplotRhttps://www.biorxiv.org/content/10.1101/2023.01.12.523725v1Rob Edwards' Viral Bioinfo Tools20230112
172023-11-13T11:12:18.883ZSpacePHARERCRISPRPhage | CRISPR Spacer Phage-Host Pair Finder |spacepharer.soedinglab.orghttps://www.biorxiv.org/content/10.1101/2020.05.15.090266v1Rob Edwards' Viral Bioinfo Tools20220906
182023-11-13T11:12:18.883ZBVBRCCyberinfrastructure-supported virus toolsBoth | Website |https://bitbucket.org/srouxjgi/iphophttp://bvbrc.orgRob Edwards' Viral Bioinfo ToolsActively developed
192023-11-13T11:12:18.883ZiVirus 2.0Cyberinfrastructure-supported virus toolsPhage | integrating iVirus apps on CyVerse and KBase |CyVerse (http://tinyurl.com/4ndkt4n2), KBase (https://kbase.us/applist/) https://www.nature.com/articles/s43705-021-00083-3Rob Edwards' Viral Bioinfo Tools
202023-11-13T11:12:18.883ZPhageAIData repository, life cycle, taxonomy and proteins structure prediction, phage similarity, phage annotationPhage | NLP, ML |https://www.biorxiv.org/content/10.1101/2020.07.11.198606v1 https://app.phage.ai/Rob Edwards' Viral Bioinfo Tools
212024-01-05T17:21:26.566ZDePPDepolymerase finderPhagehttps://timskvortsov.github.io/WebDePP/https://doi.org/10.1186%2Fs12859-023-05341-wRob Edwards' Viral Bioinfo Tools
222023-11-13T11:12:18.883ZPhageDPODepolymerase finderPhage | SVM and ANNbit.ly/phagedpoRob Edwards' Viral Bioinfo Tools2022
232023-11-13T11:12:18.883ZOLGenieDiversity and selection analysisBoth | Program for estimating dN/dS in overlapping genes (OLGs); inferring purifying selection in alternative reading frames; intrahost; within-host; evolution; selection; nucleotide diversity |https://github.com/chasewnelson/OLGeniehttps://academic.oup.com/mbe/article/37/8/2440/5815567Rob Edwards' Viral Bioinfo Tools20221202
242023-11-13T11:12:18.883ZSNPGenieDiversity and selection analysisBoth | Program for estimating πN/πS, dN/dS, and other diversity measures from next-generation sequencing data; intrahost; within-host; evolution; selection; nucleotide diversity |https://github.com/chasewnelson/snpgeniehttps://academic.oup.com/bioinformatics/article/31/22/3709/241742Rob Edwards' Viral Bioinfo Tools20230822
252023-11-13T11:12:18.883ZVCFgenieDiversity and selection analysisBoth | Program for reproducibly filtering VCF files and eliminating false positive variants; intrahost; within-host; evolution; selection; nucleotide diversity | In revisionhttps://github.com/chasewnelson/VCFgenieRob Edwards' Viral Bioinfo Tools20220825
262023-11-13T11:12:18.883ZVIPERAEvolutionary analysisVirus | Phylogenetic and population genetics-based analysis of intra-patient SARS-CoV-2 |https://github.com/PathoGenOmics-Lab/VIPERAhttps://doi.org/10.1101/2023.10.24.561010Rob Edwards' Viral Bioinfo Tools20231108
272023-11-13T11:12:18.883ZMetaCerberusGenome and virome annotationBoth | HMM-based with Ray MPP |https://github.com/raw-lab/MetaCerberushttps://www.biorxiv.org/content/10.1101/2023.08.10.552700v1Rob Edwards' Viral Bioinfo Tools2023
282023-11-13T11:12:18.883ZDRAMvGenome annotationPhage | Distilling and refining annotation of metabolism |https://github.com/WrightonLabCSU/DRAMhttps://academic.oup.com/nar/article/48/16/8883/5884738Rob Edwards' Viral Bioinfo Tools2023
292023-11-13T11:12:18.883ZPhANNsGenome annotationPhage |PhANNshttps://journals.plos.org/ploscompbiol/article/authors?id=10.1371/journal.pcbi.1007845Rob Edwards' Viral Bioinfo Tools
302023-11-13T11:12:18.883ZPharokkaGenome annotationPhage |https://github.com/gbouras13/pharokkahttps://doi.org/10.1093/bioinformatics/btac776Rob Edwards' Viral Bioinfo Tools20230124
312023-11-13T11:12:18.883ZcoronaSPAdesGenome assemblyBoth | HMM-synteny guided assembly (works for all viruses) |https://github.com/ablab/spades/tree/metaviral_publicationhttps://academic.oup.com/bioinformatics/article/38/1/1/6354349Rob Edwards' Viral Bioinfo Tools
322023-11-13T11:12:18.883ZmetaviralSPAdesGenome assemblyBoth | MetaviralSPAdes: assembly of viruses from metagenomic data | Bioinformatics | Oxford Academichttps://github.com/ablab/spades/tree/metaviral_publicationRob Edwards' Viral Bioinfo Tools
332023-11-13T11:12:18.883ZmultiPhATEGenome comparisonPhagehttps://github.com/carolzhou/multiPhATERob Edwards' Viral Bioinfo Tools
342023-11-13T11:12:18.883ZPhageCloudsGenome comparisonPhage | network graphs |https://doi.org/10.1089/phage.2021.0008Rob Edwards' Viral Bioinfo Tools
352023-11-13T11:12:18.883ZCheckVGenome completenessBoth |; CheckV: assessing the quality of metagenome-assembled viral genomeshttps://bitbucket.org/berkeleylab/checkv/src/master/https://www.biorxiv.org/content/10.1101/2020.05.06.081778v1Rob Edwards' Viral Bioinfo Tools20220906
362023-11-13T11:12:18.883ZviralCompleteGenome completenessBothhttps://github.com/ablab/viralComplete/Rob Edwards' Viral Bioinfo Tools
372023-11-13T11:12:18.883ZviralVerifyGenome completenessBothhttps://github.com/ablab/viralVerify/Rob Edwards' Viral Bioinfo Tools
382023-11-13T11:12:18.883ZBacteriophageHostPredictionHost predictionPhage |https://github.com/dimiboeckaerts/BacteriophageHostPredictionhttps://www.nature.com/articles/s41598-021-81063-4Rob Edwards' Viral Bioinfo Tools
392023-11-13T11:12:18.883ZCHERRYHost predictionPhage |https://github.com/KennthShang/CHERRYhttps://academic.oup.com/bib/article/23/5/bbac182/6589865Rob Edwards' Viral Bioinfo Tools
402023-11-13T11:12:18.883ZCrisprOpenDBHost predictionPhage |https://github.com/edzuf/CrisprOpenDBhttps://doi.org/10.1093/nar/gkab133Rob Edwards' Viral Bioinfo Tools
412023-11-13T11:12:18.883ZDeePaCHost predictionBoth | CNN, ResNet, Shapley values (interpretability) |https://academic.oup.com/nargab/article/3/1/lqab004/6125551, https://academic.oup.com/bib/article/22/6/bbab269/6326527, https://academic.oup.com/bioinformatics/article/38/Supplement_2/ii168/6702016 https://gitlab.com/dacs-hpi/deepacRob Edwards' Viral Bioinfo Tools20221216
422023-11-13T11:12:18.883ZDeePaC-LiveHost predictionBoth | ResNet |https://academic.oup.com/bib/article/22/6/bbab269/6326527 https://gitlab.com/dacs-hpi/deepac-liveRob Edwards' Viral Bioinfo Tools20210123
432023-11-13T11:12:18.883ZDeepHostHost predictionPhage | CNN | Description: DeepHost is a phage host prediction tool.https://github.com/deepomicslab/DeepHost https://academic.oup.com/bib/article-abstract/23/1/bbab385/6374063?redirectedFrom=fulltextRob Edwards' Viral Bioinfo Tools; Phage Kitchen20220804
442023-11-13T11:12:18.883ZHostGHost predictionPhage | GCN |https://github.com/KennthShang/HostGhttps://bmcbiol.biomedcentral.com/articles/10.1186/s12915-021-01180-4Rob Edwards' Viral Bioinfo Tools20220316
452023-11-13T11:12:18.883ZHostPhinderHost predictionPhage | k-mers |https://github.com/julvi/HostPhinderhttps://pubmed.ncbi.nlm.nih.gov/27153081/Rob Edwards' Viral Bioinfo Tools20200902
462023-11-13T11:12:18.883ZILMF-VHHost predictionPhage |https://github.com/liudan111/ILMF-VHhttps://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3082-0Rob Edwards' Viral Bioinfo Tools
472023-11-13T11:12:18.883ZiPHoPHost predictionPhage |https://www.biorxiv.org/content/10.1101/2022.07.28.501908v1.abstractRob Edwards' Viral Bioinfo Tools
482023-11-13T11:12:18.883ZMVPHost predictionBoth |https://academic.oup.com/nar/article/46/D1/D700/4643372?login=true http://mvp.medgenius.info/homeRob Edwards' Viral Bioinfo Tools
492023-11-13T11:12:18.883ZPHERIHost predictionPhage | PHERIhttps://github.com/andynet/pheriRob Edwards' Viral Bioinfo Tools
502023-11-13T11:12:18.883ZPHIAFHost predictionPhage | GAN |https://github.com/BioMedicalBigDataMiningLab/PHIAFhttps://academic.oup.com/bib/article-abstract/23/1/bbab348/6362109Rob Edwards' Viral Bioinfo Tools
512023-11-13T11:12:18.883ZPHISDetectorHost predictionPhagehttp://www.microbiome-bigdata.com/PHISDetector/index/Rob Edwards' Viral Bioinfo Tools
522023-11-13T11:12:18.883ZPHISTHost predictionPhage | k-mers |https://github.com/refresh-bio/phisthttps://academic.oup.com/bioinformatics/article/38/5/1447/6460800Rob Edwards' Viral Bioinfo Tools
532023-11-13T11:12:18.883ZPHPHost predictionPhagehttps://github.com/congyulu-bioinfo/PHPRob Edwards' Viral Bioinfo Tools
542023-11-13T11:12:18.883ZPredPHIHost predictionPhagehttps://github.com/xialab-ahu/PredPHIRob Edwards' Viral Bioinfo Tools
552023-11-13T11:12:18.883ZRaFaHHost predictionPhage |https://www.sciencedirect.com/science/article/pii/S2666389921001008 https://sourceforge.net/projects/rafah/Rob Edwards' Viral Bioinfo Tools
562023-11-13T11:12:18.883ZvHulkHost predictionPhage | Description: Phage Host Prediction using high level features and neural networks. Metagenomics and sequencing techniques have greatly improved in the last five years and, as a consequence, the amount of data from microbial communities is astronomical. An important part of the microbial community are phages, which have their own ecological roles in the environment. Besides that, they have also been given a possible human-relevant (clinical) role as terminators of multidrug-resistant bacterial infections. A lot of basic research still needs to be done in the phage therapy field, and part of this research involves gathering knowledge about new phages present in the environment as well as about their relationship with clinically relevant bacterial pathogens. With this scenario in mind, we have developed vHULK, a user-friendly tool for prediction of phage hosts given their complete or partial genome in FASTA format. Our tool outputs an ensemble prediction at the genus or species level based on scores of four different neural network models. Each model was trained with more than 4,000 genomes whose phage-host relationship was known. vHULK also outputs a measure of entropy for each final prediction, which we have demonstrated to be correlated with the prediction's accuracy. The user might understand this value as additional information about how certain vHULK is about a particular prediction. We also suspect that phages with higher entropy values may have a broad host range, but that hypothesis is to be tested later. Accuracy results on test datasets were >99% for predictions at the genus level and >98% at the species level. vHULK currently supports predictions for 52 different prokaryotic host species and 61 different genera.https://github.com/LaboratorioBioinformatica/vHULK https://www.biorxiv.org/content/10.1101/2020.12.06.413476v1 https://www.biorxiv.org/content/10.1101/2020.12.06.413476v1.fullRob Edwards' Viral Bioinfo Tools
572023-11-13T11:12:18.883ZVIDHOPHost predictionBoth | Deep learning |https://github.com/flomock/vidhophttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7454304/Rob Edwards' Viral Bioinfo Tools
582023-11-13T11:12:18.883ZVirHostMatcherHost predictionPhage | oligonucleotide frequency based distance and dissimilarity measures |https://github.com/jessieren/VirHostMatcherhttps://pubmed.ncbi.nlm.nih.gov/27899557/Rob Edwards' Viral Bioinfo Tools
592023-11-13T11:12:18.883ZVirHostMatcher-NetHost predictionVirus | Description: Metagenomic sequencing has greatly enhanced the discovery of viral genomic sequences; however, it remains challenging to identify the host(s) of these new viruses. We developed VirHostMatcher-Net, a flexible, network-based, Markov random field framework for predicting virus–prokaryote interactions using multiple, integrated features: CRISPR sequences and alignment-free similarity measures (s2* and WIsH). Evaluation of this method on a benchmark set of 1462 known virus–prokaryote pairs yielded host prediction accuracy of 59% and 86% at the genus and phylum levels, representing 16–27% and 6–10% improvement, respectively, over previous single-feature prediction approaches. We applied our host prediction tool to crAssphage, a human gut phage, and two metagenomic virus datasets: marine viruses and viral contigs recovered from globally distributed, diverse habitats. Host predictions were frequently consistent with those of previous studies, but more importantly, this new tool made many more confident predictions than previous tools, up to nearly 3-fold more (n > 27 000), greatly expanding the diversity of known virus–host interactions.https://github.com/WeiliWw/VirHostMatcher-Net https://academic.oup.com/nargab/article/2/2/lqaa044/5861484Rob Edwards' Viral Bioinfo Tools; Phage Kitchenhttps://trello.com/1/cards/61948b43aa653c636dd10832/attachments/61ba828f6d63dd22924262d0/download/image.png
602023-11-13T11:12:18.883ZVirMatcherHost predictionPhage | Leveraging multiple methods and assigning a confidence score |https://bitbucket.org/MAVERICLab/virmatcher/src/master/https://www.cell.com/cell-host-microbe/fulltext/S1931-3128(20)30456-XRob Edwards' Viral Bioinfo Tools20220429
612023-11-13T11:12:18.883ZVirus Host DBHost predictionBoth |https://pubmed.ncbi.nlm.nih.gov/26938550/ https://www.genome.jp/virushostdb/Rob Edwards' Viral Bioinfo Tools
622023-11-13T11:12:18.883ZVirus Host PredictHost predictionBothhttps://github.com/youngfran/virus_host_predictRob Edwards' Viral Bioinfo Tools
632023-11-13T11:12:18.883ZWIsHHost predictionPhage |https://github.com/soedinglab/WIsHhttps://academic.oup.com/bioinformatics/article/33/19/3113/3964377#:~:text=WIsH%20predicts%20prokaryotic%20hosts%20of,3%20kbp%2Dlong%20phage%20contigs.Rob Edwards' Viral Bioinfo Tools
642023-11-13T11:12:18.883ZReadItAndKeepHost Removal Toolhttps://github.com/GenomePathogenAnalysisService/read-it-and-keepRob Edwards' Viral Bioinfo Tools
652023-11-13T11:12:18.883ZshiverHost Removal Toolhttps://github.com/ChrisHIV/shiverRob Edwards' Viral Bioinfo Tools
662023-11-13T11:12:18.883ZDRADIdentify Integrated VirusesPhage | Dinucleotide Relative Abundance difference |Does not exist any morehttps://journals.plos.org/plosone/article?id=10.1371/journal.pone.0001193Rob Edwards' Viral Bioinfo ToolsNone
672023-11-13T11:12:18.883ZgeNomadIdentify Integrated VirusesBothhttps://github.com/apcamargo/genomadRob Edwards' Viral Bioinfo Tools20221015
682023-11-13T11:12:18.883ZhafeZIdentify Integrated VirusesPhage | Readmapping |https://github.com/Chrisjrt/hafeZhttps://www.biorxiv.org/content/10.1101/2021.07.21.453177v1Rob Edwards' Viral Bioinfo Tools20211004
692023-11-13T11:12:18.883ZLysoPhDIdentify Integrated VirusesPhage |No code availablehttps://ieeexplore.ieee.org/document/8983280Rob Edwards' Viral Bioinfo ToolsNone
702023-11-13T11:12:18.883Zphage_finderIdentify Integrated VirusesPhage |https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1635311/ http://phage-finder.sourceforge.net/Rob Edwards' Viral Bioinfo Tools
712023-11-13T11:12:18.883ZphageboostIdentify Integrated VirusesPhage | boost ml |https://www.biorxiv.org/content/10.1101/2020.08.09.243022v1 http://phageboost.mlRob Edwards' Viral Bioinfo Tools
722023-11-13T11:12:18.883ZPhageWebIdentify Integrated VirusesPhage |https://www.frontiersin.org/articles/10.3389/fgene.2018.00644/full http://computationalbiology.ufpa.br/phageweb/Rob Edwards' Viral Bioinfo Tools
732023-11-13T11:12:18.883ZPHASTERIdentify Integrated VirusesPhage | Description: PHASTER (PHAge Search Tool Enhanced Release) is a significant upgrade to the popular PHAST web server for the rapid identification and annotation of prophage sequences within bacterial genomes and plasmids. While the steps in the phage identification pipeline in PHASTER remain largely the same as in the original PHAST, numerous software improvements and significant hardware enhancements have now made PHASTER faster, more efficient, more visually appealing and much more user friendly. In particular, PHASTER is now 4.3X faster than PHAST when analyzing a typical bacterial genome. More specifically, software optimizations have made the backend of PHASTER 2.7X faster than PHAST. Likewise, the addition of more than 120 CPUs to the PHASTER compute cluster has greatly reduced processing times. PHASTER can now process a typical bacterial genome in 3 minutes from the raw sequence alone, or in 1.5 minutes when given a pre-annotated GenBank file. A number of other optimizations have been implemented, including automated algorithms to reduce the size and redundancy of PHASTER's databases, improvements in handling multiple (metagenomic) queries and high user traffic, and the ability to perform automated look-ups against >14,000 previously PHAST/PHASTER annotated bacterial genomes (which can lead to complete phage annotations in seconds as opposed to minutes). PHASTER's web interface has also been entirely rewritten. A new graphical genome browser has been added, gene/genome visualization tools have been improved, and the graphical interface is now more modern, robust, and user-friendly.https://phaster.ca/ https://pubmed.ncbi.nlm.nih.gov/27141966/Rob Edwards' Viral Bioinfo Tools; Phage Kitchen
742023-11-13T11:12:18.883ZPhigaroIdentify Integrated VirusesPhage |; Phigaro: high throughput prophage sequence annotationhttps://github.com/bobeobibo/phigarohttps://www.biorxiv.org/content/10.1101/598243v1Rob Edwards' Viral Bioinfo Tools
752023-11-13T11:12:18.883ZPhiSpyIdentify Integrated VirusesPhage | PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies - PMChttps://github.com/linsalrob/PhiSpyRob Edwards' Viral Bioinfo Tools20220202
762023-11-13T11:12:18.883ZProphage HunterIdentify Integrated VirusesPhage | logistic regression |https://academic.oup.com/nar/article/47/W1/W74/5494712 https://pro-hunter.bgi.com/Rob Edwards' Viral Bioinfo Tools
772023-11-13T11:12:18.883ZProphetIdentify Integrated VirusesPhage |https://github.com/jaumlrc/ProphEThttps://journals.plos.org/plosone/article?id=10.1371/journal.pone.0223364Rob Edwards' Viral Bioinfo Tools
782023-11-13T11:12:18.883ZProphinderIdentify Integrated VirusesPhage |https://academic.oup.com/bioinformatics/article/24/6/863/194494 http://aclame.ulb.ac.be/Tools/Prophinder/Rob Edwards' Viral Bioinfo Tools
792023-11-13T11:12:18.883ZVAPiDIdentify Integrated VirusesVirus |https://github.com/rcs333/VAPiDhttps://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2606-yRob Edwards' Viral Bioinfo Tools
802023-11-13T11:12:18.883ZviralintegrationIdentify Integrated VirusesVirus | Nextflow pipelinehttps://github.com/nf-core/viralintegrationRob Edwards' Viral Bioinfo Tools2023
812023-11-13T11:12:18.883ZBACPHLIPLifestyle classificationPhage | Random Forest classifier | Description: Bacteriophages are broadly classified into two distinct lifestyles: temperate and virulent. Temperate phages are capable of a latent phase of infection within a host cell (lysogenic cycle), whereas virulent phages directly replicate and lyse host cells upon infection (lytic cycle). Accurate lifestyle identification is critical for determining the role of individual phage species within ecosystems and their effect on host evolution. Here, we present BACPHLIP, a BACterioPHage LIfestyle Predictor. BACPHLIP detects the presence of a set of conserved protein domains within an input genome and uses this data to predict lifestyle via a Random Forest classifier that was trained on a dataset of 634 phage genomes. On an independent test set of 423 phages, BACPHLIP has an accuracy of 98%, greatly exceeding that of the previously existing tools (79%). BACPHLIP is freely available on GitHub (https://github.com/adamhockenberry/bacphlip) and the code used to build and test the classifier is provided in a separate repository (https://github.com/adamhockenberry/bacphlip-model-dev) for users wishing to interrogate and re-train the underlying classification model.https://github.com/adamhockenberry/bacphlip https://pubmed.ncbi.nlm.nih.gov/33996289/Rob Edwards' Viral Bioinfo Tools; Phage Kitchen20210128
822023-11-13T11:12:18.883ZPHACTSLifestyle classificationPhage | Description: Abstract. Motivation: Bacteriophages have two distinct lifestyles: virulent and temperate. The virulent lifestyle has many implications for phage therapy, genomics and microbiology. Determining which lifestyle a newly sequenced phage falls into is currently determined using standard culturing techniques. Such laboratory work is not only costly and time consuming, but also cannot be used on phage genomes constructed from environmental sequencing. Therefore, a computational method that utilizes the sequence data of phage genomes is needed. Results: Phage Classification Tool Set (PHACTS) utilizes a novel similarity algorithm and a supervised Random Forest classifier to make a prediction whether the lifestyle of a phage, described by its proteome, is virulent or temperate. The similarity algorithm creates a training set from phages with known lifestyles and along with the lifestyle annotation, trains a Random Forest to classify the lifestyle of a phage. PHACTS predictions are shown to have a 99% precision rate. Availability and implementation: PHACTS was implemented in the PERL programming language and utilizes the FASTA program (Pearson and Lipman, 1988) and the R programming language library 'Random Forest' (Liaw and Weiner, 2010). The PHACTS software is open source and is available as downloadable stand-alone version or can be accessed online as a user-friendly web interface. The source code, help files and online version are available at http://www.phantome.org/PHACTS/.https://edwards.sdsu.edu/PHACTS/ https://edwards.sdsu.edu/PHACTS/PHACTS-0.3.tar.gz https://pubmed.ncbi.nlm.nih.gov/22238260/Rob Edwards' Viral Bioinfo Tools; Phage Kitchen
832023-11-13T11:12:18.883ZViralMSAMultiple Sequence AlignmentVirus | Python script that wraps around read mappers (e.g. Minimap2) |https://github.com/niemasd/ViralMSAhttps://doi.org/10.1093/bioinformatics/btaa743Rob Edwards' Viral Bioinfo ToolsActively developed
842023-11-13T11:12:18.883ZPhanotatePhage genesPhage | Description: PHANOTATE is a tool to annotate phage genomes. It uses the assumption that non-coding bases in a phage genome are disadvantageous, and then populates a weighted graph to find the optimal path through the six frames of the DNA where open reading frames are beneficial paths, while gaps and overlaps are penalized paths.https://github.com/deprekate/PHANOTATE https://academic.oup.com/bioinformatics/article/35/22/4537/5480131Rob Edwards' Viral Bioinfo Tools
852023-11-13T11:12:18.883ZPHROGsPhage genesPhage |https://academic.oup.com/nargab/article/3/3/lqab067/6342220Rob Edwards' Viral Bioinfo Tools
862023-11-13T11:12:18.883ZPHREDPhage receptorsPhage |No longer availablehttps://academic.oup.com/femsle/article/363/4/fnw002/1845417Rob Edwards' Viral Bioinfo Tools
872023-11-13T11:12:18.883ZPlaqueSizeToolPlaque size calculationPhage | Based on the optimized Computer Vision library |https://github.com/ellinium/plaque_size_toolhttps://www.sciencedirect.com/science/article/pii/S004268222100115X?via%3DihubRob Edwards' Viral Bioinfo Tools2022
882023-11-13T11:12:18.883ZPlaqueSizeTool (colab version)Plaque size calculationPhage | Based on the optimized Computer Vision library |https://www.sciencedirect.com/science/article/pii/S004268222100115X?via%3Dihub https://colab.research.google.com/drive/1HJe8V26l7n82zX8vJ7bO5C8-xrs_aWuq?usp=sharingRob Edwards' Viral Bioinfo Tools2023
892023-11-13T11:12:18.883ZPhageTermPredicting phage packaging mechanismPhage | Read mapping |https://www.nature.com/articles/s41598-017-07910-5 https://gitlab.pasteur.fr/vlegrand/ptv/-/releasesRob Edwards' Viral Bioinfo Tools
902023-11-13T11:12:18.883ZVirus-Host Interaction Predictor (VHIP)PredictionG. Eric Bastien and colleagues have developed a [machine learning model called Virus-Host Interaction Predictor (VHIP)](https://www.biorxiv.org/content/10.1101/2023.11.03.565433v1) to predict virus-host interactions and reconstruct complex virus-host networks in natural systems.G. Eric Bastien and othershttps://www.biorxiv.org/content/10.1101/2023.11.03.565433v1Rob Edwards' Viral Bioinfo Tools
912023-11-13T11:12:18.883ZPhagePromoterPromotersPhage | artificial neural network (ANN), support vector machines (SVM) |https://github.com/martaS95/PhagePromoterhttps://academic.oup.com/bioinformatics/article/35/24/5301/5540317Rob Edwards' Viral Bioinfo Tools
922023-11-13T11:12:18.883ZDeepVHPPIProtein:Protein InteractionsVirus |https://github.com/QData/DeepVHPPIhttps://dl.acm.org/doi/abs/10.1145/3459930.3469527Rob Edwards' Viral Bioinfo Tools
932023-11-13T11:12:18.883ZPhageRBPdetectRBPPhage | HMMs & machine learning |https://www.mdpi.com/1999-4915/14/6/1329Rob Edwards' Viral Bioinfo Tools
942023-11-13T11:12:18.883ZEVBC Virus Bioinformatics ToolsResourceA collection of useful tools in Virus Bioinformatics curated by the European Virus Bioinformatics Center. Please note, that the EVBC is not maintaining the toolsEVBChttps://evirusbioinfc.notion.site/evirusbioinfc/18e21bc49827484b8a2f84463cb40b8d?v=92e7eb6703be4720abf17a901bc9a947Rob Edwards' Viral Bioinfo Tools
952023-11-13T11:12:18.883ZMGE detection toolsResourceA collection of bacteria/virus toolshttps://docs.google.com/spreadsheets/d/1dL5o524IX_-hJB6iYV1FB4QrK_U5KgFcfM4rZDZV_Dw/edit#gid=0Rob Edwards' Viral Bioinfo Tools
962024-01-23T14:08:28.106ZPhage KitchenResourceComparison and categorization of MANY phage bioinformatics toolsNouri Ben Zakourhttps://github.com/nbenzakour/phage-kitchenNouri Ben Zakour
972024-01-23T14:08:28.106ZPhage prediction toolsResourceGitHub repo accompanying the paper "Gauge your phage: benchmarking of bacteriophage identification tools in metagenomic sequencing data" by Siu Fung Stanley Ho, Nicole E. Wheeler, Andrew D. Millard & Willem van Schaikhttps://github.com/sxh1136/Phage_toolshttps://doi.org/10.1186/s40168-023-01533-xRob Edwards' Viral Bioinfo Tools
982023-11-13T11:12:18.883ZRob Edwards' Viral Bioinformatics ToolsResourcePeriodically updated open spreadsheet of bioinformatics tools; owned by Rob EdwardsRob Edwardshttps://docs.google.com/spreadsheets/d/1ClNgip08olKK-oBMMlPHBwIcilqSxsan8MEaYphUei4/edit#gid=1636291468Rob Edwards' Viral Bioinfo Tools
992023-11-13T11:12:18.883ZTE Hub Repeat DatabasesResourceA list of databases for the storage of sequences and metadata associated with repetitive, mobile and selfish DNATyler Elliotthttps://tehub.org/en/resources/repeat_databasesRob Edwards' Viral Bioinfo Tools
1002023-11-13T11:12:18.883ZTesting (5) Prophage finding toolsResourceComparison of five (text updated with 5th tool) prophage finding tools for bacterial genomics — Phispy, VirSorter, Phigaro, ProphET, PHASTERhttps://nickp60.github.io/weird_one_offs/testing_3_prophage_finders/Rob Edwards' Viral Bioinfo Tools
1012023-11-13T11:12:18.883ZVGEARNA viral assembly toolkitBoth | snakemake workflow |https://github.com/pauloluniyi/VGEAhttps://peerj.com/articles/12129/Rob Edwards' Viral Bioinfo Tools
1022023-11-13T11:12:18.883ZpalmIDRNA Virus (RdRp) search toolVirus | Website / R |https://peerj.com/articles/14055/ https://serratus.io/palmidRob Edwards' Viral Bioinfo Tools2023
1032023-11-13T11:12:18.883ZRdRp-scanRNA Virus (RdRp) search toolBoth | search against the RdRp database |https://github.com/JustineCharon/RdRp-scan/https://academic.oup.com/ve/article/8/2/veac082/6679729?login=trueRob Edwards' Viral Bioinfo Tools
1042023-11-13T11:12:18.883ZrdrpsearchRNA Virus (RdRp) search toolBoth | Iterative HMM search of viral RdRp |https://www.science.org/doi/abs/10.1126/science.abm5847 https://zenodo.org/record/5731488#.Y-6yFXbMKUkRob Edwards' Viral Bioinfo Tools20211127
1052023-11-13T11:12:18.883ZCHVDSequence DatabaseBoth |https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8201803/ https://zenodo.org/record/4498884#.Y2Q9sHZBxD8Rob Edwards' Viral Bioinfo Tools20210203
1062023-11-13T11:12:18.883ZEarth ViromeSequence DatabaseBoth |https://www.nature.com/articles/nprot.2017.063 https://portal.nersc.gov/dna/microbial/prokpubs/EarthVirome_DP/Rob Edwards' Viral Bioinfo Tools20151210
1072023-11-13T11:12:18.883ZGOV-RNASequence DatabaseBoth | RNA viruses from the Global Ocean |https://www.science.org/doi/abs/10.1126/science.abm5847 https://datacommons.cyverse.org/browse/iplant/home/shared/iVirus/ZayedWainainaDominguez-Huerta_RNAevolution_Dec2021Rob Edwards' Viral Bioinfo Tools20211206
1082023-11-13T11:12:18.883ZGOV2.0Sequence DatabaseBoth | DNA viruses from the Global Ocean |https://www.cell.com/cell/fulltext/S0092-8674(19)30341-1 https://datacommons.cyverse.org/browse/iplant/home/shared/iVirus/GOV2.0Rob Edwards' Viral Bioinfo Tools20190424
1092023-11-13T11:12:18.883ZGPDBSequence DatabaseBoth |https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7895897/?report=reader http://ftp.ebi.ac.uk/pub/databases/metagenomics/genome_sets/gut_phage_database/Rob Edwards' Viral Bioinfo Tools20201029
1102023-11-13T11:12:18.883ZGVDSequence DatabaseBoth |https://www.sciencedirect.com/science/article/pii/S193131282030456X https://datacommons.cyverse.org/browse/iplant/home/shared/iVirus/Gregory_and_Zablocki_GVD_Jul2020Rob Edwards' Viral Bioinfo Tools
1112023-11-13T11:12:18.883ZKEGG VirusSequence DatabaseBothhttps://www.genome.jp/kegg/genome/virus.htmlRob Edwards' Viral Bioinfo Tools
1122023-11-13T11:12:18.883ZmMGESequence DatabaseBoth | mobile genetic element databasehttps://mai.fudan.edu.cn/mgedb/client/index.html#/Rob Edwards' Viral Bioinfo Tools
1132023-11-13T11:12:18.883ZPhagesDBSequence Databasehttps://phagesdb.org/Rob Edwards' Viral Bioinfo Tools
1142023-11-13T11:12:18.883ZViruses.StringSequence DatabaseBothhttp://viruses.string-db.org/Rob Edwards' Viral Bioinfo Tools
1152023-11-13T11:12:18.883ZFAVITESSimulate networksBoth | Simulate contact networks, transmission networks, phylogenies, and sequences |https://github.com/niemasd/FAVITEShttps://doi.org/10.1093/bioinformatics/bty921Rob Edwards' Viral Bioinfo Tools20221124
1162023-11-13T11:12:18.883ZFAVITES-LiteSimulate networksBoth | Simulate contact networks, transmission networks, phylogenies, and sequences | TBDhttps://github.com/niemasd/FAVITES-LiteRob Edwards' Viral Bioinfo ToolsActively developed
1172023-11-13T11:12:18.883ZefamViral orthologous groupsBoth | Consensus viral identification, network-based clustering, metaproteomics |https://academic.oup.com/bioinformatics/article/37/22/4202/6300514 https://datacommons.cyverse.org/browse/iplant/home/shared/iVirus/Zayed_efam_2020.1Rob Edwards' Viral Bioinfo Tools20210505
1182023-11-13T11:12:18.883ZpVOGsViral orthologous groupsPhage |https://academic.oup.com/nar/article/45/D1/D491/2333930 http://dmk-brain.ecn.uiowa.edu/pVOGs/Rob Edwards' Viral Bioinfo Tools
1192023-11-13T11:12:18.883ZVogDBViral orthologous groupsBothhttp://vogdb.org/Rob Edwards' Viral Bioinfo Tools
1202023-11-13T11:12:18.883ZVStrainsViral strain reconstructionVirus |https://github.com/metagentools/VStrainshttps://www.biorxiv.org/content/10.1101/2022.10.21.513181v2Rob Edwards' Viral Bioinfo Tools
1212023-11-13T11:12:18.883ZvAMPirusVirus amplicon sequencingboth | Nextflow pipeline |https://github.com/Aveglia/vAMPirushttps://www.authorea.com/users/584435/articles/623635-vampirus-a-versatile-amplicon-processing-and-analysis-program-for-studying-viruses?commit=4bde44de2b3f3816288a47c0a72ec4075e6438ccRob Edwards' Viral Bioinfo Tools2023
1222023-11-13T11:12:18.883ZCOBRAVirus genome improvementBoth |https://www.biorxiv.org/content/10.1101/2023.05.30.542503v2.abstractRob Edwards' Viral Bioinfo Tools
1232023-11-30T14:44:02.217ZCenote-Taker2Virus identification in metagenomesBoth |https://github.com/mtisza1/Cenote-Taker2https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7816666/pdf/veaa100.pdfRob Edwards' Viral Bioinfo Tools20220719
1242023-11-13T11:12:18.883ZCoCoNetVirus identification in metagenomesVirus | Neural networks |https://github.com/Puumanamana/CoCoNethttps://academic.oup.com/bioinformatics/article/37/18/2803/6211156Rob Edwards' Viral Bioinfo Tools20211022
1252023-11-13T11:12:18.883ZcrassusVirus identification in metagenomesPhage | snakemake workflowhttps://github.com/dcarrillox/CrassUSRob Edwards' Viral Bioinfo Tools20220704
1262023-11-13T11:12:18.883ZDBSCAN-SWAVirus identification in metagenomesPhage | DBSCAN |https://github.com/HIT-ImmunologyLab/DBSCAN-SWA/https://www.frontiersin.org/articles/10.3389/fgene.2022.885048/fullRob Edwards' Viral Bioinfo Tools20221103
1272023-11-13T11:12:18.883ZDeepVirFinderVirus identification in metagenomesBoth | neural network |; Identifying viruses from metagenomic data by deep learninghttps://github.com/jessieren/DeepVirFinderhttps://arxiv.org/pdf/1806.07810.pdfRob Edwards' Viral Bioinfo Tools20221008
1282023-11-13T11:12:18.883ZDePhTVirus identification in metagenomesphage |https://github.com/chg60/DEPhThttps://academic.oup.com/nar/article/50/13/e75/6572362Rob Edwards' Viral Bioinfo Tools20220930
1292023-11-13T11:12:18.883ZFastViromeExplorerVirus identification in metagenomesBoth |https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5768174/ https://code.vt.edu/saima5/FastViromeExplorerRob Edwards' Viral Bioinfo Tools20180220
1302023-11-13T11:12:18.883ZGenomePeekVirus identification in metagenomesPhage |https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4476108/Rob Edwards' Viral Bioinfo Tools
1312023-11-13T11:12:18.883ZhecatombVirus identification in metagenomesBoth | **Description:** A hecatomb is a great sacrifice or an extensive loss. Hecatomb the software empowers an analyst to make data-driven decisions to 'sacrifice' false-positive viral reads from metagenomes to enrich for true-positive viral reads. This process frequently results in a great loss of suspected viral sequences / contigs.https://github.com/shandley/hecatombhttps://www.biorxiv.org/content/10.1101/2022.05.15.492003v2 https://hecatomb.readthedocs.io/en/latest/Rob Edwards' Viral Bioinfo Tools20220902
1322023-11-13T11:12:18.883ZHoloVirVirus identification in metagenomesBoth |https://github.com/plaffy/HoloVirhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC4899465/Rob Edwards' Viral Bioinfo Tools20181113
1332023-11-13T11:12:18.883ZINHERITVirus identification in metagenomesPhage | Embedding (BERT) |https://github.com/Celestial-Bai/INHERIThttps://academic.oup.com/bioinformatics/article/38/18/4264/6654586Rob Edwards' Viral Bioinfo Tools20221024
1342023-11-13T11:12:18.883ZislingVirus identification in metagenomesVirus | Split read alignment |https://github.com/szsctt/intvi_other-toolshttps://www.sciencedirect.com/science/article/pii/S0022283621006458Rob Edwards' Viral Bioinfo Tools20210811
1352023-11-13T11:12:18.883ZJaegerVirus identification in metagenomesPhagehttps://github.com/Yasas1994/JaegerRob Edwards' Viral Bioinfo Tools20230210
1362023-11-13T11:12:18.883ZJovianVirus identification in metagenomesVirushttps://github.com/DennisSchmitz/JovianRob Edwards' Viral Bioinfo Tools20210604
1372023-11-30T14:44:01.213ZLazyPipeVirus identification in metagenomesBoth |https://academic.oup.com/ve/article/6/2/veaa091/6017186?login=false https://www.helsinki.fi/en/projects/lazypipeRob Edwards' Viral Bioinfo Tools20200706
1382023-11-13T11:12:18.883ZMARVELVirus identification in metagenomesPhage | random forest |; MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Binshttps://github.com/LaboratorioBioinformatica/MARVELhttps://www.frontiersin.org/articles/10.3389/fgene.2018.00304/fullRob Edwards' Viral Bioinfo Tools
1392023-11-13T11:12:18.883ZmetaPhageVirus identification in metagenomesboth | pipelinehttps://mattiapandolfovr.github.io/MetaPhage/Rob Edwards' Viral Bioinfo Tools
1402023-11-13T11:12:18.883ZMetaPhinderVirus identification in metagenomesBoth |; MetaPhinder—Identifying Bacteriophage Sequences in Metagenomic Data Setshttps://github.com/vanessajurtz/MetaPhinderhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5042410/Rob Edwards' Viral Bioinfo Tools
1412023-11-13T11:12:18.883ZPhablesVirus identification in metagenomesPhage | Flow decomposition on assembly graphs |https://github.com/Vini2/phableshttps://biorxiv.org/cgi/content/short/2023.04.04.535632v1Rob Edwards' Viral Bioinfo Tools2023
1422023-11-13T11:12:18.883ZPhage toolsVirus identification in metagenomesPhagehttps://github.com/sxh1136/Phage_toolsRob Edwards' Viral Bioinfo Tools
1432023-11-13T11:12:18.883ZPHAMBVirus identification in metagenomesPhage | Random forest |https://github.com/RasmussenLab/phambhttps://www.nature.com/articles/s41467-022-28581-5Rob Edwards' Viral Bioinfo Tools
1442023-11-13T11:12:18.883ZphaMersVirus identification in metagenomesPhage | kmers + machine learning |https://github.com/jondeaton/PhaMershttps://doi.org/10.1002/adbi.201900108Rob Edwards' Viral Bioinfo Tools
1452023-11-13T11:12:18.883ZPhantaVirus identification in metagenomesBoth | K-mer read based classification, snakemake workflow |https://github.com/bhattlab/phantahttps://www.biorxiv.org/content/10.1101/2022.08.05.502982v1.fullRob Edwards' Viral Bioinfo Tools2023
1462023-11-13T11:12:18.883ZPIGvVirus identification in metagenomesGiant virus | Metabat binning, k-mer scoring, marker geneshttps://github.com/BenMinch/PIGvRob Edwards' Viral Bioinfo Tools2023
1472023-11-13T11:12:18.883ZPPR-MetaVirus identification in metagenomesPhage | neural network - CNN |; PPR-Meta: a tool for identifying phages and plasmids from metagenomic fragments using deep learninghttps://github.com/zhenchengfang/PPR-Metahttps://doi.org/10.1093/gigascience/giz066Rob Edwards' Viral Bioinfo Tools
1482023-11-13T11:12:18.883ZProphage TracerVirus identification in metagenomesPhage | Split read alignment |https://academic.oup.com/nar/article/49/22/e128/6374144Rob Edwards' Viral Bioinfo Tools
1492023-11-13T11:12:18.883ZSeekerVirus identification in metagenomesPhage* | LSTM |; Seeker: alignment-free identification of bacteriophage genomes by deep learninghttps://github.com/gussow/seekerhttps://academic.oup.com/nar/article/48/21/e121/5921300Rob Edwards' Viral Bioinfo Tools
1502023-11-13T11:12:18.883ZSerratusVirus identification in metagenomesBoth | Website |https://www.nature.com/articles/s41586-021-04332-2 https://serratus.io/Rob Edwards' Viral Bioinfo Tools2023
1512023-11-13T11:12:18.883ZVFMVirus identification in metagenomesPhage |https://github.com/liuql2019/VFMhttps://ieeexplore.ieee.org/document/8924706Rob Edwards' Viral Bioinfo Tools
1522023-11-13T11:12:18.883ZVIBRANTVirus identification in metagenomesBoth |; Automated recovery, annotation and curation of microbial viruses, and evaluation of virome function from genomic sequenceshttps://github.com/AnantharamanLab/VIBRANThttps://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-020-00867-0Rob Edwards' Viral Bioinfo Tools
1532023-11-13T11:12:18.883ZVIGAVirus identification in metagenomesBothhttps://github.com/EGTortuero/viga/tree/developerRob Edwards' Viral Bioinfo Tools2022
1542023-11-13T11:12:18.883ZViralCCVirus identification in metagenomesBoth |https://github.com/dyxstat/ViralCChttps://www.nature.com/articles/s41467-023-35945-yRob Edwards' Viral Bioinfo Tools2022
1552023-11-13T11:12:18.883ZViralConsensusVirus identification in metagenomesVirus | Viral consensus sequence calling |https://github.com/niemasd/ViralConsensushttps://doi.org/10.1101/2020.11.10.377499Rob Edwards' Viral Bioinfo ToolsActively developed
1562023-11-13T11:12:18.883ZviralMetagenomicsPipelineVirus identification in metagenomesSnakemakehttps://github.com/wclose/viralMetagenomicsPipelineRob Edwards' Viral Bioinfo Tools
1572023-11-13T11:12:18.883ZViralWasmVirus identification in metagenomesVirus | WebAssembly |https://zenodo.org/doi/10.5281/zenodo.8427588 https://niema-lab.github.io/ViralWasmRob Edwards' Viral Bioinfo ToolsActively developed
1582023-11-13T11:12:18.883ZviraMinerVirus identification in metagenomesBoth | CNN classifier |https://github.com/NeuroCSUT/ViraMinerhttps://doi.org/10.1371/journal.pone.0222271Rob Edwards' Viral Bioinfo Tools
1592023-11-13T11:12:18.883ZvirAnnotVirus identification in metagenomesVirus | pipeline |https://github.com/marieBvr/virAnnothttps://doi.org/10.1094/PBIOMES-07-19-0037-ARob Edwards' Viral Bioinfo Tools2022
1602023-11-13T11:12:18.883ZvirFinderVirus identification in metagenomesBoth | neural network,machine learning |https://github.com/jessieren/VirFinderhttps://doi.org/10.1186/s40168-017-0283-5Rob Edwards' Viral Bioinfo Tools
1612023-11-13T11:12:18.883ZVirhunterVirus identification in metagenomesVirus |https://github.com/cbib/virhunterhttps://www.frontiersin.org/articles/10.3389/fbinf.2022.867111/fullRob Edwards' Viral Bioinfo Tools
1622023-11-13T11:12:18.883ZVirMineVirus identification in metagenomesBoth |https://github.com/thatzopoulos/virMinehttps://peerj.com/articles/6695/Rob Edwards' Viral Bioinfo Tools
1632023-11-13T11:12:18.883ZvirMinerVirus identification in metagenomesBoth | random forest |https://github.com/TingtZHENG/VirMinerhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC6425642/Rob Edwards' Viral Bioinfo Tools
1642023-11-13T11:12:18.883ZVirNetVirus identification in metagenomesPhage |; Deep attention model for viral reads identificationhttps://github.com/alyosama/virnethttps://doi.org/10.1109/ICCES.2018.8639400Rob Edwards' Viral Bioinfo Tools
1652023-11-13T11:12:18.883ZVirSorterVirus identification in metagenomesPhage |; VirSorter: mining viral signal from microbial genomic datahttps://github.com/simroux/VirSorterhttps://peerj.com/articles/985/Rob Edwards' Viral Bioinfo Tools
1662023-11-13T11:12:18.883ZVirSorter2Virus identification in metagenomesBoth | Random Forest |https://bitbucket.org/MAVERICLab/virsorter2/src/master/https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-020-00990-yRob Edwards' Viral Bioinfo Tools
1672023-11-13T11:12:18.883ZVirtifierVirus identification in metagenomesBoth | LSTM neural network |https://github.com/crazyinter/Seq2Vechttps://academic.oup.com/bioinformatics/article/38/5/1216/6462188Rob Edwards' Viral Bioinfo Tools
1682023-11-13T11:12:18.883Zvirus_predictionVirus identification in metagenomesBoth | Nextflowhttps://github.com/rujinlong/virus_predictionRob Edwards' Viral Bioinfo Tools
1692023-11-13T11:12:18.883ZViruSpyVirus identification in metagenomesBothhttps://github.com/NCBI-Hackathons/ViruSpyRob Edwards' Viral Bioinfo Tools
1702023-11-13T11:12:18.883ZVirusSeekerVirus identification in metagenomesBoth |https://www.sciencedirect.com/science/article/pii/S0042682217300053?via%3Dihub https://wupathlabs.wustl.edu/virusseeker/Rob Edwards' Viral Bioinfo Tools20160824
1712023-11-30T14:46:12.093ZWhat_the_phageVirus identification in metagenomesPhage | Nextflow |https://github.com/replikation/What_the_Phagehttps://academic.oup.com/gigascience/article/doi/10.1093/gigascience/giac110/6833029Rob Edwards' Viral Bioinfo Tools
1722023-11-13T11:12:18.883ZvRhymeVirus identification in metagenomesBoth | Machine learning | vRhyme enables binning of viral genomes from metagenomeshttps://github.com/AnantharamanLab/vRhymeRob Edwards' Viral Bioinfo Tools
1732023-11-13T11:12:18.883ZDeep6Virus identification in metatranscriptomesBoth | Machine Learning |https://github.com/janfelix/Deep6https://journals.asm.org/doi/10.1128/mra.01079-22Rob Edwards' Viral Bioinfo Tools
1742023-11-13T11:12:18.883ZBERTaxVirus taxonomyBoth |https://github.com/f-kretschmer/bertaxhttps://www.pnas.org/doi/full/10.1073/pnas.2122636119Rob Edwards' Viral Bioinfo Tools
1752023-11-13T11:12:18.883ZClassiphages 2.0Virus taxonomyPhage | ANN |No code availablehttps://www.biorxiv.org/content/10.1101/558171v1Rob Edwards' Viral Bioinfo ToolsNone
1762023-11-13T11:12:18.883ZGraViTyVirus taxonomyBoth | HMMs and genome organisation models |https://github.com/PAiewsakun/GRAViTyhttps://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-018-0422-7Rob Edwards' Viral Bioinfo Tools20200224
1772023-11-13T11:12:18.883ZPhaGCNVirus taxonomyPhage | GCN |https://github.com/KennthShang/PhaGCNhttps://academic.oup.com/bioinformatics/article/37/Supplement_1/i25/6319660Rob Edwards' Viral Bioinfo Tools
1782023-11-13T11:12:18.883ZvConTACTVirus taxonomyBoth | Whole-genome gene-sharing networks for virus taxonomy |https://bitbucket.org/MAVERICLab/vcontact/src/master/https://peerj.com/articles/3243/Rob Edwards' Viral Bioinfo Tools
1792023-11-13T11:12:18.883ZvConTACT2.0Virus taxonomyBoth | Whole-genome gene-sharing networks for virus taxonomy |https://bitbucket.org/MAVERICLab/vcontact2/src/master/https://www.nature.com/articles/s41587-019-0100-8Rob Edwards' Viral Bioinfo Tools
1802023-11-13T11:12:18.883ZVICTORVirus taxonomyPhage |https://github.com/vdclab/vdclab-wiki/blob/master/VICTOR.mdhttps://academic.oup.com/bioinformatics/article/33/21/3396/3933260Rob Edwards' Viral Bioinfo Tools
1812023-11-13T11:12:18.883ZVIPtreeVirus taxonomyPhage |https://github.com/yosuken/ViPTreeGenhttps://academic.oup.com/bioinformatics/article/33/21/3396/3933260Rob Edwards' Viral Bioinfo Tools
1822023-11-13T11:12:18.883ZVIRIDICVirus taxonomyPhage |https://www.mdpi.com/1999-4915/12/11/1268Rob Edwards' Viral Bioinfo Tools
1832023-11-13T11:12:18.883ZVIRifyVirus taxonomyBoth |; VIRify is a recently developed pipeline for the detection, annotation, and taxonomic classification of viral contigs in metagenomic and metatranscriptomic assemblies. The pipeline is part of the repertoire of analysis services offered by MGnify. VIRify’s taxonomic classification relies on the detection of taxon-specific profile hidden Markov models (HMMs), built upon a set of 22,014 orthologous protein domains and referred to as ViPhOGs. The pipeline is implemented and available in CWL and Nextflow.https://github.com/EBI-Metagenomics/emg-viral-pipelinehttps://www.researchgate.net/publication/362871600_VIRify_an_integrated_detection_annotation_and_taxonomic_classification_pipeline_using_virus-specific_protein_profile_hidden_Markov_models/link/6304eb9961e4553b95322c97/downloadRob Edwards' Viral Bioinfo Tools; Phage Kitchenhttps://trello.com/1/cards/6183429621514236f30e5700/attachments/618342c9a3769e0a1ada4128/download/chart.png
1842023-11-13T11:12:18.883ZVirusTaxoVirus taxonomyVirus | k-mer enrichment method |https://github.com/nahid18/virustaxo-wfhttps://www.sciencedirect.com/science/article/pii/S0888754322001598?via%3DihubRob Edwards' Viral Bioinfo Tools
1852023-11-13T11:12:18.883ZVPF ToolsVirus taxonomyBoth |https://github.com/biocom-uib/vpf-toolshttps://academic.oup.com/bioinformatics/article/37/13/1805/6104829Rob Edwards' Viral Bioinfo Tools
1862023-11-29T15:14:06.683ZAmerican Type Culture Collection (ATCC) (US)https://www.atcc.org/microbe-products/bacteriology-and-archaea/bacteriophages#t=productTab&numberOfResults=24Phage Kitchen
1872023-11-13T11:12:18.883ZAnnotation & classification
1882023-11-13T11:12:18.883ZApollo
1892023-11-13T11:12:18.883ZARAGORN
1902023-11-13T11:12:18.883ZAssembly
1912023-11-13T11:12:18.883ZBALROG - Bacterial Annotation by Learned Representation Of Genes**Description:** Balrog is a prokaryotic gene finder based on a Temporal Convolutional Network. We took a data-driven approach to prokaryotic gene finding, relying on the large and diverse collection of already-sequenced genomes. By training a single, universal model of bacterial genes on protein sequences from many different species, we were able to match the sensitivity of current gene finders while reducing the overall number of gene predictions. Balrog does not need to be refit on any new genome.https://github.com/salzberg-lab/Balroghttps://www.biorxiv.org/content/10.1101/2020.09.06.285304v1Phage Kitchen
1922023-11-13T11:12:18.883Zbarrnap
1932023-11-13T11:12:18.883ZBaylor College of Medicine (US)https://www.bcm.edu/departments/biochemistry-and-molecular-biology/faculty-staff/researchers/bacteria-and-phagePhage Kitchen
1942023-11-13T11:12:18.883ZBBtools
1952023-11-13T11:12:18.883Zblast-DB-phage
1962023-11-13T11:12:18.883Zblastn
1972023-11-13T11:12:18.883Zblastp
1982023-11-13T11:12:18.883Zblastx
1992023-11-13T11:12:18.883ZBowtie2
2002023-11-13T11:12:18.883ZC++
2012023-11-13T11:12:18.883ZCategory
2022023-11-13T11:12:18.883ZCAZY
2032023-11-13T11:12:18.883ZCD-HIT
2042023-11-13T11:12:18.883ZCDD
2052023-11-13T11:12:18.883ZCenote_Unlimited_Breadsticks**Description:** Unlimited Breadsticks uses probabilistic models (i.e. HMMs) of virus hallmark genes to identify virus sequences from any dataset of contigs (e.g. metagenomic assemblies) or genomes (e.g. bacterial genomes). Optionally, Unlimited Breadsticks will use gene content information to remove flanking cellular chromosomes from contigs representing putative prophages. Generally, the prophage-cellular chromosome boundary will be identified within 100 nt - 2000 nt of the actual location. + The code is currently functional. Feel free to consume Unlimited Breadsticks at will. + Minor update to handle very large contig files AND update to HMM databases on June 16th, 2021 Unlimited Breadsticks is derived from Cenote-Taker 2, but several time-consuming computations are skipped in order to analyze datasets as quickly as possible. Also, Unlimited Breadsticks only takes approximately 16 minutes to download and install (Cenote-Taker 2 takes about 2 hours due to large databases required for thorough sequence annotation). See installation instructions below. **Limitations** Compared to Cenote-Taker 2, there are a few limitations. * Unlimited Breadsticks does not do post-hallmark-gene-identification computations to flag plasmid and conjugative element sequences that occasionally slip through. * Unlimited Breadsticks does not make genome maps for manual inspection of putative viruses. * Contigs are not extensively annotated by Unlimited Breadsticks. No genome maps are created.https://github.com/mtisza1/Cenote_Unlimited_BreadsticksPhage Kitchen
2062023-11-13T11:12:18.883ZCenote-Taker DB
2072023-11-13T11:12:18.883ZCenote-Taker2 vs DeepVirFinder - VirSorter2 - VIGAhttps://academic.oup.com/ve/article/7/1/veaa100/6055568Phage Kitchen
2082023-11-13T11:12:18.883ZCenote-Taker2: Discover and Annotate Divergent Viral ContigsCenote-Taker 2 is a dual-function bioinformatics tool. First, Cenote-Taker 2 discovers/predicts virus sequences from any kind of genome or metagenomic assembly. Second, virus sequences/genomes are annotated with a variety of sequence features, genes, and taxonomy. Either the discovery or the annotation module can be used independently. Cenote-Taker 2 democratizes virus discovery and sequence annotation.https://github.com/mtisza1/Cenote-Taker2https://academic.oup.com/ve/article/7/1/veaa100/6055568Phage Kitchenhttps://trello.com/1/cards/6178a8d0da64201e6344ef18/attachments/6178a996f0d585697c7bd27e/download/m_veaa100f1.jpeg
2092023-11-13T11:12:18.883ZCFU.AI**Description:** CFU counting with artificial intelligence: a deep-learning-based counting tool that offers accurate and robust analyses.http://www.cfu.ai/Phage Kitchenhttps://trello.com/1/cards/618336f7f8e6df32edda8cf7/attachments/61833795717a46409bb85e40/download/app_3.png
2102023-11-13T11:12:18.883ZcheckV
2112023-11-13T11:12:18.883ZCheckV - assesses the quality and completeness of metagenome-assembled viral genomes**Description:** Here we present CheckV, an automated pipeline for identifying closed viral genomes, estimating the completeness of genome fragments and removing flanking host regions from integrated proviruses. CheckV estimates completeness by comparing sequences with a large database of complete viral genomes, including 76,262 identified from a systematic search of publicly available metagenomes, metatranscriptomes and metaviromes. After validation on mock datasets and comparison to existing methods, we applied CheckV to large and diverse collections of metagenome-assembled viral sequences, including IMG/VR and the Global Ocean Virome.https://bitbucket.org/berkeleylab/CheckVhttps://portal.nersc.gov/CheckV/ https://www.nature.com/articles/s41587-020-00774-7Phage Kitchenhttps://trello.com/1/cards/61831d1ed4b96d70640d6719/attachments/61831db1aaf79980868421d9/download/checkv.png
2122023-11-13T11:12:18.883ZCheckV vs Vibrant, VirSorter, PhiSpy, Phigarohttps://www.nature.com/articles/s41587-020-00774-7Phage Kitchen
2132023-11-13T11:12:18.883ZChromomap
2142023-11-13T11:12:18.883Zchromomap
2152023-11-13T11:12:18.883ZCirclator
2162023-11-13T11:12:18.883ZClustal (W and/or O)
2172023-11-13T11:12:18.883ZClustering/comparison
2182023-11-13T11:12:18.883ZCOG
2192023-11-13T11:12:18.883ZCPT GalaxyAt the Center for Phage Technology (CPT), we developed a suite of phage-oriented tools housed in open, user-friendly web-based interfaces. A Galaxy platform conducts computationally intensive analyses and Apollo, a collaborative genome annotation editor, visualizes the results of these analyses. The collection includes open source applications such as the BLAST+ suite, InterProScan, and several gene callers, as well as unique tools developed at the CPT that allow maximum user flexibility. We describe in detail programs for finding Shine-Dalgarno sequences, resources used for confident identification of lysis genes such as spanins, and methods used for identifying interrupted genes that contain frameshifts or introns. At the CPT, genome annotation is separated into two robust segments that are facilitated through the automated execution of many tools chained together in an operation called a workflow. First, the structural annotation workflow results in gene and other feature calls. This is followed by a functional annotation workflow that combines sequence comparisons and conserved domain searching, which is contextualized to allow integrated evidence assessment in functional prediction. Finally, we describe a workflow used for comparative genomics. Using this multi-purpose platform enables researchers to easily and accurately annotate an entire phage genome. The portal can be accessed at with accompanying user training material.https://cpt.tamu.edu/galaxy-pub https://cpt.tamu.edu/training-material/ https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008214Phage Kitchenhttps://trello.com/1/cards/6202ff7745a7472839e0b529/attachments/6202ffb95b625372d32d41c2/download/image.png
2202023-11-13T11:12:18.883ZCuffcompare
2212023-11-13T11:12:18.883ZCWL (Nextflow, Snakemake)
2222023-11-13T11:12:18.883ZDatabase reference
2232023-11-13T11:12:18.883ZDatabases
2242023-11-13T11:12:18.883ZdbCAN
2252023-11-13T11:12:18.883ZDeePhage: a tool for identifying temperate phage-derived and virulent phage-derived sequence in metavirome data using deep learningDeePhage is designed to identify metavirome sequences as temperate phage-derived or virulent phage-derived sequences. The program calculates a score reflecting the likelihood that each input fragment is temperate phage-derived or virulent phage-derived. DeePhage can run either on a virtual machine or on a physical host. For non-computer professionals, we recommend running the virtual machine version of DeePhage on a local PC. In this way, users do not need to install any dependency packages. If a GPU is available, you can also choose to run the physical host version. This version can automatically speed up with the GPU and is more suitable for handling large-scale data. The program is also available athttp://cqb.pku.edu.cn/ZhuLab/DeePhage/.Phage Kitchen
2262023-11-13T11:12:18.883ZDeepVirFinder: Identifying viruses from metagenomic data by deep learningDeepVirFinder predicts viral sequences using a deep learning method. The method has good prediction accuracy for short viral sequences, so it can be used to predict sequences from metagenomic data. DeepVirFinder significantly improves the prediction accuracy compared to our k-mer based method VirFinder by using convolutional neural networks (CNN). A CNN can automatically learn genomic patterns from the viral and prokaryotic sequences and simultaneously build a predictive model based on the learned genomic patterns. The learned patterns are represented in the form of weight matrices of size 4 by k, where k is the length of the pattern. This representation is similar to the position weight matrix (PWM), the commonly used representation of biological motifs, which is also of size 4 by k, with each column specifying the probabilities of having the 4 nucleotides at that position. When only one type of nucleotide can be chosen at each position with probability 1, the motif degenerates to a k-mer. Thus, the CNN is a natural generalization of the k-mer based model. The more flexible CNN model indeed outperforms the k-mer based model on the viral sequence prediction problem.https://github.com/jessieren/DeepVirFinderhttps://link.springer.com/article/10.1007/s40484-019-0187-4 https://link.springer.com/content/pdf/10.1007/s40484-019-0187-4.pdfPhage Kitchen
2272023-11-13T11:12:18.883ZDemovirDemocratic taxonomic classification of viral contigs to Order and Family level. When performing metagenomic sequencing of Viral-Like Particles (VLPs), the majority of returned sequences often bear little to no homology to reference sequences - Viral Dark Matter. Frequently it may be useful to know which viral taxonomic group these novel viruses are likely to belong to, as this will give information about nucleic acid type, size and behaviour. DemoVir will classify viral contigs to the Order or Family taxonomic level by comparing genes on the amino acid level against the viral subset of the TrEMBL database, and then taking a vote of the Order and Family hits. Homology searches are performed by Usearch in order to increase speed. This type of method has previously been implemented in multiple published virome studies, but to our knowledge none have performed benchmarking or made it available as a simple executable script that is easily downloaded and installed. **Note - DemoVir is for classification of sequences into viral families and orders only and should not be used for discriminating viral contigs from bacterial/archaeal/eukaryotic sequences in a metagenomic sample.**https://github.com/feargalr/DemovirPhage Kitchen
2282023-11-13T11:12:18.883ZDEPhT - Detection and Extraction of Phages ToolDEPhT is a new tool for identifying prophages in bacteria, and was developed with a particular interest in being able to rapidly scan hundreds to thousands of genomes and accurately extract complete (likely active) prophages from them. A detailed manuscript has been submitted to Nucleic Acids Research, but in brief DEPhT works by using genome architecture (rather than homology) to identify genomic regions likely to contain a prophage. Any regions with phage-like architecture (characterized as regions with high gene density and few transcription direction changes) are then further scrutinized using two passes of homology detection. * The first pass identifies genes on putative prophages that are homologs of (species/clade/genus-level) conserved bacterial genes, and uses any such genes to disrupt the prophage prediction. * The second pass (disabled in the 'fast' runmode) identifies genes on putative prophages that are homologs of conserved, functionally annotated phage genes. * Finally, prophage regions that got through the previous filters are subjected to a BLASTN-based attL/attR detection scheme that gives DEPhT better boundary detection than any tool we are aware of.https://trello.com/c/jkFJv63E/87-depth-detection-and-extraction-of-phages-toolPhage Kitchenhttps://trello.com/1/cards/61e73e0bf7184b8e5955dcac/attachments/61e73f3ba79a440a155436c2/download/image.png
2292023-11-13T11:12:18.883ZDIAMOND
2302023-11-13T11:12:18.883ZDigital Phagogramhttps://trello.com/c/RpFe431X/90-digital-phagogramPhage Kitchenhttps://trello.com/1/cards/61e754a4fa07561408df7647/attachments/61e754a7b8f5a2897553f1ab/download/image.png
2312023-11-13T11:12:18.883ZDistance-based
2322023-11-13T11:12:18.883ZDRAM - Distilled and Refined Annotation of MetabolismDRAM (Distilled and Refined Annotation of Metabolism) is a tool for annotating metagenomic assembled genomes and VirSorter identified viral contigs. DRAM annotates MAGs and viral contigs using KEGG (if provided by the user), UniRef90, PFAM, dbCAN, RefSeq viral, VOGDB and the MEROPS peptidase database as well as custom user databases. DRAM is run in two stages: first an annotation step to assign database identifiers to genes, and then a distill step to curate these annotations into useful functional categories. Additionally, viral contigs are further analyzed to identify potential AMGs. This is done by assigning an auxiliary score and flags representing the confidence that a gene is both metabolic and viral. For more detail on DRAM and how DRAM works please see our paper as well as the wiki.https://github.com/shafferm/DRAM https://github.com/shafferm/DRAM/wikihttps://academic.oup.com/nar/article/48/16/8883/5884738Phage Kitchenhttps://trello.com/1/cards/61932c07bad28b1fa56ae5d7/attachments/61932eab8d2cac726d8f7ed2/download/image.png
2332023-11-13T11:12:18.883ZEDGE bioinformatics - Empowering the Development of Genomics ExpertiseEDGE bioinformatics is intended to help truly democratize the use of Next Generation Sequencing for exploring genomes and metagenomes. Given that bioinformatic analysis is now the rate-limiting factor in genomics, we developed EDGE bioinformatics with a user-friendly interface that allows scientists to perform a number of tailored analyses using many cutting-edge tools. A complete version of EDGE is available as a variety of packages that can fit individual needs, including source code, or images in VMware and Docker formats. For basic information about EDGE, visit the EDGE ABCs, which provide a brief overview of EDGE, the various workflows, and the computational environment restraints for local use.https://edgebioinformatics.org/Phage Kitchenhttps://trello.com/1/cards/6202fd9fc544f30bcf2439eb/attachments/6202fdc23b3fb3251f8e3378/download/image.png
2342023-11-13T11:12:18.883ZEfam
2352023-11-13T11:12:18.883ZEfam-XC
2362023-11-13T11:12:18.883ZeggNOG
2372023-11-13T11:12:18.883ZEligohttps://trello.com/c/nvGNnVhR/3-eligoPhage Kitchen
2382023-11-13T11:12:18.883ZFactoMineR (PCA)
2392023-11-13T11:12:18.883ZFagenbank (Netherlands)https://www.fagenbank.nl/english/Phage Kitchen
2402023-11-13T11:12:18.883ZFASTA
2412023-11-13T11:12:18.883ZFastME
2422023-11-13T11:12:18.883ZFastp
2432023-11-13T11:12:18.883ZFastQC
2442023-11-13T11:12:18.883ZFelix d'Herelle Reference Center for Bacterial Viruses (CANADA)https://www.phage.ulaval.ca/en/phages-catalog/Phage Kitchen
2452023-11-13T11:12:18.883ZFIGfams
2462023-11-13T11:12:18.883ZFunctional annotation
2472023-11-13T11:12:18.883ZGene and accessory prediction
2482023-11-13T11:12:18.883ZGenemark(s)
2492023-11-13T11:12:41.087ZGenome Detective - an automated system for virus identification from high-throughput sequencing dataGenome Detective is an easy-to-use web-based software application that assembles the genomes of viruses quickly and accurately. The application uses a novel alignment method that constructs genomes by reference-based linking of de novo contigs by combining amino-acid and nucleotide scores. The software was optimized using synthetic datasets to represent the great diversity of virus genomes. The application was then validated with next generation sequencing data of hundreds of viruses. User time is minimal and it is limited to the time required to upload the data. Supp data Availability and implementation Available online:https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/bioinformatics/35/5/10.1093_bioinformatics_bty695/2/bty695_supplementary_information.docx http://www.genomedetective.com/app/typingtool/virus/. https://doi.org/10.1093/bioinformatics/bty695Phage Kitchenhttps://trello.com/1/cards/6201e4fffa436475e2f6fa30/attachments/6201e58d1fb7793fba2806ff/download/image.png
2502023-11-13T11:12:18.883ZGerman Collection of Microorganisms and Cell Cultures (GERMANY)https://www.dsmz.de/collection/catalogue/microorganisms/special-groups-of-organisms/phagesPhage Kitchen
2512023-11-13T11:12:18.883ZGlimmer
2522023-11-13T11:12:18.883ZGOM (Genome Organisation Models)
2532023-11-13T11:12:18.883ZGRAViTy: Genome Relationships Applied to Virus TaxonomyGRAViTy - "Genome Relationships Applied to Virus Taxonomy" is an analysis pipeline that is effective at reproducing the current assignments of viruses at family level as well as inter-family groupings into Orders (Aiewsakun and Simmonds, 2018). It can additionally be used to correctly differentiate assigned viruses from unassigned viruses and classify them into correct taxonomic groups. The method provides a rapid and objective means to explore metagenomic viral diversity and make informed recommendations for classification that are consistent with the current ICTV taxonomic framework. Methods like GRAViTy are increasingly required as the vast diversity of viruses found in metagenomic sequence datasets is explored.https://github.com/PAiewsakun/GRAViTyhttps://www.microbiologyresearch.org/content/journal/jgv/10.1099/jgv.0.001110 https://link.springer.com/article/10.1007%2Fs00705-018-3938-z https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-018-0422-7 http://gravity.cvr.gla.ac.uk/Phage Kitchenhttps://trello.com/1/cards/6178b2f91667bb2c47a57b47/attachments/6178b33f7a97de18b8fc8033/download/gravity_overview_small.png
2542023-11-13T11:12:18.883ZHaskell
2552023-11-13T11:12:18.883ZHecatombA hecatomb is a great sacrifice or an extensive loss. Hecatomb, the software, empowers an analyst to make data-driven decisions to 'sacrifice' false-positive viral reads from metagenomes to enrich for true-positive viral reads. This process frequently results in a great loss of suspected viral sequences / contigs.https://github.com/shandley/hecatombhttps://hecatomb.readthedocs.io/en/latest/Phage Kitchenhttps://trello.com/1/cards/61bac43d4ed8e71713a429d7/attachments/61bac51c7ec5eb88bf53a977/download/image.png
2562023-11-13T11:12:18.883Zhhmake
2572023-11-13T11:12:18.883Zhhsuite
2582023-11-13T11:12:18.883Zhmmer3nhmmer: DNA homology search with profile HMMs
2592023-11-13T11:12:18.883ZHmmscan
2602023-11-13T11:12:18.883Zhmmsearch
2612023-11-13T11:12:18.883ZHost prediction
2622023-11-13T11:12:18.883ZHost Taxon Predictor
2632023-11-13T11:12:18.883ZHTP - Host Taxon PredictorThe initial repo was split into two parts. This part contains software designed to fetch complete viral genomic reference sequences from NCBI Nucleotide, get the viral host's lineage from NCBI Taxonomy and transform the sequence into some features. The second part, available on has been designed to infer the host of a previously unknown virus. Recent advances in metagenomics provided a valuable alternative to culture-based approaches for better sampling viral diversity. However, some newly identified viruses lack sequence similarity to any previously sequenced ones, and cannot be easily assigned to their hosts. Here we present a bioinformatic approach to this problem. We developed classifiers capable of distinguishing eukaryotic viruses from phages, achieving almost 95% prediction accuracy. The classifiers are wrapped in Host Taxon Predictor (HTP) software written in Python which is freely available at . HTP's performance was later demonstrated on a collection of newly identified viral genomes and genome fragments. In summary, HTP is a culture- and alignment-free approach for distinction between phages and eukaryotic viruses. We have also shown that it is possible to further extend our method to go up the evolutionary tree and predict whether a virus can infect narrower taxa.https://github.com/wojciech-galan/Viral_feature_extractor https://github.com/wojciech-galan/viruses_classifierhttps://www.nature.com/articles/s41598-019-39847-2Phage Kitchen
2642023-11-13T11:12:18.883ZIdentification
2652023-11-13T11:12:18.883ZIdentification/classification
2662023-11-13T11:12:18.883ZIMG/VR
2672023-11-13T11:12:18.883ZIMG/VR v3: an integrated ecological and evolutionary framework for interrogating genomes of uncultivated virusesViruses are integral components of all ecosystems and microbiomes on Earth. Through pervasive infections of their cellular hosts, viruses can reshape microbial community structure and drive global nutrient cycling. Over the past decade, viral sequences identified from genomes and metagenomes have provided an unprecedented view of viral genome diversity in nature. Since 2016, the IMG/VR database has provided access to the largest collection of viral sequences obtained from (meta)genomes. Here, we present the third version of IMG/VR, composed of 18 373 cultivated and 2 314 329 uncultivated viral genomes (UViGs), nearly tripling the total number of sequences compared to the previous version. These clustered into 935 362 viral Operational Taxonomic Units (vOTUs), including 188 930 with two or more members. UViGs in IMG/VR are now reported as single viral contigs, integrated proviruses or genome bins, and are annotated with a new standardized pipeline including genome quality estimation using CheckV, taxonomic classification reflecting the latest ICTV update, and expanded host taxonomy prediction. The new IMG/VR interface enables users to efficiently browse, search, and select UViGs based on genome features and/or sequence similarity. IMG/VR v3 is available at and the underlying data are available to download athttps://img.jgi.doe.gov/vr, https://genome.jgi.doe.gov/portal/IMG_VR. https://doi.org/10.1093/nar/gkaa946Phage Kitchenhttps://trello.com/1/cards/6178b4a04f04fc85bf214923/attachments/6178b4d75f1f0e06e212ce16/download/gkaa946fig1.jpeg
2682023-11-13T11:12:18.883Zinfernal (rRNA)
2692023-11-13T11:12:18.883ZINPHARED - INfrastructure for a PHAge REference DatabaseProviding up-to-date bacteriophage genome databases, metrics and useful input files for a number of bioinformatic pipelines including vConTACT2 and MASH. The aim is to produce a useful starting point for viral genomics and meta-omics. Citation: If you find our database useful, please see our recently published paper in PHAGE: Cook R, Brown N, Redgwell T, Rihtman B, Barnes M, Clokie M, Stekel DJ, Hobman JL, Jones MA, Millard A. INfrastructure for a PHAge REference Database: Identification of Large-Scale Biases in the Current Collection of Cultured Phage Genomes. PHAGE. 2021. Available from:https://github.com/RyanCook94/inpharedhttp://doi.org/10.1089/phage.2021.0007.Phage Kitchen
2702023-11-13T11:12:18.883ZINPHARED - INfrastructure for a PHAge REference Database: Identification of large-scale biases in the current collection of phage genomes.inphared.pl (INfrastructure for a PHAge REference Database) is a perl script which downloads and filters phage genomes from Genbank to provide the most complete phage genome database possible. Providing up-to-date bacteriophage genome databases, metrics and useful input files for a number of bioinformatic pipelines including vConTACT2 and MASH. The aim is to produce a useful starting point for viral genomics and meta-omics.https://github.com/RyanCook94/inpharedhttps://leicester.figshare.com/articles/dataset/INPHARED_DATABASE/14242085 https://doi.org/10.1101/2021.05.01.442102 https://link.springer.com/protocol/10.1007/978-1-4939-7343-9_17 https://www.ncbi.nlm.nih.gov/labs/pmc/articles/PMC3965101/Phage Kitchenhttps://trello.com/1/cards/6178b384f7c0026429e621f4/attachments/618b21c1df5339278f3169eb/download/image.png
2712023-11-13T11:12:18.883ZINSDC
2722023-11-13T11:12:18.883ZInterpro
2732023-11-13T11:12:18.883Zinterproscan
2742023-11-13T11:12:18.883ZIRF-finder
2752023-11-13T11:12:18.883ZIsraeli Biobankhttps://www.ncbi.nlm.nih.gov/labs/pmc/articles/PMC7277922/Phage Kitchen
2762023-11-13T11:12:18.883ZJackhmmer
2772023-11-13T11:12:18.883ZKaiju
2782023-11-13T11:12:18.883ZKEGG
2792023-11-13T11:12:18.883Zkmer-DB
2802023-11-13T11:12:18.883ZKraken(2)
2812023-11-13T11:12:18.883ZKrona
2822023-11-13T11:12:18.883ZLanguages
2832023-11-13T11:12:18.883ZLASTZ (circularity)
2842023-11-13T11:12:18.883ZLifestyle prediction
2852023-11-13T11:12:18.883ZLipoP
2862023-11-13T11:12:18.883ZMapping
2872023-11-13T11:12:18.883ZMARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic BinsHere we present MARVEL, a tool for prediction of double-stranded DNA bacteriophage sequences in metagenomic bins. MARVEL uses a random forest machine learning approach.https://github.com/LaboratorioBioinformatica/MARVELhttps://www.frontiersin.org/articles/10.3389/fgene.2018.00304/fullPhage Kitchen
2882023-11-13T11:12:18.883ZMASH
2892023-11-13T11:12:18.883Zmashmap
2902023-11-13T11:12:18.883ZMATLAB
2912023-11-13T11:12:18.883ZMAVRICH - Bacteriophage evolution differs by host, lifestyle and genomeBacteriophages play key roles in microbial evolution, marine nutrient cycling and human disease. Phages are genetically diverse, and their genome architectures are characteristically mosaic, driven by horizontal gene transfer with other phages and host genomes. As a consequence, phage evolution is complex and their genomes are composed of genes with distinct and varied evolutionary histories. However, there are conflicting perspectives on the roles of mosaicism and the extent to which it generates a spectrum of genome diversity or genetically discrete populations. Here, we show that bacteriophages evolve within two general evolutionary modes that differ in the extent of horizontal gene transfer by an order of magnitude. Temperate phages distribute into high and low gene flux modes, whereas lytic phages share only the lower gene flux mode. The evolutionary modes are also a function of the bacterial host and different proportions of temperate and lytic phages are distributed in either mode depending on the host phylum. Groups of genetically related phages fall into either the high or low gene flux modes, suggesting there are genetic as well as ecological drivers of horizontal gene transfer rates. Consequently, genome mosaicism varies depending on the host, lifestyle and genetic constitution of phages.https://www.nature.com/articles/nmicrobiol2017112Phage Kitchenhttps://trello.com/1/cards/618224f2f47cb25bb3df551f/attachments/61a71964841b764308607aa1/download/image.png
2922023-11-13T11:12:18.883ZMCL
2932023-11-13T11:12:18.883Zmegahit
2942023-11-13T11:12:18.883ZMEROPS
2952023-11-13T11:12:18.883Zmetabat2
2962023-11-13T11:12:18.883ZMetaGeneAnnotator
2972023-11-13T11:12:18.883ZMetagenome enabled
2982023-12-26T17:22:05.670ZMetaPhage: an Automated Pipeline for Analyzing, Annotating, and Classifying Bacteriophages in Metagenomics Sequencing DataTo assist the nonspecialist in the decision-making process and facilitate workflow management, we present here MetaPhage (MP), a fully automated computational pipeline for quality control, assembly, and phage detection as well as classification and quantification of these phages in metagenomics data. The pipeline is modular and enables the user to skip some of the steps and recover analysis in the event of execution errors. To guarantee scalability and reproducibility, MetaPhage was implemented in Nextflow (NF) (8), a workflow manager that uses software containers to allow easy installation. The pipeline can be run on a single computer or parallelized on a high-performance computing (HPC) cluster. MetaPhage also implements a novel algorithm that delivers automatic taxonomic classification of phage contigs from the vConTACT2 (9) network graph implemented in the workflow. Results for each step of the analysis are reported in a rich and easy-to-read HTML report that can be opened and inspected in any web browser.https://github.com/MattiaPandolfoVR/MetaPhagehttps://doi.org/10.1128/msystems.00741-22Phage Kitchenhttps://trello.com/1/cards/63225df719c457038e5d5017/attachments/63225e779c42e5017272e5e1/download/image.png
2992023-12-26T17:21:42.920ZMetaPhinder - Identifying Bacteriophage Sequences in Metagenomic Data SetsBacteriophages are the most abundant biological entity on the planet, but at the same time do not account for much of the genetic material isolated from most environments due to their small genome sizes. They also show great genetic diversity and mosaic genomes, making it challenging to analyze and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e. contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole genome bacteriophage sequences, integrating hits to multiple genomes to accommodate the mosaic genome structure of many bacteriophages. The method is demonstrated to out-perform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology while the source code can be downloaded from orhttps://bitbucket.org/genomicepidemiology/metaphinder https://github.com/vanessajurtz/MetaPhinder.https://cge.cbs.dtu.dk/services/MetaPhinder/, https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0163111Phage Kitchen
3002023-11-13T11:12:18.883ZmetaSPAdes
3012023-11-13T11:12:18.883ZMETAVIRALSPADES: assembly of viruses from metagenomic dataWe describe a METAVIRALSPADES tool for identifying viral genomes in metagenomic assembly graphs that is based on analyzing variations in the coverage depth between viruses and bacterial chromosomes. We benchmarked METAVIRALSPADES on diverse metagenomic datasets, verified our predictions using a set of virus-specific Hidden Markov Models and demonstrated that it improves on the state-of-the-art viral identification pipelines. **Availability and implementation** METAVIRALSPADES includes VIRALASSEMBLY, VIRALVERIFY and VIRALCOMPLETE modules that are available as standalone packages: andhttps://github.com/ablab/spades/tree/metaviral_publication, https://github.com/ablab/viralVerify/ https://github.com/ablab/viralComplete/.https://academic.oup.com/bioinformatics/article/36/14/4126/5837667Phage Kitchen
3022023-11-13T11:12:18.883ZMetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysisMetaWRAP aims to be an easy-to-use metagenomic wrapper suite that accomplishes the core tasks of metagenomic analysis from start to finish: read quality control, assembly, visualization, taxonomic profiling, extracting draft genomes (binning), and functional annotation. Additionally, metaWRAP takes bin extraction and analysis to the next level (see module overview below). While there is no single best approach for processing metagenomic data, metaWRAP is meant to be a fast and simple approach before you delve deeper into parameterization of your analysis. MetaWRAP can be applied to a variety of environments, including gut, water, and soil microbiomes (see metaWRAP paper for benchmarks). Each individual module of metaWRAP is a standalone program, which means you can use only the modules you are interested in for your data.https://github.com/bxlab/metaWRAPhttps://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-018-0541-1Phage Kitchenhttps://trello.com/1/cards/6201e658cb39c1591b145040/attachments/6201e6664ee0d5422a604141/download/image.png
3032023-11-13T11:12:18.883ZMGnify: the microbiome analysis resource in 2020MGnify (http://www.ebi.ac.uk/metagenomics) provides a free-to-use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations that are present in particular environments. Over the past 2 years, MGnify (formerly EBI Metagenomics) has more than doubled the number of publicly available analysed datasets held within the resource. Recently, an updated approach to data analysis has been unveiled (version 5.0), replacing the previous single pipeline with multiple analysis pipelines that are tailored according to the input data, and that are formally described using the Common Workflow Language, enabling greater provenance, reusability, and reproducibility. MGnify's new analysis pipelines offer additional approaches for taxonomic assertions based on ribosomal internal transcribed spacer regions (ITS1/2) and expanded protein functional annotations. Biochemical pathways and systems predictions have also been added for assembled contigs. MGnify's growing focus on the assembly of metagenomic data has also seen the number of datasets it has assembled and analysed increase six-fold. The non-redundant protein database constructed from the proteins encoded by these assemblies now exceeds 1 billion sequences. Meanwhile, a newly developed contig viewer provides fine-grained visualisation of the assembled contigs and their enriched annotations.https://academic.oup.com/nar/article/48/D1/D570/5614179Phage Kitchen
3042023-11-13T11:12:18.883ZMGV - Metagenomic Gut Virus catalogue viral detection pipelineThe Metagenomic Gut Virus catalogue improves detection of viruses in stool metagenomes and accounts for nearly 40% of CRISPR spacers found in human gut Bacteria and Archaea. We also produced a catalogue of 459,375 viral protein clusters to explore the functional potential of the gut virome. Access to the full catalogue of viral genomes, protein clusters, diversity-generating retroelements and CRISPR spacers is provided without restrictions at https://portal.nersc.gov/MGV. Any requests for further data should be directed to the corresponding authors.https://github.com/snayfach/MGVhttps://www.nature.com/articles/s41564-021-00928-6Phage Kitchenhttps://trello.com/1/cards/6183207b8771288e6a739eab/attachments/61832100ae71055d17d6d6b8/download/MGV.png
3052023-11-13T11:12:18.883ZMGV/aaicluster
3062023-11-13T11:12:18.883ZMGV/anicluster
3072023-11-13T11:12:18.883ZMGV/crispr_spacers
3082023-11-13T11:12:18.883ZMGV/marker_gene_tree
3092023-11-13T11:12:18.883ZMGV/snptree
3102023-11-13T11:12:18.883ZMIST
3112023-11-13T11:12:18.883ZML-based (outside of HMMs)
3122023-11-13T11:12:18.883ZMMSeqs2 (HMM profile search)
3132023-12-26T17:30:47.559ZMultiPHATE2 - bioinformatics pipeline for functional annotation of phage isolatesMULTIPHATE2 -> MULTIPHATE -> PHANNOTATE **ABOUT THE MULTI-PHATE PIPELINE DRIVER** MultiPhATE is a command-line program that runs gene finding and the PhATE annotation code over user-specified phage genomes, then performs gene-by-gene comparisons among the genomes. The multiPhate.py code takes a single argument consisting of a configuration file (hereafter referred to as multiPhate.config; use the file sample.multiPhate.config as a starting point) and uses it to specify annotation parameters. Then, multiPhate.py invokes the PhATE pipeline for each genome. See below for the types of annotations that PhATE performs. If two or more genomes are specified by the user, then multiPhATE will run the CompareGeneProfiles code to identify corresponding genes among the genomes. **ABOUT THE PHATE PIPELINE** PhATE is a fully automated computational pipeline for identifying and annotating phage genes in genome sequence. PhATE is written in Python 3.7, and runs on Linux and Mac operating systems. Code execution is controlled by a configuration file, which can be tailored to run specific gene finders and to blast sequences against specific phage- and virus-centric data sets, in addition to more generic (genome, protein) data sets. See below for the specific databases that are accommodated. PhATE runs at least one gene finding algorithm, then annotates the genome, gene, and protein sequences using nucleotide and protein blast flavors and a set of fasta sequence databases, and uses hmm searches (phmmer, jackhmmer) against these same fasta databases. It also runs hmmscan against the pVOG and VOG hmm profile databases. If more than one gene finder is run, PhATE will provide a side-by-side comparison of the genes called by each gene caller. The user specifies the preferred gene caller, and the genes and proteins predicted by that caller are annotated using blast against the supporting databases (or, the user may specify one of the comparison gene sets: superset, consensus, or commoncore, for functional annotation). Classification of each protein sequence into a pVOG or VOG group is followed by generation of an alignment-ready fasta file. By convention, genome sequence files end with the extension ".fasta"; gene nucleotide fasta files end with ".fnt", and cds amino-acid fasta files end with ".faa".https://github.com/carolzhou/multiPhATE2http://dx.doi.org/10.1093/g3journal/jkab074 https://doi.org/10.1093/bioinformatics/btz258 https://doi.org/10.1093/bioinformatics/btz265Phage Kitchenhttps://trello.com/1/cards/6178a033dfc62f89a9a87e65/attachments/618231016a709d5d597ec768/download/m_jkab074f1.jpeg
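The file-naming convention described in the multiPhATE2 entry can be illustrated with a small sketch. Only the three extensions come from the entry; the function name `classify_phate_file` and the label strings are hypothetical and are not part of multiPhATE2 itself.

```python
from pathlib import Path

# Extension-to-content mapping taken from the convention stated in the entry.
# The label strings are illustrative only, not multiPhATE2 identifiers.
PHATE_EXTENSIONS = {
    ".fasta": "genome sequence",
    ".fnt": "gene nucleotide fasta",
    ".faa": "cds amino-acid fasta",
}


def classify_phate_file(filename: str) -> str:
    """Return the content type implied by a file's extension, or 'unknown'."""
    return PHATE_EXTENSIONS.get(Path(filename).suffix.lower(), "unknown")


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the mapping.
    for name in ("lambda.fasta", "lambda.fnt", "lambda.faa", "lambda.gff"):
        print(name, "->", classify_phate_file(name))
```

A lookup table keeps the convention in one place, so a check like this could run before files are handed to a pipeline.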
3142023-11-13T11:12:18.883Zmultiqc
3152023-11-13T11:12:18.883ZMUSCLE
3162023-11-13T11:12:18.883ZNaming phages - literaturehttps://trello.com/c/FI9EF6ut/18-naming-phages-literaturePhage Kitchenhttps://trello.com/1/cards/6178b1e32fc0055c58b5fc6a/attachments/6178b2141e3d2f8f9d59cbec/download/viruses-09-00070-v2.pdf
3172023-11-13T11:12:18.883ZNational Collection of Type Cultures (UK)https://www.bacteriophage.news/database/uk-national-collection-of-type-cultures/Phage Kitchen
3182023-11-13T11:12:18.883ZNCBI (nr, refseq, taxonomy...)
3192023-11-13T11:12:18.883ZOnePetri: accelerating common bacteriophage Petri dish assays with computer visionOnePetri uses machine learning models & computer vision to automatically detect Petri dishes and plaques, count plaques, and perform common assay calculations with these values (plaque/titration assay). Note that as of now, OnePetri only works with circular Petri dishes; however, other shapes (square & rectangle) may be added if sufficient training images can be obtained. Additionally, the models used in the app require one plate per dilution, and as such, spot assays are not currently supported. All image processing & detection is done locally on-device, with no need for an internet connection once the app has been installed. As such, OnePetri does not collect, store, or transmit any user data or images. Updates are likely to be released regularly, so regular access to the internet is strongly recommended.https://github.com/mshamash/OnePetri https://github.com/mshamash/onepetri-benchmarkhttps://www.biorxiv.org/content/10.1101/2021.09.27.460959v1 https://onepetri.ai/Phage Kitchenhttps://trello.com/1/cards/6183367f52ca4f0ab766ba64/attachments/61833718be698a10bdd5cc90/download/F2.large.jpg
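One of the common plaque assay calculations mentioned in the OnePetri entry is the phage titer. The sketch below uses the standard formula, titer (PFU/mL) = plaques / (dilution × volume plated in mL). It is not taken from OnePetri's code; the function name `titer_pfu_per_ml` and the example numbers are assumptions for illustration.

```python
def titer_pfu_per_ml(plaque_count: int, dilution: float, volume_plated_ml: float) -> float:
    """Compute titer in plaque-forming units per mL.

    dilution is the fraction of the original stock on the plate
    (for example 1e-6 for a 10^-6 dilution).
    """
    if plaque_count < 0:
        raise ValueError("plaque_count must be non-negative")
    if dilution <= 0 or volume_plated_ml <= 0:
        raise ValueError("dilution and volume_plated_ml must be positive")
    return plaque_count / (dilution * volume_plated_ml)


if __name__ == "__main__":
    # Hypothetical plate: 42 plaques counted at a 10^-6 dilution, 0.1 mL plated.
    titer = titer_pfu_per_ml(42, 1e-6, 0.1)
    print(f"{titer:.2e} PFU/mL")  # prints "4.20e+08 PFU/mL"
```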
3202023-11-13T11:12:18.883ZOPTSIL (clustering)
3212023-11-13T11:12:18.883ZOther tools
3222023-11-13T11:12:18.883ZPaper - A Roadmap for Genome-Based Phage TaxonomyBacteriophage (phage) taxonomy has been in flux since its inception over four decades ago. Genome sequencing has put pressure on the classification system and recent years have seen significant changes to phage taxonomy. Here, we reflect on the state of phage taxonomy and provide a roadmap for the future, including the abolition of the order Caudovirales and the families Myoviridae, Podoviridae, and Siphoviridae. Furthermore, we specify guidelines for the demarcation of species, genus, subfamily and family-level ranks of tailed phage taxonomy.https://www.mdpi.com/1999-4915/13/3/506Phage Kitchen
3232023-11-13T11:12:18.883ZPaper - Assessing Illumina technology for the high-throughput sequencing of bacteriophage genomesWe assessed the suitability of Illumina technology for high-throughput sequencing and subsequent assembly of phage genomes. In silico datasets reveal that 30× coverage is sufficient to correctly assemble the complete genome of ~98.5% of known phages, with experimental data confirming that the majority of phage genomes can be assembled at 30× coverage. Furthermore, in silico data demonstrate it is possible to co-sequence multiple phages from different hosts, without introducing assembly errors.https://www.ncbi.nlm.nih.gov/labs/pmc/articles/PMC4893331/Phage Kitchen
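To make the 30-fold coverage figure concrete, the usual relationship coverage = (number of reads × read length) / genome length can be rearranged to estimate how many reads are needed. This is a back-of-the-envelope sketch, not a method from the paper; the function name `reads_for_coverage`, the 50,000 bp genome and the 150 bp read length are assumed example values.

```python
def reads_for_coverage(coverage: int, genome_length_bp: int, read_length_bp: int) -> int:
    """Smallest number of reads whose total bases reach the target mean coverage."""
    if coverage <= 0 or genome_length_bp <= 0 or read_length_bp <= 0:
        raise ValueError("all arguments must be positive")
    total_bases = coverage * genome_length_bp
    # Integer ceiling division avoids floating-point rounding.
    return (total_bases + read_length_bp - 1) // read_length_bp


if __name__ == "__main__":
    # Hypothetical 50 kb phage genome sequenced with 150 bp reads at 30-fold coverage.
    print(reads_for_coverage(30, 50_000, 150))  # prints 10000
```

This gives mean coverage only; real runs have uneven coverage, so the estimate is a lower bound.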
3242023-11-13T11:12:18.883ZPaper - Virome Sequencing of the Human Intestinal Mucosal–Luminal InterfaceWhile the human gut virome has been increasingly explored in recent years, nearly all studies have been limited to fecal sampling. The mucosal–luminal interface has been established as a viable sample type for profiling the microbial biogeography of the gastrointestinal tract. We have developed a protocol to extract nucleic acids from viruses at the mucosal–luminal interface of the proximal and distal colon. Colonic viromes from pediatric patients with Crohn's disease demonstrated high interpatient diversity and low but significant intrapatient variation between sites. Whole metagenomics was also performed to explore virome–bacteriome interactions and to compare the viral communities observed in virome and whole metagenomic sequencing. A site-specific study of the human gut virome is a necessary step to advance our understanding of virome–bacteriome–host interactions in human diseases. **Keywords:** virome, bacteriophage, phage, microbiome, gut mucosa, phageome, gut microbiomehttps://trello.com/c/lvmH9lW7/76-paper-virome-sequencing-of-the-human-intestinal-mucosal-luminal-interfacePhage Kitchenhttps://trello.com/1/cards/618af7a5f7fa9d85b78501aa/attachments/618af83e82b11c348d3c8337/download/Image_1.JPEG
3252023-11-13T11:12:18.883ZPATRIChttps://trello.com/c/rE3cchQQ/80-patricPhage Kitchen
3262023-11-13T11:12:18.883ZPDB
3272023-11-13T11:12:18.883Zpdm_utils - SEA-PHAGES program to create, update, and maintain MySQL phage genomics databases`pdm_utils` is a Python package designed in combination with a pre-defined MySQL database schema in order to facilitate the creation, management, and manipulation of phage genomics databases in the SEA-PHAGES program. The package is directly connected to the structure of the MySQL database, and it provides several types of functionality: 1. Python library including: a. Classes to store/parse phage genomes, interact with a local MySQL genomics database, and manage the process of making database changes. b. Functions and methods to manipulate those classes as well as interact with several databases and servers, including PhagesDB, GenBank, PECAAN, and MySQL. 2. A command line toolkit to process data and maintain a phage genomics database. `pdm_utils` is useful for: 1. The Hatfull lab to maintain MySQL phage genomics databases in the SEA-PHAGES program (current pipeline). 2. Researchers to evaluate new genome annotations (flat file QC). 3. Researchers to directly access and retrieve phage genomics data from any compatible MySQL database (tutorial). 4. Researchers to create custom MySQL phage genomics databases. 5. Developers to build downstream data analysis tools (tutorial).https://github.com/SEA-PHAGES/pdm_utilsPhage Kitchenhttps://trello.com/1/cards/61833a9614f91804d55da1af/attachments/61833adfdc032a1f16a4d66b/download/schema_10_map.jpg
3282023-11-13T11:12:18.883ZPerl
3292023-11-13T11:12:18.883ZPFAM
3302023-11-13T11:12:18.883ZPhaGCN - GCN based model classifierPhaGCN is a GCN-based model, which can learn the species masking feature via a deep learning classifier, for new phage taxonomy classification. To use PhaGCN, you only need to input your contigs to the program.https://github.com/KennthShang/PhaGCNhttps://academic.oup.com/bioinformatics/article/37/Supplement_1/i25/6319660Phage Kitchenhttps://trello.com/1/cards/61833bc5e728741ec9f6b8b5/attachments/61ba7f0925743d0f0be8deee/download/image.png
3312023-11-13T11:12:18.883ZPhage Commander, an Application for Rapid Gene Identification in Bacteriophage Genomes Using Multiple ProgramsWe present the Phage Commander application for rapid identification of bacteriophage genes using multiple gene identification programs. Phage Commander runs a bacteriophage genome sequence through nine gene identification programs (and an additional program for identification of tRNAs) and integrates the results within a single output table. Phage Commander also generates formatted output files for direct export to National Center for Biotechnology Information GenBank or genome visualization programs such as DNA Master.https://github.com/sarah-harris/PhageCommanderhttps://www.liebertpub.com/doi/full/10.1089/phage.2020.0044Phage Kitchen
3322023-11-13T11:12:18.883ZPhage related tools listing - git reposhttps://github.com/sxh1136/Phage_tools https://github.com/voorloopnul/awesome-phagesPhage Kitchen
3332023-12-26T17:30:18.414ZPhageAI - PhageAI is an Artificial Intelligence application for your daily Phage ResearchPhageAI is an application that simultaneously represents a repository of knowledge of bacteriophages and a tool to analyse genomes with Artificial Intelligence support. **Framework modules** Set of methods related to: * `lifecycle` - bacteriophage lifecycle prediction: * `.predict(fasta_path)` - return bacteriophage lifecycle prediction class (Virulent, Temperate or Chronic) with probability (%); * `taxonomy` - bacteriophage taxonomy order, family and genus prediction (TBA); * `topology` - bacteriophage genome topology prediction (TBA); * `repository` - set of methods related to the PhageAI bacteriophage repository: * `.get_record(value)` - return dict with bacteriophage meta-data * `.get_top10_similar_phages(value)` - return list of dicts containing the top-10 most similar bacteriophages ------ Machine Learning algorithms can process enormous amounts of data in a relatively short time in order to find connections and dependencies that are unobvious for human beings. Correctly designed applications based on AI are able to vastly improve and speed up the work of the domain experts. Models based on DNA contextual vectorization and Deep Neural Networks are particularly effective when it comes to analysis of genomic data. The system that we propose aims to use the phage sequences uploaded to the database to build a model which is able to predict if a bacteriophage is virulent, temperate or chronic with a high probability. One of the key system modules is the bacteriophages repository with a clean web interface that allows users to browse, upload and share data with other users. The gathered knowledge about the bacteriophages is not only valuable on its own but also because of the ability to train the ever-improving Machine Learning models. Detection of virulent or temperate features is only one of the first tasks that can be solved with Artificial Intelligence. The combination of Biology, Natural Language Processing and Machine Learning allows us to create algorithms for genomic data processing that could eventually turn out to be effective in a wide range of problems with focus on classification and information extracted from DNA.https://github.com/phageaisa/phageaihttps://phage.ai/accounts/login/?next=/Phage Kitchen
3342023-11-13T11:12:18.883ZPhagePromoter - Predicting promoters in phage genomesThe growing interest in phages as antibacterial agents has led to an increase in the number of sequenced phage genomes, increasing the need for intuitive bioinformatics tools for performing genome annotation. The identification of phage promoters is indeed the most difficult step of this process. Due to the lack of online tools for phage promoter prediction, we developed PhagePromoter, a tool for locating promoters in phage genomes, using machine learning methods. This is the first online tool for predicting promoters that uses phage promoter data and the first to identify both host and phage promoters with different motifs. **Availability and implementation** This tool was integrated in the Galaxy framework and it is available online at: https://bit.ly/2Dfebfv.https://academic.oup.com/bioinformatics/article/35/24/5301/5540317Phage Kitchen
3352023-11-13T11:12:18.883ZphageReceptor - phage-host receptor interactionsphageReceptor is a database of phage-host receptor interactions, which included 427 pairs of phage-host receptor interactions, 341 unique viral species/sub-species, and 69 bacterial species. Based on phageReceptor, we systematically analyzed the associations between phage-host receptor interactions, and characterized the phage protein receptors by structure, function, protein-protein interaction and expression.https://dx.doi.org/10.1093/BIOINFORMATICS/BTAA123 http://www.computationalbiology.cn/phageReceptor/index.htmlPhage Kitchen
3362023-11-13T11:12:18.883ZPhageTerm: a tool for fast and accurate determination of phage termini and packaging mechanism using next-generation sequencing dataIn this work, we demonstrate how it is possible to recover more information from sequencing data than just the phage genome. We developed a theoretical and statistical framework to determine DNA termini and phage packaging mechanisms using NGS data. Our method relies on the detection of biases in the number of reads, which are observable at natural DNA termini compared with the rest of the phage genome. We implemented our method with the creation of the software PhageTerm and validated it using a set of phages with well-established packaging mechanisms representative of the termini diversity, i.e. 5′cos (Lambda), 3′cos (HK97), pac (P1), headful without a pac site (T4), DTR (T7) and host fragment (Mu). In addition, we determined the termini of nine Clostridium difficile phages and six phages whose sequences were retrieved from the Sequence Read Archive. PhageTerm is freely available (https://sourceforge.net/projects/phageterm), as a Galaxy ToolShed and on a Galaxy-based server (https://galaxy.pasteur.fr).https://sourceforge.net/projects/phagetermhttps://www.nature.com/articles/s41598-017-07910-5Phage Kitchen
3372023-11-13T11:12:18.883ZPhageTermVirome - High-throughput identification of viral termini and packaging mechanisms in virome datasetsHere, we introduce PhageTermVirome (PTV) as a tool for the easy and rapid high-throughput determination of phage termini and packaging mechanisms using modern large-scale metagenomics datasets. We successfully tested the PTV algorithm on a mock virome dataset and then used it on two real virome datasets to achieve the rapid identification of more than 100 phage termini and packaging mechanisms, with just a few hours of computing time. Because PTV allows the identification of free fully formed viral particles (by recognition of termini present only in encapsidated DNA), it can also complement other virus identification software to predict the true viral origin of contigs in viral metagenomics datasets. PTV is a novel and unique tool for high-throughput characterization of phage genomes, including phage termini identification and characterization of genome packaging mechanisms. This software should help researchers better visualize, map and study the virosphere. PTV is freely available for downloading and installation at https://gitlab.pasteur.fr/vlegrand/ptv.https://www.nature.com/articles/s41598-021-97867-3Phage Kitchenhttps://trello.com/1/cards/61836d4bd8ac780c3af6b120/attachments/61836d7560fe9587c9cef8cd/download/image.png
3382023-11-13T11:12:18.883ZPhamerator & BYU-Phamerator**Phamerator** Phamerator is a comparative genomics and genome exploration tool designed and written by Dr. Steve Cresawn of James Madison University. In 2017, Phamerator transitioned from a Linux-based program to be a cross-platform web-based program. It is available at https://phamerator.org/. ----------- **BYU-Phamerator** Here we describe modifications to the phage comparative genomics software program, Phamerator, provide public access to the code, and include instructions for creating custom Phamerator databases. We further report genomic analysis techniques to determine phage packaging strategies and identification of the physical ends of phage genomes. Results: The original Phamerator code can be successfully modified and custom databases can be generated using the instructions we provide. Results of genome map comparisons within a custom database reveal obstacles in performing the comparisons if a published genome has an incorrect complementarity or an incorrect location of the first base of the genome, which are common issues in GenBank-downloaded sequence files. To address these issues, we review phage packaging strategies and provide results that demonstrate identification of the genome start location and orientation using raw sequencing data and software programs such as PAUSE and Consed to establish the location of the physical ends of the genome. These results include determination of exact direct terminal repeats (DTRs) or cohesive ends, or whether phages may use a headful packaging strategy. Phylogenetic analysis using ClustalO and phamily circles in Phamerator demonstrates that the large terminase gene can be used to identify the phage packaging strategy and thereby aid in identifying the physical ends of the genome.https://github.com/scresawn/Phameratorhttps://phagesdb.org/Phamerator/faq/ http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=21991981 https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-3018-2 https://phamerator.org/Phage Kitchenhttps://trello.com/1/cards/61836f7dfcd95552ff48227e/attachments/6183734af82ab95ee1eaccd3/download/image.png
3392023-11-13T11:12:18.883ZPhANNs - a fast and accurate tool and web server to classify phage structural proteins PhANNs is a tool to classify any phage ORF as one of 10 structural protein classes, or as "others". It uses an ensemble of Artificial Neural Networks. PhANNs predicts the structural class of a phage ORF by running an artificial neural network ensemble that we created against a fasta file of protein sequences. If you upload a multi-fasta file, we’ll provide you estimates of the structural classes of all the proteins, and we’ll let you download the sequences for each class as a fasta file. --------------------- For any given bacteriophage genome or phage-derived sequences in metagenomic data sets, we are unable to assign a function to 50–90% of genes, or more. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify genes using homology-based methods. To understand the functions encoded by phages, their contributions to their environments, and to help gauge their utility as potential phage therapy agents, we have developed a new approach to classify phage ORFs into ten major classes of structural proteins or into an “other” category. The resulting tool is named PhANNs (Phage Artificial Neural Networks). We built a database of 538,213 manually curated phage protein sequences that we split into eleven subsets (10 for cross-validation, one for testing) using a novel clustering method that ensures there are no homologous proteins between sets yet maintains the maximum sequence diversity for training. An Artificial Neural Network ensemble trained on features extracted from those sets reached a test F1-score of 0.875 and test accuracy of 86.2%. PhANNs can rapidly classify proteins into one of the ten structural classes or, if not predicted to fall in one of the ten classes, as “other,” providing a new approach for functional annotation of phage proteins. PhANNs is open source and can be run from our web server or installed locally. https://github.com/Adrian-Cantu/PhANNs http://edwards.sdsu.edu/phanns https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007845 Phage Kitchen
3402023-11-13T11:12:18.883ZPHANOTATE PHANOTATE is a tool to annotate phage genomes. It assumes that non-coding bases in a phage genome are disadvantageous, and then populates a weighted graph to find the optimal path through the six frames of the DNA, where open reading frames are beneficial paths while gaps and overlaps are penalized paths. https://github.com/deprekate/PHANOTATE https://academic.oup.com/bioinformatics/article/35/22/4537/5480131 Phage Kitchen
3412023-11-13T11:12:18.883ZPhantome
3422023-11-13T11:12:18.883Zphap - Phage Host Analysis Pipeline PHAP wraps the execution of various phage-host prediction tools. Features: Uses Singularity containers for the execution of all tools. When possible (i.e. the image is not larger than a few Gs), tools and their dependencies are bundled in the same container. This means you do not need to get models or any other external databases, unless otherwise specified. Intermediate processing steps are handled by Conda environments, to ensure smooth and reproducible execution. Outputs the Last Common Ancestor of the predictions of all tools, per contig, based on the predicted taxonomy. Wrapped tools: HTP, RaFAh, vHuLK, VirHostMatcher-Net, WIsH. https://github.com/MGXlab/phap https://github.com/wojciech-galan/viruses_classifier https://github.com/LaboratorioBioinformatica/vHULK https://github.com/WeiliWw/VirHostMatcher-Net https://github.com/soedinglab/WIsH https://sourceforge.net/projects/rafah/ Phage Kitchen
3432023-11-13T11:12:18.883ZPHAROKKA phrokka is a fast phage annotation pipeline. phrokka uses Phanotate (McNair et al 2019 doi:10.1093/bioinformatics/btz265) to conduct gene calling and tRNAscan-SE 2 (Chan et al 2021 https://doi.org/10.1093/nar/gkab688) to call tRNAs. phrokka then uses the lightweight PHROGS database (https://phrogs.lmge.uca.fr Terzian et al 2021 https://doi.org/10.1093/nargab/lqab067) to conduct annotation. Specifically, each gene is compared against the entire PHROGS database using mmseqs2. --- phrokka creates a number of output files in different formats. The 2 main files phrokka generates are phrokka.gff, a gff3 format file that includes the fasta following the gff table annotations, and phrokka.tbl, a flat-file table suitable for upload to NCBI's BankIt. https://github.com/gbouras13/phrokka Phage Kitchen
3442023-11-13T11:12:18.883ZPHASTER DB
3452023-11-13T11:12:18.883ZPHERI - Phage Host Exploration pipeline The solution to this problem may be to use a bioinformatic approach in the form of prediction software capable of determining a bacterial host based on the phage whole-genome sequence. The result of our research is the machine learning algorithm based tool called PHERI. PHERI predicts a suitable bacterial host genus for purification of individual viruses from different samples. In addition, it can identify and highlight protein sequences that are important for host selection. PHERI is available at https://hub.docker.com/repository/docker/andynet/pheri. The source code for the model training is available at https://github.com/andynet/pheri_preprocessing, and the source code for the tool is available at https://github.com/andynet/pheri. https://www.biorxiv.org/content/10.1101/2020.05.13.093773v3.full Phage Kitchen
3462023-11-13T11:12:18.883ZPhigaro: high throughput prophage sequence annotation Summary: Phigaro is a standalone command-line application that is able to detect prophage regions taking raw genome and metagenome assemblies as an input. It also produces dynamic annotated “prophage genome maps” and marks possible transposon insertion spots inside prophages. It provides putative taxonomic annotations that can distinguish tailed from non-tailed phages. It is applicable for mining prophage regions from large metagenomic datasets. Availability: Source code for Phigaro is freely available for download at https://github.com/bobeobibo/phigaro along with test data. The code is written in Python. https://www.biorxiv.org/content/10.1101/598243v1 Phage Kitchen
3472023-11-13T11:12:18.883ZPhilympics 2021: Prophage Predictions Perplex Programs Tools tested: Phage Finder (2006), PhiSpy (2012), VirSorter (2015), Phigaro (2020), DBSCAN-SWA (2020), VIBRANT (2020), PhageBoost (2021), VirSorter2 (2021). https://f1000research.com/articles/10-758/v1 Phage Kitchen
3482023-11-13T11:12:18.883ZPhirbo - A tool to predict prokaryotic hosts for phage (meta)genomic sequences Phirbo links phage to host sequences through other intermediate sequences that are potentially homologous to both phage and host sequences. To link phage (P) to host (H) sequence through intermediate sequences, phage and host sequences need to be used as queries in two separate sequence similarity searches (e.g., BLAST) against the same reference database of prokaryotic genomes (D). One BLAST search is performed for phage query (P) and the other for host query (H). The two lists of BLAST results, P → D and H → D, contain prokaryotic genomes ordered by decreasing score. To avoid a taxonomic bias due to multiple genomes of the same prokaryote species (e.g., Escherichia coli), prokaryotic species can be ranked according to their first appearance in the BLAST list. In this way, both ranked lists represent phage and host profiles consisting of the ranks of top-score prokaryotic species. Phirbo estimates the phage-host relationship by comparing the content and order between phage and host ranked lists using Rank-Biased Overlap (RBO) measure. Briefly, RBO fosters comparison of ranked lists of different lengths with heavier weights for matching the higher-ranking items. RBO ranges between 0 and 1, where 0 means that the lists are disjoint (have no items in common) and 1 means that the lists are identical in content and order. https://github.com/aziele/phirbo Phage Kitchen https://trello.com/1/cards/61833982e8cf5f4298a3ad20/attachments/618339d66ac1af2038ec3dcc/download/figure.png
3492023-11-13T11:12:18.883ZPHIST - Phage-Host Interaction Search Tool A tool to predict prokaryotic hosts for phage (meta)genomic sequences. PHIST links viruses to hosts based on the number of k-mers shared between their sequences. https://github.com/refresh-bio/PHIST https://www.biorxiv.org/content/10.1101/2021.09.06.459169v1 Phage Kitchen
3502023-11-13T11:12:18.883ZPhmmer
3512023-11-13T11:12:18.883ZPHROGS
3522023-11-13T11:12:18.883ZPILERCR (crispr)
3532023-11-13T11:12:18.883ZPlaque Size Tool Plaque Size Tool is an open-source application written in Python 3 that is able to detect and measure bacteriophage plaques on a Petri dish image. The source files are located at https://github.com/ellinium/plaque_size_tool. To cite Plaque Size Tool, please use https://doi.org/10.1016/j.virol.2021.05.011 Phage Kitchen https://trello.com/1/cards/618336f140bb9911840c1069/attachments/6183376932281c5d19e63319/download/image13.jpg
3542023-11-13T11:12:18.883ZPlasmid/ICE contamination check
3552023-11-13T11:12:18.883ZPOG - Orthologous Gene Clusters and Taxon Signature Genes for Viruses of Prokaryotes Here, we present an update of the phage orthologous groups (POGs), a collection of 4,542 clusters of orthologous genes from bacteriophages that now also includes viruses infecting archaea and encompasses more than 1,000 distinct virus genomes. Analysis of this expanded data set shows that the number of POGs keeps growing without saturation and that a substantial majority of the POGs remain specific to viruses, lacking homologues in prokaryotic cells, outside known proviruses. Thus, the great majority of virus genes apparently remains to be discovered. https://journals.asm.org/doi/10.1128/JB.01801-12?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed Phage Kitchen https://trello.com/1/cards/61a713f023021804c22bb858/attachments/61a7143b11d3f363954f2291/download/image.png
3562023-11-13T11:12:18.883ZPOGs
3572023-11-13T11:12:18.883ZPPHMM
3582023-11-13T11:12:18.883ZPPR-Meta: a tool for identifying phages and plasmids from metagenomic fragments using deep learning We present PPR-Meta, a 3-class classifier that allows simultaneous identification of both phage and plasmid fragments from metagenomic assemblies. PPR-Meta consists of several modules for predicting sequences of different lengths. Using deep learning, a novel network architecture, referred to as the Bi-path Convolutional Neural Network, is designed to improve the performance for short fragments. PPR-Meta demonstrates much better performance than currently available similar tools individually for phage or plasmid identification, while testing on both artificial contigs and real metagenomic data. PPR-Meta is freely available via http://cqb.pku.edu.cn/ZhuLab/PPR_Meta or https://github.com/zhenchengfang/PPR-Meta. https://www.ncbi.nlm.nih.gov/labs/pmc/articles/PMC6586199/ Phage Kitchen
3592023-11-13T11:12:18.883Zpprmeta
3602023-11-13T11:12:18.883ZPresentation - phage genome annotation and classification - how to get started Contains lots of info on circularisation issues (repeats, etc.) https://quadram.ac.uk/wp-content/uploads/2021/02/APF_phage_annotation_EAdriaenssens-red.pdf Phage Kitchen
3612023-11-13T11:12:18.883Zprodigal Prodigal: prokaryotic gene recognition and translation initiation site identification
3622023-11-13T11:12:18.883ZprogressiveMauve
3632023-11-13T11:12:18.883Zprokka
3642023-11-13T11:12:18.883ZProphage Hunter - an integrative hunting tool for active prophages We present Prophage Hunter, a tool aimed at hunting for active prophages from whole genome assembly of bacteria. Combining sequence similarity-based matching and genetic features-based machine learning classification, we developed a novel scoring system that exhibits higher accuracy than current tools in predicting active prophages on the validation datasets. The option of skipping similarity matching is also available so that there's higher chance for novel phages to be discovered. Prophage Hunter provides a one-stop web service to extract prophage genomes from bacterial genomes, evaluate the activity of the prophages, identify phylogenetically related phages, and annotate the function of phage proteins. Prophage Hunter is freely available at https://pro-hunter.bgi.com/. https://academic.oup.com/nar/article/47/W1/W74/5494712 Phage Kitchen https://trello.com/1/cards/6201ebef6118d03d98ea20d7/attachments/6201ec21c1d0366b647235ea/download/image.png
3652023-11-13T11:12:18.883ZProxiMeta and ProxiPhage (PhaseGenomics - Commercial) We developed an end-to-end bioinformatics platform for viral genome reconstruction and host attribution from metagenomic data using proximity-ligation sequencing (i.e., Hi-C). We demonstrate the capabilities of the platform by recovering and characterizing the metavirome of a variety of metagenomes, including a fecal microbiome that has also been sequenced with accurate long reads, allowing for the assessment and benchmarking of the new methods. The platform can accurately extract numerous near-complete viral genomes even from highly fragmented short-read assemblies and can reliably predict their cellular hosts with minimal false positives. To our knowledge, this is the first software for performing these tasks. Being significantly cheaper than long-read sequencing of comparable depth, the incorporation of proximity-ligation sequencing in microbiome research shows promise to greatly accelerate future advancements in the field. https://phasegenomics.com/wp-content/uploads/2021/06/ProxiMeta_Phage-Analysis-App-Note_June-2021.pdf https://www.biorxiv.org/content/10.1101/2021.06.14.448389v1.full https://phasegenomics.com/wp-content/uploads/2021/06/PhaseGenomics_ASM_IHMC_Poster_2021-2.pdf Phage Kitchen https://trello.com/1/cards/6202faad25d55618bd5fe06d/attachments/6202fb224d6b23742207c6b1/download/image.png
3662023-11-13T11:12:18.883ZpVOG-DB
3672023-11-13T11:12:18.883ZPython
3682023-11-13T11:12:18.883ZQueen Astrid Military Hospital (Belgium) https://phage.directory/capsid/phage-futures-jean-paul-pirnay Phage Kitchen
3692023-11-13T11:12:18.883ZR
3702023-11-13T11:12:18.883Zrafah
3712023-11-13T11:12:18.883Zrafah - Random Forest Assignment of Hosts One fundamental question when trying to describe viruses of Bacteria and Archaea is: Which host do they infect? To tackle this issue we developed a machine-learning approach named Random Forest Assignment of Hosts (RaFAH), which outperformed other methods for virus-host prediction. Our rationale was that the machine could learn the associations between genes and hosts much more efficiently than a human, while also using the information contained in the hypothetical proteins. Random forest models were built using the Ranger package in R. https://sourceforge.net/projects/rafah/ https://www.sciencedirect.com/science/article/pii/S2666389921001008 Phage Kitchen https://trello.com/1/cards/61ba781d22aac541652633b2/attachments/61ba78770c46cf1034d494f3/download/image.png
3722023-11-13T11:12:18.883ZRAST https://www.ncbi.nlm.nih.gov/labs/pmc/articles/PMC3965101/ https://link.springer.com/protocol/10.1007/978-1-4939-7343-9_17 Phage Kitchen https://trello.com/1/cards/6189db69330c2189dcd5f848/attachments/6189db857b333e773d5d673a/download/image.png
3732023-11-13T11:12:18.883ZRFAM
3742023-11-13T11:12:18.883ZRIME bioinformatics https://www.rime-bioinformatics.com/en/home/ Phage Kitchen https://trello.com/1/cards/6178a0631a700007711b5399/attachments/618af56cad3dfb26df27564e/download/image.png
3752023-11-13T11:12:18.883ZRocha lab - A simple, reproducible and cost-effective procedure to analyse gut phageome: from phage isolation to bioinformatic approach (Camille d’Humières, Marie Touchon, Sara Dion, Jean Cury, Amine Ghozlane, Marc Garcia-Garcera, Christiane Bouchier, Laurence Ma, Erick Denamur & Eduardo P.C. Rocha) We analysed five different techniques to isolate phages from human adult faeces and developed an approach to analyse their genomes in order to quantify contamination and classify phage contigs in terms of taxonomy and lifestyle. We chose the polyethylene glycol concentration method to isolate phages because of its simplicity, low cost, reproducibility, and the high number and diversity of phage sequences that we obtained. We also tested the reproducibility of this method with multiple displacement amplification (MDA) and showed that MDA severely decreases the phage genetic diversity of the samples and the reproducibility of the method. Lastly, we studied the influence of sequencing depth on the analysis of phage diversity and observed the beginning of a plateau for phage contigs at 20,000,000 reads. This work contributes to the development of methods for the isolation of phages in faeces and for their comparative analysis. https://www.nature.com/articles/s41598-019-47656-w Phage Kitchen https://trello.com/1/cards/6178bcd30d3c1535418e3a84/attachments/6181bed5236a0a2e711d79ef/download/Table1.png
3762023-11-13T11:12:18.883ZRPSBLAST
3772023-11-13T11:12:18.883ZRTMg
3782023-11-13T11:12:18.883ZRuby
3792023-11-13T11:12:18.883ZRVDB
3802023-11-13T11:12:18.883Zsamtools The Sequence Alignment/Map format and SAMtools
3812023-11-13T11:12:18.883Zsankey
3822023-11-13T11:12:18.883ZSEA-PHAGES bioinformatics guide https://seaphagesbioinformatics.helpdocsonline.com/interpreting-data Phage Kitchen https://trello.com/1/cards/6183877a22f1ab60f5e0eb2f/attachments/6183879e1c947738a174289c/download/image.png
3832023-11-13T11:12:18.883ZSEA-PHAGES University of Pittsburgh (US) https://seaphages.org/institution/PITT/ Phage Kitchen
3842023-11-13T11:12:18.883ZSeaphages decision trees for refining annotations https://seaphagesbioinformatics.helpdocsonline.com/article-25 Phage Kitchen https://trello.com/1/cards/61e752c9f2a14e3b977d0d8c/attachments/61e752d5c3c821689d80ec25/download/image.png
3852023-11-13T11:12:18.883ZSeeker: alignment-free identification of bacteriophage genomes by deep learning Recent advances in metagenomic sequencing have enabled discovery of diverse, distinct microbes and viruses. Bacteriophages, the most abundant biological entity on Earth, evolve rapidly, and therefore, detection of unknown bacteriophages in sequence datasets is a challenge. Most of the existing detection methods rely on sequence similarity to known bacteriophage sequences, impeding the identification and characterization of distinct, highly divergent bacteriophage families. Here we present Seeker, a deep-learning tool for alignment-free identification of phage sequences. Seeker allows rapid detection of phages in sequence datasets and differentiation of phage sequences from bacterial ones, even when those phages exhibit little sequence similarity to established phage families. We comprehensively validate Seeker's ability to identify previously unidentified phages, and employ this method to detect unknown phages, some of which are highly divergent from the known phage families. We provide a web portal (seeker.pythonanywhere.com) and a user-friendly Python package (github.com/gussow/seeker) allowing researchers to easily apply Seeker in metagenomic studies, for the detection of diverse unknown bacteriophages. https://academic.oup.com/nar/article/48/21/e121/5921300 Phage Kitchen
3862023-11-13T11:12:18.883Zseqkit SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation
3872023-11-13T11:12:18.883ZSequencing QC
3882023-11-13T11:12:18.883ZShineFind
3892023-11-13T11:12:18.883ZSourmash sourmash: a library for MinHash sketching of DNA
3902023-11-13T11:12:18.883Zsourmash
3912023-11-13T11:12:18.883ZSpacePharer (CRISPR)
3922023-11-13T11:12:18.883ZSPAdes
3932023-11-13T11:12:18.883ZSTEP3 **A machine-learning approach to define the component parts of bacteriophage virions** Bacteriophages (phages) are currently under consideration as a means to treat a wide range of bacterial infections, including those caused by drug-resistant “superbugs”. Successful phage therapy protocols require diverse phage in a phage cocktail, with the prospective need to recognize features of diverse phage from under-sampled environments. The effective use of these viruses for therapy depends on a number of factors, not least of which are the sequence-based choices that must be made to identify new phages for development into phage therapy. Phage virions, i.e. the physical form of the phage that would be delivered to the site of infection, conform to a blueprint that consists of a protein capsid housing the viral genome, and a multicomponent tail. We view these virions as molecular machines, and the tail machinery is complex. First and foremost, elements within the tail function to engage a species-specific component on the surface of the host bacterium, thereby initiating the infection cascade. The tail machinery is also responsible for penetrating through the bacterial cell wall, in order that the tip of the tail can enter the bacterial cytoplasm. Then, and only then, is a signal transmitted to the portal at the proximal end of the tail, enabling release of the phage DNA into the tail lumen to permit DNA translocation into the bacterial cell cytoplasm, resulting in bacterial death. We have developed an ensemble predictor called STEP3 that uses machine-learning algorithms to characterize the components of the machinery in phage virions. STEP3 can be used to understand the universal features of the machinery in phage tails, by accurately classifying proteins with conserved features together into groupings that are not dependent on the ill-considered annotations that currently confuse phage genome data. In the development of STEP3, various types of evolutionary features were sampled, features that were extracted from Position-Specific Scoring Matrix (PSSM), to draw on relationships underpinning the evolutionary history of the various proteins making up the phage virions. Considering the high evolution rates of phage proteins, these features are particularly suitable to detect virion proteins with only distantly related homologies. STEP3 integrated these features into an ensemble framework to achieve a stable and robust prediction performance. The final ensemble model showed a significant improvement in terms of prediction accuracy over current state-of-the-art phage virion protein predictors on extensive 5-fold cross-validation and independent tests. https://journals.asm.org/doi/10.1128/mSystems.00242-21 https://step3.erc.monash.edu/ Phage Kitchen https://trello.com/1/cards/61837d0fa5f17a4e76aed78b/attachments/61a7139734f4b01927c4888b/download/image.png
3942023-11-13T11:12:18.883ZStringTie
3952023-11-13T11:12:18.883ZSumTrees (bootstrapping)
3962023-11-13T11:12:18.883ZSWISSPROT
3972023-11-13T11:12:18.883ZTaxonomic assignment
3982023-11-13T11:12:18.883Ztblastx
3992023-11-13T11:12:18.883ZThe Bacteriophage Bank of Korea http://www.phagebank.or.kr/intro/eng_intro.jsp Phage Kitchen
4002023-11-13T11:12:18.883ZTIGRfam
4012023-11-13T11:12:18.883ZTMHMM
4022023-11-13T11:12:18.883ZTnT Genome
4032023-11-13T11:12:18.883ZTransTermHP
4042023-11-13T11:12:18.883ZTrEMBL
4052023-11-13T11:12:18.883ZTRIBE-MCL
4062023-11-13T11:12:18.883ZtRNAscan-SE
4072023-11-13T11:12:18.883ZUnavailable
4082023-11-13T11:12:18.883ZUniprot99
4092023-11-13T11:12:18.883ZUniRef90
4102023-11-13T11:12:18.883ZUPGMA
4112023-11-13T11:12:18.883ZUpSetRUpSetR: an R package for the visualization of intersecting sets and their properties
4122023-11-13T11:12:18.883ZUsearch
4132023-11-13T11:12:18.883ZvConTACT2
4142023-11-13T11:12:18.883ZvCONTACT2 - Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networksWe present vConTACT v.2.0, a network-based application utilizing whole genome gene-sharing profiles for virus taxonomy that integrates distance-based hierarchical clustering and confidence scores for all taxonomic predictions. We report near-identical (96%) replication of existing genus-level viral taxonomy assignments from the International Committee on Taxonomy of Viruses for National Center for Biotechnology Information virus RefSeq. Application of vConTACT v.2.0 to 1,364 previously unclassified viruses deposited in virus RefSeq as reference genomes produced automatic, high-confidence genus assignments for 820 of the 1,364. We applied vConTACT v.2.0 to analyze 15,280 Global Ocean Virome genome fragments and were able to provide taxonomic assignments for 31% of these data, which shows that our algorithm is scalable to very large metagenomic datasets. Our taxonomy tool can be automated and applied to metagenomes from any environment for virus classification. --- Version 1 vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteriahttps://bitbucket.org/MAVERICLab/vcontact2/wiki/Homehttps://www.nature.com/articles/s41587-019-0100-8 https://peerj.com/articles/3243/Phage Kitchen
4152023-11-13T11:12:18.883ZvConTACT2 SOPhttps://www.protocols.io/view/applying-vcontact-to-viral-sequences-and-visualizi-x5xfq7nPhage Kitchen
4162023-11-13T11:12:18.883ZvHULK**Phage Host Prediction using high level features and neural networks** Metagenomics and sequencing techniques have greatly improved in these last five years and, as a consequence, the amount of data from microbial communities is astronomic. An important part of the microbial community are phages, which have their own ecological roles in the environment. Besides that, they have also been given a possible human-relevant (clinical) role as terminators of multidrug resistant bacterial infections. A lot of basic research still needs to be done in the phage therapy field, and part of this research involves gathering knowledge from new phages present in the environment as well as about their relationship with clinically relevant bacterial pathogens. Having this scenario in mind, we have developed vHULK, a user-friendly tool for prediction of phage hosts given their complete or partial genome in FASTA format. Our tool outputs an ensemble prediction at the genus or species level based on scores of four different neural network models. Each model was trained with more than 4,000 genomes whose phage-host relationship was known. vHULK also outputs a measure of entropy for each final prediction, which we have demonstrated to be correlated with prediction accuracy. The user might understand this value as additional information on how certain vHULK is about a particular prediction. We also suspect that phages with higher entropy values may have a broad host range, but that hypothesis is to be tested later. Accuracy results in test datasets were >99% for predictions at the genus level and >98% at the species level. vHULK currently supports predictions for 52 different prokaryotic host species and 61 different genera.https://github.com/LaboratorioBioinformatica/vHULKhttps://www.biorxiv.org/content/10.1101/2020.12.06.413476v1.fullPhage Kitchenhttps://trello.com/1/cards/61ba7607a28975244b0a6027/attachments/61ba76d4f7735a810ed5d1f1/download/image.png
4172023-11-13T11:12:18.883Zvibrant
4182023-11-13T11:12:18.883ZVIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequencesBackground Viruses are central to microbial community structure in all environments. The ability to generate large metagenomic assemblies of mixed microbial and viral sequences provides the opportunity to tease apart complex microbiome dynamics, but these analyses are currently limited by the tools available for analyses of viral genomes and assessing their metabolic impacts on microbiomes. Design Here we present VIBRANT, the first method to utilize a hybrid machine learning and protein similarity approach that is not reliant on sequence features for automated recovery and annotation of viruses, determination of genome quality and completeness, and characterization of viral community function from metagenomic assemblies. VIBRANT uses neural networks of protein signatures and a newly developed v-score metric that circumvents traditional boundaries to maximize identification of lytic viral genomes and integrated proviruses, including highly diverse viruses. VIBRANT highlights viral auxiliary metabolic genes and metabolic pathways, thereby serving as a user-friendly platform for evaluating viral community function. VIBRANT was trained and validated on reference virus datasets as well as microbiome and virome data. Results VIBRANT showed superior performance in recovering higher quality viruses and concurrently reduced the false identification of non-viral genome fragments in comparison to other virus identification programs, specifically VirSorter, VirFinder, and MARVEL. When applied to 120,834 metagenome-derived viral sequences representing several human and natural environments, VIBRANT recovered an average of 94% of the viruses, whereas VirFinder, VirSorter, and MARVEL achieved less powerful performance, averaging 48%, 87%, and 71%, respectively. 
Similarly, VIBRANT identified more total viral sequence and proteins when applied to real metagenomes. When compared to PHASTER, Prophage Hunter, and VirSorter for the ability to extract integrated provirus regions from host scaffolds, VIBRANT performed comparably and even identified proviruses that the other programs did not. To demonstrate applications of VIBRANT, we studied viromes associated with Crohn’s disease to show that specific viral groups, namely Enterobacteriales-like viruses, as well as putative dysbiosis associated viral proteins are more abundant compared to healthy individuals, providing a possible viral link to maintenance of diseased states. Conclusions The ability to accurately recover viruses and explore viral impacts on microbial community metabolism will greatly advance our understanding of microbiomes, host-microbe interactions, and ecosystem dynamics.https://github.com/AnantharamanLab/VIBRANThttps://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-020-00867-0Phage Kitchenhttps://trello.com/1/cards/6178a75108823f42f72f4f48/attachments/6178a81d23075487b78f97c7/download/image.png
4192023-11-13T11:12:18.883ZVICTOR: genome-based phylogeny and classification of prokaryotic viruses - online only?We here present a novel in silico framework for phylogeny and classification of prokaryotic viruses, in line with the principles of phylogenetic systematics, and using a large reference dataset of officially classified viruses. The resulting trees revealed a high agreement with the classification. Except for low resolution at the family level, the majority of taxa was well supported as monophyletic. Clusters obtained with distance thresholds chosen for maximizing taxonomic agreement appeared phylogenetically reasonable, too. Analysis of an expanded dataset, containing >4000 genomes from public databases, revealed a large number of novel species, genera, subfamilies and families.https://ggdc.dsmz.de/victor.php https://doi.org/10.1093/bioinformatics/btx440Phage Kitchen
4202023-11-13T11:12:18.883ZVIGA - De novo Viral Genome AnnotatorVIGA is a script written in Python 3 that annotates viral genomes automatically (using a de novo algorithm) and predicts the function of their proteins using BLAST and HMMER. This script works in UNIX-based OS, including MacOSX and the Windows Subsystem for Linux. Programs: * LASTZ (Harris 2007): it is used to predict the circularity of the contigs. The program is publicly available at under the MIT licence. * INFERNAL (Nawrocki and Eddy 2013): it is used to predict ribosomal RNA in the contigs when using the RFAM database (Nawrocki et al. 2015). This program is publicly available at under the BSD licence and RFAM database is available at ftp://ftp.ebi.ac.uk/pub/databases/Rfam/ * ARAGORN (Laslett and Canback 2004): it is used to predict tRNA sequences in the contig. This program is publicly available at under the GPLv2 licence. * PILERCR (Edgar 2007): it is used to predict CRISPR repeats in your contig. This program is freely available at under a public licence. * Prodigal (Hyatt et al. 2010): it is used to predict the ORFs. When the contig is smaller than 100,000 bp, MetaProdigal (Hyatt et al. 2012) is automatically activated instead of normal Prodigal. This program is publicly available at under the GPLv3 licence. * DIAMOND (Buchfink et al. 2015): it is used to predict the function of proteins according to homology. This program is publicly available at under the GPLv3 licence. Databases must be created from FASTA files according to their instructions before running. * BLAST+ (Camacho et al. 2008): it is used to predict the function of the predicted proteins according to homology when DIAMOND is not able to retrieve any hit or such hit is a 'hypothetical protein'. This suite is publicly available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ under the GPLv2 licence. Databases are available at ftp://ftp.ncbi.nlm.nih.gov/blast/db/ or created using makeblastdb command. * HMMER (Finn et al. 2011): it is used to add more information of the predicted proteins according to Hidden Markov Models. This suite is publicly available at under the GPLv3 licence. Databases must be in HMM format and an example of potential database is PVOGs (https://github.com/EGTortuero/viga/tree/developer https://github.com/lastz/lastz https://github.com/hyattpd/prodigal/releases/ https://github.com/bbuchfink/diamondhttp://eddylab.org/infernal/ http://mbio-serv2.mbioekol.lu.se/ARAGORN/ http://drive5.com/pilercr/ http://hmmer.org/ http://dmk-brain.ecn.uiowa.edu/VOG/downloads/All/AllvogHMMprofiles.tar.gz).Phage Kitchen
4212023-11-13T11:12:18.883ZViPhOG
4222023-11-13T11:12:18.883ZViPhOGs - Informative Regions In Viral GenomesIn order to shed some light into this genetic dark matter we expanded the search of orthologous groups as potential markers to viral taxonomy from bacteriophages and included eukaryotic viruses, establishing a set of 31,150 ViPhOGs (Eukaryotic Viruses and Phages Orthologous Groups). To do this, we examine the non-redundant viral diversity stored in public databases, predict proteins in genomes lacking such information, and used all annotated and predicted proteins to identify potential protein domains. The clustering of domains and unannotated regions into orthologous groups was done using cogSoft. Finally, we employed a random forest implementation to classify genomes into their taxonomy and found that the presence or absence of ViPhOGs is significantly associated with their taxonomy. Furthermore, we established a set of 1457 ViPhOGs that given their importance for the classification could be considered as markers or signatures for the different taxonomic groups defined by the ICTV at the order, family, and genus levels.https://www.mdpi.com/1999-4915/13/6/1164/htmPhage Kitchenhttps://trello.com/1/cards/6201e751f1931e3e6fdcc1a1/attachments/6201e79af135207d425ea7d6/download/image.png
4232023-11-13T11:12:18.883ZViPTree : the Viral Proteomic Tree serverThe ViPTree server generates a "proteomic tree" of viral genome sequences based on genome-wide sequence similarities computed by tBLASTx. The original proteomic tree concept (i.e., "the Phage Proteomic Tree”) was developed by Rohwer and Edwards, 2002. A proteomic tree is a dendrogram that reveals global genomic similarity relationships between tens, hundreds, and thousands of viruses. It has been shown that viral groups identified in a proteomic tree well correspond to established viral taxonomies. The proteomic tree approach is effective to investigate genomes of newly sequenced viruses as well as those identified in metagenomes. 2021-10-04 version 1.9.1 Version of Virus-Host DB: RefSeq release 207 ViPTreeGen is a tool for automated generation of viral "proteomic tree" by computing genome-wide sequence similarities based on tBLASTx results.https://github.com/yosuken/ViPTreeGenhttps://www.genome.jp/viptree/Phage Kitchen
4242023-11-13T11:12:18.883ZVIRALPROVIRALpro is a predictor capable of identifying capsid and tail protein sequences using support vector machines (SVM) with an accuracy estimated to be between 90% and 97%. Predictions are based on the protein amino acid composition, on the protein predicted secondary structure, as predicted by SSpro, and on a boosted linear combination of HMM e-values obtained from 3,380 HMMs built from multiple sequence alignments of specific fragments - called contact fragments - of both capsid and tail sequences.http://download.igb.uci.edu/ http://scratch.proteomics.ics.uci.edu/explanation.html#VIRALproPhage Kitchen
4252023-11-13T11:12:18.883ZViralZone
4262023-11-13T11:12:18.883ZVirClust – a tool for hierarchical clustering, core gene detection and annotation of (prokaryotic) virusesHere, VirClust is presented – a novel tool capable of performing * hierarchical clustering of viruses based on intergenomic distances calculated from their protein cluster content, * identification of core proteins and * annotation of viral proteins. VirClust groups proteins into clusters both based on BLASTP sequence similarity, which identifies more related proteins, and also based on hidden Markov models (HMM), which identify more distantly related proteins. Furthermore, VirClust provides an integrated visualization of the hierarchical clustering tree and of the distribution of the protein content, which allows the identification of the genomic features responsible for the respective clustering. By using different intergenomic distances, the hierarchical trees produced by VirClust can be split into viral genome clusters of different taxonomic ranks. VirClust is freely available, as a web service (virclust.icbm.de) and as a stand-alone tool.https://doi.org/10.1101/2021.06.14.448304Phage Kitchenhttps://trello.com/1/cards/6178afc68303fa1f4a81b80f/attachments/6178b0154cb281263e8f9b56/download/F1.large.jpg
4272023-11-13T11:12:18.883ZVirFinderVirFinder: R package for identifying viral sequences from metagenomic data using sequence signatures
4282023-11-13T11:12:18.883ZVirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data**Background:** Identifying viral sequences in mixed metagenomes containing both viral and host contigs is a critical first step in analyzing the viral component of samples. Current tools for distinguishing prokaryotic virus and host contigs primarily use gene-based similarity approaches. Such approaches can significantly limit results especially for short contigs that have few predicted proteins or lack proteins with similarity to previously known viruses. **Methods:** We have developed VirFinder, the first k-mer frequency based, machine learning method for virus contig identification that entirely avoids gene-based similarity searches. VirFinder instead identifies viral sequences based on our empirical observation that viruses and hosts have discernibly different k-mer signatures. VirFinder’s performance in correctly identifying viral sequences was tested by training its machine learning model on sequences from host and viral genomes sequenced before 1 January 2014 and evaluating on sequences obtained after 1 January 2014. **Results:** VirFinder had significantly better rates of identifying true viral contigs (true positive rates (TPRs)) than VirSorter, the current state-of-the-art gene-based virus classification tool, when evaluated with either contigs subsampled from complete genomes or assembled from a simulated human gut metagenome. For example, for contigs subsampled from complete genomes, VirFinder had 78-, 2.4-, and 1.8-fold higher TPRs than VirSorter for 1, 3, and 5 kb contigs, respectively, at the same false positive rates as VirSorter (0, 0.003, and 0.006, respectively), thus VirFinder works considerably better for small contigs than VirSorter. 
VirFinder furthermore identified several recently sequenced virus genomes (after 1 January 2014) that VirSorter did not and that have no nucleotide similarity to previously sequenced viruses, demonstrating VirFinder’s potential advantage in identifying novel viral sequences. Application of VirFinder to a set of human gut metagenomes from healthy and liver cirrhosis patients reveals higher viral diversity in healthy individuals than cirrhosis patients. We also identified contig bins containing crAssphage-like contigs with higher abundance in healthy patients and a putative Veillonella genus prophage associated with cirrhosis patients. **Conclusions:** This innovative k-mer based tool complements gene-based approaches and will significantly improve prokaryotic viral sequence identification, especially for metagenomic-based studies of viral ecology. **Keywords:** Metagenome, Virus, k-mer, Human gut, Liver cirrhosishttps://github.com/jessieren/VirFinderhttps://link.springer.com/epdf/10.1186/s40168-017-0283-5?Phage Kitchen
4292023-11-13T11:12:18.883ZVIRIDIC (Virus Intergenomic Distance Calculator) computes pairwise intergenomic distances/similarities amongst viral genomes.VIRIDIC stand-alone is available now (see download tab). You can use it for jobs with high computational demand and/or for implementing it in your own pipelines. It is very easy to install on your own servers (it is wrapped as a Singularity). You can continue to use the VIRIDIC web-service for small to medium projects (e.g. up to 200 phages per project, no viromes please, they will crash our resources and the analysis will fail).https://doi.org/10.3390/v12111268 http://rhea.icbm.uni-oldenburg.de/VIRIDIC/Phage Kitchen
4302023-11-13T11:12:18.883Zvirnet
4312023-11-13T11:12:18.883ZVirNET - A deep attention model for viral reads identificationVirNet: A deep attention model for viral reads identification. This tool is able to identify viral sequences from a mixture of viral and bacterial sequences. It can also purify viral metagenomic data from bacterial contamination.https://github.com/alyosama/virnetPhage Kitchen
4322023-11-13T11:12:18.883ZVirsorter2 beta
4332023-11-13T11:12:18.883ZVirSorter2 Sullivan lab SOPhttps://www.protocols.io/view/viral-sequence-identification-sop-with-virsorter2-bwm5pc86Phage Kitchen
4342023-11-13T11:12:18.883ZVirsorter2 vs VirSorter, VirFinder, DeepVirFinder, MARVEL, and VIBRANThttps://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-020-00990-yPhage Kitchen
4352023-11-13T11:12:18.883ZVirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA virusesVirSorter2 applies a multi-classifier, expert-guided approach to detect diverse DNA and RNA virus genomes. It has made major updates to its previous version: * work with more viral groups including dsDNA phages, ssDNA viruses, RNA viruses, NCLDV (Nucleocytoviricota), Lavidaviridae (virophages); * apply machine learning to estimate viralness using genomic features including structural/functional/taxonomic annotation and viral hallmark genes; * train with high quality virus genomes from metagenomes or other sources. A tutorial/SOP on how to quality control VirSorter2 results is available. Source code of VirSorter2 is freely available ( and VirSorter2 is also available both on bioconda and as an iVirus app on CyVerse (https://github.com/jiarong/VirSorter2 https://bitbucket.org/MAVERICLab/virsorter2),https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-020-00990-y https://www.protocols.io/view/viral-sequence-identification-sop-with-virsorter2-btv8nn9w https://de.cyverse.org/de).Phage Kitchenhttps://trello.com/1/cards/6178aa86d796dd19c663c746/attachments/6178ad86b2b7bd7413e685f4/download/image.png
4362023-11-13T11:12:18.883ZVirus-Host DB
4372023-11-13T11:12:18.883ZVOGDB
4382023-11-13T11:12:18.883ZVPF
4392023-11-13T11:12:18.883ZVPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein familiesViral Protein Families (VPFs) can be used for the robust identification of new viral sequences in large metagenomics datasets. Despite the importance of VPF information for viral discovery, VPFs have not yet been explored for determining viral taxonomy and host targets.https://github.com/biocom-uib/vpf-toolshttp://bioinfo.uib.es/~recerca/VPF-Class/ https://academic.oup.com/bioinformatics/article-abstract/37/13/1805/6104829Phage Kitchenhttps://trello.com/1/cards/6178ade1cc3ee83046d70aba/attachments/619341538a0dbc7b547eaa21/download/image.png
4402023-11-13T11:12:18.883ZWhat the Phage: A scalable workflow for the identification and analysis of phage sequences* WtP is a scalable and easy-to-use workflow for phage identification and analysis. Our tool currently combines 10 established phage identification tools * An attempt to streamline the usage of various phage identification and prediction tools * The main focus is stability and data filtering/analysis for the user * The tool is intended for fasta and fastq reads to identify phages in contigs/reads * Proper prophage detection is not implemented (yet) - but a handful of tools report them - so they are mostly identifiedhttps://github.com/replikation/What_the_Phagehttps://doi.org/10.1101/2020.07.24.219899Phage Kitchenhttps://trello.com/1/cards/618325564f862049e2f9e7e7/attachments/6183259926d4af2200999bd4/download/wtp-flowchart-simple.png
4412023-11-13T11:12:18.883ZWIsH - who is the host? Predicting prokaryotic hosts from metagenomic phage contigsWIsH can identify bacterial hosts from metagenomic data, keeping good accuracy even on smaller contigs. WIsH predicts prokaryotic hosts of phages from their genomic sequences. It achieves 63% mean accuracy when predicting the host genus among 20 genera for 3 kbp-long phage contigs. Compared with the best current tool, WIsH shows much improved accuracy on phage sequences of a few kbp in length and runs hundreds of times faster, making it suited for metagenomics studies.https://github.com/soedinglab/WIsHhttps://doi.org/10.1093/bioinformatics/btx383Phage Kitchen
4422023-11-13T11:12:18.883ZXfams
4562024-01-23T14:08:16.566Z
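
Rows in the Bioinformatics table above begin with the `_id` value fused directly onto the ISO 8601 `Last Modified` timestamp (for example `3942023-11-13T11:12:18.883Z…` is id 394, modified in 2023). The following is a minimal Python sketch of how those two leading fields could be separated from the rest of a row. The names `ROW_PATTERN` and `parse_row` are hypothetical and not part of the Open Phage code. The sketch assumes every timestamp has exactly the millisecond-precision UTC form seen here, and it makes no attempt to split the remaining cells, because the cached text keeps no delimiter between them.

```python
import re
from datetime import datetime, timezone

# Hypothetical helper, not from the Open Phage repository.
# A row is: <digits of _id><YYYY-MM-DDTHH:MM:SS.mmmZ><all other cells, fused>.
ROW_PATTERN = re.compile(
    r"^(?P<id>\d+?)"                                            # _id: shortest digit run
    r"(?P<modified>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)"  # Last Modified
    r"(?P<rest>.*)$",                                           # remaining fused cells
    re.DOTALL,
)


def parse_row(line):
    """Split a cached row into id, timestamp, and the undivided remainder.

    Returns None for lines that are not data rows (headings, header rows,
    blank lines, or continuation lines of a long description).
    """
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        return None
    modified = datetime.strptime(
        match["modified"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return {"id": int(match["id"]), "modified": modified, "rest": match["rest"]}


if __name__ == "__main__":
    row = parse_row("3942023-11-13T11:12:18.883ZStringTie")
    print(row["id"], row["modified"].isoformat(), row["rest"])
```

Because the timestamp must start with exactly four digits followed by a hyphen, the split between `_id` and year is unambiguous even though both are digit runs. For anything beyond the first two fields, the API mentioned at the top of this sheet is likely a more reliable source than this cached text.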

Clinical Trials

_id | Last Modified | Trial ID | Title | Sponsor | Status | Phase | Start Date | Completion Date | Condition | Intervention | Location | URL
2 | 2023-11-08T12:52:50.529Z | NCT05269121 | Bacteriophage Therapy in Chronic Prosthetic Joint Infections | Adaptive Phage Therapeutics, Inc. | Not yet recruiting | Phase 1/2 | 2022-04-30T14:00:00.000Z | 2023-10-31T13:00:00.000Z | Prosthetic Joint Infection | Phage Therapy | USA | https://clinicaltrials.gov/study/NCT05269121
3 | 2023-11-08T12:53:05.786Z | changing this

Events

_id | Last Modified | Name | Cost | Description | Type | Organizer | Location | Date | URL | Notes
2 | 2023-11-08T20:48:02.090Z | Event Name | Free or paid; how much? | Brief summary of the event | meetups, conferences, workshops, etc |  | Event Location | Event Date | https://example.com | 

Lab Tools

_id | Last Modified | Name | Type | Manufacturer | Description | URL | Cost | Manuals | Reviews/Comments
2 | 2023-11-08T20:48:11.246Z | Example lab tool

Definitions

_id | Last Modified | Term | Definition | Author | Reference | Related Terms | Date Added | Notes
2 | 2023-11-08T20:48:19.863Z | Example definition

Grants

_id Grant IDTitleAgencyStatusOpen DateClose DateTopic AreaEligibilityAmountURL
22023-11-08T20:48:29.441ZExample grant

Resource Library

_id | Last Modified | Name | Category | Notes | URL | Source
2 | 2024-03-01T18:33:05.685Z | Rob Edwards' Viral Bioinformatics Tools | Bioinformatics | Periodically updated open spreadsheet of bioinformatics tools; owned by Rob Edwards | https://docs.google.com/spreadsheets/d/1ClNgip08olKK-oBMMlPHBwIcilqSxsan8MEaYphUei4/edit#gid=1636291468 | https://x.com/linsalrob/status/1625650675236454400?s=20
3 | 2023-11-08T06:00:30.693Z | Phage prediction tools | Bioinformatics | GitHub repo accompanying paper: "Gauge your phage: benchmarking of bacteriophage identification tools in metagenomic sequencing data" by Siu Fung Stanley Ho, Nicole E. Wheeler, Andrew D. Millard & Willem van Schaik (https://doi.org/10.1186/s40168-023-01533-x) | https://github.com/sxh1136/Phage_tools | Rob Edwards' Viral Bioinfo Tools
4 | 2023-11-08T06:01:50.638Z | Testing (5) Prophage finding tools | Bioinformatics | Comparison of five (text updated with 5th tool) prophage finding tools for bacterial genomics: PhiSpy, VirSorter, Phigaro, ProphET, PHASTER | https://nickp60.github.io/weird_one_offs/testing_3_prophage_finders/ | Rob Edwards' Viral Bioinfo Tools
5 | 2023-11-08T06:01:46.566Z | Phage Kitchen | Bioinformatics | Comparison and categorization of MANY phage bioinformatics tools | https://github.com/nbenzakour/phage-kitchen | Nouri Ben Zakour
6 | 2023-11-08T06:01:51.360Z | Choice of assembly software has a critical impact on virome characterisation | Bioinformatics | Phage assembly benchmark | https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-019-0626-5/tables/1 | Rob Edwards' Viral Bioinfo Tools
7 | 2023-11-08T06:01:52.016Z | Evaluation of computational phage detection tools for metagenomic datasets | Bioinformatics | Phage detection in metagenomes tools benchmark | https://www.frontiersin.org/articles/10.3389/fmicb.2023.1078760/full | Rob Edwards' Viral Bioinfo Tools
8 | 2023-11-08T06:01:52.955Z | Clinicaltrials.gov | Clinical Trials | Great source for clinical trials; these are NOT reviewed | Clinicaltrials.gov

Funding Opportunities

_id | Last Modified | ID | Name | Type | Provider | Deadline | Field/Area | Eligibility | Benefit | URL
2 | 2023-11-08T06:00:34.011Z | A unique identifier for each funding opportunity | The official name of the award, sponsorship, or studentship | The kind of support offered (e.g., award, scholarship, fellowship, studentship, grant) | The organization, institution, or group providing the funding | When the application must be submitted by | The academic or research area the funding is targeted towards | Key criteria applicants must meet | What the recipient gains, such as monetary award, tuition support, research funding, etc | https://example.com

Contributors

_id | Last Modified | Name | Affiliation | Socials
2 | 2023-11-08T06:00:29.610Z | Jan Zheng | Phage Australia, Phage Directory | https://twitter.com/yawnxyz, https://linkedin.com/in/janzh
3 | 2023-11-08T06:00:30.693Z | Jessica Sacher | Phage Australia, Phage Directory | https://twitter.com/JessicaSacher, https://www.linkedin.com/in/jessica-sacher
4 | 2023-11-14T08:18:06.795Z