Computer lab availability is here.

Your work

Update your resumes!

Here is a link to a folder with presentations: our three posters, the three abstracts submitted for the Undergraduate Research Festival, and the oral presentation from William & Mary. Please include these presentations in your resumes as appropriate to the author list.

I think it is a good idea for you to list GenBank submissions in your resume. You can list them under your presentations/publications with a note such as "*This is a submission of an annotated bacteriophage genome to the public database GenBank. My role was to annotate a section (or X genes) in this genome." I will update the table below with the phages' GenBank accession numbers once they are received later this week.

This week we will write Genome Announcements to describe our phages. These are real publications with each of you as a coauthor! You should put a note on the item in your resume with "I am included as a co-author as a member of the 2016-2017 VCU Phage Hunters. My role was to annotate a section (or X genes) of the bacteriophage Y". You can also specify what role you played in writing the GA. Students who discover and characterize the bacteriophage will be listed individually.

Example Genome Announcements for Bacillus phages DirtyBetty and Kida from VCU, Bacillus phage Belinda from JMU, Bacillus phage SalinJah from UMBC, and Bacillus phage vB_BceS-MY192 from a non-SEA-PHAGES group.

Each journal has its own rules for the formatting and content of article submissions. Instructions to authors for Genome Announcements are linked here.



| | Section 1 | Section 1 | Section 2 | Section 2 | Section 3/1 | Section 3/1 |
|---|---|---|---|---|---|---|
| Fasta file | Zainny | Janet | OTooleKemple52 | AaronPhadgers | Bubs | PPIsBest |
| GenBank accession number | MF288920.1 | MF288922.1 | MF288921.1 | MF288919.1 | MF288918.1 | MF288917.1 |
| Discovered by | Zainab Gbadamosi | Brenna Kent | Thomas Raymond | Rahul Warrier | Erin Cochran | Nashwan Farooque |
| Phage morphology | myoviridae | myoviridae | myoviridae | myoviridae | myoviridae | myoviridae |
| Genome length (bp) | 162692 | 160705 | 161807 | 161772 | 162449 | 162281 |
| # ORFs | 303 | 285 | 291 | 301 | 302 | 301 |
| # tRNAs | 0 | 7 | 7 | 3 | 0 | 0 |
| % GC | 38.7 | 38 | 37.9 | 38.7 | 38.8 | 38.6 |
| DNAMaster file | Zainny_aunoannotated.dnam5 | Janet_autoannotated.dnam5 | OTooleKemple52_autoannotated.dnam5 | AaronPhadgers_autoannotated.dnam5 | Bubs_autoannotated.dnam5 | PPIsBest_autoannotated.dnam5 |
| Merged DNAMaster file | Zainny_final.dnam5 | | | | | |
| Genbank file for journal club | | Janet Genbank file | OTooleKemple52 Genbank file | | | PPIsBest Genbank file |
| Genemark coding potential | Zainny coding potential | Janet coding potential | OTooleKemple52 coding potential | AaronPhadgers coding potential | Bubs coding potential | PPIsBest coding potential; PPIsBest annotators |
| Six frame translation | Zainny 6 frame | Janet 6 frame | OTooleKemple52 6 frame | AaronPhadgers six frame | Bubs six frame | |
| GenBank submission file | Genbank file; Submission file | Genbank file; Submission file | Genbank file; Submission file | Genbank file; Submission file | Genbank file; Submission file | |
| Genbank author list | Zainny annotators | Janet annotators | OKT annotators | AaronPhadgers annotators | Bubs annotators | |
| GA draft | Zainny GA | Janet GA | | | | |
| Plate pic | | | | | | |
| TEM image | | | | | | |
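The genome length and % GC rows in the table can be checked directly from each phage's fasta file. The following is a minimal sketch, assuming a single-record fasta file read in as text; the function names and the toy sequence are made up for illustration and are not part of DNAMaster or any course tool.

```python
def read_fasta_sequence(text):
    """Return the sequence from single-record fasta text, uppercased, header dropped."""
    lines = [line.strip() for line in text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">")).upper()


def genome_stats(sequence):
    """Return (length in bp, percent GC rounded to one decimal place)."""
    length = len(sequence)
    if length == 0:
        raise ValueError("empty sequence")
    gc_count = sequence.count("G") + sequence.count("C")
    return length, round(100.0 * gc_count / length, 1)


# Toy record for illustration only; this is not a real phage genome.
example = ">toy_phage\nATGCATGCAA\nTTGGCCAATT\n"
length, percent_gc = genome_stats(read_fasta_sequence(example))
print(length, percent_gc)
```

To use it on a real genome, read the fasta file with `open(path).read()` and pass the text to `read_fasta_sequence`.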

VCU phages are archived here.

Co-authors for Genbank submissions are here.

Resetting Phamerator

If ever you need to reset settings in Phamerator, go to Edit > Preferences and type the following:


Database: Bacillus_Draft

After forcing a database update, you will be able to access the phage database (you may need to restart Phamerator).

CACAO GO Annotation

Phage Hunters GO annotation guide

GO annotation training slides from TAMU

 Allison's slides for March 22nd introduction to GO annotation, through slide 17

Form your teams by 3/17 using this link. For now, everyone should sign up for a team.

Week of March 20th- this week we will prepare for you to submit your first annotations starting on April 3rd, but by preparing, you'll also be ready to submit challenges next week.

- Choose a favorite endolysin (Section 1)/capsid (Section 3)/holin (Section 2) pham from the Bacillus phage collection in Phamerator.

- Identify a published homolog to your favorite endolysin/capsid/holin pham using Blastp and HHPred. Record on your wiki one or more screenshots of the match, along with its probability and E-value. For this work, hits must have a maximum E-value of 10^-7 for Blastp and a minimum probability of 0.9 for HHPred. Manual inspection of the alignments is required to ensure you have at least 75% coverage and 30% identity between the two sequences. Note for endolysin: you will probably need to do this after splitting your sequence into the catalytic and cell-wall binding domains.

- From HHPred, save the paper describing the published homolog. Scour that paper to identify the experiment that supports the functional annotation of that protein. Papers found through HHPred are likely to describe crystal structures of proteins, so support for the functional annotation may come from a prior study reporting a wet-lab characterization of that protein's activity. You'll have to learn to trace through a paper's references to find an appropriate source. On your wiki, document the experiment you wish to use. Grab a screenshot of the figure or table and of the methods used to produce that data. Write a description in your own words of how the figure/table "proves" the activity of the protein.
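The cutoffs in the steps above can be written down as a quick self-check before you record a hit on your wiki. This is only a sketch: the function names are made up, coverage, identity and probability are assumed to be fractions between 0 and 1, and the coverage/identity check is assumed to apply to the Blastp alignment. If your HHPred output shows probability as a percentage, divide by 100 first. Passing the check does not replace manual inspection of the alignment.

```python
MAX_BLASTP_EVALUE = 1e-7       # maximum E-value for Blastp
MIN_HHPRED_PROBABILITY = 0.9   # minimum probability for HHPred (as a fraction)
MIN_COVERAGE = 0.75            # at least 75% coverage
MIN_IDENTITY = 0.30            # at least 30% identity


def blastp_hit_ok(evalue, coverage, identity):
    """True if a Blastp hit meets the E-value, coverage and identity cutoffs."""
    return (
        evalue <= MAX_BLASTP_EVALUE
        and coverage >= MIN_COVERAGE
        and identity >= MIN_IDENTITY
    )


def hhpred_hit_ok(probability):
    """True if an HHPred hit meets the probability cutoff."""
    return probability >= MIN_HHPRED_PROBABILITY


# Invented numbers for illustration, not real search results.
print(blastp_hit_ok(evalue=1e-20, coverage=0.90, identity=0.45))
print(hhpred_hit_ok(probability=0.85))
```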

Week of March 27th- this week you will be able to submit your first challenges to GO annotations

Week of April 3rd- this week you will submit your first GO annotations!

Week of April 17th- annotation week. You can do transfer annotations to connect your protein of interest to an experimentally verified protein!

SEA PHAGES GO annotation term for transfer annotation

You can use this code for transfer annotations where you choose a homology-based evidence code. An extensive list of all the reference GO codes is available here.

go_ref_id: GO_REF:0000100
title: Gene Ontology annotation by SEA-PHAGE biocurators
authors: Ivan Erill, SEA-PHAGE biocurators
year: 2014
abstract: This GO reference describes the criteria used by biocurators of the SEA-PHAGE consortium for the annotation of predicted gene products from newly sequenced bacteriophage genomes in the SEA-PHAGE and other databases and in the GenBank records periodically released to NCBI for these genomes. In particular, this GO reference describes the criteria used to assign evidence codes ISS, ISA, ISO, ISM, IGC and ND. To assign ISS, ISA, ISO and ISM evidence codes, SEA-PHAGE biocurators use a varied array of bioinformatics tools to establish homology and conservation of sequence and structure functional determinants with proteins from multiple organisms with published association to experimental GO terms and lacking NOT qualifiers. These proteins are referenced in the WITH field of the annotation using their xref database accession. The primary tools for homology search in ISS, ISA, ISO and ISM assignments are BLASTP and HHpred, using a maximum e-value of 10^-7 for BLASTP and a minimum probability of 0.9 for HHpred, and manual inspection of alignments in both cases. For ISS and ISA assignments, BLASTP alignments are required to have at least 75% coverage and 30% identity. For ISO assignments, orthology is further validated using reciprocal BLASTP with the identified hit. For HHpred results, ISS or ISM annotations are made only if the source for the original GO annotation explicitly defines a matched domain function, or if more than half of the domains of the query protein are identified in the matching protein. All ISS, ISA, ISO and ISM assignments entail the manual verification of the source for the GO term in the matching protein sequence and critical curator assessment of the likelihood of preservation of function, process or component in the context of bacteriophage biology.
IGC codes are assigned on the basis of suggestive evidence for function based on synteny, as inferred from whole-genome comparative analyses of multiple bacteriophage genomes using primarily the Phamerator software platform, and with special emphasis on the bacteriophage virion structure and assembly genes. When extensive review of published literature on putative homologs reveals no experimental evidence of function, component or process for a particular gene product, it is assigned an ND evidence code and annotated to the root term for Cellular Component, Molecular Function and Biological Process. As part of the review process for assignment of ISS, ISA, ISO, IGC and ISM evidence codes, SEA-PHAGE curators are required to analyze the reference literature for identified matches and shall perform GO annotations with appropriate evidence codes if these were not available.

Some Guidelines and Links for GO Annotation:

1. Identify a protein of interest: For a standard annotation, the protein must have experimental evidence for function in a primary literature article. For a transfer annotation of a protein from one of our phage proteins to a close homolog, the homolog protein must have an existing GO annotation with experimental evidence for function in a primary literature citation.

2. Select an evidence code: You will have to satisfy one of the approved evidence codes based on the type of data documenting the protein's function.

  • Here's a list of the evidence codes that CACAO students may use:
  1. IDA: Inferred from Direct Assay
  2. IMP: Inferred from Mutant Phenotype
  3. IGI: Inferred from Genetic Interaction - requires with/from field to be filled in
  4. ISS: Inferred from Sequence or Structural Similarity - almost always requires with/from field to be filled in
  5. ISO: Inferred from Sequence Orthology - requires with/from field to be filled in
  6. ISA: Inferred from Sequence Alignment - requires with/from field to be filled in
  7. ISM: Inferred from Sequence Model - requires with/from field to be filled in
  8. IGC: Inferred from Genomic Context

  ALL other codes, even if used correctly, will cause the annotation to be rejected by the judges
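The list above can be turned into a small checker to run before you submit. This is a sketch under stated assumptions: the function name and the placeholder accession are made up, and ISS is treated as always needing the with/from field even though the list says "almost always". It is not part of the CACAO or GONUTS software.

```python
# code -> (meaning, whether the with/from field must be filled in)
CACAO_EVIDENCE_CODES = {
    "IDA": ("Inferred from Direct Assay", False),
    "IMP": ("Inferred from Mutant Phenotype", False),
    "IGI": ("Inferred from Genetic Interaction", True),
    "ISS": ("Inferred from Sequence or Structural Similarity", True),  # assumption: always required
    "ISO": ("Inferred from Sequence Orthology", True),
    "ISA": ("Inferred from Sequence Alignment", True),
    "ISM": ("Inferred from Sequence Model", True),
    "IGC": ("Inferred from Genomic Context", False),
}


def check_evidence(code, with_from=""):
    """Return a list of problems with an evidence code; an empty list means none found."""
    code = code.strip().upper()
    if code not in CACAO_EVIDENCE_CODES:
        return [f"{code} is not on the CACAO list"]
    _, needs_with_from = CACAO_EVIDENCE_CODES[code]
    if needs_with_from and not with_from.strip():
        return [f"{code} needs the with/from field filled in"]
    return []


# "UniProtKB:EXAMPLE" is a placeholder, not a real accession.
print(check_evidence("ISO"))
print(check_evidence("ISS", with_from="UniProtKB:EXAMPLE"))
```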

***Here is a full description of what all those codes mean

3. Select a GO term: You will have to pick the appropriate GO term, which can only be as specific as the data shows.

First, think about whether your protein's evidence points to Cellular Component, Biological Process or Molecular Function.

***Here is a full description of CC/BP/MF.

Second, check out existing GO term annotations in UniProt, AmiGO and QuickGO. (Make sure to record your UniProt identifier and confirm you have the right protein!)

Third, check out existing GO term annotations on the GONUTS wiki. You can re-use a GO term to offer additional support. You can't submit a second annotation using the same evidence.

Fourth, re-examine your evidence and select the appropriate GO term. 

4. Write your notes: Clearly and carefully document the evidence that is displayed in a figure or table for your protein. Write this evidence in your own words, without copy/pasting from the article.

Here's a paper describing GONUTS wiki.



Section 1 Endolysin functional annotation

TsarBomba gp40 pham 168

Anthos gp54 pham 182

Nigalana gp74 pham 3097

Vinny gp63 pham 4563

Claudi gp39 pham 4564

Taylor gp31 pham 137

Pegasus gp108 pham 1683

Harambe pham 1759 (I'm guessing on this one)

Update: AJ posted the combined files here for Endolysin Catalytic domain and Endolysin cell wall binding domain. Use these for your journal club phylogeny!

This file contains the full length endolysin sequences plus three structural homologs (to Nigalana, Anthos and TsarBomba groups).

March 27th- challenge

  1. Can you figure out what is wrong with this endolysin(?) annotation from last year? Evaluate the suitability of the GO ID, the Reference, the Evidence Code, and the evidence described by the student in the Notes. Post this evaluation to your wiki.
  2. Can you figure out what is wrong with this endolysin annotation from last year? Note that the annotation for "GO:0009253" was rejected, and the annotation for "GO:0051672" was accepted by the TAMU folks after they made a minor change. Evaluate the suitability of the GO ID, the Reference, the Evidence Code, and the evidence described by the student in the Notes. Post this evaluation to your wiki.


Section 2 Holin

Sequence file for holin pham: holin_pham634

March 29th- challenge

Section 3 Capsid

Sequence file for capsid phams:

CapsidPham58 (myoviruses)

CapsidPham65 (podoviruses)

March 29th- challenge

  1. Can you figure out what is wrong with this capsid annotation from last year? Evaluate the suitability of the GO ID, the Reference, the Evidence Code, and the evidence described by the student in the Notes. Post this evaluation to your wiki.
  2. Can you figure out what is wrong with this capsid annotation from last year? Look for the row with the white background in the table, under the automatically generated annotations with a green background. Evaluate the suitability of the GO ID, the Reference, the Evidence Code, and the evidence described by the student in the Notes. Post this evaluation to your wiki.

Comparative Genomics tools

You can use these tools to create data, but bioinformatics isn't just button clicking. Make sure you know what you are comparing and why you are doing that comparison, and work hard to understand what the data mean.

  1. Phamerator: Comparative genomics tool in the SEA Virtual Machine in the computer lab. You can use this tool to generate map images, explore protein phamilies, and view protein domains (this function seems to not be working at the moment).
  2. Gepard dot plot tool: This tool takes a fasta file as input for both the x and y axes. You can upload the same or different files depending on the comparison you want to make. Requires your Java to be up-to-date. Default word size is 10 bp, so the tool places a dot for each 10 bp stretch that is identical between two sequences in a pairwise alignment. Note that for the journal club you can use blastn, but you are also welcome to try this tool to produce a prettier dot plot.
  3. Clustal Omega: Quick multiple sequence alignment and phylogenetic tree tool. This tool takes a multi-fasta file as input. Requires your Java to be up-to-date.
  4. SplitsTree: This tool takes a modified phages vs. phams table (which is then converted to a nexus file) as input and generates an unrooted phylogenetic tree based on shared protein content. The splits in the tree indicate the extent of shared content. A nexus file you can use as input, based on our current Phamerator database, is on Blackboard in Course Documents because the wiki wouldn't let me upload it.
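To make the dot plot idea concrete, here is a minimal sketch of the word-match step described for the Gepard tool above: one dot for every position where the two sequences share an identical word. This is not Gepard's own implementation. It only illustrates the idea, looks at the forward strand only, and uses a made-up function name.

```python
def kmer_dots(seq_x, seq_y, word_size=10):
    """Return (x, y) start positions where seq_x and seq_y share an identical word."""
    # Index every word in seq_y by the positions where it starts.
    index = {}
    for y in range(len(seq_y) - word_size + 1):
        index.setdefault(seq_y[y:y + word_size], []).append(y)

    # Walk along seq_x and look up each word in the index.
    dots = []
    for x in range(len(seq_x) - word_size + 1):
        for y in index.get(seq_x[x:x + word_size], []):
            dots.append((x, y))
    return dots


# Toy sequences with a small word size so the output is easy to read.
print(kmer_dots("ACGTACGT", "TACGTT", word_size=3))
```

Comparing a sequence against itself gives an unbroken main diagonal, and repeats show up as additional dots off the diagonal.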

Posters for VCU Undergraduate Research Festival

Organized by the Undergraduate Research Opportunities Program (UROP) and part of VCU Student Research Weeks, the annual VCU Poster Symposium for Undergraduate Research and Creativity is a wonderful opportunity for students to present their research endeavors and creative scholarship to their academic peers, members of the VCU faculty, community members, and friends and family.  All undergrads from every discipline are encouraged to present and attend.  Presentations may be for completed research projects, completed papers, or research in progress.

Projects involving creative work such as prose or poetry, performances, and artwork will be considered for acceptance if they are part of a scholarly project undertaken by the student.  We are currently accepting poster abstracts up until the deadline of March 22nd, 2017.  All abstracts should be submitted to

After students are notified of their acceptance, we will accept electronic file submission of their posters.  Note: We hold poster workshops Jan. – Mar. and we are now able to print research posters free of cost to our students!  A schedule for upcoming poster workshops will be posted on the UROP Blog next week.

Abstracts should include: Name/Major of student, Name/Dept. of Faculty Mentor, Title of research Project, Brief description of research project.  All inquiries to  

Research Weeks will take place throughout the month of April, and the symposium will take place on April 19th.  We would also be happy to add your event to the week(s) and assist in publicizing it.

Abstracts are due March 22nd!

  1. Yeast 2 Hybrid: Rashmi can submit abstract. The draft poster is here. We need one or two people to finalize the poster for printing.
  2. Host range: Emaan can submit abstract. The draft poster is here. We need two people to finalize the poster for printing.
  3. Spring bioinformatics poster!! We will talk about what we want to do and who can contribute pieces.

For February 15th week:

For annotating functional information, let's look at OTooleKemple52 terminase protein.

  1. Blastp conserved domains- A conserved domain is a distinct functional (and often structural) unit of a protein. A sub-tool in Blastp queries a conserved domains database [consisting of pre-calculated position-specific score matrices made from multiple sequence alignments] to identify conserved domains in protein sequences. Please report the protein function name and the E-value. The source is blastp conserved domains. Hit must have an E-value of 10^-5 or lower. If you want to know more, this page describes the tool and how to read the information.
  2. HHPred- This tool compares your amino acid sequence to a structural database to predict function. The algorithm (hidden Markov model) uses a profile-sequence comparison to find the best match to your query. The profile of each amino acid position includes information about residue conservation and preferred amino acids ("how important each position is for defining other members of the protein family"). The database behind HHPred is a protein structural database. Structure leads to function, and structure is more conserved than sequence. Please report the protein function name, the probability and the E-value. The source is HHPred. Hit must have a probability >80% and an E-value of 10^-5 or lower. If you want to know more, this page describes HHPred.
  3. Phamerator- This is a mapping tool that visually shows each protein in a phage genome as well as blastp conserved domains. It is only available through a Linux virtual machine installed on the computers in Harris 3112. Please report the protein function name. The source is Phamerator.
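The reporting rules for the three sources above can be sketched as one helper that writes a note only when a hit passes its cutoffs. The function name, the note layout and the example numbers are all made up for illustration, and HHPred probability is assumed to be given as a percentage.

```python
MAX_EVALUE = 1e-5               # "E-value of 10^-5 or lower"
MIN_HHPRED_PROBABILITY = 80.0   # percent; hits must be above this

SOURCES = ("blastp conserved domains", "HHPred", "Phamerator")


def function_note(function_name, source, evalue=None, probability=None):
    """Return a one-line note for a hit that passes the cutoffs, otherwise None."""
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source}")
    if source == "Phamerator":
        # Only the function name is reported for Phamerator.
        return f"{function_name}; source: {source}"
    if evalue is None or evalue > MAX_EVALUE:
        return None
    if source == "blastp conserved domains":
        return f"{function_name}; source: {source}; E-value: {evalue:.1e}"
    # Remaining case is HHPred, which also needs a probability above 80%.
    if probability is None or probability <= MIN_HHPRED_PROBABILITY:
        return None
    return (f"{function_name}; source: {source}; "
            f"probability: {probability:.1f}%; E-value: {evalue:.1e}")


# Invented values for illustration, not real results for OTooleKemple52.
print(function_note("terminase", "HHPred", evalue=1e-30, probability=99.5))
```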

For February 8 week labs:

This Excel file has your annotation ranges. You have a month to complete the annotation of up to 80 proteins, as I'd like these submitted by spring break. This is a lot! Note that I used four people from Section 1 to help annotate Section 3 phages because of the difference in the number of students. Please use your lab time efficiently and eliminate distractions so you can get as much work done in lab as possible. You're welcome to bring headphones and tune out your classmates.

For February 1st week labs: 

Slides about blast and coding potential data, and how to get started with annotation.

Sequence file for OTooleKemple52 protein 1.

Sequence file for PPIsBest protein1.

Sequence file for Janet protein 1.

The DNAMaster Annotation Guide is on Blackboard! See page 64 for guiding principles of annotation.

Here are three slides on how to make a wiki page! I inserted these slides into my wiki page by clicking on the link button above. 

