GenBank Flat File

Example of GenBank Flat File format

  • NCBI GenBank Flat File Format Sample record: https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
  • Example file: genbank_example

Handling GenBank Flat Files with scikit-bio

Load a .gb file

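from skbio import DNA

# Read the GenBank record as a DNA sequence. The format is named explicitly
# instead of relying on scikit-bio's automatic format detection.
dna_seq = DNA.read('./sequence.gb', format='genbank')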

Display information on the sequence

dna_seq
DNA
----------------------------------------------------------------------
Metadata:
    'ACCESSION': 'AH002671 K02214 M18715 M18717 M18719 M18721 M18723
                  M18725 M18727 M18782 M18784 M18786'
    'COMMENT': <class 'str'>
    'DEFINITION': 'Homo sapiens alpha-amylase (AMY1) gene, complete
                   cds; and AMY1A gene, complete sequence.'
    'KEYWORDS': 'alpha-amylase; amylase.'
    'LOCUS': <class 'dict'>
    'REFERENCE': <class 'list'>
    'SOURCE': <class 'dict'>
    'VERSION': 'AH002671.2'
Interval metadata:
    41 interval features
Stats:
    length: 3429
    has gaps: False
    has degenerates: True
    has definites: True
    GC-content: 29.05%
----------------------------------------------------------------------
0    GCATTCAAGT TAACTCTTCC CCTTGGTATC TGTACATACC TTTGATGTCA GTGTTTAGTA
60   CACGTGGCTT GGTCACTTCA TGGCTAAAAA CGTGCTTGTG GAAGACAAGT CTGGCTTGGT
...
3360 TGCTGAATCT AAATTGTAAA ATTTAAAATT AAATGCAAAT CCGCAAAGCA ATAGCTAAGT
3420 GTGTTTCTT
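
The "Stats" block in the summary above reports GC-content and whether the sequence contains degenerate characters. As a rough illustration of what those two numbers mean, here is a minimal pure-Python sketch. The helper names gc_content and has_degenerates are hypothetical, and the definitions are simplified assumptions: scikit-bio's own calculation may treat degenerate codes differently, so this sketch is not expected to reproduce its figures exactly.

```python
# Hypothetical helpers for illustration only; not part of scikit-bio.
IUPAC_DEGENERATE = set("RYSWKMBDHVN")  # nucleotide codes standing for more than one base


def gc_content(seq: str) -> float:
    """Return the fraction of characters that are exactly G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def has_degenerates(seq: str) -> bool:
    """Return True if any character is an IUPAC degenerate code such as N."""
    return any(base in IUPAC_DEGENERATE for base in seq.upper())


toy = "GCATNNNN"  # toy input, not the record loaded above
print(gc_content(toy))       # 2 of 8 characters are G or C
print(has_degenerates(toy))  # the N characters are degenerate codes
```

On a real record the same functions could be called on str(dna_seq).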

Display accession numbers

dna_seq.metadata['ACCESSION']
'AH002671 K02214 M18715 M18717 M18719 M18721 M18723 M18725 M18727 M18782 M18784 M18786'
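
The ACCESSION value comes back as one whitespace-separated string. In GenBank records the first accession listed is the primary one and the rest are secondary, so a small helper can split them apart. This is a sketch; split_accessions is a hypothetical name, not a scikit-bio function.

```python
def split_accessions(field: str):
    """Split an ACCESSION string into (primary, list of secondary accessions)."""
    accessions = field.split()
    if not accessions:
        return None, []
    return accessions[0], accessions[1:]


# The value shown above, copied here so the snippet runs on its own.
accession_field = ('AH002671 K02214 M18715 M18717 M18719 M18721 M18723 '
                   'M18725 M18727 M18782 M18784 M18786')

primary, secondary = split_accessions(accession_field)
print(primary)         # AH002671
print(len(secondary))  # 11
```

With the loaded record, the call would be split_accessions(dna_seq.metadata['ACCESSION']).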

Display the sequence as a string

str(dna_seq)
'GCATTCAAGTTAACTCTTCCCCTTGGTATCTGTACATACCTTTGATGTCAGTGTTTAGTACACGTGGCTTGGTCACTTCATGGCTAAAAACGTGCTTGTGGAAGACAAGTCTGGCTTGGTGAGTCTGTGTGGTCAGCAGTCTCTGATCCGTGCAGGGTATTAATGTGTCAGGGCTGAGTGTTCTGAGATTTATCTAGAGGCTGGGAAGGGCTCCTGAACCAGTTGTTTCCGTCTTGTCGGTCTGTCAGGGTTGGAAAGTCCAAGCCATAGGACCCAGTTTCCTTTCTTAGCTTACGTTATCTACCAGAGCACCGTGGGCTGTTACTTGCCTTGAGTTGGAAGCGGTTCGCATTTATACCGGTAAATGTATTCATCCTTTTAATTTATGTAAAGTTTTTTAGTATGCAATTCTCGATCTTTTAAGAGTTGACAACAAATTTTGGTTTTCTGCTGTTATGTGAGAACATTAGGCCACAGCAACATGTCATTGTGTAAGGAAAAATAAAAGTGCTACCATATGCAAAAAAAAAAAAAAAAAGAAAAGAAAAGAAACATTAATGTCTAAGAGGTCATTGAGATGATTTCCATGAGAGACTTTTTGATGTTCTTCACCAGTTAGGATTATTATTGATAATCCTTTTCAGATTATGAATAAACAGTTTGCCCTCAAGTATTTATTCATGCTACTATTTACATTGTAAAATGTGCTTCTTACAGGAATATAAATAGTTTCTGGAAAGGACACTGACAACTTCAAAGCAAAATGAAGCTCTTTTGGTTGCTTTTCACCATTGGGTTCTGCTGGGCTCAGTATTCCTCAAATACACAACAAGGACGAACATCTATTGTTCATCTGTTTGAATGGCGATGGGTTGATATTGCTCTTGAATGTGAGCGATATTTAGCTCCCAAGGGATTTGGAGGGGTTCAGGTGGGTATGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTAATTTGTAGGTCTCTCCACCAAATGAAAATGTTGCCATTCACAACCCTTTCAGACCTTGGTGGGAAAGATACCAACCAGTTAGCTATAAATTATGCACAAGATCTGGAAATGAAGATGAATTTAGAAACATGGTGACTAGATGCAACAATGTTGGGGTAAGTGAATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGTTTTCTAGGTTCGTATTTATGTGGATGCTGTAATTAATCATATGTGTGGTAATGCTGTGAGTGCAGGAACAAGCAGTACCTGTGGAAGTTACTTCAACCCTGGAAGTAGGGACTTTCCAGCAGTCCCATATTCTGGATGGGATTTTAATGATGGTAAATGTAAAACTGGAAGTGGAGATATCGAGAACTATAATGATGCTACTCAGGTAATTTTTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACCTCAACAGGTCAGAGATTGTCGTCTGTCTGGTCTTCTCGATCTTGCACTGGGGAAGGATTATGTGCGTTCTAAGATTGCCGAATATATGAACCATCTCATTGACATTGGTGTTGCAGGGTTCAGAATTGATGCTTCCAAGCACATGTGGCCTGGAGACATAAAGGCAATTTTGGACAAACTGCATAATCTAAACAGTAACTGGTTCCCGGAAGGTAGTAAACCTTTCATTTACCAGGAGGTACGTCAATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTCTTCTAGGTAATTGATCTG
GGTGGTGAGCCAATTAAAAGCAGTGACTACTTTGGTAATGGCCGGGTGACAGAATTCAAGTATGGTGCAAAACTCGGCACAGTTATTCGCAAGTGGAATGGAGAGAAGATGTCTTACTTAAAGTAAATAAATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCAAAAATAGGAACTGGGGAGAAGGTTGGGGTTTCATGCCTTCTGACAGAGCGCTTGTCTTTGTGGATAACCATGACAATCAACGAGGACATGGCGCTGGAGGAGCCTCTATACTTACCTTCTGGGATGCTAGGTAAAAAACCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTAACTTTCAGGCTGTACAAAATGGCAGTTGGATTTATGCTTGCTCATCCTTATGGATTTACACGAGTAATGTCAAGCTACCGTTGGCCAAGATATTTTGAAAATGGAAANGTAAGTTTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGTAATTAAGGATGTTAATGATTGGGTTGGGCCACCAAATGATAATGGAGTAACTAAAGAAGTTACTATTAATCCAGACACTACTTGTGGCAATGACTGGGTCTGTGAACATCGATGGCGCCAAATAAGGTGAGAATATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTGTTTTTAGGAACATGGTTAATTTCCGCAATGTAGTGGATGGCCAGCCTTTTACAAACTGGTATGATAATGGGAGCAACCAAGTGGCTTTTGGGAGAGGAAACAGAGGATTCATTGTTTTCAACAATGATGACTGGTAAGTAAATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTATTTTACAGGACATTTTCTTTAACTTTGCAAACTGGTCTTCCTGCTGGCACATACTGTGATGTCATTTCTGGAGATAAAATTAATGGCAACTGCACAGGCATTAAAATCTACGTTTCTGATGATGGCAAAGCTCATTTTTCTATTAGTAACTCTGCTGAAGATCCATTTATTGCAATTCATGCTGAATCTAAATTGTAAAATTTAAAATTAAATGCAAATCCGCAAAGCAATAGCTAAGTGTGTTTCTT'
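
The string above contains long runs of N, which mark stretches where the bases are unknown. A minimal sketch for locating such runs with a regular expression follows. The function name unknown_runs is hypothetical, and the toy input is made up for illustration.

```python
import re


def unknown_runs(seq: str):
    """Return (start, end) pairs, 0-based and end-exclusive, for each run of N."""
    return [(m.start(), m.end()) for m in re.finditer(r"N+", seq)]


toy = "ACGTNNNNACGTNNAC"  # toy input, not the record loaded above
print(unknown_runs(toy))  # [(4, 8), (12, 14)]
```

Applied to the loaded record, unknown_runs(str(dna_seq)) would list the positions of every N run in the sequence.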
