Example of GenBank Flat File format
- NCBI GenBank Flat File Format Sample record: https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
- Example file: genbank_example
Handling a GenBank Flat File
Load a .gb file
import os
from skbio import DNA, RNA, sequence
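# Parse the GenBank record into a DNA object; format='genbank' names the parser explicitly
dna_seq = DNA.read('./sequence.gb', format='genbank')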
Display summary information on the sequence (the repr of dna_seq)
DNA
----------------------------------------------------------------------
Metadata:
'ACCESSION': 'AH002671 K02214 M18715 M18717 M18719 M18721 M18723
M18725 M18727 M18782 M18784 M18786'
'COMMENT': <class 'str'>
'DEFINITION': 'Homo sapiens alpha-amylase (AMY1) gene, complete
cds; and AMY1A gene, complete sequence.'
'KEYWORDS': 'alpha-amylase; amylase.'
'LOCUS': <class 'dict'>
'REFERENCE': <class 'list'>
'SOURCE': <class 'dict'>
'VERSION': 'AH002671.2'
Interval metadata:
41 interval features
Stats:
length: 3429
has gaps: False
has degenerates: True
has definites: True
GC-content: 29.05%
----------------------------------------------------------------------
0 GCATTCAAGT TAACTCTTCC CCTTGGTATC TGTACATACC TTTGATGTCA GTGTTTAGTA
60 CACGTGGCTT GGTCACTTCA TGGCTAAAAA CGTGCTTGTG GAAGACAAGT CTGGCTTGGT
...
3360 TGCTGAATCT AAATTGTAAA ATTTAAAATT AAATGCAAAT CCGCAAAGCA ATAGCTAAGT
3420 GTGTTTCTT
Display accession numbers
dna_seq.metadata['ACCESSION']
'AH002671 K02214 M18715 M18717 M18719 M18721 M18723 M18725 M18727 M18782 M18784 M18786'
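The ACCESSION value is a single space-separated string. In the GenBank flat file format the first entry is the primary accession and any remaining entries are secondary accessions, so a plain split is enough to separate them. The sketch below copies the string from the output above so that it runs without the .gb file; the variable names are illustrative only.

```python
# ACCESSION value copied from the output above so the snippet runs standalone.
accession_field = (
    'AH002671 K02214 M18715 M18717 M18719 M18721 '
    'M18723 M18725 M18727 M18782 M18784 M18786'
)

# Split on whitespace: first item is the primary accession, the rest are secondary.
accessions = accession_field.split()
primary = accessions[0]
secondary = accessions[1:]

print(primary)
print(secondary)
```

With a loaded record, the same split could be applied to dna_seq.metadata['ACCESSION'] directly.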
Display the sequence as a string
str(dna_seq)
'GCATTCAAGTTAACTCTTCCCCTTGGTATCTGTACATACCTTTGATGTCAGTGTTTAGTACACGTGGCTTGGTCACTTCATGGCTAAAAACGTGCTTGTGGAAGACAAGTCTGGCTTGGTGAGTCTGTGTGGTCAGCAGTCTCTGATCCGTGCAGGGTATTAATGTGTCAGGGCTGAGTGTTCTGAGATTTATCTAGAGGCTGGGAAGGGCTCCTGAACCAGTTGTTTCCGTCTTGTCGGTCTGTCAGGGTTGGAAAGTCCAAGCCATAGGACCCAGTTTCCTTTCTTAGCTTACGTTATCTACCAGAGCACCGTGGGCTGTTACTTGCCTTGAGTTGGAAGCGGTTCGCATTTATACCGGTAAATGTATTCATCCTTTTAATTTATGTAAAGTTTTTTAGTATGCAATTCTCGATCTTTTAAGAGTTGACAACAAATTTTGGTTTTCTGCTGTTATGTGAGAACATTAGGCCACAGCAACATGTCATTGTGTAAGGAAAAATAAAAGTGCTACCATATGCAAAAAAAAAAAAAAAAAGAAAAGAAAAGAAACATTAATGTCTAAGAGGTCATTGAGATGATTTCCATGAGAGACTTTTTGATGTTCTTCACCAGTTAGGATTATTATTGATAATCCTTTTCAGATTATGAATAAACAGTTTGCCCTCAAGTATTTATTCATGCTACTATTTACATTGTAAAATGTGCTTCTTACAGGAATATAAATAGTTTCTGGAAAGGACACTGACAACTTCAAAGCAAAATGAAGCTCTTTTGGTTGCTTTTCACCATTGGGTTCTGCTGGGCTCAGTATTCCTCAAATACACAACAAGGACGAACATCTATTGTTCATCTGTTTGAATGGCGATGGGTTGATATTGCTCTTGAATGTGAGCGATATTTAGCTCCCAAGGGATTTGGAGGGGTTCAGGTGGGTATGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTAATTTGTAGGTCTCTCCACCAAATGAAAATGTTGCCATTCACAACCCTTTCAGACCTTGGTGGGAAAGATACCAACCAGTTAGCTATAAATTATGCACAAGATCTGGAAATGAAGATGAATTTAGAAACATGGTGACTAGATGCAACAATGTTGGGGTAAGTGAATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGTTTTCTAGGTTCGTATTTATGTGGATGCTGTAATTAATCATATGTGTGGTAATGCTGTGAGTGCAGGAACAAGCAGTACCTGTGGAAGTTACTTCAACCCTGGAAGTAGGGACTTTCCAGCAGTCCCATATTCTGGATGGGATTTTAATGATGGTAAATGTAAAACTGGAAGTGGAGATATCGAGAACTATAATGATGCTACTCAGGTAATTTTTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNACCTCAACAGGTCAGAGATTGTCGTCTGTCTGGTCTTCTCGATCTTGCACTGGGGAAGGATTATGTGCGTTCTAAGATTGCCGAATATATGAACCATCTCATTGACATTGGTGTTGCAGGGTTCAGAATTGATGCTTCCAAGCACATGTGGCCTGGAGACATAAAGGCAATTTTGGACAAACTGCATAATCTAAACAGTAACTGGTTCCCGGAAGGTAGTAAACCTTTCATTTACCAGGAGGTACGTCAATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTCTTCTAGGTAATTGATCTG
GGTGGTGAGCCAATTAAAAGCAGTGACTACTTTGGTAATGGCCGGGTGACAGAATTCAAGTATGGTGCAAAACTCGGCACAGTTATTCGCAAGTGGAATGGAGAGAAGATGTCTTACTTAAAGTAAATAAATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCAAAAATAGGAACTGGGGAGAAGGTTGGGGTTTCATGCCTTCTGACAGAGCGCTTGTCTTTGTGGATAACCATGACAATCAACGAGGACATGGCGCTGGAGGAGCCTCTATACTTACCTTCTGGGATGCTAGGTAAAAAACCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTAACTTTCAGGCTGTACAAAATGGCAGTTGGATTTATGCTTGCTCATCCTTATGGATTTACACGAGTAATGTCAAGCTACCGTTGGCCAAGATATTTTGAAAATGGAAANGTAAGTTTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGTAATTAAGGATGTTAATGATTGGGTTGGGCCACCAAATGATAATGGAGTAACTAAAGAAGTTACTATTAATCCAGACACTACTTGTGGCAATGACTGGGTCTGTGAACATCGATGGCGCCAAATAAGGTGAGAATATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTGTTTTTAGGAACATGGTTAATTTCCGCAATGTAGTGGATGGCCAGCCTTTTACAAACTGGTATGATAATGGGAGCAACCAAGTGGCTTTTGGGAGAGGAAACAGAGGATTCATTGTTTTCAACAATGATGACTGGTAAGTAAATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTATTTTACAGGACATTTTCTTTAACTTTGCAAACTGGTCTTCCTGCTGGCACATACTGTGATGTCATTTCTGGAGATAAAATTAATGGCAACTGCACAGGCATTAAAATCTACGTTTCTGATGATGGCAAAGCTCATTTTTCTATTAGTAACTCTGCTGAAGATCCATTTATTGCAATTCATGCTGAATCTAAATTGTAAAATTTAAAATTAAATGCAAATCCGCAAAGCAATAGCTAAGTGTGTTTCTT'
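The string above contains long runs of N, which is why the summary reports "has degenerates: True". As a rough sketch of how such runs could be located in the plain string, the helper below uses a regular expression. The function name find_n_runs is hypothetical (it is not part of scikit-bio), and a short toy sequence stands in for the full record.

```python
import re

def find_n_runs(seq):
    """Return (start, end) pairs for each run of 'N'; end is exclusive, 0-based."""
    return [(m.start(), m.end()) for m in re.finditer(r'N+', seq)]

# Toy sequence standing in for str(dna_seq); not taken from the record above.
toy_seq = 'ACGTNNNNACNNG'
runs = find_n_runs(toy_seq)

for start, end in runs:
    print(start, end, end - start)
```

The coordinates follow Python slicing conventions, so toy_seq[start:end] returns the run itself. Applying find_n_runs to str(dna_seq) would list the unknown stretches in the loaded record.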