GFF parsing with biopython

classic Classic list List threaded Threaded
1 message Options
Mic
Reply | Threaded
Open this post in threaded view
|

GFF parsing with biopython

Mic
Hi,
I have the following GFF file from a SNAP

X1       SNAP    Einit   2579    2712    -3.221  +       .       X1-snap.1
X1       SNAP    Exon    2813    2945    4.836   +       .       X1-snap.1
X1       SNAP    Eterm   3013    3033    10.467  +       .       X1-snap.1
X1       SNAP    Esngl   3457    3702    -17.856 +       .       X1-snap.2
X1       SNAP    Einit   4901    4974    -4.954  +       .       X1-snap.3
X1       SNAP    Eterm   5021    5150    14.231  +       .       X1-snap.3
X1       SNAP    Einit   6245    7325    -1.525  -       .       X1-snap.4
X1       SNAP    Eterm   5974    6008    5.398   -       .       X1-snap.4


With the code below I have tried to parse the above GFF file

from BCBio import GFF
from pprint import pprint
from BCBio.GFF import GFFExaminer

def retrieve_pred_genes_data():
    with open("test/X1_small.snap.gff") as sf:
        #examiner = GFFExaminer()
        #pprint(examiner.available_limits(sf))

        for rec in GFF.parse(sf):
            pprint(rec.id)
            pprint(rec.description)
            pprint(rec.name)
            pprint(rec.features)
            #pprint(rec.type)              #'SeqRecord' object has no attribute
            #pprint(rec.ref)               #'SeqRecord' object has no attribute
            #pprint(rec.ref_db)            #'SeqRecord' object has no attribute
            #pprint(rec.location)          #'SeqRecord' object has no attribute
            #pprint(rec.location_operator) #'SeqRecord' object has no attribute
            #pprint(rec.strand)            #'SeqRecord' object has no attribute
            #pprint(rec.sub_features)      #'SeqRecord' object has no attribute

retrieve_pred_genes_data()


and got the following output:

'X1'
'<unknown description>'
'<unknown name>'
[SeqFeature(FeatureLocation(ExactPosition(2578), ExactPosition(2712), strand=1), type='Einit'),
 SeqFeature(FeatureLocation(ExactPosition(2812), ExactPosition(2945), strand=1), type='Exon'),
 SeqFeature(FeatureLocation(ExactPosition(3012), ExactPosition(3033), strand=1), type='Eterm'),
 SeqFeature(FeatureLocation(ExactPosition(3456), ExactPosition(3702), strand=1), type='Esngl'),
 SeqFeature(FeatureLocation(ExactPosition(4900), ExactPosition(4974), strand=1), type='Einit'),
 SeqFeature(FeatureLocation(ExactPosition(5020), ExactPosition(5150), strand=1), type='Eterm'),
 SeqFeature(FeatureLocation(ExactPosition(6160), ExactPosition(7325), strand=-1), type='Einit'),
 SeqFeature(FeatureLocation(ExactPosition(5973), ExactPosition(6008), strand=-1), type='Eterm')]

and with GFFExaminer I got these:

{'gff_id': {('X1',): 8},
 'gff_source': {('SNAP',): 8},
 'gff_source_type': {('SNAP', 'Einit'): 3,
                     ('SNAP', 'Esngl'): 1,
                     ('SNAP', 'Eterm'): 3,
                     ('SNAP', 'Exon'): 1},
 'gff_type': {('Einit',): 3, ('Esngl',): 1, ('Eterm',): 3, ('Exon',): 1}}


I found these examples ( https://github.com/patena/jonikaslab-mutant-pools/blob/master/notes_on_GFF_parsing.txt ), but I got these kind of errors:
            #pprint(rec.type)              #'SeqRecord' object has no attribute
            #pprint(rec.ref)               #'SeqRecord' object has no attribute
            #pprint(rec.ref_db)            #'SeqRecord' object has no attribute
            #pprint(rec.location)          #'SeqRecord' object has no attribute
            #pprint(rec.location_operator) #'SeqRecord' object has no attribute
            #pprint(rec.strand)            #'SeqRecord' object has no attribute
            #pprint(rec.sub_features)      #'SeqRecord' object has no attribute

What did I do wrong and how is it possible to access all fields in the above GFF file?

Thank you in advance.

Mic



___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/