problems with MAF alignment file in Galaxy

Amit Pande-2
Dear Galaxy,

I am trying to import a multiz alignment file for all the insect species from the UCSC
Genome Browser.
Galaxy does not recognize the number of blocks in the multiz file: the file format view shows a question mark (? blocks).
When I then try to use the tool "Extract MAF blocks given a set of genomic intervals", I get the following error:

"An error occurred with this dataset:191757 MAF blocks converted to Genomic Intervals for species dm3. There was a problem processing your input: exceptions must be old-style classes or derived from BaseException, not str"

Even when the tool runs, it shows the following message: "This is a new dataset and not all of its data are available yet"

Please look into the problem.

warm regards,
Amit.

___________________________________________________________
The Galaxy User List is being replaced by the Galaxy Biostar
User Support Forum at https://biostar.usegalaxy.org/

Posts to this list will be disabled in May 2014.  In the
meantime, you are encouraged to post all new questions to
Galaxy Biostar.

For discussion of local Galaxy instances and the Galaxy
source code, please use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/

Re: problems with MAF alignment file in Galaxy

Jen Hillman-Jackson
Hi Amit,

Is this occurring when you upload a MAF "multizXXway" file obtained from UCSC Downloads (genome.ucsc.edu) to Galaxy Main (usegalaxy.org)? Did you upload using FTP? https://wiki.galaxyproject.org/Support#Loading_data

The Table Browser is generally a poor choice for extracting more than a few regions of MAF data (per query), as there are limits on how many lines of output will be sent over. Incomplete transfers are common. This error could be related to a format or datatype assignment problem caused by such an incomplete transfer.

Please give FTP loading a try if you have not already. Then, if problems continue, you can share a history link with me. Please note which dataset is the MAF uploaded via FTP. This is how to share: https://wiki.galaxyproject.org/Learn/Share

Best,

Jen
Galaxy team

Going forward, please ask questions on our new forum that is replacing this list (very soon now):
https://wiki.galaxyproject.org/Support#Biostar

-- 
Jennifer Hillman-Jackson
http://galaxyproject.org


Re: problems with MAF alignment file in Galaxy

Jen Hillman-Jackson
Hi Amit,

The problem is with the MAF files themselves: they are truncated. This happens when MAF data (or any data in excess of ~100k lines) is extracted from the Table Browser. This data greatly exceeds that limit:
Database: dm3    Primary Table: multiz15way    Row Count: 1,633,505

The ends of both dataset #1 and dataset #3 have this warning:
---------------------------------------------------------------------------
procedures have exceeded timeout: 1200 seconds, function has ended.
---------------------------------------------------------------------------
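
If you want to check a downloaded file for this kind of truncation before uploading it, something like the following may help. It is only a rough sketch: the function name `inspect_maf` is made up for illustration, and it assumes the file follows the usual MAF layout (a first line starting with `##maf`, and each alignment block opening with an `a` line) and that the truncation warning contains the phrase quoted above.

```python
TIMEOUT_MARKER = "exceeded timeout"  # phrase taken from the warning quoted above


def inspect_maf(path):
    """Scan a MAF file and return (has_header, block_count, truncated).

    Hypothetical helper, not part of Galaxy or UCSC tooling.
    """
    has_header = False
    block_count = 0
    truncated = False
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line_number, line in enumerate(handle):
            # A MAF file is expected to begin with a "##maf" header line.
            if line_number == 0 and line.startswith("##maf"):
                has_header = True
            # Every alignment block opens with an "a" line, e.g. "a score=123.0".
            if line.split(None, 1)[:1] == ["a"]:
                block_count += 1
            # The Table Browser timeout notice marks an incomplete export.
            if TIMEOUT_MARKER in line:
                truncated = True
    return has_header, block_count, truncated
```

Calling `inspect_maf("my_download.maf")` (the filename is a placeholder) returns a tuple; if the last value is `True`, the export was probably cut off and the file should be obtained another way.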

Instead, you have two options:

1 - Obtain the MAF files from the UCSC Downloads area. Go to http://genome.ucsc.edu, then in the left blue sidebar select "Downloads", then navigate to the data for dm3. The multiz (MAF) files will be under the Conservation track data.

2 - This same MAF data is cached as a local data source on usegalaxy.org. If you query the blocks using an interval/BED file of coordinates (assigned to database "dm3"), you can obtain the intervals that way. UCSC has the chromosome names and lengths on the D. melanogaster home page, in a table found by clicking the link near the top named "Sequences". Simple files built from this information can be pasted into the "Get Data -> Upload file" tool form to create one-line query datasets.
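
As a rough illustration of option 2, the one-line query datasets could be generated rather than typed by hand. This is a sketch only: `whole_chromosome_bed` is a hypothetical helper, and the chr4 length below is an example value that should be checked against the UCSC "Sequences" table for dm3 before use. It assumes BED-style coordinates (0-based start, end equal to the chromosome length).

```python
def whole_chromosome_bed(chrom_sizes):
    """Return one tab-separated BED line per chromosome, spanning its full length.

    chrom_sizes is an iterable of (name, length) pairs copied from the
    UCSC "Sequences" table. Hypothetical helper for illustration only.
    """
    lines = []
    for name, length in chrom_sizes:
        if length <= 0:
            raise ValueError(f"chromosome length must be positive: {name}")
        # BED intervals are 0-based and half-open, so the whole
        # chromosome runs from 0 to its length.
        lines.append(f"{name}\t0\t{length}")
    return lines


# Example value only; confirm the length on the UCSC dm3 "Sequences" page.
for bed_line in whole_chromosome_bed([("chr4", 1351857)]):
    print(bed_line)
```

Each printed line can be pasted into the upload form as its own dataset, with the database set to dm3, so that the extraction runs one chromosome at a time.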

I was able to run "Extract MAF blocks" and then "MAF to Interval" with no problems. I don't know if doing this one chromosome at a time is required, but it will certainly work and will not exceed any resources. I suggest doing this once, building a workflow, and then running it on the rest in batch.

Hopefully one of these options works out for you!

Jen
Galaxy team

On 5/16/14 6:59 AM, Amit Pande wrote:
Dear Jennifer,

I uploaded the data both ways, i.e. via FTP and through the UCSC browser, but all attempts to extract MAF blocks between insect species have failed.
I need your help in this regard, so I have shared my history with you.
warm regards,
Amit. 

