Join, Subtract and Group PLUS large GOA files

Colleen Burge
Hello all,

I've been using the "Join, Subtract and Group" tools to join my transcriptome/annotation data to GO and GO Slim for some time (on the Main Galaxy server). I just updated my GO files because I've run a new data set, and I've been having trouble with the joining function: it never seems to complete, whereas before it would be done in just a few minutes. It works just fine joining my "new data" with my "old" GO files (which of course are now out of date), but not with the new GO files from both my collaborator and EBI (specifically the UniProt one). Could this be a file size limitation?

Thanks,
Colleen

___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/

Re: Join, Subtract and Group PLUS large GOA files

Jen Hillman-Jackson
Hello Colleen,

The tool "Join two Datasets" can create a very large dataset if the inputs share many common keys. In the pathological case, where every row from the new query is joined with all of the entries in the target, processing takes much longer (and may run into memory problems) if either input was large to begin with. The results in this case most likely wouldn't be useful, so even if the job eventually succeeds, you will want to investigate the content.
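To make the size concern concrete, here is a minimal offline sketch (not part of the Galaxy tool) that estimates how many rows an inner join would emit before running it. It assumes tab-separated files downloaded from the history, 0-based column numbers chosen by the user, and that lines starting with "!" or "#" are comments; the function names `key_counts` and `estimated_join_rows` are made up for illustration.

```python
from collections import Counter


def key_counts(lines, key_col):
    """Count how many rows carry each key value in the given 0-based column."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines and lines starting with '!' or '#' are not data.
        if not line or line.startswith(("!", "#")):
            continue
        fields = line.split("\t")
        if len(fields) > key_col:
            counts[fields[key_col]] += 1
    return counts


def estimated_join_rows(left_counts, right_counts):
    """An inner join emits left_n * right_n rows for every key present on both sides."""
    return sum(n * right_counts[key]
               for key, n in left_counts.items()
               if key in right_counts)


if __name__ == "__main__":
    # Tiny in-memory stand-ins for a query file and a GO annotation file.
    query = ["P1\tcontig_a", "P2\tcontig_b", "P1\tcontig_c"]
    go = ["!header", "db\tP1\tGO:1", "db\tP1\tGO:2", "db\tP3\tGO:3"]
    print(estimated_join_rows(key_counts(query, 0), key_counts(go, 1)))
```

With real files, pass the open file handles instead of the lists. If the estimate is orders of magnitude larger than either input, the join key is probably matching far more rows than intended.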

I would suggest one more test: take one of your old datasets and use it as the query against the new GO files, then compare the results with those you had from the old GO files (or generate those again as a direct comparison). A job that runs longer should be left to just execute. If it fails (likely a memory problem), then consider running only a sample of the data in this manner. This will probably tell you much about the composition of the new GO files themselves and potentially how to correct the problem. If the new GO files do not seem to be the problem, then you will know that the issue must be in the new data. Perhaps something went wrong in the annotation process that is linking in too many assignments?

Please let us know how it goes and if you need more help,

Jen
Galaxy team

-- 
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org

Re: Join, Subtract and Group PLUS large GOA files

Colleen Burge-2
Hi Jen,

I've been using this tool to join based on one key, and never had any trouble. I've tried with my old data sets and the new GO files, and once again the jobs never finish; I stopped the last one after 12 hours.

I'm wondering if there's also an issue with my account?

Thanks,
Colleen:)


On Tue, Apr 23, 2013 at 4:35 PM, Jennifer Jackson <[hidden email]> wrote:
Hello Colleen,

The tool " Join two Datasets" could potentially create a very large dataset if the inputs shared enough common keys. The pathological case where every input from the new query was joined with all of the entries in the target would lead to longer processing times (and potentially memory problems), if either of these were large to begin with. The results in this case most likely wouldn't be useful, so even if the job is eventually successful, you will want to investigate the content.

I would suggest one more test: taking one of your old datasets, and using it as query against the new GO files, then comparing the results vs those you had from the old GO files (or generate it again, new, as a direct comparison). I job that runs longer should be left to just execute. If it fails (likely a memory problem), then consider only running a sample of the data in this manor. This will probably tell you much about the composition of the new GO files themselves and potentially how to correct the problem. If the new GO files do not seem to be the problem, then you will know that the issue must be in the new data - perhaps something went wrong in the annotation process that is linking in too many assignments?

Please let us know how it goes and if you need more help,

Jen
Galaxy team


On 4/23/13 1:04 PM, Colleen Burge wrote:
Hello all,

I've been using the "Join, Subtract and Group" to join my transcriptome/annotation data to GO and GO Slim for some time (in the Main galaxy).  I just updated my GO files as I've run a a new data set, and have been having trouble with the joining function, it never seems to complete (while before it would be done in just a few minutes).  It works just fine joining my "new data" with my "old" GO files (which of course are now out of date) but not the new GO files from both my collaborator and from EBI (specifically the unipro).  Not sure if its a file size limitation?

Thanks,
Colleen


___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/

-- 
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org


___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/
Reply | Threaded
Open this post in threaded view
|

Re: Join, Subtract and Group PLUS large GOA files

Jen Hillman-Jackson
Hello Colleen,

This helps to localize the problem to the new GO files you are working with (the software did not change).

Running a small sample of your original data against both the old and the new GO files until the jobs complete, then comparing the two results to each other, is probably the best way to understand what is different about the new files and what is causing all of the new joins. This may help you find a data problem, or understand why the change is present (if it is a correct change).
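One way to run that comparison outside Galaxy is sketched below. It is only an illustration under assumptions: tab-separated files downloaded locally, 0-based key columns picked by the user, comment lines starting with "!" or "#", and hypothetical helper names (`reservoir_sample`, `column_counts`, `compare_fanout`). It draws a random sample of query rows, then reports how many GO rows each sampled key matches in the old versus the new file.

```python
import random
from collections import Counter


def reservoir_sample(lines, k, seed=0):
    """Uniform random sample of k lines without holding the whole file in memory."""
    rng = random.Random(seed)
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            sample.append(line)
        else:
            j = rng.randint(0, i)  # inclusive at both ends
            if j < k:
                sample[j] = line
    return sample


def column_counts(lines, col):
    """Count rows per key in the given 0-based column, skipping comment lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("!", "#")):
            continue
        fields = line.split("\t")
        if len(fields) > col:
            counts[fields[col]] += 1
    return counts


def compare_fanout(query_lines, old_go_lines, new_go_lines, query_col, go_col, top=10):
    """Return (key, old_matches, new_matches), largest increase first."""
    query_keys = set(column_counts(query_lines, query_col))
    old = column_counts(old_go_lines, go_col)
    new = column_counts(new_go_lines, go_col)
    rows = [(key, old.get(key, 0), new.get(key, 0)) for key in query_keys]
    rows.sort(key=lambda r: (r[1] - r[2], r[0]))
    return rows[:top]


if __name__ == "__main__":
    query = reservoir_sample(["P1\tx", "P2\ty"], 2)
    old_go = ["P1\tGO:1", "P2\tGO:2"]
    new_go = ["P1\tGO:1", "P1\tGO:3", "P1\tGO:4", "P2\tGO:2"]
    for key, old_n, new_n in compare_fanout(query, old_go, new_go, 0, 0):
        print(key, old_n, new_n)
```

Keys whose match count jumps sharply between the old and new files are the first place to look for the extra joins.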

Good luck with your project,

Jen
Galaxy team