Metagenome Analysis

Metagenome Analysis

Mike Dyall-Smith
I have looked through the metagenome tools and the tutorials, and was wondering how one could pull out reads that contain specific protein domains or COGs. BLASTX does not seem to be possible (?), but megablast could get GI codes, and these could potentially be used to retrieve CDD information. I just can't see a way to do this in Galaxy. Any suggestions would be greatly appreciated.

Mike DS

Sent from my iPhone4
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/

Re: Metagenome Analysis

Jen Hillman-Jackson
Hi Mike,

To use BLASTX directly, a wrapper is available in the Tool Shed for use
with a local or cloud instance of Galaxy. Please see:
http://toolshed.g2.bx.psu.edu
http://getgalaxy.org
http://usegalaxy.org
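
Once BLASTX has been run, pulling out the matching reads is mostly a filtering step. The following is only a minimal sketch in Python, not a Galaxy tool: it assumes the standard 12-column tabular BLAST output (query ID in column 1, subject ID in column 2, e-value in column 11), and the function names, the e-value cutoff, and the example IDs are all made up for illustration.

```python
def reads_with_hits(blast_lines, wanted_subjects, max_evalue=1e-5):
    """Return the IDs of query reads that hit any of the wanted subjects.

    Assumes 12-column tabular BLAST output (hypothetical input here).
    """
    keep = set()
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        query_id, subject_id, evalue = cols[0], cols[1], float(cols[10])
        if subject_id in wanted_subjects and evalue <= max_evalue:
            keep.add(query_id)
    return keep


def filter_fasta(fasta_lines, keep_ids):
    """Yield the FASTA lines of records whose ID is in keep_ids."""
    keeping = False
    for line in fasta_lines:
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited header token.
            tokens = line[1:].split()
            keeping = bool(tokens) and tokens[0] in keep_ids
        if keeping:
            yield line


# Made-up example data: three reads, two of them hitting "protA".
blast = [
    "read1\tprotA\t98.0\t50\t1\t0\t1\t150\t10\t59\t1e-20\t95.0",
    "read2\tprotB\t90.0\t40\t4\t0\t1\t120\t5\t44\t1e-10\t70.0",
    "read3\tprotA\t60.0\t30\t12\t0\t1\t90\t1\t30\t0.5\t25.0",
]
fasta = [">read1 sample\n", "ACGT\n", ">read2\n", "GGCC\n", ">read3\n", "TTAA\n"]

hits = reads_with_hits(blast, {"protA"})
print("".join(filter_fasta(fasta, hits)), end="")
```

With this example data only read1 is kept, because read3 hits the same subject but fails the e-value cutoff. The set of wanted subject IDs would have to come from wherever the domain or COG assignments are looked up.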

Another option is to map against the target genome, then compare the
coordinates of those hits with the coordinates of known annotation that
represents CCDS or alternate protein tracks of interest. UCSC, BioMart,
and other sources under "Get Data" can be used to import BED/Interval
data directly into Galaxy. Compare coordinates using the tools in the
group "Operate on Genomic Intervals". There are other tools that compare
coordinates (BEDTools, etc.), but these are a good place to start.
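
To make the coordinate comparison concrete, here is a rough sketch of the idea in Python. It is not how the Galaxy interval tools are implemented; it assumes BED-style zero-based, half-open intervals given as (chrom, start, end, name) tuples, and the function name and example records are hypothetical.

```python
from collections import defaultdict


def intersect(hits, features, min_overlap=1):
    """Pair each hit with every feature it overlaps on the same chromosome.

    Intervals are (chrom, start, end, name) with half-open coordinates.
    Returns a list of (hit_name, feature_name, overlap_in_bp) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, name in features:
        by_chrom[chrom].append((start, end, name))

    pairs = []
    for chrom, start, end, name in hits:
        # Simple scan over all features on this chromosome; fine for small
        # inputs, but a sorted sweep or interval tree scales better.
        for f_start, f_end, f_name in by_chrom.get(chrom, []):
            overlap = min(end, f_end) - max(start, f_start)
            if overlap >= min_overlap:
                pairs.append((name, f_name, overlap))
    return pairs


# Made-up example: mapped reads versus annotated regions of interest.
hits = [
    ("chr1", 100, 200, "read1"),
    ("chr1", 500, 600, "read2"),
    ("chr2", 100, 200, "read3"),
]
features = [
    ("chr1", 150, 300, "domainA"),
    ("chr1", 600, 700, "domainB"),
    ("chr2", 50, 120, "domainC"),
]

for pair in intersect(hits, features):
    print(pair)
```

In this example read2 is not reported: it ends at position 600, exactly where domainB starts, and with half-open coordinates adjacent intervals share no bases.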

Several of our tutorials have examples of how to compare coordinates,
including "Galaxy 101" and protocols 1 & 4 of "Using Galaxy". The tools
themselves also have help directly on the tool forms.
https://main.g2.bx.psu.edu/u/aun1/p/galaxy101
https://main.g2.bx.psu.edu/u/galaxyproject/p/using-galaxy-2012

If you used an Ensembl annotation track, then tools in the group "Genome
Diversity -> KEGG and GO" might be of interest to you. The UCSC "Known
Genes" track also has some extra tables (http://genome.ucsc.edu) that
you may find interesting to pull in and consider, if you decided to use
that as the annotation track to compare against. Most (if not all) of
this data can be linked together either through coordinates or
identifiers, but it is not available for all genomes; you will have to
check at the data sources.

For predictive domain analysis using conserved genomic data, the tools
in "Fetch Alignments" work with MAF inputs. A BED file of hits can
be used to query out data from multiple species, obtain sequence, etc.
for downstream analysis. Protocol 5 in the "Using Galaxy" paper above
has a walk-through of how this can be done. If the public Main server
does not have the MAF data for your genome, and it is small, it is
possible to use one from the history. If it is larger, using a local or
cloud Galaxy would be recommended.

Be sure to check the Tool Shed if there is a specific tool that you are
looking for. If it is not there now, you could ask on the development
list ([hidden email]) whether someone has it or whether it is in the
process of being wrapped. Keep checking back, as more tools are added
all the time.

Best,

Jen
Galaxy team

On 4/15/13 3:23 PM, Mike Dyall-Smith wrote:

> I have looked through the metagenome tools and the tutorials, and was wondering how one could pull out reads that contain specific protein domains or COGs. BLASTX does not seem to be possible (?), but megablast could get GI codes, and these could potentially be used to retrieve CDD information. I just can't see a way to do this in Galaxy. Any suggestions would be greatly appreciated.
>
> Mike DS
>
> Sent from my iPhone4

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
