Re: Finding constitutive exons using expression data (7plusorminus 3)

Sébastien Vigneau
Hi 7plusorminus 3,

One possibility is to use the "group" tool with the "max" operation to get the most highly expressed exon for each gene. You can then use "subtract datasets" to remove those exons from the original dataset, and repeat the grouping to get the second most highly expressed exons (which are now the most highly expressed ones in what remains). "Group" may also help you get the exons with the most proximal or distal start position (whether that is the 5' or the 3' end depends on the orientation of the gene).
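To make the iteration concrete, here is a minimal sketch of the same "take the max per gene, subtract, repeat" logic in plain Python rather than as Galaxy tool steps. The row layout (gene, exon ID, expression) and all values are made up for illustration, and the helper names max_per_gene and subtract are hypothetical. In Galaxy itself, the grouped output may contain only the gene and its maximum value, so a join back to the original dataset may be needed before subtracting.

```python
# Hypothetical input rows: (gene, exon_id, expression). Values are invented.
rows = [
    ("geneA", "A_exon1", 5.0),
    ("geneA", "A_exon2", 9.0),
    ("geneA", "A_exon3", 7.0),
    ("geneB", "B_exon1", 3.0),
    ("geneB", "B_exon2", 3.5),
]


def max_per_gene(rows):
    """Keep, for each gene, the row with the highest expression.

    On ties, the row seen first is kept.
    """
    best = {}
    for row in rows:
        gene, _, expression = row
        if gene not in best or expression > best[gene][2]:
            best[gene] = row
    return list(best.values())


def subtract(rows, to_remove):
    """Return the rows that are not in to_remove, preserving order."""
    removed = set(to_remove)
    return [row for row in rows if row not in removed]


# First pass: the most highly expressed exon of each gene.
first = max_per_gene(rows)

# Second pass: remove those exons, then take the max again.
second = max_per_gene(subtract(rows, first))

print(first)
print(second)
```

Each further pass yields the next exon down per gene, so two passes give the top 2.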

Alternatively, if you know how to use R, you can use the function "by" (here is a good explanation: http://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/).
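For readers who prefer a script, the split-by-gene idea behind "by" can also be sketched in plain Python (this is not R code, only the same group-wise approach). It picks the top 2 exons per gene and breaks ties in favour of the 5' exon. The tuple layout (gene, strand, start, end, expression), the example values, and the helper names five_prime_rank and top_exons are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical input: (gene, strand, start, end, expression). Values are invented.
exons = [
    ("geneA", "+", 100, 200, 8.0),
    ("geneA", "+", 300, 400, 8.0),
    ("geneA", "+", 500, 600, 8.0),
    ("geneB", "-", 100, 200, 4.0),
    ("geneB", "-", 300, 400, 4.0),
    ("geneB", "-", 500, 600, 6.0),
]


def five_prime_rank(exon):
    """Smaller value means closer to the 5' end of the gene.

    On the + strand the 5' end has the lowest start coordinate;
    on the - strand it has the highest end coordinate.
    """
    _, strand, start, end, _ = exon
    return start if strand == "+" else -end


def top_exons(exons, n=2):
    """Return the n most highly expressed exons per gene, 5' exons first on ties."""
    by_gene = defaultdict(list)
    for exon in exons:
        by_gene[exon[0]].append(exon)
    return {
        gene: sorted(group, key=lambda e: (-e[4], five_prime_rank(e)))[:n]
        for gene, group in by_gene.items()
    }


top = top_exons(exons)
for gene, picked in top.items():
    print(gene, picked)
```

Sorting on the pair (negative expression, 5' rank) handles both requirements in one step: expression decides first, and position only matters among equally expressed exons.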

Sébastien

----------------------------------------

Message: 1
Date: Sun, 9 Feb 2014 16:43:14 -0500
From: 7plusorminus 3 <[hidden email]>
To: [hidden email]
Subject: [galaxy-user] Finding constitutive exons using expression
        data
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset="iso-8859-1"

Hi, I'm trying to find, for each gene across the entire human genome, which
exons are the most constitutively expressed. To do this, I'd like to
combine expression data (RNA-seq or microarray) and exon data (UCSC
track). Then, for each gene, I'd like to pick the 1 or 2 exons with the
highest levels of expression (my proxy for constitutiveness).

An additional nicety would be to somehow work in a preference for 5' exons.
For example, let's say a gene has 3 exons and, with the expression data,
all 3 exons are equally expressed. I'd like to selectively get the first 2
exons.

I've started learning Galaxy and was able to import BED files for UCSC
exons (as in the Galaxy 101 tutorial) and a BED file for Affy microarray
expression data. (I tried also importing the Burge RNA-seq track as BED but
couldn't get it to work). I did an inner join on genomic sequences to join
the expression data with the exons and sorted them from most expressed to
least. But how do I sort within genes? That is, how do I get the top 2
exons per gene (highest expressing exons per gene) and, if there are more
than 2 with equally high expression, how do I preferentially get the 5'
exons?

I'm also open to ways to do this without using Galaxy, etc. I want to do
this for an entire genome, so I figured it would be good to have a Galaxy
workflow, which I could then apply to other genomes as needed.

Thanks for any help

___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/