Hi, I'm trying to find, for each gene across the entire human genome,
which exons are the most constitutively expressed. To do this, I'd
like to combine expression data (RNA-seq or microarray) with exon data
(UCSC track). Then, for each gene, I'd like to pick the 1 or 2 exons
with the highest levels of expression (my proxy for constitutiveness).
An additional nicety would be to somehow work in a preference for 5'
exons. For example, let's say a gene has 3 exons and, with the
expression data, all 3 exons are equally expressed. I'd like to
selectively get the first 2 exons.
I've started learning Galaxy and was able to import BED files for
UCSC exons (as in the Galaxy 101 tutorial) and a BED file for Affy
microarray expression data. (I also tried importing the Burge RNA-seq
track as BED but couldn't get it to work.) I did an inner join on
genomic sequences to join the expression data with the exons and sorted
them from most expressed to least. But how do I sort within genes? That
is, how do I get the top 2 exons per gene (highest expressing exons per
gene) and, if there are more than 2 with equally high expression, how do
I preferentially get the 5' exons?
I'm also open to ways to do this without using Galaxy, etc. I want
to do this for an entire genome, so I figured it would be good to have a
Galaxy workflow, which I could then apply to other genomes as needed.
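To make the selection rule concrete, here is a rough Python sketch of what I'm after. It is only an illustration, not something I have run on real data. It assumes each joined row already carries a gene ID, the exon's BED coordinates and strand, and a single expression value; the function name top_exons_per_gene and the row layout are made up for this example. Ties are broken toward the 5' end, which means the lowest start on the + strand and the highest end on the - strand.

```python
from collections import defaultdict

def top_exons_per_gene(rows, n=2):
    """Pick the n highest-expressed exons per gene, preferring 5' exons on ties.

    rows: iterable of (gene, chrom, start, end, strand, expression).
    This row layout is an assumption made for this sketch.
    """
    by_gene = defaultdict(list)
    for gene, chrom, start, end, strand, expr in rows:
        # Smaller key = closer to the 5' end of the gene.
        five_prime_key = start if strand == "+" else -end
        by_gene[gene].append((-expr, five_prime_key, chrom, start, end, strand))

    result = {}
    for gene, exons in by_gene.items():
        # Sort by expression (descending), then by 5' position.
        exons.sort()
        result[gene] = [
            (chrom, start, end, strand, -neg_expr)
            for neg_expr, _, chrom, start, end, strand in exons[:n]
        ]
    return result

# Toy data: GENE_A has three equally expressed exons on the + strand,
# GENE_B has two equally expressed exons and one weak exon on the - strand.
rows = [
    ("GENE_A", "chr1", 100, 200, "+", 5.0),
    ("GENE_A", "chr1", 300, 400, "+", 5.0),
    ("GENE_A", "chr1", 500, 600, "+", 5.0),
    ("GENE_B", "chr2", 100, 200, "-", 1.0),
    ("GENE_B", "chr2", 300, 400, "-", 9.0),
    ("GENE_B", "chr2", 500, 600, "-", 9.0),
]
picked = top_exons_per_gene(rows)
```

With the toy data, GENE_A should yield its first two exons, and GENE_B should yield the two exons with the highest coordinates, since those are the 5' ones on the - strand. Note that the tie-break only fires when expression values are exactly equal; real data would probably need rounding or a tolerance.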
Thanks for any help.