Regarding a cuffdiff output

Yona Kim
Dear Galaxy users,

Hello. I have a quick question about Cuffdiff analysis.
I obtained two SRA files and converted them to FASTQ files, which I uploaded to Galaxy via the FTP server. I then ran FASTQ Groomer, Tophat, Cufflinks, Cuffcompare, and finally Cuffdiff. (The gene annotation was downloaded from the UCSC Table Browser in GTF format.) I downloaded the gene differential expression testing file, one of the Cuffdiff outputs, and viewed it in Excel. However, it contains only zeros for value_1, value_2, log2, and test_stat, and only ones for p_value and q_value.

Is it likely that I obtained the wrong gene annotation file and that this caused the problem?

Thank you 

Yona Kim
Department of Genetics 
Rutgers University - New Brunswick Campus 

___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/

Re: Regarding a cuffdiff output

Jen Hillman-Jackson
Hi Yona,

Yes, the GTF file is most likely the problem, because it lacks certain attributes that Cuffdiff requires to perform these calculations. You will also want to double-check that the reference genome and the GTF file (wherever you source it next) are an exact match in both genome build and identifier format. If either does not match, you will not get the expected or full results that Cuffdiff can produce.
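
One way to check the identifier format is to compare the sequence names in the GTF file against those used by the reference genome. The following is only a minimal sketch, not part of Galaxy or Cufflinks; the function names and the sample lines are hypothetical, and it assumes a tab-delimited GTF with the sequence name in the first column.

```python
def gtf_seqnames(gtf_lines):
    """Collect the sequence names found in column 1 of a GTF file."""
    names = set()
    for line in gtf_lines:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def unmatched_seqnames(gtf_lines, reference_names):
    """Return GTF sequence names that are absent from the reference genome."""
    return sorted(gtf_seqnames(gtf_lines) - set(reference_names))


# Hypothetical example: one GTF line uses "1" where the reference uses "chr1".
example_gtf = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    '1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "g2"; transcript_id "t2";',
]
print(unmatched_seqnames(example_gtf, ["chr1", "chr2"]))
```

Any name this reports is a chromosome that the annotation mentions but the reference does not, which would leave those genes without usable alignments.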

This wiki has some help:
http://wiki.galaxyproject.org/Support#Interpreting_scientific_results
See "Tools on the Main server: Example → RNA-seq analysis tools."

The Cufflinks web site, linked from there, explains the attributes that Cuffdiff is looking for, links to the available iGenomes datasets (best to use if your genome is represented), and points to the tool's user group. Two iGenomes GTF files are also already available in Galaxy (hg19, mm9) in "Shared Data -> Data Libraries -> iGenomes". The link to our tutorial and FAQ has help about how the GTF files are used, along with troubleshooting advice.

Best,

Jen
Galaxy team

On 4/3/13 8:28 AM, Yona Kim wrote:
> [...]

-- 
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org

Re: Regarding a cuffdiff output

Jen Hillman-Jackson
Hi Yona Kim,

On 4/18/13 9:54 PM, Yona Kim wrote:
> Dear Jennifer
>
> Thank you very much for your help with my analysis.
> I'm still stuck on getting the final data from the Cuffdiff analysis.
> As you suggested, I've obtained the correct GTF file (mm9 gene
> annotation) and made sure that the reference genome and GTF file
> are an exact match: both are mm9.

Are you using the iGenomes version of the GTF file, with the attributes
Cuffdiff requires to generate all of the additional statistics? It
appears that this is the case, but I just wanted to double-check. If you
are not using it, you can find a copy to load and use on the public
server (if this is where you are working) in Shared Data -> Data
Libraries -> iGenomes. Otherwise, it can be found at the Cufflinks web site.

These are the two attributes that are important to have, when available:
http://cufflinks.cbcb.umd.edu/manual.html#cuffdiff_input
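
A quick way to see whether a GTF file carries these attributes is to count them. The sketch below assumes the two attributes are tss_id and p_id, as named in the Cuffdiff input section of the Cufflinks manual; the function name and the sample lines are hypothetical and only illustrate the idea.

```python
import re

# Matches attribute pairs of the form: key "value"
ATTRIBUTE_PATTERN = re.compile(r'(\S+)\s+"([^"]*)"')


def count_attributes(gtf_lines, names=("tss_id", "p_id")):
    """Return (feature line count, {attribute name: lines carrying it})."""
    counts = {name: 0 for name in names}
    total = 0
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF feature line
        total += 1
        keys = {key for key, _ in ATTRIBUTE_PATTERN.findall(fields[8])}
        for name in names:
            if name in keys:
                counts[name] += 1
    return total, counts


# Hypothetical lines: the first lacks both attributes, the second has them.
without_ids = ['chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";']
with_ids = ['chr1\tsrc\texon\t100\t200\t.\t+\t.\t'
            'gene_id "g1"; transcript_id "t1"; tss_id "TSS1"; p_id "P1";']
print(count_attributes(without_ids))
print(count_attributes(with_ids))
```

If the counts are zero for every line, the file probably came from a source that does not supply these attributes.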

>
> When I view the transcript differential expression testing output
> (one of the outputs of Cuffdiff) in Excel, the gene names seem to be
> properly annotated according to their location on the chromosome, but
> I have no values recorded for any of the calculations (I'm attaching
> this file in case you want to take a look at it).
>
The results you are getting indicate the data coverage is sparse, which
aligns with your thoughts about this mapping not being as successful as
prior runs:

NOTEST and LOWDATA are explained here, with advice about parameter tuning:
    http://wiki.galaxyproject.org/Support#Tools_on_the_Main_server
Follow the links to the Cufflinks FAQ to find:
    http://cufflinks.cbcb.umd.edu/faq.html#notest
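
To see how many tests ended up as NOTEST or LOWDATA, the status values in the differential expression file can be tallied. This is a rough sketch that assumes the file is tab-delimited with a header row containing a column named "status"; the function name and the sample rows are made up for illustration.

```python
import csv
from collections import Counter


def tally_status(diff_lines):
    """Count how often each value appears in the 'status' column."""
    reader = csv.DictReader(diff_lines, delimiter="\t")
    return Counter(row["status"] for row in reader)


# Hypothetical rows in the assumed column layout.
header = ("test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\t"
          "value_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant")
sample_rows = [
    header,
    "g1\tg1\tA\tchr1:1-100\tq1\tq2\tNOTEST\t0\t0\t0\t0\t1\t1\tno",
    "g2\tg2\tB\tchr1:200-300\tq1\tq2\tNOTEST\t0\t0\t0\t0\t1\t1\tno",
    "g3\tg3\tC\tchr2:1-100\tq1\tq2\tOK\t10.5\t21.0\t1\t2.5\t0.01\t0.04\tyes",
]
for status, count in tally_status(sample_rows).most_common():
    print(status, count)
```

With a real file, pass an open file handle instead of the list, for example `tally_status(open(path, newline=""))`, where `path` is wherever the downloaded file was saved.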

> Do you think that the problem might have originated from the FASTQ
> files themselves?
>
> I was also wondering about the reduction in file size. Comparing with
> one of my other analyses in Galaxy, I noticed that the file size
> dropped sharply, from 6.1 GB (FASTQ Groomer) to 1006.9 KB (Tophat
> accepted hits), whereas in my other analysis it dropped from 5.9 GB
> (FASTQ Groomer) to only 1.6 GB (Tophat accepted hits).
>
> Do you think an error might have occurred while Tophat was running on
> the groomed data, so that erroneous data was passed to Cufflinks and
> eventually to Cuffdiff?

This could be the source of the problem. Making sure that the data was
groomed correctly would be a good place to start. The comments from the
first run will note the detected input type (though there can be some
overlap between types), so also use the tool "FastQC" to help determine
the proper settings for "FASTQ Groomer". If necessary, re-run from this
step to see if that improves the mapping.

http://wiki.galaxyproject.org/Support#Dataset_special_cases
See the second bullet under "FASTQ"
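
As a rough cross-check on the quality score type, the range of ASCII codes in the quality lines can be inspected directly. The sketch below is only a heuristic under stated assumptions: it expects four-line FASTQ records with no line wrapping, and the cut-offs (codes below 59 imply a +33 offset, codes above 74 imply a +64 offset) follow the commonly documented ranges for these encodings. The function names are hypothetical, and "FastQC" remains the better tool for a real decision.

```python
def quality_range(fastq_lines):
    """Return (lowest, highest) ASCII code seen on the quality lines."""
    lowest, highest = 255, 0
    for index, line in enumerate(fastq_lines):
        if index % 4 != 3:  # the quality string is the 4th line of each record
            continue
        codes = [ord(ch) for ch in line.strip()]
        if codes:
            lowest = min(lowest, min(codes))
            highest = max(highest, max(codes))
    return lowest, highest


def guess_offset(lowest, highest):
    """Guess the quality offset from the observed code range."""
    if lowest < 59:
        return "Phred+33 (Sanger / Illumina 1.8+)"
    if highest > 74:
        return "Phred+64 family (Solexa / Illumina 1.3-1.7)"
    return "ambiguous: the range fits more than one encoding"


# Hypothetical single-record examples.
sanger_like = ["@read1", "ACGT", "+", "!!II"]
older_illumina_like = ["@read1", "ACGT", "+", "hhhh"]
print(guess_offset(*quality_range(sanger_like)))
print(guess_offset(*quality_range(older_illumina_like)))
```

An "ambiguous" answer is the overlap case mentioned above, and it usually resolves once more reads are scanned.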

If your query data is short (less than around 40 bases), then tuning
Tophat could also improve mapping; see the tool's web page for advice
regarding mapping shorter sequences. Then test out a few different
parameter options to see what produces the best results for your
particular datasets/samples. There is a balance between being too
sensitive and too stringent, and this is a judgement call in most cases.
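
To find out whether the reads fall under that length, the fraction of short reads can be counted straight from the FASTQ file. This is a minimal sketch that assumes four-line FASTQ records; the function name is hypothetical, and the default threshold of 40 simply mirrors the approximate figure above.

```python
def short_read_fraction(fastq_lines, threshold=40):
    """Return the fraction of reads shorter than `threshold` bases."""
    total = 0
    short = 0
    for index, line in enumerate(fastq_lines):
        if index % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        total += 1
        if len(line.strip()) < threshold:
            short += 1
    return short / total if total else 0.0


# Hypothetical records: one 36-base read and one 50-base read.
example_reads = [
    "@read1", "A" * 36, "+", "I" * 36,
    "@read2", "C" * 50, "+", "I" * 50,
]
print(short_read_fraction(example_reads))
```

A fraction close to 1.0 suggests that the short-read advice on the tool's web page is worth following before re-running the mapping.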

Trimming the reads may help if quality is an issue ("FastQC" will also
give information about this). The RNA-seq example tutorial has an
example of how to do basic QC:
https://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seq-analysis-exercise


Hopefully this helps to give some new options to test out that improve
the result!

Jen
Galaxy team

>
> Thank you very very much for your time and help
>
> Sincerely yours,
>
> Yona Kim
> Department of Genetics
> Rutgers University
>
> On Mon, Apr 8, 2013 at 4:54 PM, Jennifer Jackson wrote:
> [...]


--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org