all FPKMs are 0 in the tmap files produced by cuffcompare

all FPKMs are 0 in the tmap files produced by cuffcompare

Yang Bi
Dear all:

I am new to Galaxy and followed online tutorials and tips to analyze my RNA-seq data for alternative splicing. After QC/filtering, I aligned my sequencing data with "tophat for illumina". Other than setting min intron to 20, I used the default settings. I then fed the accepted hits files to Cufflinks, with Min isoform fraction set to 0, the annotation (TAIR10 GFF3) used as a guide, and bias correction enabled (locally cached TAIR10). I merged the assembled transcripts with Cuffmerge and used Cuffcompare to compare the merged transcripts against the reference annotation file (TAIR10 GFF3). I chose yes for "use sequence data" and the locally cached TAIR10 as the "reference list". This is what I get for the transcript accuracy analysis:

# Cuffcompare v2.1.1 | Command line was:
#cuffcompare -o cc_output -r /galaxy-repl/main/files/007/386/dataset_7386886.dat -s /galaxy/data/Arabidopsis_thaliana_TAIR10/sam_index/Arabidopsis_thaliana_TAIR10.fa ./input1
#

#= Summary for dataset: ./input1 :
#     Query mRNAs :   72778 in   51779 loci  (57559 multi-exon transcripts)
#            (12679 multi-transcript loci, ~1.4 transcripts per locus)
# Reference mRNAs :   42163 in   33350 loci  (30127 multi-exon)
# Corresponding super-loci:          33140
#--------------------|   Sn   |  Sp   |  fSn |  fSp  
        Base level: 100.0 62.7  -  -
        Exon level: 104.6 59.5 100.0 60.5
      Intron level: 100.0 55.5 100.0 56.5
Intron chain level: 98.3 51.5 100.0 60.3
  Transcript level: 98.7 57.2 94.8 54.9
       Locus level: 99.4 64.0 100.0 64.1

     Matching intron chains:   29618
              Matching loci:   33147

          Missed exons:       1/169820 (  0.0%)
           Novel exons:  128021/298149 ( 42.9%)
        Missed introns:       0/127896 (  0.0%)
         Novel introns:  102614/230568 ( 44.5%)
           Missed loci:       1/33350 (  0.0%)
            Novel loci:    2962/51779 (  5.7%)

 Total union super-loci across all input datasets: 51779

For the tmap file, all my FPKMs are 0:

ref_gene_id ref_id class_code cuff_gene_id cuff_id FMI FPKM FPKM_conf_lo FPKM_conf_hi cov len major_iso_id ref_match_len
AT1G01010 AT1G01010.1 = AT1G01010 TCONS_00000001 0 0.000000 0.000000 0.000000 0.000000 1688 TCONS_00000001 1688
AT1G01040 AT1G01040.1 = AT1G01040 TCONS_00000002 0 0.000000 0.000000 0.000000 0.000000 6251 TCONS_00000002 6251
AT1G01040 AT1G01040.2 = AT1G01040 TCONS_00000003 0 0.000000 0.000000 0.000000 0.000000 5877 TCONS_00000002 5877
AT1G01046 AT1G01046.1 = AT1G01046 TCONS_00000004 0 0.000000 0.000000 0.000000 0.000000 207 TCONS_00000004 207
AT1G01073 AT1G01073.1 = AT1G01073 TCONS_00000005 0 0.000000 0.000000 0.000000 0.000000 111 TCONS_00000005 111
AT1G01110 AT1G01110.2 = AT1G01110 TCONS_00000006 0 0.000000 0.000000 0.000000 0.000000 1782 TCONS_00000006 1782
AT1G01110 AT1G01110.1 = AT1G01110 TCONS_00000007 0 0.000000 0.000000 0.000000 0.000000 1439 TCONS_00000006 1439
AT1G01115 AT1G01115.1 = AT1G01115 TCONS_00000008 0 0.000000 0.000000 0.000000 0.000000 117 TCONS_00000008 117
AT1G01160 AT1G01160.1 = AT1G01160 TCONS_00000009 0 0.000000 0.000000 0.000000 0.000000 1045 TCONS_00000010 1045
AT1G01160 AT1G01160.2 = AT1G01160 TCONS_00000010 0 0.000000 0.000000 0.000000 0.000000 1129 TCONS_00000010 1129
AT1G01180 AT1G01180.1 = AT1G01180 TCONS_00000011 0 0.000000 0.000000 0.000000 0.000000 1176 TCONS_00000011 1176
AT1G01210 AT1G01210.1 = AT1G01210 TCONS_00000012 0 0.000000 0.000000 0.000000 0.000000 616 TCONS_00000012 616
AT1G01220 AT1G01220.1 = AT1G01220 TCONS_00000013 0 0.000000 0.000000 0.000000 0.000000 3532 TCONS_00000013 3532

The FPKMs were normal in the assembled transcripts produced by Cufflinks.

Please enlighten me on the possible mistakes that I have made. I really appreciate your help.

Best
Yang
___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

  http://galaxyproject.org/search/mailinglists/

Re: all FPKMs are 0 in the tmap files produced by cuffcompare

Jen Hillman-Jackson
Hello,

It looks like the data is mapping as novel, that is, not linked with the reference annotation. A few factors can cause this for part of a dataset (often desirable), but when it occurs for an entire dataset there is often a data mismatch or a parameter issue.

The first item I always check is that the reference genome matches between inputs. Do this by confirming that the identifiers in the reference GFF file are the same as those in the Tophat BAM output (convert to SAM, with headers, to see the chromosome names). For the GFF file, the tool "Join, Subtract and Group -> Group" on the first column (chromosome name), with the action "count distinct", will isolate these.
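For readers working outside Galaxy, the same comparison can be sketched in Python. This is only a minimal sketch under stated assumptions: the function names are hypothetical, and the two sample inputs are made-up lines standing in for a real GFF file and a real SAM header.

```python
from typing import Iterable, Set


def gff_seqids(lines: Iterable[str]) -> Set[str]:
    """Collect the distinct values of column 1 (the sequence name) from GFF lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' comment/directive lines
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_names(lines: Iterable[str]) -> Set[str]:
    """Collect reference names from the SN: field of @SQ header lines."""
    names = set()
    for line in lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


# Made-up sample inputs, for illustration only.
gff = ["##gff-version 3\n", "Chr1\tTAIR10\tgene\t3631\t5899\t.\t+\t.\tID=AT1G01010\n"]
sam = ["@HD\tVN:1.0\tSO:coordinate\n", "@SQ\tSN:chr1\tLN:30427671\n"]

only_in_gff = gff_seqids(gff) - sam_header_names(sam)
only_in_sam = sam_header_names(sam) - gff_seqids(gff)
print("only in GFF:", sorted(only_in_gff))  # only in GFF: ['Chr1']
print("only in SAM:", sorted(only_in_sam))  # only in SAM: ['chr1']
```

If both sets come back empty, the names agree; any entry in either set is a name that one file uses and the other does not, including differences in letter case.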

But the real problem could be in the parameters, see below:

On 1/11/14 10:43 PM, Yang Bi wrote:
> Dear all:
>
> I am new to Galaxy and followed online tutorials and tips to analyze my RNA-seq data for alternative splicing. After QC/filtering, I aligned my sequencing data with "tophat for illumina". Other than setting min intron to 20, I used the default settings. I then fed the accepted hits files to Cufflinks, with Min isoform fraction set to 0, the annotation (TAIR10 GFF3) used as a guide, and bias correction enabled (locally cached TAIR10).
My guess is that this Cufflinks run had the same issue. Have you checked it? Setting 'Min isoform fraction' to "0" may be problematic (I have never run Cufflinks this way). The setting may seem permissive, capturing even very small expression levels, but it may have had the reverse effect of not assigning any reads.

(A Tophat min intron of 20 is pretty low/sensitive, but with a smaller genome this probably will not cause memory issues with the mapping. Was this set because the genome has transcripts with known, characterized introns this short? I didn't check, but you can in the reference GFF file.)

As a first-pass test, maybe double check the above Cufflinks run, confirm the results were as expected, then try the Cufflinks default ("0.1") to see how that works out. If you want to make a subsequent run more sensitive, you could try "0.01", although how significant those results are, given this genome and your specific input data, would need to be evaluated.

After that, if you are still having trouble, please feel free to share a history link and we can try to help (copy a share link from the public server and email it directly to me, to keep your data private). Here is how:
https://wiki.galaxyproject.org/Support#Shared_and_Published_data

Hopefully the parameter change works, or a reference genome issue is found and corrected, but if not, I'll watch for your email.

Jen
Galaxy team

-- 
Jennifer Hillman-Jackson
http://galaxyproject.org

Re: all FPKMs are 0 in the tmap files produced by cuffcompare

Yang Bi
Hi Jen:

Thank you for the prompt reply. The FPKMs produced by Cufflinks look normal (from an assembled transcripts file):

Seqname Source Feature Start End Score Strand Frame Attributes
chr1 Cufflinks transcript 11960 13178 1000 . . gene_id "CUFF.180"; transcript_id "CUFF.180.1"; FPKM "6.5441928094"; frac "1.000000"; conf_lo "3.594986"; conf_hi "8.987465"; cov "2.413218"; full_read_support "yes";
chr1 Cufflinks exon 11960 13178 1000 . . gene_id "CUFF.180"; transcript_id "CUFF.180.1"; exon_number "1"; FPKM "6.5441928094"; frac "1.000000"; conf_lo "3.594986"; conf_hi "8.987465"; cov "2.413218";
chr1 Cufflinks transcript 4536 5314 1000 + . gene_id "CUFF.178"; transcript_id "CUFF.178.1"; FPKM "11.0556332840"; frac "1.000000"; conf_lo "3.645830"; conf_hi "13.216134"; cov "4.076844"; full_read_support "no";
chr1 Cufflinks exon 4536 4605 1000 + . gene_id "CUFF.178"; transcript_id "CUFF.178.1"; exon_number "1"; FPKM "11.0556332840"; frac "1.000000"; conf_lo "3.645830"; conf_hi "13.216134"; cov "4.076844";
chr1 Cufflinks exon 4706 5095 1000 + . gene_id "CUFF.178"; transcript_id "CUFF.178.1"; exon_number "2"; FPKM "11.0556332840"; frac "1.000000"; conf_lo "3.645830"; conf_hi "13.216134"; cov "4.076844";
chr1 Cufflinks exon 5174 5314 1000 + . gene_id "CUFF.178"; transcript_id "CUFF.178.1"; exon_number "3"; FPKM "11.0556332840"; frac "1.000000"; conf_lo "3.645830"; conf_hi "13.216134"; cov "4.076844";

I checked the chromosome names and realized that the BAM outputs use lowercase for "RNAME", e.g. "chr1", while my GFF3 file uses an initial capital letter for "seqid", e.g. "Chr1". Could this be the problem? What is the fastest way to convert the capital C in my GFF3 file to lowercase?

Thank you very much
Yang

Re: all FPKMs are 0 in the tmap files produced by cuffcompare

Jen Hillman-Jackson
Hello Yang,

Glad the problem was isolated. The mismatched chromosome names are definitely
something to be fixed.

The tools in "Text Manipulation" can help. The tool "Change Case of
selected columns" can change the case for you. Click on the pencil icon
after running the tool to reassign the datatype as needed.
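As a rough illustration of what lowercasing a whole column does, here is a minimal Python sketch. It is not the Galaxy tool itself: the function name is hypothetical, and the assumption is that the entire first tab-separated field is lowercased.

```python
def lowercase_first_column(line: str) -> str:
    """Lowercase the whole first tab-separated field; leave the rest untouched."""
    if not line.strip() or line.startswith("#"):
        return line  # blank lines and '#' comment/directive lines pass through
    first, sep, rest = line.partition("\t")
    return first.lower() + sep + rest


# Made-up sample rows, for illustration only.
rows = [
    "Chr1\tTAIR10\tgene\t3631\t5899\t.\t+\t.\tID=AT1G01010\n",
    "ChrC\tTAIR10\tgene\t1\t100\t.\t+\t.\tID=example\n",
]
for row in rows:
    print(lowercase_first_column(row).split("\t", 1)[0])
# chr1
# chrc
```

Note that every letter in the field is lowercased, so a name such as "ChrC" comes out as "chrc"; it is worth checking the result against the names in the BAM header afterwards.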

Take care,

Jen
Galaxy team


Re: all FPKMs are 0 in the tmap files produced by cuffcompare

Yang Bi
Hi Jen:

I still have a little problem with the chromosome names. The mitochondrial and chloroplast genomes are named "ChrM" and "ChrC" in the GFF3 file, and I need to change these to "chrM" and "chrC". How do I change the case of only the initial letter and not the entire word?

Thanks
Yang

Re: all FPKMs are 0 in the tmap files produced by cuffcompare

Jen Hillman-Jackson
Hi Yang,

I am going to give you a method to do this. In short, you'll be
splitting the dataset into three parts, altering two of them, then
merging the three resulting datasets back together. Once you have
completed this method, a workflow could be extracted from the history
and saved for future use.

1. Use "Filter and Sort -> Select".

   The default string would match all of the lines in your dataset.
   Alter it to create three files, using "Matching" for all:

   All chroms, minus ChrM and ChrC
   ^chr([0-9])+

   ChrM
   ^ChrM

   ChrC
   ^ChrC

   (The first pattern is lowercase, so it assumes the numbered
   chromosomes were already converted to "chr1", "chr2", and so on.)

2. For the ChrM and ChrC datasets, use "Text Manipulation -> Add column"
   on each file individually. The new column should hold the final
   desired form, e.g. "chrM" or "chrC".

3. For both results, use "Text Manipulation -> Cut" to replace column
   "1" with the new column.

4. Use the tool "Concatenate datasets" to combine the three files again,
   using the new results.

5. Reassign the metadata as needed using the pencil icon.

These tools all work on datatype "tabular" and generally on other text
data, but assign a dataset to "tabular" format using the pencil icon if
it is not recognized by a tool. This is fine until the last step, where
you can set it back to GFF.
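If the GFF3 file can be downloaded and edited outside Galaxy, the same renaming can be sketched in a few lines of Python. This is a minimal sketch rather than a Galaxy tool: the function names are hypothetical, the sample rows are made up, and lines starting with "#" (including any directives that mention sequence names) are deliberately left unchanged.

```python
def lowercase_initial_of_seqid(line: str) -> str:
    """Lowercase only the first character of column 1 (Chr1 -> chr1, ChrM -> chrM)."""
    if not line.strip() or line.startswith("#"):
        return line  # blank lines and '#' comment/directive lines pass through
    seqid, sep, rest = line.partition("\t")
    return seqid[:1].lower() + seqid[1:] + sep + rest


def convert_gff(src_path: str, dst_path: str) -> None:
    """Write a copy of the file at src_path with renamed sequence names."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(lowercase_initial_of_seqid(line))


# Made-up sample rows, for illustration only.
sample = [
    "Chr1\tTAIR10\tgene\t3631\t5899\t.\t+\t.\tID=AT1G01010\n",
    "ChrC\tTAIR10\tgene\t1\t100\t.\t+\t.\tID=exampleC\n",
    "ChrM\tTAIR10\tgene\t1\t100\t.\t+\t.\tID=exampleM\n",
]
converted = [lowercase_initial_of_seqid(line) for line in sample]
print([line.split("\t", 1)[0] for line in converted])  # ['chr1', 'chrC', 'chrM']
```

Because only the first character changes, this handles the numbered chromosomes and the organelle names in one pass, without splitting and re-joining the dataset.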


