fastq groomer

fastq groomer

arabidopsis
Hi all,

Fastq groomer has Solexa or Illumina 1.3+ as an input quality format. I asked the sequencing facility about their machine and output, and they said their format was Illumina 1.8+ (the newest). I tried to convert my fastq file to Sanger with fastq groomer, using Illumina 1.3+ as the input option, and all reads came out with a quality of around 10... Does this mean that Galaxy cannot be used on a dataset with 1.8+ encoding, or was something else wrong?

Thanks,

Slon
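
Why qualities of "around 10" are a plausible symptom: if the file is actually Phred+33 (Sanger) encoded, as Illumina 1.8+ output is, then choosing Illumina 1.3+ (Phred+64) tells the groomer to subtract 64 instead of 33 from each character code, so every score comes out 31 too low and a Q40 character such as 'I' is read as Q9. A minimal sketch with a made-up quality string (`decode` is a hypothetical helper, not part of Galaxy):

```python
def decode(qual: str, offset: int) -> list[int]:
    """Convert an ASCII-encoded quality string to Phred scores."""
    return [ord(ch) - offset for ch in qual]


qual = "IIIIJJJJ"  # made-up high-quality read, Phred+33 encoded

sanger = decode(qual, 33)  # correct reading for Illumina 1.8+ data
wrong = decode(qual, 64)   # the same bytes read as Illumina 1.3+ (Phred+64)

print(sanger)  # [40, 40, 40, 40, 41, 41, 41, 41]
print(wrong)   # [9, 9, 9, 9, 10, 10, 10, 10]
```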

Re: fastq groomer

Peter Cock
On Tue, Oct 18, 2011 at 9:02 AM, arabidopsis <[hidden email]> wrote:

> Hi all,
>
> Fastq groomer has Solexa or Illumina 1.3+ as an input quality format. I
> asked at the sequencing facility about their machine and output and they
> said their format was Illumina 1.8+ (the newest). I tried to convert my
> fastq file into Sanger by fastq groomer, using Illumina 1.3+ as an input
> option and got all reads with quality of around 10... Does it mean that
> Galaxy cannot be used on a dataset with 1.8+ encoding or something
> else was wrong?
>
> Thanks,
>
> Slon

Illumina 1.8+ is already using the Sanger FASTQ encoding, so you
don't need to convert it with the groomer.

I think the Galaxy team might still recommend it as it doubles as
a sanity test for corrupt FASTQ files.

Peter
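
When it is unclear which encoding a file uses, the lowest quality character gives a rough hint. The sketch below is a heuristic only, not how Galaxy or the groomer decides; `quality_lines` and `guess_offset` are hypothetical names, and the reader assumes unwrapped four-line FASTQ records:

```python
from typing import Iterable, Iterator, Optional


def quality_lines(path: str) -> Iterator[str]:
    """Yield the quality line of each record in an unwrapped 4-line FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 3:  # header, sequence, '+', quality
                yield line.rstrip("\n")


def guess_offset(qualities: Iterable[str]) -> Optional[int]:
    """Heuristically guess the ASCII offset of FASTQ quality strings.

    Returns 33 (Phred+33: Sanger / Illumina 1.8+), 64 (Phred+64:
    Illumina 1.3+), or None when the lowest character is ambiguous.
    Raises ValueError if no quality characters are supplied.
    """
    lowest = min(ord(c) for q in qualities for c in q)
    if lowest < 59:
        # Below ';' (ASCII 59, the lowest Solexa character):
        # only Phred+33 uses these codes.
        return 33
    if lowest >= 64:
        # '@' or above fits Phred+64, but a Phred+33 file whose scores
        # are all >= 31 looks the same, so this is only a hint.
        return 64
    return None  # ';' to '?': Solexa, or Phred+33 scores 26-30
```

Usage would be something like `guess_offset(quality_lines("reads.fastq"))`, where the file name is a placeholder. A result of 33 is consistent with the file already being in Sanger encoding.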


Re: fastq groomer

arabidopsis
If Illumina 1.8+ already uses the Sanger FASTQ encoding, the file should be recognized by downstream tools such as Quality statistics computer, quality filter, etc. However, my file is not visible to those tools, and when I click on it, only "uploaded fastq file" is displayed, without encoding details.

S.


Re: fastq groomer

Peter Cock
On Tue, Oct 18, 2011 at 9:21 AM, arabidopsis <[hidden email]> wrote:
> If Illumina 1.8+ is already using the Sanger FASTQ encoding, the file should
> be recognized by downstream applications, like Quality statistics computer,
> quality filter etc. However, my file is not visible by those programs and
> when I click on it, only "uploaded fastq file" is displayed, without
> encoding details.
>
> S.

Have you told Galaxy it is fastqsanger? My guess is the upload tool
has defaulted to the generic fastq. Use the "pencil" icon to
edit the attributes of the uploaded FASTQ file in your Galaxy history.

Peter


Re: fastq groomer

Kevin-2
Actually, Illumina 1.8+ goes one quality value higher than fastqsanger (see http://en.wikipedia.org/wiki/FASTQ_format).

My question now, I guess, is: if I use fastqsanger, would anything break when it encounters a 'J' in the quality values?


Re: fastq groomer

Peter Cock
On Tue, Nov 1, 2011 at 4:58 PM, Kevin Lam <[hidden email]> wrote:
> actually Illumina 1.8+ has one more quality value higher than fastqsanger
> (see http://en.wikipedia.org/wiki/FASTQ_format )
>
> my question now I guess is if I use fastqsanger would it break anything when
> it encounters the 'J' in the qual values?

The Sanger FASTQ format has always allowed J (PHRED 41), the
issue is some tools might treat that as an error as it is unusually
high for a raw read. For instance, you need at least FASTX v0.0.13
to cope with this - older versions didn't like it.
http://seqanswers.com/forums/showthread.php?p=49667

Peter
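
To make the point about 'J' concrete: with the Phred+33 offset, 'J' (ASCII 74) decodes to 41, well inside the Sanger range of 0 to 93, so it is valid fastqsanger; the only risk is a tool that assumes scores stop at 40. A small sketch for spotting such scores (`scores_above` is a hypothetical helper, and the threshold of 40 is an assumption about what older tools expect):

```python
SANGER_OFFSET = 33
SANGER_MAX = 93  # '~' (ASCII 126) is the highest Phred+33 character


def scores_above(qual: str, threshold: int = 40) -> list[int]:
    """Return the Phred+33 scores in `qual` that exceed `threshold`.

    Raises ValueError if any character is outside the Sanger range.
    """
    scores = [ord(c) - SANGER_OFFSET for c in qual]
    invalid = [s for s in scores if not 0 <= s <= SANGER_MAX]
    if invalid:
        raise ValueError(f"not valid Phred+33 scores: {invalid}")
    return [s for s in scores if s > threshold]


print(ord("J") - SANGER_OFFSET)  # 41
print(scores_above("IIIJJ"))     # [41, 41]
```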


Re: fastq groomer

Jen Hillman-Jackson
In reply to this post by arabidopsis
Hello Slon,

In case you are still having issues, the best use case for Illumina 1.8+
data is to run the FASTQ Groomer tool with the option "Sanger". As Peter
noted, this assigns the expected datatype plus verifies content before
investing time in downstream analysis.

Please let us know if more help is needed,

Best,

Jen
Galaxy team

> ___________________________________________________________
> The Galaxy User list should be used for the discussion of
> Galaxy analysis and other features on the public server
> at usegalaxy.org.  Please keep all replies on the list by
> using "reply all" in your mail client.  For discussion of
> local Galaxy instances and the Galaxy source code, please
> use the Galaxy Development list:
>
>    http://lists.bx.psu.edu/listinfo/galaxy-dev
>
> To manage your subscriptions to this and other Galaxy lists,
> please use the interface at:
>
>    http://lists.bx.psu.edu/

--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support

Jennifer Hillman-Jackson
http://galaxyproject.org
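
Conceptually, grooming to Sanger re-encodes each quality character to the Phred+33 offset and rejects characters outside the input encoding's range. With "Sanger" as the input, the characters come out unchanged, so only the check does any work. The following is a simplified sketch, not Galaxy's implementation: `groom_to_sanger` is hypothetical, and it ignores Solexa's different scoring scale entirely:

```python
def groom_to_sanger(qual: str, in_offset: int) -> str:
    """Re-encode a quality string to Phred+33, validating each character.

    in_offset is 33 for Sanger / Illumina 1.8+ input, 64 for Illumina 1.3+.
    """
    if in_offset not in (33, 64):
        raise ValueError("expected offset 33 (Sanger) or 64 (Illumina 1.3+)")
    out = []
    for ch in qual:
        code = ord(ch)
        # Printable range for this encoding: from the offset up to '~' (126).
        if code < in_offset or code > 126:
            raise ValueError(f"{ch!r} is not a valid Phred+{in_offset} character")
        out.append(chr(code - in_offset + 33))
    return "".join(out)


print(groom_to_sanger("hhJ@", 64))  # II+!
print(groom_to_sanger("IIJ", 33))   # IIJ (unchanged, but checked)
```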

Re: fastq groomer

Richard Mark White
Hi,
  So, I am getting a fastq groomer error on some Illumina data; the error is below. Any ideas?

There was an error reading your input file. Your input file is likely malformed.
It is suggested that you double-check your original input file for errors -- helpful information for this purpose has been provided below.
However, if you think that you have encountered an actual error with this tool, please do tell us by using the bug reporting mechanism.
 
The reported error is: 'Invalid fastq header: lab/solexa_public/Zon/111021_WICMT-SOLEXA_64KF7AAXX/QualityScore/s_3_1_sequence.txt

rich

Re: fastq groomer

Bob Harris
Howdy, Rich,

My interpretation of the error report is that the fastq file you are trying to groom contains the indicated text (lab/solexa_public/Zon/111021_WICMT-SOLEXA_64KF7AAXX/QualityScore/s_3_1_sequence.txt) on a line where it expects a valid fastq header.  I believe a valid header line would begin with an at sign ("@").  So perhaps somewhere along the way, your fastq file's contents were replaced by a filename.

Bob H
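
Bob's reading can be illustrated with a minimal structural check for unwrapped four-line FASTQ records. This is a hypothetical sketch, not Galaxy's parser; it only borrows the wording of the reported error, and it assumes each record is exactly header, sequence, '+' line, quality:

```python
def check_fastq(lines: list[str]) -> int:
    """Check unwrapped 4-line FASTQ records; return the number of records."""
    count = 0
    for i in range(0, len(lines), 4):
        record = lines[i:i + 4]
        header = record[0]
        # Every record must start with '@'; a bare file path fails here.
        if not header.startswith("@"):
            raise ValueError(f"Invalid fastq header: {header}")
        if len(record) < 4:
            raise ValueError(f"truncated record starting at line {i + 1}")
        _, seq, plus, qual = record
        if not plus.startswith("+"):
            raise ValueError(f"line {i + 3}: expected a '+' separator line")
        if len(seq) != len(qual):
            raise ValueError(f"line {i + 4}: quality length differs from sequence length")
        count += 1
    return count
```

A file whose first line is a path rather than an '@' header, as in the error above, is rejected on the first record. To check a local file, one could pass `open(path).read().splitlines()`, where `path` is a placeholder.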

