how to transfer gene id into protein id

how to transfer gene id into protein id

Li, Jilong (MU-Student)
Hi,

I have some RefSeq gene IDs, like NM_*****.

How can I convert these gene IDs into protein IDs, like NP_****?

Thank you very much!

Victor
Re: how to transfer gene id into protein id

Jen Hillman-Jackson
Hello,

If the reference genome is in UCSC and has a RefSeq track, then you can
extract a table called "refLink" from the Table Browser, which holds the
transcript and protein identifiers, and subset it for the rows that match
your query RefSeq transcript identifiers.

If the RefSeq data is at BioMart or another source, a similar path to
the one I outline below will work with some modifications. It all
depends on the file format, but Galaxy's tools can manipulate data in
just about every way you will need.

Using a transcript identifier query, subset protein identifiers in a
UCSC RefSeq track:

A.
Load your list of NM_* identifiers ("Get Data -> Upload").
- Set the file format to "tabular" if needed (use the "pencil" icon, then
"Edit Attributes -> Change data type").

B.
Load the RefSeq id mapping data with "Get Data -> UCSC Main" and set the
form parameters as needed, choosing the track "RefSeq Genes" and the
table "refLink". Make sure the region is the entire genome. Send to
Galaxy formatted as-is (tabular).

C.
Next, cut columns 3 and 4 (the transcript and protein identifiers) out
of the table with the tool "Text Manipulation -> Cut" and the options
"c3,c4".

D. OPTIONAL, if you want the full list of coding RefSeqs for another
purpose: remove the non-coding RefSeqs with the tool "Filter and Sort
-> Select" and the options "that: NOT Matching" and "the pattern:
^NR_.*$". Be sure to enter the regular expression ^NR_.*$ without any
quotes.

E. Perform a join using "Join, Subtract and Group -> Compare two
Datasets" with the options:
     - "Compare: <file of trans and prot id, filtered or not>"
     - "Using column: c1" where c1 is the trans ids
     - "against: <file of trans ids>"
     - "and column: c1" where c1 is the trans ids
     - "To find: Matching rows of first dataset"

F.
Result dataset is a two-column tabular file:
    transcript id <tab> protein id
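
For anyone who wants to check the result outside Galaxy, the same steps
(cut columns 3 and 4, optionally drop NR_ rows, keep the rows that match
the query list) can be sketched in Python. This is only an illustration,
not what the Galaxy tools run internally. The function names, the
abbreviated four-column header line, and the accessions in the example
are all made up; only the column positions and the ^NR_.*$ pattern come
from the steps above.

```python
import csv
import io
import re

# Same regular expression as in the optional "Select" step.
NONCODING = re.compile(r"^NR_.*$")


def read_tabular(text):
    """Parse tab-separated text into rows, skipping '#' header lines.

    Assumption: the export starts with a header line beginning with '#'.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [row for row in reader if row and not row[0].startswith("#")]


def map_transcripts_to_proteins(reflink_rows, query_ids, drop_noncoding=True):
    """Return (transcript id, protein id) pairs for the queried transcripts."""
    wanted = set(query_ids)
    pairs = []
    for row in reflink_rows:
        if len(row) < 4:
            continue  # not enough columns to hold c3 and c4
        transcript_id, protein_id = row[2], row[3]  # the "c3,c4" cut
        if drop_noncoding and NONCODING.match(transcript_id):
            continue  # the optional "NOT Matching" filter
        if transcript_id in wanted:  # "Matching rows of first dataset"
            pairs.append((transcript_id, protein_id))
    return pairs


# Invented rows for illustration only; these are not real RefSeq records.
EXAMPLE_REFLINK = (
    "#name\tproduct\tmrnaAcc\tprotAcc\n"
    "GENE1\tproduct one\tNM_900001\tNP_900001\n"
    "GENE2\tproduct two\tNM_900002\tNP_900002\n"
    "GENE3\t\tNR_900003\t\n"
)

if __name__ == "__main__":
    rows = read_tabular(EXAMPLE_REFLINK)
    queries = ["NM_900001", "NR_900003"]
    for transcript_id, protein_id in map_transcripts_to_proteins(rows, queries):
        print(transcript_id, protein_id, sep="\t")
```

With the filter switched off (drop_noncoding=False), an NR_ query still
comes back, but with an empty protein column in this made-up data.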


Hopefully this helps you and others who are doing a similar task. If you
think you will be doing this a lot, be sure to consider extracting the
steps into a workflow.

Thanks for using Galaxy,

Jen
Galaxy team



On 10/27/11 1:34 PM, Li, Jilong (MU-Student) wrote:

> Hi,
>
> I have some refseq gene id, like NM_*****.
>
> How can I transfer these gene id into protein id, like NP_****?
>
> Thank you very much!
>
> Victor
>
>
> ___________________________________________________________
> The Galaxy User list should be used for the discussion of
> Galaxy analysis and other features on the public server
> at usegalaxy.org.  Please keep all replies on the list by
> using "reply all" in your mail client.  For discussion of
> local Galaxy instances and the Galaxy source code, please
> use the Galaxy Development list:
>
>    http://lists.bx.psu.edu/listinfo/galaxy-dev
>
> To manage your subscriptions to this and other Galaxy lists,
> please use the interface at:
>
>    http://lists.bx.psu.edu/

--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support

Jennifer Hillman-Jackson
http://galaxyproject.org
Re: how to transfer gene id into protein id

Li, Jilong (MU-Student)
Hi,

I have some RefSeq gene IDs, like NM_***** and NR_******.

I know how to convert NM_****** into the protein ID NP_*****. But how do I convert NR_***** into a protein ID, like NP_****? Could you please tell me?

Thank you very much!

Victor




________________________________________
From: Jennifer Jackson [[hidden email]]
Sent: Thursday, October 27, 2011 11:23 PM
To: Li, Jilong (MU-Student)
Cc: [hidden email]
Subject: Re: [galaxy-user] how to transfer gene id into protein id


Re: how to transfer gene id into protein id

Jen Hillman-Jackson
Hello Victor,

RefSeq sequences designated by a transcript identifier formatted as NR_*
are non-coding (meaning: transcribed, but not translated), therefore
there is no protein product and no linked protein sequence NP_* identifier.

This documentation from NCBI covers RefSeq naming conventions:
http://www.ncbi.nlm.nih.gov/RefSeq/key.html
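
In practice this means a lookup for a mixed list should simply report
"no protein" for NR_* entries rather than fail. A minimal sketch,
assuming the transcript-to-protein pairs are already loaded into a
Python dict (the function names and the accessions in the usage below
are hypothetical):

```python
def protein_for(transcript_id, transcript_to_protein):
    """Return the linked protein id, or None when there is none."""
    if transcript_id.startswith("NR_"):
        return None  # non-coding: transcribed, but not translated
    return transcript_to_protein.get(transcript_id)


def report(query_ids, transcript_to_protein):
    """Map every query id to its protein id (or None), keeping all queries."""
    return {tid: protein_for(tid, transcript_to_protein) for tid in query_ids}


if __name__ == "__main__":
    # Invented accessions for illustration only.
    mapping = {"NM_900001": "NP_900001"}
    for tid, pid in report(["NM_900001", "NR_900003"], mapping).items():
        print(tid, pid if pid is not None else "(no protein product)", sep="\t")
```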

Hopefully this is helpful,

Best,

Jen
Galaxy team


On 10/28/11 3:24 PM, Li, Jilong (MU-Student) wrote:

> Hi,
>
> I have some refseq gene id, like NM_***** and NR_******.
>
> I know how to transfer NM_****** into protein ID NP_*****. But, how to transfer NR_***** into protein id, like NP_****? I do not know. Could you please tell me?
>
> Thank you very much!
>
> Victor


Re: how to transfer gene id into protein id

Hans-Rudolf Hotz
In reply to this post by Li, Jilong (MU-Student)
Hi Victor

This is not really a Galaxy-related answer, but you might want to study
the following web page explaining the RefSeq accession format:

http://www.ncbi.nlm.nih.gov/RefSeq/key.html#accession


Strictly speaking, there is no such thing as a "refseq gene id", since
RefSeq entries describe individual molecules. There is a new subset of
RefSeq called "RefSeqGene" (see http://www.ncbi.nlm.nih.gov/refseq/rsg/),
but I don't think this is what you are after.


Hence, you can crosslink 'mRNA' (i.e., NM_*****) to proteins (i.e.,
NP_*****), and Jen gave you an excellent recipe for how to do that in Galaxy.

However, you cannot crosslink 'RNA' (i.e., NR_*****, which are "non-coding
transcripts including structural RNAs, transcribed pseudogenes, and
others.") to proteins!
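
If it helps, a mixed list can be sorted by accession prefix before any
crosslinking is attempted. The sketch below is only an illustration: it
covers just the three prefixes discussed in this thread (the NCBI page
lists more), and the names REFSEQ_MOLECULE_TYPES, molecule_type and
split_by_linkability are made up for the example.

```python
# Only the prefixes mentioned in this thread; the NCBI page lists others.
REFSEQ_MOLECULE_TYPES = {
    "NM_": "mRNA",
    "NR_": "non-coding RNA",
    "NP_": "protein",
}


def molecule_type(accession):
    """Return the molecule type implied by the accession prefix."""
    for prefix, kind in REFSEQ_MOLECULE_TYPES.items():
        if accession.startswith(prefix):
            return kind
    return "unknown"


def split_by_linkability(accessions):
    """Split ids into those that can link to a protein and those that cannot."""
    linkable = [a for a in accessions if molecule_type(a) == "mRNA"]
    not_linkable = [a for a in accessions if molecule_type(a) == "non-coding RNA"]
    return linkable, not_linkable


if __name__ == "__main__":
    # Invented accessions for illustration only.
    coding, noncoding = split_by_linkability(["NM_900001", "NR_900003", "NM_900002"])
    print("can be linked to NP_*:", coding)
    print("no protein product:", noncoding)
```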


I hope this clarifies the confusion.

Regards, Hans





On 10/29/2011 12:24 AM, Li, Jilong (MU-Student) wrote:

> Hi,
>
> I have some refseq gene id, like NM_***** and NR_******.
>
> I know how to transfer NM_****** into protein ID NP_*****. But, how to transfer NR_***** into protein id, like NP_****? I do not know. Could you please tell me?
>
> Thank you very much!
>
> Victor