|
|
 | | From: | Stefek Borkowski | | Subject: | Comparing protein sequences. | | Date: | Wed, 1 Dec 2004 20:43:31 +0100 |
|
|
 | Hi, I have a problem. I would like to compare two known sequences of ca. 130 amino acid residues each, namely human epidermal fatty acid-binding protein vs. human ileal fatty acid-binding protein. I have tried WWW BLAST at http://www.ncbi.nlm.nih.gov/blast/, choosing "Protein-protein BLAST (blastp)", but unfortunately I cannot find my match in the program's report. I suspect that the homology may be too low, so BLAST skips this pair of proteins in the report. How do I do it then? Maybe I should change something in the settings of the BLAST interface? Is there some software (maybe working offline) that accepts two sequences as input and simply compares the two of them? Thank you for any help you can offer. Kind regards from Poland, Stefek
|
|
 | | From: | Scott Coutts | | Subject: | Re: Comparing protein sequences. | | Date: | Thu, 02 Dec 2004 09:48:03 +1100 |
|
|
 | Stefek Borkowski wrote:
> Hi,
> I have a problem. I would compare 2 known sequences of ca. 130 amino
> acid residues, namely human epidermal fatty acid-binding protein vs.
> human ileal fatty acid-binding protein. I have tried WWW BLAST at
> http://www.ncbi.nlm.nih.gov/blast/ - choosing "Protein-protein BLAST
> (blastp)", but unfortunately cannot find my match in the report of the
> program. I suspect that the homology may be too little so BLAST skips
> this pair of proteins in the report. How do I do it then? Maybe I should
> change something in the settings of BLAST interface? Is there a kind of
> software (maybe working offline) which accepts two sequences as input
> and simply compares the two of them?
> Thank you for any help you can offer.
> Kind regards from Poland,
> Stefek
You'd be better off finding both sequences (you can do this with a simple keyword search) and then aligning the two. You can do this on the web, using one of the 'clustal' programs, or you can download a stand-alone version of clustal and view your alignment using another downloadable program called 'genedoc'. I don't have the web addresses on hand at the moment, but you can easily find them with a Google search.
Good luck!
Scott.
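
For readers who want to see what "doing an alignment of the two sequences" involves, here is a minimal sketch of global pairwise alignment (the Needleman-Wunsch dynamic programming method). It is not what clustal or BLAST run internally: the function name `needleman_wunsch`, the match/mismatch/gap scores, and the toy peptides are all assumptions made for illustration. Real tools score residue pairs with a substitution matrix and usually use affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b with a linear gap penalty.

    Returns (aligned_a, aligned_b, score). The scoring values are
    illustrative assumptions, not BLAST or clustal defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for aligning residue a[i-1] against residue b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align the two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]


# Toy peptides, made up for this example (not real FABP sequences)
top, bottom, total = needleman_wunsch("MKVLAT", "MKLAT")
print(top)
print(bottom)
print("score:", total)
```

Running the same function on the two real protein sequences (pasted in as plain one-letter strings) gives a rough alignment, but a dedicated alignment program is the better choice for anything you intend to report.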
|
|
 | | From: | Stefek Borkowski | | Subject: | Re: Comparing protein sequences. | | Date: | Wed, 1 Dec 2004 23:56:19 +0100 |
|
|
 | Scott Coutts wrote:
>
> You'd be better off finding both sequences (you can do this by simply
> using a keyword search) and then doing an alignment of the two
> sequences. You can do this on the web, using one of the 'clustal'
> programs, or you can download a stand-alone version of clustal and
> view your alignment using another downloadable program called
> 'genedoc'. I don't have the web addresses on hand at the moment, but
> you can easily find them with a google search.
>
Thanks Scott for your quick answer. I just figured out that I can use the WWW module of BLAST called "BLAST 2 Sequences", although I still have an interpretation problem. Would you care to comment on the question below, please?
I would like to know how to interpret the BLAST report when comparing two sequences with the BLAST 2 Sequences online module. The report reads as follows: Identities = 32/114 (28%), Positives = 53/114 (46%), Gaps = 1/114 (0%). I would say that the homology of the two proteins is equal to the value of "Identities", so it would be 28%. What about the "Positives" then? Somewhere in the literature I came across an estimate of the homology between the two proteins, stating that it is equal to 36%. This seems to be more or less the arithmetic mean of "Identities" and "Positives": (28 + 46)/2 is 37%, which seems close to the literature value. Is my way of thinking correct, or not necessarily? In other words, what is the recommended way of estimating the homology of two sequences on the basis of a BLAST report? Thanks for all your help. Kind regards, Stefek
|
|
 | | From: | Scott Coutts | | Subject: | Re: Comparing protein sequences. | | Date: | Thu, 02 Dec 2004 10:46:28 +1100 |
|
|
 | Stefek Borkowski wrote:
> Scott Coutts wrote:
>
>>
>> You'd be better off finding both sequences (you can do this by simply
>> using a keyword search) and then doing an alignment of the two
>> sequences. You can do this on the web, using one of the 'clustal'
>> programs, or you can download a stand-alone version of clustal and
>> view your alignment using another downloadable program called
>> 'genedoc'. I don't have the web addresses on hand at the moment, but
>> you can easily find them with a google search.
>>
> Thanks Scott for your quick answer. I just figured it out that I can use
> the WWW module of BLAST called "BLAST 2 Sequences". Although I still
> have an interpretation problem. Would you care to comment on the below,
> please.
>
> I would like to know what the BLAST interpretation really is in the case
> of comparing two sequences by the BLAST 2 Sequences online module. The
> report goes as follows:
> Identities = 32/114 (28%), Positives = 53/114 (46%), Gaps = 1/114 (0%)
> I would say that the homology of the two proteins is equal to the value
Firstly, a technical point here... when you're talking about genes, you should say 'similarity' rather than 'homology'. Either a gene is a homolog of another, or it's not.
http://homepage.usask.ca/~ctl271/857/def_homolog.shtml
http://www.biomedcentral.com/news/20040309/01
But anyway...
> of "Identities", so it would be 28%. What about the "Positives" then? I
> happened somewhere in the literature on estimation of the homology
> between the 2 proteins, stating that it is equal to 36%. This seems to
> be more or less the average arithmetic mean of "Identities" and
> "Positives", namely (28 + 46)/2 is 37% which seems close to the
> literature value. Is my way of thinking correct or not necessarily. In
> other words, what is the recommended algorithm of estimating the
> homology of two sequences, on the basis of BLAST report.
I'm not sure what figure they were quoting, but if it is properly quoted in the literature as a percentage, then it should include a statement of whether it is identities or similarities (positives), and of the region over which the count was obtained (if that's not mentioned, then usually it's the whole protein).
You should read the documentation that comes with BLAST to understand how it works. 'Identities' indicates the number of amino acids that are exactly the same, and 'positives' indicates the number that are similar (i.e. maintain similar properties, for example, both hydrophobic). You should also consider the E value that you're given.
Scott.
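
To make the counting concrete, here is a small sketch of how figures like "Identities = 32/114 (28%)" can be derived from an aligned pair of sequences. Several things here are assumptions for illustration: the function names, the toy aligned strings, and above all `SIMILAR_GROUPS`, a hand-made grouping of residues by property. BLAST itself marks a column as positive when the pair has a score greater than zero in its substitution matrix (BLOSUM62 by default for blastp), which also means that identical pairs count towards the positives. Truncating the percentage to an integer reproduces the numbers quoted in this thread, but that is an observation about these three figures, not a documented statement about BLAST.

```python
# Hypothetical property groups, standing in for a real substitution matrix.
SIMILAR_GROUPS = [set("AVLIMC"), set("FWY"), set("KRH"), set("DE"), set("STNQ")]


def alignment_stats(aligned_a, aligned_b):
    """Count identities, positives and gap columns in two aligned strings.

    Gaps are written as '-'. Returns (identities, positives, gaps, length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identities = positives = gaps = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            identities += 1
            positives += 1  # an identical pair is also a positive
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            positives += 1
    return identities, positives, gaps, len(aligned_a)


def percent(count, length):
    """Integer percentage, truncated (an assumption that matches the thread's figures)."""
    return 100 * count // length


# Counts taken from the report quoted in this thread
print(percent(32, 114), percent(53, 114), percent(1, 114))

# Toy aligned pair, made up for this example
print(alignment_stats("AKD-W", "ARDEF"))
```

Because the positives already contain the identities, averaging the two percentages does not correspond to either quantity. When quoting a figure, state which one it is and the alignment length it was computed over (114 columns here, not the full length of either protein).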
|
|
 | | From: | Stefek Borkowski | | Subject: | Re: Comparing protein sequences. | | Date: | Thu, 2 Dec 2004 12:14:08 +0100 |
|
|
 | Scott Coutts wrote:
>> ...
> Firstly, a technical point here... when you're talking about genes, you
> should say 'similarity' rather than 'homology'. Either a gene is a
> homolog of another, or it's not.
>
> http://homepage.usask.ca/~ctl271/857/def_homolog.shtml
> http://www.biomedcentral.com/news/20040309/01
>
> But anyway...
>
>>
>> of "Identities", so it would be 28%. What about the "Positives"
>> then? I happened somewhere in the literature on estimation of the
>> homology between the 2 proteins, stating that it is equal to 36%.
>> This seems to be more or less the average arithmetic mean of
>> "Identities" and "Positives", namely (28 + 46)/2 is 37% which seems
>> close to the literature value. Is my way of thinking correct or not
>> necessarily. In other words, what is the recommended algorithm of
>> estimating the homology of two sequences, on the basis of BLAST
>> report.
>
> I'm not sure what figure they were quoting, but if it is properly
> quoted in the literature as a percentage, then it should include a
> statement of whether it is identities or similarities (positives),
> the region over which the count was obtained (if it's not mentioned
> then usually it's the whole protein).
>
> You should read the documentation that comes with BLAST to understand
> how it works. The 'identities' is indicating the number of amino acids
> that are exactly the same, and the 'positives' is indicating the
> number that are similar (i.e. maintain similar properties, for
> example, both hydrophobic etc). You should also consider the E value
> that you're given.
Thank you so much, Scott. Your explanation helped me a lot! I visited the links you gave me and have already benefited from understanding the "E" value calculated by BLAST. Though not everything is clear to me yet, I have made a big step forward. Thanks again. May you have a nice day :) Special regards from Stefek
|
|
|