[go] partitioning gene association files

Chris Mungall cjm at fruitfly.org
Wed Jan 30 10:14:25 PST 2008


I agree with Val

but we will never all agree

We need a smarter way of downloading association files

I have marked this as high priority on the web presence tools (i.e. AmiGO)
tracker:

http://sourceforge.net/tracker/index.php?func=detail&aid=1882857&group_id=36855&atid=494390
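
For reference, the split being debated is not hard to do on the client
side. Below is a minimal sketch in Python, assuming the standard
tab-delimited gene association layout (qualifier in column 4, evidence
code in column 7, comment lines starting with "!"). The function name and
the bucket names are made up for illustration; this is not part of AmiGO
or any existing GO tool.

```python
# Hypothetical helper: partition gene association lines into buckets.
# Column positions follow the gene association file format
# (0-based index 3 = qualifier, 0-based index 6 = evidence code).

QUALIFIER_COL = 3
EVIDENCE_COL = 6


def partition_gaf(lines):
    """Return a dict of lists: header, not, iea, other.

    NOT annotations take precedence, so a NOT annotation with an IEA
    evidence code lands in the "not" bucket only.
    """
    parts = {"header": [], "not": [], "iea": [], "other": []}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("!"):
            parts["header"].append(line)  # comment/header line
            continue
        cols = line.split("\t")
        if len(cols) <= EVIDENCE_COL:
            raise ValueError("too few columns: %r" % line)
        # The qualifier column may hold several pipe-separated values,
        # e.g. "NOT|contributes_to".
        qualifiers = cols[QUALIFIER_COL].split("|")
        if "NOT" in qualifiers:
            parts["not"].append(line)
        elif cols[EVIDENCE_COL] == "IEA":
            parts["iea"].append(line)
        else:
            parts["other"].append(line)
    return parts
```

Passing an open file handle (or any iterable of lines) works; each bucket
can then be written out to its own file, with the header lines repeated
at the top of each if desired.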

On Jan 30, 2008, at 9:59 AM, Valerie Wood wrote:

> I'm still not clear on the reason for splitting out the IEA
> evidence codes. If this is to shield people from non-experimental
> data, why aren't we also splitting RCA and ISS? By splitting out
> the IEA codes, if it isn't to distinguish from purely experimental,
> we are implying these are of lower quality. This is often assumed,
> but there is really no evidence that this is the case.
>
> I would say, after assessing thousands of annotations manually, that
> the IEA evidence code annotations are now as accurate as manually
> curated data (based on the fact that the number of annotation errors
> I report to SGD is roughly similar to those reported to Uniprot and
> Interpro). IEA annotations are more conservative, and the majority are
> redundant with manual annotations. The remainder are useful
> additions to fill in annotation gaps.
>
> Would it be better to make a more informative header (which needed
> to be removed from the file), stating explicitly which data is
> included? I can see a case for splitting out NOT annotations, but
> if the users are 'advanced users' and they need to split out the
> IEA data for a specific purpose, I presume they can split out the
> IEA data themselves.
>
> If people think that there really are problems with the IEA data,  
> shouldn't we address these issues?
> Perhaps people could assess their IEA data and report any remaining  
> problems which would improve the mappings for everyone.
>
> Val
>
>
> Doug Howe <dhowe at cs.uoregon.edu> wrote:
>> Hi Jen,
>>    Points all well taken!
>> -Doug
>>
>> On Wed, 30 Jan 2008, Jennifer Deegan (nee Clark) wrote:
>>
>>> Hi,
>>>
>>> Doug Howe wrote:
>>>
>>>> If even advanced users who work with GA files can't get past the
>>>> distinction between IEA and experimental codes, I have to wonder
>>>> if they are serving any purpose worth their hassle?  By splitting
>>>> the file we are just shielding users from the complexity of the
>>>> evidence codes and allowing them to continue to not understand them.
>>>
>>>
>>> Just for context, this question came up because a group of
>>> advanced users at a meeting last year specifically asked us to
>>> split out electronic annotations and NOT annotations into separate
>>> files.
>>>
>>> I think that whilst there are advanced users, and we would like
>>> to think that they understand our system in all its details, we
>>> have to accept that users have a lot to do, and many tight
>>> deadlines. If we know that there are difficult things in the files
>>> (such as NOT annotations) that might trip people up, then it makes
>>> sense that there should be an extra step in downloading these just
>>> so that they notice that they are getting something slightly
>>> different.
>>>
>>> The users often ask me for separation of electronic annotations,
>>> because they do not wholly trust these annotations. We may not
>>> agree with that assessment, but this lack of trust is common
>>> amongst the users, and having the separate files just allows them
>>> to play it safe. In some cases it gets them past a barrier to using
>>> the GO at all. It also encourages groups to start manual
>>> annotation, which has to be a good thing.
>>>
>>> Jen
>>>
>>
>>
>



