[go] partitioning gene association files
Chris Mungall
cjm at fruitfly.org
Wed Jan 30 10:14:25 PST 2008
I agree with Val, but we will never all agree.
We need a smarter way of downloading association files.
I have marked this as high priority on the web presence tools (i.e. AmiGO)
tracker:
http://sourceforge.net/tracker/index.php?func=detail&aid=1882857&group_id=36855&atid=494390
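
For illustration only, here is a minimal sketch of the kind of partitioning
being discussed in this thread. It assumes the tab-delimited gene association
layout in which the qualifier is the 4th column and the evidence code is the
7th, and in which comment lines start with "!". The function name and the
bucket names are hypothetical and are not part of any existing GO tool.

```python
def partition_gaf_lines(lines):
    """Sort gene association lines into header, NOT, IEA and other buckets.

    Assumes tab-delimited rows with the qualifier in column 4 and the
    evidence code in column 7 (1-based). Bucket names are illustrative.
    """
    buckets = {"header": [], "not": [], "iea": [], "other": []}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("!"):
            buckets["header"].append(line)  # comment / header line
            continue
        cols = line.split("\t")
        if len(cols) < 7:
            # Too short to carry an evidence code; keep it rather than drop it.
            buckets["other"].append(line)
            continue
        qualifiers = cols[3].split("|")  # e.g. "NOT|contributes_to"
        evidence = cols[6]
        if "NOT" in qualifiers:
            buckets["not"].append(line)  # NOT takes precedence over IEA
        elif evidence == "IEA":
            buckets["iea"].append(line)
        else:
            buckets["other"].append(line)
    return buckets
```

Passing an open file handle as `lines` would work, since the function only
iterates over it. Giving NOT precedence over IEA is a design choice made for
this sketch; a real download tool might prefer to let the user pick any
combination of filters.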
On Jan 30, 2008, at 9:59 AM, Valerie Wood wrote:
> I'm still not clear on the reason for splitting out the IEA
> evidence codes. If this is to shield people from non-experimental
> data, why aren't we also splitting RCA and ISS? By splitting out
> the IEA codes, if it isn't to distinguish from purely experimental,
> we are implying these are of lower quality. This is often assumed,
> but there is really no evidence that this is the case.
>
> I would say, after assessing thousands of annotations manually, that
> the IEA annotations are now as accurate as manually curated data
> (based on the fact that the number of annotation errors I report to
> SGD is roughly similar to those reported to UniProt and InterPro).
> IEA annotations are more conservative, and the majority are
> redundant with manual annotations. The remainder are useful
> additions to fill in annotation gaps.
>
> Would it be better to make a more informative header (which needed
> to be removed from the file), stating explicitly which data is
> included? I can see a case for splitting out NOT annotations, but
> if the users are 'advanced users' and they need to split out the
> IEA data for a specific purpose I presume they can split out the
> IEA data themselves.
>
> If people think that there really are problems with the IEA data,
> shouldn't we address these issues?
> Perhaps people could assess their IEA data and report any remaining
> problems, which would improve the mappings for everyone.
>
> Val
>
>
> Doug Howe <dhowe at cs.uoregon.edu> wrote:
>> Hi Jen,
>> Points all well taken!
>> -Doug
>>
>> On Wed, 30 Jan 2008, Jennifer Deegan (nee Clark) wrote:
>>
>>> Hi,
>>>
>>> Doug Howe wrote:
>>>
>>>> If even advanced users who work with GA files can't get past the
>>>> distinction between IEA and experimental codes, I have to wonder if
>>>> they are serving any purpose worth their hassle? By splitting the
>>>> file we are just shielding users from the complexity of the evidence
>>>> codes and allowing them to continue to not understand them.
>>>
>>>
>>> Just for context, this question came up because a group of advanced
>>> users at a meeting last year specifically asked us to split out
>>> electronic annotations and NOT annotations into separate files.
>>>
>>> I think that whilst there are advanced users, and we would like to
>>> think that they understand our system in all its details, we have to
>>> accept that users have a lot to do, and many tight deadlines. If we
>>> know that there are difficult things in the files (such as NOT
>>> annotations) that might trip people up, then it makes sense that there
>>> should be an extra step in downloading these just so that they notice
>>> that they are getting something slightly different.
>>>
>>> The users often ask me for separation of electronic annotations,
>>> because they do not wholly trust these annotations. We may not agree
>>> with that assessment, but this lack of trust is common amongst the
>>> users, and having the separate files just allows them to play it safe.
>>> In some cases it gets them past a barrier to using the GO at all. It
>>> also encourages groups to start manual annotation, which has to be a
>>> good thing.
>>>
>>> Jen
>>>
>>
>>
>