[Ontology-editors] non-critical warnings on GO file

Karen Christie kchris at genome.stanford.edu
Tue Apr 28 14:46:30 PDT 2009


Hi GO ontology editors,

I did an edit today and noticed that there were 26 non-critical warnings. 
I went through all of them to see what they were. There are a couple types 
of warnings where we should probably change what the verification check 
looks for (which is why I cc'd the OEWG on this), but there were a bunch 
of user errors which people should have fixed before they committed.

In the process of releasing OE2, Jen did a lot of work to clean up the 
hundreds of these that we used to have, so that people could actually use 
the verification checks to catch problems they introduced. But if we start 
collecting a whole bunch of these again, then everyone will ignore the 
verification checks again and we'll be back to where we were, and 
eventually someone will have to go through and clean them up again.

I think it would be best if we can keep GO "clean" of these types of 
problems so that the verification checks are useful to each person as they 
save, so they can use it to fix their own problems BEFORE they commit 
them.

Below is what I found in going through the warnings. Maybe we can talk 
about appropriate procedure to avoid accumulating these warnings, and 
perhaps the OEWG can talk about whether two of the checks are picking up 
things they shouldn't be.

thanks,

-Karen


1. User Errors: Almost half were simple typos, e.g. "anaphasep" instead of 
"anaphase.", internal newlines within definitions, or missing final 
periods from definitions, the latter often occurring in defs from EC or 
MetaCyc.

It seems that people should NOT be committing the ontology with these 
types of errors; they should fix them before they commit so that we don't 
accumulate scads of them.
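
The two mechanical cases above (internal newlines and missing final 
periods) are easy to test for before committing. Below is a minimal Python 
sketch of such a test. It is not the OBO-Edit verification code; the 
function name definition_warnings and the warning strings are made up for 
illustration, and it assumes the definition text has already been pulled 
out of the ontology file as a plain string.

```python
def definition_warnings(definition):
    """Return a list of warning strings for one definition (hypothetical helper)."""
    warnings = []
    # A newline anywhere other than the very start or end counts as internal.
    if "\n" in definition.strip():
        warnings.append("internal newline")
    # Definitions are expected to end with a period, ignoring trailing whitespace.
    if not definition.rstrip().endswith("."):
        warnings.append("missing final period")
    return warnings
```

An empty list means the definition passes both tests. Typos such as 
"anaphasep" still need a spell check or a human eye.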

There was also one URL in a definition. By comparison with the other URLs 
that the verification check flagged, it seems that perhaps this should be 
in the comment, not the definition?


2. Verification Check issues:

Then, there were two other types of warnings, where it looks like maybe 
the checks are picking up things that should be allowed.

Repeated word - There were four "repeated words" reported where it ignored 
the fact that there was punctuation in between the two instances of the 
repeated word. While in two of these cases it might be less than 
grammatically ideal to use the same word twice in close succession, none 
of them is illegal, and for two of them there is probably no other way to 
phrase it. Perhaps it should not report repeated words when there is 
punctuation in between.
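
To make the suggestion concrete, here is a minimal Python sketch of a 
repeated-word test that only fires when the two instances are separated by 
whitespace alone. It is an illustration of the proposed rule, not the 
actual OBO-Edit check; the names REPEATED_WORD and find_repeated_words are 
assumptions.

```python
import re

# Hypothetical rule: a word, then whitespace only, then the same word again.
# Any punctuation between the two instances prevents a match.
REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def find_repeated_words(text):
    """Return the words that are immediately repeated with only whitespace between."""
    return [match.group(1) for match in REPEATED_WORD.finditer(text)]
```

With this rule, "transport of of ions" would be reported, while "import 
into the nucleus, nucleus export" would not, because a comma sits between 
the two instances.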

Issue with sentence boundaries - Most of the rest of the warnings were 
about periods with no whitespace after them, resulting in two warnings:
- sentences that do not start with a capital
- sentences that are not separated by whitespace.

However, none of the flagged issues were supposed to be sentences. Most 
were URLs in comment fields. A couple of others were names or formulas 
that contained periods where there was no whitespace after the period. 
Perhaps the check should not treat a period followed by a non-whitespace 
character as a sentence boundary.
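
As a sketch of what that change might look like, the Python below treats a 
period as a sentence boundary only when whitespace follows it, and then 
looks at whether the next sentence starts with a lowercase letter. This is 
an illustration under that assumption, not the existing OBO-Edit check; 
SENTENCE_END and lowercase_sentence_starts are hypothetical names.

```python
import re

# Assumed rule: only a period followed by whitespace ends a sentence, so
# periods inside URLs, names, or formulas are never treated as boundaries.
SENTENCE_END = re.compile(r"\.\s+")

def lowercase_sentence_starts(text):
    """Return the offsets of sentences that begin with a lowercase letter."""
    offsets = []
    for match in SENTENCE_END.finditer(text):
        next_char = text[match.end():match.end() + 1]
        if next_char.islower():
            offsets.append(match.end())
    return offsets
```

Under this rule the "not separated by whitespace" warning goes away 
entirely, since a period without following whitespace is no longer 
considered the end of a sentence. The cost is that a real missing space 
between two sentences would no longer be caught.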


