[Ontology-editors] non-critical warnings on GO file
Karen Christie
kchris at genome.stanford.edu
Tue Apr 28 14:46:30 PDT 2009
Hi GO ontology editors,
I did an edit today and noticed that there were 26 non-critical warnings.
I went through all of them to see what they were. There are a couple types
of warnings where we should probably change what the verification check
looks for (which is why I cc'd the OEWG on this), but there were a bunch
of user errors which people should have fixed before they committed.
In the process of releasing OE2, Jen did a lot of work to clean up the
hundreds of these that we used to have, so that people could actually use
the verification checks to catch problems they introduced. But if we start
collecting a whole bunch of these again, then everyone will ignore the
verification checks again and we'll be back to where we were, and
eventually someone will have to go through and clean them up again.
I think it would be best if we can keep GO "clean" of these types of
problems so that the verification checks are useful to each person as they
save, so they can use it to fix their own problems BEFORE they commit
them.
Below is what I found in going through the warnings. Maybe we can talk
about appropriate procedure to avoid accumulating these warnings, and
perhaps the OEWG can talk about whether two of the checks are picking up
things they shouldn't be.
thanks,
-Karen
1. User Errors: Almost half were simple typos, e.g. "anaphasep" instead of
"anaphase.", internal newlines within definitions, or missing final
periods from definitions, the latter often occurring in defs from EC or
MetaCyc.
It seems that people should NOT be committing the ontology with these
types of errors; they should fix them before they commit so that we don't
accumulate scads of them.
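To make the two definition problems above concrete, here is a minimal Python sketch of what such a pre-commit check could look like. It is not the actual verification code in the editing tool; the function name `definition_warnings` and the warning strings are made up for illustration.

```python
def definition_warnings(definition):
    """Return a list of warnings for one definition string.

    Hypothetical helper: the warning names are illustrative only and do
    not correspond to the real verification check messages.
    """
    warnings = []
    # A newline anywhere inside the definition (not just at the ends)
    if "\n" in definition.strip():
        warnings.append("internal newline")
    # Definitions are expected to end with a period
    if not definition.rstrip().endswith("."):
        warnings.append("missing final period")
    return warnings
```

Running something like this on each edited definition before committing would catch the missing periods that tend to come in with defs copied from EC or MetaCyc.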
There was also one URL in a definition. By comparison with the other URLs
that the verification check flagged, it seems that perhaps this should be
in the comment, not the definition?
2. Verification Check issues:
Then, there were two other types of warnings, where it looks like maybe
the checks are picking up things that should be allowed.
Repeated word - There were four "repeated words" reported where it ignored
the fact that there was punctuation in between the two instances of the
repeated word. While two of these might be less than grammatically ideal,
using the same word twice in close succession, none of these are illegal,
and for two of them there is probably no other way to phrase it. Perhaps it
should not report repeated words when there is punctuation in between.
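As a rough illustration of that suggestion, the following Python sketch flags a repeated word only when the two instances are separated by whitespace alone. This is an assumption about how such a rule could be written, not the existing check; the names `REPEATED_WORD` and `find_repeated_words` are hypothetical.

```python
import re

# Match a word, then whitespace only, then the same word again.
# Any punctuation between the two instances prevents a match.
REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def find_repeated_words(text):
    """Return words that occur twice in a row with only whitespace between."""
    return [m.group(1) for m in REPEATED_WORD.finditer(text)]
```

With this rule, "binds to the the receptor" is still reported, but "during anaphase, anaphase spindle elongation occurs" is not, because of the comma between the two instances.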
Issue with sentence boundaries - Most of the rest of the warnings were
about periods with no whitespace after them, resulting in two warnings:
- sentences that do not start with a capital
- sentences that are not separated by whitespace.
However, none of the flagged issues were supposed to be sentences. Most
were URLs in comment fields. A couple of others were names or formulas that
contained periods where there was no whitespace after the period. Perhaps
the check should not treat a period followed by a non-whitespace character
as a sentence boundary.
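A minimal Python sketch of that idea follows: a period counts as the end of a sentence only when whitespace comes after it, so periods inside URLs, names, or formulas are left alone. This is only one possible way to express the rule, not the current check; `SENTENCE_BOUNDARY` and `sentences_missing_capital` are hypothetical names, and the example.org URLs are placeholders.

```python
import re

# Split only on whitespace that directly follows a period, so
# "www.example.org" or "x.html" never produces a sentence break.
SENTENCE_BOUNDARY = re.compile(r"(?<=\.)\s+")

def sentences_missing_capital(text):
    """Return sentences that start with a lowercase letter."""
    sentences = [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]
    return [s for s in sentences if s[0].isalpha() and not s[0].isupper()]
```

One known limitation of this sketch: abbreviations such as "e.g." followed by a lowercase word would still be flagged, so a real check would probably need a small exception list.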