=======================================================
Format specification for English SENSEVAL results files
=======================================================

The results of a system's processing should be reported in a single file, with one line for each test instance in the evaluation data for which the system is returning an answer. The lines in this results file do not have to be in any particular order.

Each line must include the following items, in the specified order:

  1. name of the evaluation file containing the test instance
  2. a single space
  3. reference number for the test instance (6-digit number beginning with "7")
  4. a single space
  5. list of sense tags (6-digit uid numbers beginning with "5" or "9")

The sense-tag list may be either weighted or unweighted. If unweighted, it is simply a sequence of sense tags separated by spaces. If weighted, it is a sequence of tag/weight pairs, again separated by spaces, where each pair is a tag followed immediately by a slash ("/") and then a weight (expressed as an integer or a decimal real number; exponential notation is not accepted).

Optionally, a line may also contain a comment, which must follow all of the above items and must be preceded by a separator consisting of two consecutive exclamation marks ("!!"). Comments will not be used in scoring. If there is no comment, the line must end immediately after the sense-tag list.

Examples of well-formed lines:

  bother-v 700001 501566
  bother-v 700002 501566 999997 !!
  bother-v 700006 501566/0.5 501573/0.4 503751/0.1
  bother-v 700015 503751/94 999999/87 !! comment . . .

A well-formed line reporting a system's sense annotation for a test instance with reference number R from a file named F may thus be described schematically as follows:

  F R { unweighted-list , weighted-list } (!! ( additional text ))

where unweighted-list has the form

  (<uid> )+

and weighted-list has the form

  (<uid>/<weight> )+

Here <uid> stands for a sense tag and <weight> for its weight, as described above.

Lines that do not conform to this specification will be disregarded.

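As an illustration only, the line format described above can be checked with a short Python sketch. It is not the official scorer. The name parse_line is hypothetical, and the sketch makes several assumptions that the specification does not settle: fields are separated by exactly one space, whitespace immediately before "!!" is tolerated, weights are non-negative, and forms such as "5." or ".5" are accepted as decimals. Lists that mix weighted and unweighted tags are returned as parsed, not converted.

```python
import re

# Assumed patterns, read off the prose of the specification.
REF_RE = re.compile(r"7\d{5}")                 # 6 digits, starts with "7"
TAG_RE = re.compile(r"[59]\d{5}")              # 6 digits, starts with "5" or "9"
WEIGHT_RE = re.compile(r"\d+(?:\.\d*)?|\.\d+")  # integer or decimal, no exponent


def parse_line(line):
    """Return (file, ref, tags, comment), or None if the line is malformed.

    tags is a list of (tag, weight) pairs; weight is None for an
    unweighted tag. comment is None when the line has no "!!" separator.
    """
    body, sep, comment = line.rstrip("\n").partition("!!")
    # Strict reading: exactly one space between fields (an assumption).
    fields = body.rstrip().split(" ")
    if len(fields) < 3 or "" in fields:
        return None
    name, ref, items = fields[0], fields[1], fields[2:]
    if not REF_RE.fullmatch(ref):
        return None
    tags = []
    for item in items:
        tag, slash, weight = item.partition("/")
        if not TAG_RE.fullmatch(tag):
            return None
        if slash:
            if not WEIGHT_RE.fullmatch(weight):
                return None
            tags.append((tag, float(weight)))
        else:
            tags.append((tag, None))
    return (name, ref, tags, comment.strip() if sep else None)
```

Running parse_line over each line of a results file before submission gives a quick indication of which lines would probably be disregarded under this reading of the format.
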
NOTES:

(1) A list of sense annotations containing both weighted and unweighted uid numbers should not occur, but if one does, it will be converted to a list of unweighted annotations by treating all uid numbers in the list, regardless of whether or not they are followed by a weight, as unweighted.

(2) In case the results file contains two or more lines for the same reference number from the same evaluation file, the first such line will be counted as the system's answer and the subsequent lines will be disregarded.
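
The two notes can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the scoring software itself. The names normalize_tags and first_answers are hypothetical, and the sketch does only a minimal field-count check instead of full validation. It also assumes that a line too short to be an answer does not count as the "first" line for its reference number, which the notes do not state explicitly.

```python
def normalize_tags(items):
    """Note (1): a list mixing weighted and unweighted items becomes unweighted."""
    has_weighted = any("/" in item for item in items)
    has_unweighted = any("/" not in item for item in items)
    if has_weighted and has_unweighted:
        # Drop every weight, keeping only the uid before the slash.
        return [item.split("/", 1)[0] for item in items]
    return list(items)


def first_answers(lines):
    """Note (2): keep only the first line per (file, reference number)."""
    answers = {}
    for line in lines:
        # Discard any comment, then split the remaining fields.
        fields = line.split("!!", 1)[0].split()
        if len(fields) < 3:
            continue  # too short to be an answer; skipped (assumption)
        key = (fields[0], fields[1])
        if key not in answers:
            answers[key] = normalize_tags(fields[2:])
    return answers
```

For example, given two lines for bother-v 700001, only the tags from the earlier line end up in the returned dictionary.
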