neleval evaluate

Evaluate system output against gold-standard annotations.

Usage summary

$ neleval evaluate --help
usage: neleval evaluate [-h] -g GOLD [-f {json,none,tab}] [-m NAME] [-b FIELD]
                        [--by-doc] [--by-type] [--overall]
                        [--type-weights FILE]
                        FILE

Evaluate system output

positional arguments:
  FILE

optional arguments:
  -h, --help            show this help message and exit
  -g GOLD, --gold GOLD
  -f {json,none,tab}, --fmt {json,none,tab}
  -m NAME, --measure NAME
                        Which measures to use: specify a name (or group name)
                        from the list-measures command. This flag may be
                        repeated.
  -b FIELD, --group-by FIELD
                        Report results per field-value, and micro/macro-
                        averaged over these, Multiple --group-by may be used.
                        E.g. -b docid -b type. NB: micro-average may not equal
                        overall score.
  --by-doc              Alias for -b docid
  --by-type             Alias for -b type
  --overall             With --group-by, report only overall, not per-group
                        results
  --type-weights FILE   File mapping gold and sys types to a weight, such as
                        produced by weights-for-hierarchy
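The --group-by option reports scores per field value together with micro- and macro-averages over those groups. The following is a generic sketch of the difference between the two averages, assuming each group has (true positive, false positive, false negative) counts. It is an illustration of the standard technique, not neleval's own code, and the helper names (prf, micro_macro) are hypothetical. Micro-averaging pools the counts of all groups and scores once. Macro-averaging scores each group and then takes the unweighted mean, so small groups weigh as much as large ones.

```python
from typing import Dict, Tuple

# Hypothetical per-group counts: (true positives, false positives, false negatives).
Counts = Tuple[int, int, int]


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from match counts (0.0 where undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro(per_group: Dict[str, Counts]):
    """Return (micro, macro) averaged (P, R, F1) triples over the groups."""
    if not per_group:
        raise ValueError("need at least one group")
    # Micro-average: sum the counts over all groups, then score once.
    tp = sum(c[0] for c in per_group.values())
    fp = sum(c[1] for c in per_group.values())
    fn = sum(c[2] for c in per_group.values())
    micro = prf(tp, fp, fn)
    # Macro-average: score each group, then take the unweighted mean of
    # each of P, R and F1 (macro F1 here is the mean of per-group F1).
    scores = [prf(*c) for c in per_group.values()]
    n = len(scores)
    macro = tuple(sum(s[i] for s in scores) / n for i in range(3))
    return micro, macro


# Two hypothetical documents of very different quality.
per_doc = {"doc1": (8, 2, 0), "doc2": (1, 1, 3)}
micro, macro = micro_macro(per_doc)
print("micro P/R/F:", [round(x, 3) for x in micro])
print("macro P/R/F:", [round(x, 3) for x in macro])
```

Here the micro-average is 0.75 for precision, recall and F1, while the macro-averaged recall is 0.625, because the poorly recalled doc2 counts as much as doc1 despite containing fewer annotations.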

Evaluating each document separately

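To score each document separately, pass --by-doc, an alias for -b docid (--group-by docid). Results are then reported for each document, along with micro- and macro-averages over documents. Note that the micro-average may not equal the overall score. Adding --overall reports only the overall results, omitting the per-document breakdown:

$ neleval evaluate -g GOLD --by-doc FILE
$ neleval evaluate -g GOLD --by-doc --overall FILE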
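As a rough illustration of what grouping by document involves, this sketch splits gold and system annotations by document ID and counts exact matches within each document. The annotation tuple (docid, start, end, kb_id) and the function counts_by_doc are hypothetical. They do not reflect neleval's file format or internals, and neleval's actual matching criteria depend on the measures selected with --measure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical annotation format: (docid, start offset, end offset, KB id).
Annotation = Tuple[str, int, int, str]


def counts_by_doc(
    gold: Iterable[Annotation], system: Iterable[Annotation]
) -> Dict[str, Tuple[int, int, int]]:
    """Return exact-match (tp, fp, fn) counts for each docid."""
    gold_by_doc: Dict[str, Set[Annotation]] = defaultdict(set)
    sys_by_doc: Dict[str, Set[Annotation]] = defaultdict(set)
    for ann in gold:
        gold_by_doc[ann[0]].add(ann)
    for ann in system:
        sys_by_doc[ann[0]].add(ann)
    result = {}
    # Include documents that appear in either the gold or the system output.
    for docid in sorted(set(gold_by_doc) | set(sys_by_doc)):
        g, s = gold_by_doc[docid], sys_by_doc[docid]
        tp = len(g & s)  # annotations identical in span and KB id
        result[docid] = (tp, len(s) - tp, len(g) - tp)
    return result


gold = [("d1", 0, 5, "E1"), ("d1", 10, 15, "E2"), ("d2", 3, 8, "E3")]
system = [("d1", 0, 5, "E1"), ("d1", 10, 15, "E9"), ("d3", 1, 4, "E4")]
by_doc = counts_by_doc(gold, system)
for docid, (tp, fp, fn) in by_doc.items():
    print(docid, "tp", tp, "fp", fp, "fn", fn)
```

Documents present only in the system output (d3 above) contribute false positives, and documents present only in the gold standard (d2) contribute false negatives, so neither is silently dropped from the per-document report.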