This work reports initial success in extending the Rutgers Paradigm of IR evaluation to the realm of concrete measurement, not in information retrieval per se, but in the arguably more complex domain of Question Answering. Two components are crucial to the paradigm: cross evaluation, and an analytical model that controls for the potential problems of cross evaluation. We describe the experimental design and analytical models. In the models, interaction effects are examined and found to be unimportant. After eliminating the interaction effects, we are able to extract meaningful and useful results from a very small study involving just three analysts, five topics, and two “systems”.
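The interaction check described above can be illustrated with a minimal sketch. In a crossed design like this one (analysts by systems), a cell's interaction residual is its observed mean minus the additive prediction from the row mean, column mean, and grand mean; residuals near zero indicate the interaction term can be dropped. The scores below are hypothetical illustrative values, not data from the study.

```python
# Hedged sketch: examining an analyst x system interaction in a small
# crossed design (3 analysts x 2 systems). The scores are made up for
# illustration and are NOT the study's actual measurements.
from itertools import product

analysts = ["A1", "A2", "A3"]
systems = ["S1", "S2"]

# Hypothetical mean quality score per (analyst, system) cell,
# averaged over topics.
score = {
    ("A1", "S1"): 0.62, ("A1", "S2"): 0.55,
    ("A2", "S1"): 0.70, ("A2", "S2"): 0.63,
    ("A3", "S1"): 0.58, ("A3", "S2"): 0.51,
}

grand = sum(score.values()) / len(score)
a_mean = {a: sum(score[(a, s)] for s in systems) / len(systems)
          for a in analysts}
s_mean = {s: sum(score[(a, s)] for a in analysts) / len(analysts)
          for s in systems}

# Interaction residual per cell: observed minus additive prediction.
# Values near zero mean analyst and system effects combine additively,
# so the interaction term is negligible and can be eliminated.
interaction = {
    (a, s): score[(a, s)] - a_mean[a] - s_mean[s] + grand
    for a, s in product(analysts, systems)
}
```

With these particular values every residual is essentially zero, which is the situation the abstract describes: once the interaction is removed, the remaining main effects of analyst and system can be estimated cleanly even from a very small design.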