| Title: | Generalizability Theory for Information Retrieval Evaluation | 
| Version: | 2.0 | 
| Description: | Provides tools to measure the reliability of an Information Retrieval test collection. It allows users to estimate reliability using Generalizability Theory and map those estimates onto well-known indicators such as Kendall tau correlation or sensitivity. | 
| Depends: | R (≥ 3.2) | 
| License: | MIT + file LICENSE | 
| BugReports: | https://github.com/julian-urbano/gt4ireval/issues | 
| URL: | https://github.com/julian-urbano/gt4ireval/ | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| Suggests: | testthat, knitr, rmarkdown | 
| RoxygenNote: | 6.0.1 | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | no | 
| Packaged: | 2017-03-06 01:20:03 UTC; caerolus | 
| Author: | Julián Urbano [aut, cre] | 
| Maintainer: | Julián Urbano <urbano.julian@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2017-03-06 08:29:02 | 
TREC-3 Ad hoc track.
Description
This is the set of Average Precision scores of the 40 systems submitted to the TREC-3 Ad hoc track, evaluated over 50 topics.
Usage
adhoc3
Format
A data frame with 40 columns (systems) and 50 rows (queries).
References
D. Harman (1994). Overview of the Third Text REtrieval Conference (TREC-3). Text REtrieval Conference.
See Also
D-study (Decision)
Description
dstudy runs a D-study from the results of a gstudy and computes, for a
certain number of queries, the expected generalizability coefficient Erho2 and index of
dependability Phi, possibly with confidence intervals. Alternatively, it can estimate the
number of queries needed to achieve a certain level of stability, also with confidence intervals.
Usage
dstudy(gdata, queries = gdata$n.q, stability = 0.95, alpha = 0.025)
Arguments
| gdata | The result of running a  | 
| queries | A vector with different query set sizes for which to estimate Erho2 and Phi.
Defaults to the number of queries used to compute  | 
| stability | A vector with target Erho2 and Phi values to estimate required query set sizes. | 
| alpha | A vector of confidence levels to compute intervals for Erho2, Phi and query set
sizes. This is the probability on each side of the interval, so for a 90% confidence interval
one must set  | 
Value
An object of class dstudy, with the following components:
| Erho2,Erho2.lwr,Erho2.upr | Expected generalizability coefficient, and lower and upper limits of the intervals around it. | 
| Phi,Phi.lwr,Phi.upr | Expected index of dependability, and lower and upper limits of the intervals around it. | 
| n.q_Erho2,n.q_Erho2.lwr,n.q_Erho2.upr | Expected number of queries to achieve the generalizability coefficient, and lower and upper limits of the intervals around it. | 
| n.q_Phi,n.q_Phi.lwr,n.q_Phi.upr | Expected number of queries to achieve the index of dependability, and lower and upper limits of the intervals around it. | 
| call | A list with the gstudyused in this D-study, the target number ofqueries, target level ofstabilityandalphalevel for the confidence
  intervals. | 
Author(s)
Julián Urbano
References
R.L. Brennan (2001). Generalizability Theory. Springer.
L.S. Feldt (1965). The Approximate Sampling Distribution of Kuder-Richardson Reliability Coefficient Twenty. Psychometrika, 30(3):357–370.
C. Arteaga, S. Jeyaratnam, and G. A. Franklin (1982). Confidence Intervals for Proportions of Total Variance in the Two-Way Cross Component of Variance Model. Communications in Statistics: Theory and Methods, 11(15):1643–1658.
J. Urbano, M. Marrero and D. Martín (2013). On the Measurement of Test Collection Reliability. ACM SIGIR, pp. 393-402.
See Also
Examples
g <- gstudy(adhoc3)
dstudy(g)
# estimate stability at various query set sizes
dstudy(g, queries = seq(50, 200, 10))
# estimate required query set sizes for various stability levels
dstudy(g, stability = seq(0.8, 0.95, 0.01))
# compute both 95% and 99% confidence intervals
dstudy(g, stability = 0.9, alpha = c(0.05, 0.01) / 2)
# compute 1-tailed 95% confidence intervals
dstudy(g, alpha = 0.05)
G-study (Generalizability)
Description
gstudy runs a G-study with the given data, assuming a fully crossed design (all systems
evaluated on the same queries). It can be used to estimate variance components, which can further
be used to run a D-study with dstudy.
Usage
gstudy(data, drop = 0)
Arguments
| data | A data frame or matrix with the existing effectiveness scores. Systems are columns and queries are rows. | 
| drop | The fraction of worst-performing systems to drop from the data before analysis. Defaults to 0 (include all systems). | 
Value
An object of class gstudy, with the following components:
| n.s,n.q | Number of systems and number of queries of the existing data. | 
| var.s,var.q,var.e | Variance of the system, query, and residual effects. | 
| em.s,em.q,em.e | Mean squares of the system, query and residual components. | 
| call | A list with the existing dataand the percentage of systems todrop. | 
Author(s)
Julián Urbano
References
R.L. Brennan (2001). Generalizability Theory. Springer.
J. Urbano, M. Marrero and D. Martín (2013). On the Measurement of Test Collection Reliability. ACM SIGIR, pp. 393-402.
See Also
Examples
g <- gstudy(adhoc3)
# same, but drop the 20% worst systems
g20 <- gstudy(adhoc3, drop = 0.2)
Map GT-based Indicators onto Data-based Indicators
Description
Maps Erho2 and Phi scores from Generalizability Theory onto traditional data-based scores like the Kendall tau correlation, AP correlation, power, minor conflict rate and major conflict rate with 2-tailed t-tests, absolute and relative sensitivity, and rooted mean squared error.
Usage
gt2tau(Erho2)
gt2tauAP(Erho2)
gt2power(Erho2)
gt2minor(Erho2)
gt2major(Erho2)
gt2asens(Erho2)
gt2rsens(Phi)
gt2rmse(Phi)
Arguments
| Erho2 | Vector of generalizability coefficients to map from. | 
| Phi | Vector of indices of dependability to map from. | 
Details
Take these mappings with a grain of salt. See figure 3 in (Urbano, 20013).
Value
A vector of data-based indicator values.
Author(s)
Julián Urbano
References
J. Urbano, M. Marrero and D. Martín (2013). On the Measurement of Test Collection Reliability. ACM SIGIR, pp. 393-402.
See Also
Examples
g <- gstudy(adhoc3)
d <- dstudy(g)
gt2tau(d$Erho2)
gt2rmse(d$Phi)
Synthetic dataset no. 4.
Description
This is the Synthetic dataset no. 4 from Table 3.2 on page 73 of Brennan (2001), recasted as a p x i design, as required on page 182.
Usage
synthetic4
Format
A data frame with 10 columns (systems) and 12 rows (queries).
References
R.L. Brennan, "Generalizability Theory". Springer, 2001.