###########################################
fbsthw: FBST for Hardy-Weinberg Equilibrium Test

General Description:

Given k alleles A_1, A_2, ..., A_k at a locus and their genotype counts in
lower-triangular form, x = (x_{ij}), 1<=j<=i<=k, fbsthw computes the
evidence for the Hardy-Weinberg Equilibrium (HWE) hypothesis.

The parameter of interest is theta = (theta_{ij}), where theta_{ij} is the
population relative frequency of genotype A_iA_j (1<=j<=i<=k). Assuming a
multinomial model for x and a Dirichlet(1,1,...,1) prior for theta, the
posterior p.d.f. is

    f(theta | x) = Dirichlet(x_{11}+1, x_{21}+1, x_{22}+1, ..., x_{kk}+1)

Parameter space:

    THETA = {theta = (theta_{ij}), 1<=j<=i<=k | 0<=theta_{ij}<=1, sum(theta)=1}

Parameter space under HWE:

    H = {theta in THETA | there exists a vector p = (p_1, p_2, ..., p_k),
         0<=p_i<=1, sum(p)=1, such that
             theta_{ij} = (p_i)^2     if i = j,
             theta_{ij} = 2*p_i*p_j   if i > j}

The evidence against the hypothesis is computed in two steps:

    1) compute f_0 = max_H f(theta | x);
    2) compute the integral
           evb = int_T f(theta | x) dtheta / int_THETA f(theta | x) dtheta,
       where T = {theta in THETA | f(theta | x) > f_0}.

The evidence supporting HWE is

    evid = 1 - evb

References:

1. M.S. Lauretto, F. Nakano, S.R. Faria Jr, C.A.B. Pereira, J.M. Stern.
   A straightforward multiallelic significance test for the Hardy-Weinberg
   equilibrium law. Genetics and Molecular Biology 32(3): 619-625, 2009.
2. C.A.B. Pereira, J.M. Stern. Evidence and credibility: full Bayesian
   significance test for precise hypotheses. Entropy Journal 1: 69-80, 1999.

Usage in R:

Currently, we recommend using the R script fbsthw.r, which calls fbsthw.exe
and computes the significance (p-value) of the hypothesis from the evidence.
Usage instructions are in the script's source code; some examples are given
in examples.r.

Usage in MS-DOS:

From the MS-DOS prompt, call:

    fbsthw.exe input_file output_file

The input file must contain the following data (separated by tabs, spaces or
CR/LF):

    Number_of_alleles
    x(1,1)
    x(2,1) x(2,2)
    x(3,1) x(3,2) x(3,3)
    ...
    x(k,1) x(k,2) ... x(k,k)   (see below)
    prec
    nmcmin
    nmcmax

Parameter details:

    x:      vector of genotype counts in lower-triangular order:
            x = x_11, x_21, x_22, x_31, x_32, x_33, ..., x_k1, x_k2, ..., x_kk
    prec:   half-width of the 99% confidence interval for the evidence
            measure, CI99% = evid +- prec (default = 5e-3)
    nmcmin: minimum number of iterations in the Monte Carlo (MC)
            integration (default = 3000*k)
    nmcmax: maximum number of iterations in the MC integration
            (default = 30000*k)

Lay-out of output file:

    1st line: evidence in favor of the HWE hypothesis (see references)
    2nd line: maximum value of the posterior pdf over the parameter space
    3rd line: maximum value of the posterior pdf under the HWE hypothesis
    4th line: optimal point in the parameter space
    5th line: optimal point under the HWE hypothesis

The 2nd and 3rd lines are negative in the example below, so they appear to be
reported on a log scale.

Example:

The following example is taken from Louis and Dempster (1987); see the
reference in Lauretto et al. (2009).

Genotype counts: 0, 3, 1, 5, 18, 1, 3, 7, 5, 2

File drvinp.txt:

    4
    0
    3 1
    5 18 1
    3 7 5 2
    0.005
    50000
    300000

Call:

    fbsthw.exe drvinp.txt drvout.txt

The result is written to the file drvout.txt, partially transcribed below:

    3.428000e-002
    -8.158314e+001
    -9.017332e+001
    2.222e-006 6.667e-002 2.222e-002 1.111e-001 3.999e-001 ...
    1.493e-002 8.148e-002 1.111e-001 8.148e-002 2.222e-001 ...
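The input and output layouts of fbsthw.exe can be produced and read programmatically. The Python sketch below writes an input file in the documented order and parses the five-line output file. The function names write_fbsthw_input and read_fbsthw_output are hypothetical helpers and are not part of fbsthw. Writing one row of the triangle per line is only one valid choice, since any whitespace separates the fields. The parser assumes each output field sits on its own line, as in the example.

```python
import tempfile
from pathlib import Path


def write_fbsthw_input(path, counts, k, prec=5e-3, nmcmin=None, nmcmax=None):
    """Write k, the lower-triangular counts (one triangle row per line),
    then prec, nmcmin and nmcmax, in the order fbsthw.exe expects."""
    if len(counts) != k * (k + 1) // 2:
        raise ValueError(f"expected {k * (k + 1) // 2} counts for k = {k}, got {len(counts)}")
    nmcmin = 3000 * k if nmcmin is None else nmcmin     # default from Parameter details
    nmcmax = 30000 * k if nmcmax is None else nmcmax    # default from Parameter details
    # Row i of the triangle holds x(i,1) ... x(i,i).
    rows = [counts[i * (i - 1) // 2 : i * (i + 1) // 2] for i in range(1, k + 1)]
    lines = [str(k)] + [" ".join(str(c) for c in row) for row in rows]
    lines += [str(prec), str(nmcmin), str(nmcmax)]
    Path(path).write_text("\n".join(lines) + "\n")


def read_fbsthw_output(path):
    """Parse the five output lines listed under 'Lay-out of output file'.
    ASSUMPTION: one field per line, numbers separated by whitespace."""
    rows = [[float(tok) for tok in line.split()]
            for line in Path(path).read_text().splitlines() if line.strip()]
    if len(rows) < 5:
        raise ValueError("expected at least five non-empty lines")
    return {
        "evid": rows[0][0],          # evidence in favor of HWE
        "max_post": rows[1][0],      # maximum of the posterior pdf over THETA
        "max_post_hwe": rows[2][0],  # maximum of the posterior pdf under HWE
        "theta_opt": rows[3],        # optimal point in THETA
        "theta_hwe": rows[4],        # optimal point under HWE
    }


# Recreate drvinp.txt from the Louis and Dempster (1987) example.
workdir = Path(tempfile.mkdtemp())
write_fbsthw_input(workdir / "drvinp.txt", [0, 3, 1, 5, 18, 1, 3, 7, 5, 2], 4,
                   nmcmin=50000, nmcmax=300000)
print((workdir / "drvinp.txt").read_text())
```

Passing nmcmin and nmcmax explicitly reproduces the drvinp.txt of the example. If they are omitted, the sketch writes the defaults 3000*k and 30000*k listed under Parameter details.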
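The two-step evidence computation in the General Description can be sketched in Python using only the standard library. This is an illustration under stated assumptions and is not the code inside fbsthw.exe. Because f(theta | x) is proportional to the product of theta_ij^x_ij, the sketch compares log-kernels, and the normalising constant cancels.

Step 1 uses the closed-form maximiser of f over H. On H, the log-kernel reduces to the sum of n_i log p_i plus a constant, where n_i is the count of allele A_i. The maximum is therefore at p_i = n_i / (2N).

Step 2 estimates evb as the fraction of Dirichlet(x+1) draws whose density exceeds f_0. The stopping rule built from prec, nmcmin and nmcmax is an assumed reading of those parameters. All function names are hypothetical.

```python
import math
import random


def lower_cells(k):
    """Index pairs (i, j), 1 <= j <= i <= k, in the order x_11, x_21, x_22, x_31, ..."""
    return [(i, j) for i in range(1, k + 1) for j in range(1, i + 1)]


def log_kernel(theta, counts):
    """log prod theta_ij^x_ij: the log posterior density up to an additive constant."""
    # Cells with x_ij = 0 contribute nothing (0 * log theta is taken as 0).
    return sum(c * math.log(t) for t, c in zip(theta, counts) if c > 0)


def hwe_optimum(counts, k):
    """Step 1: the point of H maximising f(theta | x), with p_i = n_i / (2N)."""
    cells = lower_cells(k)
    if len(counts) != len(cells):
        raise ValueError("expected k(k+1)/2 genotype counts")
    allele = [0] * (k + 1)            # allele[0] is unused
    for (i, j), c in zip(cells, counts):
        allele[i] += c                # a homozygote A_iA_i adds two copies of A_i
        allele[j] += c
    two_n = 2 * sum(counts)
    p = [a / two_n for a in allele]
    return [p[i] ** 2 if i == j else 2 * p[i] * p[j] for i, j in cells]


def fbst_evidence(counts, k, prec=5e-3, nmcmin=None, nmcmax=None, seed=1):
    """Step 2 by Monte Carlo: evid = 1 - P(f(theta | x) > f_0), theta ~ Dirichlet(x + 1).

    ASSUMPTION: sampling stops once the normal-approximation 99% half-width
    is <= prec, using between nmcmin and nmcmax draws; fbsthw.exe may differ."""
    nmcmin = 3000 * k if nmcmin is None else nmcmin
    nmcmax = 30000 * k if nmcmax is None else nmcmax
    rng = random.Random(seed)
    log_f0 = log_kernel(hwe_optimum(counts, k), counts)
    alphas = [c + 1 for c in counts]
    above = n = 0
    while n < nmcmax:
        # Dirichlet draw: independent Gamma(alpha, 1) variates, normalised to sum 1.
        g = [rng.gammavariate(a, 1.0) for a in alphas]
        s = sum(g)
        if log_kernel([v / s for v in g], counts) > log_f0:
            above += 1                # the draw falls in T
        n += 1
        if n >= nmcmin and n % 1000 == 0:
            ev = 1.0 - above / n
            if 2.576 * math.sqrt(ev * (1.0 - ev) / n) <= prec:
                break
    return 1.0 - above / n


counts = [0, 3, 1, 5, 18, 1, 3, 7, 5, 2]   # Louis and Dempster (1987), k = 4
print([round(t, 5) for t in hwe_optimum(counts, 4)[:5]])
# → [0.01494, 0.08148, 0.11111, 0.08148, 0.22222]
print(fbst_evidence(counts, 4, nmcmin=50000, nmcmax=300000))
```

For the example counts, the first five coordinates returned by hwe_optimum agree with the 5th line of drvout.txt (1.493e-002 8.148e-002 1.111e-001 8.148e-002 2.222e-001). The Monte Carlo estimate is random. It should land near the reported 3.428e-002 only if fbsthw.exe follows these same steps, which is an assumption.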