Based on pairwise distances between viral sequences, this website provides online statistical tests for comparing the distribution of distances between two groups. Examples include comparing two geographic regions, two disease states, two phenotypes, or two compartments within an individual. A program by Jerry Learn that converts a PAUP matrix to the format required by the program is available at: http://indra.mullins.microbiol.washington.edu/paup_utils/cgi/format.html

Three tests are performed, which are described in Gilbert, Rossini, and Shankarappa (2005), "Two-sample tests for comparing intra-individual genetic sequence diversity between populations", Biometrics, 2005, 61:107-118. The test statistics are Tpoolmean, Tpoolmedian, and Tsubjmean, which are defined in formulas (6), (7), and (8) of Gilbert, Rossini, and Shankarappa (2005), respectively. The first two test statistics can be used with only one individual per group, whereas Tsubjmean requires multiple individuals in each group, because it treats the averages of the pairwise distances within each individual as the observations. For each test statistic, the program prints (to the screen and to the file diverstest.out) the output of the testing procedure, including the test statistic, p-value, estimates of mean or median values, and 95% confidence intervals.

The motivating application is data on multiple HIV sequences for each of several children, where the goal is to compare intra-child diversity of viral sequences (as measured by pairwise distances among all sequences from the same child) between long-term non-progressor and rapid progressor groups.

The program inputs a text file with the following columns:

column 1: An integer indicating the sequence number of a given subject [Takes values 1 to the number of sequences for that subject minus 1].
column 2: An integer indicating the sequence number of the same subject being compared to the sequence in column 1 [Takes values 2 to the number of sequences for that subject].
column 3: An indicator of whether the subject is in group 1 or 2 [Takes values 1 or 2].
column 4: Identification number of the subject [Any positive integer uniquely identifying a subject].
column 5: Pairwise distance between the two sequences indicated in columns 1 and 2, for the subject indicated in column 4.

[Note that the value in column 5 does not have to be a real distance (i.e., >= 0); it can be any variable contrasting two sequences. In Gilbert, Rossini, and Shankarappa (2005), a difference in two distances, for putative CTL-rich and putative non-CTL-rich regions of the HIV genome, was used.]

Example 1: Suppose there are 4 subjects, the first with 3 sequences in group 1, the second with 4 sequences in group 1, the third with 5 sequences in group 2, and the fourth with 3 sequences in group 2. The input .txt file would be in the following format:

1 2 1 1001 0.1154
1 3 1 1001 0.1542
2 3 1 1001 0.0972
1 2 1 1002 0.2576
1 3 1 1002 0.2134
1 4 1 1002 0.1879
2 3 1 1002 0.1734
2 4 1 1002 0.2735
3 4 1 1002 0.1873
1 2 2 1003 0.2421
1 3 2 1003 0.2962
1 4 2 1003 0.1736
1 5 2 1003 0.1555
2 3 2 1003 0.1434
2 4 2 1003 0.2673
2 5 2 1003 0.1864
3 4 2 1003 0.1772
3 5 2 1003 0.1498
4 5 2 1003 0.2001
1 2 2 1004 0.1755
1 3 2 1004 0.1987
2 3 2 1004 0.2410

Example 2: The program can also be used for simpler analyses in which there is only one sequence per individual. For example, suppose a single HIV sequence is measured from each of 4 individuals in population 1, and a single HIV sequence is measured from each of 5 individuals in population 2. In this case, the goal is to compare inter-individual sequence diversity between populations. The input data file has the same format as above, with only one "subject" per group: each population is coded as a single subject, and its individuals play the role of that subject's sequences.
The input .txt file would be in the following format:

1 2 1 1001 0.1154
1 3 1 1001 0.1542
1 4 1 1001 0.1222
2 3 1 1001 0.0972
2 4 1 1001 0.2576
3 4 1 1001 0.2134
1 2 2 1002 0.1879
1 3 2 1002 0.1734
1 4 2 1002 0.2735
1 5 2 1002 0.1873
2 3 2 1002 0.1674
2 4 2 1002 0.1873
2 5 2 1002 0.2531
3 4 2 1002 0.2444
3 5 2 1002 0.2159
4 5 2 1002 0.1721

Example 3: As in Example 2, suppose the goal is to compare inter-individual sequence diversity between two populations. Suppose the following pairwise distance matrices were generated by PAUP for the first and second populations, respectively.

Population 1:

Maximum-likelihood distance matrix

               22       23       24       25       26       27       28       29
22 04BR013     -
23 04BR021     0.11441  -
24 04BR038     0.13722  0.13454  -
25 04BR073     0.13111  0.11563  0.11034  -
26 04BR137     0.13411  0.11685  0.13698  0.05881  -
27 04BR142     0.13909  0.08250  0.11721  0.13907  0.13505  -
28 92BR025     0.05868  0.11331  0.13292  0.05479  0.05369  0.13196  -
29 98BR004     0.08199  0.09326  0.08127  0.11416  0.11021  0.08524  0.13388  -
40 ARG4006     0.13405  0.11574  0.13118  0.13782  0.11381  0.08255  0.13785  0.08998
31 TRA4011     0.11044  0.08290  0.11013  0.13473  0.13882  0.11868  0.05864  0.08356

Maximum-likelihood distance matrix (continued)

               40       31
40 ARG4006     -
31 TRA4011     0.11291  -

Population 2:

Maximum-likelihood distance matrix

               11       12       13       14       15       16       17       18
11 01IN565 10  -
12 93IN101     0.11175  -
13 93IN904     0.13689  0.05249  -
14 93IN905     0.11159  0.05479  0.05140  -
15 93IN999     0.08226  0.13181  0.05226  0.05346  -
16 94IN11246   0.11731  0.21681  0.05323  0.21197  0.13121  -
17 94IN476     0.10094  0.08512  0.08386  0.09124  0.10456  0.10411  -
18 95IN21068   0.11248  0.21361  0.05605  0.21252  0.05509  0.05272  0.09555  -
19 98IN012     0.08442  0.13573  0.05912  0.13715  0.13727  0.11372  0.10777  0.13124
20 98IN022     0.09616  0.13205  0.14097  0.13918  0.08220  0.11339  0.11717  0.11133
21 mIDU101 3   0.08647  0.13107  0.05467  0.05933  0.11120  0.13633  0.10146  0.13239

Maximum-likelihood distance matrix (continued)

               19       20       21
19 98IN012     -
20 98IN022     0.08826  -
21 mIDU101 3   0.08231  0.01207  -

The input data file would be as follows, with the sequences in each population renumbered from 1 in matrix order and each population coded as a single subject (1001 and 1002):

1 2 1 1001 0.11441
1 3 1 1001 0.13722
1 4 1 1001 0.13111
1 5 1 1001 0.13411
1 6 1 1001 0.13909
1 7 1 1001 0.05868
1 8 1 1001 0.08199
1 9 1 1001 0.13405
1 10 1 1001 0.11044
2 3 1 1001 0.13454
2 4 1 1001 0.11563
2 5 1 1001 0.11685
2 6 1 1001 0.08250
2 7 1 1001 0.11331
2 8 1 1001 0.09326
2 9 1 1001 0.11574
2 10 1 1001 0.08290
3 4 1 1001 0.11034
3 5 1 1001 0.13698
3 6 1 1001 0.11721
3 7 1 1001 0.13292
3 8 1 1001 0.08127
3 9 1 1001 0.13118
3 10 1 1001 0.11013
4 5 1 1001 0.05881
4 6 1 1001 0.13907
4 7 1 1001 0.05479
4 8 1 1001 0.11416
4 9 1 1001 0.13782
4 10 1 1001 0.13473
5 6 1 1001 0.13505
5 7 1 1001 0.05369
5 8 1 1001 0.11021
5 9 1 1001 0.11381
5 10 1 1001 0.13882
6 7 1 1001 0.13196
6 8 1 1001 0.08524
6 9 1 1001 0.08255
6 10 1 1001 0.11868
7 8 1 1001 0.13388
7 9 1 1001 0.13785
7 10 1 1001 0.05864
8 9 1 1001 0.08998
8 10 1 1001 0.08356
9 10 1 1001 0.11291
1 2 2 1002 0.11175
1 3 2 1002 0.13689
1 4 2 1002 0.11159
1 5 2 1002 0.08226
1 6 2 1002 0.11731
1 7 2 1002 0.10094
1 8 2 1002 0.11248
1 9 2 1002 0.08442
1 10 2 1002 0.09616
1 11 2 1002 0.08647
2 3 2 1002 0.05249
2 4 2 1002 0.05479
2 5 2 1002 0.13181
2 6 2 1002 0.21681
2 7 2 1002 0.08512
2 8 2 1002 0.21361
2 9 2 1002 0.13573
2 10 2 1002 0.13205
2 11 2 1002 0.13107
3 4 2 1002 0.05140
3 5 2 1002 0.05226
3 6 2 1002 0.05323
3 7 2 1002 0.08386
3 8 2 1002 0.05605
3 9 2 1002 0.05912
3 10 2 1002 0.14097
3 11 2 1002 0.05467
4 5 2 1002 0.05346
4 6 2 1002 0.21197
4 7 2 1002 0.09124
4 8 2 1002 0.21252
4 9 2 1002 0.13715
4 10 2 1002 0.13918
4 11 2 1002 0.05933
5 6 2 1002 0.13121
5 7 2 1002 0.10456
5 8 2 1002 0.05509
5 9 2 1002 0.13727
5 10 2 1002 0.08220
5 11 2 1002 0.11120
6 7 2 1002 0.10411
6 8 2 1002 0.05272
6 9 2 1002 0.11372
6 10 2 1002 0.11339
6 11 2 1002 0.13633
7 8 2 1002 0.09555
7 9 2 1002 0.10777
7 10 2 1002 0.11717
7 11 2 1002 0.10146
8 9 2 1002 0.13124
8 10 2 1002 0.11133
8 11 2 1002 0.13239
9 10 2 1002 0.08826
9 11 2 1002 0.08231
10 11 2 1002 0.01207
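
The conversion in Example 3, from a lower-triangular distance matrix to the five-column input format, can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: it is not the format.html utility mentioned above and not part of the testing program. The names lower_triangle_to_rows and format_row are hypothetical, and the matrix is assumed to have been parsed already into a list holding the rows below the diagonal (any continuation blocks merged back into their rows).

```python
from typing import List, Sequence, Tuple

# One record of the input file: (column 1, column 2, group, subject ID, distance).
Row = Tuple[int, int, int, int, float]


def lower_triangle_to_rows(
    lower: Sequence[Sequence[float]], group: int, subject_id: int
) -> List[Row]:
    """Turn the below-diagonal part of a distance matrix into input records.

    Assumption: lower[k] holds the distances from sequence k + 2 to
    sequences 1 .. k + 1, so lower[0] has one entry, lower[1] has two, etc.
    """
    for k, row in enumerate(lower):
        if len(row) != k + 1:
            raise ValueError(f"row {k} should have {k + 1} entries, got {len(row)}")

    n = len(lower) + 1  # number of sequences
    rows: List[Row] = []
    for i in range(1, n):              # column 1 runs from 1 to n - 1
        for j in range(i + 1, n + 1):  # column 2 runs from i + 1 to n
            # Distance between sequences i and j sits in the row of j, column i.
            rows.append((i, j, group, subject_id, lower[j - 2][i - 1]))
    return rows


def format_row(row: Row) -> str:
    """Render one record as a space-separated line."""
    return " ".join(str(value) for value in row)


if __name__ == "__main__":
    # First subject of Example 1: 3 sequences, group 1, subject 1001.
    example = [[0.1154], [0.1542, 0.0972]]
    for record in lower_triangle_to_rows(example, group=1, subject_id=1001):
        print(format_row(record))
```

Records are emitted in the same order as in the examples (all pairs starting with sequence 1, then sequence 2, and so on). Running the function once per population, with a different group and subject ID each time, and concatenating the results gives a file laid out like the one in Example 3.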