wiki1024: LibaffyCorrectness
libaffy Verification of Correctness
Steven Eschrich
11-30-2006

Dataset
Sensitivity of expression algorithms is often measured against several well-known datasets, including a spike-in dataset produced by Affymetrix (U133A) or a series of dilutions. However, these datasets are very self-similar, and we wanted to ensure correctness over a large range of expression (low and high) in real data. We selected a set of normal tissue chips, available from the Gene Expression Omnibus (GEO).

Author's Summary (from GEO)
Author's Reference

RMA
Although libaffy implements RMA independently, the algorithm was developed within Bioconductor and its source code is freely available. The libaffy code writes out six decimal places of precision. When comparing Bioconductor to libaffy, the expression values agree to a mean difference of 2.033e-05 (range is [-2.3e-05, 5.45e-05]). As seen below, most of the differences were positive (differences were computed as libaffy-r), no doubt an indication of the rounding that occurs. Examining the mean difference by chip, all of the chips have a slight positive difference, again in the range of 2e-05.
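The comparison described above (overall mean difference, range, and mean difference by chip) can be sketched as follows. This is a minimal illustration, not the script used for this page: the function name `difference_summary`, the dictionary layout, and the toy values are all assumptions, and the numbers are not taken from the GEO dataset.

```python
from statistics import mean


def difference_summary(libaffy, reference):
    """Summarize libaffy - reference.

    Both arguments are hypothetical {chip_name: [expression values]} dicts
    whose value lists are in the same probeset order.
    """
    per_chip = {}
    all_diffs = []
    for chip, values in libaffy.items():
        ref_values = reference[chip]
        if len(values) != len(ref_values):
            raise ValueError(f"probeset count mismatch on chip {chip}")
        # Differences are taken as libaffy minus the reference implementation.
        diffs = [a - b for a, b in zip(values, ref_values)]
        per_chip[chip] = mean(diffs)
        all_diffs.extend(diffs)
    return {
        "mean": mean(all_diffs),
        "min": min(all_diffs),
        "max": max(all_diffs),
        "per_chip_mean": per_chip,
    }


# Toy values for illustration only.
libaffy_values = {"chipA": [7.25, 9.5], "chipB": [6.0, 8.75]}
reference_values = {"chipA": [7.0, 9.5], "chipB": [6.0, 8.5]}
summary = difference_summary(libaffy_values, reference_values)
print(summary)
```

Keeping the per-chip means alongside the pooled statistics makes it easy to see whether a small positive bias is spread evenly across chips or concentrated in a few of them.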
MAS 5.0
In the case of the MAS5.0 algorithm, it has been documented elsewhere (http://bmbolstad.com/misc/MAS5diff/Mas5difference.html) that there are differences between the Bioconductor algorithm and the Affymetrix MAS5.0 implementation. We investigated this phenomenon within our code, as detailed here.

For correctness, there are two pairings to consider: libaffy vs. Bioconductor and libaffy vs. Affymetrix.
MAS5.0 uses scaling to normalize chips to the same trimmed mean intensity. Since we cannot perfectly replicate the original intensities, it is unclear whether scaling exacerbates the initial discrepancies or introduces additional differences. Therefore, we compare unscaled values.

Bioconductor
Bioconductor vs. libaffy is very straightforward. The differences appear to be due only to the precision of the representation (libaffy writes results to text files at 6 digits of precision). While not shown, these differences are identical when scaling is applied. The conclusion is that, with Bioconductor compatibility, the results of the two implementations are identical.
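For readers unfamiliar with the scaling step mentioned for MAS5.0, the idea of normalizing a chip to a common trimmed mean can be sketched as below. This is only an illustration under stated assumptions: the function names, the 2%/98% trim fractions, and the target of 500 are placeholder defaults chosen for the example, not settings confirmed for libaffy, Bioconductor, or the Affymetrix software.

```python
def trimmed_mean(values, lower=0.02, upper=0.98):
    """Mean of the values left after dropping the lowest and highest tails.

    The trim fractions are assumed defaults for illustration.
    """
    ordered = sorted(values)
    n = len(ordered)
    kept = ordered[int(n * lower):int(n * upper)]
    if not kept:
        raise ValueError("trim fractions leave no values")
    return sum(kept) / len(kept)


def scale_chip(values, target=500.0, lower=0.02, upper=0.98):
    """Multiply every value by one factor so the trimmed mean equals target."""
    factor = target / trimmed_mean(values, lower, upper)
    return [v * factor for v in values]


# Toy chip with one extreme value; wide trims are used so the effect is visible.
toy = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = scale_chip(toy, target=300.0, lower=0.2, upper=0.8)
print(scaled)
```

Because a single multiplicative factor is applied per chip, any discrepancy in the unscaled values is carried through scaling in proportion, which is one reason comparing unscaled values isolates the underlying differences.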
Affymetrix
When considering libaffy vs. the Affymetrix implementation, we see very little overall difference in expression. Indeed, the mean difference is -0.0001. Several probesets are extreme, but all differences are bounded by the range [-37.84716, 10.69301]. A closer look at intensity differences greater than 1 indicates there are 5 probesets with differences greater than 10, across all chips examined.
A comparison with the Bioconductor version shows that slightly fewer probesets have larger differences in libaffy, although the gap between the two is very small. In fact, the following frequency table illustrates this point (keep in mind there are 22283*36 = 802,188 probeset values, of which ~100 have differences greater than 1).
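A frequency table of this kind could be tallied with a sketch like the one below. The function name `difference_frequency`, the bin edges, and the toy differences are assumptions made for illustration; they do not reproduce the counts from this comparison.

```python
from bisect import bisect_right


def difference_frequency(diffs, edges=(0.001, 0.01, 0.1, 1.0, 10.0)):
    """Count absolute differences per bin.

    Bin 0 holds |d| < edges[0], bin i holds edges[i-1] <= |d| < edges[i],
    and the last bin holds |d| >= edges[-1]. The edges are hypothetical.
    """
    counts = [0] * (len(edges) + 1)
    for d in diffs:
        counts[bisect_right(edges, abs(d))] += 1
    labels = [f"< {edges[0]}"]
    labels += [f"[{lo}, {hi})" for lo, hi in zip(edges, edges[1:])]
    labels.append(f">= {edges[-1]}")
    return list(zip(labels, counts))


# Toy differences for illustration only.
toy_diffs = [0.0005, -0.5, 2.0, -37.0, 0.05]
table = difference_frequency(toy_diffs)
for label, count in table:
    print(f"{label:>14}  {count}")
```

Binning on the absolute value treats positive and negative discrepancies alike, so the upper bins directly show how many of the probeset values differ by more than 1 or more than 10.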