
libaffy Verification of Correctness

Steven Eschrich

11-30-2006


Dataset

The sensitivity of expression algorithms is often measured against a few well-known datasets, such as the spike-in dataset produced by Affymetrix (U133A) or a dilution series. However, these datasets are highly self-similar, and we wanted to verify correctness over a wide range of expression levels (low and high) in real data. We therefore selected a set of normal tissue chips available from the Gene Expression Omnibus (GEO).

Author's Summary (from GEO)
We performed expression profiling of 36 types of normal human tissues and identified 2,503 tissue-specific genes. We then systematically studied the expression of these genes in cancers by re-analyzing a large collection of published DNA microarray datasets. Our study shows that integration of each gene's breadth of expression (BOE) in normal tissues is important for biological interpretation of the expression profiles of cancers in terms of tumor differentiation, cell lineage and metastasis. Twenty-five total RNA specimens were purchased from Clontech (Palo Alto, CA), Ambion (Austin, TX) and Stratagene (La Jolla, CA). In order to define breadth-of-expression (BOE) accurately at a reasonable cost, we tried to cover as many tissue types as possible by using pooled RNA samples. Each specimen represents a human organ. We used RNA samples pooled from 2 to 84 donors to avoid differences at the individual level.

Author's Reference
Ge X et al., Interpreting expression profiles of cancers by genome-wide survey of breadth of expression in normal tissues.
Genomics. 2005 Aug;86(2):127-141.
PMID: 15950434
Web Link http://www.genome.rcast.u-tokyo.ac.jp/normal/

RMA

RMA was developed within Bioconductor and its source code is freely available, so libaffy's independently written implementation can be compared against it directly. libaffy writes expression values with six decimal places of precision. Comparing Bioconductor to libaffy, the expression values differ by a mean of 2.033e-05 (range [-2.3e-05, 5.45e-05]). As the histogram below shows, most of the differences (computed as libaffy minus Bioconductor) are positive, almost certainly a result of rounding. Examining the mean difference by chip, every chip shows a slight positive difference, again on the order of 2e-05.
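A comparison of this kind can be sketched as follows. This is a minimal illustration, not the script used for this page: it assumes both outputs have already been loaded into probeset-by-chip matrices (lists of rows) with matching row and column order, and the function name `summarize_differences` is hypothetical.

```python
from statistics import mean


def summarize_differences(libaffy, reference):
    """Summarize element-wise libaffy - reference differences.

    Both arguments are assumed (hypothetically) to be probeset-by-chip
    matrices: one row per probeset, one column per chip, same ordering.
    """
    # Difference matrix, computed as libaffy minus the reference output.
    diffs = [
        [a - b for a, b in zip(row_l, row_r)]
        for row_l, row_r in zip(libaffy, reference)
    ]
    flat = [d for row in diffs for d in row]
    n_chips = len(diffs[0])
    # Mean difference for each chip (column), as in the by-chip check.
    per_chip = [mean(row[j] for row in diffs) for j in range(n_chips)]
    return {
        "mean": mean(flat),
        "min": min(flat),
        "max": max(flat),
        "fraction_positive": sum(d > 0 for d in flat) / len(flat),
        "per_chip_mean": per_chip,
    }
```

With made-up values such as `summarize_differences([[8.000020, 5.500010], [10.250030, 7.125000]], [[8.0, 5.5], [10.25, 7.125]])`, three of the four differences are positive, so `fraction_positive` is 0.75.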

rma-diffs-histogram.jpg

MAS 5.0

For the MAS5.0 algorithm, differences between the Bioconductor implementation and the Affymetrix MAS5.0 implementation have been documented elsewhere (http://bmbolstad.com/misc/MAS5diff/Mas5difference.html). We investigated how libaffy behaves in this respect, as detailed below.

For correctness, there are two pairings to consider:

  1. Affymetrix MAS5.0 vs. libaffy
  2. Bioconductor vs. libaffy

MAS5.0 uses scaling to normalize chips to the same trimmed mean intensity. Because we cannot perfectly replicate the original (unscaled) values, it is unclear whether scaling would amplify the initial discrepancies or introduce additional ones. We therefore compare unscaled values.
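The scaling step can be sketched as below. This sketch assumes a target value of 500 and 2% trimming at each end, which to our understanding are the defaults in Bioconductor's `affy` package; libaffy's own parameters may differ, and the function names are hypothetical.

```python
import math


def trimmed_mean(values, trim=0.02):
    """Mean after dropping floor(n * trim) values from each end."""
    xs = sorted(values)
    k = math.floor(len(xs) * trim)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def mas5_scale(signals, target=500.0, trim=0.02):
    """Scale one chip's signals so their trimmed mean equals `target`.

    Assumed defaults (target=500, trim=0.02); returns the scaled
    values together with the scale factor that was applied.
    """
    sf = target / trimmed_mean(signals, trim)
    return [s * sf for s in signals], sf
```

Because each implementation computes the scale factor from its own values, a small discrepancy in the unscaled signals can shift the factor itself, and every value on the chip is then multiplied by it. This compounding is why comparing unscaled values isolates the underlying differences more cleanly.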

Bioconductor

Comparing Bioconductor and libaffy is straightforward. The differences appear to be due only to the precision of the output representation (libaffy writes results to text files with 6 digits of precision). Although not shown, the differences are identical when scaling is applied. We conclude that, with Bioconductor compatibility, the two implementations produce identical results.

mas5-bioconductor-noscale.jpg

Affymetrix

Overall, libaffy and the Affymetrix implementation differ very little: the mean difference is -0.0001. A few probesets are extreme, but all differences fall within the range [-37.84716, 10.69301]. A closer look at differences greater than 1 shows that only 5 probeset values, across all chips examined, differ by more than 10.
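Locating the few extreme probeset values can be sketched like this. The sketch assumes a probeset-by-chip matrix of libaffy minus Affymetrix differences (unscaled values) and compares the absolute difference against the threshold; whether the counts in this section use absolute or signed differences is an assumption here, and all names are hypothetical.

```python
def find_outliers(diffs, probeset_ids, chip_names, threshold=10.0):
    """Return (probeset, chip, diff) tuples with |diff| > threshold.

    `diffs` is a hypothetical probeset-by-chip difference matrix;
    results are sorted so the largest absolute differences come first.
    """
    hits = [
        (probeset_ids[i], chip_names[j], d)
        for i, row in enumerate(diffs)
        for j, d in enumerate(row)
        if abs(d) > threshold
    ]
    return sorted(hits, key=lambda h: abs(h[2]), reverse=True)
```

For example, with made-up differences `[[0.1, -37.5], [10.6, 2.0]]`, probeset IDs `["ps_1", "ps_2"]` and chips `["chip_1", "chip_2"]`, the function returns `("ps_1", "chip_2", -37.5)` followed by `("ps_2", "chip_1", 10.6)`.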

mas5-affy-noscale-all.jpg
mas5-affy-noscale-size1.jpg mas5-r-affy-noscale-size1.jpg

Compared with the Bioconductor version, libaffy has slightly fewer probeset values with large differences from the Affymetrix results, although the improvement is small. The following frequency table illustrates this point. Keep in mind that there are 802,188 (22,283 × 36) probeset values in total, of which only about 100 differ by more than 1.

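Number of probeset values whose difference from the Affymetrix MAS5.0 value exceeds each threshold:

Difference greater than   Bioconductor   libaffy
1                         109            77
2                         58             39
3                         36             26
4                         29             18
5                         27             12
6                         23             9
7                         16             8
8                         12             7
9                         10             6
10                        7              5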
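A table like this one can be produced with a sketch such as the following. It assumes two probeset-by-chip difference matrices (Bioconductor minus Affymetrix, and libaffy minus Affymetrix) and counts absolute differences; both the use of absolute values and the function names are assumptions for illustration.

```python
def threshold_counts(diffs, thresholds=range(1, 11)):
    """Count values whose absolute difference exceeds each threshold."""
    flat = [abs(d) for row in diffs for d in row]
    return {t: sum(d > t for d in flat) for t in thresholds}


def format_table(bioc_diffs, libaffy_diffs, thresholds=range(1, 11)):
    """Render the two sets of counts side by side as plain text."""
    bioc = threshold_counts(bioc_diffs, thresholds)
    lib = threshold_counts(libaffy_diffs, thresholds)
    lines = [f"{'Greater than':<14}{'Bioconductor':>12}{'libaffy':>9}"]
    for t in thresholds:
        lines.append(f"{t:<14}{bioc[t]:>12}{lib[t]:>9}")
    return "\n".join(lines)
```

Dividing the counts by the 802,188 total puts them in perspective: for example, the 77 libaffy values above 1 amount to roughly 0.0096% of all probeset values.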

Attachments:
mas5-r-affy-noscale-size1.jpg [LibaffyCorrectness/mas5-r-affy-noscale-size1.jpg]
mas5-affy-noscale-size1.jpg [LibaffyCorrectness/mas5-affy-noscale-size1.jpg]
mas5-affy-noscale-all.jpg [LibaffyCorrectness/mas5-affy-noscale-all.jpg]
mas5-bioconductor-noscale.jpg [LibaffyCorrectness/mas5-bioconductor-noscale.jpg]
rma-diffs-histogram-bychip.jpg [LibaffyCorrectness/rma-diffs-histogram-bychip.jpg]
rma-diffs-histogram.jpg [LibaffyCorrectness/rma-diffs-histogram.jpg]