Microarray Read Affymetrix CEL file Options

Use this to specify options when opening Affymetrix CEL files. This menu appears as each CEL file is opened. As CEL files can be very large (for example, 500,000 to 900,000 rows per file), it can be advantageous to process the data directly in the server, file by file, and save the results to a spreadsheet file. This menu provides options for processing the files in this way. In addition, you can reduce the memory required by reading only the necessary columns from the CEL file. There are also options to control how cells marked as outliers or masked are handled.

Once the data are loaded, the Calculate Affymetrix Expression Values menu can be used to calculate expression values summarized over the PM/MM pairs or atoms. That menu provides more options than the batch process, which offers only the summary method.

Batch process CEL files to expression values

When selected, the CEL files and associated CDF file will be loaded into the server, and a summary spreadsheet will be produced. This option is only available when CEL files are opened using the Open Microarray Data Files menu.

Method

The statistical method used to summarize over the PM/MM pairs. The methods available are:

RMA – Robust means analysis model	The probe level model introduced by Irizarry et al. (2003), which uses only PM information and transforms the values based on a kernel density estimate of the PM distribution.
RMA2 – Robust means analysis 2	An adaptation of the RMA algorithm which fits the kernel density to a truncated distribution of the PM values, with the truncation point based on an initial kernel density estimate.
MAS4 – Affymetrix version 4	The AvDiff algorithm introduced in the Affymetrix version 4 software
MAS5 – Affymetrix version 5	The Tukey biweight algorithm introduced in the Affymetrix version 5 software

In the Affymetrix MAS 4 and MAS 5 methods, the difference between the signals, PM – MM, is summarized using a robust average. The MAS 4 method uses the AvDiff algorithm, which discards the minimum and maximum differences and any differences more than 3 standard deviations from the mean. The MAS 5 method uses the Tukey biweight algorithm, which downweights the differences according to how far they are from the median and discards any differences that are more than 5 times the median absolute deviation from the median. The MAS 5 method also replaces the MM value with a value that is always less than the PM value, known as the ideal mismatch (IM).
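The two robust averages described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code Genstat runs: the function names are hypothetical, the AvDiff sketch assumes the mean and standard deviation are computed after the minimum and maximum have been dropped, and the Tukey biweight sketch assumes the tuning constant c = 5 with a small eps to avoid division by zero. The ideal mismatch step is not shown.

```python
import numpy as np

def avdiff(pm, mm, n_sd=3.0):
    """AvDiff-style average of PM - MM for one probe set (hypothetical helper)."""
    d = np.sort(np.asarray(pm, dtype=float) - np.asarray(mm, dtype=float))
    d = d[1:-1]  # discard the minimum and maximum difference
    # Assumption: mean and SD are taken over the remaining differences.
    keep = np.abs(d - d.mean()) <= n_sd * d.std(ddof=1)
    return d[keep].mean()

def tukey_biweight(x, c=5.0, eps=1e-4):
    """One-step Tukey biweight average (c and eps are assumed defaults)."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))      # median absolute deviation
    u = (x - med) / (c * mad + eps)       # scaled distance from the median
    # Values at or beyond c * MAD from the median get zero weight.
    w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
    return np.sum(w * x) / np.sum(w)

pm = [110, 120, 130, 140, 1000, 105]
mm = [100, 100, 100, 100, 100, 100]
print(avdiff(pm, mm))
print(tukey_biweight([1, 2, 3, 4, 100]))
```

In both cases the extreme value (900 in the first call, 100 in the second) has no influence on the summary.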

The standard RMA algorithm normally uses the log2-transformed PM values with no background correction, and applies a quantile normalization to them. A normal function transformation is then applied to the adjusted PM values, with the parameters of the transformation calculated from a kernel density estimate of the adjusted PM values. Finally, the transformed PM values are summarized with a median polish of the slides-by-atoms values for each probe.
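Two of the steps in this description, quantile normalization and median polish, can be sketched in Python as below. This is a rough illustration with hypothetical function names, not Genstat's implementation: the kernel-density-based transformation is omitted, ties in the quantile normalization are broken arbitrarily, and the atoms-by-slides matrix layout is an assumption.

```python
import numpy as np

def quantile_normalize(x):
    """Give every column (slide) the same empirical distribution."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)   # rank of each value within its slide
    reference = np.sort(x, axis=0).mean(axis=1)          # mean of each quantile across slides
    return reference[ranks]

def median_polish(y, n_iter=10, tol=1e-8):
    """Median polish of an atoms-by-slides matrix; returns one summary per slide."""
    r = np.array(y, dtype=float)
    overall = 0.0
    row = np.zeros(r.shape[0])   # atom (probe) effects
    col = np.zeros(r.shape[1])   # slide effects
    for _ in range(n_iter):
        rmed = np.median(r, axis=1)          # sweep out row medians
        r -= rmed[:, None]
        row += rmed
        delta = np.median(col)
        col -= delta
        overall += delta
        cmed = np.median(r, axis=0)          # sweep out column medians
        r -= cmed[None, :]
        col += cmed
        delta = np.median(row)
        row -= delta
        overall += delta
        if np.abs(rmed).sum() + np.abs(cmed).sum() < tol:
            break
    return overall + col

# Hypothetical log2 PM values for 3 atoms on 3 slides.
log_pm = np.array([[5.0, 7.0, 9.0],
                   [6.0, 8.0, 10.0],
                   [7.0, 9.0, 11.0]])
print(median_polish(log_pm))
```

Median polish is used here because, unlike a plain column mean, a single aberrant atom on one slide has little effect on that slide's summary.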

Use log base 2 transformation

This controls whether to use a log base 2 transformation for the PM/MM intensities.

Default	MAS5, RMA and RMA2 are transformed and MAS4 is not transformed.
No	The PM/MM intensities are not transformed.
Yes	Log base 2 transformation is used for all the PM/MM intensities.

The calculation for the transformation in the Genstat command language is:
CALC LogY = LOG(Y)/LOG(2)
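For readers more familiar with Python, the same change-of-base calculation and the Default/No/Yes rule in the table above might be sketched as follows. The function name, the option strings and the method keys are hypothetical and simply mirror the labels in this topic.

```python
import numpy as np

# Default behaviour per method, taken from the table above.
DEFAULT_LOG2 = {"RMA": True, "RMA2": True, "MAS5": True, "MAS4": False}

def transform_intensities(y, method, option="Default"):
    """Apply the log base 2 transformation according to the chosen option."""
    y = np.asarray(y, dtype=float)
    use_log = DEFAULT_LOG2[method] if option == "Default" else (option == "Yes")
    # LOG(Y)/LOG(2) is the natural log rescaled to base 2.
    return np.log(y) / np.log(2) if use_log else y

print(transform_intensities([8.0, 1024.0], "RMA"))
print(transform_intensities([8.0, 1024.0], "MAS4"))
```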

Save results to a GSH file

The results from the batch process will be written to the specified .gsh file. Click the Browse button to choose the file name and folder.

CEL data read in

Some columns can be excluded when data are read from CEL files. These options control whether the following columns are read in.

Standard deviations	The standard deviations of the pixel values used to calculate the intensity of each cell on the chip in the image analysis stage.
Pixel counts	The number of pixel values used to calculate the intensity of each cell on the chip in the image analysis stage.

Masked cells and outliers

These two options control how the information on masked cells and outliers will be read in.

Report units with a factor	A factor called Flags will be created. This has four possible labels: None, Outlier, Masked, and Both (for a cell which is both an outlier and masked). The original intensity value will be read into the Intensity column.
Set Intensity to missing	A missing value (*) will be inserted in the Intensity column wherever the cell is flagged as an outlier or as masked by the user.
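The effect of the two options can be illustrated with a small Python sketch. This is not how Genstat stores the data: the function names are hypothetical, boolean arrays stand in for the outlier and mask information in the CEL file, and NaN stands in for the Genstat missing value.

```python
import numpy as np

FLAG_LABELS = ("None", "Outlier", "Masked", "Both")

def flag_cells(outlier, masked):
    """Build the four-level Flags labelling from two boolean indicators."""
    outlier = np.asarray(outlier, dtype=bool)
    masked = np.asarray(masked, dtype=bool)
    codes = outlier.astype(int) + 2 * masked.astype(int)   # 0, 1, 2 or 3
    return np.array(FLAG_LABELS)[codes]

def blank_flagged(intensity, outlier, masked):
    """Set the intensity to missing wherever a cell is an outlier or masked."""
    out = np.array(intensity, dtype=float)
    out[np.asarray(outlier, dtype=bool) | np.asarray(masked, dtype=bool)] = np.nan
    return out

outlier = [False, True, False, True]
masked = [False, False, True, True]
print(flag_cells(outlier, masked))
print(blank_flagged([120.0, 95.0, 310.0, 88.0], outlier, masked))
```

Reporting a factor keeps the original intensities available for later filtering, whereas setting them to missing removes the flagged cells from any subsequent calculation.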

Action buttons

OK	Read in the CEL files using the current options.
Cancel	Close the dialog and do not open the CEL files.
Reset	Reset options to the default settings.

Action Icons

Clear	Clear all fields and list boxes.
Help	Open the Help topic for this dialog.