Data format

Hipathia uses several types of input data. Some of them must follow a specific structure, described in the sections below:

Expression matrix file format

The expression matrix file is a tab-separated values (TSV) file.

A tab-separated values (TSV) file is a simple text format for storing data in a tabular structure (e.g. database or spreadsheet data). Each record in the table is one line of the text file. Each field value of a record is separated from the next by a tab character.

The file has one ID column followed by one column per sample, so it has two columns for a single sample and more than two columns for several samples. The first line is a header and must contain the sample names. The first column identifies genes, probes or proteins, and the following ID types are accepted:

Ensembl gene
HGNC symbol
Entrez id
Affy HG U133A probeset
Affy HG U133B probeset
Affy HG U133-PLUS-2 probeset
Affy HTA 2.0

The remaining columns contain the expression values in numeric format, one column per sample.

Here is an example of a file with only one sample:

id	sampleName
1	0.3
2	1
3	0.73

And here is another example with more than one sample:

id	sample1	sample2	sample3
1	0.31	0.6	0.24
2	1	0.81	0.91
3	0.7	0.9	0.3
4	0.23	0.45	0.33

For an example file, see brca_genes_vals_bn.txt.
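To illustrate the layout described above, here is a minimal Python sketch that reads an expression matrix. It is not part of Hipathia; the function name read_expression_matrix is hypothetical, and the sketch assumes that the first header cell labels the ID column, as in the examples above.

```python
import csv
import io


def read_expression_matrix(handle):
    """Parse a tab-separated expression matrix (illustrative helper).

    Returns (sample_names, values), where values maps each row ID to a
    list of floats, one per sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if len(header) < 2:
        raise ValueError("expected an ID column and at least one sample column")
    # Assumption: the first header cell labels the ID column, as in the examples.
    sample_names = header[1:]
    values = {}
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        values[row[0]] = [float(x) for x in row[1:]]
    return sample_names, values


# First rows of the multi-sample example above, held in memory.
example = (
    "id\tsample1\tsample2\tsample3\n"
    "1\t0.31\t0.6\t0.24\n"
    "2\t1\t0.81\t0.91\n"
)
samples, matrix = read_expression_matrix(io.StringIO(example))
print(samples)
print(matrix["2"])
```

Reading from a real file works the same way by passing an open text file handle instead of the in-memory string.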

Note: If probe expression values are provided, they are converted to gene expression values, computed as the average value of all the probes that map to the gene.
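The averaging described in the note can be sketched as follows. This is an illustration of the idea only, not Hipathia's own code: the function probes_to_genes, the probe and gene names, and the mapping table are all hypothetical, and skipping probes without a gene mapping is an assumption.

```python
from collections import defaultdict
from statistics import mean


def probes_to_genes(probe_values, probe_to_gene):
    """Average probe-level values into gene-level values, sample by sample."""
    grouped = defaultdict(list)
    for probe, sample_values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # assumption: probes with no known gene are ignored
        grouped[gene].append(sample_values)
    # zip(*rows) walks the samples column by column across a gene's probes.
    return {
        gene: [mean(column) for column in zip(*rows)]
        for gene, rows in grouped.items()
    }


# Hypothetical probe-level values for two samples.
probe_values = {
    "probe_a": [1.0, 3.0],
    "probe_b": [3.0, 5.0],
    "probe_c": [0.5, 0.25],
}
# Hypothetical probe-to-gene mapping.
probe_to_gene = {"probe_a": "gene_x", "probe_b": "gene_x", "probe_c": "gene_y"}

gene_values = probes_to_genes(probe_values, probe_to_gene)
print(gene_values)
```

Here gene_x is measured by two probes, so each of its sample values is the mean of the two probe values, while gene_y keeps the values of its single probe.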

Experimental design file format

The experimental design file is a tab-separated values file. It has two columns: the first contains the sample name and the second contains the phenotype.

Note: For paired data, the rows of the experimental design file must be ordered, as in the example below.

Here is an example of a file with 4 paired samples (sample1_Normal and sample1_Treated are the same sample before and after treatment):

sample1_Normal	Group_1
sample2_Normal	Group_1
sample1_Treated	Group_2
sample2_Treated	Group_2

For another example file, see brca_normal-basal_ed.txt.
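As a rough illustration, the Python sketch below reads an experimental design and matches paired samples. The helper names read_design and pair_by_position are hypothetical. How Hipathia itself matches pairs is not stated here; pairing the i-th sample of one group with the i-th sample of the other is only one reading of "ordered", suggested by the example above.

```python
import csv
import io


def read_design(handle):
    """Return a list of (sample_name, phenotype) tuples in file order."""
    design = []
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 fields, got {len(row)}")
        design.append((row[0], row[1]))
    return design


def pair_by_position(design):
    """Pair the i-th sample of one group with the i-th sample of the other.

    Assumption: this positional pairing is an interpretation of the ordering
    requirement, not a documented rule.
    """
    groups = {}
    for sample, phenotype in design:
        groups.setdefault(phenotype, []).append(sample)
    if len(groups) != 2:
        raise ValueError("paired data needs exactly two phenotype groups")
    first, second = groups.values()  # dicts keep insertion (file) order
    if len(first) != len(second):
        raise ValueError("both groups must contain the same number of samples")
    return list(zip(first, second))


example = (
    "sample1_Normal\tGroup_1\n"
    "sample2_Normal\tGroup_1\n"
    "sample1_Treated\tGroup_2\n"
    "sample2_Treated\tGroup_2\n"
)
design = read_design(io.StringIO(example))
pairs = pair_by_position(design)
print(pairs)
```

A check like this can catch a design file whose two groups differ in size before the data is submitted.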

Gene list file format

The gene list file is a tab-separated values file. It has a single column containing the Entrez IDs of the genes, one Entrez ID per line.

Here is an example of a file with 4 genes to be evaluated:

Gene_1
Gene_2
Gene_3
Gene_4
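A gene list can be checked before use with a short Python sketch such as the one below. The helper names read_gene_list and non_numeric_entries are hypothetical and not part of Hipathia. The check relies on Entrez gene IDs being plain integers, so any other entry is reported; the IDs in the sample input are arbitrary illustrations.

```python
import io


def read_gene_list(handle):
    """Return the non-empty entries of a one-column gene list, in file order."""
    genes = []
    for line_no, line in enumerate(handle, start=1):
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        if "\t" in entry:
            raise ValueError(f"line {line_no}: expected a single column")
        genes.append(entry)
    return genes


def non_numeric_entries(genes):
    """Return entries that do not look like Entrez IDs (plain integers)."""
    return [gene for gene in genes if not gene.isdigit()]


# Illustrative input: two numeric IDs, a blank line, and one gene symbol.
example = "7157\n672\n\nBRCA1\n"
genes = read_gene_list(io.StringIO(example))
suspicious = non_numeric_entries(genes)
print(genes)
print(suspicious)
```

In this input the symbol on the last line is flagged because it is not a numeric ID.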

CoV-HiPathia
