====== Data format ======

Different types of data are used in Hipathia. Some of these data require a specific structure, described in the sections below.

**Note:** The recommended file extensions are '.txt' or '.tsv'.

===== Expression matrix file format =====

The expression matrix file is a tab-separated values (TSV) file. TSV is a simple text format for storing data in a tabular structure (e.g. database or spreadsheet data). Each record in the table is one line of the text file, and each field value of a record is separated from the next by a tab character. [[https://en.wikipedia.org/wiki/Tab-separated_values|More about TSV...]]

This file has two columns if there is only one sample, and more than two columns if there are several samples. The first line is a header and must contain the sample names.

The first column corresponds to genes, probes or proteins, and the following IDs are accepted:

  * Ensembl gene
  * HGNC symbol
  * Entrez id
  * Affy HG U133A probeset
  * Affy HG U133B probeset
  * Affy HG U133-PLUS-2 probeset
  * Affy HTA 2.0

The remaining columns contain the gene expression values of each sample, in numeric format.

Here is an example of a file with only one sample:

  id	sampleName
  1	0.3
  2	1
  3	0.73

And here is another example with more than one sample:

  id	sample1	sample2	sample3
  1	0.31	0.6	0.24
  2	1	0.81	0.91
  3	0.7	0.9	0.3
  4	0.23	0.45	0.33

For an example file, see {{:brca_genes_vals_bn.txt|}}

**Note**: If probe expression values are provided, they are converted to gene expression values, computed as the average value of all the probes mapping to the gene.

===== Experimental design file format =====

The experimental design file is a tab-separated values file. It has two columns: the first one corresponds to the sample name and the second one corresponds to the phenotype.

  sample1	Group_1
  sample2	Group_1
  sample3	Group_2

**Note**: In case of **paired data** the experimental design file must be **ordered**.
Here is an example of a file with 4 paired samples (sample1_Normal and sample1_Treated are the same sample before and after treatment). Note that the samples appear in the same order within each group:

  sample1_Normal	Group_1
  sample2_Normal	Group_1
  sample1_Treated	Group_2
  sample2_Treated	Group_2

For another example file, see {{:brca_normal-basal_ed.txt|}}.

===== Gene list file format =====

The gene list file is a tab-separated values file. It has a single column containing the Entrez IDs of the genes (one Entrez ID per line).

Here is an example of a file with 4 genes to be evaluated:

  Gene_1
  Gene_2
  Gene_3
  Gene_4

====== Character encoding ======

We recommend using the **[[https://en.wikipedia.org/wiki/UTF-8 | UTF-8]]** character encoding for your content or data.
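
As a rough illustration of the formats described on this page, the following Python sketch parses an expression matrix and an experimental design file and reports sample names that appear in only one of them. It is not part of Hipathia: the function names (''read_expression_matrix'', ''read_design'', ''unmatched_samples'') are hypothetical, and the sketch assumes that the first header cell is a label for the ID column (as in the examples above) and that the sample names in the design file are expected to match those in the header.

```python
import csv
import io


def read_expression_matrix(handle):
    """Parse a tab-separated expression matrix.

    Returns (sample_names, {feature_id: [values as floats]}).
    Assumes the first header cell labels the ID column.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    if not samples:
        raise ValueError("header must name at least one sample")
    values = {}
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # tolerate blank lines
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        # float() raises ValueError if an expression value is not numeric
        values[row[0]] = [float(cell) for cell in row[1:]]
    return samples, values


def read_design(handle):
    """Parse a two-column design file into an ordered list of (sample, group)."""
    reader = csv.reader(handle, delimiter="\t")
    design = []
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # tolerate blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 fields, got {len(row)}")
        design.append((row[0], row[1]))
    return design


def unmatched_samples(samples, design):
    """Return the sample names that appear in only one of the two files."""
    return sorted(set(samples) ^ {name for name, _ in design})


# In-memory demo using the example data from this page.
matrix_text = "id\tsample1\tsample2\tsample3\n1\t0.31\t0.6\t0.24\n2\t1\t0.81\t0.91\n"
design_text = "sample1\tGroup_1\nsample2\tGroup_1\nsample3\tGroup_2\n"

samples, values = read_expression_matrix(io.StringIO(matrix_text))
design = read_design(io.StringIO(design_text))
print(samples)
print(unmatched_samples(samples, design))
```

For files on disk, the same functions can be given a handle opened with ''open(path, encoding="utf-8", newline="")'', which follows the UTF-8 recommendation above.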