Description of the data

This directory contains the data files used to compare two sequencing platforms: the Illumina HiSeq 2500 and the BGISEQ-500. The study includes 8 samples: 1921, 214, FRC, L, M1, P79, P83 and P84. Details on the samples can be found in the manuscript.

All the data are organized in 2 main subdirectories, rawReads and trimmedReads, which are described below.

rawReads

This directory contains the raw reads (reads as they are obtained from the sequencing machines, warts and all) for all the samples sequenced as part of this study. Each of the samples listed above has 3 sub-folders associated with it, distinguished by name as follows.

The directories whose names do not begin with the letter "z" contain the raw reads from the Illumina HiSeq 2500.

The directories whose names begin with "z" but do not end with "p" contain the raw reads from the non-purified version of the BGISEQ-500 library.

Finally, the directories whose names begin with "z" and end with "p" contain the raw reads from the purified version of the BGISEQ-500 library.
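
The naming rules above can be applied programmatically. The following Python snippet is a minimal sketch, not part of the original data release; the function name classify_library and the example folder names ("M1", "zM1", "zM1p") are hypothetical and only illustrate the "z" prefix and "p" suffix rules.

```python
def classify_library(dirname: str) -> str:
    """Return the library type implied by a sample sub-folder name.

    Implements only the naming rules stated in this README:
    no leading "z"          -> Illumina HiSeq 2500
    leading "z", no final "p" -> BGISEQ-500, non-purified library
    leading "z", final "p"    -> BGISEQ-500, purified library
    """
    if not dirname.startswith("z"):
        return "Illumina HiSeq 2500"
    if dirname.endswith("p"):
        return "BGISEQ-500 (purified)"
    return "BGISEQ-500 (non-purified)"


if __name__ == "__main__":
    # Hypothetical folder names, used purely for illustration.
    for name in ["M1", "zM1", "zM1p"]:
        print(name, "->", classify_library(name))
```

The check is case-sensitive, matching the lowercase "z" and "p" used in the description above.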

trimmedReads

This directory is similar to the rawReads directory, but instead of the raw reads from the sequencing machine, it contains the reads after they have been processed by AdapterRemoval2 to remove the platform-specific adapters and to trim bases with quality scores lower than 2. The subfolder structure in this directory is identical to the one under the directory rawReads.

In case of any questions, please contact the authors of the paper.
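
Since trimmedReads is expected to mirror the subfolder layout of rawReads, a downloaded copy can be checked for completeness. The Python snippet below is a rough sketch under that assumption; the helper names subfolder_names and layout_differences are hypothetical, and the check compares only the names of the immediate sub-folders, not the files inside them.

```python
from pathlib import Path


def subfolder_names(root):
    """Return the set of immediate sub-folder names under root."""
    return {p.name for p in Path(root).iterdir() if p.is_dir()}


def layout_differences(raw_root="rawReads", trimmed_root="trimmedReads"):
    """Compare the two directory layouts by sub-folder name.

    Returns two sorted lists: folders present only under raw_root,
    and folders present only under trimmed_root. Both lists are empty
    when the layouts match.
    """
    raw = subfolder_names(raw_root)
    trimmed = subfolder_names(trimmed_root)
    return sorted(raw - trimmed), sorted(trimmed - raw)
```

Calling layout_differences() from the directory that holds both subdirectories returns two empty lists when every sub-folder in rawReads has a counterpart in trimmedReads and vice versa.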