This vignette demonstrates how to use ASCAT to analyse multiple phylogenetically related samples. For general usage of ASCAT, including parameters that are not specific to multi-sample analysis, please refer to the ASCAT webpage and the example pipeline.
We start by loading the ASCAT package.
library(ASCAT)
Next we load the data.
ascat.bcMulti <- ascat.loadData(
  Tumor_LogR_file = system.file("extdata", "tumour.logR.txt", package = "ASCAT"),
  Tumor_BAF_file = system.file("extdata", "tumour.BAF.txt", package = "ASCAT"),
  Germline_LogR_file = system.file("extdata", "singlenormal.logR.txt", package = "ASCAT"),
  Germline_BAF_file = system.file("extdata", "singlenormal.BAF.txt", package = "ASCAT"))
## [1] Reading Tumor LogR data...
## [1] Reading Tumor BAF data...
## [1] Reading Germline LogR data...
## [1] Reading Germline BAF data...
## [1] Registering SNP locations...
## [1] Splitting genome in distinct chunks...
Both Tumor_LogR_file and Tumor_BAF_file are expected to contain a column for each of the samples to analyse.
head(ascat.bcMulti$Tumor_LogR)
##            S1       S2
## SNP1  0.03615 -1.03950
## SNP2  0.14998 -0.79433
## SNP3 -0.00891 -0.76137
## SNP4  0.40188 -0.67521
## SNP5  0.14902 -0.72980
## SNP6  0.24118 -1.11302
head(ascat.bcMulti$Tumor_BAF)
##           S1      S2
## SNP1 0.51596 0.99262
## SNP2 0.67903 0.00255
## SNP3 1.00000 1.00000
## SNP4 0.00000 0.00000
## SNP5 1.00000 1.00000
## SNP6 0.45572 0.04925
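To make the one-column-per-sample layout concrete, the following sketch writes a small multi-sample input file and reads it back. The file layout used here (SNP identifiers as row names, then `chrs` and `pos`, then one column per sample) is an assumption about ASCAT's tab-separated input format and should be checked against the package documentation. All values and the names `toy_logr`, `toy_baf` and `sample_columns` are made up for illustration.

```r
# Toy LogR and BAF tables with one column per sample (values are invented).
# Assumed layout: row names = SNP IDs, then chrs, pos, then sample columns.
toy_logr <- data.frame(
  chrs = c("1", "1", "1"),
  pos  = c(1000L, 2000L, 3000L),
  S1   = c(0.04, 0.15, -0.01),
  S2   = c(-1.04, -0.79, -0.76),
  row.names = c("SNP1", "SNP2", "SNP3")
)
toy_baf <- data.frame(
  chrs = c("1", "1", "1"),
  pos  = c(1000L, 2000L, 3000L),
  S1   = c(0.52, 0.68, 1.00),
  S2   = c(0.99, 0.00, 1.00),
  row.names = c("SNP1", "SNP2", "SNP3")
)

# Write tab-separated files; col.names = NA leaves an empty header cell
# above the row names so the header lines up with the data columns.
logr_file <- tempfile(fileext = ".txt")
baf_file  <- tempfile(fileext = ".txt")
write.table(toy_logr, logr_file, sep = "\t", quote = FALSE, col.names = NA)
write.table(toy_baf,  baf_file,  sep = "\t", quote = FALSE, col.names = NA)

# Helper (hypothetical): return the sample columns of such a file.
sample_columns <- function(path) {
  tab <- read.table(path, sep = "\t", header = TRUE, row.names = 1,
                    check.names = FALSE)
  setdiff(colnames(tab), c("chrs", "pos"))
}

logr_samples <- sample_columns(logr_file)
baf_samples  <- sample_columns(baf_file)

# Both files should list the same samples in the same order.
stopifnot(identical(logr_samples, baf_samples))
print(logr_samples)
```

A check like the final `stopifnot` can catch mismatched or reordered sample columns before the files are passed to `ascat.loadData`.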
The next step is to run the segmentation. When analysing phylogenetically related samples, some of the copy number segment boundaries are expected to be shared between samples. In this case, a joint segmentation of all samples is recommended. The synthetic data set used in this example was also simulated with partly shared segment boundaries. The ground truth copy number profiles of the two samples analysed here are shown in the following plots.
The multi-sample segmentation algorithm can be run using the function ascat.asmultipcf.
ascat.bcMulti <- ascat.asmultipcf(ascat.bcMulti, penalty = 5)
## [1] "Segmentlength 5"
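The intuition behind joint segmentation can be illustrated with a deliberately simplified sketch. It is not the algorithm implemented in `ascat.asmultipcf`: it assumes exactly one breakpoint, uses plain least squares, and ignores BAF and penalties altogether. Two simulated signals share a true breakpoint; fitting each signal separately may place the breakpoint at slightly different positions, while minimising the summed cost yields a single common position. All names (`split_cost`, `true_break`, and so on) are hypothetical.

```r
set.seed(1)
n <- 200
true_break <- 120

# Two toy "samples" that change level at the same position,
# with different step sizes and independent noise.
s1 <- c(rep(0, true_break), rep(0.5, n - true_break)) + rnorm(n, sd = 0.3)
s2 <- c(rep(0, true_break), rep(-0.4, n - true_break)) + rnorm(n, sd = 0.3)

# Residual sum of squares when x is split after position k
# into two piecewise-constant segments.
split_cost <- function(x, k) {
  left  <- x[seq_len(k)]
  right <- x[(k + 1):length(x)]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
}

# Candidate breakpoints, keeping a few points on either side.
ks <- 5:(n - 5)
cost1 <- vapply(ks, function(k) split_cost(s1, k), numeric(1))
cost2 <- vapply(ks, function(k) split_cost(s2, k), numeric(1))

# Separate fits: each sample picks its own best breakpoint.
single1 <- ks[which.min(cost1)]
single2 <- ks[which.min(cost2)]

# Joint fit: one breakpoint minimising the cost summed over samples.
joint <- ks[which.min(cost1 + cost2)]

cat("single-sample breakpoints:", single1, single2, "\n")
cat("joint breakpoint:", joint, "\n")
```

Because the joint fit pools evidence from both samples, it reports one shared position instead of two nearby ones, which is the behaviour the multi-sample segmentation aims for at shared boundaries.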
ASCAT can then be run on the segmented data set.
ascat.outputMulti <- ascat.runAscat(ascat.bcMulti)
Finally, we compare our result to that of standard single-sample segmentation using ascat.aspcf.
ascat.bc <- ascat.loadData(
  system.file("extdata", "tumour.logR.txt", package = "ASCAT"),
  system.file("extdata", "tumour.BAF.txt", package = "ASCAT"),
  system.file("extdata", "normal.logR.txt", package = "ASCAT"),
  system.file("extdata", "normal.BAF.txt", package = "ASCAT"))
## [1] Reading Tumor LogR data...
## [1] Reading Tumor BAF data...
## [1] Reading Germline LogR data...
## [1] Reading Germline BAF data...
## [1] Registering SNP locations...
## [1] Splitting genome in distinct chunks...
ascat.bc <- ascat.aspcf(ascat.bc, penalty = 25)
## [1] Sample S1 (1/2)
## [1] Sample S2 (2/2)
Note that single-sample segmentation requires a higher penalty parameter than multi-sample segmentation to achieve the same sensitivity. When switching from single- to multi-sample segmentation, the penalty parameter therefore needs to be lowered to maintain a similar sensitivity.
We plot the segment boundaries inferred for each of the two samples by multi- and single-sample segmentation.
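The `plot.segments` helper called in the following lines does not appear to be defined in this section. If it is not available in your session, a stand-in along these lines could be used. This is a hypothetical sketch written for illustration, not the vignette's original helper: it assumes `v1` and `v2` are vectors of segment end positions (SNP indices) and draws them as vertical ticks in two rows, one per sample.

```r
# Hypothetical stand-in for the plot.segments helper.
# v1, v2: segment end positions (SNP indices) for samples S1 and S2.
plot.segments <- function(v1, v2, main = "") {
  xmax <- max(v1, v2)
  # Empty canvas: upper row for S1, lower row for S2.
  plot(NA, xlim = c(1, xmax), ylim = c(0, 2), yaxt = "n",
       xlab = "SNP index", ylab = "", main = main)
  axis(2, at = c(0.5, 1.5), labels = c("S2", "S1"), las = 1)
  # One vertical tick per segment boundary.
  segments(x0 = v1, y0 = 1, x1 = v1, y1 = 2, col = "blue")
  segments(x0 = v2, y0 = 0, x1 = v2, y1 = 1, col = "red")
  abline(h = 1, col = "grey")
  invisible(list(S1 = v1, S2 = v2))
}

# Toy usage with made-up boundaries: the middle one differs slightly.
toy <- plot.segments(v1 = c(100, 250, 400), v2 = c(100, 255, 400),
                     main = "Toy example")
```

Ticks that line up vertically between the two rows indicate boundaries placed at the same SNP in both samples.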
plot.segments(v1 = cumsum(rle(ascat.bc$Tumor_LogR_segmented[, 1])$lengths),
              v2 = cumsum(rle(ascat.bc$Tumor_LogR_segmented[, 2])$lengths),
              main = "Single-sample segmentation")
plot.segments(v1 = cumsum(rle(ascat.bcMulti$Tumor_LogR_segmented[, 1])$lengths),
              v2 = cumsum(rle(ascat.bcMulti$Tumor_LogR_segmented[, 2])$lengths),
              main = "Multi-sample segmentation")
With single-sample segmentation, the inferred positions of most of the shared segment boundaries vary slightly between the two samples, whereas the multi-sample segmentation infers a common breakpoint when there is no significant evidence that the boundaries differ between samples.
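The expression `cumsum(rle(x)$lengths)` used in the plotting calls converts a segmented (piecewise-constant) vector into the indices at which each segment ends. The sketch below applies it to two small made-up vectors and then lists which boundaries coincide between samples. The vectors and the helper name `segment_ends` are hypothetical; with real data, the columns of `Tumor_LogR_segmented` would take their place.

```r
# Toy segmented LogR values for two samples (invented numbers).
seg_s1 <- c(0, 0, 0, 0.5, 0.5, 0.5, 0.5, -0.2, -0.2, -0.2)
seg_s2 <- c(0.1, 0.1, 0.1, 0.1, -0.4, -0.4, -0.4, 0.3, 0.3, 0.3)

# Run-length encode the vector, then accumulate the run lengths
# to obtain the index of the last SNP in each segment.
segment_ends <- function(x) cumsum(rle(x)$lengths)

ends_s1 <- segment_ends(seg_s1)  # 3 7 10
ends_s2 <- segment_ends(seg_s2)  # 4 7 10

# Boundaries found in both samples versus in only one of them.
# The final index (the end of the vector) is always shared.
shared  <- intersect(ends_s1, ends_s2)  # 7 10
only_s1 <- setdiff(ends_s1, ends_s2)    # 3
only_s2 <- setdiff(ends_s2, ends_s1)    # 4

print(list(shared = shared, only_s1 = only_s1, only_s2 = only_s2))
```

Counting the shared boundaries for the single-sample and multi-sample results gives a simple numeric complement to the visual comparison.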