Last updated: 2025-06-09
Checks: 7 0
Knit directory: ATAC_learning/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20231016)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version efea011. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .RData
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/H3K27ac_integration_noM.Rmd
Ignored: analysis/figure/
Ignored: data/ACresp_SNP_table.csv
Ignored: data/ARR_SNP_table.csv
Ignored: data/All_merged_peaks.tsv
Ignored: data/CAD_gwas_dataframe.RDS
Ignored: data/CTX_SNP_table.csv
Ignored: data/Collapsed_expressed_NG_peak_table.csv
Ignored: data/DEG_toplist_sep_n45.RDS
Ignored: data/FRiP_first_run.txt
Ignored: data/Final_four_data/
Ignored: data/Frip_1_reads.csv
Ignored: data/Frip_2_reads.csv
Ignored: data/Frip_3_reads.csv
Ignored: data/Frip_4_reads.csv
Ignored: data/Frip_5_reads.csv
Ignored: data/Frip_6_reads.csv
Ignored: data/GO_KEGG_analysis/
Ignored: data/HF_SNP_table.csv
Ignored: data/Ind1_75DA24h_dedup_peaks.csv
Ignored: data/Ind1_TSS_peaks.RDS
Ignored: data/Ind1_firstfragment_files.txt
Ignored: data/Ind1_fragment_files.txt
Ignored: data/Ind1_peaks_list.RDS
Ignored: data/Ind1_summary.txt
Ignored: data/Ind2_TSS_peaks.RDS
Ignored: data/Ind2_fragment_files.txt
Ignored: data/Ind2_peaks_list.RDS
Ignored: data/Ind2_summary.txt
Ignored: data/Ind3_TSS_peaks.RDS
Ignored: data/Ind3_fragment_files.txt
Ignored: data/Ind3_peaks_list.RDS
Ignored: data/Ind3_summary.txt
Ignored: data/Ind4_79B24h_dedup_peaks.csv
Ignored: data/Ind4_TSS_peaks.RDS
Ignored: data/Ind4_V24h_fraglength.txt
Ignored: data/Ind4_fragment_files.txt
Ignored: data/Ind4_fragment_filesN.txt
Ignored: data/Ind4_peaks_list.RDS
Ignored: data/Ind4_summary.txt
Ignored: data/Ind5_TSS_peaks.RDS
Ignored: data/Ind5_fragment_files.txt
Ignored: data/Ind5_fragment_filesN.txt
Ignored: data/Ind5_peaks_list.RDS
Ignored: data/Ind5_summary.txt
Ignored: data/Ind6_TSS_peaks.RDS
Ignored: data/Ind6_fragment_files.txt
Ignored: data/Ind6_peaks_list.RDS
Ignored: data/Ind6_summary.txt
Ignored: data/Knowles_4.RDS
Ignored: data/Knowles_5.RDS
Ignored: data/Knowles_6.RDS
Ignored: data/LiSiLTDNRe_TE_df.RDS
Ignored: data/MI_gwas.RDS
Ignored: data/SNP_GWAS_PEAK_MRC_id
Ignored: data/SNP_GWAS_PEAK_MRC_id.csv
Ignored: data/SNP_gene_cat_list.tsv
Ignored: data/SNP_supp_schneider.RDS
Ignored: data/TE_info/
Ignored: data/TFmapnames.RDS
Ignored: data/all_TSSE_scores.RDS
Ignored: data/all_four_filtered_counts.txt
Ignored: data/aln_run1_results.txt
Ignored: data/anno_ind1_DA24h.RDS
Ignored: data/anno_ind4_V24h.RDS
Ignored: data/annotated_gwas_SNPS.csv
Ignored: data/background_n45_he_peaks.RDS
Ignored: data/cardiac_muscle_FRIP.csv
Ignored: data/cardiomyocyte_FRIP.csv
Ignored: data/col_ng_peak.csv
Ignored: data/cormotif_full_4_run.RDS
Ignored: data/cormotif_full_4_run_he.RDS
Ignored: data/cormotif_full_6_run.RDS
Ignored: data/cormotif_full_6_run_he.RDS
Ignored: data/cormotif_probability_45_list.csv
Ignored: data/cormotif_probability_45_list_he.csv
Ignored: data/cormotif_probability_all_6_list.csv
Ignored: data/cormotif_probability_all_6_list_he.csv
Ignored: data/datasave.RDS
Ignored: data/embryo_heart_FRIP.csv
Ignored: data/enhancer_list_ENCFF126UHK.bed
Ignored: data/enhancerdata/
Ignored: data/filt_Peaks_efit2.RDS
Ignored: data/filt_Peaks_efit2_bl.RDS
Ignored: data/filt_Peaks_efit2_n45.RDS
Ignored: data/first_Peaksummarycounts.csv
Ignored: data/first_run_frag_counts.txt
Ignored: data/full_bedfiles/
Ignored: data/gene_ref.csv
Ignored: data/gwas_1_dataframe.RDS
Ignored: data/gwas_2_dataframe.RDS
Ignored: data/gwas_3_dataframe.RDS
Ignored: data/gwas_4_dataframe.RDS
Ignored: data/gwas_5_dataframe.RDS
Ignored: data/high_conf_peak_counts.csv
Ignored: data/high_conf_peak_counts.txt
Ignored: data/high_conf_peaks_bl_counts.txt
Ignored: data/high_conf_peaks_counts.txt
Ignored: data/hits_files/
Ignored: data/hyper_files/
Ignored: data/hypo_files/
Ignored: data/ind1_DA24hpeaks.RDS
Ignored: data/ind1_TSSE.RDS
Ignored: data/ind2_TSSE.RDS
Ignored: data/ind3_TSSE.RDS
Ignored: data/ind4_TSSE.RDS
Ignored: data/ind4_V24hpeaks.RDS
Ignored: data/ind5_TSSE.RDS
Ignored: data/ind6_TSSE.RDS
Ignored: data/initial_complete_stats_run1.txt
Ignored: data/left_ventricle_FRIP.csv
Ignored: data/median_24_lfc.RDS
Ignored: data/median_3_lfc.RDS
Ignored: data/mergedPeads.gff
Ignored: data/mergedPeaks.gff
Ignored: data/motif_list_full
Ignored: data/motif_list_n45
Ignored: data/motif_list_n45.RDS
Ignored: data/multiqc_fastqc_run1.txt
Ignored: data/multiqc_fastqc_run2.txt
Ignored: data/multiqc_genestat_run1.txt
Ignored: data/multiqc_genestat_run2.txt
Ignored: data/my_hc_filt_counts.RDS
Ignored: data/my_hc_filt_counts_n45.RDS
Ignored: data/n45_bedfiles/
Ignored: data/n45_files
Ignored: data/other_papers/
Ignored: data/peakAnnoList_1.RDS
Ignored: data/peakAnnoList_2.RDS
Ignored: data/peakAnnoList_24_full.RDS
Ignored: data/peakAnnoList_24_n45.RDS
Ignored: data/peakAnnoList_3.RDS
Ignored: data/peakAnnoList_3_full.RDS
Ignored: data/peakAnnoList_3_n45.RDS
Ignored: data/peakAnnoList_4.RDS
Ignored: data/peakAnnoList_5.RDS
Ignored: data/peakAnnoList_6.RDS
Ignored: data/peakAnnoList_Eight.RDS
Ignored: data/peakAnnoList_full_motif.RDS
Ignored: data/peakAnnoList_n45_motif.RDS
Ignored: data/siglist_full.RDS
Ignored: data/siglist_n45.RDS
Ignored: data/summarized_peaks_dataframe.txt
Ignored: data/summary_peakIDandReHeat.csv
Ignored: data/test.list.RDS
Ignored: data/testnames.txt
Ignored: data/toplist_6.RDS
Ignored: data/toplist_full.RDS
Ignored: data/toplist_full_DAR_6.RDS
Ignored: data/toplist_n45.RDS
Ignored: data/trimmed_seq_length.csv
Ignored: data/unclassified_full_set_peaks.RDS
Ignored: data/unclassified_n45_set_peaks.RDS
Ignored: data/xstreme/
Untracked files:
Untracked: RNA_seq_integration.Rmd
Untracked: Rplot.pdf
Untracked: analysis/.gitignore
Untracked: analysis/Cormotif_analysis_testing diff.Rmd
Untracked: analysis/Diagnosis-tmm.Rmd
Untracked: analysis/Expressed_RNA_associations.Rmd
Untracked: analysis/LFC_corr.Rmd
Untracked: analysis/SVA.Rmd
Untracked: analysis/Tan2020.Rmd
Untracked: analysis/making_master_peaks_list.Rmd
Untracked: analysis/my_hc_filt_counts.csv
Untracked: code/Concatenations_for_export.R
Untracked: code/IGV_snapshot_code.R
Untracked: code/LongDARlist.R
Untracked: code/just_for_Fun.R
Untracked: my_plot.pdf
Untracked: my_plot.png
Untracked: output/cormotif_probability_45_list.csv
Untracked: output/cormotif_probability_all_6_list.csv
Untracked: setup.RData
Unstaged changes:
Modified: ATAC_learning.Rproj
Modified: analysis/AF_HF_SNPs.Rmd
Modified: analysis/Cardiotox_SNPs.Rmd
Modified: analysis/Cormotif_analysis.Rmd
Modified: analysis/DEG_analysis.Rmd
Modified: analysis/Enhancer_enrichment_DOX_DAR.Rmd
Modified: analysis/H3K27ac_initial_QC.Rmd
Modified: analysis/Jaspar_motif.Rmd
Modified: analysis/Jaspar_motif_ff.Rmd
Modified: analysis/TE_analysis_DOX_DAR.Rmd
Modified: analysis/TE_analysis_norm.Rmd
Modified: analysis/final_four_analysis.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/Counts_matrix.Rmd
) and
HTML (docs/Counts_matrix.html
) files. If you’ve configured
a remote Git repository (see ?wflow_git_remote
), click on
the hyperlinks in the table below to view the files as they were in that
past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | efea011 | reneeisnowhere | 2025-06-09 | update log2cpm of VEH correlation |
html | 6ad2095 | reneeisnowhere | 2025-05-21 | Build site. |
Rmd | 3af1886 | reneeisnowhere | 2025-05-21 | updates to testing PCA variance contribution |
html | 54e129a | reneeisnowhere | 2025-05-15 | Build site. |
Rmd | d49fd7c | reneeisnowhere | 2025-05-15 | rastr updates and spelling |
html | 5e6e462 | reneeisnowhere | 2025-05-07 | Build site. |
Rmd | 2db35c7 | reneeisnowhere | 2025-05-07 | updates to analysis |
html | 0b53518 | reneeisnowhere | 2025-05-06 | Build site. |
Rmd | e08ea71 | reneeisnowhere | 2025-05-06 | updates |
html | 193c36d | reneeisnowhere | 2025-05-06 | Build site. |
Rmd | 7c62aac | reneeisnowhere | 2025-05-06 | reanalyze because of error |
library(tidyverse)
library(kableExtra)
library(broom)
library(RColorBrewer)
library(ChIPseeker)
library("TxDb.Hsapiens.UCSC.hg38.knownGene")
library("org.Hs.eg.db")
library(rtracklayer)
library(edgeR)
library(ggfortify)
library(limma)
library(readr)
library(BiocGenerics)
library(gridExtra)
library(VennDiagram)
library(scales)
library(BiocParallel)
library(ggpubr)
library(devtools)
library(eulerr)
library(ggsignif)
library(plyranges)
library(ggrepel)
library(ComplexHeatmap)
library(cowplot)
library(smplot2)
library(data.table)
library(ChIPpeakAnno)
drug_pal <- c("#8B006D","#DF707E","#F1B72B", "#3386DD","#707031","#41B333")
prop_var_percent <- function(pca_result){
# Ensure the input is a PCA result object
if (!inherits(pca_result, "prcomp")) {
stop("Input must be a result from prcomp()")
}
# Get the standard deviations from the PCA result
sdev <- pca_result$sdev
# Calculate the proportion of variance
proportion_variance <- (sdev^2) / sum(sdev^2)*100
return(proportion_variance)
}
This section is using the featureCounts matrix of the 172,481 high confidence peaks across all samples called “all_four_filtered_counts.txt”.
### Load the table
all_four_filt_counts <- read.delim("data/all_four_filtered_counts.txt",header=TRUE) %>%
GRanges() %>%
keepStandardChromosomes(pruning.mode = "coarse") %>%
as.data.frame()
##remove tail from column names
names(all_four_filt_counts) = gsub(pattern = "_S.*", replacement = "", x = names(all_four_filt_counts))
### remove directory from individual file names
names(all_four_filt_counts) = gsub(pattern = "ind1.trimmed.filt_files.trimmed_|ind2.trimmed.filt_files.trimmed_|ind3.trimmed.filt_files.trimmed_|ind6.trimmed.filt_files.trimmed_", replacement = "", x = names(all_four_filt_counts))
The number of peaks from master peak set: 172481 peaks
raw_counts <- all_four_filt_counts %>%
dplyr::select(Geneid,Ind1_75DA24h:Ind6_71V3h) %>%
column_to_rownames("Geneid") %>%
rename_with(.,~gsub(pattern = "Ind1_75", replacement = "D_",.)) %>%
rename_with(.,~gsub(pattern = "Ind2_87", replacement = "A_",.)) %>%
rename_with(.,~gsub(pattern = "Ind3_77", replacement = "B_",.)) %>%
rename_with(.,~gsub(pattern = "Ind6_71", replacement = "C_",.)) %>%
rename_with(.,~gsub( "DX" ,'DOX',.)) %>%
rename_with(.,~gsub( "DA" ,'DNR',.)) %>%
rename_with(.,~gsub( "E" ,'EPI',.)) %>%
rename_with(.,~gsub( "T" ,'TRZ',.)) %>%
rename_with(.,~gsub( "M" ,'MTX',.)) %>%
rename_with(.,~gsub( "V" ,'VEH',.)) %>%
rename_with(.,~gsub("24h","_24h",.)) %>%
rename_with(.,~gsub("3h","_3h",.)) %>%
as.matrix()
hist(raw_counts, main= "Raw counts across samples",
xlab = "Raw counts",
col=4)
Version | Author | Date |
---|---|---|
193c36d | reneeisnowhere | 2025-05-06 |
hist(cpm(raw_counts, log=TRUE),
main = expression("Histogram of Log"[2]*" cpm unfiltered"),
xlab = expression("Log"[2]*" counts-per-million"),
col=4)
Version | Author | Date |
---|---|---|
193c36d | reneeisnowhere | 2025-05-06 |
boxplot(cpm(raw_counts, log=TRUE),
main=expression("Boxplot of Log"[2]*" counts-per-million"),
col=4,
names=colnames(raw_counts),
las=2, cex.axis=.7)
Version | Author | Date |
---|---|---|
193c36d | reneeisnowhere | 2025-05-06 |
Saving a .tsv of the raw unfiltered counts:
raw_counts %>%
as.data.frame() %>%
rownames_to_column("Peakid") %>%
write_delim(., "data/Final_four_data/re_analysis/Raw_unfiltered_counts.tsv",delim="\t")
cor_raw_counts <- raw_counts %>%
cpm(., log = TRUE) %>%
cor(.,method = "pearson")
anno_raw_counts <- data.frame(timeset = colnames(cor_raw_counts))
counts_corr_mat <-anno_raw_counts %>%
separate(timeset, into = c("indv","trt","time"), sep= "_") %>%
mutate(class = if_else(trt == "DNR", "AC",
if_else(trt == "DOX", "AC",
if_else(trt == "EPI", "AC", "nAC")))) %>%
mutate(TOP2i = if_else(trt == "DNR", "yes",
if_else(trt == "DOX", "yes",
if_else(trt == "EPI", "yes",
if_else(trt == "MTX", "yes", "no")))))
mat_colors <- list(
trt= c("#F1B72B","#8B006D","#DF707E","#3386DD","#707031","#41B333"),
indv=c("#1B9E77", "#D95F02" ,"#7570B3", "#E6AB02"),
time=c("pink", "chocolate4"),
class=c("yellow1","darkorange1"),
TOP2i =c("darkgreen","lightgreen"))
names(mat_colors$trt) <- unique(counts_corr_mat$trt)
names(mat_colors$indv) <- unique(counts_corr_mat$indv)
names(mat_colors$time) <- unique(counts_corr_mat$time)
names(mat_colors$class) <- unique(counts_corr_mat$class)
names(mat_colors$TOP2i) <- unique(counts_corr_mat$TOP2i)
htanno_full <- ComplexHeatmap::HeatmapAnnotation(df = counts_corr_mat, col = mat_colors)
Heatmap(cor_raw_counts, top_annotation = htanno_full)
Version | Author | Date |
---|---|---|
193c36d | reneeisnowhere | 2025-05-06 |
Using a log2cpm row_means>0, I filtered out the peaks with very low counts. I discovered there were three Y-chromosome reads and also removed them.
lcpm <- cpm(raw_counts, log= TRUE)
### for determining the basic cutoffs
filt_raw_counts <- raw_counts[rowMeans(lcpm)> 0,]
dim(filt_raw_counts)
[1] 155560 48
tail(rownames(filt_raw_counts),n=10)
[1] "chrX.155755841.155756459" "chrX.155767114.155768688"
[3] "chrX.155819666.155820696" "chrX.155880909.155881872"
[5] "chrX.155888167.155888502" "chrX.155948972.155949499"
[7] "chrX.155997370.155997776" "chrY.11293234.11293637"
[9] "chrY.11295399.11295868" "chrY.56836676.56837364"
Looking at the data set, I found there were three Y chromosome peaks and removed them.
filt_raw_counts_noY <- filt_raw_counts[!grepl("chrY",rownames(filt_raw_counts)),]
dim(filt_raw_counts_noY)
[1] 155557 48
tail(filt_raw_counts_noY)
D_DNR_24h D_DNR_3h D_DOX_24h D_DOX_3h D_EPI_24h
chrX.155767114.155768688 105 57 139 58 110
chrX.155819666.155820696 257 78 250 64 273
chrX.155880909.155881872 277 248 287 254 324
chrX.155888167.155888502 42 19 83 20 72
chrX.155948972.155949499 39 23 68 26 47
chrX.155997370.155997776 23 4 19 9 37
D_EPI_3h D_MTX_24h D_MTX_3h D_TRZ_24h D_TRZ_3h
chrX.155767114.155768688 55 102 34 81 68
chrX.155819666.155820696 49 219 94 72 58
chrX.155880909.155881872 199 280 176 270 202
chrX.155888167.155888502 13 52 27 12 32
chrX.155948972.155949499 18 52 10 21 14
chrX.155997370.155997776 12 22 9 27 9
D_VEH_24h D_VEH_3h A_DNR_24h A_DNR_3h A_DOX_24h
chrX.155767114.155768688 114 42 49 98 51
chrX.155819666.155820696 106 36 123 95 129
chrX.155880909.155881872 406 104 241 235 204
chrX.155888167.155888502 22 10 11 19 12
chrX.155948972.155949499 29 12 31 30 25
chrX.155997370.155997776 42 16 8 10 11
A_DOX_3h A_EPI_24h A_EPI_3h A_MTX_24h A_MTX_3h
chrX.155767114.155768688 79 48 123 100 127
chrX.155819666.155820696 63 55 70 152 160
chrX.155880909.155881872 184 183 226 228 341
chrX.155888167.155888502 10 7 19 42 28
chrX.155948972.155949499 19 8 7 7 13
chrX.155997370.155997776 2 12 8 8 11
A_TRZ_24h A_TRZ_3h A_VEH_24h A_VEH_3h B_DNR_24h
chrX.155767114.155768688 64 81 76 72 39
chrX.155819666.155820696 64 124 54 80 119
chrX.155880909.155881872 262 194 213 188 117
chrX.155888167.155888502 8 12 13 10 26
chrX.155948972.155949499 15 24 13 17 18
chrX.155997370.155997776 12 8 6 14 23
B_DNR_3h B_DOX_24h B_DOX_3h B_EPI_24h B_EPI_3h
chrX.155767114.155768688 39 43 44 37 41
chrX.155819666.155820696 68 127 22 117 71
chrX.155880909.155881872 197 131 119 126 118
chrX.155888167.155888502 17 31 8 27 11
chrX.155948972.155949499 14 25 3 25 15
chrX.155997370.155997776 18 40 10 29 11
B_MTX_24h B_MTX_3h B_TRZ_24h B_TRZ_3h B_VEH_24h
chrX.155767114.155768688 50 46 42 43 20
chrX.155819666.155820696 76 82 60 41 13
chrX.155880909.155881872 112 194 149 99 84
chrX.155888167.155888502 20 25 8 6 2
chrX.155948972.155949499 4 8 8 10 6
chrX.155997370.155997776 29 19 20 15 10
B_VEH_3h C_DNR_24h C_DNR_3h C_DOX_24h C_DOX_3h
chrX.155767114.155768688 38 76 37 65 51
chrX.155819666.155820696 44 180 42 89 50
chrX.155880909.155881872 84 189 113 119 95
chrX.155888167.155888502 17 58 10 20 13
chrX.155948972.155949499 5 33 8 16 9
chrX.155997370.155997776 4 10 14 10 10
C_EPI_24h C_EPI_3h C_MTX_24h C_MTX_3h C_TRZ_24h
chrX.155767114.155768688 44 35 39 58 32
chrX.155819666.155820696 98 57 69 76 43
chrX.155880909.155881872 137 58 76 125 100
chrX.155888167.155888502 19 9 24 11 9
chrX.155948972.155949499 13 7 7 14 11
chrX.155997370.155997776 21 6 7 12 4
C_TRZ_3h C_VEH_24h C_VEH_3h
chrX.155767114.155768688 29 59 68
chrX.155819666.155820696 37 70 50
chrX.155880909.155881872 74 125 106
chrX.155888167.155888502 9 16 6
chrX.155948972.155949499 10 5 8
chrX.155997370.155997776 4 14 9
After verifying removal, I wanted to see the effect on boxplot and histogram of removed regions.
hist(filt_raw_counts_noY, main= "Filtered raw counts across samples",
xlab = "Raw counts",
col=3)
Version | Author | Date |
---|---|---|
193c36d | reneeisnowhere | 2025-05-06 |
hist(cpm(filt_raw_counts_noY, log=TRUE),
main = expression("Histogram of Log"[2]*" cpm filtered counts"),
xlab = expression("Log"[2]*" counts-per-million"),
col=3)
Version | Author | Date |
---|---|---|
193c36d | reneeisnowhere | 2025-05-06 |
boxplot(cpm(filt_raw_counts_noY, log=TRUE),
main=expression("Boxplot of Log"[2]*" counts-per-million filtered counts"),
col=3,
names=colnames(raw_counts),
las=2, cex.axis=.7)
Version | Author | Date |
---|---|---|
193c36d | reneeisnowhere | 2025-05-06 |
Now to look at the principle components of the data.
##get log2cpm of raw filtered counts
filt4_matrix_lcpm <- cpm(filt_raw_counts_noY , log=TRUE)
## store PRcomp
PCA4_info_filter <- (prcomp(t(filt4_matrix_lcpm), scale. = TRUE))
###make annotation dataframe
annotation_mat <- data.frame(timeset=colnames(filt_raw_counts_noY)) %>%
mutate(sample = timeset) %>%
separate(timeset, into = c("indv","trt","time"), sep= "_") %>%
mutate(time = factor(time, levels = c("3h", "24h"), labels= c("3 hours","24 hours"))) %>%
mutate(trt = factor(trt, levels = c("DOX","EPI", "DNR", "MTX", "TRZ", "VEH")))
### join together for plotting
pca_final_four_anno <- data.frame(annotation_mat, PCA4_info_filter$x)
plotting_var_names <- prop_var_percent(PCA4_info_filter)
pca_final_four_anno %>%
ggplot(.,aes(x = PC1, y = PC2, col=trt, shape=time, group=indv))+
geom_point(size= 5)+
scale_color_manual(values=drug_pal)+
ggrepel::geom_text_repel(aes(label = indv))+
ggtitle(expression("PCA of log"[2]*"(cpm) filtered peak set"))+
theme_bw()+
guides(col="none", size =4)+
labs(y = paste0("PC 2 (",round(plotting_var_names[2],2),"%)")
, x =paste0("PC 1 (",round(plotting_var_names[1],2),"%)"))+
theme(plot.title=element_text(size= 14,hjust = 0.5),
axis.title = element_text(size = 12, color = "black"))
pca_final_four_anno %>%
ggplot(.,aes(x = PC3, y = PC4, col=trt, shape=time, group=indv))+
geom_point(size= 5)+
scale_color_manual(values=drug_pal)+
ggrepel::geom_text_repel(aes(label = indv))+
ggtitle(expression("PCA of log"[2]*"(cpm) filtered peak set"))+
theme_bw()+
guides(col="none", size =4)+
labs(y = paste0("PC 4 (",round(plotting_var_names[4],2),"%)")
, x =paste0("PC 3 (",round(plotting_var_names[3],2),"%)"))+
theme(plot.title=element_text(size= 14,hjust = 0.5),
axis.title = element_text(size = 12, color = "black"))
#### functions needed for the plotting
get_regr_pval <- function(mod) {
# Returns the p-value for the Fstatistic of a linear model
# mod: class lm
stopifnot(class(mod) == "lm")
fstat <- summary(mod)$fstatistic
pval <- 1 - pf(fstat[1], fstat[2], fstat[3])
return(pval)
}
plot_versus_pc <- function(df, pc_num, fac) {
# df: data.frame
# pc_num: numeric, specific PC for plotting
# fac: column name of df for plotting against PC
pc_char <- paste0("PC", pc_num)
# Calculate F-statistic p-value for linear model
pval <- get_regr_pval(lm(df[, pc_char] ~ df[, fac]))
if (is.numeric(df[, f])) {
ggplot(df, aes_string(x = f, y = pc_char)) + geom_point() +
geom_smooth(method = "lm") + labs(title = sprintf("p-val: %.2f", pval))
} else {
ggplot(df, aes_string(x = f, y = pc_char)) + geom_boxplot() +
labs(title = sprintf("p-val: %.2f", pval))
}
}
pca_plot <-
function(df, col_var = NULL, shape_var = NULL, title = "") {
ggplot(df) + geom_point(aes(
x = PC1,
y = PC2,
color = col_var,
shape = shape_var
),
size = 5) +
labs(title = title, x = "PC 1", y = "PC 2") +
scale_color_manual(values = c(
"#8B006D",
"#DF707E",
"#F1B72B",
"#3386DD",
"#707031",
"#41B333"
))
}
facs <- c("indv", "trt", "time")
names(facs) <- c("Individual", "Treatment", "Time")
drug1 <- c("DOX","EPI", "DNR", "MTX", "TRZ", "VEH")##for changing shapes and colors
time <- rep(c("24h", "3h"),24) %>% factor(., levels = c("3h","24h"))
##gglistmaking
for (f in facs) {
# PC1 v PC2
pca_plot(pca_final_four_anno, col_var = f, shape_var = time,
title = names(facs)[which(facs == f)])
print(last_plot())
# Plot f versus PC1 and PC2
f_v_pc1 <- arrangeGrob(plot_versus_pc(pca_final_four_anno, 1, f))
f_v_pc2 <- arrangeGrob(plot_versus_pc(pca_final_four_anno, 2, f))
grid.arrange(f_v_pc1, f_v_pc2, ncol = 2, top = names(facs)[which(facs == f)])
# summary(plot_versus_pc(PCA_info_anno_all, 1, f))
# summary(plot_versus_pc(PCA_info_anno_all, 2, f))
}
Version | Author | Date |
---|---|---|
6ad2095 | reneeisnowhere | 2025-05-21 |
Version | Author | Date |
---|---|---|
6ad2095 | reneeisnowhere | 2025-05-21 |
Version | Author | Date |
---|---|---|
6ad2095 | reneeisnowhere | 2025-05-21 |
Version | Author | Date |
---|---|---|
6ad2095 | reneeisnowhere | 2025-05-21 |
Version | Author | Date |
---|---|---|
6ad2095 | reneeisnowhere | 2025-05-21 |
Version | Author | Date |
---|---|---|
6ad2095 | reneeisnowhere | 2025-05-21 |
To see how this filtered peak set looks compared to other heart data ATAC data, I downloaded ENCFF966JZT from Encode that contains ATAC peaks from the heart: left ventricle of 41 year old female.
Snyder_41peaks <- read.delim("data/other_papers/ENCFF966JZT_bed_Snyder_41peaks.bed",header=TRUE) %>%
GRanges()
peaks_list <- row.names(filt4_matrix_lcpm) %>%
as.data.frame() %>%
separate_wider_delim(., cols =.,
names=c("chr","start","end"),
delim = ".",
cols_remove = FALSE) %>%
dplyr::rename("peakid"=4) %>% GRanges()
heart_overlap <- join_overlap_intersect(Snyder_41peaks,peaks_list)
ol_count <-heart_overlap %>%
as.data.frame %>%
distinct(peakid) %>% tally
s_ol_count <- heart_overlap %>%
as.data.frame %>%
distinct(name) %>% tally
my_peaks <- peaks_list
snyder_peaks <- Snyder_41peaks %>% as.data.frame()
heart_overlap_snyder <- heart_overlap %>% as.data.frame() %>% distinct(name)
heart_overlap_me <- heart_overlap %>% as.data.frame() %>% distinct(peakid)
fit <- euler(c("This study" = 155557 - 66927,
"Snyder study" = 218982 - 137976, # Unique to B
"This study&Snyder study" = min(66927)))
plot(fit, fills = list(fill = c("skyblue", "lightcoral"), alpha = 0.6),
labels = FALSE, edges = TRUE, quantities = TRUE,
main = "Euler diagram between this study and Snyder's study")
Version | Author | Date |
---|---|---|
5e6e462 | reneeisnowhere | 2025-05-07 |
matrixa <- matrix(c(66927,88630,71049,137976), nrow=2)
fisher.test(matrixa)
Fisher's Exact Test for Count Data
data: matrixa
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
1.446729 1.486445
sample estimates:
odds ratio
1.466404
cor_cpm_counts_veh_pearson <-
filt_raw_counts_noY %>%
as.data.frame() %>%
dplyr::select(A_VEH_3h,A_VEH_24h,
B_VEH_3h,B_VEH_24h,
C_VEH_3h,C_VEH_24h,
D_VEH_3h,D_VEH_24h) %>%
cpm(., log = TRUE) %>%
cor(.,method = "pearson")
cor_cpm_counts_veh_spearman <-
filt_raw_counts_noY %>%
as.data.frame() %>%
dplyr::select(A_VEH_3h,A_VEH_24h,
B_VEH_3h,B_VEH_24h,
C_VEH_3h,C_VEH_24h,
D_VEH_3h,D_VEH_24h) %>%
cpm(., log = TRUE) %>%
cor(.,method = "spearman")
anno_raw_counts_veh <- data.frame(timeset = colnames(cor_cpm_counts_veh_pearson))
counts_corr_mat_veh <-anno_raw_counts_veh %>%
separate(timeset, into = c("indv","trt","time"), sep= "_")
mat_colors_veh <- list(
trt= c("#F1B72B","#8B006D","#DF707E","#3386DD","#707031","#41B333"),
indv=c("#1B9E77", "#D95F02" ,"#7570B3", "#E6AB02"),
time=c("pink", "chocolate4"))
names(mat_colors_veh$trt) <- c("DNR","DOX","EPI","MTX","TRZ","VEH")
names(mat_colors_veh$indv) <- unique(counts_corr_mat_veh$indv)
names(mat_colors_veh$time) <- unique(counts_corr_mat_veh$time)
htanno_full_veh <- ComplexHeatmap::HeatmapAnnotation(df = counts_corr_mat_veh, col = mat_colors_veh)
Heatmap(cor_cpm_counts_veh_pearson,
top_annotation = htanno_full_veh,
column_title = "Pearson log2cpm of VEH samples")
Heatmap(cor_cpm_counts_veh_spearman,
top_annotation = htanno_full_veh,
column_title = "Spearman log2cpm of VEH samples")
sessionInfo()
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: America/Chicago
tzcode source: internal
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] ChIPpeakAnno_3.40.0
[2] data.table_1.17.0
[3] smplot2_0.2.5
[4] cowplot_1.1.3
[5] ComplexHeatmap_2.22.0
[6] ggrepel_0.9.6
[7] plyranges_1.26.0
[8] ggsignif_0.6.4
[9] eulerr_7.0.2
[10] devtools_2.4.5
[11] usethis_3.1.0
[12] ggpubr_0.6.0
[13] BiocParallel_1.40.0
[14] scales_1.3.0
[15] VennDiagram_1.7.3
[16] futile.logger_1.4.3
[17] gridExtra_2.3
[18] ggfortify_0.4.17
[19] edgeR_4.4.2
[20] limma_3.62.2
[21] rtracklayer_1.66.0
[22] org.Hs.eg.db_3.20.0
[23] TxDb.Hsapiens.UCSC.hg38.knownGene_3.20.0
[24] GenomicFeatures_1.58.0
[25] AnnotationDbi_1.68.0
[26] Biobase_2.66.0
[27] GenomicRanges_1.58.0
[28] GenomeInfoDb_1.42.3
[29] IRanges_2.40.1
[30] S4Vectors_0.44.0
[31] BiocGenerics_0.52.0
[32] ChIPseeker_1.42.1
[33] RColorBrewer_1.1-3
[34] broom_1.0.7
[35] kableExtra_1.4.0
[36] lubridate_1.9.4
[37] forcats_1.0.0
[38] stringr_1.5.1
[39] dplyr_1.1.4
[40] purrr_1.0.4
[41] readr_2.1.5
[42] tidyr_1.3.1
[43] tibble_3.2.1
[44] ggplot2_3.5.1
[45] tidyverse_2.0.0
[46] workflowr_1.7.1
loaded via a namespace (and not attached):
[1] R.methodsS3_1.8.2
[2] progress_1.2.3
[3] urlchecker_1.0.1
[4] nnet_7.3-20
[5] Biostrings_2.74.1
[6] vctrs_0.6.5
[7] ggtangle_0.0.6
[8] digest_0.6.37
[9] png_0.1-8
[10] shape_1.4.6.1
[11] git2r_0.35.0
[12] magick_2.8.5
[13] MASS_7.3-65
[14] reshape2_1.4.4
[15] httpuv_1.6.15
[16] foreach_1.5.2
[17] qvalue_2.38.0
[18] withr_3.0.2
[19] xfun_0.51
[20] ggfun_0.1.8
[21] ellipsis_0.3.2
[22] survival_3.8-3
[23] memoise_2.0.1
[24] profvis_0.4.0
[25] systemfonts_1.2.1
[26] tidytree_0.4.6
[27] zoo_1.8-13
[28] GlobalOptions_0.1.2
[29] gtools_3.9.5
[30] R.oo_1.27.0
[31] Formula_1.2-5
[32] prettyunits_1.2.0
[33] KEGGREST_1.46.0
[34] promises_1.3.2
[35] httr_1.4.7
[36] rstatix_0.7.2
[37] restfulr_0.0.15
[38] ps_1.9.0
[39] rstudioapi_0.17.1
[40] UCSC.utils_1.2.0
[41] miniUI_0.1.1.1
[42] generics_0.1.3
[43] DOSE_4.0.0
[44] base64enc_0.1-3
[45] processx_3.8.6
[46] curl_6.2.1
[47] zlibbioc_1.52.0
[48] polyclip_1.10-7
[49] polylabelr_0.3.0
[50] GenomeInfoDbData_1.2.13
[51] SparseArray_1.6.2
[52] RBGL_1.82.0
[53] xtable_1.8-4
[54] doParallel_1.0.17
[55] evaluate_1.0.3
[56] S4Arrays_1.6.0
[57] BiocFileCache_2.14.0
[58] hms_1.1.3
[59] colorspace_2.1-1
[60] filelock_1.0.3
[61] magrittr_2.0.3
[62] later_1.4.1
[63] ggtree_3.14.0
[64] lattice_0.22-6
[65] getPass_0.2-4
[66] XML_3.99-0.18
[67] matrixStats_1.5.0
[68] Hmisc_5.2-2
[69] pillar_1.10.1
[70] nlme_3.1-167
[71] iterators_1.0.14
[72] pwalign_1.2.0
[73] caTools_1.18.3
[74] compiler_4.4.2
[75] stringi_1.8.4
[76] SummarizedExperiment_1.36.0
[77] GenomicAlignments_1.42.0
[78] plyr_1.8.9
[79] crayon_1.5.3
[80] abind_1.4-8
[81] BiocIO_1.16.0
[82] gridGraphics_0.5-1
[83] locfit_1.5-9.12
[84] bit_4.6.0
[85] fastmatch_1.1-6
[86] whisker_0.4.1
[87] codetools_0.2-20
[88] bslib_0.9.0
[89] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[90] GetoptLong_1.0.5
[91] multtest_2.62.0
[92] mime_0.12
[93] splines_4.4.2
[94] circlize_0.4.16
[95] Rcpp_1.0.14
[96] dbplyr_2.5.0
[97] knitr_1.49
[98] blob_1.2.4
[99] clue_0.3-66
[100] AnnotationFilter_1.30.0
[101] fs_1.6.5
[102] checkmate_2.3.2
[103] pkgbuild_1.4.6
[104] ggplotify_0.1.2
[105] Matrix_1.7-3
[106] callr_3.7.6
[107] statmod_1.5.0
[108] tzdb_0.4.0
[109] svglite_2.1.3
[110] pkgconfig_2.0.3
[111] tools_4.4.2
[112] cachem_1.1.0
[113] RSQLite_2.3.9
[114] viridisLite_0.4.2
[115] DBI_1.2.3
[116] fastmap_1.2.0
[117] rmarkdown_2.29
[118] Rsamtools_2.22.0
[119] sass_0.4.9
[120] patchwork_1.3.0
[121] graph_1.84.1
[122] carData_3.0-5
[123] rpart_4.1.24
[124] farver_2.1.2
[125] yaml_2.3.10
[126] MatrixGenerics_1.18.1
[127] foreign_0.8-88
[128] cli_3.6.4
[129] lifecycle_1.0.4
[130] lambda.r_1.2.4
[131] sessioninfo_1.2.3
[132] backports_1.5.0
[133] timechange_0.3.0
[134] gtable_0.3.6
[135] rjson_0.2.23
[136] parallel_4.4.2
[137] ape_5.8-1
[138] jsonlite_1.9.1
[139] bitops_1.0-9
[140] bit64_4.6.0-1
[141] pwr_1.3-0
[142] yulab.utils_0.2.0
[143] futile.options_1.0.1
[144] jquerylib_0.1.4
[145] GOSemSim_2.32.0
[146] R.utils_2.13.0
[147] lazyeval_0.2.2
[148] shiny_1.10.0
[149] htmltools_0.5.8.1
[150] enrichplot_1.26.6
[151] GO.db_3.20.0
[152] rappdirs_0.3.3
[153] formatR_1.14
[154] ensembldb_2.30.0
[155] glue_1.8.0
[156] httr2_1.1.1
[157] XVector_0.46.0
[158] RCurl_1.98-1.16
[159] InteractionSet_1.34.0
[160] rprojroot_2.0.4
[161] treeio_1.30.0
[162] BSgenome_1.74.0
[163] boot_1.3-31
[164] universalmotif_1.24.2
[165] igraph_2.1.4
[166] R6_2.6.1
[167] gplots_3.2.0
[168] labeling_0.4.3
[169] cluster_2.1.8.1
[170] pkgload_1.4.0
[171] regioneR_1.38.0
[172] aplot_0.2.5
[173] DelayedArray_0.32.0
[174] tidyselect_1.2.1
[175] plotrix_3.8-4
[176] ProtGenerics_1.38.0
[177] htmlTable_2.4.3
[178] xml2_1.3.7
[179] car_3.1-3
[180] munsell_0.5.1
[181] KernSmooth_2.23-26
[182] htmlwidgets_1.6.4
[183] fgsea_1.32.2
[184] biomaRt_2.62.1
[185] rlang_1.1.5
[186] remotes_2.5.0
[187] Cairo_1.6-2