Last updated: 2025-10-06
Checks: 6 1
Knit directory: Paul_CX_2025/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The R Markdown file has unstaged changes. To know which version of
the R Markdown file created these results, you’ll want to first commit
it to the Git repo. If you’re still working on the analysis, you can
ignore this warning. When you’re finished, you can run
wflow_publish
to commit the R Markdown file and build the
HTML.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20250129)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 64e2d5a. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .RData
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: 0.1 box.svg
Ignored: Rplot04.svg
Unstaged changes:
Modified: analysis/Revision.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/Revision.Rmd
) and HTML
(docs/Revision.html
) files. If you’ve configured a remote
Git repository (see ?wflow_git_remote
), click on the
hyperlinks in the table below to view the files as they were in that
past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 64e2d5a | sayanpaul01 | 2025-10-06 | Commit |
html | 64e2d5a | sayanpaul01 | 2025-10-06 | Commit |
# 📦 Load Required Libraries
# --- Libraries ---
library(dplyr)
Warning: package 'dplyr' was built under R version 4.3.2
library(tidyr)
Warning: package 'tidyr' was built under R version 4.3.3
library(tibble)
library(ggplot2)
# --- Step 1: Heart Failure Risk Loci (Entrez IDs) ---
entrez_ids <- c(
9709, 8882, 4023, 29959, 5496, 3992, 9415, 5308, 1026, 54437, 79068, 10221,
9031, 1187, 1952, 3705, 84722, 7273, 23293, 155382, 9531, 602, 27258, 84163,
81846, 79933, 56911, 64753, 93210, 1021, 283450, 5998, 57602, 114991, 7073,
3156, 100101267, 22996, 285025, 11080, 11124, 54810, 7531, 27241, 4774, 57794,
463, 91319, 6598, 9640, 2186, 26010, 80816, 571, 88, 51652, 64788, 90523, 2969,
7781, 80777, 10725, 23387, 817, 134728, 8842, 949, 6934, 129787, 10327, 202052,
2318, 5578, 6801, 6311, 10019, 80724, 217, 84909, 388591, 55101, 9839, 27161,
5310, 387119, 4641, 5587, 55188, 222553, 9960, 22852, 10087, 9570, 54497,
200942, 26249, 4137, 375056, 5409, 64116, 8291, 22876, 339855, 4864, 5142,
221692, 55023, 51426, 6146, 84251, 8189, 27332, 57099, 1869, 1112, 23327,
11264, 6001
)
# --- Step 2: Load DEG tables ---
deg_files <- list.files("data/DEGs/", pattern = "Toptable_.*\\.csv", full.names = TRUE)
deg_list <- lapply(deg_files, read.csv)
names(deg_list) <- gsub("Toptable_|\\.csv", "", basename(deg_files))
# --- Step 3: Fisher test function (save raw counts to table) ---
fisher_or <- function(df, sample_name) {
df <- df %>%
mutate(
DEG = adj.P.Val < 0.05,
HF_risk = Entrez_ID %in% entrez_ids
)
a <- sum(df$DEG & df$HF_risk, na.rm = TRUE)
b <- sum(df$DEG & !df$HF_risk, na.rm = TRUE)
c <- sum(!df$DEG & df$HF_risk, na.rm = TRUE)
d <- sum(!df$DEG & !df$HF_risk, na.rm = TRUE)
test <- fisher.test(matrix(c(a, b, c, d), nrow = 2))
tibble(
Sample = sample_name,
DE_Risk = a,
DE_nonRisk = b,
nonDE_Risk = c,
nonDE_nonRisk = d,
OR = unname(test$estimate),
Lower_CI = test$conf.int[1],
Upper_CI = test$conf.int[2],
Pval = test$p.value,
Stars = case_when(
test$p.value < 0.001 ~ "***",
test$p.value < 0.01 ~ "**",
test$p.value < 0.05 ~ "*",
TRUE ~ ""
)
)
}
# --- Step 4: Run Fisher tests for all DEG files ---
results <- bind_rows(mapply(fisher_or, deg_list, names(deg_list), SIMPLIFY = FALSE))
# --- Step 5: Extract metadata (Drug, Conc, Time) ---
results <- results %>%
separate(Sample, into = c("Drug","Conc","Time"), sep = "_") %>%
mutate(
Label = paste(Drug, Conc, sep = "_"),
Time = factor(Time, levels = c("3","24","48"))
)
# --- Step 6: Forest plot (clean, like before) ---
ggplot(results, aes(x = Label, y = OR, ymin = Lower_CI, ymax = Upper_CI)) +
geom_pointrange(aes(color = Pval < 0.05), size = 0.9) +
geom_text(aes(label = Stars), hjust = -0.5, size = 5) +
geom_hline(yintercept = 1, linetype = "dashed", color = "red") +
scale_color_manual(values = c("TRUE" = "blue", "FALSE" = "grey40")) +
coord_flip() +
facet_wrap(~Time, ncol = 1, scales = "free_y") +
ylim(0, 20) +
labs(
title = "Heart Failure Risk Loci Enrichment Across Conditions",
x = "Condition (Drug_Conc)",
y = "Odds Ratio (95% CI)"
) +
theme_classic(base_size = 14) +
theme(
legend.position = "none",
strip.text = element_text(face = "bold", size = 12)
)
Warning: Removed 3 rows containing missing values or values outside the scale range
(`geom_segment()`).
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
# --- Step 7: Export results with raw numbers for supplement ---
write.csv(results, "data/HF_Cardiotoxicity_ForestPlot_Data.csv", row.names = FALSE)
# --- Libraries ---
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
# --- Step 1: Define DOX cardiotoxicity risk loci (Entrez IDs) ---
entrez_ids <- c(
847, 873, 2064, 2878, 2944, 3038, 4846, 51196, 5880, 6687,
7799, 4292, 5916, 3077, 51310, 9154, 64078, 5244, 10057, 10060,
89845, 56853, 4625, 1573, 79890
)
# --- Step 2: Load DEG tables ---
deg_files <- list.files("data/DEGs/", pattern = "Toptable_.*\\.csv", full.names = TRUE)
deg_list <- lapply(deg_files, read.csv)
names(deg_list) <- gsub("Toptable_|\\.csv", "", basename(deg_files))
# --- Step 3: Fisher test function (now saves raw counts too) ---
fisher_or <- function(df, sample_name) {
df <- df %>%
mutate(
DEG = adj.P.Val < 0.05,
DOXrisk = Entrez_ID %in% entrez_ids
)
a <- sum(df$DEG & df$DOXrisk, na.rm = TRUE)
b <- sum(df$DEG & !df$DOXrisk, na.rm = TRUE)
c <- sum(!df$DEG & df$DOXrisk, na.rm = TRUE)
d <- sum(!df$DEG & !df$DOXrisk, na.rm = TRUE)
test <- fisher.test(matrix(c(a, b, c, d), nrow = 2))
tibble(
Sample = sample_name,
DE_Risk = a,
DE_nonRisk = b,
nonDE_Risk = c,
nonDE_nonRisk = d,
OR = unname(test$estimate),
Lower_CI = test$conf.int[1],
Upper_CI = test$conf.int[2],
Pval = test$p.value,
Stars = case_when(
test$p.value < 0.001 ~ "***",
test$p.value < 0.01 ~ "**",
test$p.value < 0.05 ~ "*",
TRUE ~ ""
)
)
}
# --- Step 4: Run Fisher tests for all DEG files ---
results <- bind_rows(mapply(fisher_or, deg_list, names(deg_list), SIMPLIFY = FALSE))
# --- Step 5: Extract metadata (Drug, Conc, Time) ---
results <- results %>%
separate(Sample, into = c("Drug","Conc","Time"), sep = "_") %>%
mutate(
Label = paste(Drug, Conc, sep = "_"),
Time = factor(Time, levels = c("3","24","48"))
)
# --- Step 6: Forest plot (same clean look, 0–20 scale, stars only) ---
ggplot(results, aes(x = Label, y = OR, ymin = Lower_CI, ymax = Upper_CI)) +
geom_pointrange(aes(color = Pval < 0.05), size = 0.9) +
geom_text(aes(label = Stars), hjust = -0.5, size = 5) +
geom_hline(yintercept = 1, linetype = "dashed", color = "red") +
scale_color_manual(values = c("TRUE" = "blue", "FALSE" = "grey40")) +
coord_flip() +
facet_wrap(~Time, ncol = 1, scales = "free_y") +
ylim(0, 20) +
labs(
title = "DOX Cardiotoxicity Loci Enrichment",
x = "Condition (Drug_Conc)",
y = "Odds Ratio (95% CI)"
) +
theme_classic(base_size = 14) +
theme(
legend.position = "none",
strip.text = element_text(face = "bold", size = 12)
)
Warning: Removed 3 rows containing missing values or values outside the scale range
(`geom_segment()`).
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
# --- Step 7: Save results with raw counts for supplement ---
write.csv(results, "data/DOX_Cardiotoxicity_ForestPlot_Data.csv", row.names = FALSE)
# --- Libraries ---
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
# --- Step 1: AC Cardiotoxicity Risk Loci (Entrez IDs) ---
Entrez_IDs <- c(
6272, 8029, 11128, 79899, 54477, 121665, 5095, 22863, 57161, 4692,
8214, 23151, 56606, 108, 22999, 56895, 9603, 3181, 4023, 10499,
92949, 4363, 10057, 5243, 5244, 5880, 1535, 2950, 847, 5447,
3038, 3077, 4846, 3958, 23327, 29899, 23155, 80856, 55020, 78996,
23262, 150383, 9620, 79730, 344595, 5066, 6251, 3482, 9588, 339416,
7292, 55157, 87769, 23409, 720, 3107, 54535, 1590, 80059, 7991,
57110, 8803, 323, 54826, 5916, 23371, 283337, 64078, 80010, 1933,
10818, 51020
)
# --- Step 2: Load DEG tables ---
deg_files <- list.files("data/DEGs/", pattern = "Toptable_.*\\.csv", full.names = TRUE)
deg_list <- lapply(deg_files, read.csv)
names(deg_list) <- gsub("Toptable_|\\.csv", "", basename(deg_files))
# --- Step 3: Fisher test function (with raw counts) ---
fisher_or <- function(df, sample_name) {
df <- df %>%
mutate(
DEG = adj.P.Val < 0.05,
AC_risk = Entrez_ID %in% Entrez_IDs
)
a <- sum(df$DEG & df$AC_risk, na.rm = TRUE)
b <- sum(df$DEG & !df$AC_risk, na.rm = TRUE)
c <- sum(!df$DEG & df$AC_risk, na.rm = TRUE)
d <- sum(!df$DEG & !df$AC_risk, na.rm = TRUE)
test <- fisher.test(matrix(c(a, b, c, d), nrow = 2))
tibble(
Sample = sample_name,
DE_Risk = a,
DE_nonRisk = b,
nonDE_Risk = c,
nonDE_nonRisk = d,
OR = unname(test$estimate),
Lower_CI = test$conf.int[1],
Upper_CI = test$conf.int[2],
Pval = test$p.value,
Stars = case_when(
test$p.value < 0.001 ~ "***",
test$p.value < 0.01 ~ "**",
test$p.value < 0.05 ~ "*",
TRUE ~ ""
)
)
}
# --- Step 4: Run Fisher tests for all DEG files ---
results <- bind_rows(mapply(fisher_or, deg_list, names(deg_list), SIMPLIFY = FALSE))
# --- Step 5: Extract metadata (Drug, Conc, Time) ---
results <- results %>%
separate(Sample, into = c("Drug","Conc","Time"), sep = "_") %>%
mutate(
Label = paste(Drug, Conc, sep = "_"),
Time = factor(Time, levels = c("3","24","48"))
)
# --- Step 6: Forest plot (scale 0–20, faceted by Time) ---
ggplot(results, aes(x = Label, y = OR, ymin = Lower_CI, ymax = Upper_CI)) +
geom_pointrange(aes(color = Pval < 0.05), size = 0.9) +
geom_text(aes(label = Stars), hjust = -0.5, size = 5) +
geom_hline(yintercept = 1, linetype = "dashed", color = "red") +
scale_color_manual(values = c("TRUE" = "blue", "FALSE" = "grey40")) +
coord_flip() +
facet_wrap(~Time, ncol = 1, scales = "free_y") +
ylim(0, 20) +
labs(
title = "AC Cardiotoxicity Loci Enrichment Across Conditions",
x = "Condition (Drug_Conc)",
y = "Odds Ratio (95% CI)"
) +
theme_classic(base_size = 14) +
theme(
legend.position = "none",
strip.text = element_text(face = "bold", size = 12)
)
Warning: Removed 3 rows containing missing values or values outside the scale range
(`geom_segment()`).
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
# --- Step 7: Export results table (with raw numbers + OR) ---
write.csv(results, "data/AC_Cardiotoxicity_ForestPlot_Data.csv", row.names = FALSE)
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
library(org.Hs.eg.db)
Warning: package 'AnnotationDbi' was built under R version 4.3.2
Warning: package 'BiocGenerics' was built under R version 4.3.1
Warning: package 'Biobase' was built under R version 4.3.1
Warning: package 'IRanges' was built under R version 4.3.1
Warning: package 'S4Vectors' was built under R version 4.3.2
library(AnnotationDbi)
# --- Load Heart-Specific Genes ---
heart_genes <- read.csv("data/Human_Heart_Genes.csv", stringsAsFactors = FALSE)
heart_genes$Entrez_ID <- mapIds(
org.Hs.eg.db,
keys = heart_genes$Gene,
column = "ENTREZID",
keytype = "SYMBOL",
multiVals = "first"
)
heart_entrez_ids <- na.omit(heart_genes$Entrez_ID)
# --- Load CorrMotif Groups ---
# 0.1 µM
prob_1_0.1 <- as.character(read.csv("data/prob_1_0.1.csv")$Entrez_ID)
prob_2_0.1 <- as.character(read.csv("data/prob_2_0.1.csv")$Entrez_ID)
prob_3_0.1 <- as.character(read.csv("data/prob_3_0.1.csv")$Entrez_ID)
# 0.5 µM
prob_1_0.5 <- as.character(read.csv("data/prob_1_0.5.csv")$Entrez_ID)
prob_2_0.5 <- as.character(read.csv("data/prob_2_0.5.csv")$Entrez_ID)
prob_3_0.5 <- as.character(read.csv("data/prob_3_0.5.csv")$Entrez_ID)
prob_4_0.5 <- as.character(read.csv("data/prob_4_0.5.csv")$Entrez_ID)
prob_5_0.5 <- as.character(read.csv("data/prob_5_0.5.csv")$Entrez_ID)
# --- Annotate Groups ---
df_0.1 <- data.frame(Entrez_ID = unique(c(prob_1_0.1, prob_2_0.1, prob_3_0.1))) %>%
mutate(
Response_Group = case_when(
Entrez_ID %in% prob_1_0.1 ~ "Non response (0.1)",
Entrez_ID %in% prob_2_0.1 ~ "CX_DOX_1",
Entrez_ID %in% prob_3_0.1 ~ "DOX_sp_1"
),
Category = ifelse(Entrez_ID %in% heart_entrez_ids,
"Heart-specific Genes", "Non-Heart-specific Genes"),
Concentration = "0.1"
)
df_0.5 <- data.frame(Entrez_ID = unique(c(prob_1_0.5, prob_2_0.5, prob_3_0.5, prob_4_0.5, prob_5_0.5))) %>%
mutate(
Response_Group = case_when(
Entrez_ID %in% prob_1_0.5 ~ "Non response (0.5)",
Entrez_ID %in% prob_2_0.5 ~ "DOX_sp_2",
Entrez_ID %in% prob_3_0.5 ~ "DOX_sp_3",
Entrez_ID %in% prob_4_0.5 ~ "CX_DOX_2",
Entrez_ID %in% prob_5_0.5 ~ "CX_DOX_3"
),
Category = ifelse(Entrez_ID %in% heart_entrez_ids,
"Heart-specific Genes", "Non-Heart-specific Genes"),
Concentration = "0.5"
)
df_combined <- bind_rows(df_0.1, df_0.5)
# --- Fisher Odds Ratio Function ---
fisher_or_group <- function(df_group, df_all, group_name, conc) {
# counts in group
a <- sum(df_group$Category == "Heart-specific Genes")
b <- sum(df_group$Category == "Non-Heart-specific Genes")
# counts in rest of groups at same conc
ref <- df_all %>% filter(Response_Group != group_name)
c <- sum(ref$Category == "Heart-specific Genes")
d <- sum(ref$Category == "Non-Heart-specific Genes")
mat <- matrix(c(a, b, c, d), nrow = 2)
if (any(mat == 0)) mat <- mat + 0.5 # continuity correction
test <- fisher.test(mat)
tibble(
Response_Group = group_name,
Concentration = conc,
a = a, b = b, c = c, d = d,
OR = unname(test$estimate),
Lower_CI = test$conf.int[1],
Upper_CI = test$conf.int[2],
Pval = test$p.value,
Stars = case_when(
test$p.value < 0.001 ~ "***",
test$p.value < 0.01 ~ "**",
test$p.value < 0.05 ~ "*",
TRUE ~ ""
)
)
}
# --- Run Fisher tests safely ---
results <- list()
for (conc in unique(df_combined$Concentration)) {
conc_df <- df_combined %>% filter(Concentration == conc)
for (grp in unique(conc_df$Response_Group)) {
grp_df <- conc_df %>% filter(Response_Group == grp)
results[[paste(conc, grp)]] <- fisher_or_group(grp_df, conc_df, grp, conc)
}
}
Warning in fisher.test(mat): 'x' has been rounded to integer: Mean relative
difference: 0.0001415629
results <- bind_rows(results)
# --- Save Results Table ---
write.csv(results, "data/CorrMotif_HeartGenes_OddsRatios.csv", row.names = FALSE)
# --- Forest Plot ---
ggplot(results, aes(x = Response_Group, y = OR, ymin = Lower_CI, ymax = Upper_CI)) +
geom_pointrange(aes(color = Pval < 0.05), size = 0.9) +
geom_text(aes(label = Stars), hjust = -0.5, size = 5) +
geom_hline(yintercept = 1, linetype = "dashed", color = "red") +
scale_color_manual(values = c("TRUE" = "blue", "FALSE" = "grey40")) +
coord_flip() +
facet_wrap(~Concentration, ncol = 1, scales = "free_y") + # facet by concentration instead of time
ylim(0, 20) +
labs(
title = "Heart-Specific Gene Enrichment Across CorrMotif Groups",
x = "Response Group",
y = "Odds Ratio (95% CI)"
) +
theme_classic(base_size = 14) +
theme(
legend.position = "none",
strip.text = element_text(face = "bold", size = 12)
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
library(ggplot2)
# --- Load Data ---
CX_0.5_24 <- read.csv("data/DEGs/Toptable_CX_0.5_24.csv")
DOX_0.5_24 <- read.csv("data/DEGs/Toptable_DOX_0.5_24.csv")
# --- Ensure Entrez_ID is character ---
CX_0.5_24$Entrez_ID <- as.character(CX_0.5_24$Entrez_ID)
DOX_0.5_24$Entrez_ID <- as.character(DOX_0.5_24$Entrez_ID)
# --- Merge on Entrez_ID ---
merged_24h_0.5 <- merge(
CX_0.5_24, DOX_0.5_24,
by = "Entrez_ID", suffixes = c("_CX", "_DOX")
)
# --- Correlation ---
cor_test <- cor.test(
merged_24h_0.5$logFC_CX,
merged_24h_0.5$logFC_DOX,
method = "pearson"
)
r_val <- round(cor_test$estimate, 3)
p_val <- formatC(cor_test$p.value, format = "e", digits = 2) # scientific notation
label_text <- paste0("r = ", r_val, "\n", "p = ", p_val)
# --- Scatter Plot with fixed label placement ---
ggplot(merged_24h_0.5, aes(x = logFC_CX, y = logFC_DOX)) +
geom_point(alpha = 0.5, color = "black") +
geom_smooth(method = "lm", color = "blue", se = FALSE) +
labs(
title = "CX vs DOX logFC (24 hr, 0.5 µM)",
x = "logFC (CX vs VEH)",
y = "logFC (DOX vs VEH)"
) +
theme_minimal(base_size = 14) +
annotate(
"text",
x = -Inf, y = Inf, hjust = -0.1, vjust = 1.2, # top-left corner inside panel
label = label_text,
size = 5, fontface = "bold"
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
library(ggplot2)
# --- Load Data ---
CX_0.5_48 <- read.csv("data/DEGs/Toptable_CX_0.5_48.csv")
DOX_0.5_48 <- read.csv("data/DEGs/Toptable_DOX_0.5_48.csv")
# --- Ensure Entrez_ID is character ---
CX_0.5_48$Entrez_ID <- as.character(CX_0.5_48$Entrez_ID)
DOX_0.5_48$Entrez_ID <- as.character(DOX_0.5_48$Entrez_ID)
# --- Merge on Entrez_ID ---
merged_48h_0.5 <- merge(
CX_0.5_48, DOX_0.5_48,
by = "Entrez_ID", suffixes = c("_CX", "_DOX")
)
# --- Correlation ---
cor_test <- cor.test(
merged_48h_0.5$logFC_CX,
merged_48h_0.5$logFC_DOX,
method = "pearson"
)
r_val <- round(cor_test$estimate, 3)
p_val <- formatC(cor_test$p.value, format = "e", digits = 2) # scientific notation
label_text <- paste0("r = ", r_val, "\n", "p = ", p_val)
# --- Scatter Plot with fixed label placement ---
ggplot(merged_48h_0.5, aes(x = logFC_CX, y = logFC_DOX)) +
geom_point(alpha = 0.5, color = "black") +
geom_smooth(method = "lm", color = "blue", se = FALSE) +
labs(
title = "CX vs DOX logFC (48 hr, 0.5 µM)",
x = "logFC (CX vs VEH)",
y = "logFC (DOX vs VEH)"
) +
theme_minimal(base_size = 14) +
annotate(
"text",
x = -Inf, y = Inf, hjust = -0.1, vjust = 1.2, # top-left corner inside panel
label = label_text,
size = 5, fontface = "bold"
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
library(ggplot2)
# --- Load Data ---
CX_0.5_24 <- read.csv("data/DEGs/Toptable_CX_0.5_24.csv")
DOX_0.5_24 <- read.csv("data/DEGs/Toptable_DOX_0.5_24.csv")
# --- Ensure Entrez_ID is character ---
CX_0.5_24$Entrez_ID <- as.character(CX_0.5_24$Entrez_ID)
DOX_0.5_24$Entrez_ID <- as.character(DOX_0.5_24$Entrez_ID)
# --- Extract DEGs from DOX (adj.P.Val < 0.05) ---
dox_degs <- DOX_0.5_24$Entrez_ID[DOX_0.5_24$adj.P.Val < 0.05]
# --- Merge CX & DOX, filter for DOX DEGs only ---
merged_24h_0.5 <- merge(
CX_0.5_24, DOX_0.5_24,
by = "Entrez_ID", suffixes = c("_CX", "_DOX")
)
merged_24h_0.5 <- merged_24h_0.5[merged_24h_0.5$Entrez_ID %in% dox_degs, ]
# --- Correlation ---
cor_test <- cor.test(
merged_24h_0.5$logFC_CX,
merged_24h_0.5$logFC_DOX,
method = "pearson"
)
r_val <- round(cor_test$estimate, 3)
p_val <- formatC(cor_test$p.value, format = "e", digits = 2)
label_text <- paste0("r = ", r_val, "\n", "p = ", p_val)
# --- Scatter Plot ---
ggplot(merged_24h_0.5, aes(x = logFC_CX, y = logFC_DOX)) +
geom_point(alpha = 0.6, color = "black") +
geom_smooth(method = "lm", color = "blue", se = FALSE) +
labs(
title = "CX vs DOX logFC (24 hr, 0.5 µM, DOX DEGs only)",
x = "logFC (CX vs VEH)",
y = "logFC (DOX vs VEH)"
) +
theme_minimal(base_size = 14) +
annotate(
"text",
x = -Inf, y = Inf,
hjust = -0.1, vjust = 1.2,
label = label_text,
size = 5, fontface = "bold"
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
library(ggplot2)
# --- Load Data ---
CX_0.5_48 <- read.csv("data/DEGs/Toptable_CX_0.5_48.csv")
DOX_0.5_48 <- read.csv("data/DEGs/Toptable_DOX_0.5_48.csv")
# --- Ensure Entrez_ID is character ---
CX_0.5_48$Entrez_ID <- as.character(CX_0.5_48$Entrez_ID)
DOX_0.5_48$Entrez_ID <- as.character(DOX_0.5_48$Entrez_ID)
# --- Extract DEGs from DOX (adj.P.Val < 0.05) ---
dox_degs <- DOX_0.5_48$Entrez_ID[DOX_0.5_48$adj.P.Val < 0.05]
# --- Merge CX & DOX, filter for DOX DEGs only ---
merged_48h_0.5 <- merge(
CX_0.5_48, DOX_0.5_48,
by = "Entrez_ID", suffixes = c("_CX", "_DOX")
)
merged_48h_0.5 <- merged_48h_0.5[merged_48h_0.5$Entrez_ID %in% dox_degs, ]
# --- Correlation ---
cor_test <- cor.test(
merged_48h_0.5$logFC_CX,
merged_48h_0.5$logFC_DOX,
method = "pearson"
)
r_val <- round(cor_test$estimate, 3)
p_val <- formatC(cor_test$p.value, format = "e", digits = 2)
label_text <- paste0("r = ", r_val, "\n", "p = ", p_val)
# --- Scatter Plot ---
ggplot(merged_48h_0.5, aes(x = logFC_CX, y = logFC_DOX)) +
geom_point(alpha = 0.6, color = "black") +
geom_smooth(method = "lm", color = "blue", se = FALSE) +
labs(
title = "CX vs DOX logFC (48 hr, 0.5 µM, DOX DEGs only)",
x = "logFC (CX vs VEH)",
y = "logFC (DOX vs VEH)"
) +
theme_minimal(base_size = 14) +
annotate(
"text",
x = -Inf, y = Inf,
hjust = -0.1, vjust = 1.2,
label = label_text,
size = 5, fontface = "bold"
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
library(ggplot2)
library(dplyr)
# --- DDR gene set ---
entrez_ids <- c(
10111, 1017, 1019, 1020, 1021, 1026, 1027, 10912, 11011, 1111,
11200, 1385, 1643, 1647, 1869, 207, 2177, 25, 27113, 27244,
3014, 317, 355, 4193, 4292, 4361, 4609, 4616, 4683, 472, 50484,
5366, 5371, 54205, 545, 55367, 5591, 581, 5810, 5883, 5884,
5888, 5893, 5925, 595, 5981, 6118, 637, 672, 7157, 7799,
8243, 836, 841, 84126, 842, 8795, 891, 894, 896, 898,
9133, 9134, 983, 9874, 993, 995, 5916
) %>% as.character()
# --- Load Data ---
CX_0.5_24 <- read.csv("data/DEGs/Toptable_CX_0.5_24.csv")
DOX_0.5_24 <- read.csv("data/DEGs/Toptable_DOX_0.5_24.csv")
CX_0.5_24$Entrez_ID <- as.character(CX_0.5_24$Entrez_ID)
DOX_0.5_24$Entrez_ID <- as.character(DOX_0.5_24$Entrez_ID)
# --- Merge and filter for DDR genes ---
merged_ddr_24h_0.5 <- merge(
CX_0.5_24, DOX_0.5_24,
by = "Entrez_ID", suffixes = c("_CX", "_DOX")
)
merged_ddr_24h_0.5 <- merged_ddr_24h_0.5 %>%
filter(Entrez_ID %in% entrez_ids)
# --- Correlation ---
cor_test <- cor.test(
merged_ddr_24h_0.5$logFC_CX,
merged_ddr_24h_0.5$logFC_DOX,
method = "pearson"
)
r_val <- round(cor_test$estimate, 3)
p_val <- formatC(cor_test$p.value, format = "e", digits = 2)
label_text <- paste0("r = ", r_val, "\n", "p = ", p_val)
# --- Scatter Plot ---
ggplot(merged_ddr_24h_0.5, aes(x = logFC_CX, y = logFC_DOX)) +
geom_point(alpha = 0.8, color = "black") +
geom_smooth(method = "lm", color = "blue", se = FALSE) +
labs(
title = "CX vs DOX logFC (24 hr, 0.5 µM, DDR genes only)",
x = "logFC (CX vs VEH)",
y = "logFC (DOX vs VEH)"
) +
theme_minimal(base_size = 14) +
annotate(
"text",
x = -Inf, y = Inf,
hjust = -0.1, vjust = 1.2,
label = label_text,
size = 5, fontface = "bold"
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
library(ggplot2)
library(dplyr)
# --- DDR gene set (Entrez IDs) ---
entrez_ids <- c(
10111, 1017, 1019, 1020, 1021, 1026, 1027, 10912, 11011, 1111,
11200, 1385, 1643, 1647, 1869, 207, 2177, 25, 27113, 27244,
3014, 317, 355, 4193, 4292, 4361, 4609, 4616, 4683, 472, 50484,
5366, 5371, 54205, 545, 55367, 5591, 581, 5810, 5883, 5884,
5888, 5893, 5925, 595, 5981, 6118, 637, 672, 7157, 7799,
8243, 836, 841, 84126, 842, 8795, 891, 894, 896, 898,
9133, 9134, 983, 9874, 993, 995, 5916
) %>% as.character()
# --- Load Data ---
CX_0.5_48 <- read.csv("data/DEGs/Toptable_CX_0.5_48.csv")
DOX_0.5_48 <- read.csv("data/DEGs/Toptable_DOX_0.5_48.csv")
CX_0.5_48$Entrez_ID <- as.character(CX_0.5_48$Entrez_ID)
DOX_0.5_48$Entrez_ID <- as.character(DOX_0.5_48$Entrez_ID)
# --- Merge and filter for DDR genes ---
merged_ddr_48h_0.5 <- merge(
CX_0.5_48, DOX_0.5_48,
by = "Entrez_ID", suffixes = c("_CX", "_DOX")
)
merged_ddr_48h_0.5 <- merged_ddr_48h_0.5 %>%
filter(Entrez_ID %in% entrez_ids)
# --- Correlation ---
cor_test <- cor.test(
merged_ddr_48h_0.5$logFC_CX,
merged_ddr_48h_0.5$logFC_DOX,
method = "pearson"
)
r_val <- round(cor_test$estimate, 3)
p_val <- formatC(cor_test$p.value, format = "e", digits = 2)
label_text <- paste0("r = ", r_val, "\n", "p = ", p_val)
# --- Scatter Plot ---
ggplot(merged_ddr_48h_0.5, aes(x = logFC_CX, y = logFC_DOX)) +
geom_point(alpha = 0.8, color = "black") +
geom_smooth(method = "lm", color = "blue", se = FALSE) +
labs(
title = "CX vs DOX logFC (48 hr, 0.5 µM, DDR genes only)",
x = "logFC (CX vs VEH)",
y = "logFC (DOX vs VEH)"
) +
theme_minimal(base_size = 14) +
annotate(
"text",
x = -Inf, y = Inf,
hjust = -0.1, vjust = 1.2,
label = label_text,
size = 5, fontface = "bold"
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
library(dplyr)
library(ggplot2)
library(org.Hs.eg.db)
library(AnnotationDbi)
# --- Load Heart-Specific Genes ---
heart_genes <- read.csv("data/Human_Heart_Genes.csv", stringsAsFactors = FALSE)
heart_genes$Entrez_ID <- mapIds(
org.Hs.eg.db,
keys = heart_genes$Gene,
column = "ENTREZID",
keytype = "SYMBOL",
multiVals = "first"
)
heart_entrez_ids <- na.omit(heart_genes$Entrez_ID) %>% as.character()
# --- Load DEG Data ---
CX_0.5_24 <- read.csv("data/DEGs/Toptable_CX_0.5_24.csv")
DOX_0.5_24 <- read.csv("data/DEGs/Toptable_DOX_0.5_24.csv")
CX_0.5_24$Entrez_ID <- as.character(CX_0.5_24$Entrez_ID)
DOX_0.5_24$Entrez_ID <- as.character(DOX_0.5_24$Entrez_ID)
# --- Merge and filter for heart-specific genes ---
merged_heart_24h_0.5 <- merge(
CX_0.5_24, DOX_0.5_24,
by = "Entrez_ID", suffixes = c("_CX", "_DOX")
)
merged_heart_24h_0.5 <- merged_heart_24h_0.5 %>%
filter(Entrez_ID %in% heart_entrez_ids)
# --- Correlation ---
cor_test <- cor.test(
merged_heart_24h_0.5$logFC_CX,
merged_heart_24h_0.5$logFC_DOX,
method = "pearson"
)
r_val <- round(cor_test$estimate, 3)
p_val <- formatC(cor_test$p.value, format = "e", digits = 2)
label_text <- paste0("r = ", r_val, "\n", "p = ", p_val)
# --- Scatter Plot ---
ggplot(merged_heart_24h_0.5, aes(x = logFC_CX, y = logFC_DOX)) +
geom_point(alpha = 0.8, color = "black") +
geom_smooth(method = "lm", color = "blue", se = FALSE) +
labs(
title = "CX vs DOX logFC (24 hr, 0.5 µM, Heart-specific genes only)",
x = "logFC (CX vs VEH)",
y = "logFC (DOX vs VEH)"
) +
theme_minimal(base_size = 14) +
annotate(
"text",
x = -Inf, y = Inf,
hjust = -0.1, vjust = 1.2,
label = label_text,
size = 5, fontface = "bold"
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
library(dplyr)
library(ggplot2)
library(org.Hs.eg.db)
library(AnnotationDbi)
# --- Load Heart-Specific Genes ---
heart_genes <- read.csv("data/Human_Heart_Genes.csv", stringsAsFactors = FALSE)
heart_genes$Entrez_ID <- mapIds(
org.Hs.eg.db,
keys = heart_genes$Gene,
column = "ENTREZID",
keytype = "SYMBOL",
multiVals = "first"
)
heart_entrez_ids <- na.omit(heart_genes$Entrez_ID) %>% as.character()
# --- Load DEG Data ---
CX_0.5_48 <- read.csv("data/DEGs/Toptable_CX_0.5_48.csv")
DOX_0.5_48 <- read.csv("data/DEGs/Toptable_DOX_0.5_48.csv")
CX_0.5_48$Entrez_ID <- as.character(CX_0.5_48$Entrez_ID)
DOX_0.5_48$Entrez_ID <- as.character(DOX_0.5_48$Entrez_ID)
# --- Merge and filter for heart-specific genes ---
merged_heart_48h_0.5 <- merge(
CX_0.5_48, DOX_0.5_48,
by = "Entrez_ID", suffixes = c("_CX", "_DOX")
)
merged_heart_48h_0.5 <- merged_heart_48h_0.5 %>%
filter(Entrez_ID %in% heart_entrez_ids)
# --- Correlation ---
cor_test <- cor.test(
merged_heart_48h_0.5$logFC_CX,
merged_heart_48h_0.5$logFC_DOX,
method = "pearson"
)
r_val <- round(cor_test$estimate, 3)
p_val <- formatC(cor_test$p.value, format = "e", digits = 2)
label_text <- paste0("r = ", r_val, "\n", "p = ", p_val)
# --- Scatter Plot ---
ggplot(merged_heart_48h_0.5, aes(x = logFC_CX, y = logFC_DOX)) +
geom_point(alpha = 0.8, color = "black") +
geom_smooth(method = "lm", color = "blue", se = FALSE) +
labs(
title = "CX vs DOX logFC (48 hr, 0.5 µM, Heart-specific genes only)",
x = "logFC (CX vs VEH)",
y = "logFC (DOX vs VEH)"
) +
theme_minimal(base_size = 14) +
annotate(
"text",
x = -Inf, y = Inf,
hjust = -0.1, vjust = 1.2,
label = label_text,
size = 5, fontface = "bold"
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
# ------------------ Load Libraries ------------------
library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.3.2
Warning: package 'readr' was built under R version 4.3.3
Warning: package 'purrr' was built under R version 4.3.3
Warning: package 'stringr' was built under R version 4.3.2
Warning: package 'lubridate' was built under R version 4.3.3
library(org.Hs.eg.db)
# ------------------ Define DDR Gene Set ------------------
ddr_entrez <- c(
10111, 1017, 1019, 1020, 1021, 1026, 1027, 10912, 11011, 1111,
11200, 1385, 1643, 1647, 1869, 207, 2177, 25, 27113, 27244,
3014, 317, 355, 4193, 4292, 4361, 4609, 4616, 4683, 472, 50484,
5366, 5371, 54205, 545, 55367, 5591, 581, 5810, 5883, 5884,
5888, 5893, 5925, 595, 5981, 6118, 637, 672, 7157, 7799,
8243, 836, 841, 84126, 842, 8795, 891, 894, 896, 898,
9133, 9134, 983, 9874, 993, 995, 5916
) %>% as.character()
# ------------------ Load DEG Data ------------------
deg_list <- list(
"CX_0.1_3" = read.csv("data/DEGs/Toptable_CX_0.1_3.csv"),
"CX_0.1_24" = read.csv("data/DEGs/Toptable_CX_0.1_24.csv"),
"CX_0.1_48" = read.csv("data/DEGs/Toptable_CX_0.1_48.csv"),
"CX_0.5_3" = read.csv("data/DEGs/Toptable_CX_0.5_3.csv"),
"CX_0.5_24" = read.csv("data/DEGs/Toptable_CX_0.5_24.csv"),
"CX_0.5_48" = read.csv("data/DEGs/Toptable_CX_0.5_48.csv"),
"DOX_0.1_3" = read.csv("data/DEGs/Toptable_DOX_0.1_3.csv"),
"DOX_0.1_24" = read.csv("data/DEGs/Toptable_DOX_0.1_24.csv"),
"DOX_0.1_48" = read.csv("data/DEGs/Toptable_DOX_0.1_48.csv"),
"DOX_0.5_3" = read.csv("data/DEGs/Toptable_DOX_0.5_3.csv"),
"DOX_0.5_24" = read.csv("data/DEGs/Toptable_DOX_0.5_24.csv"),
"DOX_0.5_48" = read.csv("data/DEGs/Toptable_DOX_0.5_48.csv")
)
# ------------------ Compute Proportions ------------------
proportion_data <- map_dfr(names(deg_list), function(sample) {
sig_df <- deg_list[[sample]] %>% filter(adj.P.Val < 0.05)
sample_degs <- unique(as.character(sig_df$Entrez_ID))
yes <- sum(sample_degs %in% ddr_entrez)
no <- length(sample_degs) - yes
data.frame(
Sample = sample,
Category = c("DDR_DEG", "Non_DDR_DEG"),
Count = c(yes, no)
)
}) %>%
group_by(Sample) %>%
mutate(Proportion = Count / sum(Count) * 100)
# ------------------ Set Order ------------------
proportion_data$Sample <- factor(proportion_data$Sample, levels = c(
"CX_0.1_3", "CX_0.1_24", "CX_0.1_48",
"CX_0.5_3", "CX_0.5_24", "CX_0.5_48",
"DOX_0.1_3", "DOX_0.1_24", "DOX_0.1_48",
"DOX_0.5_3", "DOX_0.5_24", "DOX_0.5_48"
))
proportion_data$Category <- factor(proportion_data$Category,
levels = c("DDR_DEG", "Non_DDR_DEG"))
# ------------------ Statistical Testing (CX vs DOX) ------------------
matched_pairs <- list(
"0.1_3" = c("CX_0.1_3", "DOX_0.1_3"),
"0.1_24" = c("CX_0.1_24", "DOX_0.1_24"),
"0.1_48" = c("CX_0.1_48", "DOX_0.1_48"),
"0.5_3" = c("CX_0.5_3", "DOX_0.5_3"),
"0.5_24" = c("CX_0.5_24", "DOX_0.5_24"),
"0.5_48" = c("CX_0.5_48", "DOX_0.5_48")
)
test_results <- map_dfr(names(matched_pairs), function(label) {
cx <- matched_pairs[[label]][1]
dox <- matched_pairs[[label]][2]
cx_df <- deg_list[[cx]] %>% filter(adj.P.Val < 0.05)
dox_df <- deg_list[[dox]] %>% filter(adj.P.Val < 0.05)
a1 <- sum(cx_df$Entrez_ID %in% ddr_entrez)
b1 <- nrow(cx_df) - a1
a2 <- sum(dox_df$Entrez_ID %in% ddr_entrez)
b2 <- nrow(dox_df) - a2
mat <- matrix(c(a1, b1, a2, b2), nrow = 2, byrow = TRUE)
test <- if (min(mat) < 5) fisher.test(mat) else chisq.test(mat)
data.frame(
Comparison = label,
CX_sample = cx,
DOX_sample = dox,
a1 = a1, b1 = b1, a2 = a2, b2 = b2,
P_value = test$p.value,
Stars = ifelse(test$p.value < 0.05, "*", "")
)
})
Warning in chisq.test(mat): Chi-squared approximation may be incorrect
Warning in chisq.test(mat): Chi-squared approximation may be incorrect
Warning in chisq.test(mat): Chi-squared approximation may be incorrect
Warning in chisq.test(mat): Chi-squared approximation may be incorrect
write.csv(test_results, "data/DDR_DEG_overlap_stats.csv", row.names = FALSE)
# ------------------ Plot ------------------
ggplot(proportion_data, aes(x = Sample, y = Proportion, fill = Category)) +
geom_bar(stat = "identity") +
geom_text(
data = test_results %>% filter(Stars != "") %>%
mutate(Sample = CX_sample, y_pos = 102),
aes(x = Sample, y = y_pos, label = Stars),
inherit.aes = FALSE, size = 6
) +
scale_fill_manual(values = c("DDR_DEG" = "#e41a1c", "Non_DDR_DEG" = "#377eb8")) +
scale_y_continuous(labels = scales::percent_format(scale = 1), limits = c(0, 105)) +
labs(
title = "Proportion of DDR genes in CX and DOX DEG Set",
x = "Sample",
y = "Percentage of DEGs",
fill = "Category"
) +
theme_minimal(base_size = 14) +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.title = element_blank(),
panel.border = element_rect(color = "black", fill = NA, linewidth = 1)
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
# --- Libraries ---
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
# --- Step 1: DDR Gene Set (Entrez IDs) ---
ddr_entrez <- c(
10111, 1017, 1019, 1020, 1021, 1026, 1027, 10912, 11011, 1111,
11200, 1385, 1643, 1647, 1869, 207, 2177, 25, 27113, 27244,
3014, 317, 355, 4193, 4292, 4361, 4609, 4616, 4683, 472, 50484,
5366, 5371, 54205, 545, 55367, 5591, 581, 5810, 5883, 5884,
5888, 5893, 5925, 595, 5981, 6118, 637, 672, 7157, 7799,
8243, 836, 841, 84126, 842, 8795, 891, 894, 896, 898,
9133, 9134, 983, 9874, 993, 995, 5916
) %>% as.character()
# --- Step 2: Load DEG tables ---
deg_files <- list.files("data/DEGs/", pattern = "Toptable_.*\\.csv", full.names = TRUE)
deg_list <- lapply(deg_files, read.csv)
names(deg_list) <- gsub("Toptable_|\\.csv", "", basename(deg_files))
# --- Step 3: Fisher test function (save raw counts too) ---
fisher_or <- function(df, sample_name) {
df <- df %>%
mutate(
DEG = adj.P.Val < 0.05,
DDR_gene = Entrez_ID %in% ddr_entrez
)
a <- sum(df$DEG & df$DDR_gene, na.rm = TRUE) # DDR genes among DEGs
b <- sum(df$DEG & !df$DDR_gene, na.rm = TRUE) # Non-DDR among DEGs
c <- sum(!df$DEG & df$DDR_gene, na.rm = TRUE) # DDR genes not DEGs
d <- sum(!df$DEG & !df$DDR_gene, na.rm = TRUE) # Non-DDR not DEGs
test <- fisher.test(matrix(c(a, b, c, d), nrow = 2))
tibble(
Sample = sample_name,
DE_DDR = a,
DE_nonDDR = b,
nonDE_DDR = c,
nonDE_nonDDR = d,
OR = unname(test$estimate),
Lower_CI = test$conf.int[1],
Upper_CI = test$conf.int[2],
Pval = test$p.value,
Stars = case_when(
test$p.value < 0.001 ~ "***",
test$p.value < 0.01 ~ "**",
test$p.value < 0.05 ~ "*",
TRUE ~ ""
)
)
}
# --- Step 4: Run Fisher tests for all DEG files ---
results <- bind_rows(mapply(fisher_or, deg_list, names(deg_list), SIMPLIFY = FALSE))
# --- Step 5: Extract metadata (Drug, Conc, Time) ---
results <- results %>%
separate(Sample, into = c("Drug","Conc","Time"), sep = "_") %>%
mutate(
Label = paste(Drug, Conc, sep = "_"),
Time = factor(Time, levels = c("3","24","48")) # enforce order
)
# --- Step 6: Forest Plot ---
ggplot(results, aes(x = Label, y = OR, ymin = Lower_CI, ymax = Upper_CI)) +
geom_pointrange(color = "black", size = 0.9) +
geom_text(aes(label = Stars), hjust = -0.5, size = 5) +
geom_hline(yintercept = 1, linetype = "dashed", color = "red") +
coord_flip() +
facet_wrap(~Time, ncol = 1, scales = "free_y") +
ylim(0, 20) + # fixed OR scale
labs(
title = "DDR Gene Enrichment Across Conditions",
x = "Condition (Drug_Conc)",
y = "Odds Ratio (95% CI)"
) +
theme_classic(base_size = 14) +
theme(
legend.position = "none",
strip.text = element_text(face = "bold", size = 12)
)
Warning: Removed 3 rows containing missing values or values outside the scale range
(`geom_segment()`).
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_segment()`).
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
# --- Step 7: Export results (with raw numbers) ---
write.csv(results, "data/DDR_ForestPlot_Data.csv", row.names = FALSE)
# ------------------ Load Libraries ------------------
library(tidyverse)
library(org.Hs.eg.db)
library(AnnotationDbi)
# ------------------ Load Heart-Specific Genes ------------------
heart_genes <- read.csv("data/Human_Heart_Genes.csv", stringsAsFactors = FALSE)
heart_genes$Entrez_ID <- mapIds(
org.Hs.eg.db,
keys = heart_genes$Gene,
column = "ENTREZID",
keytype = "SYMBOL",
multiVals = "first"
)
heart_entrez_ids <- na.omit(heart_genes$Entrez_ID) %>% as.character()
# ------------------ Load DEG Data ------------------
deg_list <- list(
"CX_0.1_3" = read.csv("data/DEGs/Toptable_CX_0.1_3.csv"),
"CX_0.1_24" = read.csv("data/DEGs/Toptable_CX_0.1_24.csv"),
"CX_0.1_48" = read.csv("data/DEGs/Toptable_CX_0.1_48.csv"),
"CX_0.5_3" = read.csv("data/DEGs/Toptable_CX_0.5_3.csv"),
"CX_0.5_24" = read.csv("data/DEGs/Toptable_CX_0.5_24.csv"),
"CX_0.5_48" = read.csv("data/DEGs/Toptable_CX_0.5_48.csv"),
"DOX_0.1_3" = read.csv("data/DEGs/Toptable_DOX_0.1_3.csv"),
"DOX_0.1_24" = read.csv("data/DEGs/Toptable_DOX_0.1_24.csv"),
"DOX_0.1_48" = read.csv("data/DEGs/Toptable_DOX_0.1_48.csv"),
"DOX_0.5_3" = read.csv("data/DEGs/Toptable_DOX_0.5_3.csv"),
"DOX_0.5_24" = read.csv("data/DEGs/Toptable_DOX_0.5_24.csv"),
"DOX_0.5_48" = read.csv("data/DEGs/Toptable_DOX_0.5_48.csv")
)
# ------------------ Compute Proportions ------------------
proportion_data <- map_dfr(names(deg_list), function(sample) {
sig_df <- deg_list[[sample]] %>% filter(adj.P.Val < 0.05)
sample_degs <- unique(as.character(sig_df$Entrez_ID))
yes <- sum(sample_degs %in% heart_entrez_ids)
no <- length(sample_degs) - yes
data.frame(
Sample = sample,
Category = c("Heart_DEG", "Non_Heart_DEG"),
Count = c(yes, no)
)
}) %>%
group_by(Sample) %>%
mutate(Proportion = Count / sum(Count) * 100)
# ------------------ Set Order ------------------
proportion_data$Sample <- factor(proportion_data$Sample, levels = c(
"CX_0.1_3", "CX_0.1_24", "CX_0.1_48",
"CX_0.5_3", "CX_0.5_24", "CX_0.5_48",
"DOX_0.1_3", "DOX_0.1_24", "DOX_0.1_48",
"DOX_0.5_3", "DOX_0.5_24", "DOX_0.5_48"
))
proportion_data$Category <- factor(proportion_data$Category,
levels = c("Heart_DEG", "Non_Heart_DEG"))
# ------------------ Statistical Testing (CX vs DOX) ------------------
matched_pairs <- list(
"0.1_3" = c("CX_0.1_3", "DOX_0.1_3"),
"0.1_24" = c("CX_0.1_24", "DOX_0.1_24"),
"0.1_48" = c("CX_0.1_48", "DOX_0.1_48"),
"0.5_3" = c("CX_0.5_3", "DOX_0.5_3"),
"0.5_24" = c("CX_0.5_24", "DOX_0.5_24"),
"0.5_48" = c("CX_0.5_48", "DOX_0.5_48")
)
test_results <- map_dfr(names(matched_pairs), function(label) {
cx <- matched_pairs[[label]][1]
dox <- matched_pairs[[label]][2]
cx_df <- deg_list[[cx]] %>% filter(adj.P.Val < 0.05)
dox_df <- deg_list[[dox]] %>% filter(adj.P.Val < 0.05)
a1 <- sum(cx_df$Entrez_ID %in% heart_entrez_ids)
b1 <- nrow(cx_df) - a1
a2 <- sum(dox_df$Entrez_ID %in% heart_entrez_ids)
b2 <- nrow(dox_df) - a2
mat <- matrix(c(a1, b1, a2, b2), nrow = 2, byrow = TRUE)
test <- if (min(mat) < 5) fisher.test(mat) else chisq.test(mat)
data.frame(
Comparison = label,
CX_sample = cx,
DOX_sample = dox,
P_value = test$p.value,
Stars = ifelse(test$p.value < 0.05, "*", "")
)
})
write.csv(test_results, "data/Heart_DEG_overlap_stats.csv", row.names = FALSE)
# ------------------ Plot ------------------
ggplot(proportion_data, aes(x = Sample, y = Proportion, fill = Category)) +
geom_bar(stat = "identity") +
geom_text(
data = test_results %>% filter(Stars != "") %>%
mutate(Sample = CX_sample, y_pos = 102),
aes(x = Sample, y = y_pos, label = Stars),
inherit.aes = FALSE, size = 6
) +
scale_fill_manual(values = c("Heart_DEG" = "#e41a1c", "Non_Heart_DEG" = "#377eb8")) +
scale_y_continuous(labels = scales::percent_format(scale = 1), limits = c(0, 105)) +
labs(
title = "Proportion of Heart-Specific DEGs in CX and DOX treated samples",
x = "Sample",
y = "Percentage of DEGs",
fill = "Category"
) +
theme_minimal(base_size = 14) +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.title = element_blank(),
panel.border = element_rect(color = "black", fill = NA, linewidth = 1)
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
# --- Libraries ---
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
library(org.Hs.eg.db)
library(AnnotationDbi)
# --- Step 1: Load Heart-Specific Genes ---
heart_genes <- read.csv("data/Human_Heart_Genes.csv", stringsAsFactors = FALSE)
heart_genes$Entrez_ID <- mapIds(
org.Hs.eg.db,
keys = heart_genes$Gene,
column = "ENTREZID",
keytype = "SYMBOL",
multiVals = "first"
)
heart_entrez_ids <- na.omit(heart_genes$Entrez_ID) %>% as.character()
# --- Step 2: Load DEG tables ---
deg_files <- list.files("data/DEGs/", pattern = "Toptable_.*\\.csv", full.names = TRUE)
deg_list <- lapply(deg_files, read.csv)
names(deg_list) <- gsub("Toptable_|\\.csv", "", basename(deg_files))
# --- Step 3: Fisher test function ---
fisher_or <- function(df, sample_name) {
df <- df %>%
mutate(
DEG = adj.P.Val < 0.05,
HeartGene = Entrez_ID %in% heart_entrez_ids
)
a <- sum(df$DEG & df$HeartGene, na.rm = TRUE)
b <- sum(df$DEG & !df$HeartGene, na.rm = TRUE)
c <- sum(!df$DEG & df$HeartGene, na.rm = TRUE)
d <- sum(!df$DEG & !df$HeartGene, na.rm = TRUE)
test <- fisher.test(matrix(c(a, b, c, d), nrow = 2))
tibble(
Sample = sample_name,
DE_Heart = a,
DE_nonHeart = b,
nonDE_Heart = c,
nonDE_nonHeart = d,
OR = unname(test$estimate),
Lower_CI = test$conf.int[1],
Upper_CI = test$conf.int[2],
Pval = test$p.value,
Stars = case_when(
test$p.value < 0.001 ~ "***",
test$p.value < 0.01 ~ "**",
test$p.value < 0.05 ~ "*",
TRUE ~ ""
)
)
}
# --- Step 4: Run Fisher tests ---
results <- bind_rows(mapply(fisher_or, deg_list, names(deg_list), SIMPLIFY = FALSE))
# --- Step 5: Extract metadata ---
results <- results %>%
separate(Sample, into = c("Drug","Conc","Time"), sep = "_") %>%
mutate(
Label = paste(Drug, Conc, sep = "_"),
Time = factor(Time, levels = c("3","24","48"))
)
# --- Step 6: Forest plot ---
ggplot(results, aes(x = Label, y = OR, ymin = Lower_CI, ymax = Upper_CI)) +
geom_pointrange(aes(color = Pval < 0.05), size = 0.9) +
geom_text(aes(label = Stars), hjust = -0.5, size = 5) +
geom_hline(yintercept = 1, linetype = "dashed", color = "red") +
scale_color_manual(values = c("TRUE" = "blue", "FALSE" = "grey40")) +
coord_flip() +
facet_wrap(~Time, ncol = 1, scales = "free_y") +
ylim(0, 20) +
labs(
title = "Heart-Specific Gene Enrichment Across Conditions",
x = "Condition (Drug_Conc)",
y = "Odds Ratio (95% CI)"
) +
theme_classic(base_size = 14) +
theme(
legend.position = "none",
strip.text = element_text(face = "bold", size = 12)
)
Warning: Removed 3 rows containing missing values or values outside the scale range
(`geom_segment()`).
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
# --- Step 7: Export results with raw counts ---
write.csv(results, "data/HeartSpecific_ForestPlot_Data.csv", row.names = FALSE)
# ----------------- Load Required Libraries -----------------
library(dplyr)
library(ggplot2)
# ----------------- Load DDR Genes -----------------
ddr_entrez <- c(
10111, 1017, 1019, 1020, 1021, 1026, 1027, 10912, 11011, 1111,
11200, 1385, 1643, 1647, 1869, 207, 2177, 25, 27113, 27244,
3014, 317, 355, 4193, 4292, 4361, 4609, 4616, 4683, 472, 50484,
5366, 5371, 54205, 545, 55367, 5591, 581, 5810, 5883, 5884,
5888, 5893, 5925, 595, 5981, 6118, 637, 672, 7157, 7799,
8243, 836, 841, 84126, 842, 8795, 891, 894, 896, 898,
9133, 9134, 983, 9874, 993, 995, 5916
) %>% as.character()
# ----------------- Load CorrMotif Groups -----------------
# 0.1 µM
prob_1_0.1 <- as.character(read.csv("data/prob_1_0.1.csv")$Entrez_ID)
prob_2_0.1 <- as.character(read.csv("data/prob_2_0.1.csv")$Entrez_ID)
prob_3_0.1 <- as.character(read.csv("data/prob_3_0.1.csv")$Entrez_ID)
# 0.5 µM
prob_1_0.5 <- as.character(read.csv("data/prob_1_0.5.csv")$Entrez_ID)
prob_2_0.5 <- as.character(read.csv("data/prob_2_0.5.csv")$Entrez_ID)
prob_3_0.5 <- as.character(read.csv("data/prob_3_0.5.csv")$Entrez_ID)
prob_4_0.5 <- as.character(read.csv("data/prob_4_0.5.csv")$Entrez_ID)
prob_5_0.5 <- as.character(read.csv("data/prob_5_0.5.csv")$Entrez_ID)
# ----------------- Annotate CorrMotif Groups -----------------
df_0.1 <- data.frame(Entrez_ID = unique(c(prob_1_0.1, prob_2_0.1, prob_3_0.1))) %>%
mutate(
Response_Group = case_when(
Entrez_ID %in% prob_1_0.1 ~ "Non response (0.1)",
Entrez_ID %in% prob_2_0.1 ~ "CX_DOX_1",
Entrez_ID %in% prob_3_0.1 ~ "DOX_sp_1"
),
Category = ifelse(Entrez_ID %in% ddr_entrez, "DDR Genes", "Non-DDR Genes"),
Concentration = "0.1"
)
df_0.5 <- data.frame(Entrez_ID = unique(c(prob_1_0.5, prob_2_0.5, prob_3_0.5, prob_4_0.5, prob_5_0.5))) %>%
mutate(
Response_Group = case_when(
Entrez_ID %in% prob_1_0.5 ~ "Non response (0.5)",
Entrez_ID %in% prob_2_0.5 ~ "DOX_sp_2",
Entrez_ID %in% prob_3_0.5 ~ "DOX_sp_3",
Entrez_ID %in% prob_4_0.5 ~ "CX_DOX_2",
Entrez_ID %in% prob_5_0.5 ~ "CX_DOX_3"
),
Category = ifelse(Entrez_ID %in% ddr_entrez, "DDR Genes", "Non-DDR Genes"),
Concentration = "0.5"
)
# ----------------- Combine Data -----------------
df_combined <- bind_rows(df_0.1, df_0.5)
# ----------------- Calculate Proportions -----------------
proportion_data <- df_combined %>%
group_by(Concentration, Response_Group, Category) %>%
summarise(Count = n(), .groups = "drop") %>%
group_by(Concentration, Response_Group) %>%
mutate(Percentage = Count / sum(Count) * 100)
# ----------------- Fisher's Exact Test -----------------
run_fisher_test <- function(df, ref_group) {
ref_counts <- df %>%
filter(Response_Group == ref_group) %>%
dplyr::select(Category, Count) %>%
{setNames(.$Count, .$Category)}
df %>%
filter(Response_Group != ref_group) %>%
group_by(Response_Group) %>%
summarise(
p_value = {
group_counts <- Count[Category %in% c("DDR Genes", "Non-DDR Genes")]
if (length(group_counts) < 2) group_counts <- c(group_counts, 0)
contingency_table <- matrix(c(
group_counts[1], group_counts[2],
ref_counts["DDR Genes"], ref_counts["Non-DDR Genes"]
), nrow = 2)
fisher.test(contingency_table)$p.value
},
.groups = "drop"
) %>%
mutate(Significance = ifelse(!is.na(p_value) & p_value < 0.05, "*", ""))
}
# Run Fisher’s test for each concentration
fisher_0.1 <- run_fisher_test(proportion_data %>% filter(Concentration == "0.1"), "Non response (0.1)")
fisher_0.5 <- run_fisher_test(proportion_data %>% filter(Concentration == "0.5"), "Non response (0.5)")
fisher_all <- bind_rows(
fisher_0.1 %>% mutate(Concentration = "0.1"),
fisher_0.5 %>% mutate(Concentration = "0.5")
)
# ----------------- Merge Fisher Results -----------------
proportion_data <- proportion_data %>%
left_join(fisher_all, by = c("Concentration", "Response_Group"))
# ----------------- Reorder Groups -----------------
proportion_data$Response_Group <- factor(proportion_data$Response_Group, levels = c(
"Non response (0.1)", "CX_DOX_1", "DOX_sp_1",
"Non response (0.5)", "DOX_sp_2", "DOX_sp_3", "CX_DOX_2", "CX_DOX_3"
))
# ----------------- Star Labels -----------------
label_data <- proportion_data %>%
group_by(Concentration, Response_Group) %>%
summarise(Significance = dplyr::first(Significance), .groups = "drop") %>%
filter(!is.na(Significance)) %>%
mutate(y_pos = 100)
# ----------------- Plot -----------------
ggplot(proportion_data, aes(x = Response_Group, y = Percentage, fill = Category)) +
geom_bar(stat = "identity", position = "stack") +
geom_text(
data = label_data,
aes(x = Response_Group, y = y_pos, label = Significance),
inherit.aes = FALSE,
size = 5,
color = "black"
) +
facet_wrap(~ Concentration, scales = "free_x") +
scale_fill_manual(values = c("DDR Genes" = "#e41a1c", "Non-DDR Genes" = "#377eb8")) +
scale_y_continuous(limits = c(0, 110), expand = c(0, 0)) +
labs(
title = "DDR Gene Proportions Across CorrMotif Groups",
x = "Response Groups",
y = "Percentage of Genes",
fill = "Gene Category"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(size = 16, hjust = 0.5, face = "bold"),
axis.title.x = element_text(size = 14, face = "bold"),
axis.title.y = element_text(size = 14, face = "bold"),
axis.text.x = element_text(size = 11, angle = 45, hjust = 1),
axis.text.y = element_text(size = 12),
legend.title = element_text(size = 13),
legend.text = element_text(size = 12),
strip.text = element_text(size = 14, face = "bold"),
panel.border = element_rect(color = "black", fill = NA, linewidth = 1.2),
panel.spacing = unit(1.2, "lines")
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
# --- Load DDR Gene Entrez IDs ---
ddr_entrez <- c(
10111, 1017, 1019, 1020, 1021, 1026, 1027, 10912, 11011, 1111,
11200, 1385, 1643, 1647, 1869, 207, 2177, 25, 27113, 27244,
3014, 317, 355, 4193, 4292, 4361, 4609, 4616, 4683, 472, 50484,
5366, 5371, 54205, 545, 55367, 5591, 581, 5810, 5883, 5884,
5888, 5893, 5925, 595, 5981, 6118, 637, 672, 7157, 7799,
8243, 836, 841, 84126, 842, 8795, 891, 894, 896, 898,
9133, 9134, 983, 9874, 993, 995, 5916
) %>% as.character()
# --- Load CorrMotif Groups ---
# 0.1 µM
prob_1_0.1 <- as.character(read.csv("data/prob_1_0.1.csv")$Entrez_ID)
prob_2_0.1 <- as.character(read.csv("data/prob_2_0.1.csv")$Entrez_ID)
prob_3_0.1 <- as.character(read.csv("data/prob_3_0.1.csv")$Entrez_ID)
# 0.5 µM
prob_1_0.5 <- as.character(read.csv("data/prob_1_0.5.csv")$Entrez_ID)
prob_2_0.5 <- as.character(read.csv("data/prob_2_0.5.csv")$Entrez_ID)
prob_3_0.5 <- as.character(read.csv("data/prob_3_0.5.csv")$Entrez_ID)
prob_4_0.5 <- as.character(read.csv("data/prob_4_0.5.csv")$Entrez_ID)
prob_5_0.5 <- as.character(read.csv("data/prob_5_0.5.csv")$Entrez_ID)
# --- Annotate Groups ---
df_0.1 <- data.frame(Entrez_ID = unique(c(prob_1_0.1, prob_2_0.1, prob_3_0.1))) %>%
mutate(
Response_Group = case_when(
Entrez_ID %in% prob_1_0.1 ~ "Non response (0.1)",
Entrez_ID %in% prob_2_0.1 ~ "CX_DOX_1",
Entrez_ID %in% prob_3_0.1 ~ "DOX_sp_1"
),
Category = ifelse(Entrez_ID %in% ddr_entrez, "DDR Genes", "Non-DDR Genes"),
Concentration = "0.1"
)
df_0.5 <- data.frame(Entrez_ID = unique(c(prob_1_0.5, prob_2_0.5, prob_3_0.5, prob_4_0.5, prob_5_0.5))) %>%
mutate(
Response_Group = case_when(
Entrez_ID %in% prob_1_0.5 ~ "Non response (0.5)",
Entrez_ID %in% prob_2_0.5 ~ "DOX_sp_2",
Entrez_ID %in% prob_3_0.5 ~ "DOX_sp_3",
Entrez_ID %in% prob_4_0.5 ~ "CX_DOX_2",
Entrez_ID %in% prob_5_0.5 ~ "CX_DOX_3"
),
Category = ifelse(Entrez_ID %in% ddr_entrez, "DDR Genes", "Non-DDR Genes"),
Concentration = "0.5"
)
df_combined <- bind_rows(df_0.1, df_0.5)
# --- Fisher Odds Ratio Function ---
fisher_or_group <- function(df_group, df_all, group_name, conc) {
# counts in group
a <- sum(df_group$Category == "DDR Genes")
b <- sum(df_group$Category == "Non-DDR Genes")
# counts in rest of groups at same conc
ref <- df_all %>% filter(Response_Group != group_name)
c <- sum(ref$Category == "DDR Genes")
d <- sum(ref$Category == "Non-DDR Genes")
mat <- matrix(c(a, b, c, d), nrow = 2)
if (any(mat == 0)) mat <- mat + 0.5 # continuity correction
test <- fisher.test(mat)
tibble(
Response_Group = group_name,
Concentration = conc,
a = a, b = b, c = c, d = d,
OR = unname(test$estimate),
Lower_CI = test$conf.int[1],
Upper_CI = test$conf.int[2],
Pval = test$p.value,
Stars = case_when(
test$p.value < 0.001 ~ "***",
test$p.value < 0.01 ~ "**",
test$p.value < 0.05 ~ "*",
TRUE ~ ""
)
)
}
# --- Run Fisher tests ---
results <- list()
for (conc in unique(df_combined$Concentration)) {
conc_df <- df_combined %>% filter(Concentration == conc)
for (grp in unique(conc_df$Response_Group)) {
grp_df <- conc_df %>% filter(Response_Group == grp)
results[[paste(conc, grp)]] <- fisher_or_group(grp_df, conc_df, grp, conc)
}
}
results <- bind_rows(results)
# --- Save Results ---
write.csv(results, "data/CorrMotif_DDRGenes_OddsRatios.csv", row.names = FALSE)
# --- Forest Plot ---
ggplot(results, aes(x = Response_Group, y = OR, ymin = Lower_CI, ymax = Upper_CI)) +
geom_pointrange(aes(color = Pval < 0.05), size = 0.9) +
geom_text(aes(label = Stars), hjust = -0.5, size = 5) +
geom_hline(yintercept = 1, linetype = "dashed", color = "red") +
scale_color_manual(values = c("TRUE" = "blue", "FALSE" = "grey40")) +
coord_flip() +
facet_wrap(~Concentration, ncol = 1, scales = "free_y") +
ylim(0, 20) +
labs(
title = "DDR Gene Enrichment Across CorrMotif Groups",
x = "Response Group",
y = "Odds Ratio (95% CI)"
) +
theme_classic(base_size = 14) +
theme(
legend.position = "none",
strip.text = element_text(face = "bold", size = 12)
)
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_segment()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_segment()`).
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
# 📦 Load Required Libraries
library(ggplot2)
library(dplyr)
library(tidyr)
# ✅ Step 1: Load DEG files
deg_list <- list(
"CX_0.1_3" = read.csv("data/DEGs/Toptable_CX_0.1_3.csv"),
"CX_0.1_24" = read.csv("data/DEGs/Toptable_CX_0.1_24.csv"),
"CX_0.1_48" = read.csv("data/DEGs/Toptable_CX_0.1_48.csv"),
"CX_0.5_3" = read.csv("data/DEGs/Toptable_CX_0.5_3.csv"),
"CX_0.5_24" = read.csv("data/DEGs/Toptable_CX_0.5_24.csv"),
"CX_0.5_48" = read.csv("data/DEGs/Toptable_CX_0.5_48.csv"),
"DOX_0.1_3" = read.csv("data/DEGs/Toptable_DOX_0.1_3.csv"),
"DOX_0.1_24" = read.csv("data/DEGs/Toptable_DOX_0.1_24.csv"),
"DOX_0.1_48" = read.csv("data/DEGs/Toptable_DOX_0.1_48.csv"),
"DOX_0.5_3" = read.csv("data/DEGs/Toptable_DOX_0.5_3.csv"),
"DOX_0.5_24" = read.csv("data/DEGs/Toptable_DOX_0.5_24.csv"),
"DOX_0.5_48" = read.csv("data/DEGs/Toptable_DOX_0.5_48.csv")
)
# ✅ Step 2: Load TS data
ts_data <- read.csv("data/TS.csv") %>%
mutate(Entrez_ID = as.character(Entrez_ID))
# ✅ Step 3: Merge DEGs with TS scores
ts_long <- bind_rows(lapply(names(deg_list), function(name) {
df <- deg_list[[name]] %>% filter(adj.P.Val < 0.05)
df$Sample <- name
df
})) %>%
mutate(Entrez_ID = as.character(Entrez_ID)) %>%
left_join(ts_data, by = "Entrez_ID") %>%
separate(Sample, into = c("Drug","Conc","Time"), sep = "_") %>%
mutate(
Heart_Ventricle = as.numeric(Heart_Ventricle),
Conc = factor(Conc, levels = c("0.1","0.5")),
Time = factor(Time, levels = c("3","24","48")) # ✅ enforce order
) %>%
filter(!is.na(Heart_Ventricle))
Warning in left_join(., ts_data, by = "Entrez_ID"): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1904 of `x` matches multiple rows in `y`.
ℹ Row 6237 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `Heart_Ventricle = as.numeric(Heart_Ventricle)`.
Caused by warning:
! NAs introduced by coercion
# ✅ Step 4: Define CX vs DOX comparison table
comparison_table <- expand.grid(
Conc = c("0.1","0.5"),
Time = c("3","24","48"),
stringsAsFactors = FALSE
)
# ✅ Step 5: Run Wilcoxon tests
star_df <- lapply(1:nrow(comparison_table), function(i) {
conc <- comparison_table$Conc[i]
time <- comparison_table$Time[i]
cx_vals <- ts_long$Heart_Ventricle[ts_long$Drug == "CX" & ts_long$Conc == conc & ts_long$Time == time]
dox_vals <- ts_long$Heart_Ventricle[ts_long$Drug == "DOX" & ts_long$Conc == conc & ts_long$Time == time]
cx_vals <- cx_vals[!is.na(cx_vals)]
dox_vals <- dox_vals[!is.na(dox_vals)]
if (length(cx_vals) > 2 & length(dox_vals) > 2) {
test_result <- wilcox.test(cx_vals, dox_vals)
pval <- test_result$p.value
if (pval < 0.05) {
label <- case_when(
pval < 0.001 ~ "***",
pval < 0.01 ~ "**",
TRUE ~ "*"
)
y_pos <- max(c(cx_vals, dox_vals), na.rm = TRUE) + 0.4
return(data.frame(
Conc = conc, Time = factor(time, levels = c("3","24","48")), # ✅ enforce order here too
label = label,
y_pos = y_pos
))
}
}
return(NULL)
}) %>% bind_rows()
# ✅ Step 6: Define colors
drug_colors <- c("CX" = "#1f78b4", "DOX" = "#e31a1c")
# ✅ Step 7: Violin + Boxplot
p <- ggplot(ts_long, aes(x = Drug, y = Heart_Ventricle, fill = Drug)) +
geom_violin(trim = FALSE, scale = "width", color = "black", alpha = 0.8) +
geom_boxplot(width = 0.2, color = "black", fill = "white", outlier.shape = NA) +
facet_grid(Conc ~ Time, scales = "free_x") +
scale_fill_manual(values = drug_colors) +
scale_y_continuous(
limits = c(-10, max(ts_long$Heart_Ventricle, na.rm = TRUE) + 2),
breaks = seq(-10, 25, 5)
) +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
geom_text(
data = star_df,
aes(x = 1.5, y = y_pos, label = label),
inherit.aes = FALSE,
size = 6,
fontface = "bold"
) +
labs(
title = "Violin-Boxplot: Heart Ventricle TS (CX vs DOX DEGs)",
y = "Tissue specificity score (Heart Ventricle)",
x = ""
) +
theme_bw() +
theme(
axis.text.x = element_text(size = 12),
strip.text = element_text(size = 12, face = "bold"),
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
legend.position = "none"
)
print(p)
Warning: Groups with fewer than two datapoints have been dropped.
ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
Warning: Computation failed in `stat_ydensity()`.
Caused by error in `$<-.data.frame`:
! replacement has 1 row, data has 0
Warning: Groups with fewer than two datapoints have been dropped.
ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
library(ggplot2)
# --- Load Data ---
CX_0.5_24 <- read.csv("data/DEGs/Toptable_CX_0.5_24.csv")
DOX_0.5_24 <- read.csv("data/DEGs/Toptable_DOX_0.5_24.csv")
# --- Ensure Entrez_ID is character ---
CX_0.5_24$Entrez_ID <- as.character(CX_0.5_24$Entrez_ID)
DOX_0.5_24$Entrez_ID <- as.character(DOX_0.5_24$Entrez_ID)
# --- Merge on Entrez_ID ---
merged_24h_0.5 <- merge(
CX_0.5_24, DOX_0.5_24,
by = "Entrez_ID", suffixes = c("_CX", "_DOX")
)
# --- DOX-Cardiotoxicity Signature Entrez IDs ---
Entrez_IDs <- c(
847, 873, 2064, 2878, 2944, 3038, 4846, 51196, 5880, 6687,
7799, 4292, 5916, 3077, 51310, 9154, 64078, 5244, 10057, 10060,
89845, 56853, 4625, 1573, 79890
)
# --- Subset merged data ---
cardio_set <- merged_24h_0.5[merged_24h_0.5$Entrez_ID %in% Entrez_IDs, ]
# --- Correlation ---
cor_test <- cor.test(
cardio_set$logFC_CX,
cardio_set$logFC_DOX,
method = "pearson"
)
r_val <- round(cor_test$estimate, 3)
p_val <- formatC(cor_test$p.value, format = "e", digits = 2)
label_text <- paste0("r = ", r_val, "\n", "p = ", p_val)
# --- Scatter Plot (same style as 48 hr plot) ---
ggplot(cardio_set, aes(x = logFC_CX, y = logFC_DOX)) +
geom_point(alpha = 0.5, color = "black") +
geom_smooth(method = "lm", color = "blue", se = FALSE) +
labs(
title = "CX vs DOX logFC (24 hr, 0.5 µM)",
subtitle = "DOX-Cardiotoxicity Gene Set",
x = "logFC (CX vs VEH)",
y = "logFC (DOX vs VEH)"
) +
theme_minimal(base_size = 14) +
annotate(
"text",
x = -Inf, y = Inf, hjust = -0.1, vjust = 1.2,
label = label_text,
size = 5, fontface = "bold"
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
library(ggplot2)
# --- Load Data ---
CX_0.5_48 <- read.csv("data/DEGs/Toptable_CX_0.5_48.csv")
DOX_0.5_48 <- read.csv("data/DEGs/Toptable_DOX_0.5_48.csv")
# --- Ensure Entrez_ID is character ---
CX_0.5_48$Entrez_ID <- as.character(CX_0.5_48$Entrez_ID)
DOX_0.5_48$Entrez_ID <- as.character(DOX_0.5_48$Entrez_ID)
# --- Merge on Entrez_ID ---
merged_48h_0.5 <- merge(
CX_0.5_48, DOX_0.5_48,
by = "Entrez_ID", suffixes = c("_CX", "_DOX")
)
# --- DOX-Cardiotoxicity Signature Entrez IDs ---
Entrez_IDs <- c(
847, 873, 2064, 2878, 2944, 3038, 4846, 51196, 5880, 6687,
7799, 4292, 5916, 3077, 51310, 9154, 64078, 5244, 10057, 10060,
89845, 56853, 4625, 1573, 79890
)
# --- Subset merged data ---
cardio_set_48 <- merged_48h_0.5[merged_48h_0.5$Entrez_ID %in% Entrez_IDs, ]
# --- Correlation ---
cor_test <- cor.test(
cardio_set_48$logFC_CX,
cardio_set_48$logFC_DOX,
method = "pearson"
)
r_val <- round(cor_test$estimate, 3)
p_val <- formatC(cor_test$p.value, format = "e", digits = 2)
label_text <- paste0("r = ", r_val, "\n", "p = ", p_val)
# --- Scatter Plot (same publication style) ---
ggplot(cardio_set_48, aes(x = logFC_CX, y = logFC_DOX)) +
geom_point(alpha = 0.5, color = "black") +
geom_smooth(method = "lm", color = "blue", se = FALSE) +
labs(
title = "CX vs DOX logFC (48 hr, 0.5 µM)",
subtitle = "DOX-Cardiotoxicity Gene Set",
x = "logFC (CX vs VEH)",
y = "logFC (DOX vs VEH)"
) +
theme_minimal(base_size = 14) +
annotate(
"text",
x = -Inf, y = Inf, hjust = -0.1, vjust = 1.2,
label = label_text,
size = 5, fontface = "bold"
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
library(ggplot2)
library(dplyr)
# --- Load Data ---
CX_0.5_24 <- read.csv("data/DEGs/Toptable_CX_0.5_24.csv")
DOX_0.5_24 <- read.csv("data/DEGs/Toptable_DOX_0.5_24.csv")
# --- Ensure Entrez_ID is character ---
CX_0.5_24$Entrez_ID <- as.character(CX_0.5_24$Entrez_ID)
DOX_0.5_24$Entrez_ID <- as.character(DOX_0.5_24$Entrez_ID)
# --- Merge on Entrez_ID ---
merged_24h_0.5 <- merge(
CX_0.5_24, DOX_0.5_24,
by = "Entrez_ID", suffixes = c("_CX", "_DOX")
)
# --- Heart Failure Gene Set ---
entrez_ids <- c(
9709, 8882, 4023, 29959, 5496, 3992, 9415, 5308, 1026, 54437, 79068, 10221,
9031, 1187, 1952, 3705, 84722, 7273, 23293, 155382, 9531, 602, 27258, 84163,
81846, 79933, 56911, 64753, 93210, 1021, 283450, 5998, 57602, 114991, 7073,
3156, 100101267, 22996, 285025, 11080, 11124, 54810, 7531, 27241, 4774, 57794,
463, 91319, 6598, 9640, 2186, 26010, 80816, 571, 88, 51652, 64788, 90523, 2969,
7781, 80777, 10725, 23387, 817, 134728, 8842, 949, 6934, 129787, 10327, 202052,
2318, 5578, 6801, 6311, 10019, 80724, 217, 84909, 388591, 55101, 9839, 27161,
5310, 387119, 4641, 5587, 55188, 222553, 9960, 22852, 10087, 9570, 54497,
200942, 26249, 4137, 375056, 5409, 64116, 8291, 22876, 339855, 4864, 5142,
221692, 55023, 51426, 6146, 84251, 8189, 27332, 57099, 1869, 1112, 23327,
11264, 6001
) %>% as.character()
# --- Subset merged data ---
heart_set <- merged_24h_0.5[merged_24h_0.5$Entrez_ID %in% entrez_ids, ]
# --- Correlation ---
cor_test <- cor.test(
heart_set$logFC_CX,
heart_set$logFC_DOX,
method = "pearson"
)
r_val <- round(cor_test$estimate, 3)
p_val <- formatC(cor_test$p.value, format = "e", digits = 2)
label_text <- paste0("r = ", r_val, "\n", "p = ", p_val)
# --- Scatter Plot (same style) ---
ggplot(heart_set, aes(x = logFC_CX, y = logFC_DOX)) +
geom_point(alpha = 0.5, color = "black") +
geom_smooth(method = "lm", color = "blue", se = FALSE) +
labs(
title = "CX vs DOX logFC (24 hr, 0.5 µM)",
subtitle = "Heart Failure Gene Set",
x = "logFC (CX vs VEH)",
y = "logFC (DOX vs VEH)"
) +
theme_minimal(base_size = 14) +
annotate(
"text",
x = -Inf, y = Inf,
hjust = -0.1, vjust = 1.2,
label = label_text,
size = 5, fontface = "bold"
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
library(ggplot2)
library(dplyr)
# --- Load Data ---
CX_0.5_48 <- read.csv("data/DEGs/Toptable_CX_0.5_48.csv")
DOX_0.5_48 <- read.csv("data/DEGs/Toptable_DOX_0.5_48.csv")
# --- Ensure Entrez_ID is character ---
CX_0.5_48$Entrez_ID <- as.character(CX_0.5_48$Entrez_ID)
DOX_0.5_48$Entrez_ID <- as.character(DOX_0.5_48$Entrez_ID)
# --- Merge on Entrez_ID ---
merged_48h_0.5 <- merge(
CX_0.5_48, DOX_0.5_48,
by = "Entrez_ID", suffixes = c("_CX", "_DOX")
)
# --- Heart Failure Gene Set ---
entrez_ids <- c(
9709, 8882, 4023, 29959, 5496, 3992, 9415, 5308, 1026, 54437, 79068, 10221,
9031, 1187, 1952, 3705, 84722, 7273, 23293, 155382, 9531, 602, 27258, 84163,
81846, 79933, 56911, 64753, 93210, 1021, 283450, 5998, 57602, 114991, 7073,
3156, 100101267, 22996, 285025, 11080, 11124, 54810, 7531, 27241, 4774, 57794,
463, 91319, 6598, 9640, 2186, 26010, 80816, 571, 88, 51652, 64788, 90523, 2969,
7781, 80777, 10725, 23387, 817, 134728, 8842, 949, 6934, 129787, 10327, 202052,
2318, 5578, 6801, 6311, 10019, 80724, 217, 84909, 388591, 55101, 9839, 27161,
5310, 387119, 4641, 5587, 55188, 222553, 9960, 22852, 10087, 9570, 54497,
200942, 26249, 4137, 375056, 5409, 64116, 8291, 22876, 339855, 4864, 5142,
221692, 55023, 51426, 6146, 84251, 8189, 27332, 57099, 1869, 1112, 23327,
11264, 6001
) %>% as.character()
# --- Subset merged data ---
heart_set_48 <- merged_48h_0.5[merged_48h_0.5$Entrez_ID %in% entrez_ids, ]
# --- Correlation ---
cor_test <- cor.test(
heart_set_48$logFC_CX,
heart_set_48$logFC_DOX,
method = "pearson"
)
r_val <- round(cor_test$estimate, 3)
p_val <- formatC(cor_test$p.value, format = "e", digits = 2)
label_text <- paste0("r = ", r_val, "\n", "p = ", p_val)
# --- Scatter Plot (matching style) ---
ggplot(heart_set_48, aes(x = logFC_CX, y = logFC_DOX)) +
geom_point(alpha = 0.5, color = "black") +
geom_smooth(method = "lm", color = "blue", se = FALSE) +
labs(
title = "CX vs DOX logFC (48 hr, 0.5 µM)",
subtitle = "Heart Failure Gene Set",
x = "logFC (CX vs VEH)",
y = "logFC (DOX vs VEH)"
) +
theme_minimal(base_size = 14) +
annotate(
"text",
x = -Inf, y = Inf,
hjust = -0.1, vjust = 1.2,
label = label_text,
size = 5, fontface = "bold"
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
# --- Libraries ---
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
library(readr)
# --- Step 1: Load AF Risk Loci (Entrez IDs) ---
# Option 1: If you saved it earlier
# entrez_ids <- readRDS("data/AF_Entrez_IDs.rds")
# Option 2: Or load from mapped CSV (ensure column name is Entrez_ID)
AF_genes <- read.csv("data/Cleaned_AF_genes_with_Entrez.csv", stringsAsFactors = FALSE)
entrez_ids <- AF_genes$Entrez_ID %>% as.character() %>% unique() %>% na.omit()
cat("✅ Loaded", length(entrez_ids), "AF Entrez IDs.\n")
✅ Loaded 615 AF Entrez IDs.
# --- Step 2: Load DEG tables ---
deg_files <- list.files("data/DEGs/", pattern = "Toptable_.*\\.csv", full.names = TRUE)
if (length(deg_files) == 0) stop("❌ No DEG files found in data/DEGs/")
deg_list <- lapply(deg_files, read.csv)
names(deg_list) <- gsub("Toptable_|\\.csv", "", basename(deg_files))
# --- Step 3: Fisher test function ---
fisher_or <- function(df, sample_name) {
if (!"Entrez_ID" %in% colnames(df)) return(NULL)
df <- df %>%
mutate(
DEG = adj.P.Val < 0.05,
AF_risk = Entrez_ID %in% entrez_ids
)
a <- sum(df$DEG & df$AF_risk, na.rm = TRUE)
b <- sum(df$DEG & !df$AF_risk, na.rm = TRUE)
c <- sum(!df$DEG & df$AF_risk, na.rm = TRUE)
d <- sum(!df$DEG & !df$AF_risk, na.rm = TRUE)
# Fisher’s exact test
test <- tryCatch(
fisher.test(matrix(c(a, b, c, d), nrow = 2)),
error = function(e) fisher.test(matrix(c(a, b, c, d), nrow = 2), simulate.p.value = TRUE)
)
tibble(
Sample = sample_name,
DE_Risk = a,
DE_nonRisk = b,
nonDE_Risk = c,
nonDE_nonRisk = d,
OR = unname(test$estimate),
Lower_CI = test$conf.int[1],
Upper_CI = test$conf.int[2],
Pval = test$p.value,
Stars = case_when(
test$p.value < 0.001 ~ "***",
test$p.value < 0.01 ~ "**",
test$p.value < 0.05 ~ "*",
TRUE ~ ""
)
)
}
# --- Step 4: Run Fisher tests for all DEG tables ---
results <- bind_rows(mapply(fisher_or, deg_list, names(deg_list), SIMPLIFY = FALSE))
# --- Step 5: Extract metadata (Drug, Conc, Time) ---
results <- results %>%
separate(Sample, into = c("Drug","Conc","Time"), sep = "_") %>%
mutate(
Label = paste(Drug, Conc, sep = "_"),
Time = factor(Time, levels = c("3","24","48"))
)
# --- Step 6: Forest plot ---
p <- ggplot(results, aes(x = Label, y = OR, ymin = Lower_CI, ymax = Upper_CI)) +
geom_pointrange(aes(color = Pval < 0.05), size = 0.9) +
geom_text(aes(label = Stars), hjust = -0.5, size = 5) +
geom_hline(yintercept = 1, linetype = "dashed", color = "red") +
scale_color_manual(values = c("TRUE" = "blue", "FALSE" = "grey40")) +
coord_flip() +
facet_wrap(~Time, ncol = 1, scales = "free_y") +
ylim(0, 20) +
labs(
title = "Atrial Fibrillation Risk Loci Enrichment Across Conditions",
x = "Condition (Drug_Conc)",
y = "Odds Ratio (95% CI)"
) +
theme_classic(base_size = 14) +
theme(
legend.position = "none",
strip.text = element_text(face = "bold", size = 12)
)
# --- Step 7: Display & save ---
print(p)
Warning: Removed 3 rows containing missing values or values outside the scale range
(`geom_segment()`).
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
#ggsave("plots/ForestPlot_AF_RiskLoci.png", plot = p, width = 7, height = 9, dpi = 300)
# --- Step 8: Export raw enrichment data ---
write.csv(results, "data/AF_Cardiotoxicity_ForestPlot_Data.csv", row.names = FALSE)
cat("✅ AF enrichment analysis complete. Results saved to data/AF_Cardiotoxicity_ForestPlot_Data.csv\n")
✅ AF enrichment analysis complete. Results saved to data/AF_Cardiotoxicity_ForestPlot_Data.csv
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
# --- Load AF Gene Entrez IDs ---
# Option 1: from mapped CSV
AF_genes <- read.csv("data/Cleaned_AF_genes_with_Entrez.csv", stringsAsFactors = FALSE)
af_entrez <- AF_genes$Entrez_ID %>% as.character() %>% unique() %>% na.omit()
# Option 2: if saved as RDS earlier
# af_entrez <- readRDS("data/AF_Entrez_IDs.rds")
cat("✅ Loaded", length(af_entrez), "AF Entrez IDs.\n")
✅ Loaded 615 AF Entrez IDs.
# --- Load CorrMotif Groups ---
# 0.1 µM
prob_1_0.1 <- as.character(read.csv("data/prob_1_0.1.csv")$Entrez_ID)
prob_2_0.1 <- as.character(read.csv("data/prob_2_0.1.csv")$Entrez_ID)
prob_3_0.1 <- as.character(read.csv("data/prob_3_0.1.csv")$Entrez_ID)
# 0.5 µM
prob_1_0.5 <- as.character(read.csv("data/prob_1_0.5.csv")$Entrez_ID)
prob_2_0.5 <- as.character(read.csv("data/prob_2_0.5.csv")$Entrez_ID)
prob_3_0.5 <- as.character(read.csv("data/prob_3_0.5.csv")$Entrez_ID)
prob_4_0.5 <- as.character(read.csv("data/prob_4_0.5.csv")$Entrez_ID)
prob_5_0.5 <- as.character(read.csv("data/prob_5_0.5.csv")$Entrez_ID)
# --- Annotate Groups ---
df_0.1 <- data.frame(Entrez_ID = unique(c(prob_1_0.1, prob_2_0.1, prob_3_0.1))) %>%
mutate(
Response_Group = case_when(
Entrez_ID %in% prob_1_0.1 ~ "Non response (0.1)",
Entrez_ID %in% prob_2_0.1 ~ "CX_DOX_1",
Entrez_ID %in% prob_3_0.1 ~ "DOX_sp_1"
),
Category = ifelse(Entrez_ID %in% af_entrez, "AF Genes", "Non-AF Genes"),
Concentration = "0.1"
)
df_0.5 <- data.frame(Entrez_ID = unique(c(prob_1_0.5, prob_2_0.5, prob_3_0.5, prob_4_0.5, prob_5_0.5))) %>%
mutate(
Response_Group = case_when(
Entrez_ID %in% prob_1_0.5 ~ "Non response (0.5)",
Entrez_ID %in% prob_2_0.5 ~ "DOX_sp_2",
Entrez_ID %in% prob_3_0.5 ~ "DOX_sp_3",
Entrez_ID %in% prob_4_0.5 ~ "CX_DOX_2",
Entrez_ID %in% prob_5_0.5 ~ "CX_DOX_3"
),
Category = ifelse(Entrez_ID %in% af_entrez, "AF Genes", "Non-AF Genes"),
Concentration = "0.5"
)
# --- Combine all CorrMotif groups ---
df_combined <- bind_rows(df_0.1, df_0.5)
# --- Fisher Odds Ratio Function ---
fisher_or_group <- function(df_group, df_all, group_name, conc) {
# counts in group
a <- sum(df_group$Category == "AF Genes")
b <- sum(df_group$Category == "Non-AF Genes")
# counts in rest of groups at same concentration
ref <- df_all %>% filter(Response_Group != group_name)
c <- sum(ref$Category == "AF Genes")
d <- sum(ref$Category == "Non-AF Genes")
mat <- matrix(c(a, b, c, d), nrow = 2)
if (any(mat == 0)) mat <- mat + 0.5 # continuity correction
test <- fisher.test(mat)
tibble(
Response_Group = group_name,
Concentration = conc,
a = a, b = b, c = c, d = d,
OR = unname(test$estimate),
Lower_CI = test$conf.int[1],
Upper_CI = test$conf.int[2],
Pval = test$p.value,
Stars = case_when(
test$p.value < 0.001 ~ "***",
test$p.value < 0.01 ~ "**",
test$p.value < 0.05 ~ "*",
TRUE ~ ""
)
)
}
# --- Run Fisher tests across groups ---
results <- list()
for (conc in unique(df_combined$Concentration)) {
conc_df <- df_combined %>% filter(Concentration == conc)
for (grp in unique(conc_df$Response_Group)) {
grp_df <- conc_df %>% filter(Response_Group == grp)
results[[paste(conc, grp)]] <- fisher_or_group(grp_df, conc_df, grp, conc)
}
}
results <- bind_rows(results)
# --- Save Results ---
write.csv(results, "data/CorrMotif_AFGenes_OddsRatios.csv", row.names = FALSE)
cat("✅ Results saved to data/CorrMotif_AFGenes_OddsRatios.csv\n")
✅ Results saved to data/CorrMotif_AFGenes_OddsRatios.csv
# --- Forest Plot ---
p <- ggplot(results, aes(x = Response_Group, y = OR, ymin = Lower_CI, ymax = Upper_CI)) +
geom_pointrange(aes(color = Pval < 0.05), size = 0.9) +
geom_text(aes(label = Stars), hjust = -0.5, size = 5) +
geom_hline(yintercept = 1, linetype = "dashed", color = "red") +
scale_color_manual(values = c("TRUE" = "blue", "FALSE" = "grey40")) +
coord_flip() +
facet_wrap(~Concentration, ncol = 1, scales = "free_y") +
ylim(0, 20) +
labs(
title = "AF Gene Enrichment Across CorrMotif Groups",
x = "Response Group",
y = "Odds Ratio (95% CI)"
) +
theme_classic(base_size = 14) +
theme(
legend.position = "none",
strip.text = element_text(face = "bold", size = 12)
)
p
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
library(ggplot2)
library(dplyr)
## === iPSC-CM LD50 (µM) ===
ipsc_vals <- c(6.4, 10.9, 10.0)
ipsc_df <- data.frame(
CellLine = c("iPSC-CM_1", "iPSC-CM_2", "iPSC-CM_3"),
LD50 = ipsc_vals,
Group = "iPSC-CM"
)
## === Sanij et al. (GI50 nM → µM, renamed as LD50) ===
sanij_data <- data.frame(
CellLine = c("OVCAR3","FUOV1","COLO704","TOV112D","CH1","A2780","RMGI","2008","SKOV3","OVCA432",
"OAW42","ES2","EFO27","IGROV1","Caov3","Colo720E","MCAS","OVCAR5","OVCAR4","OVCAR8",
"JHOC5","JHOM1","KURAMOCHI","JHOC7","JHOS3","RMGII","OAW28","OV90"),
Mean_nM = c(12,25.33,31.5,49.67,73.5,82.33,143,106.7,301.3,206.7,
226.7,181.3,417,422.3,409.3,413.5,357.6,100.5,613.3,859,
731,861.2,1082,3267,3177,3370,3850,5170)
) %>%
dplyr::mutate(
LD50 = Mean_nM / 1000, # convert to µM
Group = ifelse(Mean_nM < 2000, "Sensitive", "Resistant")
) %>%
dplyr::select(CellLine, LD50, Group)
## === Cornelison et al. (IC50 µM, renamed as LD50) ===
cornelison <- data.frame(
CellLine = c("A2780ip2","A2780cp20","SKOV3ip1","SKOV3TRip2",
"HeyA8","HeyA8MDR","PEO1","PEO4","OV90","OVCAR3",
"OAW42","OVSAHO","CAOV3","COV362","ES2","HIO-180"),
LD50 = c(0.032,0.146,0.595,0.109,
2.05,0.191,0.525,1.20,1.06,2.00,
0.300,0.636,0.955,0.645,0.294,5.49),
Group = c(rep("Sensitive",15),"Resistant")
)
## === Combine all datasets ===
all_data <- dplyr::bind_rows(sanij_data, cornelison, ipsc_df)
## === Plot (linear scale) ===
ggplot(all_data, aes(x = Group, y = LD50, fill = Group)) +
geom_boxplot(outlier.shape = NA, alpha = 0.6, width = 0.6,
color = "black", linewidth = 0.4) +
geom_point(aes(color = Group),
position = position_nudge(x = 0),
size = 3, alpha = 0.9, show.legend = FALSE) +
scale_fill_manual(values = c("Sensitive"="#4A90E2",
"Resistant"="#B0B0B0",
"iPSC-CM"="#E74C3C")) +
scale_color_manual(values = c("Sensitive"="#4A90E2",
"Resistant"="#B0B0B0",
"iPSC-CM"="#E74C3C")) +
labs(
title = expression("CX-5461 Sensitivity (LD"[50]*", µM)"),
x = "",
y = expression("LD"[50]*" (µM)")
) +
theme_minimal(base_size = 14) +
theme(
legend.position = "right",
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text = element_text(color = "black")
)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
suppressPackageStartupMessages({
library(dplyr)
library(readr)
library(stringr)
library(tibble)
library(tidyr)
})
# ------------------------------------------------------------
# INPUT FILES
# ------------------------------------------------------------
log2cpm_file <- "data/Feature_count_Matrix_Log2CPM_filtered.csv"
meta_file <- "data/Metadata.csv"
# ------------------------------------------------------------
# LOAD DATA
# ------------------------------------------------------------
log2cpm <- read.csv(log2cpm_file, check.names = FALSE)
colnames(log2cpm) <- gsub("^X", "", trimws(colnames(log2cpm)))
Metadata <- read.csv(meta_file, check.names = FALSE)
Metadata$Sample_name <- gsub("^([0-9]+)\\.([0-9]+)", "\\1-\\2", Metadata$Sample_name)
id_col <- grep("^Entrez|ENTREZ", colnames(log2cpm), value = TRUE)[1]
rownames(log2cpm) <- log2cpm[[id_col]]
# ------------------------------------------------------------
# FUNCTION: plot elbow for one condition
# ------------------------------------------------------------
plot_elbow_one <- function(drug_label = "CX-5461", conc = 0.1, time_h = 3) {
deg_prefix <- ifelse(drug_label == "CX-5461", "CX", "DOX")
deg_file <- sprintf("data/DEGs/Toptable_%s_%s_%s.csv", deg_prefix, conc, time_h)
if (!file.exists(deg_file)) {
message("Missing DEG file: ", deg_file)
return()
}
deg <- read.csv(deg_file)
sig <- deg %>% filter(adj.P.Val < 0.05) %>% pull(Entrez_ID) %>% unique()
if (length(sig) == 0) {
message("No DEGs in ", deg_file)
return()
}
samples <- Metadata %>%
filter(Drug %in% c(drug_label, "VEH"),
Conc. == conc,
Time == time_h) %>%
pull(Sample_name)
matched <- intersect(samples, colnames(log2cpm))
if (length(matched) == 0) {
message("No matching samples for ", drug_label, " ", conc, "µM ", time_h, "h")
return()
}
M <- log2cpm[rownames(log2cpm) %in% sig, matched, drop = FALSE]
M <- as.matrix(M)
if (nrow(M) < 3) {
message("Too few genes (<3) for elbow plot: ", drug_label, " ", conc, "µM ", time_h, "h")
return()
}
# ---- K-means elbow ----
set.seed(123)
max_k <- min(10, nrow(M) - 1)
wss <- sapply(1:max_k, function(k)
kmeans(M, centers = k, nstart = 10, iter.max = 50)$tot.withinss)
plot(1:max_k, wss, type = "b", pch = 19, col = "darkblue",
xlab = "Number of Clusters (K)",
ylab = "Total Within-Cluster SS",
main = sprintf("Elbow — %s (%.1f µM, %dh) vs VEH", drug_label, conc, time_h))
}
# ------------------------------------------------------------
# EXECUTE — Generate all 12 elbow plots sequentially
# ------------------------------------------------------------
# CX-5461
for (cc in c(0.1, 0.5)) {
for (tt in c(3, 24, 48)) {
message(">>> CX-5461 ", cc, " µM ", tt, " h")
plot_elbow_one("CX-5461", cc, tt)
dev.flush(); Sys.sleep(1)
}
}
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
# DOX
for (cc in c(0.1, 0.5)) {
for (tt in c(3, 24, 48)) {
message(">>> DOX ", cc, " µM ", tt, " h")
plot_elbow_one("DOX", cc, tt)
dev.flush(); Sys.sleep(1)
}
}
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Warning: Quick-TRANSfer stage steps exceeded maximum (= 491400)
Warning: Quick-TRANSfer stage steps exceeded maximum (= 491400)
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
suppressPackageStartupMessages({
library(ComplexHeatmap)
library(circlize)
library(dplyr)
library(readr)
library(tibble)
library(tidyr)
library(RColorBrewer)
library(stringr)
})
Warning: package 'ComplexHeatmap' was built under R version 4.3.1
Warning: package 'circlize' was built under R version 4.3.3
# ------------------------------------------------------------
# INPUT FILES
# ------------------------------------------------------------
log2cpm_file <- "data/Feature_count_Matrix_Log2CPM_filtered.csv"
meta_file <- "data/Metadata.csv"
# ------------------------------------------------------------
# LOAD DATA
# ------------------------------------------------------------
log2cpm <- read.csv(log2cpm_file, check.names = FALSE)
colnames(log2cpm) <- gsub("^X", "", trimws(colnames(log2cpm)))
Metadata <- read.csv(meta_file, check.names = FALSE)
Metadata$Sample_name <- gsub("^([0-9]+)\\.([0-9]+)", "\\1-\\2", Metadata$Sample_name)
id_col <- grep("^Entrez|ENTREZ", colnames(log2cpm), value = TRUE)[1]
rownames(log2cpm) <- log2cpm[[id_col]]
# ------------------------------------------------------------
# COLORS & MAPS
# ------------------------------------------------------------
rename_map <- c("17-3"="1","75-1"="2","84-1"="3","90-1"="4","87-1"="5","78-1"="6")
indv_colors <- setNames(brewer.pal(6, "Dark2"), as.character(1:6))
trt_colors <- c("CX-5461"="#8B006D", "DOX"="#DF707E", "VEH"="#41B333")
conc_colors <- c("0.1"="lightblue", "0.5"="lightcoral")
time_colors <- c("3"="purple", "24"="pink", "48"="tomato3")
cluster_map <- list(
"CX_0.1_3"=NA, "CX_0.1_24"=3, "CX_0.1_48"=3,
"CX_0.5_3"=NA, "CX_0.5_24"=3, "CX_0.5_48"=3,
"DOX_0.1_3"=NA, "DOX_0.1_24"=3, "DOX_0.1_48"=3,
"DOX_0.5_3"=3, "DOX_0.5_24"=3, "DOX_0.5_48"=3
)
# ------------------------------------------------------------
# FUNCTION — DRAW AND SHOW HEATMAP
# ------------------------------------------------------------
plot_heatmap_show <- function(drug_label="CX-5461", conc=0.1, time_h=3) {
deg_prefix <- ifelse(drug_label == "CX-5461", "CX", "DOX")
deg_file <- sprintf("data/DEGs/Toptable_%s_%s_%s.csv", deg_prefix, conc, time_h)
key <- sprintf("%s_%s_%s", deg_prefix, conc, time_h)
row_k <- cluster_map[[key]]
if (!file.exists(deg_file)) {
message("❌ Missing DEG file: ", deg_file)
return()
}
deg <- read.csv(deg_file)
sig <- deg %>% filter(adj.P.Val < 0.05) %>% pull(Entrez_ID) %>% unique()
if (length(sig) == 0) {
message("⚠️ No DEGs for ", key)
return()
}
samples <- Metadata %>%
filter(Drug %in% c(drug_label,"VEH"),
Conc. == conc,
Time == time_h) %>%
pull(Sample_name)
matched <- intersect(samples, colnames(log2cpm))
if (length(matched) == 0) {
message("⚠️ No matching samples for ", key)
return()
}
M <- log2cpm[rownames(log2cpm) %in% sig, matched, drop=FALSE]
if (nrow(M) == 0) {
message("⚠️ Empty matrix for ", key)
return()
}
M <- as.matrix(M)
colnames(M) <- str_replace_all(colnames(M), rename_map)
# --- Annotation
annot_df <- tibble(timeset = colnames(M)) %>%
mutate(sample = timeset) %>%
separate(timeset, into=c("indv","trt","conc","time"), sep="_", remove=FALSE) %>%
mutate(trt=factor(trt,levels=c(drug_label,"VEH")),
indv=factor(indv,levels=as.character(1:6)),
conc=factor(conc,levels=c("0.1","0.5")),
time=factor(time,levels=c("3","24","48"))) %>%
column_to_rownames("sample")
top_ha <- HeatmapAnnotation(
df=annot_df[,c("trt","indv","time")],
col=list(trt=trt_colors, indv=indv_colors, time=time_colors)
)
# --- Color scale
q <- quantile(as.numeric(M), c(0.01, 0.5, 0.99), na.rm=TRUE)
col_fun <- colorRamp2(c(q[1], q[2], q[3]), c("blue","white","red"))
# --- Plotting
message("✅ Displaying: ", key,
ifelse(is.na(row_k), " (no clusters)", paste0(" (k=", row_k, ")")))
if (is.na(row_k)) {
ht <- Heatmap(
M,
top_annotation=top_ha,
show_column_names=TRUE,
show_row_names=FALSE,
cluster_columns=TRUE,
cluster_rows=FALSE,
col=col_fun,
use_raster=TRUE,
raster_device="png",
raster_quality=2,
column_title=sprintf("%s (%.1f µM ; %dh) vs VEH — log2CPM",
drug_label, conc, time_h),
row_title=NULL
)
} else {
ht <- Heatmap(
M,
top_annotation=top_ha,
show_column_names=TRUE,
show_row_names=FALSE,
cluster_columns=TRUE,
cluster_rows=FALSE,
row_km=row_k,
col=col_fun,
use_raster=TRUE,
raster_device="png",
raster_quality=2,
column_title=sprintf("%s (%.1f µM ; %dh) vs VEH — log2CPM",
drug_label, conc, time_h),
row_title="Cluster"
)
}
draw(ht)
}
# ------------------------------------------------------------
# RUN ALL COMBOS
# ------------------------------------------------------------
for (drug in c("CX-5461","DOX")) {
for (cc in c(0.1,0.5)) {
for (tt in c(3,24,48)) {
message("\n>>> ", drug, " ", cc, " µM ", tt, " h")
plot_heatmap_show(drug, cc, tt)
}
}
}
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
Version | Author | Date |
---|---|---|
64e2d5a | sayanpaul01 | 2025-10-06 |
message("\n🎨 All heatmaps displayed sequentially.")
# ======================================================
# 🧬 PART 1: G-QUADRUPLEX GO TERM ENRICHMENT HEATMAP
# ======================================================
library(tidyverse)
library(data.table)
library(ComplexHeatmap)
library(circlize)
library(grid)
# 📁 Define GO enrichment input files
go_files <- list(
"CX_0.1_3" = "data/BP/Combined_Terms/GO_All_CX_0.1_3.csv",
"CX_0.1_24" = "data/BP/Combined_Terms/GO_All_CX_0.1_24.csv",
"CX_0.1_48" = "data/BP/Combined_Terms/GO_All_CX_0.1_48.csv",
"CX_0.5_3" = "data/BP/Combined_Terms/GO_All_CX_0.5_3.csv",
"CX_0.5_24" = "data/BP/Combined_Terms/GO_All_CX_0.5_24.csv",
"CX_0.5_48" = "data/BP/Combined_Terms/GO_All_CX_0.5_48.csv",
"DOX_0.1_3" = "data/BP/Combined_Terms/GO_All_DOX_0.1_3.csv",
"DOX_0.1_24" = "data/BP/Combined_Terms/GO_All_DOX_0.1_24.csv",
"DOX_0.1_48" = "data/BP/Combined_Terms/GO_All_DOX_0.1_48.csv",
"DOX_0.5_3" = "data/BP/Combined_Terms/GO_All_DOX_0.5_3.csv",
"DOX_0.5_24" = "data/BP/Combined_Terms/GO_All_DOX_0.5_24.csv",
"DOX_0.5_48" = "data/BP/Combined_Terms/GO_All_DOX_0.5_48.csv"
)
# 🧬 Define G4-related GO terms
g4_terms <- list(
"GO:0051880" = "G-quadruplex DNA binding",
"GO:0002151" = "G-quadruplex RNA binding",
"GO:0071919" = "G-quadruplex DNA formation",
"GO:1905493" = "Regulation of G-quadruplex DNA binding",
"GO:0160225" = "G-quadruplex unwinding activity",
"GO:0061849" = "Telomeric G-quadruplex DNA binding"
)
# 🔍 Function to extract p-values
get_go_values <- function(file_path) {
df <- fread(file_path)[, .(ID, p.adjust, pvalue)]
log10pval <- sapply(names(g4_terms), function(go_id) {
row <- df[ID == go_id]
if (nrow(row) == 0) return(NA)
return(-log10(row$pvalue))
})
adj_p <- sapply(names(g4_terms), function(go_id) {
row <- df[ID == go_id]
if (nrow(row) == 0) return(NA)
return(row$p.adjust)
})
return(list(log10pval = log10pval, adj_p = adj_p))
}
# 🧊 Generate matrices
matrix_list <- map(go_files, get_go_values)
go_matrix <- sapply(matrix_list, function(x) x$log10pval)
padj_matrix <- sapply(matrix_list, function(x) x$adj_p)
rownames(go_matrix) <- unname(unlist(g4_terms))
rownames(padj_matrix) <- rownames(go_matrix)
# 🎨 Define color scale
breaks <- seq(0, 20, by = 2.5)
palette <- colorRampPalette(c("white", "#fde0dd", "#fa9fb5", "#f768a1", "#c51b8a", "#7a0177", "#49006a"))(length(breaks))
col_fun <- colorRamp2(breaks, palette)
# 🔥 Draw heatmap
ht <- Heatmap(
go_matrix,
name = "-log10(p value)",
col = col_fun,
na_col = "white",
rect_gp = gpar(col = "black", lwd = 0.5),
cluster_rows = FALSE,
cluster_columns = FALSE,
row_names_gp = gpar(fontsize = 9),
column_names_gp = gpar(fontsize = 9),
column_names_rot = 45,
row_names_max_width = max_text_width(rownames(go_matrix), gp = gpar(fontsize = 9)),
cell_fun = function(j, i, x, y, width, height, fill) {
adj_p <- padj_matrix[i, j]
if (!is.na(adj_p) && adj_p < 0.05) grid.text("*", x, y, gp = gpar(fontsize = 12))
},
heatmap_legend_param = list(
title = "-log10(p value)",
at = breaks,
labels = as.character(breaks),
legend_width = unit(5, "cm"),
direction = "horizontal",
title_gp = gpar(fontsize = 10, fontface = "bold"),
labels_gp = gpar(fontsize = 9)
)
)
draw(ht, heatmap_legend_side = "top")
# ======================================================
# 🧬 PART 2: G4-BINDING GENE IDENTIFICATION
# ======================================================
library(org.Hs.eg.db)
library(AnnotationDbi)
library(dplyr)
g4_genes <- AnnotationDbi::select(
org.Hs.eg.db,
keys = "GO:0051880",
columns = c("SYMBOL", "ENTREZID"),
keytype = "GOALL"
) %>% distinct(SYMBOL, ENTREZID)
g4_genes_vec <- unique(g4_genes$SYMBOL)
g4_genes_vec
[1] "BLM" "DDX11" "NME2" "WRN" "CNBP" "LONP1" "RAD50" "XRN1" "PIF1"
[10] "DHX36"
# ======================================================
# 📊 PART 3: LOG2CPM BOXPLOTS FOR G4 GENES
# ======================================================
library(ggplot2)
library(tidyr)
library(clusterProfiler)
library(stringr)
# 📂 Load feature count matrix
boxplot1 <- read.csv("data/Feature_count_Matrix_Log2CPM_filtered.csv") %>% as.data.frame()
colnames(boxplot1) <- trimws(gsub("^X", "", colnames(boxplot1)))
# 🧬 Use G4-binding genes
g4_genes <- g4_genes_vec
# 📄 Load DEG result tables
deg_files <- list.files("data/DEGs", pattern = "Toptable_.*\\.csv", full.names = TRUE)
deg_list <- lapply(deg_files, read.csv)
names(deg_list) <- gsub("data/DEGs/Toptable_|\\.csv", "", deg_files)
# 🧮 Significance checker
is_significant <- function(gene, drug, conc, timepoint) {
condition <- paste(drug, conc, timepoint, sep = "_")
if (!condition %in% names(deg_list)) return(FALSE)
toptable <- deg_list[[condition]]
gene_entrez <- boxplot1$ENTREZID[boxplot1$SYMBOL == gene]
if (length(gene_entrez) == 0) return(FALSE)
any(gene_entrez %in% toptable$Entrez_ID[toptable$adj.P.Val < 0.05])
}
# 🧩 Prepare long-format data per gene
process_gene_data <- function(gene) {
gene_data <- boxplot1 %>% filter(SYMBOL == gene)
long_data <- gene_data %>%
pivot_longer(cols = -c(ENTREZID, SYMBOL, GENENAME), names_to = "Sample", values_to = "log2CPM") %>%
mutate(
Indv = case_when(
grepl("75.1", Sample) ~ "1",
grepl("78.1", Sample) ~ "2",
grepl("87.1", Sample) ~ "3",
grepl("17.3", Sample) ~ "4",
grepl("84.1", Sample) ~ "5",
grepl("90.1", Sample) ~ "6",
TRUE ~ NA_character_
),
Drug = case_when(
grepl("CX.5461", Sample) ~ "CX",
grepl("DOX", Sample) ~ "DOX",
grepl("VEH", Sample) ~ "VEH",
TRUE ~ NA_character_
),
Conc. = case_when(
grepl("_0.1_", Sample) ~ "0.1",
grepl("_0.5_", Sample) ~ "0.5",
TRUE ~ NA_character_
),
Timepoint = case_when(
str_detect(Sample, "_3$") ~ "3",
str_detect(Sample, "_24$") ~ "24",
str_detect(Sample, "_48$") ~ "48",
TRUE ~ NA_character_
),
Condition = paste(Drug, Conc., Timepoint, sep = "_")
)
long_data$Condition <- factor(long_data$Condition, levels = c(
"CX_0.1_3","CX_0.1_24","CX_0.1_48","CX_0.5_3","CX_0.5_24","CX_0.5_48",
"DOX_0.1_3","DOX_0.1_24","DOX_0.1_48","DOX_0.5_3","DOX_0.5_24","DOX_0.5_48",
"VEH_0.1_3","VEH_0.1_24","VEH_0.1_48","VEH_0.5_3","VEH_0.5_24","VEH_0.5_48"
))
significance_labels <- long_data %>%
distinct(Drug, Conc., Timepoint, Condition) %>%
rowwise() %>%
mutate(
max_log2CPM = max(long_data$log2CPM[long_data$Condition == Condition], na.rm = TRUE),
Significance = ifelse(is_significant(gene, Drug, Conc., Timepoint), "*", "")
) %>%
filter(Significance != "") %>% ungroup()
list(long_data = long_data, significance_labels = significance_labels)
}
# 🎨 Plot per gene
for (gene in g4_genes) {
data_info <- process_gene_data(gene)
p <- ggplot(data_info$long_data, aes(x = Condition, y = log2CPM, fill = Drug)) +
geom_boxplot(outlier.shape = NA) +
scale_fill_manual(values = c("CX"="#0000FF","DOX"="#e6d800","VEH"="#FF00FF")) +
geom_point(aes(color = Indv), size = 2, alpha = 0.5, position = position_jitter(width = -1, height = 0)) +
geom_text(data = data_info$significance_labels, aes(x = Condition, y = max_log2CPM + 0.5, label = Significance),
inherit.aes = FALSE, size = 6, color = "black") +
ggtitle(paste("Log2CPM Expression of", gene)) +
labs(x = "Treatment", y = "log2CPM") +
theme_bw() +
theme(plot.title = element_text(size = rel(2), hjust = 0.5),
axis.title = element_text(size = 15, color = "black"),
axis.text.x = element_text(size = 10, color = "black", angle = 90, hjust = 1))
print(p)
}
sessionInfo()
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 26100)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: America/Chicago
tzcode source: internal
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] clusterProfiler_4.10.1 data.table_1.17.0 RColorBrewer_1.1-3
[4] circlize_0.4.16 ComplexHeatmap_2.18.0 lubridate_1.9.4
[7] forcats_1.0.0 stringr_1.5.1 purrr_1.0.4
[10] readr_2.1.5 tidyverse_2.0.0 org.Hs.eg.db_3.18.0
[13] AnnotationDbi_1.64.1 IRanges_2.36.0 S4Vectors_0.40.2
[16] Biobase_2.62.0 BiocGenerics_0.48.1 ggplot2_3.5.2
[19] tibble_3.2.1 tidyr_1.3.1 dplyr_1.1.4
loaded via a namespace (and not attached):
[1] rstudioapi_0.17.1 jsonlite_2.0.0 shape_1.4.6.1
[4] magrittr_2.0.3 magick_2.8.6 farver_2.1.2
[7] rmarkdown_2.29 GlobalOptions_0.1.2 fs_1.6.3
[10] zlibbioc_1.48.2 vctrs_0.6.5 memoise_2.0.1
[13] Cairo_1.6-2 RCurl_1.98-1.17 ggtree_3.10.1
[16] htmltools_0.5.8.1 gridGraphics_0.5-1 sass_0.4.10
[19] bslib_0.9.0 plyr_1.8.9 cachem_1.1.0
[22] whisker_0.4.1 igraph_2.1.4 lifecycle_1.0.4
[25] iterators_1.0.14 pkgconfig_2.0.3 gson_0.1.0
[28] Matrix_1.6-1.1 R6_2.6.1 fastmap_1.2.0
[31] GenomeInfoDbData_1.2.11 clue_0.3-66 digest_0.6.34
[34] aplot_0.2.5 enrichplot_1.22.0 colorspace_2.1-0
[37] patchwork_1.3.0 rprojroot_2.0.4 RSQLite_2.3.9
[40] labeling_0.4.3 timechange_0.3.0 httr_1.4.7
[43] polyclip_1.10-7 mgcv_1.9-3 compiler_4.3.0
[46] bit64_4.6.0-1 withr_3.0.2 doParallel_1.0.17
[49] BiocParallel_1.36.0 viridis_0.6.5 DBI_1.2.3
[52] ggforce_0.4.2 MASS_7.3-60 rjson_0.2.23
[55] HDO.db_0.99.1 tools_4.3.0 scatterpie_0.2.4
[58] ape_5.8-1 httpuv_1.6.15 glue_1.7.0
[61] nlme_3.1-168 GOSemSim_2.28.1 promises_1.3.2
[64] shadowtext_0.1.4 cluster_2.1.8.1 reshape2_1.4.4
[67] fgsea_1.28.0 generics_0.1.3 gtable_0.3.6
[70] tzdb_0.5.0 hms_1.1.3 tidygraph_1.3.1
[73] XVector_0.42.0 ggrepel_0.9.6 foreach_1.5.2
[76] pillar_1.10.2 yulab.utils_0.2.0 later_1.3.2
[79] splines_4.3.0 tweenr_2.0.3 treeio_1.26.0
[82] lattice_0.22-7 bit_4.6.0 tidyselect_1.2.1
[85] GO.db_3.18.0 Biostrings_2.70.3 knitr_1.50
[88] git2r_0.36.2 gridExtra_2.3 xfun_0.52
[91] graphlayouts_1.2.2 matrixStats_1.5.0 stringi_1.8.3
[94] lazyeval_0.2.2 workflowr_1.7.1 ggfun_0.1.8
[97] yaml_2.3.10 evaluate_1.0.3 codetools_0.2-20
[100] ggraph_2.2.1 qvalue_2.34.0 ggplotify_0.1.2
[103] cli_3.6.1 munsell_0.5.1 jquerylib_0.1.4
[106] Rcpp_1.0.12 GenomeInfoDb_1.38.8 png_0.1-8
[109] parallel_4.3.0 blob_1.2.4 DOSE_3.28.2
[112] bitops_1.0-9 viridisLite_0.4.2 tidytree_0.4.6
[115] scales_1.3.0 crayon_1.5.3 GetoptLong_1.0.5
[118] rlang_1.1.3 cowplot_1.1.3 fastmatch_1.1-6
[121] KEGGREST_1.42.0