Section 16 Paralogs in Complexes MS-Data

This script uses the protein complex stability and subunit variability calculated from the MS data for the different datasets to compute the proportion of variable and stable subunits, in terms of their stoichiometries, that have paralogs. We also show that, as already observed for the RNA-seq data, paralogs are predominantly present in complexes that have variable stoichiometries during neurogenesis. Finally, we examine each individual paralog substitution in terms of relative fold changes.
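As a minimal sketch of the proportion described above, the toy example below computes the fraction of subunits with at least one paralog within each stability class. The table and its column names (`subunit`, `class`, `has_paralog`) are invented for illustration and are not taken from the real subunits_stability files; GetParalogs.R may organise this calculation differently.

```r
# Hypothetical toy table: one row per complex subunit.
# Column names and values are illustrative only.
subunits <- data.frame(
  subunit     = c("A1", "A2", "B1", "B2", "C1", "C2"),
  class       = c("variable", "variable", "variable", "stable", "stable", "stable"),
  has_paralog = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# The mean of a logical vector is the proportion of TRUE values,
# so this gives the fraction of subunits with a paralog per class.
prop_with_paralog <- tapply(subunits$has_paralog, subunits$class, mean)
print(prop_with_paralog)
```

Comparing the two proportions is then a matter of inspecting `prop_with_paralog["variable"]` against `prop_with_paralog["stable"]`.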

In particular, in the code below we run the script GetParalogs.R for each dataset considered. The script outputs different plots on the stability and variability of paralogs inside protein complexes. It also returns a file Experiment_Condition_PutativeSwitch.csv that contains all the putative switches for that dataset. The *_PutativeSwitch.csv files are for annotation purposes only and are not processed further in the analysis. These files contain comparisons between paralog pairs across the different organisms during neurogenesis; they can be considered a “paralog paired” version of the subunits_stability files.
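To make the idea of a paralog-paired comparison concrete, here is a small sketch under stated assumptions. It reads "relative fold change" as the difference between the log2 fold changes of the two paralogs, and flags a pair when the two paralogs move in opposite directions by more than an arbitrary threshold. Both the definition and the threshold are assumptions made for illustration; the gene names, column names and values are invented, and the actual columns of the *_PutativeSwitch.csv files may differ.

```r
# Hypothetical log2 fold changes for two paralog pairs (invented values).
pairs <- data.frame(
  paralog_a = c("GeneA1", "GeneB1"),
  paralog_b = c("GeneA2", "GeneB2"),
  log2fc_a  = c(1.5, 0.2),
  log2fc_b  = c(-1.0, 0.1),
  stringsAsFactors = FALSE
)

# Assumed definition: relative fold change of a versus b on the log2 scale.
pairs$rel_log2fc <- pairs$log2fc_a - pairs$log2fc_b

# Assumed criterion: opposite directions and a difference above the cutoff.
threshold <- 1  # arbitrary illustrative cutoff
pairs$putative_switch <- sign(pairs$log2fc_a) != sign(pairs$log2fc_b) &
  abs(pairs$rel_log2fc) > threshold

print(pairs)
```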

#Change Directory 
setwd("../ComplexScript/")

#Identifier
identifier <- "SYMBOL"

#experims and organism must be in the same order.
experims <- c("Djuric","Drerio","Frese","MouseTMT")
organism <- c("hsapiens","drerio","rnorvegicus","mmusculus")

#Output Dir
Dir <- "../out/complex_coexpr/"

#Run here <-----
#Files Dir
Files <- list.files(Dir)

#Dataset information
DataDF <- as.data.frame(cbind(experims,organism))

#Take Subunits Stabilities Files
SubunitsStabilities <- sort(Files[grep("subunits_stability",Files)])
ComplexStabilities <- sort(Files[grep("complex_stability",Files)])

DF <- as.data.frame(cbind(SubunitsStabilities,ComplexStabilities))
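#Experiment label: text before the first underscore of each file name
expLab <- sapply(DF[,"SubunitsStabilities"],function(x){strsplit(as.character(x),"_")[[1]][1]})
DF$exp <- expLab
#Match each experiment to its organism (full column name, no reliance on partial matching of DataDF$exp)
DF$organism <- DataDF$organism[match(DF$exp,DataDF$experims)]
DF$identifier <- rep(identifier,nrow(DF))
#Prepend the output directory to both file columns
DF[,c(1,2)] <- lapply(DF[,c(1,2)],function(x){paste(Dir,x,sep = "")})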
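The table built above pairs each subunits_stability file with a complex_stability file purely by sorted order, so it may help to see the logic on a toy input. The sketch below uses invented file names (the real names in ../out/complex_coexpr/ may differ) and adds an explicit check that the two file lists refer to the same experiments.

```r
# Hypothetical file names, for illustration only.
files <- c("Djuric_subunits_stability.csv", "Djuric_complex_stability.csv",
           "Drerio_subunits_stability.csv", "Drerio_complex_stability.csv")

experims <- c("Djuric", "Drerio")
organism <- c("hsapiens", "drerio")

# fixed = TRUE treats the pattern as a literal string; value = TRUE returns the names.
sub_files <- sort(grep("subunits_stability", files, fixed = TRUE, value = TRUE))
cpx_files <- sort(grep("complex_stability", files, fixed = TRUE, value = TRUE))

# Experiment label = text before the first underscore.
first_token <- function(x) vapply(strsplit(x, "_", fixed = TRUE), `[`, character(1), 1)
exp_lab <- first_token(sub_files)

# Fail early if the two lists are not paired by experiment.
stopifnot(identical(exp_lab, first_token(cpx_files)))

toy <- data.frame(SubunitsStabilities = sub_files,
                  ComplexStabilities  = cpx_files,
                  exp                 = exp_lab,
                  organism            = organism[match(exp_lab, experims)],
                  stringsAsFactors    = FALSE)
print(toy)
```

Using `match()` returns `NA` for an experiment label that is missing from `experims`, which is easier to spot than a silently malformed column.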

In this case the files we need are already present in the folder, so the script does not need to be run again.

#Change Directory 
setwd("../ComplexScript/")

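#Run GetParalogs.R once per dataset, passing the arguments on the command line
for (R in seq_len(nrow(DF)))
{
  #Arguments: subunits stability file, complex stability file, organism, identifier
  cmd <- paste("/gsc/biosw/src/R-4.0.3/bin/Rscript GetParalogs.R",DF[R,"SubunitsStabilities"],DF[R,"ComplexStabilities"],DF[R,"organism"],DF[R,"identifier"])
  system(cmd)
}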
  • The resulting plots are located inside the ComplexScript/Plots directory.
  • The resulting datasets are located inside the ComplexScript/Out directory.
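If the script has to be rerun on another machine, a slightly more defensive way of launching it could look like the sketch below. The helper `build_cmd` is hypothetical and not part of the original pipeline; the Rscript binary is taken from the running R installation (an assumption, since the original analysis pins a specific R 4.0.3 binary), and the file paths are invented examples.

```r
# Hypothetical helper: builds one command line with every part shell-quoted,
# so paths containing spaces do not break the call.
build_cmd <- function(rscript, script, args) {
  paste(c(shQuote(rscript), shQuote(script), shQuote(args)), collapse = " ")
}

# Rscript of the currently running R installation (assumption, see above).
rscript <- file.path(R.home("bin"), "Rscript")

cmd <- build_cmd(rscript, "GetParalogs.R",
                 c("../out/complex_coexpr/Djuric_subunits_stability.csv", # hypothetical path
                   "../out/complex_coexpr/Djuric_complex_stability.csv",  # hypothetical path
                   "hsapiens", "SYMBOL"))
print(cmd)

# system() returns the exit status of the command; non-zero signals failure.
# The call is only attempted when the script is present in the working directory.
if (file.exists("GetParalogs.R")) {
  status <- system(cmd)
  if (status != 0) warning("GetParalogs.R exited with status ", status)
}
```

Checking the exit status makes a failed dataset visible immediately instead of surfacing later as a missing plot or CSV file.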