Section 15 Complexes Co-Expression across Species

library(ggplot2)
library(ggrepel)
library(dplyr)
library(proteomicstools)

#Load functions
source("../ComplexScript/complexes_function.R")

In this script we run the complexes co-expression analysis to evaluate how complexes (and their subunits) are co-expressed across different datasets. We run the function on MS neuronal differentiation data.

The parameters for each dataset come from the DataInfo.txt file, which stores all the important information about each data file. This allows us to run the function ComplexesAndSubunitsCoexpression() on each dataset to calculate the stoichiometry co-expression of the different subunits.

#Load data info for each dataset
DataInfo <- read.table("../Data/Dataset/DataInfo.txt",sep = "\t",header = TRUE,stringsAsFactors = FALSE)
DataInfo
##                                                             filename
## 1 ../Data/Dataset/270519_MouseNeuron_TMT10_contrast_updatedNames.txt
## 2 ../Data/Dataset/270519_MouseNeuron_TMT10_contrast_updatedNames.txt
## 3 ../Data/Dataset/270519_MouseNeuron_TMT10_contrast_updatedNames.txt
## 4           ../Data/Dataset/processed/Frese_et_al_2017_processed.csv
## 5           ../Data/Dataset/processed/Frese_et_al_2017_processed.csv
## 6          ../Data/Dataset/processed/Djuric_et_al_2017_processed.csv
## 7          ../Data/Dataset/processed/Djuric_et_al_2017_processed.csv
## 8          ../Data/Dataset/processed/Djuric_et_al_2017_processed.csv
## 9             ../Data/Dataset/processed/ZebrafishNeurogProcessed.txt
##        Id.col              fold.change               fdr.col condition.col
## 1   Gene.name         logFC.DIV10.DIV3  adj.P.Val.DIV10.DIV3    condition3
## 2   Gene.name          logFC.DIV3.DIV0   adj.P.Val.DIV3.DIV0    condition1
## 3   Gene.name         logFC.DIV10.DIV0  adj.P.Val.DIV10.DIV0    condition2
## 4      SYMBOL            Log.DIV5.DIV1  Log.DIV5.DIV1.pvalue    condition1
## 5      SYMBOL           Log.DIV14.DIV1 Log.DIV14.DIV1.pvalue    condition2
## 6 Gene.Symbol            logFC.NPC.iPS     adj.P.Val.NPC.iPS    condition1
## 7 Gene.Symbol            logFC.Neu.iPS     adj.P.Val.Neu.iPS    condition2
## 8 Gene.Symbol            logFC.Neu.NPC     adj.P.Val.Neu.NPC    condition3
## 9    genename Log.Ratio.H.L.normalized          fdrtool.pval     condition
##      organism    out.label sep     complex.name ID.type species  condition
## 1   mmusculus Out/MouseTMT \\t   mouseGeneNames  SYMBOL      Mm DIV10.DIV3
## 2   mmusculus Out/MouseTMT \\t   mouseGeneNames  SYMBOL      Mm  DIV3.DIV0
## 3   mmusculus Out/MouseTMT \\t   mouseGeneNames  SYMBOL      Mm DIV10.DIV0
## 4 rnorvegicus    Out/Frese   ,     ratGeneNames  SYMBOL      Rn  DIV5.DIV1
## 5 rnorvegicus    Out/Frese   ,     ratGeneNames  SYMBOL      Rn DIV14.DIV1
## 6    hsapiens   Out/Djuric   ,   humanGeneNames  SYMBOL      Hs    NPC.iPS
## 7    hsapiens   Out/Djuric   ,   humanGeneNames  SYMBOL      Hs    Neu.IPS
## 8    hsapiens   Out/Djuric   ,   humanGeneNames  SYMBOL      Hs    Neu.NPC
## 9      drerio   Out/Drerio \\t Drerio.Gene.name  SYMBOL      Dr  Neur.Stem
##                                           paralogs.file
## 1   ../Data/Paralogs/mmusculus_SYMBOL_paralogs_v102.txt
## 2   ../Data/Paralogs/mmusculus_SYMBOL_paralogs_v102.txt
## 3   ../Data/Paralogs/mmusculus_SYMBOL_paralogs_v102.txt
## 4 ../Data/Paralogs/rnorvegicus_SYMBOL_paralogs_v102.txt
## 5 ../Data/Paralogs/rnorvegicus_SYMBOL_paralogs_v102.txt
## 6    ../Data/Paralogs/hsapiens_SYMBOL_paralogs_v102.txt
## 7    ../Data/Paralogs/hsapiens_SYMBOL_paralogs_v102.txt
## 8    ../Data/Paralogs/hsapiens_SYMBOL_paralogs_v102.txt
## 9      ../Data/Paralogs/drerio_SYMBOL_paralogs_v102.txt
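
Because every later step looks up columns by the names stored in DataInfo, it can be useful to check those names against the header of each data file before running the analysis. The following is a minimal sketch on toy data, not part of the original script: `check_row()` is a hypothetical helper, and the temporary file and its values are made up for illustration.

```r
# Toy data file standing in for one of the real datasets (made-up values).
toy_file <- tempfile(fileext = ".txt")
write.table(
  data.frame(Gene.name = c("Actb", "Tubb3"), logFC.DIV3.DIV0 = c(0.4, 1.2)),
  toy_file, sep = "\t", quote = FALSE, row.names = FALSE
)

# Toy stand-in for DataInfo with only the columns needed for the check.
# The second row deliberately points to a column the file does not contain.
toy_info <- data.frame(
  filename    = toy_file,
  Id.col      = "Gene.name",
  fold.change = c("logFC.DIV3.DIV0", "logFC.DIV10.DIV0"),
  sep         = "\\t",
  stringsAsFactors = FALSE
)

# check_row() is a hypothetical helper: it returns the column names that a
# DataInfo row refers to but that are missing from the data file header.
check_row <- function(info_row) {
  sep <- if (info_row$sep == "\\t") "\t" else info_row$sep
  header <- names(read.delim(info_row$filename, sep = sep, nrows = 1))
  setdiff(c(info_row$Id.col, info_row$fold.change), header)
}

missing_cols <- lapply(seq_len(nrow(toy_info)), function(i) check_row(toy_info[i, ]))
missing_cols
```

An empty result for a row means that all referenced columns were found; any names that are returned point to a mismatch between DataInfo.txt and the data file.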

A crucial role in this chunk of code is played by the function ComplexesAndSubunitsCoexpression(), which is located in the complexes_function.R script file. It takes as input an S4 object of class Complex, produced by the ComplexSet() function, and a data frame in which every protein is represented by one row. It also takes the name of the column that holds the fold changes between conditions, the condition name of the comparison, and the name of the column that holds the IDs.
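
To make the expected shape of the data frame input concrete, here is a small sketch with toy values. The gene names and fold changes are invented for illustration; only the column roles (an ID column and a fold-change column) and the NA filtering follow the script. The function call itself is left as a comment, since it needs complexes_function.R and a Complex object.

```r
# Toy table with one row per protein (made-up values).
MSdata <- data.frame(
  Gene.name        = c("Psma1", "Psma2", "Psmb1", "Rpl3"),
  logFC.DIV10.DIV0 = c(0.8, NA, 0.6, -1.1),
  stringsAsFactors = FALSE
)

# The three pieces of information that the function needs about the table.
Id.col         <- "Gene.name"
fold.change    <- "logFC.DIV10.DIV0"
condition.name <- "DIV10.DIV0"

# Keep only the proteins that have a fold change in this comparison.
MSdata <- MSdata[!is.na(MSdata[, fold.change]), ]
MSdata

# With complexes_function.R sourced and Complexes.data available, the call
# used in this script is:
# ComplexesAndSubunitsCoexpression(Complexes.data, MSdata, idcol = Id.col,
#                                  condition.name, fold.change)
```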

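#For each dataset
for (i in seq_len(nrow(DataInfo))) 
{
  #Get data information from the DataInfo file
  filename <- DataInfo[i,"filename"]
  Id.col <- DataInfo[i,"Id.col"]
  #The tab separator is stored as a backslash followed by "t" and is converted to a real tab
  sep <- DataInfo[i,"sep"]
  if(sep=="\\t"){sep<-"\t"}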
  fold.change <- DataInfo[i,"fold.change"]
  fdr.col <- DataInfo[i,"fdr.col"]
  condition.name <- DataInfo[i,"condition"]
  organism <- DataInfo[i,"organism"]
  out.label <- DataInfo[i,"out.label"]
  ID.type <- DataInfo[i,"ID.type"]
  OutName <- DataInfo[i,"out.label"]
  species <- DataInfo[i,"species"]
  paralogs.file <- DataInfo[i,"paralogs.file"]
  
  #Load Paralogs
  Paralogs <- ParalogsSet(organism,ID.type,filename = paralogs.file)
  
  #Output folder
  OutName <- unlist(strsplit(OutName,"/"))[2]
  OutName <- paste("../out/complex_coexpr/",OutName,sep = "")
  
  #Take Complexes
  Complexes.data <- ComplexSet(organism,ID.type)
  
  #Load Data, maintain only rows with values 
  MSdata <- read.delim(filename,sep = sep,header = TRUE)
  MSdata <- MSdata[!is.na(MSdata[,fold.change]),]
  
  #Run Complexes Co-expression
  Complexes.coexpr <- ComplexesAndSubunitsCoexpression(Complexes.data,MSdata,idcol = Id.col,condition.name,fold.change)
  
  #Extract the result tables
  complex_stability <- Complexes.coexpr$Complex
  subunit_stability <- Complexes.coexpr$Subunit
  MeanComplex <- Complexes.coexpr$MeanComplex
  
  #Add Paralogs Information
  subunit_stability$has.Paralogs <- hasParalogs(Paralogs,subunit_stability$Protein.ID)
  
  #write file
  write.csv(complex_stability,paste(OutName,condition.name,"complex_stability.csv",sep = "_"),row.names = FALSE)
  write.csv(subunit_stability,paste(OutName,condition.name,"subunits_stability.csv",sep = "_"),row.names = FALSE)
  write.csv(MeanComplex,paste(OutName,condition.name,"complex_MeanFC.csv",sep = "_"),row.names = FALSE)
}
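
The way the output file names are assembled can be traced for a single case. The sketch below repeats the naming steps of the loop for the values in row 1 of DataInfo; `out.file` is a name introduced here for illustration only.

```r
out.label      <- "Out/MouseTMT"   # out.label of row 1 in DataInfo
condition.name <- "DIV10.DIV3"     # condition of row 1 in DataInfo

# Same steps as in the loop: keep the part after "/" and prepend the folder.
OutName <- unlist(strsplit(out.label, "/"))[2]
OutName <- paste("../out/complex_coexpr/", OutName, sep = "")

out.file <- paste(OutName, condition.name, "complex_stability.csv", sep = "_")
out.file
# [1] "../out/complex_coexpr/MouseTMT_DIV10.DIV3_complex_stability.csv"

# write.csv() does not create missing folders, so the output directory has
# to exist before the loop runs. One possible way to ensure this:
# dir.create(dirname(out.file), recursive = TRUE, showWarnings = FALSE)
```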

For each dataset and condition, this script creates three files:

  • DataName_ConditionName_complex_MeanFC.csv, which contains the mean expression value within each complex
  • DataName_ConditionName_complex_stability.csv, which contains the complex co-expression values
  • DataName_ConditionName_subunits_stability.csv, which contains the co-expression values for the single subunits within each complex.
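
As an illustration of what a per-complex mean value looks like, the sketch below averages the fold changes of the subunits in each complex on toy data. This is an assumption about the general idea only and not the code of ComplexesAndSubunitsCoexpression(): the complex assignments, the values, and the column names `Complex`, `logFC` and `MeanFC` are made up.

```r
# Toy subunit table: complex membership and log fold changes are made up.
subunits <- data.frame(
  Complex    = c("Proteasome", "Proteasome", "Proteasome", "Ribosome", "Ribosome"),
  Protein.ID = c("Psma1", "Psma2", "Psmb1", "Rpl3", "Rpl4"),
  logFC      = c(0.8, 1.0, 0.6, -1.0, -1.2),
  stringsAsFactors = FALSE
)

# Mean log fold change of the subunits in each complex (one row per complex).
mean_fc <- aggregate(logFC ~ Complex, data = subunits, FUN = mean)
names(mean_fc)[2] <- "MeanFC"
mean_fc
```

In this toy table the three proteasome subunits change in the same direction with similar magnitude, and so do the two ribosomal subunits, which is the kind of pattern a co-expression analysis of complexes looks for.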