Section 15 Complexes Co-Expression across Species
library(ggplot2)
library(ggrepel)
library(dplyr)
library(proteomicstools)
#LoadFunctions
source("../ComplexScript/complexes_function.R")
In this script we run the complex co-expression analysis to evaluate the co-expression of complexes (and their subunits) across different datasets. We run the function on the following MS neuronal differentiation data:
- Danio rerio Neuron/Stem MS data
- Human in vitro neurogenesis from (Djuric et al., 2017)
- Rat in vitro neurogenesis from (Frese et al., 2017)
- Mouse in vitro neurogenesis TMT10 data.
The information for each dataset comes from the DataInfo.txt file, which stores
all the important parameters for each file. This allows us to call the function
ComplexesAndSubunitsCoexpression() on each dataset to calculate the
stoichiometry co-expression of the different subunits.
#Load data info for each dataset ----
DataInfo <- read.table("../Data/Dataset/DataInfo.txt",sep = "\t",header = TRUE,stringsAsFactors = FALSE)
DataInfo
## filename
## 1 ../Data/Dataset/270519_MouseNeuron_TMT10_contrast_updatedNames.txt
## 2 ../Data/Dataset/270519_MouseNeuron_TMT10_contrast_updatedNames.txt
## 3 ../Data/Dataset/270519_MouseNeuron_TMT10_contrast_updatedNames.txt
## 4 ../Data/Dataset/processed/Frese_et_al_2017_processed.csv
## 5 ../Data/Dataset/processed/Frese_et_al_2017_processed.csv
## 6 ../Data/Dataset/processed/Djuric_et_al_2017_processed.csv
## 7 ../Data/Dataset/processed/Djuric_et_al_2017_processed.csv
## 8 ../Data/Dataset/processed/Djuric_et_al_2017_processed.csv
## 9 ../Data/Dataset/processed/ZebrafishNeurogProcessed.txt
## Id.col fold.change fdr.col condition.col
## 1 Gene.name logFC.DIV10.DIV3 adj.P.Val.DIV10.DIV3 condition3
## 2 Gene.name logFC.DIV3.DIV0 adj.P.Val.DIV3.DIV0 condition1
## 3 Gene.name logFC.DIV10.DIV0 adj.P.Val.DIV10.DIV0 condition2
## 4 SYMBOL Log.DIV5.DIV1 Log.DIV5.DIV1.pvalue condition1
## 5 SYMBOL Log.DIV14.DIV1 Log.DIV14.DIV1.pvalue condition2
## 6 Gene.Symbol logFC.NPC.iPS adj.P.Val.NPC.iPS condition1
## 7 Gene.Symbol logFC.Neu.iPS adj.P.Val.Neu.iPS condition2
## 8 Gene.Symbol logFC.Neu.NPC adj.P.Val.Neu.NPC condition3
## 9 genename Log.Ratio.H.L.normalized fdrtool.pval condition
## organism out.label sep complex.name ID.type species condition
## 1 mmusculus Out/MouseTMT \\t mouseGeneNames SYMBOL Mm DIV10.DIV3
## 2 mmusculus Out/MouseTMT \\t mouseGeneNames SYMBOL Mm DIV3.DIV0
## 3 mmusculus Out/MouseTMT \\t mouseGeneNames SYMBOL Mm DIV10.DIV0
## 4 rnorvegicus Out/Frese , ratGeneNames SYMBOL Rn DIV5.DIV1
## 5 rnorvegicus Out/Frese , ratGeneNames SYMBOL Rn DIV14.DIV1
## 6 hsapiens Out/Djuric , humanGeneNames SYMBOL Hs NPC.iPS
## 7 hsapiens Out/Djuric , humanGeneNames SYMBOL Hs Neu.IPS
## 8 hsapiens Out/Djuric , humanGeneNames SYMBOL Hs Neu.NPC
## 9 drerio Out/Drerio \\t Drerio.Gene.name SYMBOL Dr Neur.Stem
## paralogs.file
## 1 ../Data/Paralogs/mmusculus_SYMBOL_paralogs_v102.txt
## 2 ../Data/Paralogs/mmusculus_SYMBOL_paralogs_v102.txt
## 3 ../Data/Paralogs/mmusculus_SYMBOL_paralogs_v102.txt
## 4 ../Data/Paralogs/rnorvegicus_SYMBOL_paralogs_v102.txt
## 5 ../Data/Paralogs/rnorvegicus_SYMBOL_paralogs_v102.txt
## 6 ../Data/Paralogs/hsapiens_SYMBOL_paralogs_v102.txt
## 7 ../Data/Paralogs/hsapiens_SYMBOL_paralogs_v102.txt
## 8 ../Data/Paralogs/hsapiens_SYMBOL_paralogs_v102.txt
## 9 ../Data/Paralogs/drerio_SYMBOL_paralogs_v102.txt
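Since the loop later in this section reads its parameters from DataInfo by column name, it can be useful to check the table before running it. The following is a minimal sketch and not part of the original script: the helper name `check_data_info` and the toy table are hypothetical, while the column names and the two example rows are taken from the printed DataInfo above. It also converts the escaped separator `\\t` into a real tab, which the loop does for each row.

```r
# Hypothetical helper (not part of complexes_function.R): verify that a
# DataInfo-like table has the columns the loop reads, and normalise "sep".
check_data_info <- function(info) {
  required <- c("filename", "Id.col", "fold.change", "fdr.col", "condition",
                "organism", "out.label", "sep", "ID.type", "species",
                "paralogs.file")
  missing <- setdiff(required, colnames(info))
  if (length(missing) > 0) {
    stop("DataInfo is missing columns: ", paste(missing, collapse = ", "))
  }
  # The file stores a tab as the two characters backslash + t
  info$sep[info$sep == "\\t"] <- "\t"
  info
}

# Toy table with two rows copied from the printed DataInfo (rows 1 and 4)
toy_info <- data.frame(
  filename = c("../Data/Dataset/270519_MouseNeuron_TMT10_contrast_updatedNames.txt",
               "../Data/Dataset/processed/Frese_et_al_2017_processed.csv"),
  Id.col = c("Gene.name", "SYMBOL"),
  fold.change = c("logFC.DIV10.DIV3", "Log.DIV5.DIV1"),
  fdr.col = c("adj.P.Val.DIV10.DIV3", "Log.DIV5.DIV1.pvalue"),
  condition = c("DIV10.DIV3", "DIV5.DIV1"),
  organism = c("mmusculus", "rnorvegicus"),
  out.label = c("Out/MouseTMT", "Out/Frese"),
  sep = c("\\t", ","),
  ID.type = c("SYMBOL", "SYMBOL"),
  species = c("Mm", "Rn"),
  paralogs.file = c("../Data/Paralogs/mmusculus_SYMBOL_paralogs_v102.txt",
                    "../Data/Paralogs/rnorvegicus_SYMBOL_paralogs_v102.txt"),
  stringsAsFactors = FALSE
)

checked <- check_data_info(toy_info)
```

With the real table, the same check could be applied as `DataInfo <- check_data_info(DataInfo)` right after `read.table()`.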
In this chunk of code, a crucial role is played by the function
ComplexesAndSubunitsCoexpression, which is located inside the
complexes_function.R script file. It takes as input an S4 object of class
Complex, produced by the ComplexSet function, plus a data frame in which every
protein is represented by a row. It also takes the name of the column that
contains the fold changes between conditions, the condition name of the
comparison, and the name of the column where the IDs are located.
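To make the expected shape of the inputs more concrete, here is a small sketch on toy data. It is not the implementation of ComplexesAndSubunitsCoexpression, whose internals are not shown in this section. The named list `complexes_toy` is a hypothetical stand-in for complex membership (it is not the S4 Complex class), and the per-complex mean and standard deviation of the fold changes are illustrative summaries chosen for this sketch only.

```r
# Toy MS table: one protein per row, an ID column and a fold-change column
ms_toy <- data.frame(
  Gene.name = c("A1", "A2", "A3", "B1", "B2"),
  logFC = c(1.0, 1.2, 0.8, -0.5, 1.5),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for complex membership (NOT the S4 Complex class)
complexes_toy <- list(
  ComplexA = c("A1", "A2", "A3"),
  ComplexB = c("B1", "B2")
)

# Illustrative summary: mean and standard deviation of the subunit fold
# changes within each complex (an assumption, not the original metric)
complex_summary <- do.call(rbind, lapply(names(complexes_toy), function(cx) {
  fc <- ms_toy$logFC[ms_toy$Gene.name %in% complexes_toy[[cx]]]
  data.frame(Complex = cx,
             n.subunits = length(fc),
             mean.FC = mean(fc),
             sd.FC = sd(fc),
             stringsAsFactors = FALSE)
}))

complex_summary
```

In this toy example the subunits of ComplexA move together (small spread around the mean), while those of ComplexB diverge, which is the kind of contrast a co-expression measure is meant to capture.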
#For each dataset
for (i in 1:nrow(DataInfo))
{
  #Get data information from DataInfo file
  filename <- DataInfo[i,"filename"]
  Id.col <- DataInfo[i,"Id.col"]
  sep <- DataInfo[i,"sep"];if(sep=="\\t"){sep<-"\t"}
  fold.change <- DataInfo[i,"fold.change"]
  fdr.col <- DataInfo[i,"fdr.col"]
  condition.name <- DataInfo[i,"condition"]
  organism <- DataInfo[i,"organism"]
  out.label <- DataInfo[i,"out.label"]
  ID.type <- DataInfo[i,"ID.type"]
  OutName <- DataInfo[i,"out.label"]
  species <- DataInfo[i,"species"]
  paralogs.file <- DataInfo[i,"paralogs.file"]

  #Load Paralogs
  Paralogs <- ParalogsSet(organism,ID.type,filename = paralogs.file)

  #Output folder
  OutName <- unlist(strsplit(OutName,"/"))[2]
  OutName <- paste("../out/complex_coexpr/",OutName,sep = "")

  #Take Complexes
  Complexes.data <- ComplexSet(organism,ID.type)

  #Load Data, maintain only rows with values
  MSdata <- read.delim(filename,sep = sep,header = T)
  MSdata <- MSdata[!is.na(MSdata[,fold.change]),]

  #Run Complexes Co-expression
  Complexes.coexpr <- ComplexesAndSubunitsCoexpression(Complexes.data,MSdata,idcol = Id.col,condition.name,fold.change)

  #save the datasets
  complex_stability <- Complexes.coexpr$Complex
  subunit_stability <- Complexes.coexpr$Subunit
  MeanComplex <- Complexes.coexpr$MeanComplex

  #Add Paralogs Information
  subunit_stability$has.Paralogs <- hasParalogs(Paralogs,subunit_stability$Protein.ID)

  #write file
  write.csv(complex_stability,paste(OutName,condition.name,"complex_stability.csv",sep = "_"),row.names = FALSE)
  write.csv(subunit_stability,paste(OutName,condition.name,"subunits_stability.csv",sep = "_"),row.names = FALSE)
  write.csv(MeanComplex,paste(OutName,condition.name,"complex_MeanFC.csv",sep = "_"),row.names = FALSE)
}
This script creates three different files for each dataset and condition, where DataName combines the output label and the condition name:
- DataName_complex_MeanFC.csv, which contains the mean expression value inside each complex
- DataName_complex_stability.csv, which contains the complex co-expression values
- DataName_subunits_stability.csv, which contains the co-expression values for single subunits inside each complex.
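The exact output paths follow from the `paste()` calls in the loop. The sketch below reproduces those steps outside the loop, using the values from row 1 of DataInfo; the variable names `suffixes` and `out_files` are introduced here for illustration only.

```r
# Values taken from row 1 of the printed DataInfo
out.label <- "Out/MouseTMT"
condition.name <- "DIV10.DIV3"

# Same steps as in the loop: keep the part after "/" and prepend the folder
OutName <- unlist(strsplit(out.label, "/"))[2]
OutName <- paste("../out/complex_coexpr/", OutName, sep = "")

# The three file suffixes used by the write.csv() calls
suffixes <- c("complex_stability.csv",
              "subunits_stability.csv",
              "complex_MeanFC.csv")

# paste() is vectorised, so this builds all three paths at once
out_files <- paste(OutName, condition.name, suffixes, sep = "_")
out_files
# "../out/complex_coexpr/MouseTMT_DIV10.DIV3_complex_stability.csv"
# "../out/complex_coexpr/MouseTMT_DIV10.DIV3_subunits_stability.csv"
# "../out/complex_coexpr/MouseTMT_DIV10.DIV3_complex_MeanFC.csv"
```

Note that `write.csv()` does not create missing directories, so the `../out/complex_coexpr/` folder needs to exist before the loop runs, for example via `dir.create("../out/complex_coexpr", recursive = TRUE, showWarnings = FALSE)`.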