Unverified commit ab6a9e22, authored by Romain Reuillon, committed by GitHub

Merge pull request #17 from samthiriot/plot-evolutions-1

Basic examples of plots of evolutions using RTask
parents bc349b42 788197cb
@@ -35,4 +35,6 @@ Note: if you are asked for a password for "gitlab.openmole.org" when pushing you
- [Sensitivity-Screening analysis](sensitivity/morris): a method to quickly analyze which inputs are influential on large spaces of parameters.
- [Global Sensitivity Analysis](sensitivity/saltelli): a variance based sensitivity analysis of model output
- [Test Functions for NSGA2](nsga2-test-functions): reference functions to double-check the correctness of the NSGA2 algorithm, and also to view examples of NSGA2 usage
- [Visualization for Genetic Algorithms based on R](genetic-algos-visu): examples of graphs produced automatically after the exploration, in many forms: images, videos or interactive diagrams
# Genetic Algos Visualizations based on R
## Motivation
When using [Genetic Algorithms in OpenMOLE](https://next.openmole.org/Genetic+Algorithms.html) for calibration or optimization, it is important for the user to understand what happens over time.
Graphing how an evolutionary method progresses is also necessary to communicate the results.
Among its basic features, OpenMOLE exports the populations as CSV files during the evolution.
The OpenMOLE web interface also makes it easy to plot these CSV files.
A simple way for a user to create graphs is the [RTask](https://next.openmole.org/R.html), which runs R code
from OpenMOLE. The first execution of such a task is slow, because OpenMOLE has to download data from the web and build a local container
with the necessary programs. Later executions reuse the same container and are quicker.
Producing graphs automatically after exploration and simulation experiments is part of the methodological recommendations for *reproducible research*. The intuition is that drawing graphs by hand increases the risk of human error: how can you be sure that the graph in your paper or presentation is really based on the latest version of your model, the latest simulations, the latest exploration of the parameter space? Producing the graphs automatically along with the simulation results guarantees systematic consistency between them.
## Optimization Problem
To illustrate visualization, we provide in [example_of_optimization.oms](./example_of_optimization.oms) a simple function
to optimize, together with a simple and quick optimization.
This file is imported by the other example workflows,
so any modification to this file affects all the other examples.
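For reference, the two objectives computed in example_of_optimization.oms can be transcribed into plain Scala, which makes it easy to sanity-check a few points of a plotted Pareto front outside OpenMOLE. This is only a sketch: the object name `SchafferN2` is hypothetical, and the function bodies copy the ScalaTask code of that file.

```scala
// Hypothetical plain-Scala transcription of the SchafferN2 objectives
// from the ScalaTask in example_of_optimization.oms.
object SchafferN2 {
  // first objective: piecewise linear in x
  def f1(x: Double): Double =
    if (x <= 1) -x
    else if (x <= 3) x - 2.0
    else if (x <= 4) 4.0 - x
    else x - 4.0

  // second objective: squared distance between x and 5
  def f2(x: Double): Double = math.pow(x - 5, 2)
}
```

For instance, `x = 5.0` yields `f1 = 1.0` and `f2 = 0.0`, which is one end of the trade-off the plots display.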
## Plot Last Iteration
![Last Pareto Front](./example_results/last_Pareto.png)
The simplest graph we can produce is the Pareto front of the last iteration of the genetic algorithm.
The example workflow [plot_last_iteration.oms](./plot_last_iteration.oms) provides an [RTask](https://next.openmole.org/R.html)
which lists the CSV files produced by a simulation, takes the most recently produced one (rather than the one with the highest iteration number, which might come from an older exploration), and graphs it.
In this simple example we only graph the two-dimensional space of objectives.
The RTask returns the produced file as an OpenMOLE File variable.
We then use the [File Copy Hook](https://next.openmole.org/Hooks.html#Hooktocopyafile) to copy this file produced during the execution into a local file.
Note that after the success of the execution, you will have to refresh the list of files on the left to view the new files.
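The file-selection rule described above (take the most recently modified file, not the highest iteration number) can be sketched in plain Scala. This illustrates the logic of the R code only; the `LatestResult` object is hypothetical, and unlike a bare `which.max`, it returns `None` for a missing or empty directory instead of failing.

```scala
import java.io.File

// Hypothetical helper mirroring which.max(file.info(...)$mtime) in the R task.
object LatestResult {
  // Most recently modified regular file in `dir`,
  // or None when the directory is missing or contains no file.
  def mostRecentFile(dir: File): Option[File] = {
    // listFiles() returns null when `dir` does not exist or is not a directory
    val files = Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty).filter(_.isFile)
    if (files.isEmpty) None else Some(files.maxBy(_.lastModified))
  }
}
```

With a stale `population1000.csv` left over from an older run and a fresh `population10.csv`, this picks `population10.csv`, which is exactly the case the modification-time rule is meant to handle.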
## Plot Every Iteration
![Iteration 1](./example_results/iteration0001.png) ![Iteration 10](./example_results/iteration0010.png) ![Iteration 100](./example_results/iteration0100.png)
The next simplest option is to plot the Pareto front of every iteration.
We use the [ggplot](https://ggplot2.tidyverse.org/) and [gganimate](https://gganimate.com/articles/gganimate.html) R packages,
which first analyze the entire dataset (and adapt the axes accordingly),
then plot each iteration as a separate frame.
In this example we save the image of each iteration to a file.
The default format is PNG, but you can use the `graphFormat` parameter to switch to another device supported by the [animate function](https://gganimate.com/reference/animate.html), such as JPEG, TIFF, or SVG, a vector format better suited for publication.
All the images are stored in a directory inside the RTask.
The directory is returned as an OpenMOLE File variable.
We can use the [File Copy Hook](https://next.openmole.org/Hooks.html#Hooktocopyafile) to copy the entire directory locally.
After refreshing the list of files in the OpenMOLE interface, you can download the directory containing all the pictures to your computer and browse them easily.
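The R tasks build this combined dataset by reading `population1.csv`, `population2.csv`, and so on, stopping at the first missing index. A plain-Scala sketch of the same loop follows, assuming that naming scheme and, optionally, one header line per file; the `AllIterations` object is hypothetical.

```scala
import java.io.File
import scala.io.Source

// Hypothetical helper mirroring the while (TRUE) loop of the R tasks.
object AllIterations {
  // population1.csv, population2.csv, ... up to (excluding) the first missing index
  def populationFiles(dir: File): List[File] =
    Iterator.from(1)
      .map(i => new File(dir, s"population$i.csv"))
      .takeWhile(_.exists)
      .toList

  // Data rows of all population files in iteration order,
  // dropping the first line of each file when it is a header.
  def concatenatedRows(dir: File, hasHeader: Boolean): List[String] =
    populationFiles(dir).flatMap { f =>
      val src = Source.fromFile(f)
      try {
        val lines = src.getLines().toList
        if (hasHeader) lines.drop(1) else lines
      } finally src.close()
    }
}
```

Note that a gap in the numbering (for example a missing `population3.csv`) silently ends the loop, in this sketch as in the R code.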
## Render a Video with Evolution
The next step is to concatenate the PNG images produced in the previous example into a video.
We demonstrate the production of an MP4 video with strong compression, in order to save bandwidth and energy.
The example [plot_iterations_as_mp4video.oms](./plot_iterations_as_mp4video.oms) installs [the ffmpeg software](https://ffmpeg.org/) inside the RTask container to encode the video.
We call it from the RTask using the standard command line, so you can tune it easily.
After refreshing the list of files on the left of the OpenMOLE interface, you have to download the video file to view it on your computer.
Here is [an example of the result](./example_results/iterations_video.mp4).
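The encoding step relies on two naming conventions: gganimate's `file_renderer(prefix = "iteration")` writes the frames as `iteration0001.png`, `iteration0002.png`, and so on (as in the example images above), and ffmpeg's `%04d` placeholder matches exactly that zero-padded counter. The sketch below rebuilds the same command line as the RTask as a Scala argument list; the `VideoCommand` object and its parameters are hypothetical.

```scala
// Hypothetical helper rebuilding the ffmpeg call of plot_iterations_as_mp4video.oms.
object VideoCommand {
  // Name of the n-th frame written by file_renderer with the given prefix,
  // e.g. "iteration0001.png"; ffmpeg's %04d placeholder expands to the same names.
  def frameName(prefix: String, n: Int): String = f"$prefix$n%04d.png"

  // Same options as the RTask: overwrite output, input frame rate,
  // zero-padded PNG sequence, H.264 encoding, output frame rate.
  def ffmpegArgs(framerate: Int, framesDir: String, output: String): Seq[String] =
    Seq(
      "ffmpeg", "-y",
      "-framerate", framerate.toString,
      "-i", s"$framesDir/iteration%04d.png",
      "-c:v", "libx264",
      "-r", framerate.toString,
      output
    )
}
```

Assuming ffmpeg is on the `PATH`, the list could be run with `scala.sys.process.Process(args).!`; tuning the compression (for instance a `-crf` value for libx264) only means adding arguments to the list.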
## Render an interactive Plotly Graph
The previous results are graph formats that can easily be embedded into presentations or publications.
But interactive graphs are better for the user to understand what happens during the evolution.
We demonstrate the usage of the well-known [Plotly](https://plotly.com/) library, which we call from [R](https://www.r-project.org/) through the handy [ggplotly](https://plotly.com/ggplot2/extending-ggplotly/) function of the plotly R package.
In this example, the task creates an interactive widget inside an HTML file.
To make it easier for the user, we create a zip archive with all the necessary files.
Download it, extract it on your computer, and open the html file with a web browser to visualize it.
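The archive step uses the R `zip` package after `setwd()`, so that the entries are stored relative to the widget directory. If you prefer to package such a directory on the JVM side, a plain-Scala sketch with `java.util.zip` could look like this; the `ZipDir` object is hypothetical and not part of the workflow.

```scala
import java.io.{File, FileOutputStream}
import java.nio.file.Files
import java.util.zip.{ZipEntry, ZipOutputStream}

// Hypothetical helper: zip a directory with entry names relative to it.
object ZipDir {
  // every regular file below `dir`, recursively
  private def allFiles(dir: File): Seq[File] =
    Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty).flatMap { f =>
      if (f.isDirectory) allFiles(f) else Seq(f)
    }

  // Writes every file below `dir` into `zipFile`.
  def zipDirectory(dir: File, zipFile: File): Unit = {
    val base = dir.toPath
    val out = new ZipOutputStream(new FileOutputStream(zipFile))
    try {
      for (f <- allFiles(dir)) {
        // zip entry names always use '/' as the separator
        val name = base.relativize(f.toPath).toString.replace(File.separatorChar, '/')
        out.putNextEntry(new ZipEntry(name))
        Files.copy(f.toPath, out)
        out.closeEntry()
      }
    } finally out.close()
  }
}
```

Relative entry names matter here because `saveWidget(selfcontained = F)` references its JavaScript dependencies through relative paths next to the HTML file.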
## Display Pairs
![Pairs](./example_results/pairs.png)
An easy way to discover the dynamics of any model is to plot a diagram showing the pairwise relationships between all the variables of a dataset.
We show a simple example in [plot_pairs.oms](./plot_pairs.oms) based on the [amazing GGally R package](http://ggobi.github.io/ggally/), which renders this graph to a PNG file.
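The pairs image grows quickly with the number of columns: `ggpairs` draws an n × n grid of panels, and plot_pairs.oms saves it at 6 cm per column and 150 dpi. The small Scala helper below makes the resulting size explicit; the name `PairsSize` is hypothetical and the pixel count is an approximation of what `ggsave` writes.

```scala
// Hypothetical helper estimating the size of the ggpairs output of plot_pairs.oms.
object PairsSize {
  // number of panels in the n x n ggpairs matrix
  def panels(columns: Int): Int = columns * columns

  // approximate side length in pixels of an image that is
  // columns * cmPerColumn centimetres wide, rendered at `dpi` dots per inch
  def sidePixels(columns: Int, cmPerColumn: Double = 6.0, dpi: Int = 150): Long =
    math.round(columns * cmPerColumn / 2.54 * dpi)
}
```

With one input, the population files have four columns (iteration, x, f1, f2), giving 16 panels in an image of roughly 1417 pixels per side; each extra column adds a full row and column of panels.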
## Next Steps
These examples only show how you can use standard and powerful R packages to graph the evolution of Genetic Algorithms
in OpenMOLE. Feel free to create your own examples and add them to this collaborative repository.
//model inputs
val x = Val[Double]
//model outputs
val f1 = Val[Double]
val f2 = Val[Double]
// about the current experiment
val relativePath = "results/SchafferN2"
// the test function
val testFunctionSchafferN2=
ScalaTask("""
val f1 = if (x <= 1) {
-x*1.0
} else if (x <= 3) {
x - 2.0
} else if (x <= 4) {
4.0 - x
} else {
x - 4.0
}
val f2 = Math.pow(x - 5, 2);
""") set (
inputs += x,
outputs += f1,
outputs += f2
)
// the optimisation algorithm under test
val evolutionSchafferN2 =
NSGA2Evolution(
genome = Seq(
x in (-5.0, 10.0)
),
objective = Seq(f1, f2),
evaluation = testFunctionSchafferN2,
parallelism = 8,
termination = 1000
)
val envMultiThread = LocalEnvironment(4, name="multithread")
// compute evolution on the test function
// (the plots are produced by the plot_*.oms workflows, which import this file;
// taskPlotLastParetoFront and its variables are not defined in this file)
evolutionSchafferN2 on envMultiThread hook (workDirectory/relativePath, keepAll=false)
// this variable will transmit the path where the CSV files to graph will be found
val directoryWithResults = Val[File]
// variables used to parameterize the graphing function
val filesHaveHeaders = Val[Int]
val countInputs = Val[Int]
val graphWidth = Val[Int]
val graphHeight = Val[Int]
val framerate = Val[Int]
// this variable will contain the video rendering of the evolution
val video = Val[File]
val taskPlotAsVideo = RTask("""
library(ggplot2)
library(gganimate)
colnames <- if (countInputs == 2) c("iteration", "x", "y", "f1", "f2") else c("iteration", "x", "f1", "f2")
coltypes <- if (countInputs == 2) c("integer", "numeric", "numeric", "numeric", "numeric") else c("integer", "numeric", "numeric", "numeric")
names(coltypes) <- colnames
directoryWithResultsName <- "mydirectory"
pop <- NULL
i <- 1
while (TRUE) {
# TODO check creation time of the file
filename <- paste(directoryWithResultsName,"/population",i,".csv", sep="");
if (!file.exists(filename)) {
break
}
#print(filename)
popraw <- read.csv(header = filesHaveHeaders > 0, col.names=colnames, colClasses=coltypes, file=filename)
#print(head(popraw))
pop <- if (is.null(pop)) popraw else rbind(pop, popraw)
i <- i + 1
}
# the ggplot
p <- ggplot(pop, aes(x=f1,y=f2)) + geom_point()
# render with gganimate
gganimation <- p + transition_states(iteration) + transition_time(iteration) #+ labs(title="iteration: {iteration}")
# ... first render individual PNG frames which are always of use
animate(gganimation,
renderer=file_renderer("/tmp/rendered", overwrite=T, prefix="iteration"),
height=graphHeight, width=graphWidth,
device='png')
# render as mp4
print("rendering as a video")
system(paste("ffmpeg -y -framerate", framerate, "-i /tmp/rendered/iteration%04d.png -c:v libx264 -r", framerate," /tmp/render.mp4"))
""",
install = Seq(
// update the list of available packages
"fakeroot apt-get update ",
// required; attempts to update dbus to a newer version would require permissions we do not have
"DEBIAN_FRONTEND=noninteractive fakeroot apt-mark hold dbus",
"""echo "dbus hold" | fakeroot dpkg --set-selections""",
// install the libs required for the compilation of R packages
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get install -y libssl-dev libcurl4-openssl-dev libudunits2-dev",
// install required R packages in their binary version (quicker and more stable!)
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get install -y r-cran-ggplot2 r-cran-gganimate r-cran-ggally r-cran-plotly r-cran-zip",
// install external tools in the VM for rendering
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get install -y ffmpeg",
), //
libraries = Seq() // were installed with the binary version earlier
) set (
inputFiles += (directoryWithResults, "mydirectory"),
outputFiles += ("/tmp/render.mp4", video),
inputs += filesHaveHeaders.mapped,
inputs += countInputs.mapped,
inputs += graphWidth.mapped,
inputs += graphHeight.mapped,
inputs += framerate.mapped,
filesHaveHeaders := 1,
countInputs := 2,
graphWidth := 600,
graphHeight := 600,
framerate := 5,
)
// import from the other file an example of optimization
import _file_.example_of_optimization._
// run the optimization
( evolutionSchafferN2 on envMultiThread hook (workDirectory/relativePath, keepAll=false)
) -- (
// then run the plotting function
taskPlotAsVideo set (
// ... which will read all the results from this file
directoryWithResults := workDirectory / relativePath,
// ... analyze them knowing there is only one input in the files
countInputs := 1
) hook CopyFileHook(video, workDirectory/"iterations_video.mp4" )
)
// this variable will transmit the path where the CSV files to graph will be found
val directoryWithResults = Val[File]
// variables used to parameterize the graphing function
val filesHaveHeaders = Val[Int]
val countInputs = Val[Int]
// this variable will contain the zip archive with the interactive Plotly page
val plotlyPage = Val[File]
val taskPlotAsVideo = RTask("""
library(ggplot2)
library(plotly)
library(htmlwidgets)
colnames <- if (countInputs == 2) c("iteration", "x", "y", "f1", "f2") else c("iteration", "x", "f1", "f2")
coltypes <- if (countInputs == 2) c("integer", "numeric", "numeric", "numeric", "numeric") else c("integer", "numeric", "numeric", "numeric")
names(coltypes) <- colnames
directoryWithResultsName <- "mydirectory"
pop <- NULL
i <- 1
while (TRUE) {
# TODO check creation time of the file
filename <- paste(directoryWithResultsName,"/population",i,".csv", sep="");
if (!file.exists(filename)) {
break
}
#print(filename)
popraw <- read.csv(header = filesHaveHeaders > 0, col.names=colnames, colClasses=coltypes, file=filename)
#print(head(popraw))
pop <- if (is.null(pop)) popraw else rbind(pop, popraw)
i <- i + 1
}
# the ggplot
pop$index <- 1:nrow(pop) # give each point a unique id so that plotly does not animate unrelated points into each other
p <- ggplot(pop, aes(x=f1,y=f2,l1=x)
) + geom_point(
alpha=0.7, colour = "#51A0D5", aes(frame=iteration,ids=index)
) + labs(
x = "f1",
y = "f2",
title = "Evolution of the Pareto front with NSGA2"
) + theme_classic()
# render as plotly
dir.create("/tmp/plotly")
fig <- ggplotly(p)
saveWidget(fig, selfcontained=F, file = "/tmp/plotly/evolution.html")
lf <- list.files("/tmp/plotly/", recursive=T, include.dirs=F)
print(lf)
# zip resulting files
library(zip)
setwd("/tmp/plotly/")
zip("/tmp/evolution.zip", lf)
print("done")
""",
install = Seq(
// update the list of available packages
"fakeroot apt-get update ",
// required; attempts to update dbus to a newer version would require permissions we do not have
"DEBIAN_FRONTEND=noninteractive fakeroot apt-mark hold dbus",
"""echo "dbus hold" | fakeroot dpkg --set-selections""",
// install the libs required for the compilation of R packages
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get install -y libssl-dev libcurl4-openssl-dev libudunits2-dev",
// install required R packages in their binary version (quicker and more stable!)
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get install -y r-cran-ggplot2 r-cran-gganimate r-cran-ggally r-cran-plotly r-cran-zip",
// install external tools in the VM for rendering
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get install -y ffmpeg",
), //
libraries = Seq() // were installed with the binary version earlier
) set (
inputFiles += (directoryWithResults, "mydirectory"),
outputFiles += ("/tmp/evolution.zip", plotlyPage),
inputs += filesHaveHeaders.mapped,
inputs += countInputs.mapped,
filesHaveHeaders := 1,
countInputs := 2
)
// import from the other file an example of optimization
import _file_.example_of_optimization._
// run the optimization
( evolutionSchafferN2 on envMultiThread hook (workDirectory/relativePath, keepAll=false)
) -- (
// then run the plotting function
taskPlotAsVideo set (
// ... which will read all the results from this file
directoryWithResults := workDirectory / relativePath,
// ... analyze them knowing there is only one input in the files
countInputs := 1
) hook CopyFileHook(plotlyPage, workDirectory/"evolutionPlotly.zip" )
)
// this variable will transmit the path where the CSV files to graph will be found
val directoryWithResults = Val[File]
// variables used to parameterize the graphing function
val filesHaveHeaders = Val[Int]
val countInputs = Val[Int]
val graphWidth = Val[Int]
val graphHeight = Val[Int]
val graphFormat = Val[String]
// this variable will contain the directory with one image per iteration
val pngForIterations = Val[File]
val taskPlotEvereyIteration = RTask("""
library(ggplot2)
library(gganimate)
colnames <- if (countInputs == 2) c("iteration", "x", "y", "f1", "f2") else c("iteration", "x", "f1", "f2")
coltypes <- if (countInputs == 2) c("integer", "numeric", "numeric", "numeric", "numeric") else c("integer", "numeric", "numeric", "numeric")
names(coltypes) <- colnames
directoryWithResultsName <- "mydirectory"
pop <- NULL
i <- 1
while (TRUE) {
# TODO check creation time of the file
filename <- paste(directoryWithResultsName,"/population",i,".csv", sep="");
if (!file.exists(filename)) {
break
}
#print(filename)
popraw <- read.csv(header = filesHaveHeaders > 0, col.names=colnames, colClasses=coltypes, file=filename)
#print(head(popraw))
pop <- if (is.null(pop)) popraw else rbind(pop, popraw)
i <- i + 1
}
# the ggplot
p <- ggplot(pop, aes(x=f1,y=f2)) + geom_point()
# render with gganimate
gganimation <- p + transition_states(iteration) + transition_time(iteration)# + labs(title="iteration: {iteration}")
# ... first render individual PNG frames which are always of use
animate(gganimation,
renderer=file_renderer("/tmp/rendered", overwrite=T, prefix="iteration"),
height=graphHeight, width=graphWidth,
device=graphFormat)
""",
install = Seq(
// update the list of available packages
"fakeroot apt-get update ",
// required; attempts to update dbus to a newer version would require permissions we do not have
"DEBIAN_FRONTEND=noninteractive fakeroot apt-mark hold dbus",
"""echo "dbus hold" | fakeroot dpkg --set-selections""",
// install the libs required for the compilation of R packages
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get install -y libssl-dev libcurl4-openssl-dev libudunits2-dev",
// install required R packages in their binary version (quicker and more stable!)
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get install -y r-cran-ggplot2 r-cran-gganimate r-cran-ggally r-cran-plotly r-cran-zip",
// install external tools in the VM for rendering
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get install -y ffmpeg",
), //
libraries = Seq() // were installed with the binary version earlier
) set (
inputFiles += (directoryWithResults, "mydirectory"),
outputFiles += ("/tmp/rendered", pngForIterations),
inputs += filesHaveHeaders.mapped,
inputs += countInputs.mapped,
inputs += graphWidth.mapped,
inputs += graphHeight.mapped,
inputs += graphFormat.mapped,
filesHaveHeaders := 1,
countInputs := 2,
graphWidth := 600,
graphHeight := 600,
graphFormat := "png"
)
// import from the other file an example of optimization
import _file_.example_of_optimization._
// run the optimization
( evolutionSchafferN2 on envMultiThread hook (workDirectory/relativePath, keepAll=false)
) -- (
// then run the plotting function
taskPlotEvereyIteration set (
// ... which will read all the results from this file
directoryWithResults := workDirectory / relativePath,
// ... analyze them knowing there is only one input in the files
countInputs := 1
) hook CopyFileHook(pngForIterations, workDirectory/"iterations as graphs" )
)
// this variable will transmit the path where the CSV files to graph will be found
val directoryWithResults = Val[File]
// variables used to parameterize the graphing function
val filesHaveHeaders = Val[Int]
val countInputs = Val[Int]
val graphWidth = Val[Int]
val graphHeight = Val[Int]
val graphFormat = Val[String]
// this variable will contain the file with the graphical rendering of the last Pareto front
val lastPareto = Val[File]
val taskPlotLastParetoFront = RTask("""
library(ggplot2)
colnames <- if (countInputs == 2) c("iteration", "x", "y", "f1", "f2") else c("iteration", "x", "f1", "f2")
coltypes <- if (countInputs == 2) c("integer", "numeric", "numeric", "numeric", "numeric") else c("integer", "numeric", "numeric", "numeric")
names(coltypes) <- colnames
directoryWithResultsName <- "mydirectory"
# check that the directory exists
if (!file.exists(directoryWithResultsName)) { stop(paste("ERROR: the directory", directoryWithResultsName, "does not exist!")) }
# get the most recent file (will be the last result)
allfiles <- file.info(list.files(directoryWithResultsName, full.names = T))
# fail with a clear message if the directory is empty (which.max would return nothing)
if (nrow(allfiles) == 0) { stop(paste("ERROR: no file found in", directoryWithResultsName)) }
lastfilename <- rownames(allfiles)[which.max(allfiles$mtime)]
print(lastfilename)
# read the last file
pop <- read.csv(header = filesHaveHeaders>0, col.names=colnames, colClasses=coltypes, file=lastfilename)
print(paste("there are", nrow(pop), "points on the last Pareto front"))
# plot
g <- ggplot(pop, aes(x=f1,y=f2)) + geom_point()
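dpi <- 72
# convert the requested size in pixels into inches, and save at the same dpi
# (ggsave defaults to 300 dpi, which would not give the requested pixel size)
ggsave(filename="/tmp/last_pareto",
device=graphFormat,
plot=g,
width=graphWidth/dpi, height=graphHeight/dpi, dpi=dpi)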
""",
install = Seq(
// update the list of available packages
"fakeroot apt-get update ",
// required; attempts to update dbus to a newer version would require permissions we do not have
"DEBIAN_FRONTEND=noninteractive fakeroot apt-mark hold dbus",
"""echo "dbus hold" | fakeroot dpkg --set-selections""",
// install the libs required for the compilation of R packages
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get install -y libssl-dev libcurl4-openssl-dev libudunits2-dev",
// install required R packages in their binary version (quicker and more stable!)
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get install -y r-cran-ggplot2 r-cran-gganimate r-cran-ggally r-cran-plotly r-cran-zip",
// install external tools in the VM for rendering
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get install -y ffmpeg",
), //
libraries = Seq() // were installed with the binary version earlier
) set (
inputFiles += (directoryWithResults, "mydirectory"),
outputFiles += ("/tmp/last_pareto", lastPareto),
inputs += filesHaveHeaders.mapped,
inputs += countInputs.mapped,
inputs += graphWidth.mapped,
inputs += graphHeight.mapped,
inputs += graphFormat.mapped,
filesHaveHeaders := 1,
countInputs := 2,
// in pixels
graphWidth := 600,
graphHeight := 600,
graphFormat := "png"
)
// import from the other file an example of optimization
import _file_.example_of_optimization._
// run the optimization
( evolutionSchafferN2 on envMultiThread hook (workDirectory/relativePath, keepAll=false)
) -- (
// then run the plotting function
taskPlotLastParetoFront set (
// ... which will read all the results from this file
directoryWithResults := workDirectory / relativePath,
// ... analyze them knowing there is only one input in the files
countInputs := 1,
// ... parameters of the graph
graphWidth := 500,
graphHeight := 500
) hook CopyFileHook(lastPareto, workDirectory/"last Pareto front.png" )
)
// this variable will transmit the path where the CSV files to graph will be found
val directoryWithResults = Val[File]
// variables used to parameterize the graphing function
val filesHaveHeaders = Val[Int]
val countInputs = Val[Int]
// this variable will contain the PNG file with the pairs plot
val pngForPairs = Val[File]
val taskPlotEvereyIteration = RTask("""
library(ggplot2)
library(GGally)
colnames <- if (countInputs == 2) c("iteration", "x", "y", "f1", "f2") else c("iteration", "x", "f1", "f2")
coltypes <- if (countInputs == 2) c("integer", "numeric", "numeric", "numeric", "numeric") else c("integer", "numeric", "numeric", "numeric")
names(coltypes) <- colnames
directoryWithResultsName <- "mydirectory"
pop <- NULL
i <- 1
while (TRUE) {
# TODO check creation time of the file
filename <- paste(directoryWithResultsName,"/population",i,".csv", sep="");
if (!file.exists(filename)) {
break
}
#print(filename)
popraw <- read.csv(header = filesHaveHeaders > 0, col.names=colnames, colClasses=coltypes, file=filename)
#print(head(popraw))
pop <- if (is.null(pop)) popraw else rbind(pop, popraw)
i <- i + 1
}
# the ggplot
n <- ncol(pop)
ggsave(filename="/tmp/pairs.png", ggpairs(pop), width=n*6, height=n*6, dpi = 150, units="cm", limitsize=F)
""",
install = Seq(
// update the list of available packages
"fakeroot apt-get update ",
// required; attempts to update dbus to a newer version would require permissions we do not have
"DEBIAN_FRONTEND=noninteractive fakeroot apt-mark hold dbus",
"""echo "dbus hold" | fakeroot dpkg --set-selections""",
// install the libs required for the compilation of R packages
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get install -y libssl-dev libcurl4-openssl-dev libudunits2-dev",
// install required R packages in their binary version (quicker and more stable!)
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get install -y r-cran-ggplot2 r-cran-gganimate r-cran-ggally r-cran-plotly r-cran-zip",
// install external tools in the VM for rendering
"DEBIAN_FRONTEND=noninteractive fakeroot apt-get install -y ffmpeg",
), //