Commit 036ff4e5 authored by Rita Sousa


Updated the NFR Tool to use the telemetry of NuvlaBox (which gives the consumption of Nodes), the Docker image of the dataClay and the Docker API (which gives the consumption of Workers)
parent fcbdec3c
......@@ -91,4 +91,8 @@ buildNumber.properties
# Ignore all local history of files
.history
app/.idea/
*/.idea/
**/.idea/
dataclay
# Pre-requisites
# README #
- The metrics are measured through perf_event_open, so the appropriate packages must be installed for your environment. Try running perf from the command line to see which packages are required, then install them.
More information about perf:
https://perf.wiki.kernel.org/index.php/Tutorial
http://www.brendangregg.com/perf.html
- Before running NFRTool, check the value in /proc/sys/kernel/perf_event_paranoid and change it to **-1**.
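The accepted levels are documented in the kernel's sysctl documentation; a small helper (illustrative only, not part of this repo) that maps a level to its meaning:

```shell
# Map a perf_event_paranoid level to its meaning (per kernel sysctl docs)
explain_paranoid() {
  case "$1" in
    -1) echo "not paranoid at all (all events allowed)" ;;
    0)  echo "disallow raw tracepoint access for unprivileged users" ;;
    1)  echo "also disallow CPU event access for unprivileged users" ;;
    2)  echo "also disallow kernel profiling for unprivileged users" ;;
    *)  echo "unknown level" ;;
  esac
}
# Explain the current setting (falls back to 2 if the file is unreadable)
explain_paranoid "$(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null || echo 2)"
```

NFRTool needs **-1** because it monitors processes it does not own.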
This demonstration shows the NFR tool (for the time and energy dimensions) performing various tasks, such as:
- listening to NuvlaBox telemetry (to obtain Node resource usage)
- pulling information from the Docker API (to obtain Worker resource usage) and writing that information (the time and energy metrics) to dataClay
- alerting the Global Resource Manager (GRM) in case of NFR violations
**Important: the dataclay folder is a Git submodule and must be initialized immediately after cloning this repository by calling:**
```
git submodule init
git submodule update
```
# Structure
- The **app** folder contains a demo for the NFR Tool time and energy
- The **dataclay** folder should correspond to the GitLab repo https://gitlab.bsc.es/elastic-h2020/elastic-sa/nfr-tool/dataclay
# SETUP
To use this demo, a connection to dataClay is mandatory. Before establishing this connection, run the **setupForDataclay.sh** script to configure some necessary settings.
You can check usage by executing:
```
./setupForDataclay.sh -h
```
In this demo, a fake ElasticSystem is created. To have several fake Workers (Docker containers) available, you should run:
```
# Allow perf to monitor other processes (see Pre-requisites)
cat /proc/sys/kernel/perf_event_paranoid
sudo sysctl -w kernel.perf_event_paranoid=-1
# Build the fake Workers image
cd app/fakeWorkers/
docker build -t fake_docker_workers .
```
Then, there are two options:
1. If you want to execute a specific fake Worker, pass the number of the fake Worker as a parameter:
```
docker run -e fakeWorker=NUM_FAKEWORKER -d --rm -it --name fakeworkerNUM_FAKEWORKER fake_docker_workers
```
where NUM_FAKEWORKER must be a number between 1 and 6.
2. If you simply run:
```
docker run -d --rm -it --name fakeworker fake_docker_workers
```
you will create a Worker with low CPU consumption (20-30%).
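The two options differ only in the fakeWorker number; a hypothetical wrapper (not part of the repo) that validates the parameter first, echoing the command instead of executing it since this is only a sketch:

```shell
# Build the docker run command for a given fake Worker number (1-6);
# echoes the command rather than running it (sketch only)
fakeworker_cmd() {
  case "$1" in
    [1-6]) echo "docker run -e fakeWorker=$1 -d --rm -it --name fakeworker$1 fake_docker_workers" ;;
    *) echo "NUM_FAKEWORKER must be a number between 1 and 6" >&2; return 1 ;;
  esac
}
fakeworker_cmd 3
```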
# Demo usage
Assuming that the dataClay model has been started and registered, the stubs have been obtained, and the stubs have been installed in the local Maven repository, you can run the demo, which includes Time and Energy requirements.
After creating the desired fake Workers (be careful not to overload the system), execute in the nfrtool-time-and-energy/app directory:
```
./run_nfrtool.sh
cd ..
./getWorkersPid.sh
```
to update the configuration files (pidsToMonitor.txt and ipsToMonitor.txt) with the PIDs and IPs of the fake Workers.
The master branch has the code to run on NVIDIA Jetson AGX Xavier.
If you need to run NFRTool on an Intel machine, switch to the "machine_x86" branch before running the demo (previous snippet):
```
git checkout machine_x86
```
Start dataClay, destroying any dataClay instance that may still be active.
If this Node is the **"master"** node, run:
```
docker-compose -f dataclay-master-docker-compose.yml down -v
docker-compose -f dataclay-master-docker-compose.yml up --build
```
If this Node is one of the **backend** nodes, execute:
```
docker-compose -f dataclay-backend-docker-compose.yml down -v
docker-compose -f dataclay-backend-docker-compose.yml up --build
```
**NOTE:** dcinitializer should only start once all dataClay nodes are up!
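One way to honor this ordering is a generic retry loop around a readiness probe (a sketch; the actual probe command for a dataClay node is an assumption and depends on your deployment):

```shell
# Retry a readiness command up to <retries> times, <delay> seconds apart,
# before starting dcinitializer; returns non-zero if it never succeeds
wait_for() {
  retries=$1; delay=$2; shift 2
  n=0
  until "$@"; do
    n=$((n+1))
    [ "$n" -ge "$retries" ] && return 1
    sleep "$delay"
  done
}
# Example (hypothetical probe): wait_for 10 5 nc -z <dataclay-node> <port>
```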
Then start the Global Resource Manager and the NFR Tools on the different Nodes by executing:
```
docker-compose down -v
docker-compose up --build
```
# Acknowledgements
This work has been supported by the EU H2020 project ELASTIC, contract #825473.
target
#Ignore directory
stubs/
cfgfiles/
# Default ignored files
/shelf/
/workspace.xml
# Datasource local storage ignored files
/dataSources/
/dataSources.local.xml
# Editor-based HTTP Client requests
/httpRequests/
nfrtool-demo
FROM bscdataclay/client:alpine
ENV WORKING_DIR=/demo
ARG DC_SHARED_VOLUME=/srv/dataclay/shared
ARG DEFAULT_NAMESPACE=defaultNS
ARG DEFAULT_USER=xavier-rit
ARG DEFAULT_PASS=defaultPass
ARG DEFAULT_STUBS_JAR=/demo/stubs.jar
ARG DEFAULT_STUBS_PATH=/demo/stubs
ENV DC_SHARED_VOLUME=${DC_SHARED_VOLUME} \
DATACLAYCLIENTCONFIG=${WORKING_DIR}/client.properties \
DATACLAYGLOBALCONFIG=${WORKING_DIR}/global.properties \
DATACLAYSESSIONCONFIG=${WORKING_DIR}/session.properties \
NAMESPACE=${DEFAULT_NAMESPACE} \
USER=${DEFAULT_USER} \
PASS=${DEFAULT_PASS} \
STUBSPATH=${DEFAULT_STUBS_PATH} \
STUBS_JAR=${DEFAULT_STUBS_JAR}
#PIDS_TO_MONITOR=${PIDS_TO_MONITOR}
#sed '/ENV WORKING_DIR/s/$/\nARG PIDS_TO_MONITOR=123 456 678/' app/Dockerfile
#sed '3s/$/\nARG PIDS_TO_MONITOR=123 456 678/' app/Dockerfile
WORKDIR ${WORKING_DIR}
# Install maven:
RUN apk --no-cache --update add maven openjdk11 mosquitto-clients curl
ENV PATH=/usr/lib/jvm/java-11-openjdk/bin/:${PATH}
# Copy files
COPY . ${WORKING_DIR}
VOLUME ${DC_SHARED_VOLUME}
ENTRYPOINT ["./dataclay_init.sh"]
CMD mvn clean compile exec:java -Dexec.cleanupDaemonThreads=false \
-Dexec.mainClass="app.NFRTool" \
-Dexec.args="wordcount --mode demo --ip 192.168.60.68 --agx" \
# Start dataClay without using Docker image
Follow the next steps to initialize dataClay:
Go to /dataclay directory
(Start dataClay)
```
./start_dataclay.sh
```
(Register model located in `model` folder into started dataClay instance)
```
./register_model.sh
```
(Get stubs of your registered model)
```
./get_stubs.sh
```
Go to /app directory
......@@ -14,7 +27,11 @@ Go to /app directory
Change the StubsClasspath in cfgfiles/session.properties to the correct path where the stubs generated by ./get_stubs.sh were created
To be able to compile the app outside a container, install the stubs in the local Maven repository:
```
./install_mvn_stubs.sh
```
Finally, run the demo, which includes Time and Energy requirements:
```
./run_nfrtool.sh
```
Account=xavier-rit
Password=defaultPass
DataSets=defaultDS
DataSetForStore=defaultDS
StubsClasspath=../dataclay/stubs
#!/bin/sh
set -e
#CONTRACT_ID_FILE=${DC_SHARED_VOLUME}/${NAMESPACE}_contractid
########################### create cfgfiles ###########################
#echo ${LOGICMODULE_HOST}
#echo ${DATACLAYCLIENTCONFIG}
printf "HOST=${LOGICMODULE_HOST}\nTCPPORT=${LOGICMODULE_PORT_TCP}" > ${DATACLAYCLIENTCONFIG}
echo "Account=${USER}
Password=${PASS}
DataSets=${DATASET}
DataSetForStore=${DATASET}
StubsClasspath=${STUBSPATH}" > ${DATACLAYSESSIONCONFIG}
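With the Dockerfile defaults above (USER=xavier-rit, PASS=defaultPass, STUBSPATH=/demo/stubs), and assuming the deployment sets DATASET=defaultDS as in cfgfiles/session.properties (DATASET is not set in the Dockerfile shown), the generated session config would look like:

```
Account=xavier-rit
Password=defaultPass
DataSets=defaultDS
DataSetForStore=defaultDS
StubsClasspath=/demo/stubs
```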
######################################################
# Wait for dataclay to be alive (max retries 10 and 5 seconds per retry)
dataclaycmd WaitForDataClayToBeAlive 10 5
# Wait for contract id in shared volume
#while [ ! -f ${CONTRACT_ID_FILE} ]; do echo "Waiting for contract ID at ${CONTRACT_ID_FILE}..."; sleep 5; done
# Get stubs
mkdir -p ${STUBSPATH}
dataclaycmd GetStubs ${USER} ${PASS} ${NAMESPACE} ${STUBSPATH}
# Package stubs
jar cvf ${STUBS_JAR} -C ${STUBSPATH} .
# Install stubs in local repository to use it as a pom dependency
mvn install:install-file -Dfile=${STUBS_JAR} -DgroupId=nfrtool \
-DartifactId=dataclay-stubs -Dversion=latest -Dpackaging=jar -DcreateChecksum=true
# Execute command
exec "$@"
FROM gcc:4.9
ENV fakeWorker 3
COPY . /DockerFakeWorkers
WORKDIR /DockerFakeWorkers/
#RUN gcc -o fakeworker1 fakeworker.c
# fakeworker1 N=500000
# fakeworker2 N=50000
# fakeworker3 N=5000
RUN sed -i 's/500000/50000/' fakeworker.c
RUN gcc fakeworker.c -o fakeworker2
RUN sed -i 's/50000/5000/' fakeworker.c
RUN gcc fakeworker.c -o fakeworker3
RUN sed -i 's/5000/500000/' fakeworker.c
RUN gcc fakeworker.c -o fakeworker1
# fakeworker4 NUM_THREADS 6
# fakeworker5 NUM_THREADS 4
# fakeworker6 NUM_THREADS 2
RUN sed -i 's/6/4/' fakethreadworker.c
RUN gcc fakethreadworker.c -o fakeworker5 -lpthread
RUN sed -i 's/4/2/' fakethreadworker.c
RUN gcc fakethreadworker.c -o fakeworker6 -lpthread
RUN sed -i 's/2/6/' fakethreadworker.c
RUN gcc fakethreadworker.c -o fakeworker4 -lpthread
RUN [ "sh", "-c", "echo ${fakeWorker}" ]
CMD ["sh", "-c", "./fakeworker${fakeWorker}"]
#FROM alpine as build-env
#RUN apk add --no-cache build-base
#WORKDIR /app
#COPY . .
#RUN gcc -o fakeworker1 fakeworker.c
#FROM alpine
#COPY --from=build-env /app/fakeworker1 /app/fakeworker1
#
#WORKDIR /app
#CMD ["app/fakeworker1"]
......@@ -9,31 +9,32 @@ pthread_t tid[NUM_THREADS];
void *function(void *var)
{
pthread_t id = pthread_self();
for (;;)
{
int n = 500000, i;
unsigned long long fact = 1;
for (i = 1; i <= n; ++i)
{
fact *= i;
}
printf("%lu\n",id);
usleep(1);
}
}
int main()
{
int i;
for (i = 0; i < NUM_THREADS; i++)
{
pthread_create(&(tid[i]), NULL, &function, NULL);
}
for (i = 0; i < NUM_THREADS; i++)
{
pthread_join((tid[i]), NULL);
}
return 0;
}
......@@ -6,14 +6,15 @@
int main()
{
for (;;)
{
int i;
unsigned long long fact = 1;
for (i = 1; i <= N; i++)
{
fact *= i;
}
usleep(1);
}
}
#!/bin/bash
# Needed when this script is executed by NFRTool.java
cd fake_worker
# fakeworker1 N=500000
# fakeworker2 N=50000
# fakeworker3 N=5000
sed -i 's/500000/50000/' fakeworker.c
gcc fakeworker.c -o fakeworker2
sed -i 's/50000/5000/' fakeworker.c
gcc fakeworker.c -o fakeworker3
sed -i 's/5000/500000/' fakeworker.c
gcc fakeworker.c -o fakeworker1
# fakeworker4 NUM_THREADS 6
# fakeworker5 NUM_THREADS 4
# fakeworker6 NUM_THREADS 2
sed -i 's/6/4/' fakethreadworker.c
gcc fakethreadworker.c -o fakeworker5 -lpthread
sed -i 's/4/2/' fakethreadworker.c
gcc fakethreadworker.c -o fakeworker6 -lpthread
sed -i 's/2/6/' fakethreadworker.c
gcc fakethreadworker.c -o fakeworker4 -lpthread
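The sed sequence above relies on each substitution restoring the source for the next build (50000 for fakeworker2, 5000 for fakeworker3, then back to 500000 for fakeworker1). A self-contained check of that round-trip on a sample line (fabricated file, illustration only):

```shell
# Demonstrate the round-trip: 500000 -> 50000 -> 5000 -> back to 500000
echo "int n = 500000, i;" > /tmp/fw.c
sed -i 's/500000/50000/' /tmp/fw.c   # state used to build fakeworker2
sed -i 's/50000/5000/'   /tmp/fw.c   # state used to build fakeworker3
sed -i 's/5000/500000/'  /tmp/fw.c   # restored for fakeworker1
grep -c 500000 /tmp/fw.c             # prints 1: the original value is back
```

Note that the substitutions are blind text replacements, so they would break if the constants appeared anywhere else in the file.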
#!/bin/sh
set -e
DELIMITER=" "
LIST_PIDS=""
LIST_IPS=""
for container_id in $(docker ps -q -f name=fakeworker)
do
NEW_PID=$(docker inspect -f '{{ .State.Pid }}' $container_id)
if [ -z "$NEW_PID" ]
then
continue
fi
LIST_PIDS=$LIST_PIDS$NEW_PID$DELIMITER
NEW_IP=$(docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $container_id)
if [ -z "$NEW_IP" ]
then
LIST_IPS=$LIST_IPS"null"$DELIMITER
else
LIST_IPS=$LIST_IPS$NEW_IP$DELIMITER
fi
done
echo $LIST_PIDS > pidsToMonitor.txt
echo $LIST_IPS > ipsToMonitor.txt
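The two output files are positionally aligned: the Nth PID pairs with the Nth IP, with "null" as a placeholder when a container has no IP. A quick self-contained sanity check over fabricated sample values (not real container data):

```shell
# Fabricated sample output (illustrative values only)
printf '%s\n' "12345 12367" > pidsToMonitor.txt
printf '%s\n' "172.17.0.2 null" > ipsToMonitor.txt
# The word counts must match, or PIDs would be paired with the wrong IPs
pids=$(wc -w < pidsToMonitor.txt)
ips=$(wc -w < ipsToMonitor.txt)
[ "$pids" -eq "$ips" ] && echo "aligned" || echo "mismatch"
```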
#!/bin/bash
cd ../dataclay/stubs
jar cvf ../stubs.jar .
mvn install:install-file -Dfile=../stubs.jar -DgroupId=es.bsc.compss -DartifactId=nfrtool-dataclay-stubs -Dversion=2.0 -Dpackaging=jar -DcreateChecksum=true
172.17.0.3 172.17.0.2
......@@ -4,7 +4,9 @@
<facet type="jpa" name="JPA">
<configuration>
<setting name="validation-enabled" value="true" />
<datasource-mapping />
<datasource-mapping>
<factory-entry name="nfrtool-demo" />
</datasource-mapping>
<naming-strategy-map />
</configuration>
</facet>
......@@ -74,5 +76,7 @@
<orderEntry type="library" name="Maven: com.ibm.wala:com.ibm.wala.cast.java.ecj:1.5.0" level="project" />
<orderEntry type="library" name="Maven: org.eclipse.jdt:org.eclipse.jdt.core:3.10.0" level="project" />
<orderEntry type="library" name="Maven: org.json:json:20190722" level="project" />
<orderEntry type="library" name="Maven: com.rabbitmq:amqp-client:5.10.0" level="project" />
<orderEntry type="library" name="Maven: org.slf4j:slf4j-api:1.7.30" level="project" />
</component>
</module>
......@@ -15,14 +15,14 @@
</properties>
<dependencies>
<dependency>
<groupId>es.bsc.compss</groupId>
<artifactId>nfrtool-dataclay-stubs</artifactId>
<version>2.0</version>
<groupId>nfrtool</groupId>
<artifactId>dataclay-stubs</artifactId>
<version>latest</version>
</dependency>
<dependency>
<groupId>es.bsc.dataclay</groupId>
<artifactId>dataclay</artifactId>
<version>2.1</version>
<version>2.5.1</version>
</dependency>
<dependency>
<groupId>org.json</groupId>
......@@ -35,9 +35,9 @@
<version>${amqp-client.version}</version>
</dependency>
<dependency>
<groupId>org.json</groupId>
<artifactId>json</artifactId>
<version>20190722</version>
<groupId>org.eclipse.paho</groupId>
<artifactId>org.eclipse.paho.client.mqttv3</artifactId>
<version>1.2.0</version>
</dependency>
</dependencies>
<build>
......
#!/bin/bash
pid=$1
time=$2
# Sample perf counters for PID $pid over $time seconds (CSV output via -x ,)
(perf stat -x , -p $pid sleep $time) > metrics$pid.log 2>&1
awk -v pide="$pid" -v time="$time" -F "," 'BEGIN { printf "{pid:%d,",pide}
NR==1{printf "%s:%.3f,",$4,($1/(1000*time))}
NR==2{printf "%s:%d,",$3,$1}
NR==3{printf "%s:%d,",$3,$1}
NR==4{printf "%s:%d,",$3,$1}
NR==5{printf "%s:%d,",$3,$1}
NR==6{printf "%s:%d,",$3,$1}
$3 ~ /instructions/{Ins=$1} $3 ~ /cycles/ {Cy=$1} END {printf "insn-per-cycle:%.3f,", Ins/Cy}
NR==7{printf "%s:%d,",$3,$1}
NR==8{printf "%s:%d,",$3,$1}
$3 ~ /branch-misses/{Bm=$1} $3 ~ /branches/ {B=$1} END {printf "branch-missed-perc:%.2f}\n", Bm/B*100}' metrics$pid.log
rm metrics$pid.log
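The awk program keys on field positions in perf's `-x ,` CSV output, which vary between perf versions, so the NR-based rules are fragile. The core extraction idea, shown over a fabricated CSV in the value,unit,event layout the script assumes (sample numbers are invented, not real measurements):

```shell
# Fabricated counter values in perf's value,unit,event CSV layout
cat > /tmp/perf_sample.csv <<'EOF'
500000000,,instructions
250000000,,cycles
EOF
# Same derived-metric idea as the script above: instructions per cycle
awk -F , '$3=="instructions"{ins=$1} $3=="cycles"{cy=$1}
          END{printf "insn-per-cycle:%.3f\n", ins/cy}' /tmp/perf_sample.csv
# prints insn-per-cycle:2.000
```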
#!/bin/bash
# Needed when this script is executed by TimeMonitor.java and EnergyMonitor.java
cd probes
gcc probeEnergyAGX.c -o probeEnergyAGX -lpthread
gcc probeTimeAGX.c -o probeTimeAGX -lpthread
#!/bin/bash
# Needed when this script is executed by TimeMonitor.java and EnergyMonitor.java
#cd probes
#if [ $USER = "ricardo" ]
#then
gcc probeEn.c -o probeEnergy -lpthread
#else
# gcc probeEnergy.c -o probeEnergy -lpthread
#fi
gcc probeTime.c -o probeTime -lpthread
......@@ -8,5 +8,5 @@ if [ $current_arch = "aarch64" ];
then
java -jar target/nfrtool-demo-2.1.jar testApp --mode demo --ip 127.0.0.1 --agx
else
java -jar target/nfrtool-demo-2.1.jar testApp --mode demo --ip 127.0.0.1 --etr 19.0 --ttr 0.34
fi
......@@ -47,7 +47,7 @@ public class ActivationWorkersManager implements Runnable {
if (activeWorkers.size() == 0){
System.out.println("No active Workers. Let's activate another Worker.");
if(activeWorkers.reactivateWorker("NaW")){ // No active Workers
//rm.updateActiveWorkersEverywhere();
} else {
System.out.println("1- There are no healthy workers to reactivate! No workers were activated!");
}
......@@ -60,7 +60,7 @@ public class ActivationWorkersManager implements Runnable {
if(actualCPUUsage < cpuReactivationThreshold && !activeWorkers.hasNan()){
System.out.printf("Current CPU usage of Node is %.2f. Let's activate another Worker.\n",actualCPUUsage);
if(activeWorkers.reactivateWorker("time")){
//rm.updateActiveWorkersEverywhere();
} else {
System.out.println("2- There are no healthy workers to reactivate! No workers were activated!");
}
......
......@@ -28,6 +28,7 @@ import org.json.JSONArray;
import org.json.JSONObject;
import es.bsc.compss.nfr.model.Worker;
import es.bsc.compss.nfr.model.Node;
public class ActiveWorkersMap {
......@@ -37,7 +38,6 @@ public class ActiveWorkersMap {
// This map records small history of the metrics and it is used by
// DataclayWritingManager to write average metric in Dataclay
private Map<Integer, JSONArray> activeWorkersMetricsHistory;
private boolean updateEnergy = false;
private boolean historyMetricsFilled = false;
......@@ -45,9 +45,8 @@ public class ActiveWorkersMap {
private final int COMPUTING_UNITS = Runtime.getRuntime().availableProcessors();
ActiveWorkersMap(List<Worker> workerList) {
this.workerList = workerList;
this.activeWorkerMetrics = new ConcurrentHashMap<>();
this.activeWorkersMetricsHistory = new ConcurrentHashMap<>();
}
......@@ -89,32 +88,35 @@ public class ActiveWorkersMap {
return null;
}
public String getViolationMessage(Node n, String dimension) {
//int workerPid = getFirstActiveWorkerPid();
//Worker w = getWorkerByPid(workerPid);
// FIXME: Find better way
String nodeIP = "localhost";
if(n.getIpEth() != null && !n.getIpEth().equals("")){
nodeIP = n.getIpEth();
} else if(n.getIpLte() != null && !n.getIpLte().equals("")){
nodeIP = n.getIpLte();
} else if(n.getIpWifi() != null && !n.getIpWifi().equals("")){
nodeIP = n.getIpWifi();
}
//String appUuid = w.getApplication().getUuid();
JSONObject obj = new JSONObject();
obj.put("dimension",dimension);
obj.put("nodeIP",nodeIP);
//obj.put("COMPSsAppUuid",appUuid);
return obj.toString();
}
public int getMonitoringPeriodByWorkerPid(int workerPid){
return getWorkerByPid(workerPid).getApplication().getMonitoringPeriod();
}
public void put(int workerPid, JSONObject metrics) {
try{
// Control the length of the history array (5 seconds = 5 metrics records)
int actualMonitoringPeriod = getMonitoringPeriodByWorkerPid(workerPid);
if (activeWorkersMetricsHistory.get(workerPid).length() == actualMonitoringPeriod) {
activeWorkersMetricsHistory.get(workerPid).remove(0);
}
// Correct the CPU usage by process to get in the whole "system", not by core
......
......@@ -17,174 +17,41 @@
package app;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import app.mqtt_callback.MqttCallBack;
import org.json.JSONObject;
import app.service.SendService;
import es.bsc.compss.nfr.model.Worker;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttMessage;
public class EnergyMonitor implements Runnable {
private ResourceManager rm;
private ActiveWorkersMap activeWorkers;
private float energyThreshold;
private Socket socketEnergy;
private String version;
private String powerName;
private static final int SERVER_ENERGY_PORT = 8687;
private final static String QUEUE_NAME = "products_queue";
EnergyMonitor(ResourceManager rm, ActiveWorkersMap activeWorkersMetrics, String version) {
EnergyMonitor(ResourceManager rm) {
this.rm = rm;
this.activeWorkers = activeWorkersMetrics;
energyThreshold = (float) 0.0;
this.version = version;
}
private void getNodeEnergyCapacity() {
if (activeWorkers.size() > 0) {
int workerPid = activeWorkers.getFirstActiveWorkerPid();
Worker worker = activeWorkers.getWorkerByPid(workerPid);
energyThreshold = worker.getNode().getEnergyThreshold();
}
}
public void run() {