A Reference Architecture for
Distributed Software
Deployment
DISSERTATION
for the purpose of obtaining the degree of doctor
at Delft University of Technology,
by authority of the Rector Magnificus Prof. ir. K.C.A.M. Luyben,
chairman of the Board for Doctorates,
to be defended in public
on Monday 3 June 2013 at 15:00
by
Sander VAN DER BURG
engineer in computer science (ingenieur informatica)
born in Numansdorp
This dissertation has been approved by the promotor:
Prof. dr. A. van Deursen
Copromotor: Dr. E. Visser
Composition of the doctoral committee:
Rector Magnificus, chairman
Prof. dr. A. van Deursen, Delft University of Technology, promotor
Dr. E. Visser, Delft University of Technology, copromotor
Prof. dr. R. di Cosmo, University of Paris-Diderot
Prof. dr. ir. D. H. J. Epema, Delft University of Technology and Eindhoven University of Technology
Prof. dr. Y. Tan, Delft University of Technology
Dr. E. Dolstra, LogicBlox Inc.
Dr. S. Malek, George Mason University
The work in this thesis has been carried out at the Delft University of Tech-
nology, under the auspices of the research school IPA (Institute for Program-
ming research and Algorithmics). The research was financially supported by
the Netherlands Organisation for Scientific Research (NWO)/Jacquard project
638.001.208, PDS: Pull Deployment of Services.
Copyright © 2013 Sander van der Burg
Cover: Lions Gate Bridge Vancouver, British Columbia, Canada
Printed and bound in The Netherlands by CPI Wöhrmann Print Service.
ISBN 978-94-6203-335-1
Preface
I vividly remember the first PhD defence ceremony I ever attended, that of a fellow villager, which coincidentally also took place at Delft University of Technology. At the time, I had just started studying for a master's degree in Computer Science at the same university. Although the ceremony made a lasting impression on me, I concluded that becoming a PhD student was a career step I would definitely never take. However, it turned out that I was wrong.
I have had a deep interest in programming and software technology from a very young age. For example, I wrote my very first computer program in the BASIC programming language on a Commodore 64.
At some point, I became interested in the do-it-yourself (DIY) construction of Linux distributions, best known through the Linux from Scratch (LFS) project (http://www.linuxfromscratch.org), as I was never satisfied with any of the common Linux distributions. The LFS project basically provides a book describing how to construct your own Linux system using nothing but source packages. Although it was fun to get a fully customised system working, it turned out that maintaining and upgrading such customised systems was complicated, time consuming and a big burden.
In the final year of my master's study, I met Eelco Visser, my co-promotor. Initially, I was looking for a Model Driven Engineering (MDE) thesis project, but after I told him about my interest in Linux from Scratch, he suggested that I take a look at a different project by a former PhD student of his (Eelco Dolstra), who had developed a package manager called Nix with very uncommon, yet very powerful features.
In the beginning, I was sceptical about Nix. However, when I read Eelco Dolstra's PhD thesis I learned more about the concepts and started liking it more and more. I also got involved with the rest of the Nix community, which had already attracted quite a few developers from various places around the world. Eventually, I decided to focus on using Nix and to change NixOS (a Linux distribution based on Nix) to have the features that I wanted, instead of doing everything myself.
Moreover, my master's thesis project was a collaboration with an industry partner, namely Philips Research. At Philips, I met Merijn de Jonge, who had been involved with the early stages of the development of Nix. As part of my master's thesis project, I developed the first Disnix prototype, allowing developers to deploy service-oriented systems in networks of machines, with a Philips system as a case study.
Although the initial prototype showed us a number of promising results, it
also raised many additional questions that had to be answered. Fortunately,
Philips and our research group had been working on a research project proposal, called the Pull Deployment of Services (PDS) project. When I was approached to become a PhD researcher in the PDS project, it was simply an offer I could not refuse. Therefore, I broke the promise that I had made to myself years ago.
ABOUT THIS THESIS
This thesis is the result of the work I have been doing for the last four years (or perhaps a bit longer). In the early stages of my research I got off to a very good start, getting a research paper accepted in the very first month. However, as with many PhD research projects, you have your ups (when your papers get accepted quite easily after peer review) and downs (when papers get rejected for reasons that are not obvious at all).
However, one of the things my promotor Arie van Deursen said about peer reviewing is: "Acceptance of a paper is nice, but rejections are there for you to learn". One comment from a particular reviewer that really kept me thinking concerned the difference between engineering and science, since we as researchers are supposed to focus on the latter, although our field of research lies within the former. Luckily, I was able to find a definition in the literature:
Research in engineering is directed towards the efficient accomplishment of spe-
cific tasks and towards the development of tools that will enable classes of tasks
to be accomplished more efficiently. [Wegner, 1976]
The open question that remains is: how can that vision be accomplished? In this thesis, I have tried to do so by describing the research that I have done, but also by giving relevant background information and by describing how a technique can be applied and implemented. Often, the adoption of research in practice lies in the details and in the way the knowledge is transferred.
I also had to make several "compromises", as opinions on a few tiny details differ among the people I have worked with. I even had to make a very big compromise with myself during this project. People who know me and have read this thesis will probably notice the two conflicting aspects I am referring to, but I leave that as an open exercise to the reader.
FURTHER INFORMATION

Apart from this thesis and the publications that have contributed to it, the tools developed as part of this thesis are available as Free and Open-Source software through the Nix project (http://nixos.org). Moreover, readers who want to learn more about the Nix project or any of its core concepts may also find it interesting to read Eelco Dolstra's PhD thesis [Dolstra, 2006].
ACKNOWLEDGEMENTS
This thesis could not have come about without the help of many people. Fore-
most, I wish to thank Eelco Dolstra, the author of Nix. He was involved in
nearly every aspect of this thesis and he was always sharp enough to spot
minor inconsistencies in my writings that I overlooked. Moreover, I wish to
thank my co-promotor Eelco Visser, who gave me the opportunity to do this
research. His guidance and advice were most helpful. My promotor Arie van Deursen helped me with many general aspects of research, such as putting a contribution into the right context and looking critically at scientific work.
Another name I should definitely mention is Merijn de Jonge, who wrote my
project proposal and was heavily involved in the early stages of my research
project. He gave me the inspiration to look at architectural concepts.
I would also like to thank the members of my reading committee, Roberto
di Cosmo, Sam Malek, Dick Epema, and Yao-hua Tan for reviewing my thesis.
They gave me insights I would not have thought of myself, and I am very happy that I was able to incorporate the majority of these new insights into this thesis.
Other important people that have contributed to this thesis are Daniel M.
German, Julius Davies and Armijn Hemel. Together, we wrote the paper that
contributed to Chapter 10 by combining our efforts in both software deploy-
ment and license compliance engineering.
Of course, there are many others who have contributed to my efforts in some way or another. Rob Vermaas, a fellow Nix-er and office mate (Eelco Dolstra, he and I shared the same room), made many practical contributions to Nix and many of the Nix subprojects. The three of us made a very interesting expedition across the island of O'ahu (part of Hawaii) in an SUV.
Moreover, the other members of our SLDE subgroup were also most helpful. Our projects have some overlap on the practical side and we always had very enjoyable daily stand-up meetings in the coffee corner. Therefore, I would like to thank Danny Groenewegen, Vlad Vergu, Gabriël Konat, Martin Bravenboer (from whom I borrowed the template for this thesis), Maartje de Jonge, Lennart Kats (who gave me a lot of advice on the practical details of publishing a thesis), Sander Vermolen, and Zef Hemel. Coincidentally, the latter four persons and I all seem to fit in my tiny car.
There are also various people at Philips, our project's industry partner, whom I would like to thank. Erwin Bonsma was the developer responsible for the build infrastructure of the Philips medical imaging platform and helped me a lot with some practical aspects. I would like to thank Nico Schellingerhout, Jan van Zoest, and Luc Koch for their continuous interest in my work and for giving me the opportunity to conduct experiments at Philips Healthcare. Wim
van der Linden was one of the developers of the SDS2 case study, who was
always willing to give me some technical advice when I needed it.
The #nixos IRC channel was a meeting point in cyberspace where I got in touch with many more people from all over the world contributing to Nix, all of whom I would like to thank. In particular, Ludovic Courtès, Shea Levy, Nicolas Pierron and Lluís Batlle i Rossell were developers I have frequently been in touch with and who gave me a lot of inspiration that contributed to several parts of this thesis.
I can also tell my readers that they should never underestimate the power of secretaries. Both Esther van Rooijen (from our university) and Cora Baltes-Letsch (from Philips) were very helpful in arranging many practical aspects of my employment, which saved me a lot of time. Besides my direct colleagues, contacts and project partners, there have been many more colleagues from our research group who have contributed to my work in some way or another, either by just having a look at what I was doing or by having a discussion. I have used Alberto González Sánchez's thesis cover design as the source of the layout for my cover. Thanks everyone!
Finally, I would like to thank my parents Hennie and Truus, and my brother
Dennis for their continuous support. Sometimes, when you are so busy doing “complicated” things, you get sloppy when it comes to the simple things in life. They have been most helpful in supporting me with the latter.
Sander van der Burg
May 06, 2013
Numansdorp
Contents
I Introduction 1
1 Introduction 3
1.1 Software deployment complexity . . . . . . . . . . . . . . . . . . 4
1.1.1 Early history . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.2 Operating systems and high level languages . . . . . . . 5
1.1.3 Component-based systems . . . . . . . . . . . . . . . . . 6
1.1.4 Service-oriented systems and cloud computing . . . . . 8
1.2 Hospital environments . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.2 Device-orientation . . . . . . . . . . . . . . . . . . . . . . 11
1.2.3 Service-orientation . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.1 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.2 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.3 Agility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.4 Genericity . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Research questions . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.6 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.7 Outline of this thesis . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.8 Origin of chapters . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2 Background: Purely Functional Software Deployment 23
2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 The Nix package manager . . . . . . . . . . . . . . . . . . . . . . 24
2.2.1 The Nix store . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.2 Specifying build expressions . . . . . . . . . . . . . . . . 25
2.2.3 Composing packages . . . . . . . . . . . . . . . . . . . . . 27
2.2.4 Building packages . . . . . . . . . . . . . . . . . . . . . . 28
2.2.5 Runtime dependencies . . . . . . . . . . . . . . . . . . . . 30
2.2.6 Nix profiles . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2.7 Atomic upgrades and rollbacks . . . . . . . . . . . . . . . 33
2.2.8 Garbage collection . . . . . . . . . . . . . . . . . . . . . . 33
2.2.9 Store derivations . . . . . . . . . . . . . . . . . . . . . . . 33
2.3 NixOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3.1 NixOS configurations . . . . . . . . . . . . . . . . . . . . 36
2.3.2 Service management . . . . . . . . . . . . . . . . . . . . . 37
2.3.3 NixOS modules . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3.4 Composing a Linux system . . . . . . . . . . . . . . . . . 41
2.4 Hydra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.4.1 Defining Hydra build jobs . . . . . . . . . . . . . . . . . . 43
2.4.2 Configuring Hydra build jobs . . . . . . . . . . . . . . . 44
2.5 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.5.1 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.5.2 Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3 A Reference Architecture for Distributed Software Deployment 49
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2 Reference architectures . . . . . . . . . . . . . . . . . . . . . . . . 50
3.3 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.3.1 Functional requirements . . . . . . . . . . . . . . . . . . . 50
3.3.2 Non-functional requirements . . . . . . . . . . . . . . . . 52
3.4 Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.4.1 Nix: Deploying immutable software components . . . . 54
3.4.2 NixOS: Deploying complete system configurations . . . 54
3.4.3 Disnix: Deploying services in networks of machines . . 54
3.4.4 Dynamic Disnix: Self-adaptive deployment of services
in networks of machines . . . . . . . . . . . . . . . . . . . 54
3.4.5 DisnixOS: Combining Disnix with complementary in-
frastructure deployment . . . . . . . . . . . . . . . . . . . 55
3.4.6 Dysnomia: Deploying mutable software components . . 55
3.4.7 grok-trace: Tracing source artifacts of build processes . . 55
3.5 Architectural patterns . . . . . . . . . . . . . . . . . . . . . . . . 55
3.5.1 Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.5.2 Purely functional batch processing style . . . . . . . . . 56
3.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.6.1 Hydra: Continuous building and integration . . . . . . . 58
3.6.2 webdsldeploy: Deploying WebDSL applications in net-
works of machines . . . . . . . . . . . . . . . . . . . . . . 58
3.7 An architectural view of the reference architecture . . . . . . . . 59
3.8 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
II Service Deployment 63
4 Distributed Deployment of Service-Oriented Systems 65
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2 Motivation: SDS2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.2.2 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.2.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.2.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2.5 Deployment process . . . . . . . . . . . . . . . . . . . . . 71
4.3 Disnix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3.2 Building a service . . . . . . . . . . . . . . . . . . . . . . 75
4.3.3 Composing intra-dependencies . . . . . . . . . . . . . . . 77
4.3.4 Services model . . . . . . . . . . . . . . . . . . . . . . . . 79
4.3.5 Infrastructure model . . . . . . . . . . . . . . . . . . . . . 80
4.3.6 Distribution model . . . . . . . . . . . . . . . . . . . . . . 82
4.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.4.1 Deployment process . . . . . . . . . . . . . . . . . . . . . 82
4.4.2 Disnix Service . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.4.3 Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.4.4 Additional tools . . . . . . . . . . . . . . . . . . . . . . . 91
4.5 Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.7 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.7.1 Component-specific . . . . . . . . . . . . . . . . . . . . . 93
4.7.2 Environment specific . . . . . . . . . . . . . . . . . . . . . 94
4.7.3 General approaches . . . . . . . . . . . . . . . . . . . . . 94
4.7.4 Common practice . . . . . . . . . . . . . . . . . . . . . . . 96
4.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5 Self-Adaptive Deployment of Service-Oriented Systems 97
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.2.1 StaffTracker (PHP/MySQL version) . . . . . . . . . . . . 98
5.2.2 StaffTracker (Web services version) . . . . . . . . . . . . 99
5.2.3 ViewVC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.2.4 SDS2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.3 Annotating Disnix models . . . . . . . . . . . . . . . . . . . . . . 101
5.3.1 Services model . . . . . . . . . . . . . . . . . . . . . . . . 102
5.3.2 Infrastructure model . . . . . . . . . . . . . . . . . . . . . 103
5.4 Self-adaptive deployment . . . . . . . . . . . . . . . . . . . . . . 103
5.4.1 Infrastructure generator . . . . . . . . . . . . . . . . . . . 105
5.4.2 Infrastructure augmenter . . . . . . . . . . . . . . . . . . 106
5.4.3 Distribution generator . . . . . . . . . . . . . . . . . . . . 107
5.5 Implementing a quality of service model . . . . . . . . . . . . . 108
5.5.1 Candidate host selection phase . . . . . . . . . . . . . . . 108
5.5.2 Division phase . . . . . . . . . . . . . . . . . . . . . . . . 111
5.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.6.1 StaffTracker (PHP/MySQL version) . . . . . . . . . . . . 113
5.6.2 StaffTracker (Web services version) . . . . . . . . . . . . 113
5.6.3 ViewVC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.6.4 SDS2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.6.5 General observations . . . . . . . . . . . . . . . . . . . . . 114
5.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.8 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
III Infrastructure Deployment 119
6 Distributed Infrastructure Deployment 121
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.2 Convergent deployment: Cfengine . . . . . . . . . . . . . . . . . 123
6.2.1 Similarities . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.2.2 Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.3 Other deployment approaches . . . . . . . . . . . . . . . . . . . 126
6.4 Congruent deployment . . . . . . . . . . . . . . . . . . . . . . . . 127
6.4.1 Specifying a network configuration . . . . . . . . . . . . 127
6.5 Implementation of the infrastructure deployment process . . . 130
6.6 Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.7 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.8 Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.9 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.10 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
6.11 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
7 Automating System Integration Tests for Distributed Systems 135
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.2 Single-machine tests . . . . . . . . . . . . . . . . . . . . . . . . . 137
7.2.1 Specifying and running tests . . . . . . . . . . . . . . . . 137
7.2.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . 139
7.3 Distributed tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.3.1 Specifying networks . . . . . . . . . . . . . . . . . . . . . 140
7.3.2 Complex topologies . . . . . . . . . . . . . . . . . . . . . 142
7.4 Service-oriented tests . . . . . . . . . . . . . . . . . . . . . . . . . 144
7.4.1 Writing test specifications . . . . . . . . . . . . . . . . . . 144
7.4.2 Deploying a virtual network . . . . . . . . . . . . . . . . 146
7.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
7.5.1 Declarative model . . . . . . . . . . . . . . . . . . . . . . 147
7.5.2 Operating system generality . . . . . . . . . . . . . . . . 148
7.5.3 Test tool generality . . . . . . . . . . . . . . . . . . . . . . 148
7.5.4 Distributed coverage analysis . . . . . . . . . . . . . . . . 149
7.5.5 Continuous builds . . . . . . . . . . . . . . . . . . . . . . 150
7.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
7.7 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
7.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
IV General Concepts 155
8 Atomic Upgrading of Distributed Systems 157
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.2 Atomic upgrading in Nix . . . . . . . . . . . . . . . . . . . . . . 158
8.2.1 Isolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
8.2.2 Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
8.2.3 Activation . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8.3 Distributed atomic upgrading . . . . . . . . . . . . . . . . . . . . 160
8.3.1 Two phase commit protocol . . . . . . . . . . . . . . . . . 160
8.3.2 Mapping Nix deployment operations . . . . . . . . . . . 161
8.4 Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.4.1 Supporting notifications . . . . . . . . . . . . . . . . . . . 162
8.4.2 The Hello World example . . . . . . . . . . . . . . . . . . 163
8.4.3 Adapting the Hello World example . . . . . . . . . . . . 163
8.4.4 Other examples . . . . . . . . . . . . . . . . . . . . . . . . 165
8.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
8.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
9 Deploying Mutable Components 169
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
9.2 The Nix deployment system . . . . . . . . . . . . . . . . . . . . . 170
9.3 Disnix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
9.4 Mutable components . . . . . . . . . . . . . . . . . . . . . . . . . 173
9.5 Dysnomia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
9.5.1 Managing mutable components . . . . . . . . . . . . . . 173
9.5.2 Identifying mutable components . . . . . . . . . . . . . . 174
9.5.3 Extending Disnix . . . . . . . . . . . . . . . . . . . . . . . 175
9.6 Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
9.7 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
9.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
9.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
10 Discovering License Constraints using Dynamic Analysis of Build
Processes 181
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
10.2 A Motivating Example . . . . . . . . . . . . . . . . . . . . . . . . 182
10.3 General Approach: Tracing Build Processes . . . . . . . . . . . . 184
10.3.1 Reverse-Engineering the Dependency Graph . . . . . . . 184
10.3.2 Why We Cannot Use Static Analysis . . . . . . . . . . . . 185
10.3.3 Dynamic Analysis through Tracing . . . . . . . . . . . . 187
10.4 Method: Tracing the Build Process through System Calls . . . . 187
10.4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
10.4.2 Producing the Trace . . . . . . . . . . . . . . . . . . . . . 189
10.4.3 Producing the Build Graph . . . . . . . . . . . . . . . . . 190
10.4.4 Using the Build Graph . . . . . . . . . . . . . . . . . . . . 193
10.4.5 Coarse-grained Processes . . . . . . . . . . . . . . . . . . 193
10.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
10.5.1 Correctness . . . . . . . . . . . . . . . . . . . . . . . . . . 194
10.5.2 Usefulness . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
10.6 Threats to Validity . . . . . . . . . . . . . . . . . . . . . . . . . . 197
10.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
10.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
V Applications 201
11 Declarative Deployment of WebDSL applications 203
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 203
11.2 WebDSL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
11.3 Examples of using the WebDSL language . . . . . . . . . . . . . 205
11.3.1 Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . 206
11.3.2 Pages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
11.3.3 Actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
11.4 Deploying WebDSL applications . . . . . . . . . . . . . . . . . . 208
11.4.1 Deploying the WebDSL compiler . . . . . . . . . . . . . . 208
11.4.2 Building a WebDSL application . . . . . . . . . . . . . . 208
11.4.3 Deploying WebDSL infrastructure components . . . . . 209
11.5 webdsldeploy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
11.5.1 Single machine deployment . . . . . . . . . . . . . . . . . 209
11.5.2 Distributing infrastructure components . . . . . . . . . . 210
11.5.3 Implementing load-balancing . . . . . . . . . . . . . . . . 210
11.5.4 Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
11.6 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
11.6.1 Building and configuring a WebDSL application . . . . . 212
11.6.2 Generating a logical network specification . . . . . . . . 214
11.6.3 Configuring WebDSL infrastructure aspects . . . . . . . 216
11.7 Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
11.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
11.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
12 Supporting Impure Platforms: A .NET Case Study 219
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
12.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
12.3 Global Assembly Cache (GAC) . . . . . . . . . . . . . . . . . . . 220
12.4 Deploying .NET applications with Nix . . . . . . . . . . . . . . . 221
12.4.1 Using Nix on Windows . . . . . . . . . . . . . . . . . . . 221
12.4.2 Build-time support . . . . . . . . . . . . . . . . . . . . . . 221
12.4.3 Run-time support . . . . . . . . . . . . . . . . . . . . . . . 223
12.4.4 Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
12.5 Deploying .NET services with Disnix . . . . . . . . . . . . . . . 229
12.5.1 Porting the StaffTracker example . . . . . . . . . . . . . . 229
12.6 Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
12.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
12.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
VI Conclusion 235
13 Conclusion 237
13.1 Summary of contributions . . . . . . . . . . . . . . . . . . . . . . 237
13.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
13.2.1 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
13.2.2 Reproducibility . . . . . . . . . . . . . . . . . . . . . . . . 240
13.2.3 Genericity . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
13.2.4 Extensibility . . . . . . . . . . . . . . . . . . . . . . . . . . 241
13.2.5 Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
13.3 Research questions revisited . . . . . . . . . . . . . . . . . . . . . 241
13.4 Recommendations for future work . . . . . . . . . . . . . . . . . 244
13.4.1 Network topologies . . . . . . . . . . . . . . . . . . . . . 244
13.4.2 Infrastructure-level self-adaptability . . . . . . . . . . . . 245
13.4.3 More declarative system integration testing of service-
oriented systems . . . . . . . . . . . . . . . . . . . . . . 245
13.4.4 Test scalability . . . . . . . . . . . . . . . . . . . . . . . . 246
13.4.5 Patterns for fully distributed atomic upgrades . . . . . . 247
13.4.6 Implementing a license calculus system . . . . . . . . . . 247
13.4.7 More sophisticated state deployment . . . . . . . . . . . 247
13.4.8 Extracting deployment specifications from codebases . . 247
13.4.9 Case studies with service composability . . . . . . . . . 248
13.4.10 Software deployment research in general . . . . . . . . . 248
Bibliography 251
Samenvatting 265
Curriculum Vitae 271
Titles in the IPA Dissertation Series 273
Part I
Introduction
1 Introduction
Software has become increasingly important in our society. Computer pro-
grams are virtually everywhere and used to control critical functions, such as
the stock market, aircraft, pacemakers, and other medical devices. Through-
out the years the way we use software has changed dramatically. Originally,
computer programs were designed for large expensive dedicated machines
with very limited access. Nowadays, computer programs are available as ser-
vices through the Internet and accessible by nearly anyone from almost any
place.
Furthermore, software systems have become bigger and more complicated, while we want to develop them in short development cycles and to have them work correctly. For these reasons, software engineering has become an important academic research discipline. Software engineering has been defined as:
Software Engineering (SE) is the application of a systematic, disciplined, quan-
tifiable approach to the development, operation, and maintenance of software,
and the study of these approaches; that is, the application of engineering to
software [The Joint Task Force on Computing Curricula, 2004].
The software engineering process consists of many activities, such as spec-
ifying requirements, creating a design, writing code, and testing. One of the
activities that is typically overlooked within the research community and un-
derestimated by practitioners is the software deployment process. A definition
of software deployment is:
Software deployment refers to all the activities that make a software system
available for use [Carzaniga et al., 1998].
Examples of activities that belong to the software deployment process are
building software from source code, packaging software, installing software
packages, activation of software components, and upgrading. For various rea-
sons, deployment processes turn out to be very complicated and error prone.
For the latest generation of software, which is offered as services through the Internet, many new challenges have arisen in addition to the deployment challenges of the past.
This thesis is about dealing with software deployment complexities for the
latest generation of software systems. We define a reference architecture that
contains tools solving various deployment tasks, which can be integrated with
domain-specific components that solve deployment aspects for a specific do-
main. From this reference architecture, a concrete architecture for a domain-
specific deployment system can be derived, offering fully automatic, reliable, reproducible, and efficient deployment. Although the idea of a deployment architecture is not completely new, our reference architecture builds on top of the purely functional deployment model [Dolstra, 2006], which has several unique advantages compared to conventional deployment solutions.

Figure 1.1 The Electrologica X1
1.1 SOFTWARE DEPLOYMENT COMPLEXITY
As mentioned previously, software deployment processes have become very complicated, error prone, and tedious. Compared to many other subfields within the software engineering domain (e.g. programming languages), software deployment did not become a research subject within the software engineering community until 1997 [van der Hoek et al., 1997].
1.1.1 Early history
When the first digital computers appeared in the 1940s and 1950s, software deployment was not a problem at all. Most computer programs were written directly in machine code for a given machine architecture. Sometimes the target machine architecture did not even exist yet, as it still had to be constructed, and was only available on paper.
In those days, writing programs was a very complex, challenging and tedious job. Apart from decomposing a problem into smaller subproblems that could be programmed, a programmer also had to deal with many side issues, such as managing hardware resources and properly accessing hardware peripherals, such as a printer. A famous example of developing programs in such a style is described in Edsger Dijkstra's PhD thesis [Dijkstra, 1959], in which a real-time interrupt handler was developed for the Electrologica X1 (shown in Figure 1.1), one of the first computers built in the Netherlands.

Figure 1.2 Commodore 64 combo showing a BASIC prompt
Because programs were written directly in machine code for a given archi-
tecture, programs were not portable and were mostly discarded and rewritten
when newer computer models appeared. For these reasons, a program was
deployed only once.
1.1.2 Operating systems and high level languages
Manually managing system resources and peripherals was a very tedious
job. For that reason, the concept of the operating system was developed in the
late 1950s, providing a set of programs that manages computer hardware re-
sources and provides common services for application software [Tanenbaum
and Woodhull, 1997]. As a result, programmers no longer had to care much about the inner workings of the hardware and could focus more on solving actual problems.
Furthermore, writing machine code was also very tedious, error prone, and difficult for humans to comprehend. Therefore, higher level languages were developed; FORTRAN was among the first high level programming languages. Operating systems, too, were eventually developed in higher level languages. UNIX is a prominent example, implemented in the C programming language, which is still widely used nowadays.
The Commodore 64 (shown in Figure 1.2), which gave me my first programming experience, is a good example of an operating system providing an integrated high-level language interpreter. Apart from the operating system and BASIC interpreter stored in ROM, a complete program, including its dependencies, was stored on a disk or tape.
Because of these developments, it became easier to develop applications, as operating systems hid complexity and high level languages increased productivity. As a consequence, the process of getting applications to work (i.e. deployment) became harder. For example, in order to be able to use a computer program implemented in a higher level language, we need the right version of the compiler or interpreter together with a compatible operating system. If either of these components is missing or incompatible, the program may not work as expected or may not work at all. Fortunately, the versions of operating systems and programming language compilers did not change frequently enough to make software deployment a real burden.

Figure 1.3 The NixOS Linux distribution running the KDE Plasma desktop
1.1.3 Component-based systems
The early 1980s were the era in which the graphical desktop environment slowly gained acceptance and in which the component-based software engineering (CBSE) discipline was introduced. Software programs were no longer self-contained units, but started reusing components (such as libraries) developed by third parties, some of which already resided on the target machines [Szyperski et al., 2002].
CBSE greatly improved programmer productivity. For example, to develop
a program for a desktop environment, developers no longer had to implement
GUI widgets over and over again. Furthermore, the ability to share software components also reduced the amount of disk space and RAM that programs require.
Although CBSE provides a number of great advantages, it also introduced
additional complexity and challenges to software deployment processes. In
order to be able to run a software program, all dependencies must be present
and correct and the program must be able to find them.
It turns out that there are all kinds of things that can go wrong while deploying a componentised system. A dependency may be missing, or a program may require a newer version of a specific component. Newer components may be incompatible with a program. Sometimes incompatibility is intentional, but it also happens accidentally due to a bug on which a program relies. In the Microsoft Windows community this phenomenon is known as the “DLL-hell” [Pfeiffer, 1998], but it occurs in many other contexts as well, such as the “JAR-hell” for Java programs.
In UNIX-like systems such as Linux, the degree of sharing of components through libraries is taken almost to the maximum. For these kinds of systems, it is crucial to have deployment tooling that properly manages the packages installed on a system. In Linux distributions, the package manager is a key aspect and a distinct feature that sets a particular Linux distribution apart from another. There are many package managers around, such as RPM [Foster-Johnson, 2003], dpkg, portage, pacman, and Nix (which we use extensively in our research).
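To give a first impression of what such package management looks like in practice with Nix, the sketch below shows a few typical command-line operations; the package name is purely illustrative, and the underlying mechanisms are explained in Chapter 2:

  $ nix-env -i hello        # install a package into the current user profile
  $ nix-env -u '*'          # upgrade all packages in the profile
  $ nix-env --rollback      # switch back to the previous profile generation

Each operation creates a new generation of the user's profile instead of destructively modifying the current one, which is what makes the rollback operation possible.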
Figure 1.3 shows a screenshot of the KDE Plasma desktop (http://www.kde.org) as part of the NixOS Linux distribution. At the time of writing this thesis, the base package of KDE 4.8 without any add-ons is composed of 173 software packages (a number determined by running the command-line instruction nix-store -qR /var/run/current-system/sw/bin/startkde | wc -l).
Apart from the challenges of deploying a system from scratch, many prob-
lems arise when systems are upgraded. Upgrading is necessary because it is
too costly and time consuming to redeploy them from scratch over and over
again. In most cases, upgrading is a risky process, because files get modi-
fied and overwritten. An interruption or crash may have disastrous results.
Furthermore, an upgrade may not always give the same result as a fresh in-
stallation of a system.
In addition to technical challenges, there are also several important non-
functional challenges while deploying systems using off-the-shelf components.
An important non-functional requirement may be the licenses under which
third party components are released. Nowadays, many Free and Open-Source
components [Free Software Foundation, 2012b; Open Source Initiative, 2012]
are incorporated into commercial products. Although most of these compo-
nents can be downloaded for free from the Internet, they are not in the public
domain. They are in fact copyrighted and distributed under licenses ranging
from simple permissive ones (allowing one to do almost anything including
keeping modifications secret) to more complicated “copyleft” licenses [Free
Software Foundation, 2012a] imposing requirements on derived works. Not
obeying these licenses could result in costly lawsuits by copyright holders.
Therefore, it is important to know exactly from what components a system is composed and under what licenses these components are distributed, which is not trivial for deployment processes performed in an ad-hoc manner. Note that the meaning of the Free and Open-Source terms is often misunderstood: they do not refer to software that is available for free (a.k.a. gratis), nor to software for which only the source code is available. Both terms are about software that may be freely used for any goal, distributed, studied, modified and even sold for any price. I wrote a blog post about this for further clarification [van der Burg, 2011f].

Figure 1.4 A collection of web applications and services
1.1.4 Service-oriented systems and cloud computing
The rise of the Internet and the World-Wide-Web (WWW) in the mid 1990s
caused a paradigm shift in the way some applications were offered to end-
users. Applications no longer had to be installed on the machines of end
users, but were accessed by using a web browser, also called a thin client. The
actual computer system processing user tasks was no longer located on the
client machine, but on a server hosting the web application.
Furthermore, the Internet protocols and file formats changed the way ap-
plications were developed and constructed. The service-oriented computing
(SOC) paradigm became a popular way to rapidly develop low-cost, interop-
erable, evolvable, and massively distributed applications [Papazoglou et al.,
2007].
The SOC vision is often realised by implementing a Service Oriented Architecture (SOA), in which a system is decoupled into “services”: autonomous, platform-independent entities that can be described, published, discovered, and combined, and that can perform functions ranging from answering simple requests to executing sophisticated business processes.
Cloud computing is a relatively new term coined for a new generation of
software systems where applications are divided into sets of composite ser-
vices hosted on leased, highly distributed platforms [Grundy et al., 2012].
Cloud computing providers offer services that can be roughly divided into three categories:
Infrastructure-as-a-Service (IaaS) is a service model providing computer
hardware resources, typically in the form of virtual machines. Cus-
tomers get charged based on the amount of system resources they are
using. IaaS typically provides simpler and more flexible hardware man-
agement.
Platform-as-a-Service (PaaS) is (in most cases) a solution stack providing
an application server, which includes an operating system, execution en-
vironment, database, web server and other relevant infrastructure com-
ponents. Application developers can develop and run their software ap-
plications on a PaaS platform without the cost and complexity of buying
and managing the underlying hardware and infrastructure components.
Software-as-a-Service (SaaS) provides application software to end-users,
which can be typically accessed by using thin clients. For these kinds
of services, all resources including the deployment of the service compo-
nents and underlying infrastructure are managed by the service provider.
SaaS applications may also utilise PaaS or IaaS services to provide their
hardware resources and infrastructure components.
Although there is no clear consensus on what cloud computing exactly means, in this thesis we regard it as a continuation of the service-oriented paradigm, in which services are offered through the Internet, including all their required resources, such as hardware and software, and their configuration management.
The latest generation of software systems, typically offered as services on the Internet, offers a number of additional benefits compared to the traditional desktop applications described in the previous subsection. End users are no longer bothered by deployment issues, and they always have access to the latest versions of their applications, including their data, which are accessible through their web browser from virtually any place.
Figure 1.4 shows a collection of prominent web applications and services, such as Twitter (http://www.twitter.com), Amazon (http://www.amazon.com) and Facebook (http://www.facebook.com). Although these applications seem to be relatively simple software systems, they are in fact very complicated and heterogeneous, as they are composed of many distributable software components
developed by various parties, implemented using various programming languages and various types of components, and deployed on several kinds of machines (having different characteristics) in various places in the world. All these services cooperate to achieve a common goal.
The web seems to have hidden most deployment issues from end-users, but
this does not mean that the software deployment problem is solved or not im-
portant anymore. Although the software deployment problems have moved
from end-user machines to data-centers, all the new possibilities of service-
orientation introduce even more challenges to the deployment process of soft-
ware systems, next to a number of existing complexities that have arisen in
the past. Since software components are distributed across various machines,
more machines need to be installed, configured and upgraded. Distributable
components have to connect to other services and may be dependent on each
other. Breaking such dependencies may render a software system partially or
completely unavailable.
Furthermore, there are all kinds of non-functional deployment properties
that must be met, such as performance and privacy concerns. For example,
if we deploy a component to a server with insufficient system resources, the entire service may perform so poorly that it does not attract any users. It is also undesirable to deploy a
privacy-sensitive dataset in a zone with public access.
1.2 HOSPITAL ENVIRONMENTS
The research in this thesis was carried out in the NWO/Jacquard project PDS (Pull Deployment of Services), a collaboration between Delft University of Technology and Philips Healthcare. The
research goal of this thesis is inspired by the medical domain. As with many
other domains, software systems in the medical domain are becoming service-
oriented. Apart from the software deployment challenges described in the
previous section, hospital environments have a number of distinct character-
istics and deployment challenges that require extra care.
1.2.1 Background
Hospitals are complex organisations, requiring the coordination of specialists
and support staff operating complex medical equipment, involving large data
sets, to take care of the health of large numbers of patients. The use of in-
formation technology for diagnosis and for storage and access of patient data
is of increasing importance. Hospitals are evolving into integrated informa-
tion environments, where patient data, ranging from administrative records
to high-density 3D images, should be accessible in real time at any place in
the hospital.
Data is used by people in different roles such as doctors, nurses, analysts,
administrators, and patients. Each user group uses different portions of the
data for different purposes and at different locations, requiring careful ad-
ministration and application of access rights. Data is also accessed by medical equipment and software, for example, to improve diagnosis by combining information from multiple sources. A picture symbolising this complexity is shown in Figure 1.5.

Figure 1.5 Hospital environment complexity in a nutshell
1.2.2 Device-orientation
The infrastructure of typical hospital environments is currently mostly device-
oriented. That is, the components implementing a workflow are statically de-
ployed to fixed devices, which leads to overcapacity due to suboptimal usage
of resources. Resources are reserved for particular workflows, even if not
used. This leads to inflexibility in reacting to events, a multitude of deploy-
ment and maintenance scenarios, and it requires users to go to the device that
supports a particular task. Because of these problems, the medical world is
changing to a service-oriented environment in which the access to services is
decoupled from the physical access to particular devices. That is, users should
be able to access data and perform computations from where they are, instead
of having to go to a particular device for realising a task.
However, the information technology infrastructure of hospitals is hetero-
geneous and consists of thousands of electronic devices, ranging from work-
stations to medical equipment such as MRI scanners. These devices are con-
nected by wired and wireless networks with complex topologies with differ-
ent security and privacy policies applicable to different nodes. These devices
have widely varying capabilities in terms of processing speed, graphical ren-
dering performance, storage space and reliability, and so on.
1.2.3 Service-orientation
To support running applications in such an environment, hospital machines
have to be treated as a dynamic cloud, where components of the applica-
tion are automatically deployed to machines in the cloud with the required
capabilities and connectivity. For instance, when starting a CPU-intensive
application (e.g., a viewer for 3D scans) on a sufficiently powerful PC, the
computation component of the application would be deployed automatically
to the local machine. On the other hand, if we ran it on an underpowered PDA, this component would be deployed to a fast server sufficiently close to the PDA in the network topology.
This kind of cloud deployment requires two things. First, it is necessary to
design applications in a way that allows them to be distributed across different
nodes dynamically, and to create a model of applications that describes their
components and the dataflows between them. These components can then
be mapped onto nodes in the cloud with the appropriate quality-of-service
characteristics.
Second, given a mapping from components to machines in the cloud, it
is necessary to deploy each component to its selected machine. Software de-
ployment in a heterogeneous environment is inherently difficult. Moreover,
maintaining such installations is even more difficult, because of the growing
amalgam of versions and variants of the software in combination with chang-
ing requirements. The practice of software deployment of complex medical
software in hospital environments is based on ad-hoc mechanisms, making
software deployment a semi-automatic process requiring significant human
intervention. Thus, it is essential that deployment is automatic and reliable;
the deployment of a component to a node should not interfere with other
applications or other versions of the component running on the node.
1.3 CHALLENGES
What may have become evident after reading the previous sections is that
changing non-functional requirements and various developments in software
engineering have significantly improved the quality and availability of soft-
ware systems, reduced the amount of required system resources and increased
the productivity of programmers. The negative side effect, however, is that software deployment processes have become increasingly difficult. Especially for the latest generation of software systems offered as services, a number of important challenges have to be dealt with.
1.3.1 Complexity
Software systems are becoming larger, which often implies that many components must be deployed and that many deployment tasks must be performed, such as building, installing and activating. Apart from the large num-
ber of tasks that must be performed, components may also be implemented in
various programming languages and designed for various types of operating
systems. For large systems, it is practically infeasible to perform deployment
processes manually.
Moreover, we have also seen that software systems are rarely self-contained. They have many dependencies which must be present and correct, both at build-time and run-time. To ensure correctness, developers and system administrators must know exactly what the dependencies of a component are, which is not always obvious. As a consequence, it often happens that dependencies are missed. For these reasons, systems are sometimes deployed from scratch rather than upgraded, because upgrade actions may break a system.
Apart from the correctness of the deployment process, a number of non-functional requirements may also be very important, such as performance, privacy and the licenses under which components are governed. The hospital environment challenges described in Section 1.2 can be considered a special case of service-orientation, with a higher diversity of devices with varying capabilities and more kinds of component variants that can be combined in various ways.
1.3.2 Reliability
Besides the fact that dependencies may be missed during an installation or an
upgrade, most deployment tasks are typically carried out in an imperative and
destructive manner. For example, during an upgrade, files may be removed,
adapted or overwritten. These kinds of operations are difficult to undo, and
an interruption of such a process may leave a system in an inconsistent state.
Moreover, an upgrade does not always yield the same result as a fresh instal-
lation and may give unexpected results.
Another undesirable side-effect of imperative upgrades is that there may
be a large time-window in which the system is partially or completely un-
available to end-users.
In addition to the reliability of deployment steps, systems deployed in a
distributed environment may also crash and render parts of a system unavail-
able. In such cases, we must redeploy a system, which is practically infeasible
to do in a short time window for systems deployed in an imperative and
ad-hoc manner.
1.3.3 Agility
In addition to the fact that software deployment complexity has increased, we
also want to develop and improve software systems faster. Whereas in the old
days, a programmer had to almost completely know the design of a system
in advance (e.g. waterfall methods) before it could be developed, nowadays
agile methods [Beck et al., 2001] are more common, with short development
cycles of about two weeks, in which an initial prototype is developed and
continuously improved to become the final product. After each development
cycle, a new version of a software system must be deployed, so that end-users
can try a new version of the application.
Agile software deployment has manifested itself in a term known as DevOps, an emerging set of principles, methods and practices for communication, collaboration and integration between software development and IT operations professionals [Pant, 2009]. Traditionally, developers and IT operators performed separate roles, which can be carried out better if they are more integrated.
Furthermore, software systems are not only deployed in production environments,
but also in test environments so that integration tests can be performed.
Because deployment is very complicated (especially for distributed systems),
people refrain from doing continuous integration tests for these kinds of
systems. Most importantly, if these systems are tested at all, they are only
tested on single machines and not in a distributed setting, while many types
of bugs and other issues only arise when software systems are tested in real
distributed settings [Rutherford et al., 2008].
Finally, once a system has been successfully tested in a test environment,
it may also be difficult to guarantee that we can reproduce the exact same
configuration in a production environment.
1.3.4 Genericity
Traditionally, software deployment processes were performed semi-automatically
and in an ad-hoc fashion. For large and complex systems, semi-automated
deployment is infeasible; therefore, automation is required to perform these
processes faster and to make them less error-prone.
Nowadays, there are many tools that can be used to automate software
deployment tasks. However, they have various drawbacks. Some tools are de-
signed for specific types of components, such as Java EE applications [Akker-
man et al., 2005], and other tools for specific environments, such as grid com-
puting [Caron et al., 2006]. There are also several general deployment solu-
tions, such as the Software Dock [Hall et al., 1999]. Apart from their limited
scope and applicability, these tools imperatively modify the configuration of
systems, so that it is hard to reproduce a configuration elsewhere and to roll
back changes.
Modern software systems, such as service-oriented systems, may be composed of
services implemented in various programming languages, using various component
technologies, and deployed on machines in a network running various types of
operating systems. Existing deployment tools make it difficult to support such
heterogeneous software systems and to ensure important quality attributes,
such as reliability.
1.4 PROBLEM STATEMENT
Software systems are developed by developers with certain goals, features and
behaviour in mind. Once a software system is to be used by end-users, it has
to be made available for use in the consumer environment or, for the latest
generation of systems, in a cloud environment hosted in a data center. It is
important that the software system behaves exactly the way the developers
intended.
The software deployment vision that we explore in this thesis is to com-
pletely automate deployment processes of the latest generation of software
systems and to perform such processes in a reliable, reproducible and efficient
manner. The intention is that software systems can be deployed frequently and
on demand. For example, we can reproduce deployment scenarios in various
environments, whether physical hardware, cloud environments or virtual
machines, for end-users or for testing. We have seen that this vision is
hard to achieve due to the complexities that newer developments have intro-
duced.
Nix [Dolstra et al., 2004; Dolstra, 2006] is a deployment system with a purely
functional deployment model, developed by Eelco Dolstra as part of his PhD
research, which intends to realise the vision of automated, complete, reliable
and reproducible deployment. The Nix package manager is a model-driven
deployment tool, which builds packages from Nix expressions. Furthermore, it
borrows concepts from purely functional programming languages [Hudak, 1989],
such as Haskell, to make deployment more reliable.
Most notably, Nix is used as a basis for NixOS [Dolstra and Löh, 2008;
Dolstra et al., 2010], a GNU/Linux distribution built around the Nix package
manager. Just as Nix realises packages from declarative specifications, NixOS
is a Linux distribution that derives complete system configurations from
declarative specifications.
Nix and NixOS provide solutions for several deployment aspects in our deployment
vision. However, they are designed for the deployment of single systems.
Furthermore, they are general solutions that do not take the non-functional
aspects of a domain into account, which cannot be addressed in a generic manner.
In order to support the latest generations of systems offered as services,
we have to support a number of distributed deployment aspects as well as
a number of non-functional deployment properties of a certain domain. As
modern systems are distributed, heterogeneous and have requirements
that cannot be addressed in a generic manner, it is impossible to develop a sin-
gle deployment tool that is suitable for all domains. Instead, we intend to
realise our deployment vision by means of a reference architecture [Bass et al.,
2003; Taylor et al., 2010] describing a family of domain-specific deployment
tools, containing components implementing various deployment concepts in
which domain-specific extensions can be integrated. A reference architecture
is defined by Taylor et al. [2010] as:
The set of principal design decisions that are simultaneously applicable to mul-
tiple related systems, typically within an application domain, with explicitly
defined points of variation.
Our reference architecture can be used to realise a concrete architecture
for a domain-specific deployment tool, capable of automating deployment
processes in a reliable, reproducible, and efficient manner while taking
domain-specific deployment aspects into account.
Although the idea of using software architecture concepts to automate
deployment processes is not entirely new (for example, it has been explored in
the IBM Alpine project [Eilam et al., 2006]), the major difference of our
approach is that we use the purely functional deployment model as a basis.
This offers a number of advantages, but also a number of challenges, mainly
due to a number of important non-functional requirements that must be met.
1.5 RESEARCH QUESTIONS
The goal of this thesis is to design a reference architecture for distributed
software deployment. To implement the components of this reference archi-
tecture, we have to provide answers for the following research questions:
Research Question 1
How can we automatically deploy heterogeneous service-oriented systems in a
network of machines in a reliable, reproducible and efficient manner?
The Nix package manager has been developed as a generic package management
tool for deploying packages on single systems. Service-oriented systems are
composed of distributable components (also called services) using various
technologies, which may be interconnected and can be deployed in networks of
machines having different operating systems, architectures and underlying
infrastructure components. In order to automate the deployment of these kinds
of systems, additional specifications and tools are required that properly
utilise the purely functional deployment concepts of Nix.
Apart from the service components that constitute a service-oriented system,
we also have to deploy the underlying infrastructure components, such as
application servers and a DBMS, as well as complete operating systems
supporting essential system services. Furthermore, system configurations can
be deployed in various environments, such as physical machines, virtual
machines hosted in a cloud environment, or hybrid variants. In order to be
flexible, it is desirable to be able to reproduce configurations in various
environments, such as moving a network of machines from one IaaS provider to
another.
Research Question 2
How can we efficiently deal with deployment related events occurring in a net-
work of machines?
After a service-oriented system has been deployed in a network of machines,
it is not guaranteed that it will stay operational, as many events may occur.
For example, a machine may crash, a network link may break, a new machine
offering additional system resources may be added to the network, or a machine
may be upgraded. In such cases, redeployment is needed to keep the overall
system operational and to meet all the required non-functional deployment
properties.
Furthermore, manually redeploying distributed systems in such cases can be
very laborious and error-prone. Therefore, it is desirable to explore whether
we can automatically redeploy in case of such events.
It is also possible to derive many infrastructure properties dynamically, by
using a discovery service which reveals machine characteristics and by using
deployment planning algorithms, which dynamically map services to machines
with the right characteristics. Characteristics can be either technical or
non-functional and specific to the domain in which the system is to be used.
In order to support these non-functional properties, we need to craft an
extended deployment tool which properly supports these requirements.
Research Question 3
How can we declaratively perform distributed system integration tests?
As mentioned earlier, systems should not only be deployed in production
environments, but also in test environments so that system integration tests
can be performed. A good practice for non-distributed componentised systems
is to do continuous integration and testing, i.e. for every change in the
source code repository the package gets built and tested on a build server.
For many GNU/Linux source packages, a test procedure can be executed by
running make check on the command-line.
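For instance, a typical GNU Autotools-style build and test session looks roughly as follows (a generic sketch, not tied to any particular package):

$ ./configure    # configure the package for the build machine
$ make           # compile the package
$ make check     # run the package's test suite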
For distributed systems, it is desirable to have an equivalent procedure, but
this vision is difficult to achieve, because deployment is very difficult and
expensive. As a consequence, people typically refrain from testing systems in
distributed environments, which is undesirable, since certain errors do not
manifest themselves on single systems. For all these reasons, it is desirable
to find a solution for this problem.
Research Question 4
How can we automatically and reliably deploy mutable software components in
a purely functional deployment model?
Nix is known as the purely functional deployment model, because it borrows
concepts from purely functional programming languages. As a result, one of
its characteristics is that components which have been built are made
immutable, to make deployment deterministic and reproducible. Unfortunately,
not all types of components can be managed in such a model, such as databases.
Apart from an initialisation step on first usage, these types of components
still need to be deployed manually, which can be a burden in a large network
of machines. To address this complexity, it is desirable to have a solution
that deploys mutable components with semantics close to those of Nix.
Research Question 5
How can we determine under which license components are governed and how
they are used in the resulting product?
We have also explained that license governance is a very important
non-functional requirement when deploying a software system. It is undesirable
to deploy a system which violates the licenses under which third-party
components are governed, as this may result in costly lawsuits by copyright
holders. Therefore, we must know exactly which files are used to produce a
specific artifact, such as an executable, and how they are used.
[Table 1.1: Overview of the research questions and the covering chapters. Rows: research questions 1-6; columns: Chapters 4-12; an X marks each chapter in which a research question is addressed.]
Research Question 6
How can we apply the components of the reference architecture?
The reference architecture for distributed software deployment is a means to
realise a concrete architecture for a domain-specific deployment tool.
Therefore, it is desirable to explore how we can implement such a
domain-specific tool and what lessons we can learn from it.
Furthermore, the Nix package manager (also known as the purely func-
tional deployment model) is typically applied in somewhat idealised environ-
ments in which components can be easily isolated. However, in practice, it
may happen that large existing codebases must be deployed using compo-
nents from the reference architecture in closed environments which cannot be
completely isolated. In such environments, adopting components from our
reference architecture is a non-trivial process and sometimes compromises
must be made. Therefore, it is important to explore the application of these
components in impure environments and to derive additional solutions and
compromises.
1.6 APPROACH
Table 1.1 provides an overview of the questions that were investigated and
their corresponding chapters. To answer the research questions, we first
thoroughly explore the concepts of Nix and NixOS, which serve as the
foundations of the reference architecture, as their concepts are quite
uncommon. From our requirements and these concepts, we derive the structure
of our reference architecture for distributed software deployment, including
a number of important quality attributes.
The next step is to investigate how we can extend the concepts of Nix to
service-oriented systems, that is, systems composed of components that can be
distributed across machines in a network. Then we explore another important
aspect, namely the deployment of machine configurations in a network with the
same deployment properties, such as reliability, reproducibility and
efficiency.
After exploring two distributed deployment aspects that must be captured
in our reference architecture, we try to improve and implement a number of
general orthogonal concepts. For reliability, it is desirable to have distributed
atomic upgrades. We must also explore how to manage components that
cannot be managed through the purely functional model. Moreover, it is
important to find a solution to the license governance problem.
Then we investigate how we can apply the components of our reference ar-
chitecture to realise a concrete architecture for a domain-specific deployment
tool as well as how we can apply these components in impure environments.
Finally, we reflect on the reference architecture and we investigate how well
we have achieved our desired quality attributes.
1.7 OUTLINE OF THIS THESIS
Chapter 2 gives background information about Nix, which is also known as the
purely functional deployment model, and two relevant applications built on top
of it: NixOS, an entire Linux distribution utilising Nix, and Hydra, a
Nix-based continuous build and integration service. This background
information serves as an important foundation of this thesis and is later used
in Chapter 3 to derive a reference architecture for distributed software
deployment, which can be used to create a concrete architecture for a
domain-specific deployment tool.
The following two chapters cover service deployment. Chapter 4 describes the
first component of the reference architecture: Disnix, a tool that can be used
to deploy services in networks of machines. Disnix extends Nix with additional
models and provides management of inter-dependencies, which are dependencies
among services that may reside on different machines in the network. Chapter 5
describes a self-adaptive deployment extension to Disnix, which automatically
redeploys service-oriented systems in case of an event.
The next two chapters cover infrastructure deployment aspects. Chapter 6
describes DisnixOS, an extension to Disnix that uses Charon, a NixOS-based
deployment tool that manages complete system configurations in a network of
machines, to provide complementary infrastructure deployment. Chapter 7
demonstrates how the network models of the previous chapter can be used for
creating efficient networks of virtual machines and shows how to perform
distributed system integration tests in these virtual networks.
The following two chapters describe several important general ingredients of
the reference architecture. Chapter 8 describes how the distributed atomic
upgrading property is achieved in our deployment tools.
Chapter 9 describes Dysnomia, a proof-of-concept tool for deploying mutable
software components. Currently, Nix only supports automated deployment of
immutable components, which never change after they have been built.
Databases, which are mutable, cannot be automatically deployed using Nix.
Dysnomia provides a solution for this problem and can be integrated into
Disnix.
In Chapter 10, we provide a solution for the licensing problem of software
components, by dynamically tracing build processes performed by Nix. These
traces can be used to produce graphs revealing which files are used during a
build and how these files are combined into the resulting artifacts, such as
an executable. This information can be used to determine whether a particular
deployment scenario violates a specific license policy.
The last chapters describe a number of applications using tools that are part
of the reference architecture. Chapter 11 demonstrates how the reference
architecture can be used to provide a deployment solution for a particular
application domain. We have developed an application-specific deployment tool
for WebDSL [Visser, 2008], a domain-specific language for developing web
applications with a rich data model. Chapter 12 describes experiences from our
collaboration with Philips Healthcare, in which we show how to apply pieces of
the reference architecture in impure environments.
1.8 ORIGIN OF CHAPTERS
Parts of this thesis originate from a number of peer-reviewed publications:
• The SCP 2012 paper: "Disnix: A toolset for distributed deployment" [van der Burg and Dolstra, 2012], as well as the earlier WASDeTT 2010 version of this paper [van der Burg and Dolstra, 2010d], covering development aspects and experimental applications, serve as the "backbone" of this thesis. Their sections have been integrated into many chapters of this thesis.
• Section 1.2 of this introduction is based on Sections 1-3 of the ICSE Cloud 2009 paper: Software Deployment in a Dynamic Cloud: From Device to Service Orientation in a Hospital Environment [van der Burg et al., 2009].
• Chapter 3 includes several small snippets taken from Section 6 of the SCP 2012 paper.
• Chapter 4 is based on the SEAA 2010 paper: Automated Deployment of a Heterogeneous Service-Oriented System [van der Burg and Dolstra, 2010a], combined with Sections 1-6 of the SCP 2012 paper, augmented with additional details and clarifications.
• Chapter 5 is an extension of the SEAMS 2011 paper: A Self-Adaptive Deployment Framework for Service-Oriented Systems [van der Burg and Dolstra, 2011].
• The DisnixOS section of Chapter 6 is based on Section 7 of the SCP paper.
• Chapter 7 is an extension of the ISSRE 2010 paper: Automating System Tests Using Declarative Virtual Machines [van der Burg and Dolstra, 2010b].
• Chapter 8 is loosely based on the HotSWUp 2008 paper: Atomic Upgrading of Distributed Systems [van der Burg et al., 2008], updated to reflect the current implementation and extended to cover both service and infrastructure deployment.
• Chapter 9 is an adaptation of the HotSWUp 2012 paper: A Generic Approach for Deploying and Upgrading Mutable Software Components [van der Burg, 2012].
• Chapter 10 is an adaptation of the Technical Report: Discovering Software License Constraints: Identifying a Binary's Sources by Tracing Build Processes [van der Burg et al., 2012].
• Chapter 11 contains snippets taken from Sections 6 and 7 of the WASDeTT 2010 paper.
The following technical report has not been directly used as a basis for
any chapter, but has contributed significantly to this thesis. In this paper, we
have integrated the deployment and testing disciplines for distributed sys-
tems, which is a vision we intend to realise in this thesis:
• The technical report: Declarative Testing and Deployment of Distributed Systems [van der Burg and Dolstra, 2010c].
The testing aspects of this paper have been continued in the ISSRE paper.
2 Background: Purely Functional Software Deployment
ABSTRACT
The goal of this thesis is to design a reference architecture for distributed
software deployment, utilising the Nix package manager, which implements the
purely functional deployment model. Nix offers several unique advantages over
conventional deployment approaches. Because this deployment model is
relatively unknown, this chapter explains its background concepts. The
concepts described in this chapter are earlier scientific contributions,
updated to reflect the implementation as it is today.
2.1 BACKGROUND
The Nix package manager has been developed as part of Eelco Dolstra’s PhD
thesis [Dolstra, 2006; Dolstra et al., 2004] and has been designed to overcome
certain limitations of conventional package managers and deployment tools,
such as RPM [Foster-Johnson, 2003], by borrowing concepts from purely func-
tional programming languages, such as Haskell [Hudak, 1989].
A big drawback of conventional deployment tools is that most of them do not
manage packages in isolation. Instead, their contents are stored in global
directories, such as /usr/lib and /usr/include on Linux or C:\Windows\System32
on Microsoft Windows platforms. The contents of these directories are shared
with other components.
Global directories introduce several problems to the reliability and repro-
ducibility of deployment processes. For example, dependency specifications
of packages could be incomplete, while the package may still be successfully
built on a particular machine, because a missing dependency can still be im-
plicitly found. Furthermore, it is also possible that the deployment of a pack-
age may imperatively modify or remove files belonging to another package,
which could break software installations.
Another major drawback is that conventional package managers use nom-
inal dependency specifications. For example, an RPM package may specify a
dependency by using a line, such as: Requires: openssl >= 1.0.0 to indicate that
a package requires OpenSSL version 1.0.0 or later. A nominal specification
has several drawbacks. For example, the presence of an OpenSSL package
conforming to this specification may not always yield a working deployment
scenario: OpenSSL may, for instance, have been compiled with an older version
of GCC having an Application Binary Interface (ABI) that is incompatible
with the given package.
Upgrades of systems are also potentially dangerous and destructive, as these
imperative operations cannot be trivially undone. Most importantly, while
upgrading, a system is temporarily inconsistent, because packages contain
files belonging to both the old version and the new version at the same time.
It may also very well happen that an upgrade of an older component does not
give the same result as a fresh installation.
2.2 THE NIX PACKAGE MANAGER
The Nix package manager has been designed to overcome the limitations de-
scribed earlier. In Nix, every package is stored in isolation by taking all build
dependencies into account. Furthermore, dependencies are statically bound
to components so that, at runtime, different versions of dependencies do not
interfere with each other. Nix addresses dependencies using an exact depen-
dency specification mechanism, which uniquely identifies a particular version
or variant of a component. Furthermore, the unorthodox concepts of Nix pro-
vide several other useful features, such as a purely functional domain-specific
language (DSL) to declaratively build packages, the ability to perform atomic
upgrades and rollbacks in constant time, transparent source/binary deploy-
ment and a garbage collector capable of safely removing packages which are
no longer in use.
2.2.1 The Nix store
In order to achieve the vision of reliable, pure and reproducible deployment,
Nix stores components in a so-called Nix store, which is a special directory in
the file system, usually /nix/store. Each directory in the Nix store is a compo-
nent stored in isolation, since there are no files that share the same name. An
example of a component name in the Nix store is: /nix/store/1dp59cdv...-hello-2.7.
A notable feature of the Nix store is the component names. The first part of
the file name, e.g. 1dp59cdv..., is a SHA256 cryptographic hash [Schneier,
1996] in base-32 notation of all inputs involved in building the component.
The hash is computed over all inputs, including:
• The sources of the component
• The script that performed the build
• Any command-line arguments or environment variables passed to the build script
• All build-time dependencies, such as the compiler, linker, libraries and standard UNIX command-line tools, such as cat, cp and tar
The cryptographic hashes in the store paths serve two main goals:
• Preventing interference. The hash is computed over all inputs involved in the build process of the component. Any change, even if it is just a white space in the build script, will be reflected in the hash. For all dependencies of the component, the hash is computed recursively. If two component compositions are different, they will have different paths in the Nix store. Because hash codes reflect all inputs used to build the component, the installation or uninstallation of a component will not interfere with other configurations.
• Identifying dependencies. The hash in the component file names also prevents the use of undeclared dependencies. Instead of searching for libraries and other dependencies in global name spaces such as /usr/lib, we have to give paths to the components in the Nix store explicitly. For instance, the following compiler instruction:
$ gcc -o test test.c lib.c -lglib
will fail because we have to specify the explicit path to the glib library. If we pass the -L/nix/store/3a40ma...-glib-2.28.8/lib argument to the compiler, the command will succeed, because it is able to find a specific glib library due to the path specification that we have provided.

{stdenv, fetchurl, perl}: 1
stdenv.mkDerivation { 2
name = "hello-2.7"; 3
src = fetchurl { 4
url = mirror://gnu/hello/hello-2.7.tar.gz;
sha256 = "1h17p5lgg47lbr2cnp4qqkr0q0f0rpffyzmvs7bvc6vdrxdknngx";
};
buildInputs = [ perl ]; 5
meta = { 6
description = "GNU Hello, a classic computer science tool";
homepage = http://www.gnu.org/software/hello/;
};
}
Figure 2.1 applications/misc/hello/default.nix: A Nix expression for GNU Hello
2.2.2 Specifying build expressions
In order to build packages, Nix provides a purely functional domain-specific
language to declaratively specify build actions. Figure 2.1 shows a Nix
expression, which describes how to build GNU Hello
(http://www.gnu.org/software/hello) from source code and its build-time
dependencies:
1 Essentially, the expression is a function definition, taking three argu-
ments. The function arguments represent the build-time dependencies
needed to build the component. The stdenv argument is a component
providing a standard UNIX utility environment offering tools such as
cat and ls. Furthermore, this component also provides a GNU build
toolchain, containing tools such as gcc, ld and GNU Make. The fetchurl
argument provides a function, which downloads source archives from
an external location. The perl argument is a package containing the Perl
language interpreter and platform (http://www.perl.org).
2 The remainder of the expression is an invocation of the function stdenv.mkDerivation,
which can be used to declaratively build a component from source code
and its given dependencies.
3 This stdenv.mkDerivation function takes a number of arguments. The name
argument provides a canonical name that ends up in the Nix store path
of the build result.
4 The src parameter is used to refer to the source code of the package. In
our case, it is bound to a fetchurl function invocation, which downloads
the source tarball from a web server. The SHA256 hash code is used to
check whether the correct tarball has been downloaded.
5 The buildInputs parameter is used to add other build-time dependencies
to the environment of the builder, such as perl.
6 Every package may also contain meta attributes, such as a description
and a homepage from which the package can be obtained. These at-
tributes are not used during a build.
In Figure 2.1, we have not specified what steps we should perform to build
a component, which can be done for example by specifying the buildCom-
mand argument or specifying build phases. If no build instructions are given,
Nix assumes that the package can be built by invoking the standard GNU
Autotools [MacKenzie, 2006] build procedure, i.e. ./configure; make; make install.
In principle, any build tool can be invoked in a Nix expression, as long as it
is pure. In our Nix packages repository, we have many types of packages and
build tools, such as Apache Ant (http://ant.apache.org), Cabal
(http://www.haskell.org/cabal), SCons (http://www.scons.org), and CMake
(http://www.cmake.org).
Apart from compiling software from source code, it is also possible to de-
ploy binary software by copying prebuilt binaries directly into the output
folder. For some component types, such as ELF executables (the binary format
for executables and libraries used on Linux and several other UNIX-like
systems), patching is required, because these executables normally look in
default locations, such as /usr/lib, to find their run-time dependencies,
which fails because those dependencies cannot be found there. For this
purpose, a tool called PatchELF (http://nixos.org/patchelf) has been developed,
which makes it possible to change the RPATH field of an ELF executable, so that
the run-time dependencies of prebuilt binaries can be found.

rec { 7
hello = import ../applications/misc/hello { 8
inherit fetchurl stdenv perl; 9
};
perl = import ../development/interpreters/perl {
inherit fetchurl stdenv;
};
fetchurl = import ../build-support/fetchurl {
inherit stdenv; ...
};
stdenv = ...;
}
Figure 2.2 all-packages.nix: A partial composition expression
2.2.3 Composing packages
As the Nix expression in Figure 2.1 defines a function, we cannot use it to
directly build a component. We need to compose the component, by calling
the function with the required function arguments.
Figure 2.2 shows a partial Nix expression, which provides compositions by
calling build functions with their required arguments:
7 Essentially, this expression is an attribute set of name and value pairs,
i.e. rec { name_1 = value_1; ...; name_n = value_n; }, in which attributes are
allowed to refer to each other, hence the rec keyword. Each attribute defines
a package, while its value refers to a build function invocation.
8 The import statement is used to import an external Nix expression. For
example, the hello attribute refers to the Nix expression declaring a build
function, which we have shown earlier in Figure 2.1.
9 The arguments of this function invocation refer to the other attributes
defined in the same composition expression, meaning that they are built first,
before the GNU Hello package gets built. The inherit perl; argument is
syntactic sugar for perl = perl;, which inherits an attribute from its lexical
scope. By providing a different argument, e.g. perl = myOtherPerl;, a different
composition can be achieved. Because in many cases the function arguments are
identical to the attribute names in the composition expression, Nix provides a
callPackage function, which automatically inherits all function arguments,
apart from a given number of specified exceptions.
In principle, all packages, including all their dependencies, must be defined
as Nix expressions in order to deploy them reliably with the Nix package
manager, although there are ways to refer to files stored outside the Nix
store, which is not recommended. Because everything must be specified, writing
deployment specifications could be a burden. Fortunately, there is a repository
available, called Nixpkgs (http://nixos.org/nixpkgs), which contains about 2500
packages, covering many well-known free and open source components as well as
some proprietary ones, such as build tools, utilities and end-user
applications. The expressions for these components in Nixpkgs can be reused to
reduce the burden of specifying everything.
On Linux platforms, Nixpkgs is fully bootstrapped, which means that there
are no references to components outside the Nix store. On other platforms,
such as FreeBSD, Mac OS X and Cygwin, some essential system libraries are
referenced on the host system.
Recently, we have also implemented experimental support for .NET de-
ployment on Microsoft Windows platforms. For deployment of these compo-
nent types, it is impossible to isolate certain components, such as the .NET
framework. There are some tricks to make it as pure as possible and to con-
veniently provide deployment support through Nix, although these solutions
are non-trivial. Chapter 12 contains more information about this.
2.2.4 Building packages
By using the composition expression, shown in Figure 2.2, the GNU Hello
package, including all its required build-time dependencies, can be built by
running the following command-line instruction:
$ nix-build all-packages.nix -A hello
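A session might look roughly as follows (a sketch: the hash is the truncated example path used throughout this chapter, and nix-build also creates a result symlink in the current directory pointing to the output path):

$ nix-build all-packages.nix -A hello
/nix/store/1dp59cdv...-hello-2.7
$ ./result/bin/hello
Hello, world!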
Figure 2.3 shows all build-time dependencies of the GNU Hello package,
built on Linux. As may be observed, the graph is quite large and complicated.
Currently, the GNU Hello package has 163 build-time dependencies.
While performing a build of a Nix package, Nix tries to remove many
side effects from a build process. For example, all environment variables
are cleared or changed to non-existing values, such as the PATH environment
variable (it is not cleared, but set to /path-not-set to prevent the shell from
filling it with default values).
. We have also patched various essential build tools, such as gcc, so
that they ignore impure global directories, such as /usr/include and /usr/lib. Op-
tionally, builds can be performed in a dedicated chroot() environment, which
restricts access to the root of the filesystem of the builder process to a sub di-
rectory, in which only the specified dependencies (and a number of required
files to run a build) are visible and anything else is invisible, giving even
better guarantees of purity.
Because Nix removes as many side effects as possible, and because every
component is isolated in the Nix store, dependencies can only be found by
explicitly setting search directories, such as PATH, to contain the Nix store
paths of the components that we want to use. Undeclared dependencies, which
may reside in global directories such as /usr/lib, cannot accidentally let a
build succeed.

Figure 2.3 Build-time graph of GNU Hello on Linux, containing all build-time dependencies of GNU Hello, including the compiler's compiler, glibc and the Linux kernel headers.
Another advantage is that it is safe to assume that a component with the
same hash code built on another machine is equivalent, assuming that the
remote machine can be trusted (this has some caveats, as described by Dolstra
and Löh [2008]). By taking this knowledge into account, Nix also supports
binary deployment: if a component with the same hash code exists elsewhere, we
can download a substitute from an external location rather than building it
ourselves, significantly reducing the build time.
Furthermore, to reduce the download time of substitutes and the overall
deployment time, Nix also has the option to generate binary patches, for ver-
sions of components that are similar to each other [Dolstra, 2005a]. Binary
patches can be applied to an older version of a component and the result of
the patching process can be stored in a new component in the Nix store, hav-
ing the hash of the newer component. Although binary patches may speed up
the deployment time for large components, it may actually increase the de-
ployment time of small components, because generating and applying patches
may take more time than downloading a fresh version.
Another implication is that if we want to build a particular package for a
different type of platform (e.g. a FreeBSD package, while our host system
is a Linux system), we can also forward a build to a system of the required
type and copy the build result back. A system architecture identifier is also
reflected in the hash code of a Nix store path, and thus does not interfere
with components built for a different architecture. As we will see later in this
thesis, this property also provides several useful features.
2.2.5 Runtime dependencies
Nix is capable of detecting runtime dependencies from a build process. This
is done by conservatively scanning a component for the hash codes of build-
time dependencies that uniquely identify a component in the Nix store. For
instance, libraries that are needed to run a specific executable are stored in
the RPATH field in the header of the ELF executable, which specifies a list of
directories to be searched by the dynamic linker at runtime (Nixpkgs contains
a wrapped ld command, which automatically sets the RPATH of an ELF binary so
that shared libraries are statically bound to an executable). The RPATH header
of the GNU Hello executable in our last example may contain a path to the
standard C library, which is /nix/store/6cgraa4mxbg1...-glibc-2.13. Apart
from RPATH fields, Nix can also detect store paths in many other types of
files (as long as the hashes are in a detectable format), such as text files, shell
scripts and symlinks.
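The references that Nix has detected for a store path can be inspected with the nix-store command; a minimal sketch, reusing the truncated example paths of this chapter:

$ nix-store --query --references /nix/store/1dp59cdv...-hello-2.7
/nix/store/6cgraa4mxbg1...-glibc-2.13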
By scanning the hello component we know that a specific glibc component
in the Nix store is a runtime dependency of GNU Hello. Figure 2.4 shows
the runtime dependencies of the GNU Hello component in the Nix store.
/nix/store
1dp59cdvjql2...-hello-2.7
bin
hello
man
man1
hello.1
6cgraa4mxbg1...-glibc-2.13
lib
libc.so.6
p7gs4gixwdbm...-subversion-1.7.2
bin
svn
lib
libsvn...so
fwkrhfb0zavr...-openssl-1.0.0g
lib
libssl.so.1.0.0
qvvch57sx4lw...-db-4.5.20
lib
libdb-4.so
Figure 2.4 Runtime dependencies of the GNU Hello component
Although the scanning technique may sound risky, we have used this for
many packages and it turns out to work quite well.
Nix also guarantees complete deployment, which requires that there are no
missing dependencies. Thus we need to deploy closures of components un-
der the “depends on” relation. If we want to deploy component X which
depends on component Y, we also have to deploy component Y before we de-
ploy component X. If component Y has dependencies we also have to deploy
its dependencies first and so on.
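The closure of a component can be queried in a similar way; a minimal sketch, again with the truncated example paths:

$ nix-store --query --requisites /nix/store/1dp59cdv...-hello-2.7
/nix/store/6cgraa4mxbg1...-glibc-2.13
/nix/store/1dp59cdv...-hello-2.7

The resulting list contains the component itself and everything it transitively depends on, which is exactly the set of paths that must be present to deploy the component completely.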
2.2.6 Nix profiles
Storing components in a Nix store in isolation from each other is a nice fea-
ture, but not very user friendly. For instance, if a user wants to start the
program hello, then the full path to the hello executable in the Nix store must
be specified, e.g. /nix/store/1dp59cdv...-hello-2.7/bin/hello.
To make components more accessible to end users, Nix provides the con-
cept of Nix profiles, which are specific user environments composed of symlink
trees that abstract over store paths.
By adding the /home/user/.nix-profile/bin directory to the PATH environment
variable, which resolves indirectly to the store paths of each component, the
user can start the hello component activated in their profile by simply typing
hello on the command-line. The symlink tree that is dereferenced is illustrated
in Figure 2.5.
/home/sander/.nix-profile
/nix/var/nix/profiles
default
default-23
default-24
/nix/store
a2ac18h4al2c...-user-env
bin
hello
y13cz8h6fliz...-user-env
1dp59cdvjql2...-hello-2.7
bin
hello
man
man1
hello.1
6cgraa4mxbg1...-glibc-2.13
lib
libc.so.6
Figure 2.5 An example Nix profile that contains the GNU Hello component
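The chain of symlinks can be followed on the command-line; a minimal sketch, assuming the example profile shown above:

$ which hello
/home/sander/.nix-profile/bin/hello
$ readlink -f $(which hello)
/nix/store/1dp59cdvjql2...-hello-2.7/bin/hello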
The set of available components in the environment of the user can be modified
using the nix-env command. To add a component to the Nix environment, for
instance the GNU Hello component, the user can type:
$ nix-env -f all-packages.nix -i hello
By executing the command above, the package will be built and made avail-
able in the Nix profile of the user, because symlinks are made that point to
the files of the GNU Hello component.
In order to upgrade the hello component the user can type:
$ nix-env -f all-packages.nix -u hello
Uninstalling the hello component can be done by typing:
$ nix-env -f all-packages.nix -e hello
The uninstall operation does not delete the specific GNU Hello component
from the Nix store. It just removes the symlinks to the GNU Hello package
from the environment. The Hello component could be in use in the Nix profile
of another user or by a running process, so it is not safe to remove it.
If it turns out that the new version of the GNU Hello component does not
meet our requirements or we regret that we have uninstalled the GNU Hello
component, we can roll back to a previous version of the user environment by
typing:
$ nix-env --rollback
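Rollbacks are possible because nix-env records every modification of a profile as a new generation. A minimal sketch of inspecting and selecting generations (the generation numbers correspond to the default-23 and default-24 profiles shown in Figure 2.5):

$ nix-env --list-generations      # show all generations of the current profile
$ nix-env --switch-generation 23  # switch back to a specific generation
$ nix-env --rollback              # or simply go back one generation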
Because multiple variants of packages in the Nix store can safely coexist
without interference, ordinary users without root privileges can install
packages without affecting the packages in the Nix profiles of other users
(and without, for example, being able to inject a trojan horse into them). It
is even possible to allow packages coming from different sources (including
untrusted sources) to safely coexist. More information on this can be found in
[Dolstra, 2005b].
2.2.7 Atomic upgrades and rollbacks
The deployment steps in Nix are atomic. At any point in time, running the
command hello from the command-line either runs the old or the new ver-
sion of GNU Hello, but never an inconsistent mix of the two. The user’s PATH
environment variable contains the path /nix/var/nix/profiles/default/bin, which indi-
rectly resolves to a directory containing symlinks to the currently “installed”
programs. Thus, /nix/var/nix/profiles/default/bin/hello resolves to the currently ac-
tive version of GNU Hello. By flipping the symlink /nix/var/nix/profiles/default
from default-23 to default-24, we can switch to the new configuration. Replacing
a symlink can be done atomically on UNIX, hence upgrading can be done
atomically.
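The underlying idiom is the atomic rename() system call: a new symlink is created under a temporary name and then renamed over the old one. A minimal sketch of the general technique (not the literal Nix implementation):

$ cd /nix/var/nix/profiles
$ ln -s default-24 default.tmp    # create the new symlink under a temporary name
$ mv -T default.tmp default       # atomically replace the old symlink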
2.2.8 Garbage collection
The Nix garbage collector, which has to be called explicitly, checks which
components have become obsolete and safely deletes them from the Nix store.
Components that are part of a Nix profile, referenced by another component, or
in use by running processes and open files are treated as garbage collector
roots and therefore identified as being in use. Thus it is safe to run the
garbage collector at any time.
The garbage collector can be invoked by typing:
$ nix-collect-garbage
It is also possible to remove all older generations of a Nix user profile,
together with the garbage that results from it, by typing:
$ nix-collect-garbage -d
This command also removes the older default-23 symlink and collects all the
garbage that becomes unreachable as a result.
2.2.9 Store derivations
Components are not directly built from Nix expressions but from lower-level
store derivation files. Figure 2.6 shows the two-stage building process of Nix
expressions.
The Nix expression language is not used directly, because it is fairly high
level and subject to evolution. Furthermore, it is difficult to uniquely identify
a Nix expression, as Nix expressions can import other Nix expressions that
are scattered across the filesystem.
Figure 2.6 Two-stage building of Nix expressions
Derive(
[("out", "/nix/store/1dp59cdvjql2l3y0j8a492vsf3lyr6ln-hello-2.7", "", "")]
10
, [ ("/nix/store/dvrn7nnqm4d1hvm2b6hxqcfzqkjfqgsy-hello-2.7.tar.gz.drv", ["out"]) 11
, ("/nix/store/g9ciis17a2dnqxfhv40iis964l08cbl2-bash-4.2-p20.drv", ["out"])
, ("/nix/store/gi7a993hcndzdf09f2i08wqc7d91m1nn-stdenv.drv", ["out"])
]
, ["/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh"] 12
, "x86_64-linux" 13
, "/nix/store/xkzklx3jbds21xgry79hk00gszhanjq0-bash-4.2-p20/bin/bash"
14
, ["-e", "/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh"]
, [ ("buildInputs", "") 15
, ("buildNativeInputs", "")
, ("builder", "/nix/store/xkzklx3jbds21xgry79hk00gszhanjq0-bash-4.2-p20/bin/bash")
, ("doCheck", "1")
, ("name", "hello-2.7")
, ("out", "/nix/store/1dp59cdvjql2l3y0j8a492vsf3lyr6ln-hello-2.7")
, ("propagatedBuildInputs", "")
, ("propagatedBuildNativeInputs", "")
, ("src", "/nix/store/ycm2pz2bln77gpjxvyrnh6k1hg9c4xik-hello-2.7.tar.gz")
, ("stdenv", "/nix/store/7c8asx3yfrg5dg1gzhzyq2236zfgibnm-stdenv")
, ("system", "x86_64-linux")
]
)
Figure 2.7 Structure of the GNU Hello store derivation
For all these reasons, Nix expressions are not built directly but translated to
store derivation files encoded in a more primitive language. A store derivation
describes the build action of one single component. Store derivation files are
also placed in the Nix store and have a store path as well. Keeping store
derivation files inside the Nix store gives us a unique identification of
source deployment objects.
Figure 2.7 shows the structure of the Nix store derivation used to build the
GNU Hello component:
10 This line contains a list of attributes containing Nix store output paths
(in which the results of the build are stored) and variable names. In this
example, we only use one output directory, which corresponds to out.
11 In this section, all the build inputs are specified, which are components
that must be built before the Nix component that produces its output in
out.
12 This line specifies the path to the default builder script.
13 This line specifies for which system architecture the component must be
built. If this system identifier does not match the host's system
architecture, Nix can optionally forward the build to an external machine
capable of building it.
14 This line refers to the program that performs the build procedure. The bash
shell is used to execute the given build script in an isolated environment, in
which environment variables are cleared or set to uninitialised values.
Optionally, the script is jailed in a chroot() environment to ensure a high
degree of purity.
15 The remaining lines specify all the environment variables that are passed
to the builder script, so that the build script knows certain build at-
tributes, where to find dependencies and where to store the build re-
sults.
By invoking the nix-instantiate command, it is possible to translate a specific
Nix expression into store derivation files. For instance:
$ nix-instantiate hello.nix
/nix/store/dgrsz4bsk5srrb...-hello-2.7.drv
The nix-instantiate command returns the list of store derivation files that it
has generated from the expression. By executing the nix-store --realise
command, it is possible to execute the build action described in a store
derivation file. For instance:
$ nix-store --realise /nix/store/dgrsz4bsk5srrb...-hello-2.7.drv
/nix/store/1dp59cdvjql2...-hello-2.7
When the build action has finished, it returns the store path of the resulting
component that it has built.
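For reference, the nix-build command used earlier combines both stages; a rough sketch of the equivalence:

$ nix-build hello.nix
# is roughly equivalent to:
$ nix-store --realise $(nix-instantiate hello.nix)

in addition to which nix-build creates a result symlink to the build output in the current directory.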
2.3 NIXOS
Nix is a tool to manage packages and can be used on a wide range of oper-
ating systems, such as various Linux distributions, FreeBSD, Mac OS X and
Microsoft Windows (through Cygwin). Nix has also been used as a basis for
NixOS, which is a GNU/Linux distribution completely managed through Nix
including all operating system specific parts, such as the kernel and configu-
ration files [Dolstra and Löh, 2008; Dolstra et al., 2010].
Because Nix stores all packages in a special directory, /nix/store, the NixOS
distribution does not have most of the Filesystem Hierarchy Standard (FHS)
[The Linux Foundation, 2012] directories, such as /usr and /lib, and has only a
minimal /etc and /bin. The /bin directory only contains a symlink /bin/sh
pointing to the bash shell, as this is required to make the system() library
call work properly.
{pkgs, ...}: 16
{
boot.loader.grub.device = "/dev/sda"; 17
fileSystems = [ 18
{ mountPoint = "/";
device = "/dev/sda1";
}
];
swapDevices = [ { device = "/dev/sda2"; } ]; 19
services = { 20
openssh.enable = true;
httpd = {
enable = true;
documentRoot = "/var/www";
adminAddr = "admin@localhost";
};
xserver = {
enable = true;
desktopManager = {
default = "kde4";
kde4 = { enable = true; };
};
displayManager = {
kdm = { enable = true; };
slim = { enable = false; };
};
};
};
environment.systemPackages = [ 21
pkgs.mc
pkgs.subversion
pkgs.firefox
];
}
Figure 2.8 configuration.nix: A NixOS configuration file
2.3.1 NixOS configurations
NixOS is more than just a Linux distribution managed by Nix. What sets NixOS
apart from conventional Linux distributions is that entire system
configurations are derived from a declarative specification, which invokes
functions that generate configuration files and build the required packages,
just as individual Nix packages are built from declarative specifications.
Ordinary Linux distributions are often configured by manually installing
packages and manually editing configuration files, such as an Apache web
server configuration.
Figure 2.8 shows an example of a NixOS configuration, capturing proper-
ties of a complete system:
16 A NixOS configuration is essentially a function providing a reference to
the Nixpkgs collection, through the pkgs function argument. The remainder of
the file is an attribute set, capturing machine-specific properties.
17 This attribute defines configuration settings of the GRUB
(http://www.gnu.org/software/grub) bootloader, specifying that the bootloader
must be installed in the Master Boot Record (MBR) of the first SCSI hard drive.
18 The fileSystems attribute defines the properties of the file systems, such
as their mount points. In our example, we have specified that our root
partition corresponds to the first partition of the first SCSI hard drive.
19 The swapDevices attribute contains a list of swap partitions, which in our
case is the second partition of the first SCSI hard drive.
20 The services attribute enables a number of system services and provides
their configuration properties. In our example, we have enabled OpenSSH, the
Apache web server and an X.Org server configured to use the KDE Plasma desktop
and the KDM login manager.
21 Also, system packages are defined here, such as Mozilla Firefox and
Midnight Commander. These packages are automatically added to the
PATH of all users and also appear automatically in the KDE Plasma Desk-
top’s program launcher menu, if .desktop files are included (specifica-
tions which KDE and GNOME use to define entries in their program
launcher menu).
By storing a NixOS configuration file in /etc/nixos/configuration.nix and
running nixos-rebuild switch, a complete system configuration is built by the
Nix package manager and activated. Because of the distinct properties of the
Nix package manager, we can safely upgrade a complete system: the complete
system configuration and all its dependencies are safely stored in the Nix
store and no files are overwritten or removed. By running nixos-rebuild
--rollback switch, we can roll back to any previous generation of the system
which has not been garbage collected yet.
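A minimal sketch of the corresponding command-line workflow (assuming the standard NixOS system profile location /nix/var/nix/profiles/system):

$ nixos-rebuild switch              # build the new configuration and activate it
$ nixos-rebuild --rollback switch   # switch back to the previous generation
$ nix-env -p /nix/var/nix/profiles/system --list-generations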
Figure 2.9 shows the GRUB bootloader screen of NixOS. The bootloader
makes it possible to pick and boot into any available previous NixOS config-
uration at boot time.
2.3.2 Service management
As mentioned earlier, NixOS specifications are used to deploy a complete sys-
tem. The result of building a NixOS configuration is a Nix closure containing
all packages, configuration files and other static parts of a system. Apart from
ordinary packages, Nix can also be used to do service management, by gener-
ating configuration files and shell scripts referring to packages [Dolstra et al.,
2005].
Figure 2.9 The GRUB bootloader of NixOS
{ config, pkgs }:
pkgs.writeText "sshd_config" ’’
22
SendEnv LANG LC_ALL ... 23
${if config.services.sshd.forwardX11 then ’’ 24
ForwardX11 yes
XAuthLocation ${pkgs.xorg.xauth}/bin/xauth 25
’’ else ’’
ForwardX11 no
’’}
Figure 2.10 Nix expression generating an OpenSSH server configuration file
Figure 2.10 shows an example Nix expression generating a configuration
file for the OpenSSH server. The example has been taken from the OpenSSH
example described in the NixOS paper [Dolstra et al., 2010]:
22 In the body of this expression, we invoke the pkgs.writeText function gen-
erating an OpenSSH server configuration file, called sshd_config.
23 The last argument of the pkgs.writeText function invocation is a string,
which provides the actual contents of the configuration file.
24 In this string, we use antiquoted expressions to optionally enable X11
forwarding, which can be enabled by setting the services.sshd.forwardX11
option in the NixOS configuration file.
25 If the forwardX11 option is enabled, we provide the location to the xauth bi-
nary by evaluating the function that builds it. As a side effect, xauth gets
built from source code. Due to the laziness of the Nix expression language,
xauth only gets compiled if it is needed.

SendEnv LANG LC_ALL ...
ForwardX11 yes
XAuthLocation /nix/store/kyv6n69a40q...-xauth-1.0.2/bin/xauth
Figure 2.11 Example result of evaluating the OpenSSH server configuration file

{config, pkgs, ...}:
{
options = { ... };
config = { ... };
}
Figure 2.12 General structure of a NixOS module
Figure 2.11 shows a possible result of the Nix expression, if the X11 for-
warding option is enabled. This configuration file contains an absolute path
to the xauth binary, which resides in a Nix store directory. After building the
configuration file, Nix identifies the xauth binary as a run-time dependency
(because it scans and recognises the hash code, as explained earlier in Sec-
tion 2.2.5) and ensures that this dependency is present when transferring this
configuration file to another machine.
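The transfer itself can be performed with the nix-copy-closure tool, which copies a store path together with its entire closure to another machine over SSH; a minimal sketch (the hostname is hypothetical and the store path hash is elided):

$ nix-copy-closure --to root@machine.example.org \
    /nix/store/...-sshd_config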
2.3.3 NixOS modules
So far, we have shown that we can build packages as well as configuration files
by using Nix expressions. To make building of a Linux distribution manage-
able and extensible, NixOS is composed of a number of NixOS modules, each
capturing a separate concern of a system. NixOS modules are merged into
a single system configuration, which can be used to build the entire system
through its dependencies.
Figure 2.12 shows the general structure of a NixOS module. A module is a
function in which the config argument represents the full system configuration
and the pkgs argument represents the Nixpkgs collection, which can be used
for convenience. In the remainder of the expression, we define the options
a module provides (defined in the options attribute) and what configuration
settings it provides (defined in the config attribute).
Figure 2.13 shows an example capturing the structure of the OpenSSH
NixOS module:
{ config, pkgs, ... }:
with pkgs.lib;
{
options = { 26
services.openssh.enable = mkOption {
default = false;
description = "Whether to enable the Secure Shell daemon.";
};
services.openssh.forwardX11 = mkOption {
default = true;
description = "Whether to allow X11 connections to be forwarded.";
};
};
config = mkIf config.services.openssh.enable { 27
users.extraUsers = singleton 28
{ name = "sshd";
uid = 2;
description = "SSH privilege separation user";
home = "/var/empty";
};
jobs.openssh = 29
{ description = "OpenSSH server"; 30
startOn = "started network-interfaces"; 31
preStart = 32
''
mkdir -m 0755 -p /etc/ssh
if ! test -f /etc/ssh/ssh_host_dsa_key; then
${pkgs.openssh}/bin/ssh-keygen ...
fi
'';
exec = 33
let sshdConfig = pkgs.writeText "sshd_config" (elided; see Figure 2.10);
in "${pkgs.openssh}/sbin/sshd -D ... -f ${sshdConfig}";
};
networking.firewall.allowedTCPPorts = [ 22 ]; 34
};
}
Figure 2.13 openssh.nix: NixOS module for the OpenSSH server daemon
26 The OpenSSH module defines two options: whether to enable the OpenSSH
service and whether to use X11 forwarding.
27 In the remainder of the expression, we configure the module settings, which
is done by defining the config attribute set. The mkIf function is used to
check whether the OpenSSH option is enabled in the NixOS configuration file
provided by the user. mkIf is a "delayed if" function: with an ordinary if
expression, the evaluation of the NixOS configuration would run into an
infinite loop.
28 The extraUsers attribute is used by NixOS to create user accounts on first
startup. In our expression, it is used to create an SSH privilege separation
user, used by OpenSSH for security reasons.
29 The jobs.openssh attribute set can be used to define an OpenSSH job for
Upstart (http://upstart.ubuntu.com), the system's init daemon, which is used
for starting and stopping system services.
13
http://upstart.ubuntu.com
30 The description attribute gives the Upstart job a description, which is
displayed in the status overview shown by running: initctl list
31 The startOn attribute defines an Upstart event. In this case, we have
specified that the job should be started after the network interfaces have
been started.
32 The preStart attribute defines a script, which is executed before the actual
job is started. In our job, we generate an SSH host key on initial startup.
33 The exec attribute defines the command that the job runs. In our case, we
start the OpenSSH daemon sshd with the generated sshd_config file (described
earlier in Figure 2.10) as a parameter, so that it uses the settings that we
want.
34 The networking.firewall attribute states that TCP port 22 must be open in
the firewall to allow connections from the outside.
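To make the role of mkIf more concrete, the following minimal sketch shows a hypothetical module (the option name services.myservice.enable is purely illustrative and not part of NixOS) in which replacing mkIf by an ordinary if expression would cause the evaluation to recurse infinitely:

  { config, pkgs, ... }:

  with pkgs.lib;

  {
    options = {
      services.myservice.enable = mkOption {
        default = false;
        description = "Whether to enable the hypothetical example service.";
      };
    };

    # Writing `config = if config.services.myservice.enable then { ... } else { ... };`
    # would require the final merged value of `config` in order to compute this
    # module's own contribution to `config`, which never terminates. mkIf instead
    # records the condition, so the module system can merge all definitions first
    # and apply the condition afterwards.
    config = mkIf config.services.myservice.enable {
      networking.firewall.allowedTCPPorts = [ 8080 ];
    };
  }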
2.3.4 Composing a Linux system
The config attribute set of every module collectively defines the full system
configuration, which is computed by merging all modules together. Multiple
definitions of an option are combined, for example by merging all values into
a single list, which is done for the networking.firewall.allowedTCPPorts attribute.
The end result may be:
networking.firewall.allowedTCPPorts = [ 22 80 ];
TCP port 80 may appear in this list if, for example, an HTTP service is
enabled as well.
The build result of a NixOS configuration is a Nix profile (which is called
a NixOS system profile) consisting of symlinks to all static parts of the system,
such as all system packages and configuration files. A NixOS system profile
also contains an activation script taking care of all “impure” deployment steps,
such as creating state directories (e.g. /var), starting and stopping the Upstart
jobs representing system services, and updating the GRUB bootloader so that
users can boot into the new system configuration.
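As an illustration, the following minimal sketch of a NixOS configuration enables the OpenSSH module of Figure 2.13 and contributes an additional firewall port; the set of options used is deliberately small and only meant as an example:

  { config, pkgs, ... }:

  {
    # Enable the OpenSSH service declared by the module in Figure 2.13 and
    # override the default for X11 forwarding.
    services.openssh.enable = true;
    services.openssh.forwardX11 = false;

    # This definition is merged with the [ 22 ] contributed by the OpenSSH
    # module, so the final system configuration opens both ports.
    networking.firewall.allowedTCPPorts = [ 80 ];
  }

Building and activating the resulting system profile can then be left to the standard NixOS tooling, such as the nixos-rebuild command.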
2.4 H Y D R A
Another important Nix application is Hydra, a continuous build and integra-
tion server built on top of Nix [Dolstra and Visser, 2008]. The purpose of
Hydra is to continuously check out source code repositories and to automatically
build and test the packages residing in them. Furthermore, it provides a web
application interface, which can be used to track build jobs and which allows
end-users to configure jobs.
is shown in Figure 2.14.
Figure 2.14 A Hydra build queue
Typically, Hydra is attached to a cluster of build machines. As we have
explained earlier, we can also safely delegate builds to other machines in the
network for various reasons. In Hydra, this feature is extensively used to
cluster builds in order to reduce build times and to forward a build to a machine
capable of building a package for a particular platform. In our SERG Hydra
build environment at TU Delft (http://hydra.nixos.org), we have a Hydra cluster
containing Linux,
Microsoft Windows (Cygwin), Mac OS X (Darwin), FreeBSD and OpenSolaris
machines.
Because Hydra is built on top of Nix, it has several distinct advantages
compared to other continuous integration solutions, such as Hudson [Moser
and O’Brien, 2011] or Buildbot [Buildbot project, 2012]. For example, the
build environment itself is fully managed, so that undeclared dependencies
cannot accidentally cause a build or test to succeed or fail; such accidental
outcomes are undesirable because they give us no guarantee that a package
will work in a different environment. Especially for system integration tests
this is a very strict requirement, as side effects affect the reproducibility of test
cases, a property which is difficult to guarantee using conventional deployment
tools.
{ nixpkgs }: 35

let
  jobs = rec { 36
    tarball = 37
      { disnix, officialRelease }:

      with import nixpkgs {};

      releaseTools.sourceTarball {
        name = "disnix-tarball";
        version = builtins.readFile ./version;
        src = disnix;
        inherit officialRelease;
        buildInputs = [ pkgconfig dbus_glib libxml2 libxslt dblatex tetex doxygen ];
      };

    build = 38
      { tarball, system }:

      with import nixpkgs { inherit system; };

      releaseTools.nixBuild {
        name = "disnix";
        src = tarball;
        buildInputs = [ pkgconfig dbus_glib libxml2 libxslt ]
          ++ lib.optional (!stdenv.isLinux) libiconv
          ++ lib.optional (!stdenv.isLinux) gettext;
      };

    tests = 39
      { nixos, disnix_activation_scripts }:
      ...
  };
in
jobs
Figure 2.15 release.nix: A Hydra build expression for Disnix
2.4.1 Defining Hydra build jobs
Figure 2.15 shows an example of a Hydra expression for Disnix, a component
from our reference architecture, which we will cover later in Chapter 4 of this
thesis:
35 In principle, a Hydra expression is a function returning an attribute set.
In the function header, the nixpkgs argument specifies that we need a
checkout of Nixpkgs.
36 In the resulting attribute set, each attribute represents a job. In our
example, we have defined three jobs, namely: tarball, build and tests. Jobs
can also depend on the result of other jobs and must be able to refer to
each other. To make this possible, the rec keyword is used.
37 Each job is also defined as a function, which performs a Nix build. The
tarball attribute defines a function, which generates a source tarball from a
Subversion checkout of Disnix. This function takes two arguments: disnix
refers to a Subversion source code checkout of Disnix and officialRelease
indicates whether the build is an official release. Unofficial releases have
the Subversion revision number appended to the file names of the resulting
packages. Essentially, the build process of the tarball job executes the
configure phase and runs make dist, which generates the source tarball.
38 The build job unpacks the generated tarball from the tarball job and
builds it from source code. This function takes two arguments. The
tarball argument refers to a Nix store component containing a generated
source code tarball. The system argument refers to a system identifier,
specifying for which architecture the component must be built. For ex-
ample, by providing i686-linux, the package is built for a 32-bit Intel Linux
platform.
39 The tests job is used to perform system integration testing of Disnix. This
function takes two function arguments. The nixos argument refers to
a checkout of the NixOS source code, which we require for integration
testing. The disnix_activation_scripts argument refers to a build of the Dis-
nix activation scripts, which is a required run-time dependency of Dis-
nix. In the remainder of the expression, test cases are specified, which
we have left out of Figure 2.15 for simplicity. Later in Chapter 7 we show
how we can perform distributed system integration tests.
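Hydra supplies the arguments of these job functions itself, based on the jobset configuration described in the next section. Conceptually, however, the way the jobs of Figure 2.15 depend on each other can also be expressed directly in the Nix expression language. The following sketch shows such a composition; the checkout paths are purely hypothetical placeholders:

  let
    # Import the release expression of Figure 2.15, providing a Nixpkgs checkout.
    release = import ./release.nix { nixpkgs = ./nixpkgs-checkout; };

    # Produce the source tarball from a (hypothetical) Disnix checkout.
    tarball = release.tarball {
      disnix = ./disnix-checkout;
      officialRelease = false;
    };
  in
  # Build the unpacked tarball for a 64-bit Intel Linux platform.
  release.build {
    inherit tarball;
    system = "x86_64-linux";
  }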
2.4.2 Configuring Hydra build jobs
As with ordinary Nix expressions, the Hydra expression for Disnix defined
in Figure 2.15 does not specify how these jobs are composed and which
dependencies they should use. The function arguments to these jobs can be
provided by configuring jobsets through the Hydra web interface.
Figure 2.16 shows an example of a Disnix job configuration:
The disnix argument is configured to take the result of a Subversion ex-
port operation, which always checks out the latest revision of Disnix.
The disnix_activation_scripts argument is configured to take the latest build
result of the Hydra job that continuously builds it.
The nixos argument is configured to take the result of a Subversion ex-
port operation, which always checks out the latest revision of NixOS.
The nixpkgs argument is configured to take the result of a Subversion ex-
port operation, which always checks out the latest revision of Nixpkgs.
The officialRelease argument is configured to false, which means that the
Subversion revision number ends up in the file name.
The system argument is a list of system identifiers. By specifying a list
of identifiers, Disnix gets built for a wide range of platforms, such as
GNU/Linux, FreeBSD, Mac OS X (Darwin), OpenSolaris and Microsoft
Windows (via Cygwin).
The tarball argument is configured to take the latest build result of the
tarball job.
Figure 2.16 Configuring a Disnix jobset through the Hydra web interface
2.5 R E L AT E D W O R K
There is a substantial amount of related work on package management. First,
there are many package managers and software installers available, working
at both the binary and the source level. Second, there is a large body of research
on techniques to improve the way deployment processes are performed by such
tools.
2.5.1 Tools
Binary package managers
As mentioned earlier, most common package managers on Linux systems,
such as RPM and the Debian package manager, use nominal dependency
specifications and imperatively install and upgrade packages, often in global
namespaces, such as /usr/lib. These properties have various drawbacks, such
as the fact that it is hard to ensure that dependency specifications are correct
and complete, and to safely upgrade systems.
Windows [Microsoft, 2012] and Mac OS X typically use monolithic de-
ployment approaches in which programs, including all their dependencies,
are bundled in a single folder, with a minimal amount of sharing. Only
highly common and trusted components are shared. Such approaches may
produce a significant overhead in terms of disk space and the amount of RAM
that programs use.
.NET applications use the Global Assembly Cache (GAC) [Microsoft, 2010a]
to prevent the “DLL hell”, which requires library assemblies to have a strong
name, composed of several nominal attributes, such as the name, version,
culture and a public/private key pair. Although the GAC provides a higher
degree of isolation compared to common deployment tools, its scope is lim-
ited to a fixed number of attributes and library assemblies only. It cannot be
used for native libraries and other types of components, such as compilers.
Source package managers
Apart from addressing the versioning and storage issues, Nix has the advantage
over source package managers, such as FreeBSD ports [FreeBSD project, 2012],
that builds are pure and reproducible. As a result, it is relatively simple
to provide binary deployment in addition to source deployment in a transpar-
ent way, which is more difficult to achieve with conventional source package
managers, due to the side effects that may occur during a build.
2.5.2 Techniques
The Mancoosi research project (http://www.mancoosi.org) is also focused on
investigating package manager techniques and has a number of goals similar
to those of Nix. However, its view is somewhat different from the “Nix way” of
deploying systems. For example, Mancoosi techniques are implemented in
common package managers, such as the Debian package manager and RPM.
A number of factors motivate the Mancoosi project to use conventional
package management tools as a basis rather than unconventional approaches
such as Nix: disk space requirements, potential security issues with old
versions of packages that may be retained on a system, and compliance with
the Filesystem Hierarchy Standard (FHS), part of the Linux Standards Base
(LSB) [The Linux Foundation, 2010].
Nix and Mancoosi also have a slightly different view on the definition of soft-
ware packages. As we have seen, Nix has an unorthodox notion of a pack-
age, since packages are deployed by direct or indirect function invocations of
stdenv.mkDerivation, which builds a package from source code and the depen-
dencies provided as function arguments. The result is a Nix store component
containing files and directories stored in isolation in the Nix store, which
can never be modified after the package has been built.
In the Mancoosi project, the definition of a package is derived from the way
conventional packages are structured. According to Di Cosmo et al. [2008], a
package consists of files, meta information (such as nominal dependencies
and conflicts), and maintainer scripts that imperatively modify systems by
“glueing” files from a package to files on the host system and by performing
modifications to the system configuration.
An important challenge is posed by conflicts between packages deployed by
conventional package managers: there are various reasons why problems
may occur when a combination of packages is installed, problems which are
absent otherwise. A study on package conflict defects by Artho et al. [2012],
in which the authors have analysed Debian and Fedora bug repositories,
divides these conflicts into several categories: resource conflicts (which occur
if packages provide the same files or have wrong file permissions), conflicts
on shared data and configuration files (e.g. imperatively modifying an Apache
web server configuration resulting in an invalid configuration file), other
problems (e.g. drivers installed as kernel modules), conflicts on meta-data
(e.g. false dependencies), and outdated/incorrect information on conflicting
packages, i.e. some conflicts with other packages have been resolved in newer
versions.
Some of these conflicts cannot happen in Nix and NixOS: since the files of a
package are always stored in isolation and cannot be removed or adapted after
they have been built, we do not have file or configuration conflicts on the static
parts of the system. However, we cannot prevent packages from accessing the
same variable data in the /var folder, or web servers from using the same TCP
port at runtime.
Even given the fact that the dependencies and conflicts of packages are known,
which is not always trivial to ensure or to maintain, it is extremely difficult to
install or upgrade a system in such a way that all dependencies are present and
correct and that no packages are in conflict. In fact, a study on the package
management problem [Mancinelli et al., 2006] has shown that it is an NP-complete
problem, suggesting that it is not feasible to solve large instances of this problem
in general. Fortunately, subsets of these problems can be solved efficiently by
using SAT solvers. For example, a tool called apt-pbo [Trezentos et al., 2010] has
been developed (extending Debian’s Apt) exactly for that purpose. Similarly, a
study has shown that deciding co-installability of packages [Di Cosmo and
Vouillon, 2011] is an NP-hard problem. Furthermore, there are many more
studies on dependency and conflict problems, such as strong dependencies
[Abate et al., 2009]: a package p strongly depends on a package q if, whenever
p is installed, q must also be installed.
Another major difference between Nix and the Mancoosi approach is the
fact that Mancoosi keeps the “maintainer scripts” included in packages intact.
For example, a maintainer script of an Apache module may be used to update a
configuration file of the Apache web server. These scripts make it hard to
ensure reproducible deployment or to roll back an imperative operation. Nix
and NixOS, on the contrary, do not use any maintainer scripts; instead, all
static parts of configuration files are generated and provided as packages
stored in isolation, rather than adapted imperatively.
Part of the Mancoosi project is about developing a transactional upgrade
approach for maintainer scripts, for instance by simulating the effects of
a script first to see whether it will succeed [Cicchetti et al., 2010]. In NixOS, a
host system is declaratively configured through NixOS modules, which generate
configuration files in a pure manner. As a result, it is trivial to switch to a
newer or older version by merely flipping a symlink of a Nix profile.
Furthermore, the Mancoosi approach obeys various standards commonly
used in Linux distributions, such as the Filesystem Hierarchy Standard (FHS),
part of the Linux Standards Base (LSB). Nix deviates from the FHS (and there-
fore also from the LSB). The main reason is that the FHS layout does not pro-
vide isolation (it requires packages to be stored in global locations, such as
/usr/lib) and the LSB requires the presence of the RPM package manager,
which may break purity and reproducibility [van der Burg, 2011g]. However,
the disadvantage of deviating from these standards is that third-party binary
software with custom installers must be patched in order to be deployed
through Nix, whereas for conventional package managers no modification is
needed.
In Nix and NixOS, we have chosen purity and reliability over compatibility,
although there are several tools and tricks we can use to allow third party
binary software to be deployed.
GoboLinux (http://www.gobolinux.org), another Linux distribution not obeying
the FHS, shares the same vision that the filesystem can be used to organise
packages in isolation and to achieve better deployment, although it uses a
nominal versioning scheme and does not statically bind packages [van der Burg,
2011a].
2.6 C O N C L U S I O N
In this chapter, we have covered three important foundations of this thesis.
First, the Nix package manager is an automated, model-driven package manager
that borrows concepts from purely functional programming languages, offering
various advantages, such as reliable, efficient and reproducible package builds.
Moreover, we have covered NixOS, a GNU/Linux distribution using the Nix
package manager, which extends the Nix deployment features to complete
system configurations. Finally, we have shown Hydra, a continuous build and
integration server built around Nix.
In the following chapter, we define a reference architecture to fully auto-
mate deployment processes for the latest generation of systems, using the
components described in this chapter, such as the Nix package manager, and
several new components, such as distributed deployment tools, which are
described in the remaining chapters.
3
A Reference Architecture for Distributed
Software Deployment
A B S T R A C T
Various developments in software engineering have increased the deployment
complexity of software systems, especially for systems offered as services
through the Internet. In this thesis, we strive to provide an automated, re-
liable, reproducible and efficient deployment solution for these systems. Nix
and several Nix applications, such as NixOS and Hydra, provide a partial so-
lution to achieve this goal, although their scope is limited to single systems. In
this chapter, we describe a reference architecture for distributed software de-
ployment that captures the properties of a family of domain-specific deployment
systems, containing components that implement various deployment aspects.
The reference architecture can be turned into a concrete architecture for a
domain-specific deployment tool.
3.1 I N T R O D U C T I O N
In Chapter 1, we have argued that due to various developments in software
engineering, the software deployment process has become very difficult in
terms of complexity, reliability, agility and heterogeneity.
We have explained Nix, NixOS and Hydra in Chapter 2: a package man-
ager borrowing concepts from purely functional programming languages, a
Linux distribution completely managed through Nix, and a continuous build
and integration server built around Nix. These tools provide a partial solution
for these deployment issues.
However, the scope of Nix and NixOS is limited to single systems. They are
not applicable to systems composed of distributable services implemented
in various programming languages and component technologies, which must
be deployed in a network of machines with varying capabilities and prop-
erties.
We have argued in the introduction that systems offered as services through
the internet are more complicated than single systems. Apart from the fact
that more machines must be deployed and that components across machines
have dependencies on each other, several important non-functional deploy-
ment requirements must be met that are specific to a domain, such as
reliability, privacy and the licenses under which components are governed.
Some of these aspects cannot be solved in a generic manner.
In this chapter, we describe a reference architecture [Bass et al., 2003; Taylor
et al., 2010] for distributed software deployment, to realise the vision of a
fully automated, distributed, reliable software deployment tool for distributed
systems capable of integrating with the given application domain.
The reference architecture captures useful elements for distributed software
deployment and can be turned into a concrete architecture for a domain-
specific software deployment tool, which can be used to automatically, re-
liably and efficiently deploy a system in a network of machines.
3.2 R E F E R E N C E A R C H I T E C T U R E S
In this thesis, we provide a solution to the software deployment process for the
latest generation of software systems (offered as services through the internet)
by means of a reference architecture. In the introduction we have given a
definition of a reference architecture by Taylor et al. [2010]:
The set of principal design decisions that are simultaneously applicable to mul-
tiple related systems, typically within an application domain, with explicitly
defined points of variation.
Typically, reference architectures capture elements of an entire family of
systems. In this thesis, we use it to capture elements of an entire family
of domain-specific deployment tools performing distributed software deploy-
ment. For example, it simultaneously captures aspects of a deployment tool
for service-oriented medical applications as well as a deployment tool for web
applications, although the implementation of custom components may be dif-
ferent and not all components are required to solve a particular deployment
problem for a given domain.
A concrete architecture for a domain-specific deployment tool can be de-
rived by developing a tool, which incorporates several deployment compo-
nents defined in the reference architecture and by integrating a number of
custom components.
3.3 R E Q U I R E M E N T S
In order to overcome the deployment complexities for service-oriented sys-
tems, a number of functional and non-functional requirements must be met.
3.3.1 Functional requirements
As we have explained, Nix provides a partial solution for the deployment of
service-oriented systems; however, its scope must be extended to distributed
environments.
Figure 3.1 shows a small example deployment scenario of a modern service-
oriented system. The grey colored boxes denote machines in a network. As
can be seen, the system is hosted on two machines, which run different op-
erating systems: the left machine represents a 64-bit Linux machine and the
right machine represents a 32-bit Windows machine.
Figure 3.1 An example deployment scenario of a service-oriented software system
Every machine hosts a number of software components, denoted by white
boxes, and containers, denoted by white boxes with dotted borders. Containers
are software components that host other components, such as web applications,
and make them available to end-users, e.g. through a network connection.
The arrows denote dependency relationships. A component may have a de-
pendency on a component on the same system, but may also be dependent
on components deployed on other systems in the network.
To deploy these service-oriented systems, Nix and NixOS can be used to de-
ploy individual components and entire system configurations on single ma-
chines. However, the following aspects are not managed:
Some components have dependencies on components residing on other
machines in the network. To deploy distributable components, we must
ensure that their dependencies are correct and present.
As with “ordinary” software components, system configurations may
also have dependencies on machines with a particular system config-
uration. For example, a filesystem directory that is hosted on another
machine and shared through a network filesystem (NFS).
We also have to deploy software components that are hosted inside a
container, by modifying the container’s state (e.g. by executing an activa-
tion operation). Currently, NixOS manages such state in an ad-hoc manner,
significantly complicating the deployment of large networks of machines.
To support the latest generation of software systems offered as services, we
also have to deploy distributable components, such as web services, that may
be located on arbitrary systems in a network and may have dependencies on
other distributable components. Apart from services, we must also manage
the infrastructure: the system configurations of all machines in a network.
3.3.2 Non-functional requirements
Apart from functional requirements, the design of the reference architecture
is primarily driven by a number of important quality attributes.
Reliability
Reliability is our most important quality attribute. For reliable deployment, sev-
eral aspects are important. The first reliability aspect is dependency completeness:
all required dependencies must be present and correct, and the system must be
able to find them. Furthermore, components have to be activated in the right
order.
The second reliability aspect is immutability. Most conventional deployment
systems imperatively modify systems. For example, they overwrite existing
files or remove files, which may belong to another component. Such modifica-
tions may break a system, make it hard to undo the changes and it also makes
it harder to reproduce a configuration in a different environment. In our de-
ployment system, it is important to be pure; we want to store components in
isolation and we never want to modify them imperatively.
The third reliability aspect is atomicity. If we upgrade systems, it is desir-
able to make this operation as atomic as possible; the system should either
be in the new configuration or in the old configuration, but never somewhere
in the middle. During an upgrade, a system is temporarily inconsistent and
may produce errors. It is desirable to hide these characteristics from end users
as much as possible. Furthermore, we also want to be able to roll back if the
new configuration does not behave as expected.
The fourth reliability aspect is recovery. After a system has been deployed,
it is not guaranteed to stay operational as various events may occur. For
example, if a system has been deployed in a distributed environment, network
links may temporarily fail or machines may crash and disappear from the
network. In such cases, redeployment is required, which is not always a
trivial process. Therefore, it is desirable to provide a deployment solution that
automatically redeploys a system in such cases.
Reproducibility
As we have explained earlier in Chapter 1, apart from the deployment com-
plexities that we have to overcome, we also want to deploy systems more
frequently and in test environments, so that systems can be tested before they
are deployed in production environments. In order to be sure that a sys-
tem works in a production environment as expected, a system configuration
should be reproducible; we want a system to behave exactly as developers
have intended in the test environments.
Genericity
As explained earlier, modern software systems are distributed and heteroge-
neous. Components implementing a service are implemented in various pro-
gramming languages, various component technologies and are deployed on
machines having different architectures and system resources. The deploy-
ment tools realised from this reference architecture must have the ability to
cope with various types of components and systems in a generic manner.
Extensibility
As explained earlier, not every deployment aspect can be solved in a generic
manner. For example, heterogeneous systems may use various communica-
tion and discovery protocols, are composed of various types of services and
they have various deployment quality attributes.
Quality attributes are specific to a domain. For example, the notion of
reliability in a hospital environment may be different from reliability in the
financial domain. Furthermore, certain quality attributes are more important
in one domain than another. Therefore, this functionality should be provided
by the developers realising a concrete architecture for a deployment tool and
these extensions must be easily integrated.
Furthermore, not all components from the reference architecture may be
required to automate a deployment process for a specific domain. For exam-
ple, in certain scenarios it may be more desirable to manage the infrastructure
by other means, e.g. because a machine runs a proprietary operating system
that cannot be supported through Nix-based tools. Also, infrastructure deployment
may be sufficient, without having an explicit notion of service components,
because systems may be homogeneous, such as database clusters.
Efficiency
The last quality attribute of importance is efficiency. Software systems tend
to become larger and more complicated. As a result, an inefficient deploy-
ment process may take a large amount of time. Therefore, it is desirable to
make deployment processes as efficient as possible; only the work that is
actually required should be done. However, efficiency must not compromise the
previous quality attributes, as they are more important.
3.4 C O M P O N E N T S
Based on our requirements and the deployment tools that we have described in
the previous chapter, our reference architecture provides a number of compo-
nents, each providing a solution for a specific deployment aspect. These
components manifest themselves as tools and are covered in the remaining
chapters of this thesis.
3.4.1 Nix: Deploying immutable software components
Nix serves as the fundamental basis for this research and provides automated,
reliable deployment for static or immutable components on single systems.
Deployment with Nix is model-driven. For common components, such as
compilers, libraries and end-user applications, a repository of Nix packages
called Nixpkgs can be used, which provides expressions for many frequently
used packages found on many Linux systems. Nix has been described earlier
in Chapter 2.
3.4.2 NixOS: Deploying complete system configurations
As explained earlier in Chapter 2, NixOS uses the Nix package manager to de-
ploy complete system configurations from a NixOS configuration file. NixOS
serves as the basis of our distributed infrastructure deployment tools.
3.4.3 Disnix: Deploying services in networks of machines
Nix is designed for deployment of software components on single systems
and management of local dependencies (which we call intra-dependencies).
Disnix extends Nix with additional tools and models to deploy services in
a network of machines and manages inter-dependencies: dependencies be-
tween services, which may be located on different machines in the network.
Disnix is covered in Chapter 4. Disnix is also capable of (almost) atomically
upgrading a service-oriented system. This aspect is covered in Chapter 8.
3.4.4 Dynamic Disnix: Self-adaptive deployment of services in networks of machines
After a service-oriented system has been deployed in a network of machines,
various events may occur, such as a network failure or crash, which may
render a software system (partially) unavailable for use, or may significantly
downgrade its performance. In such cases, a system must be redeployed.
In order to redeploy a system with Disnix, the deployment models must be
adapted to reflect a new desirable deployment configuration, which can be
very labourious and tedious, especially if it is also desirable to meet various
technical and non-functional requirements.
Dynamic Disnix provides additional features to make service-oriented sys-
tems self-adaptable from a deployment perspective. A discovery service dy-
namically generates an infrastructure model. A distribution model can be
generated from a QoS policy model, composed of chained filter functions,
which use deployment planning algorithms to map services to machines in
the network, based on technical and non-functional properties defined in the
service and infrastructure models. Dynamic Disnix is covered in Chapter 5.
3.4.5 DisnixOS: Combining Disnix with complementary infrastructure deployment
Disnix performs deployment of service components, but does not manage
the underlying infrastructure. For example, application servers and database
management systems (DBMS) still have to be deployed by other means. Dis-
nixOS provides complementary infrastructure deployment to Disnix, using
features of NixOS and Charon, so that both the underlying infrastructure and
services can be automatically deployed together. DisnixOS is described in
Chapter 6.
3.4.6 Dysnomia: Deploying mutable software components
Because Nix borrows purely functional programming language concepts, not
all types of components can be properly supported, such as databases. In
this thesis, we call these types of components mutable software components.
Currently, in Nix-related applications mutable components are still deployed
on an ad-hoc basis, which can be very inconvenient in networks of machines.
Dysnomia supports automated deployment of mutable software components.
Deployment steps of mutable components cannot be done generically, because
they have many forms. A plugin system is used to take care of the deployment
steps of each component type. Dysnomia provides a repository of modules for
a number of component types (Dysnomia modules) and can be easily extended.
Dysnomia is covered in Chapter 9.
3.4.7 grok-trace: Tracing source artifacts of build processes
As we have explained in the introduction, the licenses under which compo-
nents are governed are an important non-functional deployment requirement,
as not obeying these licenses properly may result in legal action by the copy-
right holders. The grok-trace tool can be used in conjunction with Nix to
dynamically trace build processes and to produce a graph from these traces
that shows how a build artifact, such as a binary, is produced. This informa-
tion can be used in conjunction with a license analyser to determine under
which licenses the inputs are covered and how these inputs are combined into
the final product. grok-trace is covered in Chapter 10.
3.5 A R C H I T E C T U R A L PAT T E R N S
The components in the previous section automate several deployment aspects
and must be combined in a certain way to provide a fully automated deploy-
ment solution for a particular domain, having our desired quality attributes.
In this section, we describe the architectural patterns that we use for this pur-
pose.
3.5.1 Layers
The main purpose of the reference architecture is to support distributed de-
ployment. We have identified two distributed deployment concerns, namely
distributed service deployment and distributed infrastructure deployment.
Domain-specific deployment tools typically have to address either or both
of these concerns. The result of deployment actions is typically a collection
of mutable and immutable components, which must be deployed by the Nix
and Dysnomia deployment mechanisms.
This observation shows a clear distinction between concerns in an ordered
fashion. To create a clear dependence structure, we have used an architectural
pattern from [Taylor et al., 2010] called virtual machines in which groups of
components implementing deployment aspects are layered in a strict manner.
In our reference architecture (shown later in Figure 3.2), we have three layers
capturing four concerns:
The bottom layer concerns the deployment of individual software com-
ponents on single machines.
The middle layer provides distributed deployment facilities built on top
of the tools provided in the layer underneath. Distributed deployment
concerns two aspects. First, systems may be composed of interconnected
distributable components (also called services in this thesis). Second,
in order to deploy service-oriented systems we also have to provide
infrastructure, consisting of machines with the right operating systems,
system services and capabilities.
The top layer contains high-level deployment tools for applications, which
invoke tools from the underlying layer to provide distributed service
and infrastructure deployment.
3.5.2 Purely functional batch processing style
Apart from an architecture that groups concerns in separate layers, we also
have to combine the individual components. Traditionally in the UNIX world,
most deployment tasks are provided by command-line tools, which can be
called by end-users or from scripts that combine several tools to automate
entire procedures.
We have chosen to adopt ideas of this style, which is also known as the
batch processing architectural pattern [Taylor et al., 2010], for the following
reasons:
A user should be able to perform steps in the deployment process sep-
arately. For instance, a user may want to build all the components de-
fined in the models to see whether everything compiles correctly, while
he does not directly want to activate them in the network.
Testing and debugging is relatively easy, since everything can be in-
voked from the command line by the end user. Moreover, components
can be invoked from scripts, to easily extend the architecture.
Components are implemented with different technologies. Some tools
are implemented in the Nix expression language, because they are re-
quired to build components by the Nix package manager. Other tools
are implemented in the C and C++ programming languages for effi-
ciency and others are shell scripts, which can be used to easily glue
various tools together.
Prototyping is relatively easy. A tool can be implemented in a high level
language first and later reimplemented in a lower level language.
These principles are very common in the development of UNIX applications
and are inspired by Raymond [2003].
The ordinary batch processing architectural pattern has a number of draw-
backs in achieving our desired quality attributes. For example, the results of
processes are non-deterministic and have side effects, i.e. invoking the same
operation multiple times may yield different results. This also makes it hard
to reproduce the same execution scenario. Another big drawback is that all
the operations are executed sequentially and, due to side effects, are hard to
execute in parallel, limiting the efficiency of deployment procedures.
In order to overcome these limitations, we propose our own variant on this
architectural pattern, called purely functional batch processing. This architectural
pattern has not yet been described in the literature, and neither has the purely
functional style, although Taylor et al. [2010] list “Object-Oriented” (origi-
nating from a different class of programming languages) as an architectural
pattern.
In the purely functional batch processing style, we execute arbitrary pro-
cesses in isolated environments, in which side effects are removed, e.g. by
restricting file system access and clearing environment variables, so that only
explicitly specified dependencies can be found. Furthermore, processes al-
ways produce their results in a cache on the file system, so that each process
only has to be executed once and can never interfere with other processes
running in isolated environments.1
The Nix builder environment and the Nix store described earlier in Chap-
ter 2 are means to implement such a purely functional batch processing
style. However, the implementation is not restricted to Nix. In the following
chapters, we will see that this pattern also applies to remote upgrades, assum-
ing that the remote machine can be trusted. Later in Chapter 9 we will also
show a different approach for mutable software components.
1 These measures give a relatively high degree of purity, but there are always situations in
which side-effects can be triggered, e.g. by generating timestamps or random numbers. However,
as we have seen in Chapter 2 this is not a problem for most tools that we use in automating
deployment.
The purely functional batch processing pattern is a crucial means to achieve
our desired quality attributes. An advantage of this pattern is that there is rel-
atively little chance of side effects, which ensures reliability (because depen-
dencies must always be explicitly defined and errors are raised if such specifica-
tions are incomplete) and reproducibility (because no unspecified dependency
affects our results and processes are always executed deterministically).
Furthermore, because the results are immutable, we can also cache them
and return a result from a persistent cache if the function has been executed
before, providing efficiency. Moreover, because we can support the laziness
of purely functional languages, we only have to perform what is really
necessary, and we can easily parallelise tasks because the dependency closures
are always known.
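The following minimal sketch illustrates this style with plain Nix; the derivation name and the build command are purely illustrative. The builder only sees its declared inputs, and its output is cached immutably in the Nix store, so building the expression a second time simply returns the previously produced store path without re-executing the process:

  with import <nixpkgs> {};

  stdenv.mkDerivation {
    name = "batch-example";

    # The builder runs in an isolated environment in which only declared
    # dependencies are available; its output ends up immutably in the Nix store.
    buildCommand = ''
      echo "hello from an isolated build step" > $out
    '';
  }

Evaluating and building this expression twice, e.g. with nix-build, yields the same store path the second time without running the builder again, which is exactly the caching behaviour this pattern relies on.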
The genericity and extensibility attributes are achieved, because we are
allowed to invoke arbitrary processes (which can be implemented in any pro-
gramming language) in the isolated builder environments.
Some quality attributes cannot be immediately achieved through this pat-
tern. For example, to achieve full distributed atomicity, we have some extra
requirements on the system that must be deployed, as we will see in Chap-
ter 8. Recovery is a quality attribute that must be implemented by Disnix and
Charon. We will extensively cover this aspect in Chapter 5.
3.6 A P P L I C AT I O N S
By using the components in the reference architecture and the architectural
patterns, we can implement domain-specific deployment solutions and we
can deploy systems in various environments, ranging from test environments
to production environments.
3.6.1 Hydra: Continuous building and integration
Hydra, the Nix-based continuous build and integration server described in
Chapter 2, is not part of the reference architecture, as it does not implement
a deployment concern. Instead, it is a tool orchestrating build jobs. Since the
reference architecture is built around Nix, the resulting domain-specific de-
ployment solution can also be invoked non-interactively from a Hydra build
job. In Chapter 7, we use Hydra to automatically perform system integration
tests.
3.6.2 webdsldeploy: Deploying WebDSL applications in networks of machines
WebDSL [Visser, 2008] has been used as a case study for supporting auto-
mated distributed deployment of WebDSL applications from a high level of
abstraction. WebDSL is a case study in domain-specific language engineering
and provides various sublanguages capturing various concerns of web appli-
cations. From a WebDSL specification, a Java web application is generated.
Figure 3.2 A reference architecture for distributed software deployment
The webdsldeploy tool makes it possible to declaratively deploy a WebDSL
application in a network of machines, which can be hosted in both physical
networks and virtual networks of machines hosted in the environment of an
IaaS provider. The webdsldeploy tool generates a NixOS network specifica-
tion, which can be used with Charon and the NixOS test driver. webdsldeploy is
described in Chapter 11.
3.7 A N A R C H I T E C T U R A L V I E W O F T H E R E F E R E N C E A R C H I T E C T U R E
Figure 3.2 shows an architectural view of our reference architecture for dis-
tributed software deployment:
The layered surfaces denote clusters of deployment aspects, which cor-
respond to the layers we have described in Section 3.5.1.
The ovals denote the components described in Section 3.4, which are
implemented as collections of command-line tools and Nix functions.
The labeled boxes denote models that can be provided by end-users or
generated by other components in the reference architecture. Generated
models are produced in a purely functional manner and also cached in
the Nix store for efficiency.
The arrows denote dependency relationships between these components.
Dependencies result in interactions between these components using the
purely functional batch processing architectural pattern.
In the top layer, the application layer, we have included an abstract cus-
tom application deployment tool, which can utilise components from the
lower layers to automate deployment for a particular domain. Although
not strictly part of our reference architecture, the webdsldeploy tool has
been included in the top layer as well.2
3.8 R E L AT E D W O R K
The idea of designing software architectures to create software deployment
automation solutions is not entirely new. A prominent example is described
by Eilam et al. [2006] (and related papers from the IBM Alpine project describ-
ing various deployment tasks [Konstantinou et al., 2009; El Maghraoui et al.,
2006]).
Their approach is mainly focused on interaction among stakeholders in
a deployment process, such as software architects, developers and system
administrators, who each write their own specifications capturing their de-
ployment concerns. These models are stored in a central repository. Various
model-to-model transformation tools are used to synthesise these stakeholder
models and to generate executable specifications that various deployment
tools can use to automate deployment tasks.
However, the Alpine architecture is not focused on completely automating
deployment or on achieving our desired quality attributes, such as reliability and
reproducibility. For example, deployment tools are invoked in an ad-hoc manner
and can easily produce side effects and destructively upgrade systems.3
There are more general deployment solutions capable of completely au-
tomating software deployment processes and capable of incorporating exter-
nal tools. Cfengine [Burgess, 1995] and related tools, such as Puppet [Puppet
Labs, 2011] and Chef [Opscode, 2011], can modify any file and invoke any
process on a remote machine. However, these tools execute operations in a
convergent manner, in which the end result of an imperative operation is de-
scribed and only the steps that are required to reach it are performed.
Although nobody has derived architectural concepts from these tools, we
identify the architectural pattern used to combine tools in Cfengine (and fam-
ily) as “convergent batch processing”. This architectural pattern achieves a num-
ber of similar goals to the purely functional batch processing pattern, such as
genericity, since any file and tool can be invoked, and efficiency, as only oper-
ations are done that must be done. However, it does not ensure the reliability
aspect, as dependencies are not guaranteed to be complete and imperative
2 Developers of application deployment tools utilising WebDSL may want to use abstractions
on top of webdsldeploy.
3 In theory, it may also be possible to adapt the architecture to generate Nix expressions
and use our tools to deploy systems. However, in our reference architecture the desired quality
attributes come for free.
operations may destructively update system configurations. Furthermore, op-
erations can also affect reproducibility of system configurations, because they
depend on the machine’s current state.
The Software Dock [Hall et al., 1999] uses agents to implement a cooperative
deployment approach between suppliers (through release docks) and clients
(through field docks) and uses specifications in a common language, named
Deployable Software Descriptions (DSD). Although these agents implement
all the required software deployment operations, such as release and install, to
automate an entire deployment process, it is up to an agent’s implementation
to achieve the intended quality attributes, such as reliability and reproducibility.
The agents are not prohibited, for example, from imperatively or destructively
updating components on a particular system.
The Mancoosi Package Manager (MPM) [Abate et al., 2013] is a modular
package manager architecture with an emphasis on the upgrade component,
which the authors consider the most difficult aspect of a package manager.
The MPM upgrade component can be reused in many package managers
(such as RPM or dpkg) as well as in other software installers, such as the
Eclipse plugin manager, and uses SAT solvers to ensure that all required
dependencies are present and that no conflicts occur. Moreover, the upgrade
component can also be hosted externally, for example in a data center, and
uses the Common Upgradeability Description Format (CUDF) as an inter-
change format for upgrade data.
Build tools, such as Make [Feldman, 1979] and Apache Ant (http://ant.apache.org),
also allow developers to invoke arbitrary tools and to parallelise tasks. However,
they allow side effects to be programmed as well as files to be imperatively
modified, affecting reliability and reproducibility.
Practitioners solving deployment problems typically create shell scripts
chaining a collection of operations together. These scripts use the ordinary
batch processing architectural pattern and have many drawbacks, such as the
fact that any operation can imperatively and destructively modify a system
and that operations have side effects affecting the reproducibility, because
they have access to the complete filesystem and depend on the filesystem’s
current state. Furthermore, it is also difficult to parallelise deployment tasks
or to cache intermediate results to make deployment processes more efficient.
3.9 C O N C L U S I O N
In this chapter, we have described a number of important functional attributes
as well as non-functional quality attributes for a family of domain-specific
deployment systems. From these properties, we have derived a reference ar-
chitecture for distributed software deployment capturing various deployment
aspects. The purely functional batch processing architectural pattern allows
us to achieve a number of important quality attributes. The reference archi-
tecture can be transformed into a concrete architecture for a domain-specific
deployment tool by creating a tool invoking deployment components and by
integrating custom components that implement domain-specific features.
In the following chapters, we describe the individual components, general
concepts and applications of the reference architecture. In Chapter 13, we
evaluate how well we have achieved our desired quality attributes.
Part II
Service Deployment
4
Distributed Deployment of Service-Oriented
Systems
A B S T R A C T
Deployment of a service-oriented system in a network of machines is often
complex and labourious. In many cases, components implementing a service
have to be built from source code for a specific target platform, transferred to
machines with suitable capabilities and properties, and activated in the right
order. Upgrading a running system is even more difficult as this may break
the running system and cannot be performed atomically. Many approaches
that deal with the complexity of a distributed deployment process only sup-
port certain types of components or specific environments, while general so-
lutions lack certain desirable non-functional properties, such as dependency
completeness and the ability to store multiple variants of components safely
next to each other. In this chapter, we explain Disnix, a deployment tool
which allows developers and administrators to reliably deploy, upgrade and
roll back a service-oriented system consisting of various types of components
in a heterogeneous environment using declarative specifications.
4.1 I N T R O D U C T I O N
The service-oriented computing (SOC) paradigm is nowadays a very popular
way to rapidly develop low-cost, interoperable, evolvable, and massively dis-
tributed applications [Papazoglou et al., 2007]. The key to realising this vision
is the Service Oriented Architecture (SOA), in which a system is decoupled
into “services”: autonomous, platform-independent entities that can be
described, published, discovered and loosely coupled, and that perform
functions ranging from answering simple requests to executing sophisticated
business processes.
While a service-oriented system provides all kinds of advantages, such as
integrating a system into the Internet infrastructure, deploying it is often very
labourious and error-prone, because many deployment steps are performed
manually and in an ad-hoc fashion. Upgrading is even more complex. Re-
placing components may break the system, and upgrading is not an atomic
operation; i.e. while upgrading, it is observable from the outside that the sys-
tem is changing. This introduces all kinds of problems to end users, such as
error messages or inaccessible features.
Because the software deployment process of a service-oriented system in a
network is very hard, it is also a major obstacle to reaching such systems’ full
potential. For instance, a system administrator might want to run every web service
of a system on a separate machine (say, for privacy reasons), but refrain from
doing so because it would take too much deployment effort. Similarly, actions
such as setting up a test environment that faithfully reproduces the produc-
tion environment may be very expensive.
Existing research in dealing with the complexity of the software deploy-
ment process in distributed environments is largely very context-specific; e.g. de-
signed for a particular class of components [Rutherford et al., 2002; Akkerman
et al., 2005] or specific environments (such as Grid Computing [Caron et al.,
2006]). Other approaches illustrate the use of component models and lan-
guages [Chen and Simons, 2002; Costanza, 2002] which can be used to make
deployment easier. While existing research approaches provide useful fea-
tures to reduce the complexity of a particular software deployment process
and also support various non-functional aspects, few of them deal with com-
plete heterogeneous systems consisting of multiple platforms, various types
of components and complete dependencies, i.e. services and all their depen-
dencies, including infrastructure components such as the database back-ends
of which service-oriented systems may be composed.
The contribution of this chapter is a deployment tool, Disnix, which uses
models of the components of a system including all its dependencies, along
with the machines in the heterogeneous network in which the system is to be
deployed. These models are then used to automatically install or upgrade the
complete system efficiently and reliably, to support various types of services
and protocols by using an extensible approach, and to upgrade or roll back the
system almost atomically.
As a case study, we have used Disnix to automate the deployment of SDS2,
a SOA-based system for asset tracking and utilization services in hospital
environments developed at Philips Research. The goal of this case study is to
evaluate the feasibility of the proposed approach.
4.2 M O T I VAT I O N : S D S 2
As main case study, we have used a SOA-based system developed at Philips
Research, called the Service Development Support System (SDS2) [de Jonge
et al., 2007]. SDS2 is a platform that provides data abstractions over huge data
sets produced by medical equipment in a hospital environment.
4.2.1 Background
Medical devices produce very large amounts of data, such as status logs,
maintenance logs, patient alarms, images, location events and measurements.
These data sets often have poorly-defined structures, e.g. most of these data
sets are represented as logfiles without any clear structure. SDS2 is a plat-
form providing services which extract useful data from these implicit data
sets and transform it into useful information. Furthermore, this information
can be presented to various stakeholders in a hospital environment.
Figure 4.1 Transformers architecture
In SDS2, web services fulfil the role of transformers, creating abstraction
layers for the data sets stored in the data repositories, as shown in Figure 4.1.
By combining various transformers, data sets can be turned into useful and
concrete information with a well-defined interface. Data abstractions, such
as workflow analysis results, can be presented by one of the web application
front-ends to the stakeholders.
4.2.2 Features
SDS2 provides two web application front-ends serving the following stake-
holders:
Nurse. The nurse mainly uses SDS2 to check the location and statuses
of medical equipment located on a particular floor in a hospital. For
example, if a device reports an alarm, the nurse wants to investigate the
situation as quickly as possible.
Field service engineer. The field service engineer is responsible for main-
tenance of medical equipment. Apart from the locations and statuses
of medical devices, the field service engineer also wants to know what
events have happened in the past (e.g. check if it was broken) and how
a particular device is (ab)used. Furthermore, the field service engineer
may want to check out the maintenance records of a particular device
and eventually update them.
Figure 4.2 The SDS2AssetTracker service
Hospital administrator. The hospital administrator mainly uses the sys-
tem for workflow analyses and to check out how efficiently devices are
used.
Asset Tracker service
The Asset Tracker service is mainly used by the nurse and field service engi-
neer. An impression of this web application front-end is shown in Figure 4.2.
The Asset Tracker shows a floor map with medical devices and their actual
real-time statuses. In this application, the end user can switch between floors,
pick a device from the map, check its reported events and inspect its
maintenance records. In our example, three devices are marked with a red
color symbol indicating that they are broken and require maintenance.
Utilisation service
The Utilisation service is primarily used by the hospital administrator. An
impression of this web application front-end is shown in Figure 4.3.
Similar to the Asset Tracker service, the Utilisation service shows floor
maps, but instead of showing the actual locations and statuses of devices,
it can be used to query workflow analysis results of particular devices (or
groups of devices) over a specified period of time. In our example, it shows
the intensity of travelled paths of a particular device between rooms. The
green color indicates low intensity, while a red color indicates high intensity.
Figure 4.3 The SDS2Utilisation service
4.2.3 Architecture
Figure 4.4 shows the architecture of SDS2. The architecture consists of three
layers:
The database layer contains databases storing event records, such as all
the status events (mobileeventlogs) and location events (mobilelocationlogs)
that medical devices produce. Furthermore, many services in the layer
below have to store or cache various records, such as device information
(hospitalinstalledbase) and maintenance records (mobileutilizationlogs). The
ejabberdDevices database is used by the ejabberd XMPP server to allow
devices to broadcast their statuses.
The transformers layer contains web services, providing a well-defined
interface to implicit data sets (MELogService, MULogService, HIBService),
as well as services which extract and synthesise data sets into more
useful information (ME2MSService), such as workflow analysis reports.
The FloorPlanService web service provides floor maps. Furthermore, there
are batch processes used to simulate status and location events
(SDS2EventGenerator), store events in databases (XMPPLogger) and merge
status and location events (SDS2EventMerger) so that their origin can be
tracked.
The presentation layer contains the web application front-ends described
in the previous subsection, which can be used by the stakeholders
Figure 4.4 Architecture of SDS2. Rectangles denote web services and ovals rep-
resent batch processes. Databases and web applications are represented by im-
ages. Arrows denote dependency relationships.
(SDS2AssetTracker and SDS2Utilisation). The DemoPortal is a web application
that serves as an entry portal to all stakeholders, which redirects them
to the service they want to use.
It is important to notice that all components in Figure 4.4 are distributable
components (called services in this thesis). For example, the mobileeventlogs
database may be hosted on another machine than the SDS2AssetTracker web
application. Components in the architecture typically communicate with each
other using RPC or network protocols, such as SOAP (for web services),
XMPP (with ejabberd) and ordinary TCP connections (for database connec-
tions). It may even be possible that every component is located on a separate
machine, but it is also possible that all components are deployed on a single
machine.
4.2.4 Implementation
SDS2 is a heterogeneous system, as it uses various types of components as well
as various technologies:
MySQL. All the event logs produced by the devices as well as caches are
stored in MySQL databases.
Ejabberd. Events that are produced by medical devices are broadcast
through one or more Ejabberd servers.
Apache Tomcat and Apache Axis2. All the web service components are im-
plemented in the Java programming language using the Apache Axis2
library. For hosting the web application components, Apache Tomcat is
used.
Google Web Toolkit. The web application front-ends are implemented
using the Google Web Toolkit (GWT) using the Java programming lan-
guage. The GWT toolkit is used to compile Java code to efficient Java-
Script code.
4.2.5 Deployment process
The original deployment process of the SDS2 platform is laborious.
First, a global configuration file must be created, which contains URLs of all
the services of which SDS2 consists. Second, all the platform components
have to be built and packaged from source code (Apache Maven [Apache
Software Foundation, 2010] is used for this). For every machine that provides
data storage, MySQL databases have to be installed and configured. For every
Ejabberd server, user accounts have to be created and configured through a
web interface. Finally, the platform components have to be transferred to the
right machines in the network and activated.
Drawbacks
The SDS2 deployment process has a number of major drawbacks. First, many
steps are performed manually, e.g. deploying a database, and are therefore
time-consuming and error prone. Furthermore, up-to-date deployment docu-
mentation is required, which accurately describes how the deployment process
should be performed.
Moreover, some steps have to be performed in the right order; e.g. a web
service that provides access to data in a MySQL database requires that the
database instance is started before the web service starts. Performing these
tasks in the wrong order may bring the web service into an inconsistent state,
e.g. due to failing database connections that are not automatically restored.
Upgrading is difficult and expensive: when changing the location of a serv-
ice, the global configuration file needs to be adapted and all the dependent
services need to be updated. Furthermore, it is not always clear what the
dependencies of a service are and how we can upgrade them reliably and
efficiently, i.e. we only want to replace the necessary parts without breaking
the entire system. Upgrading is also not an atomic operation; when chang-
ing parts of the system it may be observable to end users that the system is
changing, as the web front-ends may return incorrect results.
In heterogeneous networks (i.e. networks consisting of systems with various
architectures) the deployment process is even more complex, because we have
to compile and test components for multiple platforms.
Finally, since the deployment process is so expensive, it is also expensive
to deploy the platform in a test environment and to perform test-cases on
them. The deployment complexity is proportional to the number of target
machines in the network: if for one machine c components must be deployed
(including all their dependencies), then for x machines c × x components must
be deployed in the worst case.
Complexity
The deployment process of SDS2 is typically performed on a single system,
which is already complicated. There are various reasons why a developer or
system administrator wants to deploy components from Figure 4.4 on multi-
ple machines, rather than a single machine.
For instance, a system administrator may want to deploy the DemoPortal
web application on a machine which is publicly accessible on the Internet,
while data sets, such as the mobileeventlogs, must be restricted from public ac-
cess since they are privacy-sensitive and, in some countries, the law prohibits
storing medical data outside the hospital environment.
It may also be desirable that web services performing analysis jobs over data
sets are located within the Philips Enterprise environment, since Philips has
a more powerful infrastructure to perform analysis tasks and may want to use
analysis results for other purposes.
Moreover, each database may need to be deployed on a separate machine,
since a single machine may not have enough disk space available to store all
the data sets. Furthermore, multiple redundant instances of the same compo-
nent may have to be deployed in a network, to offer load-balancing or failover.
All these non-functional aspects require a distributed deployment scenario,
which significantly increases the complexity of the deployment process of
SDS2.
Requirements
To deal with the complexity of the software deployment process of SDS2 and
similar systems, we require a solution that automatically deploys a service-
oriented system in an environment taking dependencies into account. Per-
forming the deployment process automatically is usually less time-consuming
and error-prone. Moreover, with a solution that takes all dependencies into
account we never end up with a broken system due to missing dependencies and
we can also deploy the dependencies in the right order derived from the de-
pendency graph. We also want to efficiently upgrade a service-oriented system
by only replacing the changed parts and taking the dependencies into account
so that all dependencies are always present and deployed in the right order.
In order to automate the installation and upgrade, we have to capture the
services of a system and the infrastructure in declarative specifications. Using
these specifications, we can derive the deployment steps of the system, and
reproduce a previous deployment scenario in an environment of choice, such
as a test environment.
Furthermore, we require features to make the deployment process atomic,
so that we never end up with an inconsistent system in case of a failure and
can guarantee that it is not observable to end users that the system is chang-
ing.
Figure 4.5 Overview of Disnix
Since the underlying implementation of a service could be developed for
various platforms with various technologies, we should be able to build the
service for multiple platforms and integrate with the existing environment.
While there are solutions available that deal with these requirements, few
have a notion of complete dependencies, and most are only applicable within
a certain class of component types or environments and lack desirable non-
functional properties such as atomic upgrades. Therefore, a new approach is
required.
The technology used to implement SDS2 is very common in realising service-
oriented systems. The problems described in this section are also applicable
to other systems using the same or similar technology.
4.3 DISNIX
To solve the complexities of the deployment process of distributed systems
such as SDS2, we have developed Disnix [van der Burg et al., 2008; van der
Burg and Dolstra, 2010a,d] (http://nixos.org/disnix), a distributed de-
ployment extension to the Nix deployment system [Dolstra et al., 2004; Dol-
stra, 2006].
As explained in Chapter 2, Nix is a package manager providing several
distinct features. It stores components in isolation from each other and pro-
vides a purely functional language to specify build actions. Nix only deals
with intra-dependencies, which are either run-time or build-time dependen-
cies residing on the same machine. Disnix provides additional features on
top of Nix to deal with distributed systems, including management of inter-
dependencies, which are run-time dependencies between components residing
on different machines. In this chapter we give an overview of Disnix and
we show how it can be used to automate the software deployment process of
SDS2.
4.3.1 Overview
Figure 4.5 shows an overview of the Disnix system. It consists of the disnix-env
tool that takes three models as input:
The services model describes the services of which the system is com-
posed, their properties and their inter-dependencies. Usually this spec-
ification is written by the developers of the system. This model also
includes a reference to all-packages.nix, a model in which the build func-
tions and intra-dependencies of the services are specified.
The infrastructure model describes the target machines in the network on
which the services can be deployed. Usually this specification is writ-
ten by system administrators or can be generated by using a network
discovery service.
The distribution model maps the services to machines in the network.
Usually, this specification is written by system administrators, or gener-
ated automatically.
As with ordinary Nix expressions, the described models are implemented
in the Nix expression language, a simple purely functional language used to
describe component build actions.
All the remote deployment operations are performed by the DisnixService,
a service that exposes deployment operations through a custom RPC protocol
and must be installed on every machine in the network. All the deployment
operations are initiated from a single machine, called the coordinator.
For example, to deploy SDS2, the developers and administrators must write
instances of the three models capturing the services of which SDS2 is com-
posed and the machines on which the components must be deployed. The
following command then suffices to deploy SDS2:
$ disnix-env -s services.nix -i infrastructure.nix \
-d distribution.nix
This command builds all components of SDS2 from source code for the in-
tended target platforms, including all their intra-dependencies, transfers them
to the selected target machines, and activates all services in the right order.
To upgrade SDS2 or change its configuration, the user modifies the models,
and runs disnix-env again. Disnix will rebuild and transfer any components
that have changed. After the transfer phase, a transition phase is started in
which obsolete services that are no longer in use are deactivated and new
services are activated. (In this phase only the changed parts of the system
are updated taking the inter-dependencies and their activation order into ac-
count.) If the activation of a particular service fails, a rollback is performed in
which the old configuration is restored.
Since the machines in the network could be of a different type than the
machine on which the deployment tool is running, the coordinator might
have to delegate the build to a machine capable of building the service for the
{ intraDep1, intraDep2 }: 40
{ interDep1, interDep2 }: 41

buildFunction { ... } 42
Figure 4.6 Structure of a Disnix expression
target platform (e.g. an i686-linux machine cannot compile a component for an
i686-freebsd system). When there is no dedicated build machine available for
this job, Disnix also provides the option to perform the build on the selected
target machine in the network. Disnix performs actions on remote machines
(such as transferring, building and activating) through a remote service, the
DisnixService, that must be installed on every target machine.
Usually, there is only one coordinator machine involved in deployment
processes. However, it is also possible to have multiple coordinators, which
can either deploy the same configuration or a different configuration (which
is made distinct by a profile identifier). Because of the purely functional de-
ployment model of Nix, we can safely deploy multiple variants of packages
next to each other without interference. At runtime, however, a process from
one configuration may conflict with a process from a different configuration.
Disnix does not keep track of these runtime conflicts between different
configurations; in case of a runtime error, it performs a rollback because the
activation process fails. A system administrator has to verify manually that
no conflicts occur. For reliability, it is not recommended to deploy multiple
configurations in the same environment. As a possible solution, two
configurations can be merged into a single configuration and then deployed
by Disnix; because all dependencies are captured in the models, they can be
statically verified.
4.3.2 Building a service
For every service defined in the architecture shown in Figure 4.4, we have
to write a Disnix expression, analogous to how for every Nix pack-
age we have to write a Nix expression defining how it can be derived from
source code and its dependencies. Furthermore, the component must also
be configured to be able to connect to its inter-dependencies, for example by
generating a configuration file containing connection settings in a format that
the component understands.
The general structure of a Disnix expression is outlined in Figure 4.6. The
structure of a Disnix expression is almost the same as an ordinary Nix expres-
sion. A Nix expression is typically a function, whereas a Disnix expression is
a nested function defined in the Nix expression language:
40 The outer function header defines the intra-dependency arguments, which
are build-time and run-time dependencies residing on the same sys-
tem where the component is deployed. These dependencies could be
libraries, compilers and configuration files.
{javaenv, config, SDS2Util}: 43
{mobileeventlogs}: 44

let
  jdbcURL = "jdbc:mysql://"+
    mobileeventlogs.target.hostname+":"+
    toString mobileeventlogs.target.mysqlPort+"/"+
    mobileeventlogs.name;
in
javaenv.createTomcatWebApplication rec { 45
  name = "MELogService";

  contextXML = '' 46
    <Context>
      <Resource
        name="jdbc/sds2/mobileeventlogs" auth="Container"
        type="javax.sql.DataSource"
        username="${mobileeventlogs.target.mysqlUsername}"
        password="${mobileeventlogs.target.mysqlPassword}"
        url="${jdbcURL}" />
    </Context>'';

  webapp = javaenv.buildWebService rec { 47
    inherit name;
    src = ../../../WebServices/MELogService; 48
    baseDir = "src/main/java";
    wsdlFile = "MELogService.wsdl";
    libs = [ config SDS2Util ]; 49
  };
}
Figure 4.7 Build expression of MELogService
41 The inner-function header defines the inter-dependency arguments, which
are run-time dependencies on services which may be located on differ-
ent machines in the network, such as web services or databases.
Every inter-dependency argument is a set of attributes, containing var-
ious properties of a service defined in the services model, shown later
in Section 4.3.4. A special attribute of an inter-dependency is the tar-
gets property, which contains a list of machine properties on which the
inter-dependency is deployed. These machine properties are defined
infrastructure model, shown later in Section 4.3.5. The target attribute
points to the first targets element, which can be used for convenience.
42 In the remainder of the Disnix expression, the component is built from
source code, and the given intra-dependencies and inter-dependency
parameters.
Figure 4.7 shows a Nix expression for the MELogService service component
of SDS2, representing an Apache Tomcat web application containing a web
service archive built from Java source code. As we have explained earlier in
Section 4.2.3, the MELogService is a service providing an interface to log records
stored in a MySQL database, which can reside on a different machine. In
order to allow the service to connect to the database server, the expression
generates a so-called context XML file that specifies connection settings of the
MySQL database:
43 The outer function representing the intra-dependencies specifies the
config and SDS2Util components, which are libraries of the SDS2 platform,
and javaenv, which is used to compile and package Java source code.
44 The inner function, defining inter-dependencies, specifies the mobileeventlogs
parameter representing the MySQL database where the data records are
stored. The inter-dependencies correspond to the arrows pointing away
from the MELogService service in Figure 4.4.
45 The remainder of the expression consists of the function call to
javaenv.createTomcatWebApplication, that composes an Apache Tomcat web
application from a Java web service archive (WAR file) and an Apache
Tomcat context XML file.
46 The context XML file is generated by using the target property from the
mobileeventlogs inter-dependency argument, that provides the connection
settings of the database server.
47 The web service archive is derived from Java source code by calling the
javaenv.buildWebService function.
48 The src parameter specifies the location of the source code, which resides
in a directory on the filesystem.
49 The libs parameter specifies the libraries, which are required to build
the service. These libraries are provided by the intra-dependency argu-
ments.
4.3.3 Composing intra-dependencies
Although a Disnix expression specifies how to derive the component from
source code and its intra- and inter-dependencies, we cannot build this serv-
ice directly. As with ordinary Nix expressions, we have to compose the com-
ponent by calling this expression with the expected function arguments. In
Disnix, we first have to compose the service locally, by calling the function
with the needed intra-dependency arguments; later, we have to provide the
inter-dependency arguments as well.
Figure 4.8 shows a partial Nix expression that composes the intra-
dependencies of the SDS2 services. As may be observed, the structure of
this expression is quite similar to the all-packages.nix composition expression
described in Section 2.2.3 of the previous Chapter. The subtle difference is
that this expression only captures components that belong to SDS2 and
reuses common libraries and build tools, such as Apache Axis2 and the Java
Development Kit, by referring to the Nixpkgs repository:
50 An intra-dependency composition expression is a function taking three
arguments. The pkgs argument refers to the entire Nixpkgs collection,
providing compositions for many common packages. The system argu-
ment specifies the platform for which a service has to be built, using
{pkgs, system, distribution}: 50
rec { 51
MELogService = import ../WebServices/MELogService { 52
inherit javaenv config SDS2Util;
};
config = import ../libraries/config { 53
inherit javaenv distribution;
};
javaenv = import ../tools/javaenv {
inherit (pkgs) stdenv jdk;
};
mobileeventlogs = ...
SDS2Util = ...
...
}
Figure 4.8 all-packages.nix: Intra-dependency composition
identifiers such as i686-linux and x86_64-freebsd. The system parameter
is also passed as an argument to the composition function of the
stdenv component in Nixpkgs (not shown here), which is a component
used by virtually every other component; e.g. passing i686-linux as system
argument to stdenv will build every component for that particular type
of platform. The distribution argument refers to the distribution model
(shown later in Section 4.3.6), so that components can reflect over the
current deployment situation.
51 In the remainder of the expression, a recursive attribute set is specified
in which each value composes a component of SDS2 locally. As with
ordinary Nix composition expressions, the rec keyword is required,
because components may have to refer to each other.
52 The MELogService attribute refers to the expression described in Fig-
ure 4.7 and is called with the required intra-dependency arguments (all
the intra-dependencies of the MELogService, such as SDS2Util, are com-
posed in this expression as well).
53 The config attribute composes the config component. The config compo-
nent is an intra-dependency of all SDS2 services, providing a registry
library which knows the locations of every web service. By inheriting
the distribution parameter, the entire distribution model is passed to the
build function, so that it can be properly configured to provide informa-
tion of service locations to all SDS2 services.
{distribution, pkgs, system}: 54
let
customPkgs = import ../top-level/all-packages.nix { 55
inherit pkgs distribution system;
};
in
rec { 56
mobileeventlogs = {
name = "mobileeventlogs";
pkg = customPkgs.mobileeventlogs;
type = "mysql-database";
};
MELogService = { 57
name = "MELogService"; 58
pkg = customPkgs.MELogService; 59
dependsOn = { 60
inherit mobileeventlogs;
};
type = "tomcat-webapplication"; 61
};
SDS2AssetTracker = {
name = "SDS2AssetTracker";
pkg = customPkgs.SDS2AssetTracker;
dependsOn = {
inherit MELogService ...;
};
type = "tomcat-webapplication";
};
...
}
Figure 4.9 A partial services model for SDS2
4.3.4 Services model
Apart from specifying how each individual service should be derived from
source code and their intra-dependencies and inter-dependencies, we must
also model what services constitute a system, how they are connected to each
other (inter-dependency relationships) and how they can be activated or de-
activated. These properties are captured in a services model, illustrated in
Figure 4.9:
54 Essentially, a services model is a function taking three arguments. The
distribution argument refers to the distribution model, the system argu-
ment provides the system architecture of a target in the infrastructure
model and the pkgs argument refers to the Nixpkgs collection.
55 Here, we import the composition expression that we have defined ear-
lier in Figure 4.8. In order to allow the composition expression to find
the Nixpkgs collection and the desired architecture, we pass them as
function arguments.
56 The remainder of the services model is a mutually recursive attribute
set in which every attribute represents a service with its properties. As
with ordinary Nix expressions and the intra-dependency composition
expression, services may also have to refer to each other.
57 The MELogService attribute defines the MELogService service component
of SDS2. In principle, all the attributes in the services model correspond
to the rectangles, ovals and icons, shown earlier in Figure 4.4.
58 The name attribute specifies an identifier for MELogService.
59 The pkg attribute specifies a function that composes the service from
its inter-dependencies. This function has already been composed lo-
cally (by calling it with the required intra-dependency arguments) in
Figure 4.8.
60 The dependsOn attribute refers to an attribute set in which each attribute
points to a service in the services model, each representing an inter-
dependency of the MELogService component. The attribute set provides a
flexible way of composing services, e.g. mobileeventlogs = othermobileevent-
logs allows the user to compose a different inter-dependency relation-
ship of the MELogService (a sketch follows after this list). In principle,
these attributes in dependsOn correspond to the arrows shown earlier in
Figure 4.4.
61 The type attribute specifies what module has to be used for activation/de-
activation of the service. Examples of types are: tomcat-webapplication,
axis2-webservice, process and mysql-database. The Disnix interface will in-
voke the appropriate activation module on the target platform, e.g. the
tomcat-webapplication module will activate the given service in an Apache Tomcat
web application container.
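For instance, a variant of the services model could bind the MELogService to an
alternative database instance by rebinding its inter-dependency. The following
is only a minimal sketch; the othermobileeventlogs service is a hypothetical
attribute assumed to be defined elsewhere in the same attribute set and is not
part of SDS2:

MELogService = {
  name = "MELogService";
  pkg = customPkgs.MELogService;
  dependsOn = {
    mobileeventlogs = othermobileeventlogs;
  };
  type = "tomcat-webapplication";
};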
4.3.5 Infrastructure model
In order to deploy services in the network, we also have to specify what ma-
chines are available in the network, how they can be reached to perform de-
ployment steps remotely, what architecture they have (so that services can be
built for that type of platform) and other relevant capabilities, such as au-
thentication credentials and port numbers so that services can be activated or
deactivated. This information is captured in the infrastructure model, illus-
trated in Figure 4.10:
62 The infrastructure model is an attribute set in which each attribute cap-
tures a system in the network with its relevant properties/capabilities.
This line defines an attribute set specifying the properties of a machine
called test1.
{
test1 = { 62
hostname = "test1.net";
tomcatPort = 8080; 63
mysqlUser = "user";
mysqlPassword = "secret";
mysqlPort = 3306;
targetEPR = http://test1.net/.../DisnixService; 64
system = "i686-linux"; 65
};
test2 = {
hostname = "test2.net";
tomcatPort = 8080;
...
targetEPR = http://test2.net/.../DisnixService;
system = "x86_64-linux";
};
}
Figure 4.10 Infrastructure model for SDS2
{infrastructure}: 66
{
mobileeventlogs = [ infrastructure.test1 ]; 67
MELogService = [ infrastructure.test2 ];
SDS2AssetTracker = [
infrastructure.test1 infrastructure.test2
]; 68
...
}
Figure 4.11 Partial distribution model for SDS2
64 Some attributes have reserved use, such as targetEPR, specifying the URL
of the DisnixService which performs remote deployment operations.¹
65 The system attribute is used to specify the system architecture so that
services are built for that particular platform.
63 The other attributes can be freely chosen and capture miscellaneous
technical and non-functional properties of a machine. Furthermore, all
attributes defined in this model are passed to the activation scripts of
the given type, so that machine-specific properties are known at activa-
tion time. For example, the tomcatPort attribute specifies the TCP port
on which Apache Tomcat listens. This property is used by the Apache
Tomcat activation script to properly activate Apache Tomcat web appli-
cations.
4.3.6 Distribution model
Finally, we have to specify to which machines in the network we want to
distribute a specific service. These properties are defined in the distribution
model illustrated in Figure 4.11:
66 The distribution model is a function taking the infrastructure model as a
parameter. The remainder of the expression is an attribute set in which
every attribute name represents a service in the services model and the
attribute value is a list of machines from the infrastructure model.
67 This attribute specifies that the mobileeventlogs database should be built,
distributed and activated on machine test1.
68 By specifying more machines in a list it is possible to deploy multiple
redundant instances of the same service for tasks such as load balanc-
ing. This attribute specifies that the SDS2AssetTracker service must be
deployed on both test1 and test2.
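For instance, a minimal sketch of a distribution model that maps every service
to a single machine, which can be convenient for reproducing the system in a
test environment, looks as follows (only a few of the SDS2 services are shown):

{infrastructure}:

{
  mobileeventlogs = [ infrastructure.test1 ];
  MELogService = [ infrastructure.test1 ];
  SDS2AssetTracker = [ infrastructure.test1 ];
  ...
}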
4.4 IMPLEMENTATION
In the previous section, we have described the structure of the models that
Disnix uses and how Disnix can be used to perform the complete deployment
process of a service-oriented system. In this section, we describe how this
deployment process is implemented.
4.4.1 Deployment process
In order to deploy a system from the models we have described earlier, we
need to build all the services that are mapped to a target machine in the dis-
tribution model for the right target, then transfer the services (and all their intra-
dependencies) to the target machines and finally activate the services (and,
if necessary, deactivate obsolete services from the previous configuration). The
disnix-env tool, which performs the complete deployment process, is composed
of several tools each performing a step in the deployment process. The archi-
tecture is shown in Figure 4.12.
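As a rough sketch, the steps of disnix-env could also be carried out by invoking
its constituent tools by hand; the flags below are assumptions based on the
disnix-env interface shown earlier, and disnix-env normally chains these steps
automatically:

$ manifest=$(disnix-manifest -s services.nix -i infrastructure.nix \
    -d distribution.nix)
$ disnix-distribute $manifest
$ disnix-activate $manifest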
Building the services
The Disnix models are declarative models. Apart from the build instructions of
the services, the complete deployment process, including its order, is derived
from these models and the intra- and inter-dependencies defined in them.
In the first step of the deployment process, the disnix-manifest tool is invoked
with the three Disnix models as input parameters. The result of this tool is
a manifest file, shown in Figure 4.13, an XML file that specifies what services
¹ This attribute is used by the DisnixWebService. Disnix also provides other types of inter-
faces, which use a different attribute. Consult the Disnix manual for more details [van der Burg,
2011e].
Figure 4.12 Architecture of disnix-env. Rectangles represent objects that are ma-
nipulated by the tools. The ovals represent a tool from the toolset performing a step
in the deployment process.
need to be distributed to which machine and which services must be
activated:
69 The distribution section captures the Disnix profiles that must be dis-
tributed to each target in the network. Disnix profiles are Nix profiles
(as described in Section 2.2.6 of the previous chapter) containing sym-
links to the contents of all the services and their intra-dependencies that
should be installed on the given target machine.
70 The activation section captures all the activation properties of each service,
such as the properties of the MELogService component.
73 The contents of the name element correspond to the service name defined in
the services model.
74 The contents of the service element point to the Nix store component, under
which the build result is stored.
75 The target element contains properties of the target machine on which
the service must be activated. These properties are derived from the
infrastructure model.
72 The dependsOn section defines activation properties of all inter-dependencies
of the given service. In our example, it defines properties of the mo-
<?xml version="1.0"?>
<manifest>
<distribution> 69
<mapping>
<profile>/nix/store/ik19fyzhak5ax4s8wkixg2k4bbkwyn83-test1</profile>
<target>http://test1.net/DisnixService/services/DisnixService</target>
</mapping>
<mapping>
<profile>/nix/store/grq938g52iz3vqly7cn5r9n9l6cjrsp7-test2</profile>
<target>http://test2.net/DisnixService/services/DisnixService</target>
</mapping>
</distribution>
<activation> 70
<mapping> 71
<dependsOn> 72
<dependency>
<service>/nix/store/z1i74v0hp0909fvgda4vwx68yprhpqwa-mobileeventlogs</service>
<target>
<hostname>test1.net</hostname>
<mysqlPassword>admin</mysqlPassword>
<mysqlPort>3307</mysqlPort>
<mysqlUsername>root</mysqlUsername>
<sshTarget>localhost:2223</sshTarget>
<system>i686-linux</system>
<targetEPR>http://test1.net/DisnixService/services/DisnixService</targetEPR>
<tomcatPort>8082</tomcatPort>
</target>
</dependency>
</dependsOn>
<name>MELogService</name> 73
<service>/nix/store/hg6fy5ngr39ji4q3q968plclssljrb6j-MELogService</service> 74
<target> 75
<hostname>test2.net</hostname>
<sshTarget>localhost:2222</sshTarget>
<system>i686-linux</system>
<targetEPR>http://test2.net/DisnixService/services/DisnixService</targetEPR>
<tomcatPort>8081</tomcatPort>
</target>
<targetProperty>targetEPR</targetProperty> 76
<type>tomcat-webapplication</type> 77
</mapping>
...
<mapping> 78
<dependsOn/>
<name>mobileeventlogs</name>
<service>/nix/store/z1i74v0hp0909fvgda4vwx68yprhpqwa-mobileeventlogs</service>
...
</mapping>
</activation>
<targets> 79
<target>http://test1.net/DisnixService/services/DisnixService</target>
<target>http://test2.net/DisnixService/services/DisnixService</target>
</targets>
</manifest>
Figure 4.13 A partial Disnix manifest file
bileeventlogs database service, which is an inter-dependency of the
MELogService.
76 The targetProperty element specifies what property in the infrastructure
model is used to provide the address of the remote Disnix Service. In
/nix/store
   nqapqr5cyk...-glibc-2.9
      lib
         libc.so.6
   ccayqylscm...-jre-1.6.0_16
      bin
         java
   n7s283c5yg...-SDS2Util
      share
         java
            SDS2Util.jar
   y2ssvzcd86...-SDS2EventGenerator
      bin
         SDS2EventGenerator
      share
         java
            SDS2EventGenerator.jar
Figure 4.14 Runtime dependencies of the SDS2EventGenerator
our case, the targetEPR attribute represents the address.
77 The type attribute specifies the type of the given service. This prop-
erty is derived from the services model.
78 Likewise, the activation mappings of all other services are also defined in this
manifest file, such as the mobileeventlogs database.
79 The targets section lists all the Disnix Service addresses of the target
machines involved in the deployment process. These are used to tem-
porarily lock access in the transition phase.
The manifest file is basically a concrete version of the abstract distribution
model. While the distribution model specifies which services should be dis-
tributed to which machine, it does not reference the actual service that is built
from source code for the given target machine under the given conditions,
i.e. the component in the Nix store, such as /nix/store/x39a1...-StaffTracker. Be-
cause the concrete Nix store path names have to be determined, this results
in a side effect in which all services are built from source code for the given
target platforms. The manifest file itself is also built in a Nix expression and
stored in the Nix store.
Transferring closures
The generated manifest file is then used by the disnix-distribute tool. As ex-
plained earlier in Section 2.2.5 of the previous Chapter, Nix can detect run-
time dependencies from a build process by scanning a component for the hash
codes that uniquely identify components in the Nix store. Because the ELF
header of the java executable contains a path to the standard C library, namely
/nix/store/nqapqr5cyk...-glibc-2.9/lib/libc.so.6, we know by scanning that this specific
glibc component in the Nix store is a runtime dependency of java. Figure 4.14
shows the runtime dependencies of the SDS2EventGenerator service.
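Such a closure can also be inspected with the standard Nix command-line
interface; for example, the following query (with the store path abbreviated as
in Figure 4.14) lists the SDS2EventGenerator component together with all its
runtime dependencies:

$ nix-store -qR /nix/store/y2ssvzcd86...-SDS2EventGenerator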
The disnix-distribute tool will efficiently copy the components and their intra-
dependencies (under the runtime-dependency relation) to the machines in the
network through remote interfaces. This goal is achieved by copying the clo-
sures of the Disnix profiles defined in the manifest file. Only the components
that are not yet distributed to the target machines are copied, while keeping
components that are already there intact. As explained earlier in Section 2.2,
due to the purely functional properties of Nix, a hash code in a Nix store path
uniquely represents a particular component variant, which we do not have to
replace if it already exists, because it is identical. Only the missing paths have
to be transferred, making the transfer phase as efficient as possible.
Copying Nix components is also a non-destructive process, as no existing
files are overwritten or removed. To perform the actual copy step to a single
target, the disnix-copy-closure tool is invoked.
Transition phase
Finally, the disnix-activate tool is executed, which performs the transition phase.
In this phase, all the obsolete services are deactivated and all the new services
are activated. The right order is derived from the inter-dependencies defined
in the services model. By comparing the manifests of the current deployment
state and the new deployment state, an upgrade scenario can be derived.
During the transition phase, every service receives a lock and unlock re-
quest to which it can react, so that optionally a service can reject an up-
grade when it is not safe or temporarily queue connections so that end users
cannot observe that the system is changing. While the transition phase is on-
going, other Disnix clients are locked out, so that they cannot perform any
deployment operations, preventing the upgrade process from being disturbed.
Then, it will recursively activate all the service distributions (and their inter-
dependencies) defined in the new configuration that are marked inactive. If
a failure occurs, the newly activated services are deactivated and the deacti-
vated services are activated again. If the deactivation of a particular service
fails, the new services are deactivated and the previously deactivated services
are activated again.
If the transition phase has succeeded, the manifest file generated earlier
is stored in a Nix profile on the coordinator machine for further
reference. This manifest file is used in future deployment operations to
derive the most reliable and efficient upgrade scenario.
Using this method to upgrade a distributed system gives us two benefits.
By traversing the inter-dependency graph of a service we always have the
inter-dependencies of a service activated before activating the service itself,
ensuring that no dependency relationship gets broken. Moreover, this method
only deactivates obsolete services and activates new services, which is more
efficient than deploying a new configuration entirely from scratch.
#!/bin/bash -e

catalinaBaseDir=@CATALINA_BASE@

case "$1" in
    activate) 80
        # Link all WAR files in the deployment directory, so that they will
        # be activated with hot-deployment
        find $(readlink -f $2/webapps) -name \*.war | while read i
        do
            # Link the web service
            ln -sfn "$i" "$catalinaBaseDir/webapps/`basename "$i"`"

            # Link the configuration files if they exist
            if [ -d "$2/conf/Catalina" ]
            then
                mkdir -p $catalinaBaseDir/conf/Catalina/localhost

                for j in $2/conf/Catalina/*
                do
                    ln -sfn "$j" $catalinaBaseDir/conf/Catalina/localhost/`basename "$j"`
                done
            fi
        done
        ;;

    deactivate) 81
        # Remove WAR files from the deployment directory, so that they will
        # be deactivated with hot deployment
        find $(readlink -f $2/webapps) -name \*.war | while read i
        do
            # Remove the web service
            rm -f $catalinaBaseDir/webapps/`basename $i`

            # Also remove the configuration files if they exist
            if [ -d "$2/conf/Catalina" ]
            then
                for j in $2/conf/Catalina/*
                do
                    rm -f $catalinaBaseDir/conf/Catalina/localhost/`basename "$j"`
                done
            fi
        done
        ;;
esac
Figure 4.15 Activation module for Apache Tomcat web applications
Service activation and deactivation
Since services can have basically any form, Disnix provides activation types
which can be connected to activation modules. In essence, an activation mod-
ule is a process that takes two arguments, of which the former is either
‘activate’ or ‘deactivate’ and the latter is the Nix store path of the service that
has to be activated/deactivated.
Moreover, an activation module often needs to know certain properties of
the system, such as authentication credentials for a database or a port number
on which a certain daemon is running. Therefore, Disnix passes all the prop-
erties defined in the infrastructure model as environment variables, so that they
can be used by the activation module.
<?xml version="1.0"?>
<distributedderivation>
<mapping>
<derivation>/nix/store/nysyc9rjayqi...-MELogService.drv</derivation>
<target>http://test1.net/DisnixService/services/DisnixService</target>
</mapping>
<mapping>
<derivation>/nix/store/8j1qya7yi1hv...-mobileeventlogs.drv</derivation>
<target>http://test2.net/DisnixService/services/DisnixService</target>
</mapping>
...
</distributedderivation>
Figure 4.16 A partial distributed derivation file
Figure 4.15 shows the code for the activation module for an Apache Tomcat
web application on Linux:
80 This section defines the activation procedure of an Apache Tomcat web
application. In order to activate an Apache Tomcat web application a
symlink to the WAR file in the Nix store is created in the webapps/ di-
rectory of Tomcat, automatically triggering a hot deploy operation. Fur-
thermore, the generated context XML file is symlinked into the conf/Catali-
na/ directory, so that components requiring a database can find their
required settings.
81 This section defines the deactivation procedure. Here, the symlinks to
the WAR file and the context XML file of a web application are
removed. This operation triggers a hot undeploy operation on
Apache Tomcat.
Similar activation modules have been developed for other types, such as axis2-
webservice which will hot deploy a web service in an Axis2 container, mysql-
database which will initialise a MySQL database schema on startup or process
which will start and kill a generic process. A developer or system adminis-
trator can also implement a custom activation module used for other types
of services or use the wrapper activation module, which will invoke a wrapper
process with a standard interface included in the service. (Observe that differ-
ent platforms may require a different implementation of an activation module,
e.g. activating a web service on Microsoft Windows requires different steps
than on UNIX-based systems).
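As an illustration, a custom activation module could follow the same pattern
as Figure 4.15. The following is only a minimal sketch in which the wrapper
script inside the service and the myServicePort property are hypothetical:

#!/bin/bash -e

case "$1" in
    activate)
        # $2 is the Nix store path of the service to activate; properties from
        # the infrastructure model, such as $myServicePort, are available as
        # environment variables
        "$2"/bin/wrapper start --port "$myServicePort"
        ;;
    deactivate)
        "$2"/bin/wrapper stop
        ;;
esac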
Distributed building
Disnix provides an optional feature to build services on the target machines
instead of the coordinator machine. If this option is enabled, two additional
tools are used. The disnix-instantiate tool will generate a distributed derivation
file, as shown in Figure 4.16.
This file is also an XML file similar to the manifest file, except that it maps
Nix store derivation files (low-level specifications that Nix uses to build com-
ponents from source code, described earlier in Section 2.2.9) to target ma-
Figure 4.17 Architecture of the DisnixService
chines in the network. Because the distributed derivation file contains abso-
lute Nix store paths to these store derivation files (and these store derivations
contain references to all other build-time derivations), they are detected as
runtime dependencies and copied to the target machines.
It then invokes disnix-build with the derivation file as argument. This tool
transfers the store derivation files to the target machines in the network, builds
the components on the target machines from these derivation files, and finally
copies the results back to the coordinator machine. The disnix-copy-closure tool
is invoked to copy the store derivations to a target machine and to copy the
build result back.
Finally, it performs the same steps as it would do without this option en-
abled. In this case disnix-manifest will not build every service on the coordinator
machine, since the builds have already been performed on the machines in the
network. Also, due to the fact that Nix uses a purely functional deployment
model, a build function always gives the same result if the input parameters
are the same, regardless of the machine that performed the build.
4.4.2 Disnix Service
Some of the tools explained in the previous subsections have to connect to
the target machines in the network to perform deployment steps remotely,
e.g. to perform a remote build or to activate/deactivate a service. Remote
access is provided by the DisnixService, which must be installed on every
target machine.
Architecture
The Disnix service consists of two major components, shown in Figure 4.17.
We have a component called the core that actually executes the deployment
operations by invoking the Nix package manager to build and store compo-
nents and the activation modules package to activate or deactivate a service.
The wrapper exposes the methods remotely through an RPC protocol of choice,
such as an SSH connection or through the SOAP protocol provided by an ex-
ternal add-on package called DisnixWebService.
There are two reasons why we have chosen to make a separation between
the interface component and the core component:
Integration. Many users have specific reasons why they choose a par-
ticular protocol, e.g. due to organisational policies. Moreover, not every
protocol may be supported on every type of machine.
Privilege separation. To execute Nix deployment operations, super-user
privileges on UNIX-based systems are required. The wrapper may
need to be deployed on an application server hosting other applications,
which could introduce a security risk if it is running as super user.
Custom protocol wrappers can be trivially implemented in various pro-
gramming languages. As illustrated in Figure 4.17, the Disnix core service
can be reached through the D-Bus protocol [Freedesktop Project, 2010b], used
on many Linux systems for inter-process communication. In order to create a
custom wrapper, a developer has to map RPC calls to D-Bus calls and imple-
ment file transport of the components. D-Bus client libraries exist for many
programming languages and platforms, including C, C++, Java and .NET and
are supported on many flavours of UNIX and Windows.
Disnix interface
In order to connect to a machine in the network, an external process is in-
voked by the tools on the deployment coordinator machine. An interface can
be configured by various means, such as specifying the DISNIX_CLIENT_INTER-
FACE environment variable. The Disnix toolset includes two clients: disnix-
client, a loopback interface that directly connects to the core service through
D-Bus; and disnix-ssh-client, which connects to a remote machine by using a
SSH connection (which is the default option).
The separate DisnixWebService package includes an additional interface called
disnix-soap-client, which uses the SOAP protocol to invoke remote operations
and performs file transfers by using MTOM, an XML parser extension to in-
clude binary attachments without overhead.
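For example, the SOAP interface can be selected by pointing the coordinator
tools at disnix-soap-client; a minimal sketch using the environment variable and
client names mentioned above:

$ export DISNIX_CLIENT_INTERFACE=disnix-soap-client
$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix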
A custom interface can be trivially implemented by writing a process that
conforms to a standard interface, calling the custom RPC wrapper of the Dis-
nix service.
4.4.3 Libraries
Many of the tools in the Disnix toolset require access to information stored
in models. Since this functionality is shared across several tools we imple-
mented access to these models through libraries. The libmanifest component
provides functions to access properties defined in the manifest XML file, li-
binfrastructure provides functions to access properties defined in the infras-
tructure model and libdistderivation provides functions to access properties in a
distributed derivation file.
We prefer library interfaces for these models, since they are only ac-
cessed by tools implemented in the C programming language and do not
have to be invoked directly by end users.
Figure 4.18 An image produced by disnix-visualize, showing a deployment scenario
of SDS2 services across two machines
4.4.4 Additional tools
Apart from disnix-env and the tools that compose it, the Disnix toolset includes
several other utilities that may help a user in the deployment process of a
system:
disnix-query. Shows the installed services on every machine defined in
the infrastructure model.
disnix-collect-garbage. Runs the garbage collector in parallel on every ma-
chine defined in the infrastructure model to safely remove components
that are no longer in use.
disnix-visualise. Generates a clustered graph from a manifest file to visu-
alise services, their inter-dependencies and the machines to which they
are distributed. The dot tool [Graphviz, 2010] can be used to convert the
graph into an image, such as PNG or SVG.
An example of a visualisation is shown in Figure 4.18, demonstrating
a deployment scenario of SDS2 in which services are distributed across
two machines. As may be noticed, the architecture described earlier in
Figure 4.4 already looks complicated, and dividing services
across machines increases the complexity even further.
disnix-gendist-roundrobin. Generates a distribution model from a services
and infrastructure model, by applying the round-robin scheduling method
to divide services over the machines in equal portions and in circular
order.
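A rough usage sketch (the flags are assumptions, following the conventions of
disnix-env shown earlier) is to generate a distribution model and then deploy
it directly afterwards:

$ disnix-gendist-roundrobin -s services.nix -i infrastructure.nix \
    > distribution.nix
$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix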
4.5 EXPERIENCE
We have modelled all the SDS2 components (databases, web services, batch
processes and web application front-ends) to automatically deploy the plat-
form in a small network consisting of four 64-bit and 32-bit Linux machines.
In addition to SDS2, we have also developed a collection of publicly avail-
able example cases, demonstrating the capabilities of Disnix. These examples
also include source code. Currently, the following public examples are avail-
able:
StaffTracker (PHP/MySQL version), a distributed system composed of sev-
eral MySQL databases and a PHP web-application front-end.
StaffTracker (Web services version). A more complex variant of the pre-
vious example implemented in Java, in which web services provide in-
terfaces to the databases. This is the example system we have used
throughout the WASDeTT and SCP papers [van der Burg and Dolstra,
2010d].
StaffTracker (WCF version). An experimental port of the previous web
services example to .NET technology using C# as an implementation
language and Windows Communication Foundation (WCF) as the underlying
technology for web services. This example is supposed to be deployed
in an environment using Microsoft IIS and SQL server as infrastruc-
ture components. More details about .NET deployment can be found in
Chapter 12.
ViewVC. An example case using ViewVC (http://viewvc.tigris.
org), a web-based CVS and Subversion repository viewer implemented
in Python. The Disnix models can be used to automatically deploy
ViewVC, a MySQL database storing commit records, and a number of Sub-
version repositories.
Hello World example. A trivial example case demonstrating various ways
to compose web services. In this example, a basic composition is demon-
strated, an alternative composition by changing an inter-dependency,
and two advanced compositions, using a lookup service and load bal-
ancer, which dynamically compose services.
Disnix proxy example. A trivial example case demonstrating the use of
a TCP proxy which temporarily queues network connections during the
transition phase. This is the example we used in one of our previous pa-
pers about atomic upgrading [van der Burg et al., 2008]. Furthermore,
this is the most trivial example case using self-hosted service compo-
nents, which do not require any infrastructure components.
NixOS configurations. In this example, we show that we can also describe
complete NixOS systems as services and that we can upgrade complete
machine configurations in the network.
4.6 RESULTS
The initial deployment process (building the source code of all SDS2 com-
ponents, transferring components and their dependencies and activating the
services) took about 15 minutes. At Philips, this previously took hours in the
semi-manual deployment process on a single machine. (Manually deploying
an additional machine took almost the same amount of time and was usually
too much effort.)
We were also able to upgrade (i.e. replacing and moving services) a run-
ning SDS2 system and to perform rollbacks. In this process, only the changed
services were updated and deactivated/activated in the right order, which
only took a couple of seconds in most cases. In all the scenarios we tested
no service ran into an inconsistent state due to broken inter-dependency
connections.
The only minor issue we ran into was that during the upgrade phase, the
web application in the user’s browser (which is not under Disnix’ control) was
not refreshed, which may break certain features. However, we have reduced
the time window in which the deployed system is inconsistent to a minimum:
the installation and upgrading of packages can be done completely without
affecting a previous system configuration at all, which is hard (or sometimes
impossible) to do with conventional deployment systems. Moreover, the tran-
sition phase in which services are deactivated and activated takes in most
cases only a few seconds. Chapter 8 describes a concept and prototype ex-
ample that can be used to hide this small side-effect, to allow fully atomic
upgrades.
Although Disnix can activate databases, we currently cannot migrate them
to other machines dynamically. Disnix activates a database by using a static
representation (i.e. a dump), but is not able to capture and transfer the state of a
deployed database. In Chapter 9 we have developed a prototype replacement
for the Disnix activation scripts module capable of migrating state.
Disnix does not deploy infrastructure components, however. Although Java
web services, web applications and many other types of components can
be built and deployed automatically, infrastructure components such as the
Apache Tomcat application server or a MySQL database server must be de-
ployed by other means.
In the next chapter, we quantify deployment times of certain events and we
present results on the additional example cases.
4.7 RELATED WORK
There is a considerable amount of work on the deployment of services in dis-
tributed environments, which we can categorise into various groups.
4.7.1 Component-specific
A number of approaches limit themselves to specific types of components,
such as the BARK reconfiguration tool [Rutherford et al., 2002], which only
supports the software deployment life-cycle of EJBs and supports atomic up-
grading by changing the resource attached to a JNDI identifier.
Akkerman et al. [2005] have implemented a custom infrastructure on top
of the JBoss application server to automatically deploy Java EE components.
In their approach, they use a component description language to model com-
ponents and their connectivity and an ADL for modelling the infrastructure,
to automatically deploy Java EE components to the right application servers
capable of hosting the given component. The goals of these models are similar
to the Disnix services and infrastructure models; however, they are designed
specifically for Java EE environments. Moreover, the underlying deployment
tooling does not have capabilities such as storing components safely in isolation
and performing atomic upgrades.
In [Mikic-Rakic and Medvidovic, 2002] an embedded software architecture
is described that supports upgrading parts of the system as well as having
multiple versions of a component next to each other. A disadvantage is that
the underlying technology, such as the infrastructure and run-time system,
cannot be upgraded and that the deployment system depends on the technol-
ogy used to implement the system.
Other approaches use component models and language extensions, such
as [Chen and Simons, 2002] which proposes a conceptual component frame-
work for dynamic configuration of distributed systems. In [Costanza, 2002]
the authors developed the GILGUL extension to the Java language for dy-
namic object replacement. Using such approaches requires the developers or
system administrators to use a particular framework or language.
4.7.2 Environment-specific
Approaches for specific environments also exist. GoDIET [Caron et al., 2006]
is a utility for deployment of the DIET grid computing platform. It writes
configuration files, stages the files to remote resources, provides an appropri-
ately ordered and timed launch of components, and supports management
and tear-down of the distributed platform. In [Laszewski et al., 2002] the
Globus toolkit is described for performing three types of deployment scenar-
ios using predefined types of component compositions in a grid computing
environment. CODEWAN [Roussain and Guidec, 2005] is a Java-based plat-
form designed for deployment of component-based applications in ad-hoc
networks.
4.7.3 General approaches
Several generic approaches have been developed. The Software Dock is a dis-
tributed, agent-based deployment framework to support ongoing cooperation
and negotiation among software producers and consumers [Hall et al., 1999].
It focuses on the delivery process of components from producer to
consumer site, rather than on the complete deployment process. In [Dumitras et al., 2007]
a dependency-agnostic upgrade concept for running distributed systems is
described in which the old and new configurations of a distributed system are
deployed next to each other in isolated runtime environments. Separate mid-
dleware forwards requests to the new configuration at a certain point in time.
TACOMA [Sudmann and Johansen, 2002] uses agents to perform deployment
steps and is built around RPM. This approach has various limitations, such as
the inability to perform atomic upgrades or safely install multiple variants of
components next to each other. Disnix overcomes these limitations.
In [Eilam et al., 2006], a model-driven application deployment architecture
is described, containing various models capturing separate concerns in a de-
ployment process, such as the application structure, network topology of the
data center, and transformation rules. These models are provided by various
stakeholders involved in the deployment process of a distributed application
in data centers, such as developers, domain experts and data center op-
erators. These models are transformed and optimised by various tools into
a concrete deployment plan which can be used to perform the actual de-
ployment process, taking various functional requirements and non-functional
requirements into account. These techniques are applied in various contexts,
for example to automatically compose services in a cloud environment [Kon-
stantinou et al., 2009] and can be bridged to various tools to perform the
deployment and provisioning [El Maghraoui et al., 2006].
The models used by Disnix in this chapter are merely executable specifica-
tions used to perform the actual deployment process and not designed for de-
ployment planning. Moreover, the semantics of the Nix expression language
that Disnix uses are close to the underlying purely functional deployment
model. However, in the next chapter, we present an extension implemented on
top of Disnix, providing additional models to specify a policy of mapping ser-
vices to machines in the network based on functional properties defined in the
service and infrastructure models. Also, various deployment planning algo-
rithms described in the literature can be used for this purpose. In conjunction
with a discovery service, service-oriented systems can be made self-adaptive.
It could also be possible to integrate Disnix components into the architecture
of [Eilam et al., 2006], so that their deployment planning techniques can be
combined with our deployment mechanics to make a software deployment process
reliable.
The Engage system [Fischer et al., 2012] has a large conceptual overlap
with Disnix. Engage also allows components and infrastructure to be mod-
elled as separate entities from a high-level perspective and makes a distinc-
tion between components residing on the same machine (local dependencies),
within a container (inside dependencies) and dependencies between compo-
nents that may reside on different machines in the network (peer dependen-
cies). A major difference between Engage and Disnix is that Engage uses a
constraint-based algorithm to resolve dependencies, whereas Disnix uses the
deployment model of Nix in which services are constructed and configured
by function calls. Moreover, Engage uses an imperative deployment facility to
perform the actual deployment steps.
Another notable aspect that sets Disnix apart from other deployment tools
is that every service component must be managed through the Nix store (al-
though there are ways to refer to components on the host system, which is
not recommended). Disnix cannot be used to manage the deployment of ex-
isting service components deployed by other means. If we were able to
refer to components outside this model, we would lose all the advantages that the
purely functional deployment model of Nix offers, such as the fact that we can
uniquely distinguish between variants of components, store them safely next
to each other, and safely and (nearly) atomically upgrade components. If the
DisnixOS extension is used (described later in Chapter 6), the underlying
infrastructure components are also managed by Nix; consequently, DisnixOS cannot be used to
manage infrastructure components deployed by other means. In some cases,
upgrades with Disnix can be more expensive than with other deployment tools,
since everything has to be managed by the Nix package manager and compo-
nents are bound to each other statically, but in return the Nix deployment
model provides very powerful reliability properties.
4.7.4 Common practice
In practice, many software deployment processes are partially automated by
manually composing several utilities together in scripts and using those to per-
form deployment tasks. While this gives users some kind of specification and reproducibil-
ity, it cannot ensure properties such as correct deployment.
4.8 CONCLUSION
We have shown Disnix, a distributed deployment extension to Nix, used to
automatically deploy, upgrade and roll back a service-oriented system such
as SDS2 consisting of components of various types in a network of machines
running on different platforms. Our experiments have shown that we can
automatically and reliably deploy the application components of SDS2 and
that the automated process yields a significant improvement in deployment
time, compared to the original semi-manual deployment process. In addi-
tion to SDS2, we have also developed a collection of example systems im-
plemented in various programming languages and using various component
technologies.
Because Disnix is extensible, takes inter-dependencies into account, and
builds on the purely functional properties of Nix, we can safely upgrade only
necessary parts, and deactivate and activate the required services of a system,
making the upgrade process efficient and reliable.
In Chapter 6 we show DisnixOS, an extension to Disnix which can deploy
infrastructure components next to service components. Chapter 9 illustrates
Dysnomia, a proof-of-concept tool providing automated deployment of the mu-
table state of containers (called mutable components).
5
Self-Adaptive Deployment of
Service-Oriented Systems
ABSTRACT
In addition to the deployment challenges that we have described in the pre-
vious chapter, usually the environment in which distributed systems are de-
ployed is dynamic: any machine in the network may crash, network links may
temporarily fail, and so on. Such events may render the system partially or
completely unusable. If an event occurs, it is difficult and expensive to re-
deploy the system to take the new circumstances into account. In this
chapter, we present a self-adaptive deployment framework built on top of
Disnix. This framework dynamically discovers machines in the network and
generates a mapping of components to machines based on non-functional
properties. Disnix is then invoked to automatically, reliably and efficiently
redeploy the system.
5.1 INTRODUCTION
As explained in the previous chapter, service-oriented systems are composed
of distributable services, such as web services, databases, web applications
and batch processes, working together to achieve a common goal. Service-
oriented systems are usually deployed in networks of machines (such as a
cloud environment) having various capabilities and properties, because there
are various technical and non-functional requirements that must be met. For
example, a machine assigned for storage may not be suitable for running a
component doing complex calculations or image rendering. We also want to
deploy services providing access to privacy-sensitive information on a ma-
chine with restricted access.
After a service-oriented system is deployed, it is not guaranteed that it
will stay operational. In networks, various events may occur. A machine may
crash and disappear from the network, or a new machine may be added.
Moreover, a capability of a machine could change, such as an increase of
memory. Some of these events may partially or completely break a system
or make the current deployment suboptimal, in which certain non-functional
deployment requirements are no longer met.
In order to take new circumstances into account after an event, a system
must be redeployed. As we have seen, deployment processes of service-oriented
systems are usually difficult and expensive. For instance, they may require
system administrators to perform manual deployment steps, such as installing
components or editing configuration files. In many cases, it is a lot of effort to
build components, transfer the components and all dependencies to the right
machines and to activate the system. Upgrading is even more difficult as this
may completely break the system. Furthermore, finding a suitable mapping of
services to machines is a non-trivial process, since non-functional properties
of the domain (such as quality-of-service requirements) must be supported.
These factors cause a large degree of inflexibility in reacting to events.
Even though Disnix can reliably and conveniently automate a deployment
process of a service-oriented system, it requires specifications which must
be written manually by developers or system administrators. For example,
in order to react to an event, a new distribution model must be provided
satisfying all required technical and non-functional deployment requirements,
which is a laborious and tedious process.
In this chapter, we describe a self-adaptive deployment extension built on
top of Disnix, the distributed deployment tool for service-oriented systems
described in the previous chapter. This framework continuously discovers
machines and their capabilities in a network. In case of an event, the system
is automatically, reliably and efficiently redeployed to take the new circum-
stances into account. A quality-of-service (QoS) policy model can be specified
to find a suitable mapping of services to machines dealing with desirable
deployment constraints from the domain. We have applied this deployment
extension to several case studies, including SDS2, the industrial case study
from Philips Research covered in the previous chapter.
In contrast to other self-adaptive approaches dealing with events in dis-
tributed environments, Disnix redeploys a system instead of changing proper-
ties of already deployed components, such as their behaviour and connections.
The advantages of this approach are that various types of services are sup-
ported (which may not all have self-adaptive properties) and that we do not
require components to use a specific component technology to make them
self-adaptive.
5.2 MOTIVATION
We have applied the dynamic Disnix extension to several use cases, includ-
ing two custom developed systems, an open-source case study, and SDS2,
the industrial case study covered in the previous chapter. Each case study
is implemented using different technologies, serves different goals and has
different requirements. We will use these throughout this chapter to motivate
and explain our work.
5.2.1 StaffTracker (PHP/MySQL version)
First, we present two toy systems that represent two different styles of dis-
tributed systems. Both implement the same application: allowing users to
store and query information about staff members of a university, such as
names, IP addresses and room numbers. From a room number of a staff mem-
ber, a zip code can be determined; from a zip code, the system can determine
the street and city in which the staff member is located. The first variant uses
a web interface written in PHP that accesses data stored in different MySQL
databases.
Figure 5.1 Architecture of the StaffTracker (PHP/MySQL version)
Figure 5.1 shows the architecture of this variant of the StaffTracker system.
Each object in the figure denotes a distributable component, capable of run-
ning on a separate machine in the network. The arrows denote inter-
dependency relationships, which are dependencies between distributable com-
ponents. The StaffTracker system is composed of various MySQL databases
(denoted by a database icon) storing a dataset, and a web application front-
end implemented in PHP (denoted by a rectangle) allowing end users to query
and modify staff members.
In order to deploy this system, several technical requirements must be met.
First, all MySQL databases must be deployed on a machine running a MySQL
DBMS. Secondly, the web application front-end must be deployed on a ma-
chine running an Apache web server instance.
Various events may occur in a network of machines running this system. A
machine may crash, making it impossible to access a specific dataset. More-
over, if the machine running the web application front-end crashes, the sys-
tem becomes completely inaccessible. It is also possible that a new machine
is added, such as a database server, which may offer additional disk space. In
such cases, the system must be redeployed to take the new circumstances into
account.
To redeploy this system, a system administrator must install databases on a
MySQL server or web applications on an Apache web server. Moreover, he has
to adapt the configuration of the web application front-end, so that a database
from a different server is used. Furthermore, these steps must be performed
in the right order. If, for instance, the system administrator activates the web
application front-end before all databases are activated, specific features may
be inaccessible to end users.
5.2.2 StaffTracker (Web services version)
The second variant, shown in Figure 5.2, represents a SOA-style system based
on separate web services that each provide access to a different database. Ad-
ditionally, it provides a GeolocationService web service using GeoIP [MaxMind,
2010] to determine the location of a staff member, based on his or her IP
address. The web services and web application components of this variant of
the StaffTracker are implemented in Java and require Apache Tomcat
(http://tomcat.apache.org) for web application hosting and management.
Figure 5.2 Architecture of the StaffTracker (Web services version)
In order to deploy this system correctly, we must deploy MySQL databases
on machines running a MySQL DBMS (similar to the previous use case), but
we must also deploy web service and web application components on ma-
chines running Apache Tomcat.
In case of an event, redeployment takes even more effort than in the previous
example. If the location of a database or web service changes, a system ad-
ministrator has to adapt all the configuration files of all dependent services.
Moreover, the system administrator has to repackage the web services and
web applications, transfer them to the right machines in the network and de-
activate and activate them in the right order (otherwise features may become
inaccessible).
5.2.3 ViewVC
ViewVC (http://viewvc.tigris.org) is a free and open-source web-based Subversion and CVS repository
viewer implemented in Python. ViewVC is able to connect to remote Subver-
sion repositories in a network. It can optionally use a MySQL database to
query commit records used for development analysis tasks. The architecture
is shown in Figure 5.3.
As in the previous examples, we must ensure that the web application
front-end is deployed on a machine running an Apache web server and the
database on a machine running a MySQL DBMS. We must also ensure that
Subversion repositories are deployed on machines running svnserve, which
provides remote access to a Subversion repository through the SVN protocol.
Figure 5.3 Architecture of ViewVC
5.2.4 SDS2
SDS2 [de Jonge et al., 2007] is the industrial case study we have used as main
motivation in the previous chapter. As explained earlier, medical devices
generate status and location events and SDS2 is a service-oriented architecture
designed by Philips Research for Asset Tracking and Utilisation in hospitals.
Apart from the technical requirements for deployment, such as mapping
services with a specific type to machines capable of running them, there are
also non-functional requirements which must be supported. For example, the
datasets containing event logs must be stored inside a hospital environment
for performance and privacy reasons. The services generating workflow anal-
ysis results and summaries must be deployed inside the Philips enterprise
environment, since Philips has a more powerful infrastructure and may want
to use generated data for other purposes.
In order to redeploy SDS2, nearly every service must be adapted (due to the
fact that every service uses a global configuration file containing the locations
of all services), repackaged, transferred to the right machines in the network
and deactivated and activated in the right order (a process which normally
takes several hours to perform manually).
5.3 ANNOTATING DISNIX MODELS
As explained in the previous chapter, we have developed Disnix, a distributed
deployment tool built on top of the Nix package manager to deal with the
complexity of the deployment process of service-oriented systems. Disnix
offers various advantages compared to related tooling.
We have also explained that Disnix deployment processes are automated
by three models: the services model, specifying of which services a system is
composed and how they are connected; the infrastructure model, capturing the
available machines in the network and their relevant properties/capabilities;
and the distribution model, mapping services to machines. In order to dynami-
cally redeploy systems, some of these models must be annotated with relevant
technical and non-functional properties.
{distribution, pkgs, system}:
let
  customPkgs = import ../top-level/all-packages.nix {
    inherit pkgs distribution system;
  };
in
rec {
  mobileeventlogs = {
    name = "mobileeventlogs";
    pkg = customPkgs.mobileeventlogs;
    type = "mysql-database";
    stateful = true;
    requiredZone = "hospital"; 82
  };
  MELogService = {
    name = "MELogService";
    pkg = customPkgs.MELogService;
    dependsOn = {
      inherit mobileeventlogs;
    };
    type = "tomcat-webapplication";
    requiredZone = "hospital";
  };
  SDS2AssetTracker = {
    name = "SDS2AssetTracker";
    pkg = customPkgs.SDS2AssetTracker;
    dependsOn = {
      inherit MELogService ...;
    };
    type = "tomcat-webapplication";
    requiredZone = "philips"; 83
  };
  ...
}
Figure 5.4 Annotated services.nix services model for SDS2
5.3.1 Services model
Figure 5.4 shows an annotated services model for SDS2. This model is almost
entirely the same as the services model shown in the previous chapter. The
model captures the same properties, such as the available services, how to
build them, their inter-dependencies and their types. Additionally, each serv-
ice is augmented with custom attributes, describing miscellaneous technical
properties and non-functional properties:
82 The requiredZone attribute is a custom attribute, which we have used to
specify that a component must be deployed in a particular zone. In this
case, we have specified that the mobileeventlogs service must be deployed
on a machine physically located in the hospital environment. This is a very
important requirement, which we have also outlined in the previous
chapter.
83 Here, the requiredZone attribute specifies that the SDS2AssetTracker service
should be deployed inside a Philips data center.
{
  test1 = {
    hostname = "test1.net";
    tomcatPort = 8080;
    mysqlUser = "user";
    mysqlPassword = "secret";
    mysqlPort = 3306;
    targetEPR = http://test1.net/.../DisnixService;
    system = "i686-linux";
    supportedTypes = [
      "process" "tomcat-webapplication" "mysql-database" "ejabberd-database" 84
    ];
    zone = "hospital"; 85
  };
  test2 = {
    hostname = "test2.net";
    tomcatPort = 8080;
    ...
    targetEPR = http://test2.net/.../DisnixService;
    system = "x86_64-linux";
    supportedTypes = [ "process" "tomcat-webapplication" ];
    zone = "philips";
  };
}
Figure 5.5 An annotated infrastructure.nix model for SDS2
These custom attributes are not directly used by the basic Disnix toolset for
deployment, but can be used for purposes such as expressing QoS require-
ments and to dynamically deploy services, as we will see in Section 5.5.
5.3.2 Infrastructure model
Likewise, infrastructure models can also be annotated with custom properties
capturing technical and non-functional properties. In Figure 5.5, we have
annotated the SDS2 infrastructure model with two custom properties:
84 The supportedTypes attribute is a list specifying what types of services
can be deployed on the given machine. For example, the test1 machine
is capable of running generic UNIX processes, Apache Tomcat web ap-
plications, MySQL databases and ejabberd databases.
85 The zone attribute specifies in which zone the machine is located. In our
example, test1 is located in the hospital environment and test2 is located
in the Philips environment.
5.4 SELF-ADAPTIVE DEPLOYMENT
As we have explained in the previous chapter, Disnix is a good basis for
distributed deployment, because it can safely install components without in-
terference and reliably install or upgrade a distributed system. Moreover, it
makes the time of the transition phase (in which the system is inconsistent)
as small as possible. All these features significantly reduce the redeployment
effort of a system.
Figure 5.6 A self-adaptive deployment architecture for Disnix
However, Disnix requires an infrastructure model which is specified man-
ually by a developer or system administrator. For large networks in which
systems may disappear or be added, or properties may change, this is infeasible
to maintain. Moreover, for each event in a network, a new distribution model
must be specified. Not every machine may be capable of running a particular
service defined in the service model. Manually specifying a new suitable dis-
tribution taking certain non-functional requirements into account is usually
difficult and expensive.
In order to react automatically to events occurring in the network, we have
extended the Disnix toolset by providing automatic generation of the infras-
tructure and distribution models.
Figure 5.6 shows an extended architecture of Disnix to support self-adaptive
deployment. Rectangles represent objects that are manipulated by the tools.
The ovals represent a tool from the toolset performing a step in the deploy-
ment process. In this architecture several new tools are visible:
The infrastructure generator uses a discovery service to detect machines
in the network and their capabilities and properties. An infrastructure
model is generated from these discovered properties.
The infrastructure augmenter adds additional infrastructure properties to
the generated infrastructure model that are not announced by the ma-
chines. (Some properties are not desirable to announce because they are
privacy-sensitive, such as authentication credentials).
The distribution generator maps services in the services model to targets
defined in the generated and augmented infrastructure model. A quality
of service (QoS) model is used to specify a policy describing how this
mapping should be performed, based on quality attributes specified in
the services and infrastructure model.
In this framework, a user provides three models as input parameters:
The services model, as we have seen in Section 5.3.1, captures all the
services constituting the system.
The augmentation model is used to add infrastructure properties to the
infrastructure model, which cannot be provided by the discovery serv-
ice, such as authentication credentials.
The quality-of-service model (QoS) is used to describe how services can
be distributed to target machines, based on QoS properties defined in
the services and infrastructure models.
After the infrastructure and distribution models have been generated, they
are passed to the disnix-env tool along with the services model. The disnix-
env tool then performs the actual deployment and the system is upgraded
according to the new distribution. By continuously executing this process
when events occur, the system becomes self-adaptive.
5.4.1 Infrastructure generator
We have implemented an infrastructure generator based on Avahi, a multicast
DNS (mDNS) and DNS service discovery (DNS-SD) implementation [Freedesk-
top Project, 2010a]. We have chosen Avahi for practical reasons, because it is
light-weight and included in many modern Linux distributions. Although we
only have a DNS-SD-based infrastructure generator, the architecture of the
self-adaptive deployment extension is not limited to a particular discovery
protocol. The infrastructure generator is a process that can be relatively easily
replaced by a generator using a different discovery protocol, such as SSDP.
Every target machine in the network multicasts properties and capabilities
in DNS records through DNS-SD. Some infrastructure properties are pub-
lished by default, such as the hostname which is required for the coordinator
machine to perform remote deployment steps, system specifying the operat-
ing system and architecture, and supportedTypes which specifies the types of
services which can be activated on the machine, such as mysql-database and
tomcat-webapplication. The list of supported types is configured by the installa-
tion of the DisnixService, which, for instance, provides a mysql-database activa-
tion script if the configuration script detects a MySQL instance running on the
target machine. Moreover, some hardware capabilities are published, such as
mem which publishes the amount of RAM available. A system administrator
can also publish custom properties through the same service as DNS records,
such as zone representing the location in which the machine is based.
{infrastructure, lib}: 86
lib.mapAttrs 87
  (target:
    if target ? mysqlPort 88
    then target // {
      mysqlUsername = "foo"; mysqlPassword = "secret"; 89
    }
    else target
  ) infrastructure
Figure 5.7 augment.nix: A model augmenting the infrastructure model
The infrastructure generator captures the DNS-SD properties announced
by the target machines and uses it to generate an infrastructure model. The
generation step is quite trivial, because DNS records consist of key = value
pairs that map onto Nix attribute sets in a straightforward manner.
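As an illustration, the sketch below shows what a generated infrastructure model could look like for a single machine that announces its hostname, system type, supported service types, memory capacity and a custom zone property. The attribute names follow the conventions of Figure 5.5, but the exact output of the generator may differ; note that no authentication credentials are present yet.
{
  test1 = {
    # published by default through DNS-SD
    hostname = "test1.net";
    system = "i686-linux";
    supportedTypes = [ "process" "tomcat-webapplication" "mysql-database" ];
    # hardware capability announced by the machine (hypothetical value)
    mem = 2048;
    # custom property published by the system administrator
    zone = "hospital";
  };
}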
5.4.2 Infrastructure augmenter
As mentioned before, it may not be desirable to publish all infrastructure
properties through a discovery service, such as authentication credentials re-
quired to deploy a MySQL database. Therefore, the generated infrastructure
model may need to be augmented with additional infrastructure properties.
Figure 5.7 shows an example of an augmentation model:
86 The augmentation model is a function taking two arguments. The infras-
tructure parameter represents the infrastructure model generated by the
discovery service. The lib parameter refers to an attribute set of library
functions, provided by Nixpkgs. The remainder of the model modifies
the attribute set containing infrastructure properties.
87 In the remainder of the augmentation model, we call the lib.mapAttrs func-
tion to iterate over each target attribute in the infrastructure model at-
tribute set.
88 Here, we have defined that if a MySQL port is defined in a machine
attribute set, we augment the MySQL authentication credentials 89 so
that a database can be deployed.
The strategy of augmenting additional infrastructure properties depends
on the policy used for the infrastructure and the domain in which the system
is used. In this example, we assume that every MySQL server uses the same
authentication credentials. There may be situations in which each individual
machine requires a separate username and password or perhaps groups of
machines. Other properties and transformations can be specified in this model
by using Nix expression language constructs and functions defined in the lib
attribute set.
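The sketch below illustrates one such alternative, under the assumption that per-machine credentials are kept in a small table on the coordinator machine. The credentials attribute set and its contents are hypothetical, and lib.mapAttrs is used here with its two-argument callback so that the machine name can serve as a lookup key.
{infrastructure, lib}:
let
  # hypothetical per-machine credential table kept on the coordinator machine
  credentials = {
    test1 = { mysqlUsername = "foo"; mysqlPassword = "secret1"; };
    test2 = { mysqlUsername = "bar"; mysqlPassword = "secret2"; };
  };
in
lib.mapAttrs
  (name: target:
    # merge the credentials of the machine into its properties, if defined
    if builtins.hasAttr name credentials
    then target // builtins.getAttr name credentials
    else target
  ) infrastructure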
{ services, infrastructure
, initialDistribution, previousDistribution, filters}: 90
filters.filterB { 91
  inherit services infrastructure;
  ...
  distribution = filters.filterA {
    inherit services infrastructure;
    distribution = initialDistribution;
    ...
  };
}
Figure 5.8 qos.nix: A quality-of-service model implementing a policy
5.4.3 Distribution generator
Given the automatically discovered infrastructure model, Disnix must now
generate a distribution model that maps services onto machines. Automat-
ically generating a suitable distribution is not a trivial task. In some cases
it is difficult to specify what is an optimal distribution, because optimising
a particular attribute may reduce the quality of another important quality
attribute, a phenomenon called Pareto optimality in multidimensional optimi-
sation [Medvidovic and Malek, 2007; Malek et al., 2012]. Moreover, in some
cases finding a suitable distribution is an NP-hard problem for which no algo-
rithm that solves it in polynomial time is known, which makes it infeasible
to find an optimal distribution for large systems.
Furthermore, in many cases it is desired to implement constraints from the
domain, which must be reflected in the distribution of services to machines.
These constraints are largely context-specific. For example, constraints in the
medical domain (such as privacy) are not always as important in other do-
mains in which performance may be much more important. Also, the notion
of certain quality attributes may be different in each domain.
Because of all these issues, there is no silver bullet that always generates a
suitable distribution for a service-oriented system. Therefore, a developer or
system administrator must specify a suitable deployment strategy in a quality
of service model to deal with non-functional requirements from the domain.
The Disnix toolset contains various functions that assist users with specifying
deployment strategies. Apart from a number of included functions, custom
algorithms and tooling can be easily integrated into the toolset to support
desirable distribution strategies.
A distribution model can be generated from a service and infrastructure
model by using a quality-of-service model in which developers or system ad-
ministrators can combine filter functions that determine to which machine a
service can be deployed, by taking certain technical and non-functional con-
straints of services and machines into account.
The general structure of a quality-of-service model is shown in Figure 5.8:
90 A QoS model is a function taking several arguments. The services argu-
ment represents the services model and the infrastructure argument rep-
resents the infrastructure model. The initialDistribution represents a dis-
tribution model which maps services to candidate targets. Normally,
every target defined in the infrastructure model is considered a candi-
date host for every service. The previousDistribution argument represents
the distribution model of the previous deployment (or null in case of an
initial deployment). This attribute set is derived by transforming the
last manifest file stored in the Nix coordinator profile to a distribution
expression. This attribute set can be used to reason about the upgrade
effort or to keep specific services on the same machines. The filters at-
tribute represents an attribute set containing various functions, which
filter and divide services defined in the initial distribution model.
91 In the body of the model, a developer must filter and transform the
initialDistribution attribute set into a suitable distribution, by applying vari-
ous functions defined in filters. Every function takes a distribution model
as input argument and optionally uses the services and infrastructure
models. The result of a function is a new distribution model taking the
effects of the filter into account. The result can be passed to another filter
function or used for the actual deployment. The process of composing
filter functions is repeated until a desirable distribution is achieved.
5.5 IMPLEMENTING A QUALITY OF SERVICE MODEL
The QoS policy specifications in Disnix consist of two parts, which are in-
spired by [Heydarnoori et al., 2006; Heydarnoori and Mavaddat, 2006]. First,
a candidate host selection phase is performed, in which for each service suitable
candidate hosts are determined. This phase sets the deployment boundaries
for each service. After a suitable candidate host selection is generated, the
division phase is performed in which services are allocated to the candidate
hosts. The dynamic deployment framework offers various functions defined
in the Nix expression language that can be composed to generate a desirable
distribution strategy dealing with non-functional properties from the domain.
5.5.1 Candidate host selection phase
Before we can reliably assign services to machines in the network, we must
first know on which machines a service is deployable. There are many con-
straints which render a service undeployable on a particular machine. First,
there are technical constraints. For example, in order to deploy a MySQL
database on a machine in the network, a running MySQL DBMS is required.
Moreover, a component may also have other technical limitations, e.g. if a
component is designed for Linux-based systems it may not be deployable on
a different operating system, such as Microsoft Windows.
Second, there are non-functional constraints from the domain. For instance,
in the medical domain, some data repositories must be stored within the phys-
ical borders of a specific country because this is required by the law of that
particular country. In order to filter a distribution, various mapping functions
are available.
{ services, infrastructure
, initialDistribution, previousDistribution, filters
}:
filters.mapAttrOnAttr {
  inherit services infrastructure; 92
  distribution = initialDistribution; 93
  serviceProperty = "requiredZone"; infrastructureProperty = "zone"; 94
}
Figure 5.9 Candidate host selection using mapAttrOnAttr
One-to-One mapping
The function mapAttrOnAttr can be used to map a value of an attribute defined
in the services model onto a value of an attribute defined in the infrastructure
model.
Figure 5.9 shows an example application of this function, which creates
a candidate host selection for the service model defined in Figure 5.4 and
infrastructure model defined in Figure 5.5 ensuring that a service that must
be deployed in a particular zone is mapped onto machines located in that
zone:
92 The mapAttrOnAttr function requires the services and infrastructure mod-
els, which can be provided without any modifications.
93 The distribution model that we want to filter is the initial distribution
model, which considers every machine in the infrastructure model as a
potential candidate for every service.
94 The arguments on this line specify that the requiredZone attribute defined
in the services model must be mapped onto the zone attribute in the
infrastructure model.
The result of the expression in Figure 5.9 is a new distribution model, in
which every service belonging to a particular zone refers to machines that are
located in that zone.
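As an illustration, for the services model of Figure 5.4 and the infrastructure model of Figure 5.5, the resulting candidate mapping could resemble the sketch below. The notation mirrors an ordinary Disnix distribution model; the representation used internally by the generator may differ.
{infrastructure}:
{
  # services restricted to the hospital zone only list hospital machines
  mobileeventlogs = [ infrastructure.test1 ];
  MELogService = [ infrastructure.test1 ];
  # services restricted to the Philips zone only list Philips machines
  SDS2AssetTracker = [ infrastructure.test2 ];
}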
One-to-Many mapping
The function mapAttrOnList can be used to map a value of a service attribute
onto a value defined in a list of values defined in the infrastructure model.
Figure 5.10 shows how this function can be used to create a candidate host
selection that ensures that a service of a specific type is mapped to machines
supporting that particular type:
95 This line specifies that the type attribute in the ser vices model is mapped
onto one of the values in the supportedTypes list.
{ services, infrastructure
, initialDistribution, previousDistribution, filters
}:
filters.mapAttrOnList {
  inherit services infrastructure;
  distribution = initialDistribution;
  serviceProperty = "type"; infrastructureProperty = "supportedTypes"; 95
}
Figure 5.10 Candidate host selection using mapAttrOnList
{ services, infrastructure
, initialDistribution, previousDistribution, filters
}:
filters.mapStatefulToPrevious {
  inherit services;
  distribution = initialDistribution;
  inherit previousDistribution;
}
Figure 5.11 Candidate host selection using mapStatefulToPrevious
The expression described in Figure 5.10 is very valuable for all our example
cases. This function ensures that a MySQL database is deployed on a machine
running a MySQL DBMS, an Apache Tomcat web application is deployed on a
server running Apache Tomcat, and that other services of a specific type are
deployed on the right machines.
Many-to-One mapping
Similarly, a function mapListOnAttr is provided to map a list of values in the
service model onto an attribute value defined in the infrastructure model.
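No figure is included for this function in this chapter, but by analogy with Figures 5.9 and 5.10 an invocation could look like the sketch below. The supportedZones service attribute (a list of zones in which a service may run) and the argument names are assumptions used only for illustration.
{ services, infrastructure
, initialDistribution, previousDistribution, filters
}:
filters.mapListOnAttr {
  inherit services infrastructure;
  distribution = initialDistribution;
  # hypothetical: each service lists the zones in which it may run,
  # and each machine defines a single zone attribute
  serviceProperty = "supportedZones"; infrastructureProperty = "zone";
}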
Restrict stateful components
Another issue is that services of a system may be stateful. For instance, data
in a MySQL database may change after it has been deployed. Disnix does
not migrate these changes to another machine in case of a redeployment.
Furthermore, migrating very large data sets can be very expensive.
By defining the property stateful = true in the services model (for example,
for the mobileeventlogs service in Figure 5.4), a developer can specify that a
service has mutable state.
By using the mapStatefulToPrevious function, as shown in Figure 5.11, a de-
veloper can specify that in case of an upgrade, the stateful service is mapped
only to the hosts of the previous deployment scenario. This ensures that a
service such as a MySQL database is not moved and the data is preserved.
Filter unbuildable components
There is also a brute-force filterBuildable function available, which tries to in-
stantiate the derivation function of every service for every type of ma-
chine in the infrastructure model and checks whether no exceptions are
thrown. As a consequence, combinations that cannot be built are automatically filtered
out.
Although this function can be very useful in some cases, it is also quite time-
consuming to evaluate all build functions of a system. Therefore, it is not
enabled by default. Preferably, users should use other strategies to determine
whether a target is a suitable candidate for a service.
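By analogy with the other filter functions, enabling this filter in a quality-of-service model could look like the sketch below; the argument names are an assumption.
{ services, infrastructure
, initialDistribution, previousDistribution, filters
}:
filters.filterBuildable {
  inherit services infrastructure;
  distribution = initialDistribution;
}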
5.5.2 Division phase
After a candidate host selection has been made, the services must be divided
over the candidate hosts by using a suitable strategy, as there are typically still
multiple machines available to host particular services. In the literature, vari-
ous deployment planning algorithms are described that try to find a suitable
mapping of services to machines in certain scenarios. However, there is no
algorithm available that always provides the best distribution for a particular
system, nor is it always feasible to find an optimal distribution. Therefore,
suitable strategies must be selected and configured by the user.
Interfacing with external tools
Disnix uses the Nix expression language to specify all the models that it is
using. The Nix expression language is a domain-specific language designed
for describing build actions, but it is not really suitable for implementing deploy-
ment planning algorithms, which often use arithmetic operations that are
poorly supported in the Nix expression language. Therefore, Disnix provides
an interface for integrating external algorithms as separate tools and also caches the
result in the Nix store like software components (i.e. if the same algorithm is
executed with the same parameters the result from the Nix store is immedi-
ately returned instead of performing the algorithm again).
This interface basically converts pieces of the services and infrastructure
models into XML. These XML representations can then be transformed into
another model (e.g. by using XSLT) and used by an external tool imple-
menting a particular division algorithm. The output of the process is then
transformed into a Nix expression representing a new distribution model and
used by other functions defined in the quality of service model. A notable
property of the interface is that it also uses Nix to invoke the external division
algorithm the same way as it builds software components. This offers us the
advantage that the result of the division algorithm is stored in the Nix store
(just as ordinary components) and that the result is cached.
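The sketch below indicates how such an external tool could be invoked through a Nix derivation, so that its result ends up in the Nix store and is therefore cached. The division-tool program, its command-line options and the parameter names are hypothetical and do not necessarily reflect the actual Dynamic Disnix implementation.
{stdenv, divisionTool, servicesXML, infrastructureXML}:
stdenv.mkDerivation {
  name = "distribution-model";
  buildInputs = [ divisionTool ];
  buildCommand = ''
    # division-tool is a hypothetical external program implementing a
    # division algorithm; it reads both XML models and writes a new
    # distribution expression to its standard output
    division-tool --services ${servicesXML} \
      --infrastructure ${infrastructureXML} > $out
  '';
}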
A one-dimensional division method
We have implemented a number of division methods. The most trivial method
is a one-dimensional division generator, which uses a single variable from the
services and infrastructure models and has three strategies. The greedy strat-
egy iterates for each service over the possible candidate hosts and picks the
first target until it is out of system resources, then moves over to the next un-
til all services are mapped. The highest-bidder strategy picks for each iteration
the target machine with the most system resources available. Similarly, the
lowest-bidder strategy picks the target with the least system resources capable
of running the service.
A cost effective division method
Additionally, we have support for a number of more complex algorithms,
such as an approximation algorithm of the minimum set cover problem [Hey-
darnoori et al., 2006]. This algorithm can be used to approximate the most
cost-effective deployment scenario when every server has a fixed price, no
matter how many services it will host.
A reliable division method
Another complex algorithm is an approximation algorithm for the Multi-
way Cut problem [Heydarnoori and Mavaddat, 2006], which can be used to
approximate a deployment scenario in which the number of network links
between services is minimised to make a system more reliable.
A division method for reliability testing
For system integration testing, it may be useful to apply an approximation
algorithm for vertex coloring, in which every node represents a service and
every color a machine in the infrastructure model. In the resulting distribu-
tion, every inter-dependency relationship between services corresponds to a physical
network link to another machine. It can be interesting to terminate such net-
work connections and verify whether the system is capable of recovering from
these network failures. The Dynamic Disnix framework implements the basic
approximation algorithm described in [Brélaz, 1979] for this purpose.
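To illustrate how candidate host selection and division can be composed into a complete policy, the sketch below chains the filter functions shown earlier in this section. The final divide call, its strategy argument and the requiredMem/mem attributes are hypothetical placeholders for any of the division methods described above.
{ services, infrastructure
, initialDistribution, previousDistribution, filters
}:
filters.divide {
  # hypothetical division step; "greedy" stands in for any division strategy
  strategy = "greedy";
  inherit services infrastructure;
  serviceProperty = "requiredMem"; targetProperty = "mem";
  distribution = filters.mapStatefulToPrevious {
    inherit services previousDistribution;
    distribution = filters.mapAttrOnAttr {
      inherit services infrastructure;
      serviceProperty = "requiredZone"; infrastructureProperty = "zone";
      distribution = filters.mapAttrOnList {
        inherit services infrastructure;
        distribution = initialDistribution;
        serviceProperty = "type"; infrastructureProperty = "supportedTypes";
      };
    };
  };
}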
5.6 EVALUATION
We have evaluated the case studies described in Section 5.2 in a cluster of
Xen virtual machines, each having the same amount of system resources. For
each case study, we have implemented a QoS model to deal with desirable
constraints, and we have used a separate division algorithm for each case.
We first performed an initial deployment in which the system is deployed
in a small network of machines. Then we trigger various events, such as
adding a server with specific capabilities or removing a server from the net-
work. For each event triggering a redeployment, we check whether the fea-
tures of the system are still accessible and measure the times of discovering
the network state, generating the distribution model, building the system,
distributing the services and transitioning to (activating) the new configuration.
Table 5.1 summarises the results of the evaluation for the four systems
described in Section 5.2.
5.6.1 StaffTracker (PHP/MySQL version)
The first is the StaffTracker (PHP/MySQL version). The size of all components
and dependencies of this system is 120 KiB. The quality-of-service model that
we defined for this system maps services of a specific type to the machines
supporting them (e.g. MySQL database to servers running a MySQL DBMS).
Moreover, the database storing the staff members is marked as stateful and
must not be relocated in case of a redeployment. The highest bidder strategy
is used to divide services over the candidate hosts.
Initially, the system is deployed in a network consisting of an Apache web
server machine and a MySQL machine, which takes quite some time. Then
we fire two events in which we add a machine to the network. Since Disnix
can build and deploy components safely next to other components, this gives
us no downtime. Only the transition phase (which only takes about 1.3-1.7
seconds) introduces some inconsistency, which is very short. Finally, we fire
two remove events. In this case, a new deployment is calculated and the
system configuration is adapted. During these events downtimes are a bit
longer (about 15 seconds in total), because the system must be partially rebuilt
and reconfigured, during which time those services are not available.
5.6.2 StaffTracker (Web services version)
Next is the SOA variant of the StaffTracker. The size of all components and
dependencies of this variant is 94 MiB (which is significantly larger than the
PHP version). We have used the same candidate host selection strategy as the
previous example and the minimum set cover approximation algorithm for
dividing services over the candidate hosts.
Similar results are observable compared to the previous example. The ini-
tial deployment also takes a significant amount of time. Events in which ma-
chines are added only take a couple of seconds. The failure events in this sce-
nario are very cheap and only take about 8 seconds of redeployment time (in-
cluding building, transferring and activating). These events are so cheap because
Disnix does not remove already deployed components from machines, unless
they are explicitly garbage collected. Because after these failure events the
system is essentially the same as in a previous configuration, we can switch
back nearly instantly.
5.6.3 ViewVC
In the ViewVC case, the size of all components and dependencies of the sys-
tem is 360 MiB. We used the same candidate host mapping method as in
the previous examples. We deployed ViewVC and its dependencies into a
network consisting of an Apache web server machine, a MySQL server ma-
chine and a Subversion machine. All Subversion repositories were marked as
stateful, so that they were not relocated in case of an event. For dividing the
services, we have used the greedy algorithm to use the maximum amount of
disk space of a server.
As the table shows, an add event only introduces a couple of seconds of
downtime. Event 3 takes a little longer, because a new Subversion repository
must be copied and transferred to a new Subversion server. Events 4 and 5
trigger a redeployment of the web application front-end and MySQL cache.
5.6.4 SDS2
The final evaluation subject is SDS2. For SDS2 we used, apart from the same
candidate host mapping criteria as the previous examples, a strategy that de-
ploys several services into a hospital environment and others in a Philips en-
vironment. The size of all components and dependencies of SDS2 is 599 MiB.
The division method was the multiway cut approximation algorithm, which
tries to find a distribution with a minimum number of network links between
services. For the initial deployment, we used 5 machines: 2 Apache Tomcat
servers (in the Philips and hospital environments), 2 MySQL servers (in the
Philips and hospital environments) and 1 Ejabberd server (in the hospital en-
vironment). The results of the deployment are comparable to the previous
examples, e.g., a long initial deployment time and a very short redeployment
time in case of a newly added server. The notable exception is the long re-
deployment time in case of a failing server. In these cases SDS2 had to be
completely rebuilt from source (apart from some shared library dependencies).
This is due to the fact that the SDS2 developers included a global configura-
tion file containing the locations of all services with each service. In case of
a change in the distribution, all services had to be rebuilt, repackaged and
redeployed on all machines in the network. This problem could have been
avoided by specifying small configuration files for each service which only
capture settings of its inter-dependencies.
5.6.5 General observations
Another notable general observation is the generation time for the initial de-
ployment step. These values were always close to 28 seconds for all case stud-
ies and much smaller during the redeployment steps. This is due to the fact
that these algorithms are invoked by the same build tooling used for Disnix,
which will first download some required dependencies in order to execute
the algorithms.
5.7 DISCUSSION
We have applied Disnix to several use cases and have shown that we can
efficiently and reliably redeploy a system in case of an event dealing with
desirable non-functional properties. However, there are some limitations in
using Disnix for self-adaptive deployment of distributed systems.
The first issue is applicability. Disnix is designed for the deployment of
service-oriented systems, that is, systems composed of distributable compo-
nents, which are preferably small and relocatable. This requires developers
to properly divide a system into distributable components. Some distributed
systems are composed of components that are so tightly integrated into the
environment, that it is infeasible to relocate the component in case of an event.
For instance, the system may require a restart or a complete adaptation of the
system configuration. Another issue is that some distributable components
may completely fail after a disconnection without recovering afterwards. Dis-
nix does not detect this behaviour. Moreover, Disnix uses a centralised ap-
proach, which means that components must be distributed and activated from
a central location, and the complete system must be captured in the models
used by Disnix. This is not possible for all types of distributed systems.
Disnix has only limited support for dealing with mutable state, such as
databases, Subversion repositories, caches and log files. For services such as
databases and Subversion repositories, Disnix can initialize a database schema
and initial data at startup time, but Disnix does not automatically migrate the
changes in the data made afterwards.
There are a couple of solutions to deal with this issue. A developer or
system administrator may want to keep a database on the same machine af-
ter the initial deployment. (This is why we provide the mapStatefulToPrevious
function in Section 5.5.) Another solution is to create a cluster of databases
that replicate the contents of the master database. In Chapter 9 we describe
a generic approach to deal with mutable state of services, which snapshots
mutable state and migrates it automatically as required.
Disnix currently only supports simple network topologies, in which every
target machine is directly reachable. There is also no method to model com-
plex network topologies yet. This prevents Disnix from using advanced de-
ployment planning algorithms, such as Avala [Mikic-Rakic et al., 2005], that
improve availability and take the network topology into account.
Furthermore, the contribution of our framework lies mainly in the mech-
anisms used for redeployment and the fact that components are treated as
“black boxes”. It would also be useful to integrate more sophisticated moni-
toring facilities and control loop implementations [Oreizy et al., 1998].
5.8 RELATED WORK
There is a substantial amount of work on self-adaptation in distributed en-
vironments. According to Cheng et al. [2008] the challenges of engineering
self-adaptive software systems can be divided into four categories: modelling
dimensions (goals, mechanisms and effects), requirements, engineering and
assurances. Our approach mainly belongs to the first category, since our
framework’s main purpose is to provide facilities to perform redeployment
in case of an event.
Kramer and Magee [2007] present a three-layer reference model based on
existing research work in which to articulate some outstanding research chal-
lenges in self-managed systems. The three layers are defined as goal manage-
ment, change management and component control. Our framework mainly
contributes to the change management layer and a bit to the component con-
trol layer, although our monitoring and control-loop aspects are relatively
primitive.
Several comparable approaches to ours involve specialized middleware.
For instance, [Caseau, 2005] describes using self-adaptive middleware to pro-
pose routing strategies of messages between components to meet require-
ments in a service-level agreement. In [Guennoun et al., 2008] a framework
of models for QoS adaptation is described that operates at the middleware
and transport levels of an application. In [Wichadakul and Nahrstedt, 2002]
the authors propose a translation system to develop QoS-aware applications,
which for instance support CPU scheduling.
Another approach is to reconfigure previously deployed components, such
as by changing the behaviour of components (e.g. [Camara et al., 2009]) or
their connections (such as [Chan and Bishop, 2009; Baresi and Pasquale, 2010]
for adapting compositions of web services).
Some approaches involving reconfiguration and redeployment are designed
for a specific component technology, such as [Oreizy et al., 1998] providing
an explicit architectural model and imperative language for modifying archi-
tectures consisting of C2 style components implemented in the Java program-
ming language. Moreover, the approach uses more primitive deployment fa-
cilities, e.g. it cannot automatically install components and their dependencies
on demand and without interference.
Compared to the previous approaches, Disnix is a more general approach
and does not require components of a distributed system to use a particular
component technology or middleware. Moreover, Disnix does not adapt com-
ponent behaviour or connections at runtime. Instead, it generates or builds
a new component and redeploys it. In some cases this takes more effort, but
the advantage is that Disnix tries to minimize this amount of effort and that
this approach can be used with various types of components regardless of
component technology. Moreover, if a crucial component breaks, the system
is redeployed to fix it, which reconfiguration approaches cannot al-
ways do properly. In some cases runtime adaptation may still be required,
which should be managed by other means, such as by implementing such
properties in the services themselves.
Some aspects of the present work are suggested in [Deb et al., 2008], such
as an application model describing components and connections, a network
model, and utility functions to perform a mapping of components to ma-
chines. Although the authors assess the usefulness of some utility function
approaches, they do not perform actual deployment of a system, nor describe
a framework that reacts to events.
Avala [Mikic-Rakic et al., 2005] is an approximation algorithm to improve
availability through redeployment. The authors of the paper mainly assess the
performance of the algorithm, but not the deployment mechanisms. DICOM-
PLOY [Rigole and Berbers, 2006] is a decentralised deployment tool for com-
ponents using the DRACO component model and uses Collaborative Rein-
forcement Learning (CRL) in the design of the component deployment frame-
work. In contrast to DICOMPLOY, Disnix uses a centralised approach. In some
cases DICOMPLOY may be more efficient in redeploying a system, but Disnix
is more general in that it supports various types of components and various
strategies for mapping components to machines.
Garlan et al. [2004] describe an architecture-based self-adaptation approach
having the ability to handle a wide variety of systems. A drawback of their
approach is that it is assumed that the system provides access hooks for moni-
toring and adapting, and that probes are available that provide information about
system properties. Besides our deployment mechanisms, we do not assume
that we can “hook” into already deployed components to alter their behaviour
or configuration. Instead, we redeploy components entirely (with a cache that
prevents us doing steps that have already been done before).
5.9 C O N C L U S I O N
We have shown a dynamic deployment extension built on top of Disnix, which
we have applied to several use cases. This deployment extension efficiently
and reliably redeploys a system in case of an event in the network (such as
a crashing machine) to keep the overall system working and to ensure that
desired non-functional requirements are met.
In the future we intend to improve this deployment system to properly
deal with mutable state of services, so that the data associated with a service is
automatically migrated in case of a redeployment. Moreover, Disnix has to
be extended to support complex network topologies, so that more sophisti-
cated deployment algorithms can be supported and we can integrate more
sophisticated monitoring facilities and control loops.
StaffTracker (PHP/MySQL)
State  Description                        Discovery (s)  Generation (s)  Build (s)  Distribution (s)  Transition (s)
0      Initial deployment                 1.0            27.9            31.9       1.2               1.6
1      Add MySQL server 2                 1.0            8.3             4.7        1.7               1.7
2      Add MySQL server 3                 1.0            4.4             4.5        1.2               1.3
3      Add Apache server 2                1.0            4.4             4.4        1.3               1.5
4      Remove MySQL server 2              1.0            8.2             4.6        1.5               1.8
5      Remove Apache server 1             1.0            8.8             4.4        0.9               1.2

StaffTracker (Web services)
0      Initial deployment                 1.0            28.6            169.8      13.1              4.0
1      Add MySQL server $500              1.0            10.7            17.7       6.1               2.9
2      Add Tomcat server $500             1.0            8.4             10.2       13.7              5.7
3      Remove Tomcat server $500          1.0            4.2             0.5        0.6               2.0
4      Remove MySQL server $500           1.0            4.2             0.5        1.2               1.8

ViewVC
0      Initial deployment                 1.0            28.5            555.9      9.5               17.5
1      Add MySQL server                   1.0            8.2             0.6        0.6               1.1
2      Add Apache server                  1.0            8.4             0.5        0.6               1.0
3      Add Subversion server + repo       1.0            8.4             6.3        3.7               8.2
4      Remove Apache server 1             1.0            8.4             4.6        7.5               1.9
5      Remove MySQL server 1              1.0            8.3             5.7        1.9               1.8

SDS2
0      Initial deployment                 1.0            28.6            642.9      122.8             147.7
1      Add MySQL server 2 (hospital)      1.0            6.7             0.6        19.1              1.9
2      Add Tomcat server 2 (Philips)      1.0            8.4             0.6        18.5              1.8
3      Remove Tomcat server 1 (Philips)   1.0            8.3             360.2      26.9              6.3
4      Remove MySQL server 1 (hospital)   1.2            10.5            370.4      120.8             124.3

Table 5.1 Evaluation times of the example cases
Part III
Infrastructure Deployment
6
Distributed Infrastructure Deployment
A B S T R A C T
In addition to the deployment of service components, the underlying sys-
tem configurations of software systems must also be deployed, either to physical
machines or to virtual machines hosted in an IaaS environment. As with service
deployment, developers and system administrators use automated tools. Con-
ventional infrastructure deployment tools often use convergent models. These
tools have various drawbacks, such as the difficulty of guaranteeing
that a model yields the same configuration on a different system, so that the
same configuration can be reproduced, and imperative actions that cannot
be trivially undone. In this chapter, we describe DisnixOS, a component pro-
viding complementary infrastructure deployment to Disnix. DisnixOS uses
charon, a network deployment tool built on top of NixOS.
6.1 I N T R O D U C T I O N
In the previous part of this thesis, we have covered service deployment as-
pects. In order to be able to properly use service components, we also require
the presence of the right infrastructure components. For example, machines
in the network must have the proper system resources and system services,
such as a database management system (DBMS) or an application server and
these must also be deployed by some means. Additionally, infrastructure
components must also satisfy various non-functional requirements, such as
performance and privacy concerns.
As with service deployment, infrastructure deployment is a complex
and laborious process. For example, in order to configure a single machine
in a network, an operating system and the right packages must be installed, all
required system services must be enabled, user accounts must be created and
configuration files must be created or adapted to contain all required settings.
This complexity is proportional to the number of machines in the network.
In addition to generic deployment steps, such as installing system ser-
vices, machine-specific deployment steps also have to be performed, such
as installing the right driver packages, configuring network interfaces with
the right IP addresses, and partitioning hard drives. Certain infrastructure
components may also have to be “tuned” to properly use the available system re-
sources and improve performance, for example by adjusting the amount of RAM
the DBMS may utilise.
System configurations can also be deployed in various environments. For
example, besides physical machines, people nowadays also use virtual ma-
chines, which are typically hosted in a data-center of an IaaS provider, such
as Amazon EC2. Virtual machines in the cloud usually provide simpler hard-
ware management and lower costs. Apart from installing new system con-
figurations, existing configurations also have to be upgraded, which is even
more difficult and may break a configuration. All these aspects introduce a
significant amount of complexity to the deployment process of networks of
machines.
In addition to the fact that a deployment process is complicated, it is also
difficult to reproduce a configuration of a network of machines in a different
environment. For example, it is not trivial to move a networked system con-
figuration from physical machines to an IaaS provider, or to move from one
IaaS provider to another and to be sure that the migrated configuration has
exactly the same properties.
To manage the complexity of deployment processes, developers and system
administrators typically use various automated deployment solutions, such
as Cfengine [Burgess, 1995], Puppet [Puppet Labs, 2011] or Chef [Opscode,
2011]. These tools make it possible to specify how to configure a network
of machines in a declarative language, which can be used to perform the
required deployment steps automatically.
However, these conventional deployment tools use convergent models cap-
turing what modification steps should be performed on machines in the net-
work according to a policy specification. Convergent models have a number
of drawbacks. For example, it is harder to write models that always yield
the same configuration regardless of the machine to which they are deployed. Further-
more, these tools also imperatively modify systems, which may not always
give the same result and these changes are not always trivial to undo. While
upgrading, there is also a time window in which the machine configurations
are inconsistent.
In this chapter, we describe DisnixOS, a component providing complemen-
tary infrastructure deployment to Disnix in addition to service deployment.
Instead of capturing how system configurations should be modified, it cap-
tures various aspects that constitute a system (such as the system services)
and can separate logical properties of systems from physical characteristics.
From these specifications, a complete system configuration for a specific tar-
get is produced, consisting of all packages, configuration files and activation
scripts required to deploy the configuration.
Apart from mutable state, the actual configuration always matches the in-
tended configuration defined in the deployment specification, which can be
reproduced anywhere without side effects. Moreover, due to the purely func-
tional nature of the underlying deployment tooling, we can also reliably up-
grade networks of machines. DisnixOS uses charon [Dolstra et al., 2013], a
network deployment tool built on top of NixOS, for infrastructure deployment
and provisioning of virtual machines in an IaaS environment, such as Amazon
EC2.
6.2 C O N V E R G E N T D E P L O Y M E N T: C F E N G I N E
The best known deployment tool for management of system configurations
is probably Cfengine [Burgess, 1995]. The purpose of Cfengine is to provide
automated configuration and maintenance of machines in a network from a
policy specification. From this specification, Cfengine performs a number of
imperative deployment actions, such as copying files, modifying file contents,
creating symlinks and invoking arbitrary tools in a network of machines (or a
subset belonging to a particular class).
Figure 6.1 shows a partial Cfengine model for configuring an Apache web
server:
96 This is a template generating a common bundle defining what bundles
must be processed in what order. In our example, we only process the
web_server bundle.
98 A bundle captures a specific system configuration aspect. The web_server
bundle is a promise bundle (because the agent type is used) and
takes care of various aspects of the Apache web server, such as starting
and stopping.
99 Identifiers followed by a colon declare sections within a bundle. Here, a
vars section is defined, assigning values to variable identifiers.
100 The processes section manages processes. Here, we determine the status
of the apache2 process. If the user has requested that the server must
be (re)started and if the Apache server is properly configured, Cfengine
executes the start_apache command (defined later in this model). If the
user has requested that the server must be stopped, it invokes the stop
step of the Apache init.d script.
101 An identifier followed by a double colon defines a class identifier. Classes
are true or false design decisions, which can be either hard classes, us-
ing properties discovered at runtime, such as time stamps, hostnames,
operating system names and process lists, or soft classes, which are user-
defined combinations of other classes or results of promises.
102 The body of a section contains promises, which describe a state in which
we want to end up. Cfengine tries to converge to that given state. Promises
can capture various properties, such as the contents of a file or the result
of a tool invocation. Furthermore, state descriptions can also be pro-
duced by other bundles. This particular promise checks whether Apache
has to be (re)started and yields a true decision for the start_apache class,
if this is the case.
103 The promise in the commands section intends to start the Apache web
server. The class decision has been made earlier in 102 .
body common control {                                               96
  bundlesequence => { "web_server" };                               97
}

bundle agent web_server(state) {                                    98
  vars:                                                             99
    "document_root" string => "/var/www";
    "site_http_conf" string => "/home/sander/httpd.conf";
    "match_package" slist =>
      { "apache2", "apache2-mpm-prefork", "apache2-proxy-balancer" };

  processes:                                                        100
    web_ok.on::                                                     101
      "apache2" restart_class => "start_apache";                    102
    off::
      "apache2" process_stop => "/etc/init.d/apache2 stop";

  commands:                                                         103
    start_apache:: "/etc/init.d/apache2 start";

  packages:                                                         104
    "$(match_package)"
      package_policy => "add",
      package_method => rpm,
      classes => if_ok("software_ok");

  files:                                                            105
    software_ok::
      "/etc/sysconfig/apache2"
        edit_line => fixapache,
        classes => if_ok("web_ok");
}

bundle edit_line fixapache {                                        106
  vars:
    "add_modules" slist => { "jk", "proxy_balancer" };
    "del_modules" slist => { "php4", "php5" };

  insert_lines:
    "APACHE_CONF_INCLUDE_FILES=\"$(web_server.site_http_conf)\"";

  field_edits:
    "APACHE_MODULES=.*"
      edit_field => quotedvar("$(add_modules)","append");
    "APACHE_MODULES=.*"
      edit_field => quotedvar("$(del_modules)","delete");
}
Figure 6.1 A Cfengine code fragment to configure an Apache web server
104 The promise in the packages section checks whether several required
RPM packages are present, such as the Apache web server and several
Apache modules. If these are not present, it invokes the RPM package
manager [Foster-Johnson, 2003] on the target machine, to install them.
If the installation succeeds and all packages are present, Cfengine yields
a true decision for the software_ok class.
105 The promise in the files section checks whether the Apache system con-
figuration file contains a number of required settings and updates them
if they are not there. This promise is only executed if all the required
Apache software packages are present, which should be achieved by the
previous promise 104 . If this process has achieved the intended result,
it yields a true decision for the web_ok class. The actual modification of
the configuration file is defined in the fixapache bundle, shown later in
this model.
106 This bundle contains promise body parts, which are used to modify the
Apache system configuration file. In our example, this bundle adds a
reference in the system configuration file to the web server configuration
file, removes a number of Apache modules (php4 and php5) from the
Apache configuration and adds several other modules: mod_jk and
mod_proxy_balancer.
6.2.1 Similarities
Cfengine models are declarative models in the sense that a developer or system
administrator writes promises, which describe the desired state that must be
achieved, instead of the steps that have to be performed. From these collec-
tions of promises, Cfengine tries to converge [Traugott and Brown, 2002] to the
described state. First, Cfengine determines whether it has already achieved
the intended state and, if this is not the case, it performs the required
steps to bring the system as close to the desired state as possible.
If we compare the Cfengine concepts to the goals of our reference archi-
tecture for distributed software deployment, it achieves a number of similar
goals:
They are designed to fully automate deployment tasks, reducing de-
ployment times and the chance of human error.
Both approaches intend to support integration of custom tools and to
work on various types of components and systems.
Making deployment processes efficient is also an important ingredient;
Cfengine always checks whether the intended state has been reached,
and only performs the steps that are necessary to reach it.
6.2.2 Differences
The convergent deployment approach also has a number of drawbacks com-
pared to our deployment vision:
Cfengine has several reliability weaknesses. One of the issues is that it
is not trivial to ensure dependency completeness. For example, it may
be possible to successfully deploy an Apache web server configuration,
while not specifying all the package dependencies on the Apache mod-
ules that the configuration requires. For example, in Figure 6.1, we have
defined that we want to use the mod_jk and mod_proxy_balancer Apache
modules, but we did not specify that we want to install the RPM pack-
ages providing these modules.
Deploying such a configuration may yield a working system on a ma-
chine that happens to have the required packages installed and
an inconsistent configuration on a machine that does not have these
packages installed. Developers and system administrators have to take
care that these dependencies are present, and there is no means to stati-
cally verify them.
Deployment steps are also performed imperatively, as files are overwrit-
ten or removed. Therefore, it is hard to ensure purity and to undo these
actions 1 . While these imperative steps are performed, a system is also
temporarily inconsistent, as packages and configuration files may be-
long to both an old configuration and a new configuration. During this
phase system services can also be stopped and started, rendering parts
of a system unavailable.
It is also difficult to ensure reproducibility. Because modifications depend
on the current state of the system, actions that result from a Cfengine
model may produce a different result on a different system, e.g. because
the initial contents of system configuration files are different. Also im-
plicit dependencies that are required, but not declared in the Cfengine
model, may make it difficult to reproduce a configuration. Further-
more, certain configuration properties are platform specific. For ex-
ample, RPM packages with the same name may not provide the same
functionality or compatibility on a different operating system or a newer
version of a Linux distribution.
Typically, because of these drawbacks, Cfengine users have to change a
Cfengine model and reconfigure networks of systems several times (es-
pecially in heterogeneous networks) to see whether the desired results are
achieved, and they must take extra care that all dependencies are present.
6.3 O T H E R D E P L O Y M E N T A P P R O A C H E S
Many deployment tools have been inspired by Cfengine, providing extra and
improved features, such as Puppet [Puppet Labs, 2011] and Chef [Opscode,
2011]. An additional feature of Puppet is “transactional” upgrades: it tries
to simulate the deployment actions first (e.g. by checking whether a file can
actually be copied or adapted) before performing the actual deployment process.
This technique is similar to the approach of [Cicchetti et al., 2009] for making
maintainer scripts of RPM and Debian packages transactional. Although simulation re-
duces the risk of errors, interruption of the actual deployment process may
still render a system configuration inconsistent.
1
Of course, Cfengine can also be instructed to make a backup, but these practices are uncom-
mon and require extra specification that makes these models bigger and more complicated.
Apart from Cfengine-based tools, there are also other tools used for dis-
tributed deployment. MLN [Begnum, 2006] is a tool for managing large net-
works of virtual machines and has a declarative language to specify arbitrary
topologies. It does not manage the contents of VMs beyond a templating
mechanism.
ConfSolve [Hewson et al., 2012] is a tool that can be used to model sys-
tem configuration aspects in a class-based Object-Oriented domain-specific
language. A model can be translated into a Constraint Satisfaction Problem
(CSP) to generate valid system configurations. ConfSolve’s approach is to
compose system configurations based on constraints, but it is not focused on the
mechanics needed to reliably deploy such configurations.
There are also many distributed application component deployment tools
available, such as the Software Dock [Hall et al., 1999] and Disnix [van der
Burg and Dolstra, 2010a]. These tools manage the deployment of applica-
tion components (e.g. web services) but not their underlying infrastructure
components and their settings. These tools are actually related to aspects
described in the previous part of this thesis, covering service deployment as-
pects.
6.4 C O N G R U E N T D E P L O Y M E N T
In the previous section, we have explained how conventional tools with con-
vergent models work and what their drawbacks are in reaching our deploy-
ment vision supporting reliable, reproducible and efficient deployment.
In this section, we explain how we have implemented an approach using
congruent models. We have achieved this goal by expanding upon earlier
work: NixOS, described in Section 2.3 of this thesis.
As we have seen, NixOS manages complete system configurations, includ-
ing all packages, boot scripts and configuration files, from a single declarative
specification. Furthermore, because NixOS uses Nix, it has unique features
such as the ability to store static components safely in isolation from each
other, so that upgrades are reliable.
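To recapitulate the style of such specifications (a minimal, purely illustrative sketch;
the option names below are ordinary NixOS options that also appear in the figures of
this chapter, while the concrete values are examples), a single-machine NixOS
configuration could look as follows:

  { config, pkgs, ... }:

  {
    # Declare the desired system services; NixOS generates the
    # corresponding packages, configuration files and boot scripts.
    services.postgresql.enable = true;
    services.httpd.enable = true;
    services.httpd.adminAddr = "root@localhost";

    # End-user packages that must be present in the system environment.
    environment.systemPackages = [ pkgs.subversion ];
  }

The network models described next generalise this idea from a single machine to a
set of machines.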
6.4.1 Specifying a network configuration
In order to achieve congruent deployment, we have to perform our deploy-
ment process in an unconventional manner and we have to specify our in-
tended configurations differently.
First, we have to express what the properties of the systems in the network
are, instead of what changes we want to carry out. The models used by Dis-
nixOS and charon are declarative about the composition of system components
whereas the models used by Cfengine (and derivatives) are declarative about
the intended result of imperative actions. From our specifications, which are
declarative about the system properties, DisnixOS decides which components
must be deployed and which can remain untouched.
{                                                                   107
  storage =                                                         108
    {pkgs, config, nodes, ...}:
    {
      services.portmap.enable = true;
      services.nfs.server.enable = true;
      services.nfs.server.exports = ''
        /repos *(rw,no_root_squash)
      '';
    };

  postgresql =                                                      109
    {pkgs, config, nodes, ...}:
    {
      services.postgresql.enable = true;
      services.postgresql.enableTCPIP = true;
    };

  webserver =                                                       110
    {pkgs, config, nodes, ...}:
    {
      fileSystems = pkgs.lib.mkOverride 50
        [ { mountPoint = "/repos";
            device = "${nodes.storage.networking.hostName}:/repos";  111
            fsType = "nfs";
            options = "bootwait"; }
        ];

      services.portmap.enable = true;
      services.nfs.client.enable = true;
      services.httpd.enable = true;
      services.httpd.adminAddr = "root@localhost";
      services.httpd.extraSubservices = [
        { serviceType = "trac"; }
      ];

      environment.systemPackages = [ pkgs.pythonPackages.trac pkgs.subversion ];
    };
}
Figure 6.2 network-logical.nix: A partial logical configuration of a Trac environment
Second, we have to separate logical and physical characteristics of the ma-
chines in the network to improve retargetability. Typically, when configuring
machines, many aspects of the configuration are the same, such as installing
end-user packages and installing system services, regardless of the type of
machine we want to deploy. Some other aspects are specific to the system,
such as driver packages, network settings, hard drive partitions and machine
specific performance tunings.
Logical specification
Figure 6.2 shows a partial example of a congruent model defined in the
Nix expression language describing a logical configuration of a Trac environ-
ment 2 , a web application system for managing software development projects:
2
http://trac.edgewall.org
107 The model in Figure 6.2 represents an attribute set of the form { name_1 =
value_1; name_2 = value_2; ...; name_n = value_n; }, in which each attribute name
represents a machine in the network and each attribute value captures a
logical configuration aspect of the machine.
108 Every attribute value refers to a NixOS module, capturing the logical
configuration aspects of a system. As we have seen earlier in Sec-
tion 2.3.3, NixOS modules are functions taking several arguments. The
pkgs argument refers to Nixpkgs, a collection of over 2500 packages that
can be automatically deployed by Nix and the config argument refers to
an attribute set capturing all configuration properties of the system.
In a network specification, such as Figure 6.2, NixOS modules may also
take the nodes parameter which can be used to retrieve properties from
the configuration of any machine in the network. In the body of each
NixOS module, we define an attribute set capturing various configura-
tion aspects.
For the storage machine, we have specified that we want to enable the
NFS server and the RPC portmapper (which is required by NFS) system
services. Furthermore, we define the /repos directory as an NFS export,
whose purpose is to store Subversion repositories that must be re-
motely accessible.
109 For the postgresql machine we have defined that we want to enable the
PostgreSQL database server, which should also accept TCP/IP connec-
tions in addition to Unix domain socket connections.
110 The webserver machine is configured to have an Apache HTTP server
instance hosting the Trac web application. Furthermore, it also uses
an NFS client to access the Subversion repositories stored on the storage
machine.
111 This line specifies the NFS device containing the hostname of the storage
system and the repos directory containing Subversion repositories that
we want to access. The nodes parameter is used to retrieve the hostName
property from the storage system.
Physical specification
The specification in Figure 6.2 captures the logical configuration of a network
of machines. All the properties in this model are usually the same for any type
of machine in a network to which we want to deploy the configuration.
In order to perform the complete deployment process of a network of ma-
chines, the specification shown in the previous section is not sufficient. Be-
sides logical properties, we also have to provide machine-specific physical prop-
erties, such as network settings, hard drive partitions and boot loader settings,
which typically differ across various types of machines.
Figure 6.3 shows an example of a physical network configuration. The
structure of this model is the same as the logical specification shown in Fig-
ure 6.2, except that it captures machine specific settings. For example, we
{
database =
{ pkgs, config, nodes, ...}:
{
boot.loader.grub = {
version = 2;
device = "/dev/sda";
};
boot.initrd.kernelModules = [ "ata_piix" "megaraid_sas" ];
fileSystems = [
{ mountPoint = "/"; device = "/dev/sda1"; }
];
nixpkgs.system = "i686-linux";
networking.hostName = "database.example.org";
services.disnix.enable = true;
...
};
storage = ...
webserver = ...
}
Figure 6.3 network-physical.nix: A partial physical network configuration for a Trac
environment
have defined that the MBR of the GRUB bootloader should be stored on the
first hard drive, that we need several kernel modules supporting a particular
RAID chipset, that the first partition of the first hard drive is the root parti-
tion, that the system is a 32-bit Intel Linux machine (i686-linux) and that the
host name of the machine is: database.example.org. As we have seen earlier in
Section 2.3, these attributes are all NixOS configuration properties.
Furthermore, in order to be able to use Disnix and to be able to activate a
service component, we must enable the Disnix service. The corresponding
NixOS module ensures that all supported activation types are configured.
For example, if the Apache HTTP service is installed, then the Disnix NixOS
module triggers the installation of the apache-webapplication activation script.
Besides physical characteristics of machines, we can also annotate physical
models with IaaS provider properties, such as Amazon EC2 specific settings,
so that virtual machines can be automatically instantiated with the appropri-
ate system resources, such as the number of CPU cores, persistent storage
space, etc. The RELENG paper about charon has more details on the instanti-
ation and provisioning of virtual machines [Dolstra et al., 2013].
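As a rough illustration of such annotations (the deployment.* option names follow
charon's EC2 support and should be treated as assumptions here; [Dolstra et al., 2013]
documents the authoritative set of options), a physical machine model could be
extended as follows:

  {
    webserver =
      { pkgs, config, nodes, ... }:
      {
        # Instantiate this machine as an Amazon EC2 virtual machine.
        deployment.targetEnv = "ec2";
        deployment.ec2.region = "eu-west-1";       # data centre in which to create the VM
        deployment.ec2.instanceType = "m1.small";  # determines CPU cores and RAM
      };
    ...
  }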
6.5 I M P L E M E N T A T I O N O F T H E I N F R A S T R U C T U R E D E P L O Y M E N T P R O C E S S
In the previous section, we have described how to define various concerns of
networks of machines in separate models. DisnixOS uses charon to perform
the deployment of the system configurations. Apart from the fact that NixOS
configurations are merged using the NixOS module system and charon is ca-
pable of instantiating VMs in a cloud provider environment (and uploading
bootstrap NixOS disk images), the steps performed by charon are roughly the
same as those of Disnix (described earlier in Chapter 4), e.g. it also builds the
components of entire system configurations first, then it transfers the closures and
finally it activates the system configurations. If any of the activation steps fails, a
rollback is performed. As entire system configurations (similar to services in
the previous chapter) are also Nix components, we can safely store them in
isolation from each other.

Figure 6.4 Architecture of disnixos-env
6.6 U S A G E
The usage of DisnixOS is similar to the basic Disnix toolset. The major differ-
ence is that instead of an infrastructure model, DisnixOS takes a collection of
network specifications capturing several machine configuration aspects of all
machines in the network. These are exactly the same specifications used by
charon. Running the following command:
$ disnixos-env -s services.nix -n network-logical.nix \
-n network-physical.nix -d distribution.nix
performs the complete deployment process, in which both the services of
which the system is composed and the system configurations of the machines
in the network are built from source code. Then the NixOS configurations in the
network are upgraded, and finally the service-oriented system is deployed the
same way the regular disnix-env command does.
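The distribution.nix model referenced above has the same shape as the distribution
models used by the regular Disnix toolset: it maps each service onto one or more
machines declared in the network models. A minimal, purely illustrative sketch (the
service names are hypothetical) targeting the machines of Figure 6.2 could look like this:

  {infrastructure}:

  {
    # Map each (hypothetical) service to a machine from the network models.
    TracAdapter  = [ infrastructure.webserver ];
    MonitoringDB = [ infrastructure.postgresql ];
  }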
6.7 A R C H I T E C T U R E
Figure 6.4 shows the architecture of the disnixos-env tool. First disnixos-env in-
vokes charon to deploy the network of machines defined in the network speci-
fications, each capturing a specific concern.
{
test1 =
{pkgs, config, nodes, ...}:
{
services.tomcat.enable = true;
nixpkgs.system = "x86_64-linux";
disnix.infrastructure.zone = "philips";
};
test2 =
{pkgs, config, nodes, ...}:
{
services.tomcat.enable = true;
services.mysql.enable = true;
services.mysql.rootPassword = ./mysqlpw;
services.mysql.initialScript = ./mysqlscript;
nixpkgs.system = "i686-linux";
disnix.infrastructure.zone = "hospital";
};
}
Figure 6.5 A NixOS network configuration
{
test1 = {
tomcatPort = 8080;
supportedTypes = [ "process" "wrapper" "tomcat-webapplication" ];
system = "x86_64-linux";
zone = "philips";
};
test2 = {
tomcatPort = 8080;
mysqlPort = 3306;
mysqlUsername = "root";
mysqlPassword = builtins.readFile ./mysqlpw;
supportedTypes = [ "process" "wrapper" "tomcat-webapplication" "mysql-database" ];
system = "i686-linux";
zone = "hospital";
};
}
Figure 6.6 A generated infrastructure expression
After charon has successfully deployed the system configurations, the disnixos-
geninfra tool is invoked, whose purpose is to convert the given network speci-
fications into a Disnix infrastructure model.
The difference between the network specifications that charon uses and the
infrastructure model that Disnix uses is that the charon models are explicit
models, which can be used to derive complete configurations, whereas the
Disnix infrastructure model is implicit; it reflects the infrastructure properties
as they should be, but the developer or system administrators have to make
sure that these properties are correct. Furthermore, the infrastructure model
only has to contain properties required for service deployment (not all system
properties) and can also be used to represent target machines deployed by
other means (whereas network models can only be used by NixOS).
To determine what infrastructure properties must be provided, various
NixOS configuration settings are examined. For example, if the
services.tomcat.enable property is enabled (which enables Apache Tomcat), the
resulting infrastructure model contains the Apache Tomcat port number (tom-
catPort), as well as the tomcat-webapplication type in the supportedTypes list. Fur-
thermore, in order to provide non-functional properties to the generated in-
frastructure model, end-users can define custom attributes in the disnix.infrastructure
NixOS configuration attribute set. Figure 6.5 shows an example network con-
figuration and Figure 6.6 shows the resulting infrastructure model.
After the infrastructure model is generated, the regular disnix-env tool is
invoked, which takes care of deploying all services. Disnix takes the user-
provided services and distribution models as well as the generated
infrastructure model as input parameters.
If any of these previous steps fail, a rollback is performed which restores
both the previous service deployment state as well as the previous infrastruc-
ture configuration.
6.8 E X P E R I E N C E
We have used DisnixOS for the deployment of a number of test subjects,
which we have captured in logical network models. In addition to the Dis-
nix examples described in Chapter 4, we have also deployed a collection of
WebDSL applications running on Java Servlet containers, such as Yellow-
grass (http://yellowgrass.org), Researchr (http://researchr.org)
and webdsl.org (http://webdsl.org), a Mediawiki environment
(http://mediawiki.org) (in which we have defined a web server hosting the Wiki
application and a server running a PostgreSQL server to store the Wiki con-
tents), a Trac environment (http://trac.edgewall.org), and a Hydra
build environment (http://hydra.nixos.org), a continuous build and
integration cluster consisting of 11 machines.
We were able to deploy the logical network configurations to various net-
works with different characteristics, such as a network consisting of physical
machines and QEMU and VirtualBox VMs as well as various IaaS provider
environments, such as Amazon EC2 and Linode. Furthermore, in all deploy-
ment scenarios, we never ran into an inconsistent state, due to missing de-
pendencies or a situation in which a configuration contains components from
an older system configuration and a newer system configuration at the same
time.
6.9 D I S C U S S I O N
In this chapter, we have explained how we achieve congruent deployment
with DisnixOS for static parts of a system. There are some desired properties
that we did not achieve.
As with Disnix for service deployment, DisnixOS and charon are able to
manage static parts of a system in isolation in the Nix store. However, the
underlying deployment model is not capable of dealing with mutable state,
such as databases. Currently, the only way we deal with state is by generating
self-initialising activation scripts, which (for example) import a schema on ini-
tial startup. If we migrate a deployed web application from one environment
to another, the data is not automatically migrated.
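To make this limitation concrete, the following Nix sketch shows the flavour of such
a self-initialising script; it is purely illustrative (the database name, schema file and
marker path are made up) and not the actual implementation generated by Disnix:

  { pkgs, schema ? ./schema.sql }:

  pkgs.writeScript "initialise-database" ''
    #! ${pkgs.stdenv.shell} -e
    # Import the schema only on the very first activation. Any data that
    # is modified afterwards stays on this machine and is not migrated
    # automatically when the service is deployed elsewhere.
    if [ ! -e /var/lib/mydb-initialised ]; then
        mysql -u root mydb < ${schema}
        touch /var/lib/mydb-initialised
    fi
  ''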
In order to deploy systems with DisnixOS or charon, all components of a
system must be captured in Nix expressions so that they can be built entirely
by Nix and isolated in the Nix store. They are not designed for adapting
configurations not managed by Nix, because we cannot safely manage com-
ponents deployed by other means in a purely functional manner.
6.10 L I M I T A T I O N S
DisnixOS is built on top of NixOS, which is a Linux-based system. For many
operating systems (especially proprietary ones), system services are so tightly
integrated with the operating system that they cannot be managed through
Nix (upon which Disnix is based). The DisnixOS extension is not suitable
for deployment of networks using other types of Linux distributions or other
operating systems, such as FreeBSD 3 or Windows, because their system
configurations cannot be completely managed by Nix.
On proprietary infrastructure, such as Windows, the basic Disnix toolset
can still be used for deploying service components, but infrastructure compo-
nents have to be deployed by other means, e.g. manually or by using other
deployment tools.
6.11 C O N C L U S I O N
In this chapter, we have explained how distributed deployment tools with
convergent models work, and we have presented DisnixOS, which provides comple-
mentary infrastructure deployment to Disnix using congruent models
that always match the intended configuration for the static parts of a system. We
have used this tool to deploy various system configurations in various envi-
ronments, such as the Amazon EC2 cloud.
3
As FreeBSD is free and open source, it may also be possible to implement a FreeBSD-based
NixOS; however, nobody has done this so far.
7
Automating System Integration Tests for
Distributed Systems
A B S T R A C T
Automated regression test suites are an essential software engineering prac-
tice: they provide developers with rapid feedback on the impact of changes to
a system’s source code. The inclusion of a test case in an automated test suite
requires that the system’s build process can automatically provide all the en-
vironmental dependencies of the test. These are external elements necessary for
a test to succeed, such as shared libraries, running programs, and so on. For
some tests (e.g., a compiler’s), these requirements are simple to meet.
However, many kinds of tests, especially at the integration or system level,
have complex dependencies that are hard to provide automatically, such as
running database servers, administrative privileges, services on external ma-
chines or specific network topologies. As such dependencies make tests diffi-
cult to script, they are often only performed manually, if at all. This particu-
larly affects testing of distributed systems and system-level software.
In this chapter, we show how we can automatically instantiate the complex
environments necessary for tests by creating (networks of) virtual machines
on the fly from declarative specifications. Building on NixOS, a Linux distri-
bution with a declarative configuration model, these specifications concisely
model the required environmental dependencies. We also describe techniques
that allow efficient instantiation of VMs. As a result, complex system tests be-
come as easy to specify and execute as unit tests. We evaluate our approach
using a number of representative problems, including automated regression
testing of a Linux distribution.
Furthermore, we also show how to integrate these techniques with Disnix,
so that apart from deploying service-oriented systems in real networks, they
can also be continuously tested in a virtual environment.
7.1 I N T R O D U C T I O N
Automated regression test suites are an essential software engineering prac-
tice, as they provide developers with rapid feedback on the impact of changes
to a system’s source code. By integrating such tests in the build process of a
software project, developers can quickly determine whether a change breaks
some functionality. These tests are easy to realise for certain kinds of software:
for instance, a regression test for a compiler simply compiles a test case, runs
it, and verifies its output; similarly, with some scaffolding, a unit test typically
just calls some piece of code and checks its result.
However, other regression tests, particularly at the integration or system
level, are significantly harder to automate because they have complex require-
ments on the environment in which the tests execute. For instance, they might
require running database servers, administrative privileges, services on exter-
nal machines or specific network topologies. This is especially a concern for
distributed systems and system-level software. Consider for instance the fol-
lowing motivating examples used in this chapter:
OpenSSH is an implementation of the Secure Shell protocol that allows
users to securely log in on remote systems. One component is the pro-
gram sshd, the secure shell server daemon, which accepts connections
from the SSH client program ssh, handles authentication, starts a shell
under the requested user account, and so on. A test of the daemon must
run with super-user privileges on Unix (i.e. as root) because the daemon
must be able to change its user identity to that of the user logging in. It
also requires the existence of several user accounts.
Quake 3 Arena is a multiplayer first-person shooter video game. An auto-
mated regression test of the multiplayer functionality must start a server
and a client and verify that the client can connect to the server success-
fully. Thus, such a test requires multiple machines. In addition, the
clients (when running on Unix) require a running X11 server for their
graphical user interfaces.
Transmission is a Bittorrent client, managing downloads and uploads of
files using the peer-to-peer Bittorrent protocol. Peer-to-peer operation
requires that a running Bittorrent client is reachable by remote Bittor-
rent clients. This is not directly possible if a client resides behind a
router that performs Network Address Translation (NAT) to map inter-
nal IP addresses to a single external IP address. Transmission clients
automatically attempt to enable port forwarding in routers using the
UPnP-IGD (Universal Plug and Play Internet Gateway Device) proto-
col. A regression test for this feature thus requires a network topology
consisting of an IGD-enabled router, a client “behind” the router, and a
client on the outside. The test succeeds if the second client can connect
to the first through the router.
Thus, each of these tests requires special privileges, system services, exter-
nal machines, or network topologies. This makes them difficult to include in
an automated regression test suite: as these are typically started from (say) a
Makefile on a developer’s machine prior to checking in changes, or on a con-
tinuous build system, it is important that they are self-contained. That is, the
test suite should set up the complete environment that it requires. Without
this property, test environments must be set up manually. For instance, we
could manually set up a set of (virtual) machines to run the Transmission test
suite, but this is laborious and inflexible. As a result, such tests tend to be
done on an ad hoc, semi-manual basis. For example, many major Unix system
packages (e.g. OpenSSH, the Linux kernel or the Apache web server) do not
have automated test suites.
In this chapter, we describe an approach to make such tests as easy to
write and execute as conventional tests that do not have complex environ-
mental dependencies. This opens a whole class of automated regression tests
to developers. In this approach, a test declaratively specifies the environment
necessary for the test, such as machines and network topologies, along with
an imperative test script. From the specification of the environment, we can
then automatically build and instantiate virtual machines (VMs) that implement
the specification, and execute the test script in the VMs.
We achieve this goal by expanding on our previous work on NixOS. As we
have seen earlier in Section 2.3, in NixOS the entire operating system (sys-
tem packages such as the kernel, server packages such as Apache, end-user
packages such as Firefox, configuration files in /etc, boot scripts, and so
on) is built from source from a specification in what is in essence a purely
functional “Makefile”. The fact that Nix builds from a purely functional spec-
ification means that configurations can easily be reproduced.
The latter aspect forms the basis for this chapter. In a normal NixOS sys-
tem, the configuration specification is used to build and activate a configura-
tion on the local machine. In Section 7.2, we show how these specifications can
be used to produce the environment necessary for running a single-machine
test. We also describe techniques that allow space and time-efficient instanti-
ation of VMs. In Section 7.3, we extend this approach to networks of VMs. In
Section 7.4, we show how we can integrate the distributed testing techniques
with Disnix, to automatically test service-oriented systems deployed by Dis-
nix. We discuss various aspects and applications of our work in Section 7.5,
including distributed code coverage analysis and continuous build systems.
We have applied our approach to several real-world scenarios; we quantify
this experience in Section 7.6.
7.2 S I N G L E - M A C H I N E T E S T S
NixOS system configurations allow developers to concisely specify the en-
vironment necessary for an integration or system test and instantiate virtual
machines from these, even if such tests require special privileges or running
system services. After all, inside a virtual machine, we can do any actions
that would be dangerous or not permitted on the host machine. We first ad-
dress single-machine tests; in the next section, we extend this to networks of
machines.
7.2.1 Specifying and running tests
Figure 7.1 shows an implementation of the OpenSSH regression test described
in Section 7.1:
113 The specification consists of two parts. The machine attribute refers to
a NixOS configuration, as we have seen in Section 2.3; a declarative
let openssh = stdenv.mkDerivation { ... };
in
makeTest {                                                          112
  machine =
    { config, pkgs, ... }:                                          113
    {
      users.extraUsers =
        [
          { name = "sshd"; home = "/var/empty"; }
          { name = "bob"; home = "/home/bob"; }
        ];
    };

  testScript = ''                                                   114
    $machine->succeed(                                              115
      "${openssh}/bin/ssh-keygen -f /etc/ssh/ssh_host_dsa_key",
      "${openssh}/sbin/sshd -f /dev/null",
      "mkdir -m 700 /root/.ssh /home/bob/.ssh",
      "${openssh}/bin/ssh-keygen -f /root/.ssh/id_dsa",
      "cp /root/.ssh/id_dsa.pub /home/bob/.ssh/authorized_keys");
    $machine->waitForOpenPort(22);                                  116
    $machine->succeed("${openssh}/bin/ssh bob\@localhost 'echo \$USER'") eq "bob\n" or die;
  '';
}
Figure 7.1 openssh.nix: Specification of an OpenSSH regression test
specification of the machine in which the test is to be performed. The
machine specification is very simple: all that we need for the test beyond
a basic NixOS machine is the existence of two user accounts (sshd for the
SSH daemon’s privilege separation feature, and bob as a test account for
logging in).
114 The testScript attribute is an imperative test script implemented in Perl
performing operations in the virtual machine (the guest) using a number
of primitives. The OpenSSH test script creates an SSH host key (required
by the daemon to allow clients to verify that they are connecting to the
right machine), starts the daemon, creates a public/private key pair, add
the public key to Bob’s list of allowed keys, waits until the daemon is
ready to accept connections, and logs in as Bob using the private key.
Finally, it verifies that the SSH session did indeed log in as Bob and
signals failure otherwise.
115 The succeed primitive executes shell commands in the guest and aborts
the test if they fail.
116 waitForOpenPort waits until the guest is listening on the specified TCP
port.
112 The function makeTest applied to machine and testScript evaluates to two
attributes: vm, a derivation that builds a script to start a NixOS VM
matching the specification in machine; and test, a derivation that depends
on vm, runs its script to start the VM, and then executes testScript.
The following command performs the OpenSSH test:
$ nix-build openssh.nix -A test
That is, it builds the OpenSSH package as well as a complete NixOS instance
with its hundreds of dependencies. For interactive testing, a developer can
also do:
$ nix-build openssh.nix -A vm
$ ./result/bin/run-vm
(The call to nix-build leaves a symbolic link result to the output of the vm deriva-
tion in the Nix store.) This starts the virtual machine on the user’s desktop,
booting a NixOS instance with the specified functionality.
7.2.2 Implementation
Two important practical advantages of our approach are that the implemen-
tation of the VM requires no root privileges and is self-contained (openssh.nix
and the expressions that build NixOS completely describe how to build a VM
automatically). These properties allow such tests to be included in an auto-
mated regression test suite.
Virtual machines are built by the NixOS module qemu-vm.nix that defines
a configuration value system.build.vm, which is a derivation to build a shell
script that starts the NixOS system built by system.build.toplevel in a virtual ma-
chine. We use QEMU/KVM (http://www.linux-kvm.org/), a modified
version of the open source QEMU processor emulator that uses the hardware
virtualisation features of modern CPUs to run VMs at near-native speed. An
important feature of QEMU over most other VM implementations is that it
allows VM instances to be easily started and controlled from the command
line. This includes the fully automated starting, running and stopping of a
VM in a derivation. Furthermore, QEMU provides special support for boot-
ing Linux-based operating systems: it can directly boot from a kernel and
initial ramdisk image on the host filesystem, rather than requiring a full hard
disk image with that kernel installed. (The initial ramdisk in Linux is a small
filesystem image responsible for mounting the real root filesystem.) For in-
stance, the system.build.vm derivation generates essentially this script:
${pkgs.qemu_kvm}/bin/qemu-system-x86_64 -smb / \
  -kernel ${config.boot.kernelPackages.kernel} \
  -initrd ${config.system.build.initialRamdisk} \
  -append "init=${config.system.build.bootStage2} systemConfig=${config.system.build.toplevel}"
The system.build.vm derivation does not build a virtual hard disk image for
the VM, as is usual. Rather, the initial ramdisk of the VM mounts the Nix
store of the host through the network filesystem CIFS (the -smb / option above);
QEMU automatically starts a CIFS server on the host to service requests from
the guest. This is a crucial feature: the set of dependencies of a system is
hundreds of megabytes in size at the least, so to build such an image ev-
ery time we reconfigure the VMs would be very wasteful in time and space.
Thus, rebuilding a VM after a configuration change usually takes only a few
seconds.
The VM start script does create an empty ext3 root filesystem for the guest
at startup, to hold mutable state such as the contents of /var or the system
account file /etc/passwd. Thanks to sparse allocation of blocks in the virtual disk
image, image creation takes a fraction of a second. NixOS’s boot process is
self-initialising: it initialises all state needed to run the system. For interactive
use, the filesystem is preserved across restarts of the VM, saved in the image
file ./hostname.qcow2.
QEMU provides virtualised network connectivity to VMs. The VM has
a network interface, eth0, that allows it to talk to the host. The guest has
IP address 10.0.2.15, QEMU’s virtual gateway to the host is 10.0.2.2, and the
CIFS server is 10.0.2.4. This feature is implemented entirely in user space: it
requires no root privileges.
The test script executes commands on the VM using a root shell running
on the VM that receives commands from the host through a TCP connection
between the VM and the host. (The remotely accessible root shell is provided
by a NixOS module added to the machine configuration by makeTest and does
not exist in normal use.) QEMU allows TCP ports in the guest network to be
forwarded to Unix domain sockets [Stevens and Rago, 2005] on the host. This
is important for security: we do not want anybody other than the test script
connecting to the port. Also, in continuous build environments any number
of builds may execute concurrently; the use of fixed TCP ports on the host
would preclude this.
7.3 D I S T R I B U T E D T E S T S
Many typical system tests are distributed: they require multiple machines
to execute. Therefore a natural extension of the declarative machine spec-
ifications in the previous section is to specify entire networks of machines,
including their topologies.
7.3.1 Specifying networks
Figure 7.2 shows a small automated test for Quake 3 Arena. As described in
Section 7.1, it verifies that clients can successfully join a game running on a
non-graphical server:
117 Here, makeTest is called with a nodes argument instead of a single ma-
chine. This is a set specifying all the machines in the network; each
attribute is a NixOS configuration module. This model has the same
definition as the logical network specifications described in Chapter 6.
118 In our example, we specify a network of two machines. The server ma-
chine automatically starts a Quake server daemon.
makeTest {
  nodes =                                                           117
    { server =                                                      118
        { config, pkgs, ... }:
        {
          jobs.quake3Server =
            { startOn = "startup";
              exec =
                "${pkgs.quake3demo}/bin/quake3"
                + " +set dedicated 1"
                + " +set g_gametype 0"
                + " +map q3dm7 +addbot grunt"
                + " 2> /tmp/log";
            };
        };

      client =                                                      119
        { config, pkgs, ... }:
        {
          services.xserver.enable = true;
          environment.systemPackages = [ pkgs.quake3demo ];
        };
    };

  testScript = ''                                                   120
    startAll;
    $server->waitForJob("quake3-server");
    $client->waitForX;
    $client->succeed("quake3 +set name Foo +connect server &");
    sleep 40;
    $server->succeed("grep 'Foo.*entered the game' /tmp/log");
    $client->screenshot("screen.png");
  '';
}
Figure 7.2 Specification of a Quake client/server regression test
119 The client machine runs an X11 graphical user interface and has the
Quake client installed, but otherwise does nothing.
120 The test script in the example first starts all machines and waits until
they are ready. This speeds up the test as it boots the machines in paral-
lel; otherwise they are only booted on demand, i.e., when the test script
performs an action on the machine. It then executes a command on the
client to start a graphical Quake client and connect to the server. Af-
ter a while, we verify on the server that the client did indeed connect.
The derivation will fail to build if this is not the case. Finally, we make
a screenshot of the client to allow visual inspection of the end state, if
desired.
The attribute test returned by makeTest evaluates to a derivation that exe-
cutes the VMs in a virtual network and runs the given test suite. The test
script uses additional test primitives, such as waitForJob (which waits until the
given system service has started successfully) and screenshot (which takes a
screenshot of the virtual display of the VM).
GUI testing is a notoriously difficult subject [Grechanik et al., 2009]. The
point here is not to make a contribution to GUI testing techniques per se, but
nodes = {
tracker = 121
{ config, pkgs, ... }:
{
environment.systemPackages = [ pkgs.transmission pkgs.bittorrent ];
services.httpd.enable = true;
services.httpd.documentRoot = "/tmp";
};
router = 122
{ config, pkgs, ... }:
{
environment.systemPackages = [ iptables miniupnpd ]; 123
virtualisation.vlans = [ 1 2 ];
};
client1 = 124
{ config, pkgs, nodes, ... }:
{
environment.systemPackages = [ transmission ];
virtualisation.vlans = [ 2 ];
networking.defaultGateway = nodes.router.config.networking.ifaces.eth2.ipAddress;
};
client2 = 125
{ config, pkgs, ... }:
{
environment.systemPackages = [ transmission ];
};
};
Figure 7.3 Network specification for the Transmission regression test
to show that we can easily set up the infrastructure needed for such tests. In
the test script, we can run any desired automated GUI testing tool.
The virtual machines can talk to each other because they are connected
together into a virtual network. Each VM has a network interface eth1 with
an IP address in the private range 192.168.1.n assigned in sequential order by
makeTest. (Recall that each VM also has a network interface eth0 to communi-
cate with the host.) QEMU propagates any packet sent on this interface to all
other VMs in the same virtual network. The machines are assigned hostnames
equal to the corresponding attribute name in the model, so the machine built
from the server attribute has hostname server.
7.3.2 Complex topologies
The Quake test has a trivial network topology: all machines are on the same
virtual network. The test of the port forwarding feature in the Transmis-
sion Bittorrent client (described in Section 7.1) requires a more complicated
topology: an “inside” network, representing a typical home network behind
a router, and an “outside” network, representing the Internet. The router
should be connected to both networks and provide Network Address Trans-
testScript = ''
# Enable NAT on the router and start miniupnpd.
$router->succeed(
"iptables -t nat -F", ...
"miniupnpd -f ${miniupnpdConf}");
# Create the torrent and start the tracker.
$tracker->succeed(
"cp ${file} /tmp/test",
"transmissioncli -n /tmp/test /tmp/test.torrent",
"bittorrent-tracker --port 6969 &");
$tracker->waitForOpenPort(6969);
# Start the initial seeder.
my $pid = $tracker->background(
"transmissioncli /tmp/test.torrent -w /tmp");
# Download from the first (NATted) client.
$client1->succeed("transmissioncli http://tracker/test.torrent -w /tmp &");
$client1->waitForFile("/tmp/test");
# Bring down the initial seeder.
$tracker->succeed("kill -9 $pid");
# Now download from the second client.
$client2->succeed("transmissioncli http://tracker/test.torrent -w /tmp &");
$client2->waitForFile("/tmp/test");
'';
Figure 7.4 Test script for the Transmission regression test
lation (NAT) from the inside network to the outside. Machines on the inside
should not be directly reachable from the outside. Thus, we cannot do this
with a single virtual network. To support such scenarios, makeTest can create
an arbitrary number of virtual networks, and allows each machine specifica-
tion to declare in the option virtualisation.vlans to what networks they should be
connected.
Figure 7.3 shows the specification of the machines for the Transmission test.
It has two virtual networks, identified as 1 (the “outside” network) and 2 (the
“inside”):
122 The router machine is connected to both networks.
124 The client1 machine is connected to network 2.
125 The client2 machine is connected to network 1. (If virtualisation.vlans is omitted, it defaults to 1.)
121 The tracker runs the Apache web server to make torrent files available
to the clients.
123 The configuration further specifies what packages should be installed on
what machines, e.g., the router needs the iptables and miniupnpd packages
for its NAT and UPnP-IGD functionality.
The test, shown in Figure 7.4, proceeds as follows. We first initialise NAT
on the router. We then create a torrent file on the tracker and start the tracker
program, a central Bittorrent component that keeps track of the clients that
are sharing a given file, on port 6969. Also on the tracker we start the initial
seeder, a client that provides the initial copy of the file so that other clients
can obtain it. We then start a download on the client behind the router and
wait until it finishes. If Transmission and miniupnpd work correctly in concert,
the router should now have opened a port forwarding that allows the second
client to connect to the first client. To verify that this is the case, we shut down
the initial seeder and start a download on the second client. This download
can only succeed if the first client is reachable through the NAT router.
Each virtual network is implemented as a separate QEMU network; thus a
VM cannot send packets to a network to which it is not connected. Machines
are assigned IP addresses 192.168.n.m, where n is the number of the network
and m is the number of the machine, and have Ethernet interfaces connected
to the requested networks. For example, the router will have interfaces eth1
with IP address 192.168.1.3 and eth2 with address 192.168.2.3, while the first
client will only have an interface eth1 with IP address 192.168.2.1. The test
infrastructure provides operations to simulate events such as network outages
or machine crashes.
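For illustration, in a test like the Quake example of Figure 7.2, a network outage could be simulated as follows. This is a sketch only: the primitive names block and unblock are assumptions about the test driver's interface.

testScript = ''
  startAll;
  # Temporarily cut the server off from the virtual network
  # (block/unblock are assumed names for the outage primitives).
  $server->block;
  # ... exercise the client while the server is unreachable ...
  $server->unblock;
'';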
7.4 S E RV I C E - O R I E N T E D T E S T S
The techniques described in the previous section provide a distributed test-
ing framework on infrastructure level. These testing facilities can be used in
conjunction with Disnix to automatically test service-oriented systems.
7.4.1 Writing test specifications
Figure 7.5 shows a specification that tests the web services implementation of
the StaffTracker system, described earlier in Chapter 5:
126 The local attribute src defines the location of the source tarball of the
StaffTracker system, containing the source code and deployment expres-
sions. This attribute is used in various function invocations in this spec-
ification.
127
The build attribute is bound to the disnixos.buildManifest function invoca-
tion, which builds the Disnix manifest file, using the given services,
network and distribution specifications included with the source code.
The command-line tool disnix-manifest serves the same purpose.
128 The tests attribute is bound to the disnixos.disnixTest function invocation,
which builds and launches a network of virtual machines and uses Disnix to
deploy the StaffTracker system built previously by the function bound to the
build attribute. Furthermore, it performs a number of specified test cases in
this virtual network.
129 Likewise, the testScript specifies Perl code performing test cases in the
virtual network. Here, we first wait until the web application front-
{ nixpkgs ? <nixpkgs>, nixos ? <nixos>, system ? builtins.currentSystem }:
let
pkgs = import nixpkgs { inherit system; };
disnixos = import "${pkgs.disnixos}/share/disnixos/testing.nix" {
inherit nixpkgs nixos;
};
src = /home/sander/WebServices-0.1.tar.gz; 126
in
rec {
build = disnixos.buildManifest { 127
name = "WebServices-0.1";
version = builtins.readFile ./version;
inherit src;
servicesFile = "deployment/DistributedDeployment/services.nix";
networkFile = "deployment/DistributedDeployment/network.nix";
distributionFile = "deployment/DistributedDeployment/distribution.nix";
};
tests = disnixos.disnixTest { 128
name = "WebServices-0.1";
inherit src;
manifest = build;
networkFile = "deployment/DistributedDeployment/network.nix";
testScript = '' 129
# Wait until the front-end application is deployed
$test2->waitForFile("/var/tomcat/webapps/StaffTracker/stafftable.jsp");
# Capture the output of the entry page
my $result =
$test3->mustSucceed("curl --fail http://test2:8080/StaffTracker/stafftable.jsp");
# The entry page should contain my name :-)
if ($result =~ /Sander/) {
print "Entry page contains Sander!\n";
}
else {
die "Entry page should contain Sander!\n";
}
# Start Firefox and take a screenshot
$test3->mustSucceed("firefox http://test2:8080/StaffTracker &");
$test3->waitForWindow(qr/Firefox/);
$test3->mustSucceed("sleep 30");
$test3->screenshot("screen");
'';
};
Figure 7.5 stafftracker.nix: A Disnix test expression implementing test cases for the
StaffTracker web services example
end is deployed. Then we check whether the entry page, which shows the
staff members of a university department, is accessible and whether my
name is listed in it. Finally, Firefox is launched to make a screenshot
of the entry page.
By running the following command-line instruction, the test suite can be
executed non-interactively:
$ nix-build stafftracker.nix -A tests
Figure 7.6 Using disnixos-vm-env to deploy the StaffTracker in a network of three
virtual machines
Apart from running the tests from a command-line instruction, the expression
can also be executed in Hydra.
7.4.2 Deploying a virtual network
In addition to running non-interactive test suites, the DisnixOS tool suite also
provides the disnixos-vm-env tool. The usage of disnixos-vm-env is identical to
disnixos-env described in Chapter 6:
$ disnixos-vm-env -s services.nix -n network.nix \
-d distribution.nix
This command-line invocation generates a virtual network resembling the
machines defined in the network specification, then launches the virtual ma-
chines interactively and finally deploys the service-oriented system using Dis-
nix. Figure 7.6 shows what the result may look like if the system is deployed
in a virtual network consisting of three virtual machines.
Figure 7.7 shows the architecture of the disnixos-vm-env tool. First, the net-
work model is used by a NixOS tool called nixos-build-vms, which builds a net-
work of virtual machine configurations closely matching the configurations
described in the network model.
Then the disnixos-geninfra tool is used to convert a network model into an
infrastructure model, which the basic Disnix toolset understands. The main
difference between the infrastructure model and the network model is that the
infrastructure model is an implicit model: it should reflect the system config-
urations of the machines in the network, but the system administrator must
Figure 7.7 Architecture of the disnixos-vm-env tool
make sure that the model matches the actual configuration. Moreover, it only
has to capture attributes required for deploying the services. The network
model is an explicit model, because it describes the complete configuration of
a system and is used to deploy a complete system from this specification. If
building a system configuration from this specification succeeds, we know
that the specification matches the given configuration.
After generating the infrastructure model, the disnix-manifest tool is invoked,
which generates a manifest file for the given configuration. Finally, the nixos-
test-driver is started, which launches the virtual machines for testing.
7.5 D I S C U S S I O N
7.5.1 Declarative model
To what extent do we need the properties of Nix and NixOS, in particular the
fact that an entire operating system environment is built from source from a
specification in a single formalism, and the purely functional nature of the
Nix store? There are many tools to automate deployment of machines. For
instance, Red Hat’s Kickstart tool installs RPM-based Linux systems from a
textual specification and can be used to create virtual machines automatically,
with a single command-line invocation [Red Hat, Inc., 2009].
However, there are many limitations to such tools:
Having a single formalism that describes the construction of an entire
network from source makes hard things easy, such as building part of
the system with coverage analysis. In a tool such as Kickstart, the bi-
nary software packages are a given; we cannot easily modify the build
processes of those packages.
For testing in virtual machines, it is important that VMs can be built
efficiently. With Nix, this is the case because the VM can use the host’s
Nix store. With other package managers, that is not an option because
the host filesystem may not contain the (versions of) packages that a VM
needs. One would also need to be root to install packages on the host,
making any such approach undesirable for automated test suites.
For automatic testing, one needs a formalism to describe the desired
configurations. In NixOS this is already given: it is what users use to de-
scribe regular system configurations. In conventional Unix systems, the
configuration is a result of many “unmanaged” modifications to system
configuration files (e.g. in /etc). Thus, given an existing Unix system, it is
hard to distill the “logical” configuration of a system (i.e., specification
in terms of high-level requirements) from the multitude of configuration
files.
7.5.2 Operating system generality
The network specifications described in this chapter build upon NixOS: they
build NixOS operating system instances. This obviously limits the general-
ity of our current implementation: a test that must run on a Windows ma-
chine cannot be accommodated. In this sense, it shows an “ideal” situation,
in which entire networks of machines can be built from a purely functional
specification. Nixpkgs does contain functions to build virtual machines for
Linux distributions based on the RPM or Apt package managers, such as Fe-
dora and Ubuntu. These can be supported in network specifications, though
they would be harder to configure since they are not declarative. On the other
hand, platforms such as Windows that lack fine-grained package manage-
ment mechanisms are difficult to support in a useful manner. Such platforms
would require pre-configured virtual images, which are inflexible.
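To sketch what this support looks like, the expression below runs a trivial build step inside an RPM-based disk image. The vmTools attribute names used here (runInLinuxImage, diskImages.fedora16x86_64) are assumptions recalled from Nixpkgs of that era and may differ between versions:

with import <nixpkgs> { };

# Run a small command inside a Fedora-based VM image and capture its output.
vmTools.runInLinuxImage (stdenv.mkDerivation {
  name = "rpm-query-demo";
  diskImage = vmTools.diskImages.fedora16x86_64;
  buildCommand = ''
    rpm -qa > $out   # list the RPM packages present in the guest image
  '';
})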
The Nix package manager itself is portable across a variety of operating
systems, and the generation of virtual machines works on any Linux host
machine (and probably other operating systems supported by QEMU). The
fact that the guest OS is NixOS is usually fine for automated regression test
suites, since many test cases do not care about the specific type of guest Linux
distribution.
7.5.3 Test tool generality
Our approach can support any fully non-interactive test tool that can build and
run on Linux. Since Nix derivations run non-interactively, tools that require
user intervention (e.g., interactive GUI testing tools) are not supported. Like-
wise, only systems with a fully automated build process are supported.
Figure 7.8 Part of the distributed code coverage analysis report for the Subversion
web service
7.5.4 Distributed coverage analysis
Declarative specifications of networks and associated test suites make it easy
to perform distributed code coverage analysis. Again, we make no contributions
to the technique of coverage analysis itself; we improve its deployability. First,
the abstraction facilities of the Nix expression language make it easy to specify
that parts of the dependency graph of a large system are to be compiled with
coverage instrumentation (or any other form of build-time instrumentation
one might want to apply). Second, by collecting coverage data from every
machine in a test run of a virtual network, we get more complete coverage
information. Consider for instance, a typical configuration of the Subversion
revision control system: clients run the Subversion client software, while a
server runs a Subversion module plugged into the Apache web server to pro-
vide remote access to repositories through the WebDAV protocol. These are
both built from the Subversion code base. If a client performs a checkout from
a server, different paths in the Subversion code will be exercised on the client
than on the server. The coverage data on both machines should be combined
to get a full picture.
We can add coverage instrumentation to a package using the configuration
value nixpkgs.config.packageOverrides. This is a function that takes the original
contents of the Nix Packages collection as an argument, and returns a set of
replacement packages:
nixpkgs.config.packageOverrides = pkgs: {
subversion = pkgs.subversion.override {
stdenv = pkgs.addCoverageInstrumentation pkgs.stdenv;
};
};
The original Subversion package, pkgs.subversion, contains a function, override,
that allows the original dependencies of the package to be overridden. In this
case, we pass a modified version of the standard build environment (stdenv)
that automatically adds the flag --coverage to every invocation of the GNU C
Compiler. This causes GCC to instrument object code to collect coverage data
and write it to disk. Most C or C++-based packages can be instrumented in
this way, including the Linux kernel.
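The definition of addCoverageInstrumentation itself is not shown in this chapter; the following is a minimal sketch of the idea (not the actual Nixpkgs code), wrapping a stdenv so that every compiler invocation receives --coverage:

# Sketch: wrap a stdenv such that all derivations built with it are compiled
# with --coverage, causing GCC to emit gcov instrumentation and link libgcov.
addCoverageInstrumentation = baseStdenv:
  baseStdenv // {
    mkDerivation = args: baseStdenv.mkDerivation (args // {
      NIX_CFLAGS_COMPILE =
        toString (args.NIX_CFLAGS_COMPILE or "") + " --coverage";
    });
  };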
The test script automatically collects the coverage data from each machine
in the virtual network at the conclusion of the test, and writes it to $out. The
function makeReport then combines the coverage data from each virtual ma-
chine and uses the lcov tool [Larson et al., 2003] to make a set of HTML pages
showing a coverage report and each source file decorated with the line cover-
age. For example, we have built a regression test for the Subversion example
with coverage instrumentation on Apache, Subversion, Apr, Apr-util and the
Linux kernel. Figure 7.8 shows a small part of the distributed coverage anal-
ysis report resulting from the test suite run. The line and function coverage
statistics combine the coverage from each of the four machines in the network.
One application of distributed coverage analysis is to determine code cov-
erage of large systems, such as entire Linux distributions, on system-level tests
(rather than unit tests at the level of individual packages). This is useful for
system integrators, such as Linux distributors, as it reveals the extent to which
test suites exercise system features. For instance, the full version of the cov-
erage report in Figure 7.8 readily shows which kernel and Apache modules
are executed by the tests, often at a very specific level: e.g., the ext2 filesys-
tem does not get executed at all, while ext3 is used, except for its extended
attributes feature.
7.5.5 Continuous builds
The ability to build and execute a test with complex dependencies is very
valuable for continuous integration. A continuous integration tool (e.g.
CruiseControl) continuously checks out the latest source code of a project, builds it,
runs tests, and produces a report [Fowler and Foemmel, 2005]. A problem
with the management of such tools is to ensure that all the dependencies
of the build and the test are available on the continuous build system (e.g., a
database server to test a web application). In the worst case, the administrator
of the continuous build machines must install such dependencies manually.
By contrast, the single command
$ nix-build subversion.nix -A report
causes Nix to build or download everything needed to produce the coverage
report for the Subversion web service test: the Linux kernel, QEMU, the
C compiler, the C library, Apache, the coverage analysis tools, and so on.
This automation makes it easy to stick such tests in a continuous build sys-
tem. In fact, there is a Nix-based continuous build system, Hydra
(http://hydra.nixos.org), that continuously checks out Nix expressions
describing build tasks from a revision control system, builds them, and makes
the output available through a web interface.
7.6 E VA L U AT I O N
We have created a number of tests¹ using the virtual machine-based testing
techniques described in this chapter. These are primarily used as regression
tests for NixOS: every time a NixOS developer commits a change to NixOS
¹ The outputs of these tests can be found at http://hydra.nixos.org/jobset/nixos/trunk/jobstatus. The Nix expressions are at https://svn.nixos.org/repos/nix/nixos/trunk/tests.
or Nixpkgs, our continuous integration system rebuilds the tests, if necessary.
The tests are the following:
Several single-machine tests, e.g. the OpenSSH test, and a test for the
KDE Plasma desktop environment that builds a NixOS machine and
verifies that a user can successfully log into KDE and start several ap-
plications.
A two-machine test of an Apache-based Subversion service, which per-
forms HTTP requests from a client machine to create repositories and
user accounts on the server through the web interface, and executes Sub-
version commands to check out from and commit to repositories. It is
built with coverage instrumentation to perform a distributed coverage
analysis.
A four-machine test of Trac, a software project management service [Edge-
wall Software, 2009] involving a PostgreSQL database, an NFS file server,
a web server and a client.
A four-machine test of a load-balancing front-end (reverse proxy) Apache
server that sits in front of two back-end Apache servers, along with a
client machine. It uses test primitives that simulate network outages to
verify that the proxy continues to work correctly if one of the back-ends
stops responding.
A three-machine variant of the Quake 3 test in Figure 7.2.
The four-machine Transmission test in Figure 7.3.
Several tests of the NixOS installation CD. An ISO-9660 image of the
installation CD is generated and used to automatically install NixOS
on an empty virtual hard disk. The function that performs this test is
parametrised with test script fragments that partition and format the
hard disk. This allows many different installation scenarios (e.g., “XFS
on top of LVM2 on top of RAID 5 with a separate /boot partition”) to be
expressed concisely.
The installation test is a distributed test, because the NixOS installa-
tion CD is not self-contained: during installation, it downloads sources
and binaries for packages selected by the user from the Internet, mostly
from the NixOS distribution server at http://nixos.org/. Thus, the test con-
figuration contains a web server that simulates nixos.org by serving the
required files.
A three-machine test of NFS file locking semantics in the Linux kernel,
e.g., whether NFS locks are properly maintained across server crashes.
(This test is slow because the NFS protocol requires a 90-second grace
period after a server restart.)
Test # VMs Duration (s) Memory (MiB)
empty 1 45.9 166
openssh 1 53.7 267
kde4 1 140.4 433
subversion 2 104.8 329
trac 4 159.4 756
proxy 4 65.4 477
quake3 3 80.6 528
transmission 4 89.5 457
installation 2 302.7 751
nfs 3 259.7 358
Table 7.1 Test resource consumption
For a continuous test to be effective, it must be timely: the interval between
the commit and the completion of the test must be reasonably short. Table 7.1
shows the execution time and memory consumption for the tests listed above,
averaged over five runs. As a baseline, the test empty starts a single machine
and shuts down immediately.
The execution time is the elapsed wall time on an idle 4-core Intel Core
i5 750 host system with 6 GiB of RAM running 64-bit NixOS. The memory
consumption is the peak additional memory use compared to the idle system.
(The host kernel caches were cleared before each test run by executing echo
3 > /proc/sys/vm/drop_caches.) All VMs were configured with 384 MiB of RAM,
though due to KVM’s para-virtualised “balloon” driver the VMs typically use
less host memory than that. The host kernel was configured to use KVM’s
same-page merging feature, which lets it replace identical copies of memory
pages with a single copy, significantly reducing host memory usage. (See
e.g. [Miłoś et al., 2009] for a description of this approach.)
Table 7.1 shows that the tests are fast enough to execute from a continuous
build system. Many optimisations are possible, however: for instance, VMs
with identical kernels and initial ramdisks could be started from a shared,
pre-computed snapshot.
7.7 R E L AT E D W O R K
Most work on deployment of distributed systems takes place in the context of
system administration research. Cfengine [Burgess, 1995] maintains systems
on the basis of declarative specifications of actions to be performed on each
(class of) machine. Stork [Cappos et al., 2007] is a package management sys-
tem used to deploy virtual machines in the PlanetLab testbed. These and most
other deployment tools have convergent models [Traugott and Brown, 2002],
meaning that due to statefulness, the actual configuration of a system after
an upgrade may not match the intended configuration. By contrast, NixOS’s
purely functional model ensures congruent behaviour: apart from mutable
state, the system configuration always matches the specification.
Virtualisation does not necessarily make deployment easier; apart from
simplifying hardware management, it may make it harder, since without
proper deployment tools, it simply leads to more machines to be managed
[Reimer et al., 2008].
MLN [Begnum, 2006], a tool for managing large networks of VMs, has
a declarative language to specify arbitrary network topologies. It does not
manage the contents of VMs beyond a templating mechanism.
Our VM testing approach currently is only appropriate for relatively small
virtual networks. This is usually sufficient for regression testing of typical
bugs, since they can generally be reproduced in a small configuration. It
is not appropriate for scalability testing or network experiments involving
thousands of nodes, since all VMs are executed in the same derivation and
therefore on the same host. However, depending on the level of virtualisation
required for a test, it is possible to use virtualisation techniques that scale to
hundreds of nodes on a single machine [Hibler et al., 2008].
There is a growing body of research on testing of distributed systems;
see [Rutherford et al., 2008, Section 5.4] for an overview. However, deploy-
ment and management of test environments appear to be a somewhat ne-
glected issue. An exception is Weevil [Wang et al., 2005], a tool for the de-
ployment and execution of experiments in testbeds such as PlanetLab. We are
not aware of tools to support the synthesis of VMs in automatic regression
tests as part of the build processes of software packages.
During unit testing, environmental dependencies such as databases are
often simulated using test stubs or mock objects [Freeman et al., 2004]. These
are sometimes used due to the difficulty of having a “real” implementation
of the simulated functionality. Generally, however, stubs and mocks allow
more fine-grained control over interactions than would be feasible with real
implementations, e.g., when testing against I/O errors that are hard to trigger
under real conditions.
7.8 C O N C L U S I O N
In this chapter, we have shown a method for synthesizing virtual machines
from declarative specifications to perform integration or system tests. This
allows such tests to be easily automated, an essential property for regression
testing. It enables developers to write integration tests for their software that
would otherwise require a great deal of manual configuration, and would
likely not be done at all.
Part IV
General Concepts
8
Atomic Upgrading of Distributed Systems
A B S T R A C T
In this thesis, we have implemented two tools to perform reliable, repro-
ducible and efficient distributed deployment and upgrade tasks. Disnix is a
distributed deployment tool working on service level and DisnixOS provides
complementary deployment support on infrastructure level. A key aspect of
making their upgrades reliable is making the upgrade process as atomic
as possible. We either want to have the current deployment state or the new
deployment state, but never an inconsistent mix of the two. Furthermore, it is
also desirable that end-users are not able to notice that the system is temporar-
ily inconsistent while an upgrade is performed. In this chapter, we describe
how the Nix package manager achieves atomic upgrades and rollbacks on
local machines and we explore how we can expand these concepts to dis-
tributed systems. We have implemented a trivial case study to demonstrate
the concepts.
8.1 I N T R O D U C T I O N
A distributed system can be described as a collection of independent com-
puters that appear to a user as one logical system [Nadiminti et al., 2006].
Processes in a distributed system work together to reach a common goal by
exchanging data with each other.
A major problem of upgrading distributed systems is integrity: if we want
to change the deployment state of a distributed system to a new state, for
instance by adding, removing or replacing components on computers or mi-
grating components from one computer to another, we do not want to end up
with a system in an inconsistent state, i.e., a mix of the old and new configura-
tions. Thus, if we upgrade a distributed system, there should never be a point
in time where a new version of a service on one computer can talk to an old
version of a service on another computer.
To guarantee the correctness of the deployment we need some notion of a
transactional, atomic commit similar to database systems. That is, the transi-
tion from the current deployment state to the new deployment state should be
performed completely or not at all, and it should not be observable half-way
through.
In this thesis, we have described two tools that provide a solution for this
problem, namely Disnix covered in Chapter 4, which is designed for deploy-
ment on service level and DisnixOS (using charon) covered in Chapter 6, de-
signed for deployment on infrastructure level. In this chapter, we explain how
Nix performs local upgrades and we explore how to expand this feature to
support atomic upgrades of distributed systems.
8.2 ATO M I C U P G R A D I N G I N N I X
As we have explained earlier in Section 2.2, the Nix package manager is ca-
pable of performing atomic upgrades. Atomic upgrades are possible, because
of a number of important Nix characteristics.
8.2.1 Isolation
The first important aspect is that Nix isolates component variants using unique
file names, such as /nix/store/pz3g9yq2x2ql...-firefox-12.0. As we have seen, the
string pz3g9yq2x2ql... is a 160-bit cryptographic hash of the inputs to the build
process of the package: its source code, the build script, dependencies such
as the C compiler, etc. This scheme ensures that different versions or variant
builds of a package do not interfere with each other in the file system. For
instance, a change to the source code (e.g., due to a new version) or to a de-
pendency (e.g., using another version of the C compiler) will cause a change
to the hash, and so the resulting package will end up under a different path
name in the Nix store.
Furthermore, components never change after they have been successfully built.
There is no situation where the deployment of a Nix package overwrites or
removes files belonging to another component. When building the same com-
ponent with different parameters, we always get a new package with a new
unique hash code.
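For example, the following sketch (using a hypothetical package called example) defines two builds that differ only in a configure flag; their input hashes, and therefore their store paths, differ, so both variants can coexist in the Nix store:

{ stdenv }:

rec {
  exampleDefault = stdenv.mkDerivation {
    name = "example-1.0";
    src = ./example-1.0.tar.gz;   # hypothetical source tarball
  };

  # Identical except for one build parameter, which yields a different hash
  # and hence a different store path.
  exampleWithFeatureX = stdenv.mkDerivation {
    name = "example-1.0";
    src = ./example-1.0.tar.gz;
    configureFlags = "--enable-feature-x";
  };
}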
If the build procedure of a component fails for some reason, it is
marked as invalid and removed when the Nix garbage collector is invoked.
8.2.2 Profiles
The second important aspect is the way Nix profiles are implemented. As
described earlier, Nix profiles are symlink trees, in which symlinks refer to
components in the Nix store; their primary purpose is to make it easier for
end users to access installed Nix packages.
For example, if an end-user requests to install Mozilla Firefox in his Nix
profile, e.g.:
$ nix-env -i firefox
Then the Nix package manager generates a new Nix profile, which is a
component residing in the Nix store containing symlinks to the contents of
all packages installed in the user’s Nix profile. The generated Nix profile has
the same contents as the previous generation, with symlinks to the contents
of the Mozilla Firefox package added.
After generating the Nix profile component, the symlink referring to the
active Nix profile generation is flipped so that it points to the newly generated
/home/sander/.nix-profile
/nix/var/nix/profiles
   default
   default-23
   default-24
/nix/store
   q2ad12h4alac...-user-env
      bin
         firefox
   z13318h6pl1z...-user-env
   pz3g9yq2x2ql...-firefox-12.0
      bin
         firefox
   6cgraa4mxbg1...-glibc-2.13
      lib
         libc.so.6
Figure 8.1 A Nix profile configuration scenario
Nix profile, making the new set of installed packages available to the end-
user. Because flipping symlinks is an atomic operation in UNIX, systems can
be upgraded atomically.
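Conceptually, the flip amounts to creating the new generation symlink under a temporary name and renaming it over the old one; rename() replaces the symlink atomically. The following shell sketch illustrates the idea (Nix performs these steps internally):

$ ln -s /nix/store/q2ad12h4alac...-user-env /nix/var/nix/profiles/default-24
$ ln -s default-24 /nix/var/nix/profiles/default.tmp
$ mv -T /nix/var/nix/profiles/default.tmp /nix/var/nix/profiles/default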
If any of these previous steps fails, then the installation procedure stops
immediately. Because Nix store components installed in an older Nix profile
can safely coexist next to components belonging to a newer Nix profile and
because files are never overwritten or automatically removed, a system is still
in its previous deployment state if the upgrade procedure suddenly stops.
Furthermore, because components belonging to the new configuration are
not part of the current active Nix profile, they are considered garbage and are
automatically removed when the Nix garbage collector is invoked.
Figure 8.1 shows an example of a hierarchy of symlinks, capturing a de-
ployment scenario after Mozilla Firefox has been installed in a user’s Nix
profile. /home/sander/.nix-profile is a symlink residing in my home directory. By
adding /home/sander/.nix-profile/bin to the PATH environment variable, programs
can be conveniently accessed by typing them on the command-line.
The symlink in my home directory points to /nix/var/nix/profiles/default repre-
senting the active version of the default profile. The default profile symlink
points to a generation symlink that is currently used. The generation symlink
refers to a Nix store component containing symlinks that refer to the contents
of the packages installed by the user, including Mozilla Firefox.
If the user wants to undo the operation, they can perform a rollback by run-
ning:
$ nix-env --rollback
The rollback operation is trivial. To undo the previous deployment op-
eration, only the /nix/var/nix/profiles/default symlink has to be flipped, so that it
points to default-23 instead of default-24. This flip is again an atomic operation.
Apart from ordinary Nix user profiles, the same concepts are also applied
in NixOS. As explained earlier in Section 2.3, the NixOS system profile, con-
sisting of all static parts of a Linux distribution and activation scripts which
take care of the dynamic parts of a system, is managed the same way as
Nix user profiles and can also be atomically upgraded on a single machine by
flipping symlinks.
8.2.3 Activation
In NixOS, merely flipping symlinks is not enough to fully upgrade a system.
After the static parts of a system have been successfully upgraded, the
impure parts of the system must also be updated, such as the contents of the /var
directory and several other dynamic parts of the filesystem. Furthermore,
obsolete system services must be stopped and new system services must be
started. This process is imperative and cannot be done atomically.
If the activation process of a new NixOS configuration fails, we can do a
rollback of the NixOS profile and then activate the previous NixOS configu-
ration. This brings the system back to its previous deployment state, which
should still work.
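On a single NixOS machine, both the profile flip and the activation are typically driven by one command:

$ nixos-rebuild switch

which builds the new system configuration, flips the system profile symlink and then runs the activation script of the new configuration.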
8.3 D I S T R I B U T E D AT O M I C U P G R A D I N G
Nix supports local atomic upgrades for static parts of a system. As we have
explained, atomic upgrading is based on the fact that components can be
isolated and do not interfere with each other and the fact that symlinks can
be atomically flipped in UNIX. In this section, we explore how we can expand
these concepts to distributed systems.
8.3.1 Two phase commit protocol
The two phase commit protocol is a distributed algorithm that coordinates all
the processes that participate in a distributed atomic transaction [Skeen and
Stonebraker, 1987]. When processing transactions in database systems, typi-
cally a number of tasks are performed until the commit point is reached, then
either a commit operation is executed, which makes the changes permanent or
an abort, performing a rollback so that none of its effects persists.
The two phase commit protocol implements distributed transactions by
dividing the two phases of a transaction over nodes in the network and by
turning it into an agreement problem. The transaction is initiated by a node
called the coordinator. The nodes participating in the transaction are called the
cohorts. The two phase commit protocol is implemented roughly as follows:
First, the commit-request phase is executed:
The coordinator sends a query to commit message to all cohorts and waits
until it has received a reply from all cohorts. Typically, cohorts ensure
that no other processes can interfere with the transaction.
Every node executes the necessary steps to reach the commit point. A
cohort replies with a yes message if all its steps have been successfully
executed, otherwise it replies with a no message.
Second, the commit phase is executed:
If all cohorts have responded with a yes message, the coordinator sends
a commit message to all cohorts. If any cohort has responded with a no
message then all the cohorts receive an abort instruction.
Cohorts receiving the commit message perform the commit operation and
cohorts receiving the abort message perform the abort operation.
After the previous operations have been performed, an acknowledgement
message is sent to the coordinator.
The protocol achieves its goal in many cases, even when a temporary sys-
tem failure occurs involving a process or communication link and is thus
widely utilised. The protocol is not completely resilient to all failures, and
in rare cases user intervention is needed to remedy an outcome. The major
disadvantage is that the protocol is a blocking protocol. If the coordinator fails
permanently, then the cohorts participating in a transaction may be blocked
forever. There are a number of improved algorithms, such as the three-phase
commit [Skeen and Stonebraker, 1987] that remedies the blocking issue with
time-outs. Furthermore, as opposed to local atomicity, distributed atomicity is
not a single operation, but a collection of operations on all nodes participating
in a transaction.
8.3.2 Mapping Nix deployment operations
As we have seen, the two phase commit protocol can be used to implement
distributed transactions by splitting the phases of local transactions and turning
them into an agreement problem. An interesting question is whether we can
combine this protocol with Nix primitives to support distributed atomic upgrades.
By looking at how the actual deployment processes of Disnix and Charon
are implemented, this mapping is quite trivial. The commit-request phase in
Disnix and Charon (which we call distribution phase in our tools) basically
consists of the following steps:
Building components
Transferring component closures
As we have seen, components built by the Nix package manager are iso-
lated and never interfere with other components. If all components have been
successfully built, all the component closures are transferred to the target
machines in the network. This process is also safe as all dependencies are
guaranteed to be included and never interfere with components belonging to
the currently active configuration.
If any of these steps fails, nothing needs to be done to undo the changes, as
the components of the old configuration still exist and the new configuration
has not been activated yet. Components belonging to the new configura-
tion are considered garbage by the Nix garbage collector. The commit phase
(which we call transition phase in Disnix and Charon) consists of the following
steps:
Flipping Nix profiles
Deactivating obsolete components and activating new components
The first step flips the symlinks of the Nix profiles making the new con-
figuration the active configuration, which is an atomic operation, as we have
explained earlier. After the static configurations have become active, the Dis-
nix activation scripts or NixOS activation scripts are executed, taking care of
activating the dynamic parts of a system, which is an impure operation. Dur-
ing this phase, an end-user may observe that a system is changing, e.g. certain
parts are temporarily inaccessible.
If any of these steps fails, the coordinator has to initiate a rollback by
flipping the symlinks back to the previous Nix profile generations. Furthermore,
if new services have been activated, then they must be deactivated and the
services of the previous configuration must be reactivated.
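For reference, disnix-env composes these phases from lower-level Disnix tools. The listing below only names the tools (arguments omitted) as a rough sketch of the mapping:

$ disnix-manifest ...     (distribution phase: build the services and generate a manifest)
$ disnix-distribute ...   (distribution phase: transfer the component closures to the targets)
$ disnix-activate ...     (transition phase: deactivate obsolete and activate new services)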
A way to prevent end-users from noticing the temporary inconsistency
window is to temporarily block or queue end-user access during the
activation phase. This requires cooperation of the system that is being up-
graded, as certain components must be notified. Moreover, a proxy is needed
that is capable of blocking or queuing messages for a specific connection type.
8.4 E X P E R I E N C E
To evaluate the usefulness of the distributed upgrade algorithm using Nix
primitives and to achieve full atomic distributed upgrades, we have extended
Disnix and we have implemented a trivial example case demonstrating its use.
8.4.1 Supporting notifications
As we have explained, in order to fully support distributed atomic upgrades,
we must temporarily block or queue connections in the activation phase. In
order to achieve this goal, the service that serves as an entry point to end-
users must be notified when the activation phase starts and when it ends. In
order to notify services, we have defined two additional operations
in the Disnix activation scripts system, next to the activation and deactivation
operations, described in Chapter 4.
The lock operation is called when the activation phase starts. An activation
script can notify a service, so that it can prepare itself to queue connections
or optionally reject the lock, if it is not safe to upgrade. The unlock operation
is called when the activation phase ends, so that a service can be notified to
stop queuing connections.
Figure 8.2 The Hello World example and the example augmented with a TCP proxy
8.4.2 The Hello World example
As an example case, we have implemented a very trivial “Hello world” client-
server application, shown on the left in Figure 8.2. This example case is briefly
mentioned in Chapter 4.
The system is composed of two services: the hello-world-server is a server
application listening on TCP port 5000. The hello-world-client is a client program
that connects to the server through the given port and waits for user input, which
is sent to the server. If the user sends the “hello” message, the server responds
by sending a “Hello world!” response message, which the client displays to
the end user.
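For illustration, a sketch of the corresponding Disnix services model is shown below. The customPkgs reference, the type values and the exact function header are illustrative, following the model conventions of Chapter 4:

{distribution, system}:

let
  customPkgs = import ./pkgs { inherit system; };   # hypothetical package set
in
rec {
  hello-world-server = {
    name = "hello-world-server";
    pkg = customPkgs.hello-world-server;
    type = "process";
  };

  hello-world-client = {
    name = "hello-world-client";
    pkg = customPkgs.hello-world-client;
    dependsOn = { inherit hello-world-server; };
    type = "process";
  };
}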
An upgrade can be triggered, for example, by implementing a change in
the hello-world-server source code and by running the following command-line
instruction:
$ disnix-env -s services.nix -i infrastructure.nix \
-d distribution.nix
Although the static parts of the system are safely stored next to existing
parts and can be atomically upgraded by flipping symlinks, upgrading has
a few nasty side effects. If a client is connected to the server while upgrad-
ing, then all the clients are disconnected (which also terminates the clients),
because the server is unreachable. Moreover, other clients may want to establish a
connection during the upgrade phase, which is impossible because the server
may not have been restarted yet.
8.4.3 Adapting the Hello World example
To counter these side effects and to make an upgrade process fully atomic,
several aspects of our example case must be adapted. Our adapted example,
shown on the right in Figure 8.2 contains a TCP proxy service: disnix-tcp-proxy
between the server and client. The purpose of the TCP proxy is to serve as
a monitor that keeps track of how many clients are connected and to temporarily
queue TCP connections during the activation phase.
#!/bin/bash -e
export PATH=/nix/store/q23p9z1...-disnix-tcp-proxy/bin:$PATH
case "$1" in
activate) 130
nohup disnix-tcp-proxy 6000 test1 5000 /tmp/disnix-tcp-proxy-5000.lock > \
/var/log/$(basename /nix/store/q23p9z1...-disnix-tcp-proxy).log & pid=$!
echo $pid > /var/run/$(basename /nix/store/q23p9z1...-disnix-tcp-proxy).pid
;;
deactivate) 131
kill $(cat /var/run/$(basename /nix/store/q23p9z1...-disnix-tcp-proxy).pid)
rm -f /var/run/$(basename /nix/store/q23p9z1...-disnix-tcp-proxy).pid
;;
lock) 132
if [ -f /tmp/disnix-tcp-proxy-5000.lock ] 133
then
exit 1
else
touch /tmp/disnix-tcp-proxy-5000.lock
if [ -f /var/run/$(basename /nix/store/q23p9z1...-disnix-tcp-proxy).pid ]
then
while [ "$(disnix-tcp-proxy-client)" != "0" ] 134
do
sleep 1
done
fi
fi
;;
unlock) 135
rm -f /tmp/disnix-tcp-proxy-5000.lock
;;
esac
Figure 8.3 The Disnix activation script for the TCP proxy service
Apart from integrating a proxy into the architecture of the system that must
be deployed, we must also implement a Disnix activation script that notifies
the proxy when the activation phase starts and ends.
Figure 8.3 shows how the Disnix activation script for the TCP proxy mod-
ule is implemented. This is a shell script included with the Disnix TCP proxy
service package and invoked through the Disnix wrapper activation script,
which delegates the activation script operations to the bin/wrapper process in-
cluded in the service package:
130 The activate section defines the activation step of the TCP proxy. Here,
the disnix-tcp-proxy service is started, which listens on TCP port 6000 and
forwards the incoming connections to the hello-world-server hosted on machine
test1 listening on TCP port 5000. These values are generated by the
Disnix expression that builds the TCP proxy service. The PID of the
TCP proxy is stored in a separate PID file, which is used to eventually
stop the proxy service.
131 The deactivation section defines the deactivation step of the TCP proxy,
killing the PID of the TCP proxy process.
132 The lock section is invoked before the Disnix transition phase starts.
133 The first step in the lock procedure is to check whether the lock has
been granted to somebody else. If it turns out that a lock has been
granted already, the procedure returns exit status 1, indicating an error.
Otherwise the lock is granted to the requester, by creating a lock file that
instructs the TCP proxy to queue incoming TCP connections.
134 After locking, a loop is executed, which polls the Disnix TCP proxy
through a Unix domain socket connection to check the status. The disnix-
tcp-proxy-client tool returns the number of established connections from the
client to the server. The lock procedure waits until all client connec-
tions are closed (i.e., disnix-tcp-proxy-client returns 0) and then
grants permission. Meanwhile, because the lock file has been created,
the Disnix TCP proxy queues incoming TCP connections.
135 The unlock section defines the unlock procedure. Here, the lock file is
removed so that the disnix-tcp-proxy accepts and forwards incoming con-
nections again.
By running the following command-line instruction with the augmented
architecture, we can achieve fully atomic distributed upgrades:
$ disnix-env -s services-with-proxy.nix -i infrastructure.nix \
-d distribution-with-proxy.nix
When running the procedure, all phases are executed exactly as in the
variant without the proxy until the activation phase has been reached.
If there are any clients connected to the server, then Disnix waits until all
clients have been disconnected. Furthermore, new connections are automati-
cally queued.
If all clients have been disconnected, Disnix executes the activation phase
in which new incoming client connections are queued. When the activation phase
ends, the proxy gets notified and forwards the incoming connections to
the new server component or to the old server component if the upgrade has
failed.
8.4.4 Other examples
So far we have described a very trivial example case to achieve fully atomic
distributed upgrades. We have also experimented with applying the same
techniques to several use cases described in Chapter 4, such as the StaffTracker
system, which uses a web application front-end instead of a command-line
utility for end-user access.
For the latter example case, merely using a proxy is not enough to achieve
full atomic upgrades. After upgrading a web application, users may still run
old versions of a web application in their browsers, which are not automati-
cally refreshed and may lead to several problems. Furthermore, web applica-
tions typically have state attached to a user session, which may get lost after
an upgrade. State may have to be upgraded as well so that it works properly
with the new version of the web application. In order to properly upgrade
web applications, developers have to adapt web applications in such a way
that they can be notified of upgrade events so that they refresh themselves
automatically and that they can migrate their state properly.
Another difficulty is supporting distributed systems in which processes
communicate through stateless protocols, such as UDP. For these applications,
it is non-trivial to determine the number of active connections and to decide
whether it is safe to enter the activation phase.
8.5 D I S C U S S I O N
What we have seen in this chapter is that creating a distributed upgrade pro-
tocol using Nix primitives is a trivial process. The non-trivial part is the last
step in which the transition from the old configuration to the new config-
uration is made final, by deactivating obsolete services and by activating new
services. This phase is imperative and may temporarily bring the system into
an inconsistent state observable by end-users.
Fortunately, the window in which the system is inconsistent is reduced to a
minimum: most steps, such as installing and removing services, can be done
without interference, as Disnix allows components to be safely deployed next
to each other. To completely hide the side-effect, we have developed a simple
TCP proxy and prototype example case that temporarily queues incoming
connections.
To achieve full atomicity, we require cooperation of the system that is being
upgraded, which typically requires developers to implement certain features
and changes, which are not always trivial. Proxies need to be added that are
capable of determining when it is safe to upgrade and that can temporarily queue
connections. Furthermore, other components may have to be notified and
may also need to upgrade their state.
The upgrade protocol also suffers the same weaknesses as the original two-
phase commit protocol. For example, if the coordinator node crashes during
the execution, then the cohorts may be locked forever, until a system admin-
istrator manually intervenes. Fortunately, in our deployment variant, we
only have to lock cohorts in the commit phase as the previous steps can be
safely executed without affecting an existing configuration. Furthermore, as
with the original two phase commit protocol, the distributed commit step is a
series of commit actions as opposed to a single operation, which brings a sys-
tem in an inconsistent state if the process gets interrupted and is not properly
recovered.
Moreover, the time window in which the system is inconsistent can be reduced
further by implementing certain upgrading techniques in the system that
is deployed, such as quiescence [Kramer and Magee, 1990], tranquility [Van-
dewoude et al., 2007] or version-consistency [Ma et al., 2011].
In [Chen et al., 2007], a dynamic update system called POLUS is described
that runs multiple versions of a process simultaneously and redirects function
calls from the old version to the new version using the debugging API. More-
over, it can also recover tainted state by using callback functions. The system
is not used in a distributed environment however. In our approach, we only
run one instance of a particular component and the proxy queues incoming
connections, while the transition process is in progress. However, it may be
possible to make the proxy more sophisticated to support redirection from
old versions to new versions of two instances running simultaneously.
In this chapter, we have only explored how to achieve full atomic upgrades
with Disnix on service level. It is also possible to implement similar tech-
niques to support fully atomic upgrades on infrastructure level, but so far
this has not been done yet.
8.6 C O N C L U S I O N
In this chapter, we have described Nix’s atomic upgrade primitives and how to
map them onto the two-phase commit protocol to achieve atomic dis-
tributed deployment upgrades. As a case study, we have developed a trivial
client-server example case communicating through TCP sockets and we have
extended Disnix.
Although we have achieved our goal with our trivial TCP example, achiev-
ing fully atomic distributed upgrades in general is difficult. The protocol
suffers the same weakness as the two-phase commit protocol and the ap-
proach requires cooperation of the system being upgraded to make upgrades
fully atomic (e.g. implementing techniques such as a proxy), although the
time window in which inconsistency occurs has been reduced to a minimum,
since we can install packages safely next to each other without affecting an
existing configuration. To achieve full atomicity, it is required that developers
make several modifications to the system that is being upgraded, which is not
always trivial.
9
Deploying Mutable Components
A B S T R A C T
In the previous chapters, we have shown Nix, a purely functional package
manager and several extensions, such as Disnix, capable of deploying service-
oriented systems in a network of machines. As we have seen, Nix and its
applications only support deployment of immutable components, which never
change after they have been built. Components are made immutable because
of the underlying purely functional model in order to ensure purity and repro-
ducibility. However, not all components of software systems can be isolated in
a store of immutable objects, such as databases. Currently, these components
must be deployed by other means, i.e. in an ad-hoc manner, which makes
deployment and upgrades of such systems difficult, especially in large net-
works. In this chapter, we analyse the properties of mutable components and
we propose Dysnomia, a deployment extension for mutable components.
9.1 I N T R O D U C T I O N
Throughout this thesis we have seen that for many reasons software deploy-
ment processes have become very complicated and that this complexity is
proportional to the number of machines in a network. Moreover, upgrading
systems is even more challenging and does not always yield the same results as
a clean installation.
In this thesis, we have described several tools providing automated, reliable
and reproducible deployment automation for several deployment aspects. We
have also seen in this thesis, that the fundamental basis of these tools is the
Nix package manager, a package manager borrowing concepts from purely
functional programming languages, such as Haskell. Nix provides a number
of distinct features compared to conventional deployment tools, such as a
domain-specific language to make builds reproducible by removing many
side effects, a Nix store which safely stores multiple versions and variants of
components next to each other, atomic upgrades and rollbacks, and a garbage
collector which safely removes components which are no longer in use.
In the purely functional deployment model of Nix, all components are
stored as immutable file system objects in the Nix store, which never change
after they have been built. Making components immutable ensures purity
and reproducibility of builds. However, not all components of a system can
be managed in such a model, because they may change at run-time, such
as databases. Currently, Nix-related tools manage mutable state in an ad-hoc
manner and outside the Nix store, e.g. by creating scripts which initialise
a database at first startup. As a consequence, system administrators or de-
rec {
stdenv = ...
fetchurl = ...
gtk = stdenv.mkDerivation {
name = "gtk-2.24.1";
src = ...
};
firefox = stdenv.mkDerivation {
name = "firefox-9.0.1";
src = fetchurl {
url = http://../firefox-9.0.1-source.tar.bz2;
md5 = "195f6406b980dce9d60449b0074cdf92";
};
buildInputs = [ gtk ... ];
buildCommand = ''
tar xfvj $src
cd firefox-*
./configure --prefix=$out
make
make install
'';
};
...
}
Figure 9.1 A partial Nix expression
velopers who deploy systems using Nix-related tools still need to manually
deploy and upgrade mutable components, which can be very tedious in large
networks of machines.
In this chapter, we explore the properties of mutable components and we
propose Dysnomia, a generic deployment extension using concepts close to
Nix, to manage the deployment and upgrades of mutable components. This
tool is designed to be used in conjunction with Disnix and other Nix-related
tools.
9.2 T H E N I X D E P L O Y M E N T S Y S T E M
As we have explained earlier in Chapter 2, Nix is designed to automate the
process of installing, upgrading, configuring and removing software packages
on a system. Nix offers a number of distinct features to make deployment
more reliable.
We have seen that one of its key traits is that Nix stores packages in isolation
in a Nix store. Every component in the Nix store has a special file name, such
as /nix/store/324pq1...-firefox-9.0.1, in which the first part, 324pq1..., represents
a cryptographic hash code derived from all build-time dependencies. If, for
example, Firefox is built with a different version of GCC or linked against
a different version of the GTK+ library, the hash code will differ and thus it
is safe to store multiple variants next to each other, because components do
not share the same name. After components have been built, they are made
immutable by removing the write permission bits.
Nix packages are derived from declarative specifications defined in the
purely functional Nix expression language. An example of a partial Nix ex-
pression is shown in Figure 9.1, containing an attribute set in which each
attribute refers to a function building the component from source code and
its dependencies. For instance, the firefox attribute value refers to a function
call which builds Mozilla Firefox from source code and uses GTK+ as a build
time dependency, which is automatically appended to the PATH environment
variable of the builder. The output of the build is produced in a unique direc-
tory in the Nix store, which may be: /nix/store/324pq1...-firefox-9.0.1.
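For illustration, such an expression can be built with the nix-build tool; the file name all-packages.nix is a placeholder for wherever the expression of Figure 9.1 is stored:

    $ nix-build all-packages.nix -A firefox
    /nix/store/324pq1...-firefox-9.0.1

The -A parameter selects the firefox attribute; the resulting store path is printed on standard output and, as explained above, contains the hash over all build-time dependencies.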
While performing a build of a Nix package, Nix tries to remove many
side effects of the build process. For example, all environment variables are
cleared or changed to non-existing values, such as the PATH environment vari-
able. We have also patched various essential build tools such as gcc in such a
way that they ignore impure global directories, such as /usr/include and /usr/lib.
Optionally, builds can be performed in a chroot environment for even better
guarantees of purity.
Because Nix removes as many side effects as possible, dependencies can
only be found by explicitly setting search directories, such as PATH, to Nix
store paths. Undeclared dependencies, which may reside in global directories
such as /usr/lib, cannot accidentally let a build succeed.
The purely functional deployment model of Nix has a number of advan-
tages compared to conventional deployment tools. Because most side-effects
are removed, a build function always gives the same result regardless of the
machine on which the build is performed, giving a very high degree of reproducibility.
Furthermore, we can efficiently upgrade systems, because components with
the same path which are already present do not have to be rebuilt or redownloaded.
Upgrades of systems are also always safe, because newer versions and
variants of components can safely coexist with components belonging to
an older configuration. Finally, there is never a situation in which a package
has files belonging to an old and new version at the same time.
9.3 D I S N I X
As explained in Chapter 4, Disnix extends Nix by providing additional models
and tools to manage the deployment of service-oriented systems in a network
of machines. Disnix uses a services model to define the distributable compo-
nents available, how to build them, their dependencies and their types used
for activation. An infrastructure model is used to describe what machines are
available and their relevant properties. The distribution model maps services
defined in the services model to machines defined in the infrastructure model.
Figure 9.2 shows an example of a service model, describing two distributable
components (or services). testDatabase is a MySQL database storing records.
testWebApplication is a Java web application which depends on the MySQL
database.
Each service defines a number of attributes. The name attribute refers to its
canonical name, dependsOn defines the dependencies of the service on other
{pkgs, system, distribution}:

rec {
  testDatabase = {
    name = "TestDatabase";
    type = "mysql-database";
    pkg = pkgs.testDatabase;
  };

  testWebApplication = {
    name = "TestWebApplication";
    dependsOn = {
      inherit testDatabase;
    };
    pkg = pkgs.testWebApplication;
    type = "tomcat-webapplication";
  };
}
Figure 9.2 A partial Disnix services model
services (which may be physically located on another machine) and pkg de-
fines how the service must be built (not shown here). The type attribute is
used for the activation of the components by invoking a script attached to the
given type. For example, the tomcat-webapplication type indicates that the given
component should be deployed in an Apache Tomcat container. The mysql-
database type is used to deploy an initial MySQL database from a given SQL
file containing table definitions. Disnix supports a number of predefined acti-
vation types as well as an extension interface, which can be used to implement
a custom activation type.
By running the disnix-env command with the three models as command-
line parameters, the complete deployment process of a service-oriented system
is performed, which consists of two phases. In the distribution phase, com-
ponents are built from source code (including all their dependencies) and
transferred to the target machines in the network. Because Nix components
can be safely stored next to existing components of older configurations, we
can safely perform this procedure without affecting an existing system con-
figuration. After this phase has succeeded, Disnix enters the transition phase,
in which obsolete components of the old configuration are deactivated and
new components are activated in the right order. We have seen in the previ-
ous chapter that end-user connections can be blocked/queued in this phase,
so that end-users will not notice that the system is temporarily inconsistent, a
requirement to achieve full atomicity.
In case of a change in a model, an upgrade is performed in which only
necessary parts of the system are rebuilt and reactivated. Disnix determines
the differences between configurations by looking at the hash code parts of
Nix store components. If they are the same in both configurations, they do
not have to be deployed again.
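As an illustration, a deployment with these three models could be started as follows; the model file names are placeholders and the -s, -i and -d parameters are assumed to pass the services, infrastructure and distribution models, respectively:

    $ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Running the same command again after changing one of the models performs the upgrade described above, rebuilding and reactivating only the affected services.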
9.4 M U TA B L E C O M P O N E N T S
The components managed by Nix and Disnix are immutable. Mutable com-
ponents, such as databases, cannot be managed in such a model, apart from
the initial deployment step from a static schema. If a component in the Nix
store were to change, we could no longer guarantee that a deterministic result
is achieved when it is used during a build of another component, because
hash codes may no longer uniquely represent a build result. Apart from the
fact that mutable components change over time, they have a number of other
distinct characteristics compared to immutable components managed in the
Nix store.
First, many mutable component types are hosted inside a container [Dearle,
2007], such as a database management system or an application server. In
order to deploy mutable components, file system operations alone are typically
not enough; the state of the container must also be adapted. For example,
to deploy a database component, the DBMS must be instructed to create a
new database with the right authorisation credentials; to deploy a
Java web application on a Servlet container, the Servlet container must be
notified.
Second, state is often stored on the file system in a format which is close
to the underlying implementation of the container and the given platform.
These files may not be portable and they may have to reside in an unisolated
directory reserved for the container. For example, it may be impossible to
move database state files from a 32-bit instance of a container to a 64-bit in-
stance of a container, or to transfer state data from a little endian machine to
a big endian machine. Furthermore, state files may also contain locks or may
be temporarily inconsistent, because of unfinished operations.
Because of these limitations, many containers have utilities to dump their
state data in a portable format (e.g. mysqldump) and in a consistent manner
(e.g. single transaction) and can reload these dumps back into memory, if
desired. Because such operations may be expensive, many containers can
also create incremental dumps. In this chapter, we refer to the actual files
used by a container as physical state. A dump in a portable format is a logical
representation of a physical state.
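For example, for a MySQL database a consistent, portable logical dump can be produced and restored roughly as follows; the database name is a placeholder and the exact options may differ per setup:

    $ mysqldump --single-transaction testdb > testdb.sql    # logical snapshot in portable SQL form
    $ mysql testdb < testdb.sql                             # restore the logical state into a database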
9.5 D Y S N O M I A
Nix and Disnix provide automated, reliable deployment of static parts of soft-
ware systems and networks of software systems. In this section, we explore
how we can manage mutable software components in conjunction with Nix-
related tools, in an extension which we call Dysnomia.
9.5.1 Managing mutable components
Deployment operations of Nix components are typically achieved by perform-
ing operations on the filesystem. As mentioned in Section 9.4, deployment
of mutable components typically cannot be done by filesystem operations alone.
Therefore, we must interact with tools capable of activating and deactivating
mutable components, as well as with tools capable of generating snapshots
that provide a logical representation of the state. Because these tools differ
per component type, we need to provide an abstraction layer which can be
used to uniformly access all mutable components.
To deploy mutable components, Dysnomia defines an interface with the
following operations:
activate. Activates the given mutable component in the container. Op-
tionally, a user can provide a dump of its initial state.
deactivate. Deactivates the given mutable component in the container.
snapshot. Snapshots the current physical state into a dump in a portable
format.
incremental-snapshot. Creates an incremental snapshot in a portable for-
mat.
restore. Brings the mutable component in a specified state by restoring a
dump.
The actual semantics of the operations depend on the mutable component
type. For example, the activate operation for a MySQL database implies creat-
ing a database and importing an initial schema, whereas the activate operation
for a Subversion repository means creating a Subversion repository. The snap-
shot operation of a MySQL database invokes mysqldump, which stores the current
state of the database in an SQL file, and the snapshot operation for a Subversion
repository invokes svnadmin dump to create a Subversion dump.
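To give an impression of what such an abstraction could look like, the following is a minimal sketch of a module for the mysql-database type. The calling convention (the operation as the first argument, the mutable component as the second) and the way the database name is derived are assumptions for illustration and do not necessarily match Dysnomia's actual interface:

    #!/bin/sh -e
    # Sketch of a Dysnomia-style module for the mysql-database type.
    operation="$1"    # activate | deactivate | snapshot | restore
    component="$2"    # e.g. an SQL file containing the initial schema

    database="$(basename "$component" .sql)"

    case "$operation" in
        activate)
            mysqladmin create "$database"
            mysql "$database" < "$component"      # import the initial schema
            ;;
        deactivate)
            mysqladmin -f drop "$database"
            ;;
        snapshot)
            mysqldump --single-transaction "$database" > "$database-dump.sql"
            ;;
        restore)
            mysql "$database" < "$database-dump.sql"
            ;;
    esac

A module for another type, such as subversion-repository, would implement the same operations in terms of svnadmin create, svnadmin dump and svnadmin load.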
9.5.2 Identifying mutable components
Nix and Nix applications use unique names of components to make deploy-
ment reproducible and to determine how to efficiently upgrade systems. For
example, if a particular component with a hash code already exists, we do not
have to build it again because its properties are the same.
In order to reproduce state in conjunction with static parts of the system,
we need to snapshot, transfer and restore the logical representation of state of
mutable components and we must be capable of distinguishing between these
snapshots based on their names so that we can decide whether we have to
deploy a new generation or whether it can remain untouched. Unfortunately,
we cannot use a hash of build-time dependencies to uniquely address mutable
component generations.
Distribution phase:
  Building components
  Transferring component closures
Transition phase:
  Flipping Nix profiles
  Request lock on services
  Deactivating obsolete components and activating new components
  Request unlock on services
Figure 9.3 Original Disnix deployment procedure

To uniquely identify the logical state of mutable components, several properties
are important. First, we need to know the name of the mutable component,
which refers to the database name or Subversion repository name. Second, we need
to know in which container it is hosted, which we can identify by a container
name and an optional identifier (if there are more instances of the same
container type). Third, we need an attribute to make a distinction between
generations, which must be derived from the container that is being used. For
example, a generation number can be a MySQL binary log sequence number
or a Subversion revision number. A time stamp could also be used if a
container does not have facilities to make a distinction between generations,
although this may have practical implications, such as synchronisation issues.
For example, the logical state of a particular Subversion revision can be identified
by: /dysnomia/svn/default/disnix/34124.
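As a sketch, such an identifier could be derived for a Subversion repository as follows; the repository location and the naming scheme are assumptions for illustration:

    rev=$(svnlook youngest /repos/disnix)        # current revision of the repository
    echo "/dysnomia/svn/default/disnix/$rev"     # e.g. /dysnomia/svn/default/disnix/34124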
9.5.3 Extending Disnix
We have replaced the Disnix activation module system with Dysnomia and extended
the deployment procedure of disnix-env, described in Section 8.3.2, to
include deployment steps for mutable components. The original Disnix de-
ployment procedure is described in Figure 9.3.
To provide support for the deployment of mutable components, we have
to implement several changes to the Disnix deployment procedure. In the
transition phase, in which access from end users can be temporarily blocked,
we first determine the names of the logical state snapshots of mutable compo-
nents. Then we create dumps of all mutable components which have not been
snapshotted yet (which can be determined by checking whether a dump with
the same filename already exists) and we transfer these dumps to the coordinator
machine. After deactivating obsolete services in the old configuration
and activating new services in the new configuration, the state for the muta-
ble components is transferred from the coordinator to their new locations in
the network and activated. Finally, end user access is restored. The updated
deployment procedure is described in Figure 9.4.
Distribution phase:
  Building components
  Transferring component closures
Transition phase:
  Flipping Nix profiles
  Request lock on services
  Snapshot and transfer mutable component states
  Deactivating obsolete components and activating new components
  Import mutable component snapshots
  Request unlock on services
Figure 9.4 Disnix deployment procedure with state transfer
Because creating snapshots of mutable components may be expensive (es-
pecially for large databases) and because access for end users may be blocked,
this procedure may give a large time window in which the system is inaccessible,
which is not always desirable. Therefore, we have also implemented an
optimised version of this procedure using incremental snapshots (for mutable
components capable of doing this). In this improved version, we create dumps
of mutable components before the transition phase, allowing end users to still
access the system. After the dumps have been transferred to their new loca-
tions, we enter the transition phase (in which end-user access is blocked) and
we create and transfer incremental dumps, taking the changes into account
which have been made while performing the upgrade process. In this proce-
dure we still have downtime, but this time window is reduced to a minimum.
The optimised deployment procedure is shown in Figure 9.5.
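As an illustration, an incremental snapshot of a MySQL database could be derived from its binary log, roughly as follows; the log file name is a placeholder and a real module would have to keep track of the position up to which the previous snapshot was taken:

    mysql -e 'FLUSH LOGS'                                        # start a fresh binary log
    mysqlbinlog /var/lib/mysql/binlog.000042 > incremental.sql   # changes since the previous snapshot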
9.6 E X P E R I E N C E
We have developed Dysnomia modules for a number of mutable component
types, which we have listed in Table 9.1. Each column represents a separate
aspect of the module, such as the container for which it has been designed,
the semantics of the activation, deactivation, snapshot and restore operations
and which attribute is used to make a distinction in naming the logical state
snapshots. Most of the modules are designed for deployment of databases,
which instruct the DBMS to create or drop a database and create snapshots
and restore snapshots using a database administration tool. Additionally, we
have implemented support for Subversion repositories which use a revision
number to distinguish between generations. Ejabberd uses a database for
Distribution phase:
  Building components
  Transferring component closures
Transition phase:
  Snapshot and transfer mutable component states
  Flipping Nix profiles
  Request lock on services
  Snapshot and transfer incremental mutable component states
  Deactivating obsolete components and activating new components
  Import full and incremental mutable component snapshots
  Request unlock on services
Figure 9.5 Optimised Disnix deployment procedure with state transfer
storing user account settings, but has no support for incremental backups and
a timestamp is the only way to make a distinction between generations. We
have also implemented support for several other types of components, such
as web applications and generic UNIX processes. These components typically
do not change, but they still need to be activated in a container, whose state
must be modified.
We have applied Dysnomia in conjunction with Disnix to the case studies
covered in Chapter 5, including the StaffTracker toy systems, ViewVC
(http://viewvc.tigris.org) and SDS2, the industrial case study from Philips
Research. We have performed various redeployment scenarios, in which the
state of mutable components had to be captured and transferred. For the
Philips case study, redeployment took a long time in certain scenarios as the
size of a number of databases was quite large (the total size of the entire
system is 599 MiB). In the worst case, a simple upgrade may take almost as much
time as a clean installation, which is about 270.5 seconds if the data set is
unchanged¹.
9.7 R E L AT E D W O R K
Many deployment tools support deployment of mutable state to some degree,
although there is no consensus on how to handle mutable components. Conventional
package managers, such as RPM, modify the state of parts of a system by using
¹ Although this absolute number is still acceptable by today's standards, it is proportional
to the amount of data, which is not acceptable. The absolute number is not that high, because the
SDS2 data set is still relatively small.
maintainer scripts invoked during the installation of a package. The purpose
of maintainer scripts is to "glue" files from a package to files already residing
on the system. The disadvantage is that they imperatively modify parts of
the system, which cannot be trivially undone, and the operations depend on
the current state of the system. As a consequence, they may produce different
results on other machines. The Mancoosi project investigates how to provide
transactional upgrades of packages with maintainer scripts, by means of a
system model and a DSL to implement maintainer scripts [Cicchetti
et al., 2010]. By simulating whether an upgrade may succeed, they can be
made transactional. To perform rollbacks, inverse operations can be derived.
Also other model-driven deployment frameworks such as Cfengine [Burgess,
1995] (and derivatives) and the IBM Alpine project [Eilam et al., 2006] are ca-
pable of dealing with mutable state, although in an imperative and ad-hoc
manner.
An earlier attempt at managing mutable state with Nix was made
in [den Breejen, 2008], in which a Copy-On-Write (COW) filesystem is used to
capture the mutable state of containers in constant time. Although this is more
efficient, only the physical state can be captured, with all its disadvantages,
such as the fact that it may be inconsistent and non-portable. Also, the state
of a container is captured as a whole, while it may also be desirable to treat
parts of the container state as separately deployable mutable components (e.g.
databases).
9.8 D I S C U S S I O N
Dysnomia provides a generic manner for deploying mutable components and
abstracts over physical and logical state. The major disadvantage of this ap-
proach is the use of the filesystem as an intermediate layer. For certain types of
mutable components this works acceptably. For large databases, a distributed
deployment scenario can be very costly, because it may take a long time to
write a snapshot to a disk, transfer the snapshot and to restore it again, even
for incremental dumps. Many modern database management systems have
replication features, which can perform these tasks more efficiently. However,
database replication is difficult to integrate in the Nix deployment model.
9.9 C O N C L U S I O N
In this chapter, we have analysed issues with deployment and upgrades of
mutable components in conjunction with Nix and we have described Dys-
nomia, an extension for deployment of mutable components. With Dysno-
mia, mutable components can be conveniently deployed in a generic manner.
A drawback is that redeployment may be very costly for certain component
types, such as very large databases.
To counter the high costs of creating snapshots and transferring state, LBFS
[Muthitacharoen et al., 2001], a low-bandwidth network file system, may
be worth investigating. As a final note, Dysnomia has not yet been applied in
conjunction with NixOS and Charon.
Type | Container | Activation | Deactivation | Snapshot | Restore | Ordering
mysql-database | MySQL | Create database | Drop database | mysqldump | mysql import | Binlog sequence
postgresql-database | PostgreSQL | Create database | Drop database | pg_dump | psql import | Binlog sequence
subversion-repository | Subversion | Create repository | Drop repository | svnadmin dump | svnadmin load | SVN revision
ejabberd-database | Ejabberd server | Init database | Drop database | ejabberdctl backup | ejabberdctl restore | Timestamps
tomcat-webapplication | Apache Tomcat | Symlink directory | Remove symlink | - | - | -
apache-webapplication | Apache HTTPD | Symlink WAR file | Remove symlink | - | - | -
process | init/Upstart | Symlink job | Remove symlink | - | - | -
Table 9.1 A collection of Dysnomia types and their properties
10
Discovering License Constraints using
Dynamic Analysis of Build Processes
A B S T R A C T
With the current proliferation of open source software components, intellec-
tual property in general, and copyright law in particular, has become a critical
non-functional requirement for software systems. A key problem in license
compliance engineering is that the legal constraints on a product depend on
the licenses of all sources and other artifacts used to build it. The huge size of
typical dependency graphs makes it infeasible to determine these constraints
manually, while mistakes can expose software distributors to litigation. In
this chapter we show a generic method to reverse-engineer this information
from the build processes of software products by tracing system calls (e.g.,
open) to determine the composition graph of sources and binaries involved in
build processes. Results from an exploratory case study of seven open source
systems, which allowed us to discover a licensing problem in a widely used
open source package, suggest our method is highly effective.
10.1 I N T R O D U C T I O N
Today’s software developers often make use of reusable software components,
libraries, and frameworks. This reuse can be observed in diverse software
systems, from tiny to massive (e.g., LibreOffice), as well as in industrial, gov-
ernment, and academic systems. Reuse can greatly improve development
efficiency, and arguably improves quality of delivered systems.
With the reuse of software, the question of under what license a binary should be
released becomes very important. If code is written entirely by a developer,
then the developer is free to choose whatever license he wants. However,
when reusing existing third party components in a binary, the final license or
licenses (if any [Hemel et al., 2011; German and Hassan, 2009]) that govern
this binary are determined by the licenses under which these components were
released, plus the way these components are combined. Failing to comply
with license terms could result in costly lawsuits from copyright holders.
To correctly determine the license of an executable, we should know three
things:
How are source files combined into a final executable (e.g. static linking,
dynamic linking)?
What licenses govern the (re)use of source files that were used to build
the binary?
How can we derive the license of the resulting binary from the build
graph containing the source files?
Our goal in this chapter is to provide an answer for the first question. Iden-
tifying what goes into an executable is not straightforward. An executable is
created by using other executables (such as compilers or code generators), and
by transforming source code and other binary components (such as libraries)
into the executable.
The process of creating a binary is usually driven by a build tool, such as
make, cmake, ant, etc. and it is documented in a build recipe, such as a Makefile.
To complicate identifying the binary provenance of an executable, the build
process might also use a set of configuration parameters (“configure flags”) to
enable or disable specific features that will affect what is ultimately included
in the executable. Thus, the way in which a binary is built may change the licensing
constraints on that binary.
The necessary information cannot, in general, be derived by analysing
build recipes such as Makefiles, because they tend to have incomplete de-
pendency information. Therefore, we reverse-engineer the build process of the
binary by deriving its build dependency graph by tracing system calls made
by processes during the building of a binary. These system calls reveal which
files were read and written by each process, and thus allows a dependency
graph to be constructed.
The structure of this chapter is as follows. In Section 10.2, we motivate the
problem of determining licenses of binaries and give a real-world example of
a library whose license is determined by its build process. In Section 10.3, we
argue that static analysis of build processes is infeasible, and that therefore
we need a dynamic approach based on tracing of actions performed during
a build. We explain our system call tracing method in Section 10.4. To deter-
mine the feasibility of this method we applied it to a number of open source
packages (Section 10.5). This analysis revealed a licensing problem in FFmpeg,
a widely used open source component, which was subsequently confirmed
by its developers. We discuss threats to the validity of our approach in Sec-
tion 10.6. In Section 10.7 we cover related work and Section 10.8 concludes
this chapter.
10.2 A M O T I VAT I N G E X A M P L E
To motivate the need to know what goes into a binary, we use FFmpeg as an
example: a library to record, convert and stream audio and video, which has
been used in both open source and proprietary video players. FFmpeg
has been frequently involved in legal disputes¹.
Open Source components are available under various licenses, ranging
from simple permissive software licenses (BSD and MIT), which allow developers
to include code in their products without publishing any source code,
to more strict licenses imposing complex requirements on derived works
(GPLv2, GPLv3, MPL). Unfortunately it is not always easy or even possible
to combine components that use different licenses. This problem is known as
license mismatch [German and Hassan, 2009]. Extra care needs to be taken
to make sure that license conditions are properly satisfied when assembling a
software product.

¹ See http://en.wikipedia.org/wiki/List_of_software_license_violations
Incorrect use of open source software licenses opens developers up to liti-
gation by copyright holders. A failure to comply with the license terms could
void the license and make distribution of the code a copyright violation. In
Germany several cases about non-compliance with the license of the Linux
kernel (D-Link², Skype³, Fortinet⁴) were taken successfully to court [ifrOSS,
2009]. In the US there have been similar lawsuits involving BusyBox [Software
Freedom Law Center, 2008].
The license of our motivating example differs depending on the exact files
the consuming applications depend on. As stated in FFmpeg’s documentation:
Most files in FFmpeg are under the GNU Lesser General Public License version
2.1 or later (LGPL v2.1+). [...] Some other files have MIT/X11/BSD-style
licenses. In combination the LGPL v2.1+ applies to FFmpeg.
Some optional parts of FFmpeg are licensed under the GNU General Public
License version 2 or later (GPL v2+). See the file COPYING.GPLv2 for details.
None of these parts are used by default, you have to explicitly pass --enable-gpl
to configure to activate them. In this case, FFmpeg’s license changes to GPL
v2+.
This licensing scheme creates a potential risk for developers reusing and
redistributing FFmpeg, who want to be absolutely sure there is no GPLv2+
code in the FFmpeg binary they link to. Evidence for this can be found in the
FFmpeg forums⁵.
The reason why this is important is that if FFmpeg is licensed under the
LGPL, it can be (dynamically) linked against code under any other
license (including proprietary code). But if it is licensed under the GPLv2+,
it would require any code that links to it to be licensed under a GPLv2- or
GPLv3-compatible license.
FFmpeg’s source code is divided into 1427 files under the LGPLv2.1+, 89
GPLv2+, 18 MIT/X11, 2 public domain, and 7 without a license. To do proper
legal due diligence, an important question to answer is: if the library is built to
be licensed under the LGPLv2+ (i.e., without --enable-gpl), is there any file
under the GPLv2+ used during its creation (and potentially embedded into
the binary)? If there is, that could potentially turn the library GPLv2+.
Given the size of components such as FFmpeg, manual analysis of the entire
code is infeasible. Therefore, we need an automated method to assist in this
process.
² 2-6 O 224/06 (LG Frankfurt/Main)
³ 7 O 5245/07 (LG München I)
⁴ 21 O 7240/05 (LG München I)
⁵ http://ffmpeg.arrozcru.org/forum/viewtopic.php?f=8&t=821
patchelf: patchelf.o
	g++ patchelf.o -o patchelf

patchelf.o: patchelf.cc
	g++ -c patchelf.cc -o patchelf.o \
	    -DENABLE_FOO

install: patchelf
	install patchelf /usr/bin/
Figure 10.1 A simple Makefile
10.3 G E N E R A L A P P R O A C H : T R A C I N G B U I L D P R O C E S S E S
As we saw in the previous section, it is important to be able to identify what
source files were involved in building a binary, and how they were composed
to form the binary.
10.3.1 Reverse-Engineering the Dependency Graph
Our goal therefore is to reverse-engineer the build dependency graph of a binary,
which should describe how the source files were combined to form the final
binary. Formally, the dependency graph is defined as G = (V, E), where
V = V_t ∪ V_f, the union of task nodes V_t and file nodes V_f. A task node
v_t ∈ V_t denotes a step in the build process. (The granularity of a task depends on the
level of detail of the reverse-engineering method; they could be Make actions,
Unix processes, Ant task invocations, and so on.) It is defined as the ordered
tuple ⟨id, ...⟩, where id ∈ Ids is a symbol uniquely identifying the task, plus
auxiliary information such as the name of the program(s) involved in the task
or the name of the Makefile rule. A file node v_f ∈ V_f is a tuple ⟨id, path⟩,
where id ∈ Ids is a symbol uniquely identifying the file node or the special
value ε, and path is the file's path⁶.
So how do we reverse-engineer the dependency graph for an existing prod-
uct? It might seem that this is simply a matter of inspecting the build code
of the product, that is, the scripts and specification files responsible for au-
tomatically constructing the installable binary artifacts from the source code.
For instance, for a product built using Make [Feldman, 1979], we could just
study the Makefile. As a simple running example, consider PatchELF, a small
utility for manipulating Unix executables⁷. It consists of a single C++ source
file, patchelf.cc. Building this package installs a single binary in /usr/bin/patchelf.
Suppose that we want to find out what source files contributed to the build
of this binary. To do this, we might look at its Makefile, shown (simplified) in
Figure 10.1. If we only use the information included in the Makefile, follow-
ing the rules in reverse order from the binary patchelf, we conclude that patchelf
⁶ We explain in Section 10.4.3 why path does not uniquely identify file nodes and id is necessary to disambiguate them.
⁷ http://nixos.org/patchelf.html
[Graph: patchelf.cc → g++ → patchelf.o → g++ → patchelf → install → /usr/bin/patchelf]
Figure 10.2 Hypothetical dependency graph derived from an analysis of the Make-
file in Figure 10.1. Make actions are depicted as yellow rectangles and files as
ovals; input files are blue, and final binaries are in green; and temporarily created
files are in grey.
#ifdef ENABLE_FOO
#include <iostream>
#endif

int main() {
#ifdef ENABLE_FOO
    std::cout << "Hello, world!" << std::endl;
#endif
    return 0;
}
Figure 10.3 Example source code demonstrating an optional dependency
has a single source file: patchelf.cc. Figure 10.2 shows the dependency graph
resulting from this analysis of the Makefile.
10.3.2 Why We Cannot Use Static Analysis
In fact, the graph in Figure 10.2 is highly incomplete. Both manual inspection
and automatic static analysis of the build code are unlikely to produce correct,
complete dependency graphs. There are a number of reasons for this:
Makefiles are usually not complete
Makefiles do not list every dependency required. For instance, the file patchelf.cc
uses several header files, such as system header files from the C++ compiler
and C library, but also a header file elf.h included in the PatchELF source dis-
tribution (which was actually copied from the GNU C Library). Inspection
of the build files does not reveal these dependencies; it requires a (language-
dependent) analysis of the sources as well. Similarly, the linker implicitly
links against various standard libraries. Also note that Makefile rules do not
list all of their outputs: the install rule does not declare that it creates a file in
/usr/bin. Therefore a static analysis of the Makefile will miss many inputs and
outputs.
Build files are numerous, large, and hard to understand
Large projects can have thousands of build-related files, which exhibit a sig-
nificant amount of code churn [McIntosh et al., 2011; de Jonge, 2005]. This
makes manual inspection infeasible; only automated methods are practical.
Dependencies might be the result of many other build processes
If the binary is created using files that are not part of the project itself (such
as external dependencies like libraries), this process has to be repeated
(recursively) to determine which files were used to create those files.
Code generators might make it difficult to follow the build process
For example, a build might use a parser generator (such as yacc). Therefore
one might need to do static analysis of the parser's source files (*.y
files) and know the type of dependencies required by the generated code. This
is complicated if the source code has ad-hoc code generators. In such cases,
their output would have to be analysed to determine what is needed to
build it.
Configuration options and compilation flags
Most build-configuration tools (Autoconf/Automake, cmake, qmake, etcetera)
allow customization of the build process by analyzing the local environment
where the system is being built: what libraries are installed, what their locations
are, and what features should be enabled or disabled. In some cases, this build-
configuration process might determine which of two alternative libraries will
be used (e.g. one is already installed in the target system). In essence, this
process would require the execution of the build configuration process to de-
termine the actual instance of the Makefile to be used. Such a Makefile might
further contain compilation flags that would require the dynamic analysis of
the creation of the final binary. This kind of conditionality is widely used
in open source systems, where the person building a product might enable
or disable various options, usually depending on which libraries are already
installed in the system where the product is expected to run.
For example, assume that the source code of patchelf.cc is as shown in Fig-
ure 10.3, and we attempt static analysis to determine any other dependencies
that it might have. However, the compiler flag -DENABLE_FOO indicates that
the compiler should enable the preprocessor define ENABLE_FOO, and there-
fore, any static analysis would require that this be taken into consideration.
This directive might include or exclude dependencies; in the example, en-
abling it causes a dependency on the header file iostream.
There are many build systems
Many different build systems are in use, such as Ant, Maven, countless vari-
ants of GNU Make, language-specific scriptable formalisms such as Python’s
Setuptools, and code generation layers above other build systems such as
the GNU Autotools and Perl’s MakeMaker. In practice, a static analysis tool
would be limited to a few of them at most; and build systems that use general-
purpose languages (such as Autoconf scripts or Python Setuptools) for speci-
fying build processes are practically impossible to analyse.
10.3.3 Dynamic Analysis through Tracing
Given all these issues, the only effective way to determine the source prove-
nance of a binary that we can hope for is to perform dynamic analysis of the
build process. That is, given concrete executions of the build systems that
produce the binary under scrutiny and its dependencies, we wish to instru-
ment those executions to discover the exact set of files that were used during
the build, and the manner in which they were used (e.g., used by a compiler,
a linker, or during the installation or configuration steps). The result of such
an instrumented build is a trace of files used to create the binary. This trace
can be annotated with licensing information of each of the nodes, and further
analysed to do license auditing of the binary.
Previous approaches to tracing have instrumented specific build tools such
as GCC and the Unix linker [Tuunanen et al., 2009] or used debug instrumen-
tation already present in GNU Make [Adams et al., 2007]. These methods are
limited to specific tools or require correct dependency information in Make-
files. In the next section, we present a generic method intended to work for
all tools used during a build.
10.4 M E T H O D : T R A C I N G T H E B U I L D P R O C E S S T H R O U G H
S Y S T E M C A L L S
A tracing of the build process seeks to discover the inputs and the outputs of
each step in a build process in order to produce a dependency graph. That
is, the instrumentation of the build process must log which files are read and
written by each build step, respectively. Therefore, if we can determine all
file system accesses made by a build process, we will be able to produce the
dependency graph of the binary being built.
Instead of instrumenting each program used during the build process, we
have decided to instrument the kernel. Any well-behaved process run by the
kernel is expected to use system calls to open, read, write and close files. If we
can “listen” to these system calls we could determine, for any build process:
a) the programs being executed, b) the files being read, and c) the files being
generated.
10.4.1 Overview
In this section, we describe one implementation of this idea: namely, by log-
ging all system calls made during the build. System calls are calls made by
user processes to the operating system kernel; these include operations such
as creating or opening a file. Thus, logging these calls allows us to discover
the inputs and outputs of build steps at the granularity of operating system
processes or threads. Thus, in essence, we determine the data flow graph be-
tween the programs invoked during the build. A nice aspect of this approach
is that the necessary instrumentation is already present in most conventional
operating systems, such as Linux, FreeBSD, Mac OS X and Windows.
A major assumption this approach makes is that if a file is opened, it is
used to create the binary, either directly (it becomes part of the binary, such
as a source code file) or indirectly (it is used to create the binary, such as a
configuration file). Not all files opened will have an impact on the licensing of
the binary that is created, but we presume that because they might have an
impact, it is necessary to evaluate how each file is used, and if the specific use
has any licensing implications for the final binary.
Consider, for instance, tracing the system calls made during the execution
of the command make install with the Makefile shown in Figure 10.1. On Linux,
this can be done using the command strace⁸, which executes a program and
logs all its system calls. Thus, strace -f make install prints on standard error
a log of all system calls issued by make and its child processes. (The flag -f
causes strace to follow fork() system calls, i.e., to trace child processes.) Make
first spawns a process running gcc to build patchelf.o. Since this process creates
patchelf.o, it will at some point issue the system call
open("patchelf.o",
O_RDWR|O_CREAT|O_TRUNC, 0666)
Likewise, because it must read the source file and all included headers, the
trace will show system calls such as
open("patchelf.cc", O_RDONLY)
open("/usr/include/c++/4.5.1/string",
O_RDONLY|O_NOCTTY)
open("elf.h", O_RDONLY)
Continuing the trace of Make, we see that the linker opens patchelf.o for
reading (as well as libraries and object files such as /usr/lib/libc.so,
/usr/lib/libc_nonshared.a and /usr/lib/crtn.o) and creates ./patchelf, while finally the install target
opens ./patchelf for reading and creates /usr/bin/patchelf. Thus we can derive a
dependency graph for the final, installed binary /usr/bin/patchelf. The nodes in
this graph are files and the processes that read, create or modify files. In the
graph for PatchELF, there will therefore be a path from the nodes patchelf.cc
and elf.h to /usr/bin/patchelf, revealing the dependencies of the latter.
Performing a trace on just one package reveals the direct dependencies of
that package, e.g., files such as patchelf.cc and /usr/lib/libc_nonshared.a. Since the
latter is a binary file that gets linked into the patchelf program, for the purposes
of license analysis, we would like to know what source files were involved
in building libc_nonshared.a. For example, if this static library contains a file
distributed under a non-permissive copyleft license, then this can have legal
consequences for the patchelf binary.
Therefore, we can perform the same trace analysis on external dependen-
cies of PatchELF, such as Glibc (the package that contains libc_nonshared.a). The
result will be a large dependency graph showing the relationships between
packages. For instance, in this graph, some source files of Glibc have paths to
/usr/lib/libc_nonshared.a, and from there, to /usr/bin/patchelf.
⁸ http://strace.sourceforge.net/
[Diagram: for each package (PatchELF, FFmpeg, Glibc, ...), a straced build of the sources produces a trace; a trace analyser turns each trace into the dependency graph stored in a SQLite database, which a license analyser uses to produce a license report.]
Figure 10.4 Overview of the system call tracing approach. Nodes denote artifacts,
and edges denote the tools that create them.
Figure 10.4 summarises our approach. For each project we’re interested in
and its dependencies, we perform a complete build of the package. This can
typically be done fully automatically by using package management systems
such as RPM [Foster-Johnson, 2003]; all we need to do is to trace the package
build with strace. A trace analyser is then applied to each trace to discover
the files read and written by each process. This information is stored in a
database, which, in essence, contains a dependency graph showing how each
file in the system was created. This dependency graph can then be used by
license analysis tools (which we do not develop in this chapter). For instance,
for a given binary, such a tool could use the dependency graph to identify the
sources that contributed to the binary, determine each source file’s license us-
ing a license extraction tool such as Ninka [German et al., 2010], and compute
the license on the binary.
We now discuss in detail how we produce a system call trace, how we
analyse the trace to produce the dependency graph, and how we can use the
dependency graph to answer queries about binaries.
10.4.2 Producing the Trace
Producing a system call trace is straightforward: on Linux, we essentially just
run strace -f /path/to/builder, where builder is the script or program that builds the
package.
For performance reasons, we only log a subset of system calls. These are
the following:
All system calls that take path arguments, e.g., open(), rename() and ex-
ecve().
System calls related to process management. As we shall see below, it’s
necessary to know the relationships between processes (i.e. what the
parent process of a child is), so we need to trace forks. On Unix sys-
tems, forking is the fundamental operation for creating a new process:
it clones the current process. The child process can then run execve() to
load a different program within its address space; the underlying system
calls responsible for this are clone() and vfork().
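A possible invocation restricting strace to such a subset looks roughly as follows; the exact list of system calls is an assumption and would have to be adapted to the platform and strace version:

    $ strace -f -s 256 \
        -e trace=open,creat,rename,unlink,execve,chdir,clone,vfork,fork,arch_prctl \
        -o build.trace \
        /path/to/builder

The -f flag follows child processes, -s enlarges the logged string arguments so that paths are not truncated, and -o writes the trace to a file instead of standard error.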
10.4.3 Producing the Build Graph
The trace analyser reads the output of strace to reverse-engineer the build de-
pendency graph. This graph is defined as in Section 10.3.1, where each task
represents a process executed during the build. Task nodes are extended to
store information about processes: they are now a tuple ⟨id, program, args⟩,
where program is the path of the last program executed by the task (i.e., the
last binary loaded into the process's address space by execve(), or the program
inherited from the parent) and args is the sequence of command-line arguments.
The analyser constructs the graph G by reading the trace, line by line.
When a new process is created, a corresponding task node is added to V_t.
When a process opens a file for reading, we add a file node v_f to V_f (if it
didn't already exist), and an edge from v_f to the task node. Likewise, when
a process opens a file for writing, we add a file node v_f to V_f, and an edge
from the task node to v_f.
[Graph: patchelf-0.5.tar.bz2 → tar → patchelf.cc, elf.h; patchelf.cc → cc1plus → ccoPovgb.s → as → patchelf.o → ld → patchelf → install → patchelf-0.5/bin/patchelf → strip → patchelf-0.5/bin/patchelf. The ld step additionally reads crtbegin.o, crtend.o and further files from GCC, Glibc and the Linux headers (external dependencies, elided).]
Figure 10.5 PatchELF dependency graph. External dependencies are depicted in
white.
There are some tricky aspects to system call traces that need to be taken
into account by the analyser:
System calls can use relative paths. These need to be absolutised with
respect to the current working directory (cwd) of the calling process. As
this requires us to know the cwd of each process, we need to process
chdir() calls. Since the cwd is inherited by child processes, the analyser
must also keep track of parent/child relations, and propagate the cwd of
the parent to the child when it encounters a fork operation. The function
absolutise(dir, path) returns the absolute representation of path relative to
dir.
System calls can be preempted, causing the return value of a system
call to be separated from its arguments. This is not usually a problem:
when we encounter an unfinished system call, we could remember the
arguments until we encounter the resumption, where we combine the
arguments with the return value and process the system call normally.
However, for forks, the child may start to execute before the parent has
returned from its fork() call. Thus, to propagate the cwd to the child,
we need to know the new process’s parent before processing any system
calls from the child. This is done in a preprocessing step (not shown)
by replacing any unfinished trace line with the combination of that line
and its subsequent resumption, and removing the latter.
Process IDs (PIDs) can wrap around; in fact, this is quite likely in large
packages because PIDs are by default limited to 32,768 in Linux. There-
fore the analyser needs to distinguish between different processes with
the same PID. For this reason, task IDs are not equal to PIDs; rather, a
unique task ID is generated in the graph every time a process creation
is encountered in the trace.
A special problem is the representation of files that are written mul-
tiple times, i.e., are created or opened for writing by several different
processes. This is quite common. For instance, it is usual in the make
install step to remove debug information from binaries by running the
strip command. This means that after linking, the binary patchelf is first
copied from the temporary build directory to /usr/bin/patchelf by the com-
mand install, and then read and recreated by strip. Represented naively,
this gives the following graph:
install → /usr/bin/patchelf ⇄ strip
Such cycles in the graph make the data flow hard to follow, especially if
there are multiple processes that update a file: the order is impossible
to discern from the graph. Therefore, each modification of a file creates
a new node in the graph to represent the file, e.g.,
install → /usr/bin/patchelf → strip → /usr/bin/patchelf
These nodes are disambiguated in the graph by tagging them with the
task ID that created them.
Note that this approach allows “dead” writes to a file to be removed
from the graph, analogous to the discovery of dead variables in data
flow analysis. For instance, in the build script echo foo > out; echo bar
> out, the second process recreates out without reading the file created
by the first process. Thus, there is no edge in the graph exiting the
first out node, and its task is not equal to the creator of the out node,
so it can be safely removed from the graph. This is an instance of the
notion developed in [Dolstra et al., 2004] that many concepts in memory
management in the programming languages domain can be carried over
to the domain of the storage of software on disk.
We are currently not interested in the dynamic library dependencies of
tools used during the build, because these generally do not have license
implications. For instance, every program executed during the build
opens the GNU C library, libc.so.6 (almost every program running under
Linux uses it), and therefore we are not interested in these. However,
we do care if the linker is the one opening a library when it's creating a
binary (during a separate, later call to open()). So to simplify the graph,
we omit these dependencies. To detect calls to open() made due to dy-
namic linking, we use the following method: we observed that after the
dynamic linker has opened all shared libraries, it issues an arch_prctl()
system call. Thus, we ignore any opens done between a call to execve()
and arch_prctl().
Let us return to the PatchELF example described in Section 10.3. Building
this package has several automated steps beyond running make install. As with
most Unix packages, building the PatchELF package starts with unpacking
the source code distribution (a file patchelf-0.5.tar.bz2), running its configure
script, running make and finally running make install.
Figure 10.5 shows the dependency graph resulting from analysing the trace
of a build of this package. It is instructive to compare this dependency graph
with the “naive” one in Figure 10.2. The system call trace reveals that com-
piling patchelf.cc causes a dependency not only on elf.h, but also on numerous
header files from Glibc and the Linux kernel. Likewise, the linker brings in
several object files and libraries from Glibc and GCC. (Many header and li-
brary dependencies that are detected by system call tracing have been omitted
from the figure for space reasons.) The analysis also reveals that the compi-
lation of patchelf.cc is actually done by two processes, which communicate
through an intermediate temporary assembler file, ccoPovgb.s; and that the
installed executable is rewritten by strip.
In our implementation, the analyser stores the resulting graph in a SQLite
database. As sketched in Figure 10.4, other tools can use this database to
answer various queries about a binary. Multiple traces can be dumped into
the same database, creating a (possibly disconnected) graph consisting of the
output files of each package and the processes and files that created them.
This allows inter-package dependencies to be analysed.
10.4.4 Using the Build Graph
Once we have the full dependency graph for a binary, we can perform various
queries. The most obvious query is to ask for the list of source files that
contributed to a binary x ∈ V_f. Answering this question is not entirely trivial.
It might seem that the sources are the files in the graph that contributed to
the binary and were not generated themselves, i.e., they are source vertices in
V_f from which there is a path to x:

{ v_f ∈ V_f | deg⁻(v_f) = 0 ∧ there is a path in G from v_f to x }
But as Figure 10.5 shows, this does not yield the files in which we’re in-
terested: the “source files” returned are source distributions such as patchelf-
0.5.tar.bz2, from which a tar task node produces files such as patchelf.cc. (Addi-
tional “sources” in this graph are the header files, object files and libraries in
Glibc, GCC and the kernel. However, if these packages are traced and added
to the dependency graph, then these files are no longer sources; instead, we
get more source distributions such as glibc-2.12.2.tar.bz2.) While this answer is
strictly speaking correct (patchelf-0.5.tar.bz2 is the sole source of the package),
it is not very precise: we want to identify specific files such as patchelf.cc.
To get the desired results, we simply transform the full dependency graph
into one from which task nodes such as tar and its dependencies have been
removed. For PatchELF, this yields a set
{patchelf.cc, elf.h, errno-base.h, errno.h, . . . }
On this set, we can apply Ninka to determine the licenses governing the
source files.
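As a sketch of how part of such a query could look against the SQLite database, assuming a hypothetical schema with files and edges tables (the actual schema of our implementation may differ), the candidate source files are those that no task has written; the reachability test towards the binary of interest and the removal of unpacking tasks such as tar are then performed by a graph traversal on top of this:

    $ sqlite3 trace.db "SELECT path FROM files f
        WHERE NOT EXISTS (SELECT 1 FROM edges e WHERE e.target = f.id);"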
More interesting queries can take into account how files are composed. For
instance, given a binary, we can ask for all the source files that are statically
linked into the binary. Here the graph traversal does not follow dynamic li-
brary inputs to ld nodes. It does follow other task nodes, so for instance bison
nodes are traversed into their .y sources.
10.4.5 Coarse-grained Processes
Because system call tracing identifies inputs and outputs at the granularity of
Unix processes, the main assumption underlying this approach to tracing is
that every process is part of at most one conceptual build step. However, this
is not always the case. Consider the command cp foo bar /usr/bin/. This com-
mand installs two files in /usr/bin. Conceptually, these are two build steps. But
since they are executed in a single process, we get the following dependency
graph:
{foo, bar} → cp → {/usr/bin/foo, /usr/bin/bar}
This is undesirable, because foo and its sources now appear as dependencies
of /usr/bin/bar, even if the two programs are otherwise unrelated. We call these
processes coarse-grained.
In cases such as cp (and similar commands such as install), we can fix this
problem on an ad hoc basis by rewriting the graph as follows. We scan the
dependency graph for cp task nodes (and other tasks which we know may be
coarse-grained) that have more than one output, where each output file
has the same base name as an input file. We then replace this task node with
a set of task nodes, one for each corresponding pair of inputs and outputs,
e.g.
bar → cp → /usr/bin/bar
foo → cp → /usr/bin/foo
Note that both cp tasks in this graph correspond to the same process in the
trace. Since coarse-grained processes are a threat to the validity of our ap-
proach, we investigate how often they occur in the next section.
10.5 E VA L U AT I O N
The goal of this evaluation is to demonstrate, first, the correctness of our
method and, second, its usefulness.
10.5.1 Correctness
We performed an experiment to determine how often our method produces
false negatives (RQ1) and false positives (RQ2) on a number of open source
packages, including FFmpeg, patchelf and Apache Xalan, a Java-based XSLT
library that uses the Apache Ant build system. A false negative occurs when a
file is a dependency of a bi-
nary (i.e., necessary to build the binary) but does not appear in the graph as
such. Conversely, a false positive occurs when a file appears as a dependency
of a binary in the graph, but is not actually used to build the binary.
To measure this in an objective manner, we used the following method.
First, for a given binary, we use our tracer to determine the dependency graph.
Package     Files "unused"   Files "used"   False negatives   False positives   Recall   Precision
patchelf              19              3                  0                 1      1.00        0.67
aterm                 60             57                  1                 0      0.98        1.00
ffmpeg               986           1253                  0                 7      1.00        0.99
opkg                   9             97                  1                 3      0.99        0.97
openssl             1180            847                  0                 5      1.00        0.99
bash                 806            280                  0                34      1.00        0.88
xalan                379            955                  0                 4      1.00        1.00
Table 10.1 RQ1 & RQ2: False negatives and positives
This gives us a list of source files from the package’s distribution that were
supposedly used in the build, and a list of source files that were not. Then:
To determine false negatives (RQ1), for each presumptively unused file,
we rebuild the package, deleting the file after the configure step.
If the package then fails to build or gives different output, the file ap-
parently is used, so it is flagged as a false negative. If the package builds
the same, then it is a true negative.
To determine false positives (RQ2), for each assumed used file, we like-
wise rebuild the package after deleting the file. If the package still builds
correctly, then the file apparently is not used by the build, so it is a false
positive. Otherwise it is a true positive.
We used some optimisations in our experiment. For instance, for RQ1, we can
delete all unused files at the same time. If the package still builds correctly,
then there are no false negatives.
Table 10.1 shows the results of our experiment to determine false negatives
and positives. For each package in our evaluation, it shows the number of
apparently unused and used files (i.e. non-dependencies and dependencies),
the number of failed builds (i.e., false negatives), the number of successful
builds (i.e., false positives), and the recall and precision.
As the table shows, the number of false negatives reported by this method
is very low, and in fact, none of them are really false negatives: the files are
used during the build, but not to build the binary. In the aterm package, we
had to include a source file called terms-dict, used by a test case which was not
compiled in the regular build process. Due to the strictness of the Makefile, an
error was raised. The package opkg requires a file called libopkg_test.c to build
a test case in the test suite, which was not present in the resulting executable.
The number of reported false positives is also low. Most of these false
positives occurred because a particular file was not actually used in the build
(e.g. a documentation file) and the build system did not explicitly check for
its presence (e.g. because of wildcards in a build configuration
file). For example, for bash the number of false positives was higher than
the other packages, because it contains a number of documentation files and
gettext translation files, which did not cause an error in the build process if
they were removed.
One interesting, mistakenly reported false positive came from patchelf: the
package uses the elf.h header file, which is also present in the GNU C Library,
so the package still compiles successfully even if it is deleted from patchelf.
10.5.2 Usefulness
In [Germán et al., 2010] we analysed the Fedora 12 distribution to understand
how license auditing is performed. One of the challenges we faced was deter-
mining which files are used to create a binary. This is particularly important
in systems that include files under different licenses that do not allow their
composition (for example, the same system contains a file under the 4-clause
BSD license and a file under the GPLv2, which cannot be legally combined
to form a binary).
From that study we have selected three projects with files that appear in-
compatible with the rest of the system:
FFmpeg is a library to record, convert and stream audio and video, which
we have introduced in Section 10.2. This package may be covered under
different licenses depending on which build time options are used.
cups is a printing system developed by Apple Inc. for Unix. The library
is licensed under a permissive license (the CUPS License). Version 1.4.6
contained 3 files (backend/*scsi*) under an old 4-clause BSD license, which
is not compatible with the CUPS License. For this system (and the next
two) we would like to know if these files are being used to build the
binaries of the product in a way that is in conflict with their license.
bash is a Unix command shell written for the GNU project. Recent ver-
sions have been licensed under the GPLv3+, but it contains a file under
the old 4-clause BSD license (examples/loadables/getconf.c). We analysed version
4.1.
To determine the license of the files in the projects we used Ninka. We traced
the build of each system, and identified the files used. We then compared the
licenses of these files, to try to identify any licensing inconsistency (full-size
renderings of the dependency graphs can be found at http://www.st.ewi.tudelft.nl/~dolstra/icse-2012/,
with problematic files showing as red nodes).
FFmpeg
By default, FFmpeg is expected to create libraries and binary programs under
the LGPLv2+. However, we discovered that two files under the GPLv2+ were
being used during its compilation: libpostproc/postprocess.h and x86/gradfun.c.
The former is an include file and contains only constants. The latter, how-
ever, is source code, and it is used to create the library libavfilter. In other
words, the library libavfilter is licensed under the LGPLv2.1+, but required
libavfilter/x86/gradfun.c, licensed under the GPLv2+. We manually inspected the
Makefiles, which were very complex. It was not trivial to determine if, how,
and why, it was included.
We therefore looked at the library, where we found the symbol correspond-
ing to the function located inside x86/gradfun.c, which gave us confidence that
our method was correct. With this information we emailed the developers
of FFmpeg. They responded within hours. The code in question was an op-
timized version of another function. A developer quickly prepared a patch
to disable its use, and the original authors of x86/gradfun.c were contacted to
arrange relicensing of the file under the LGPLv2.1+ (http://web.archiveorange.com/archive/v/4q4BhMcVOjgXyfN3wyhB).
Three days later, the file's license was changed to the LGPLv2.1+
(http://ffmpeg.org/pipermail/ffmpeg-cvslog/2011-July/039224.html).
cups
The dependency graph of cups shows that the three files in question,
backend/(scsi|scsi-iris|scsi-linux).c, were used to create a binary called /backend/scsi.
Coincidentally, these files were recently removed in response to Bug #3509.
bash
The file in question, examples/loadables/getconf.c, is not used while building the
library. This is consistent with the directory where it is located (examples).
Although the goal of this chapter is not to provide a system capable of cor-
rectly investigating license issues, we have shown that the information prov-
ided by the build graph tracer is sufficient to highlight potential licensing
inconsistencies. These inconsistencies could be solved automatically by devel-
oping a complementary license calculus built on top of the build tracer.
10.6 THREATS TO VALIDITY
In this section, we discuss threats to the validity of our work. It is worth noting
that apart from bugs in the kernel or in our analysis tool, system call tracing is
guaranteed to determine all files accessed during a build, and nothing more.
In that sense there is no risk of false positives or negatives. The real question,
however, is whether it can reconstruct a meaningful dependency graph: a
simple list of files read and written during a build is not necessarily useful.
For instance, a source file might be read during a build but not actually used
in the creation of (some of) the outputs.
Thus, with respect to external validity, as noted in Section 10.4.5, the main
threat is that coarse-grained processes (i.e. those that perform many or all
build steps in a single process) yield dependency graphs that do not accu-
rately portray the “logical” dependency graph. As a result, files may be falsely
reported as dependencies. This can in turn lead to false positives in tools that
use the graph; e.g., in a license analyser, it can cause false warnings of poten-
tial licensing problems. Note however that this is a conservative approach: it
may over-estimate the set of source files used to produce a binary, but that is
safer than accidentally omitting source files.
Therefore, our approach may not work well with coarse-grained build sys-
tems such as Ant, which tends to perform many build steps internally (al-
though Ant can also be instructed to fork the Java compiler, which circum-
vents this problem). In many cases, users can still get usable dependency
graphs by instructing the build tool to build a single target at a time; we then
get traces that only contain the files accessed for each particular target. How-
ever, in Unix-based environments (where most open source components are
used) most build systems are process-based, and can be analysed without
much trouble.
A further threat to external validity would be processes that access files
unnecessarily. (One might imagine a C compiler that does an open() on all
header files in its search path.) However, we have not encountered such cases.
Finally, system call tracing is not portable and differs per system. However,
most widely used operating systems support it in some variant.
With respect to the internal validity in our evaluation of RQ1 and RQ2,
where we used file deletion as an objective measure to verify if a file is or is
not a dependency, a limitation is that complex build processes can fail even
if files that are unnecessary to a specific target are missing. For instance, in a
large project, the build system might build a library in one directory that is
subsequently not used anywhere else.
Construct validity is threatened by limitations in our prototype that can
cause file accesses to be missed. Most importantly, the trace analyser does not
trace file descriptors and inter-process communication (e.g. through pipes)
yet. For instance, it fails to see that the task patch in the command cat foo.patch
| patch bar.c has a dependency on foo.patch; it only sees that patch reads and
recreates bar.c. Our prototype also lacks support for some uncommonly-used
file-related Linux system calls, such as the *at() family of calls. Implementing
the necessary support seems straightforward.
The conclusion validity of our work is mainly limited by the small size of
our evaluation. However, many open source packages use common build
tooling and very similar deployment processes, which should not be too
difficult to analyse with our tooling. The main challenge lies in sup-
porting uncommon open source packages, which deviate from these common
patterns.
10.7 RELATED WORK
Tu and Godfrey argue for a build-time architectural view in their founda-
tional paper [Tu and Godfrey, 2001]. They study three systems, Perl, JNI, and
GCC, to show how build systems can exhibit highly complex architectures.
In this chapter we use the same abstraction level, files and system processes,
to understand structural and compositional properties of built binaries. Our
dependency graphs are based on Tu and Godfrey’s modelling language for
describing build architectures, but we streamline the language to efficiently
communicate the information we are interested in. The systems we study
in this exploratory study are much smaller and simpler than those in Tu and
Godfrey, and in this respect, we contribute an interesting corollary: even “trivial”
systems, e.g. patchelf (Figure 10.5), a single binary built from a single source,
can exhibit a surprisingly complex and nuanced build architecture. Complex
builds may be the rule, rather than the exception, even in small systems.
Tuunanen et al. used a tracing approach to discover the dependency graph
for C-based projects as part of ASLA (Automated Software License Analyzer)
[Tuunanen et al., 2009, 2006]. They modified the C compiler gcc, the linker ld
and the archive builder ar to log information about the actual inputs and out-
puts used during the build. The resulting dependency graph is used by ASLA
to compute license information for binaries and detect potential licensing con-
flicts. However, instrumenting specific tools has several limitations compared
to system call tracing. First, adding the required instrumentation code to pro-
grams could be time consuming (finding the “right” places to instrument the
code); second, many tools would need to be instrumented: compilers, linkers,
assemblers, etc. Finally, a package might contain or build a code generator
and use it to generate part of itself. Clearly, such tools cannot be instrumented
in advance.
Adams et al. analyse debug traces of GNU Make to reverse-engineer large
build systems [Adams et al., 2007]. Their study approaches the build from a
developer’s point of view, with an aim to support build-system maintenance,
refactoring, and comprehension. Adams et al. generate and analyse depen-
dency graphs from extracted trace information, similar to us. However, we
must emphasize the different aim of our study: Adams et al. reverse-engineer
the semantic source architecture of the build to help developers gain a global
picture. We reverse-engineer the logical binary architecture to help developers
and other stakeholders answer questions about composition and provenance
of binary artifacts. Adams et al. strive to understand the build system in and
of itself, whereas we exploit the build system to acquire specific information.
The main use case for this specific information is copyright analysis of
software systems. Disinterring the myriad ways by which “fixed works” of
original authorship are embedded, compiled, transformed, and blended into
other fixed works is crucial for understanding copyright [Rosen, 2004]. A
study by German et al. showed this information is important to owners and
resellers who need to understand the copyright of their systems [Germán
et al., 2010]. German et al. studied package meta-data in Fedora to find license
problems. In their study they showed that Fedora’s current package-depends-
on-package level of granularity, while helpful, is ultimately insufficient for
understanding license interactions in a large open source compilation such as
Fedora. Only a file-depends-on-file level of granularity can work, and this
study directly addresses that problem.
Build Audit [Build Audit, 2004] is a tool to extract and archive useful in-
formation from a build process by tracing system calls. Although it is capa-
ble of showing the invoked processes and involved files, it cannot show the
interaction between processes, which we need. Similarly, in [Attariyan and
Flinn, 2008; Coetzee et al., 2011] strace or ptrace techniques are used to de-
rive dependencies from system calls for parallelizing builds and to diagnose
configuration bugs.
10.8 CONCLUSION
In this chapter we have proposed reverse-engineering the dependency graph
of binaries by tracing build processes; specifically, we showed that we can
do so by tracing system calls issued during builds. Our proof-of-concept pro-
totype implementation suggests that system call tracing is a reliable method
to detect dependencies, though it may over-estimate the set of dependencies.
This is the essential foundation upon which we want to build a license analy-
sis tool.
Acknowledgements We wish to thank Tim Engelhardt of JBB Rechtsanwälte
for his assistance, the FFmpeg developers for their feedback, Karl Trygve
Kalleberg for many comments and discussions on license calculi, and Bram
Adams for pointing out related work.
Part V
Applications
11
Declarative Deployment of WebDSL
applications
ABSTRACT
The tools belonging to the reference architecture for distributed software de-
ployment provide automated solutions for distributed service and infrastruc-
ture deployment tasks. In order to deploy a system designed for a particular
domain, developers or system administrators have to write a number of de-
ployment specifications each capturing a separate concern and they have to
utilise the right deployment tools in the reference architecture for the right
tasks. In order to be able to properly automate the entire deployment process,
it is required to have all the knowledge about implementation details of the
system, which is not always available. Furthermore, manually arranging the
deployment tools to execute the right deployment tasks is not always a trivial
task. In this chapter, we describe a high-level deployment tool for web appli-
cations written in the WebDSL language. WebDSL is a domain-specific lan-
guage for developing web applications with a rich data model on a high-level,
providing static consistency checking and hiding implementation details. As
with the WebDSL language, the webdsldeploy tool hides deployment details
and allows developers or system administrators to automatically and reliably
deploy WebDSL web applications in distributed environments for both pro-
duction use or for testing using high-level deployment specifications.
11.1 INTRODUCTION
In this thesis, we have described various tools addressing different deployment
concerns, ranging from local components to distributed components at the service
and infrastructure levels, and we have seen that these tools support interesting
non-functional properties, such as reliability. In order to fully automate the
deployment process for systems in a particular domain, these tools must be
combined and the required deployment specifications must be provided, which is
not always trivial.
WebDSL [Visser, 2008] was intended as a case study in domain-specific
language engineering developed at Delft University of Technology, consisting
of several declarative sub-languages each capturing a separate aspect, such as
a data model, page definitions and access control. The purpose of the domain-
specific language is to reduce boilerplate code, hide implementation details
and to provide static consistency checking. From a WebDSL program, a Java
web application is generated, which must be deployed in an environment
providing a Java EE application server, a MySQL database server and several
other important infrastructure components.
Although the WebDSL language provides a number of advantages during
development, deploying a WebDSL web application is still complicated, and
requires end-users to know implementation-specific details. In order to de-
ploy a WebDSL application, the WebDSL compiler and its dependencies must
be installed, the Java web application must be generated and the required in-
frastructure components, including the MySQL database and Apache Tomcat
services, must be installed and configured.
Apart from deploying a WebDSL application and its infrastructure com-
ponents on a single machine, it may also be desired to deploy them in a
distributed environment, for example, to support load balancing. As we have
already seen frequently in this thesis, distributed deployment processes are
even more complicated and also require more knowledge about implementa-
tion details.
In this chapter, we describe webdsldeploy, a domain-specific deployment tool
for WebDSL applications that allows developers or system administrators to
deploy WebDSL applications on single machines and in distributed environments,
for production use or testing, using high-level specifications that hide
implementation details.
The purpose of constructing this domain-specific tool is to explore whether
it is feasible to turn the reference architecture described in Chapter 3 into a
concrete architecture (for which we use the web application domain as a case
study) and to see what patterns we can use.
11.2 WEBDSL
As explained earlier, WebDSL [Visser, 2008] was intended as a case study
in domain-specific language engineering enabling users to develop web ap-
plications with a rich data model at a high abstraction level, providing
additional static consistency checking and hiding implementation details.
The main motivation lies in the fact that web applications are often devel-
oped in different languages each addressing a specific concern. For instance,
in Java web applications, a data model is often described in SQL for the cre-
ation of tables, pages are written as JSP pages or Java Servlets (which themselves
embed HTML), the style of a webpage is described in CSS and various dy-
namic elements of web pages are written in JavaScript.
Languages used to develop web applications are often loosely coupled; it is
very difficult to check a web application for errors at compile time. For
instance, a web application written in Java can be checked by the Java com-
piler, but often other languages are embedded inside Java code, such as SQL
code to communicate with the database backend. SQL statements are often
coded in strings, which are not checked by the Java compiler for being
valid SQL. This introduces various problems and security risks, such as SQL
injection attacks. Moreover, there is also a lot of boilerplate code and redundant
Figure 11.1 Stack of WebDSL sub languages
patterns that are used over and over again, which significantly increases the
code size of web applications and therefore also the chance of errors.
WebDSL overcomes these issues by offering a collection of integrated
domain-specific languages. Figure 11.1 illustrates the various sub languages in
WebDSL, such as a sub language to describe access control [Groenewegen and
Visser, 2008] and workflow [Hemel et al., 2008]. High level sub languages are
translated into lower level sub languages and are then unified by the WebDSL
compiler into a single core language [Hemel et al., 2010]. Finally, the core lan-
guage is translated to a web application written in Java code, targeting Java
application servers, or Python code, targeting the Google App Engine (there
have been several experiments in targeting different platforms using WebDSL,
but the Java back-end is the only one that is mature).
Figure 11.1 has been taken from the WebWorkflow paper [Hemel et al.,
2008], which includes more details on the sub languages and their transla-
tions.
11.3 EXAMPLES OF USING THE WEBDSL LANGUAGE
As we have explained, WebDSL is a domain-specific language composed of
various sub languages, each addressing a specific concern of a web application.
In this section, we show a small number of examples.
entity Author {            136
  name :: String           137
  address :: String
  city :: String
}

entity Paper {
  title :: String
  date :: Date
  author -> Author         138
}
Figure 11.2 An example WebDSL Data Model
define page authors() {
  header { "All authors" }
  list() {
    for(a : Author) {
      listitem { output(a) }
    }
  }
}
Figure 11.3 An example WebDSL page definition
11.3.1 Data Model
One of the aspects that WebDSL facilitates is the storage of data and information
about how to present these data entities. A WebDSL data model consists of
entity definitions, which declare entity types. Entities are mapped onto tables
in a relational database by using an Object-Relational Mapping (ORM) imple-
mentation, such as Hibernate [The JBoss Visual Design Team, 2012] used in
the Java back-end.
Figure 11.2 illustrates an example of a WebDSL Data Model in which two
entities are defined: an author, who writes several papers, and a paper, which
is written by an author:
136 This entity statement declares the Author entity.
137 Entities consist of properties with a name and a type. Types can be a value
type, indicated by ::, such as String, Int or Date, or a more complex type,
such as WikiText or Image.
138 Types can also be associations to other entities in the data model. In this
example, we have defined that a paper is written by a single author.
11.3.2 Pages
Apart from specifying the structure of data entities, it is also desired to define
pages which display contents and present data to end-users.
Figure 11.4 WebDSL pages rendered in a browser
define page editPaper(p : Paper) {                                139
  section {
    header { "Edit Paper: " output(p.title) }
    form {
      table {
        row { column {"Title:"} column { input(p.title) } }       140
        row { column {"Date:"} column { input(p.date) } }
        row { column {"Author:"} column { input(p.author) } }
      }
      submit save() { "Save" }                                    141
    }
  }
  action save() {                                                 142
    p.save();
    return viewPaper(p);
  }
}
Figure 11.5 An example of a WebDSL action inside a form
Figure 11.3 shows a simple example of a page definition, in which we show
a list of authors to the end-user. The output element uses the data type of the
object (defined in the data model) to determine what HTML elements are
needed to display the data element. The left image in Figure 11.4 shows how
the page may be rendered.
11.3.3 Actions
Apart from displaying information, it may also be desired to implement oper-
ations that modify information, as shown in Figure 11.5. These operations are
defined as actions described in an imperative sub language. The right image
in Figure 11.4 shows how the page may be rendered:
139 The code fragment defines a page, whose purpose is to edit attributes of
a paper.
140 input elements are used to connect data from variables to entity elements.
141 Actions can be embedded in pages with a form block and attached to a
button or a link. In this case, we define a submit button with the “Save”
label, attached to a save() action.
142 This code block defines the save() action, which stores the values in the
form in the database and redirects the user to a page that views the
paper.
11.4 DEPLOYING WEBDSL APPLICATIONS
In order to deploy a WebDSL application, we must install the WebDSL com-
piler and its dependencies, then generate and build a Java web application
from the WebDSL application and finally deploy the infrastructure compo-
nents required to run a WebDSL application.
11.4.1 Deploying the WebDSL compiler
The WebDSL compiler is implemented by using the Syntax Definition Formal-
ism (SDF2) [Visser, 1997], the Stratego program transformation language [Bra-
venboer et al., 2008], Java-front [Bravenboer and Visser, 2004] which can be
used in conjunction with Stratego to generate or transform Java code and
the ATerm (Annotated Terms) library [Van den Brand et al., 2000], used to
exchange tree-like data structures in a standard and efficient manner. De-
ploying the WebDSL compiler can be done by various means. It can be de-
ployed as a stand-alone command-line compiler tool as well as part of the
WebDSL Eclipse plugin, which provides an integrated IDE for WebDSL using
Spoofax [Kats and Visser, 2010]. Typically, in order to use it properly with
tools in our reference architecture, the command-line version must be used
and all the components must be implemented as Nix packages and deployed
from source code. The WebDSL compiler and its dependencies can be built
using the standard autotools [MacKenzie, 2006] build procedure, i.e. ./configure;
make; make install, in a straightforward manner.
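To give an impression of what this looks like in Nix terms, the following sketch packages an autotools-based component such as the WebDSL compiler; it is not the actual expression from Nixpkgs, and the dependency names, version, URL and hash are hypothetical placeholders:

{ stdenv, fetchurl, strategoxt, aterm }:   # hypothetical dependency names

stdenv.mkDerivation {
  name = "webdsl-1.0.0";                   # hypothetical version

  # Hypothetical download location of a source tarball; the hash is a placeholder
  src = fetchurl {
    url = "http://example.org/webdsl-1.0.0.tar.gz";
    sha256 = "0000000000000000000000000000000000000000000000000000";
  };

  buildInputs = [ strategoxt aterm ];

  # No build phases are specified: stdenv.mkDerivation runs the standard
  # autotools procedure (./configure --prefix=$out; make; make install) by default.
}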
11.4.2 Building a WebDSL application
In order to be able to use a WebDSL application, it must be compiled to a
Java web application. First, the WebDSL application must be configured to
contain deployment settings in a configuration file called application.ini, which
contains key = value pairs describing various deployment properties, such
as the JNDI resource identifier of the database connection and the address of
the SMTP server. Apart from generating Java code from WebDSL application
code, the Java code must also be compiled and packaged in a Web Application
Archive (WAR). In order to compile and package a Java web application, the
Java Development Kit must be installed, as well as Apache Ant
(http://ant.apache.org), which is used to run the Java build processes.
11.4.3 Deploying WebDSL infrastructure components
After the Java web application has been built, it must be deployed in an en-
vironment providing the required infrastructure components. In order to run
a WebDSL application, we must install the Apache Tomcat application server,
the MySQL database server and an SMTP server, if the e-mail functionality is
used. For efficiency, a native Apache web server may be used to serve static
content of web applications. Furthermore, in order to allow an application to
have better performance, several configuration settings must be adapted, such
as the maximum heap memory usage.
The infrastructure components can be deployed on a single machine, but
may also be distributed across multiple machines. For example, the database
server may be deployed on a different machine than the Apache Tomcat web ap-
plication. Also, multiple redundant instances of the application and database
servers may be deployed to provide load balancing.
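To give an impression of what such an environment amounts to in NixOS terms, the following hand-written sketch enables the required infrastructure services on a single machine; it is an illustration rather than what webdsldeploy generates, the option names are assumed to match the standard NixOS modules of that era, and the administrator address is a placeholder:

{ pkgs, ... }:

{
  # Apache Tomcat hosting the generated web applications, with the MySQL
  # JDBC driver as a common library (see also Figure 11.11)
  services.tomcat.enable = true;
  services.tomcat.commonLibs =
    [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];

  # MySQL database server storing the data model entities
  services.mysql.enable = true;

  # Apache web server serving static content in front of Tomcat
  services.httpd.enable = true;
  services.httpd.adminAddr = "admin@example.org";   # placeholder address
}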
11.5 WEBDSLDEPLOY
To provide automatic, reliable and reproducible deployment of WebDSL ap-
plications from a high-level deployment specification, we have developed
webdsldeploy, a Nix function that produces a NixOS logical network config-
uration from a WebDSL deployment specification. As we have described in
Chapter 6, the NixOS network specification is used in conjunction with a
physical network specification to deploy a network of NixOS machines and
can also be used to generate a network of virtual machines, which can be
cheaply instantiated for testing, as we have described in Chapter 7. In this sec-
tion, we show a number of examples ranging from very simple specifications
for single machines to more complicated expressions specifying properties of
the underlying distributed infrastructure.
11.5.1 Single machine deployment
Figure 11.6 shows a simple use case of the WebDSL deployment function,
whose purpose is to deploy two WebDSL applications, including all their de-
pendencies and infrastructure components, on a single machine. This is the
most common use case of the WebDSL deployment function.
143 This import statement invokes the WebDSL deployment function, which gener-
ates a logical network specification.
144
The applications parameter is a list of attribute sets specifying the Web-
DSL applications that must be deployed. In our example, we deploy
import <webdsldeploy/generate-network.nix> {          143
  applications = [                                     144
    { name = "yellowgrass";
      src = /home/sander/yellowgrass;
      rootapp = true;
    }
    { name = "webdslorg";
      src = /home/sander/webdslorg;
    }
  ];
  databasePassword = "admin";                          145
  adminAddr = "s.vanderburg@tudelft.nl";               146
}
Figure 11.6 single.nix: A WebDSL deployment scenario on a single machine
two WebDSL applications named Yellowgrass and webdsl.org. Each at-
tribute set takes a number of common attributes. The name specifies
the name of the application, the src attribute specifies where the source
code of the WebDSL application can be found (it can refer to a location
on the filesystem, or be bound to a function call, such as fetchsvn, to obtain
it from a Subversion repository; a sketch of the latter follows after this
list). The rootapp attribute is optional and defaults to false. If it is set to
true then the resulting web application is called ROOT and can be accessed
from the root directory of a particular virtual host configured on Apache
Tomcat. There can be only
one ROOT application per virtual host.
145 WebDSL applications require a password to connect to a database. The
databasePassword parameter sets the password.
146 The webserver proxies require an administrator e-mail address, which
can be configured by setting the adminAddr parameter.
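As an illustration of binding the src attribute to a function call instead of a local path, the following sketch is a variant of Figure 11.6 that uses fetchsvn; the repository URL, revision and hash are hypothetical placeholders, and importing Nixpkgs to obtain fetchsvn is our own assumption about the wiring:

let
  pkgs = import <nixpkgs> {};
in
import <webdsldeploy/generate-network.nix> {
  applications = [
    { name = "yellowgrass";
      # Obtain the source from a (hypothetical) Subversion repository
      src = pkgs.fetchsvn {
        url = "https://svn.example.org/yellowgrass/trunk";   # hypothetical repository
        rev = 1234;                                          # hypothetical revision
        sha256 = "0000000000000000000000000000000000000000000000000000";  # placeholder
      };
      rootapp = true;
    }
  ];
  databasePassword = "admin";
  adminAddr = "s.vanderburg@tudelft.nl";
}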
11.5.2 Distributing infrastructure components
Figure 11.7 shows a more complicated deployment expression, in which the
infrastructure components are divided across two machines:
147 If the distribution parameter is specified, then the infrastructure compo-
nents can be divided across machines in the network.
148 The machine test1 is configured to host the Apache Tomcat and Apache
webserver proxies.
149 The machine test2 is configured to host the MySQL database server.
11.5.3 Implementing load-balancing
Figure 11.8 shows an even more complicated example with three machines,
in which the first one serves as a reverse proxy server redirecting incoming
import <webdsldeploy/generate-network.nix> {
  applications = [
    { name = "yellowgrass";
      src = /home/sander/yellowgrass;
      rootapp = true;
    }
  ];
  databasePassword = "admin";
  adminAddr = "s.vanderburg@tudelft.nl";

  distribution = {                          147
    test1 = {                               148
      tomcat = true;
      httpd = true;
    };
    test2 = {                               149
      mysql = true;
    };
  };
}
Figure 11.7 distributed.nix: A distributed WebDSL deployment scenario in which the
database is deployed on a separate machine
import <webdsldeploy/generate-network.nix> {
  applications = [
    { name = "yellowgrass";
      src = /home/sander/yellowgrass;
      rootapp = true;
    }
  ];
  databasePassword = "admin";
  adminAddr = "s.vanderburg@tudelft.nl";

  distribution = {
    test0 = {                               150
      proxy = true;
    };
    test1 = {                               151
      tomcat = true;
      httpd = true;
      mysqlMaster = true;
    };
    test2 = {                               152
      tomcat = true;
      httpd = true;
      mysqlSlave = 2;
    };
  };
}
Figure 11.8 loadbalancing.nix: A WebDSL deployment scenario in which multiple
application servers and database servers are used
calls to the other web servers. The other two machines are both redundant
application, web and database servers:
150 The machine test0 is configured to run a reverse proxy server and serves
as the entry point to end-users. The reverse proxy server forwards in-
coming connections to machines hosting a webserver (httpd component)
using the round-robin scheduling method. In our example, connections
are forwarded to test1 or test2.
151 The machine test1 is configured to run an Apache Tomcat server and an
Apache webserver proxy. Furthermore, it hosts a MySQL master server,
to which all database write actions are redirected.
152 The machine test2 is configured to use the same infrastructure compo-
nents as the test1 machine. Instead of a MySQL master server, it hosts
a MySQL read slave server, which is regularly synchronised with the
master server. Furthermore, database read operations from a WebDSL
application are forwarded to any of the MySQL servers. Each MySQL
read slave requires a unique id, which starts with 2 (1 is assigned to the
master server).
11.5.4 Usage
There are two ways to use the WebDSL deployment function. The first use
case is to build a virtual network of machines, which can be used to easily
test a WebDSL application in an environment closely resembling the produc-
tion environment. The following example generates and runs a single virtual
machine defined in Figure 11.6:
$ nixos-build-vms single.nix
$ ./result/bin/nixos-run-vms
Another use case is to use charon and a physical network specification to
automatically deploy a WebDSL application in a network of machines. The
following example deploys the load balancing example, shown in Figure 11.8,
in a network of Amazon EC2 virtual machines:
$ charon create loadbalancing.nix ec2.nix
$ charon deploy
11.6 IMPLEMENTATION
In order to generate a logical network specification from a WebDSL specifica-
tion, we must derive how to build and configure a WebDSL application and
the NixOS configuration settings for each machine in the network.
11.6.1 Building and configuring a WebDSL application
Figure 11.9 shows how the WebDSL application build function is implemented:
153 This expression is essentially a nested function. The outer function is
used to compose the inner function by providing the required WebDSL
212
{stdenv, webdsl, apacheAnt}:                                                153
{ name, src, dbserver, dbname, dbuser, dbpassword, dbmode ? "update"
, rootapp ? false, replication ? false }:                                   154

let
  war = stdenv.mkDerivation {                                               155
    inherit name src;
    buildInputs = [ webdsl apacheAnt ];

    buildPhase = ''                                                         156
      ( echo "appname=${name}"
        echo "db=jndi"
        echo "dbmode=${dbmode}"
        echo "dbjndipath=java:/comp/env/jdbc/${dbname}"
        echo "indexdir=/var/tomcat/temp/indexes/${name}"
        echo "rootapp=${if rootapp then "true" else "false"}"
      ) > application.ini

      webdsl war
    '';

    installPhase = ''                                                       157
      mkdir -p $out/webapps
      cp .servletapp/*.war $out/webapps
    '';
  };

  driverClassName = if replication
    then "com.mysql.jdbc.ReplicationDriver" else "com.mysql.jdbc.Driver";   158

  replicationParameter = if replication
    then "&amp;roundRobinLoadBalance=true"
    else "";                                                                159
in
stdenv.mkDerivation {                                                       160
  inherit name;

  buildCommand = ''
    mkdir -p $out/conf/Catalina
    cat > $out/conf/Catalina/${if rootapp then "ROOT" else name}.xml <<EOF
    <Context>
      <Resource name="jdbc/${dbname}" auth="Container" type="javax.sql.DataSource"
        driverClassName="${driverClassName}"
        username="${dbuser}" password="${dbpassword}"
        url="jdbc:mysql://${dbserver}:3306/${dbname}?${replicationParameter}" />
    </Context>
    EOF

    mkdir -p $out/webapps
    ln -s ${war}/webapps/*.war $out/webapps
  '';
}
Figure 11.9 webdslbuild.nix: Nix function for building a WebDSL application
build-time dependencies, such as the standard UNIX environment, the
WebDSL compiler and Apache Ant. Typically, this expression is first
composed by calling it with the right arguments, as we have seen earlier
in Section 2.2.3.
154 The inner function defines the actual build function which should be
invoked by users to build WebDSL applications. The inner function
header takes a number of arguments. The name argument corresponds
to the name of the WebDSL application, the src argument refers to the
location of the source code of the WebDSL application. The inner func-
tion also takes a number of database-related arguments, the rootapp
argument, which defines whether the application should be accessible
from the root path in a virtual host, and the replication parameter, which
indicates whether a master-slave MySQL replication infrastructure should be
used.
155 This local attribute is bound to a build result that produces a Web Ap-
plication Archive (WAR) from WebDSL application source code.
156 The build phase procedure generates an application.ini file from the func-
tion arguments defined at 154 . Then it invokes the webdsl war command-
line instruction, which generates Java code, compiles the Java code and
assembles a web application archive.
157 The install phase installs the generated WAR file in the webapps/ direc-
tory of the resulting Nix package.
158 The local attribute driverClassName specifies the name of the JDBC driver
that the generated WebDSL application should use. If no master-slave
configuration is used then the ordinary com.mysql.jdbc.Driver is used. If
the replication option is enabled then the more advanced
com.mysql.jdbc.ReplicationDriver is used.
159 If the master-slave replication feature is enabled the JDBC connection
must be configured to forward read operations to the read slaves using
the round-robin scheduling method.
160 In the actual body of the function, we generate a wrapper component
consisting of a symlink to the WAR file we have generated at 155 and a
context XML configuration file. As we have seen earlier in Chapter 4,
this configuration file is used by Apache Tomcat, to configure various
application resources, such as database connections. In our example,
we generate a JDBC resource for the settings provided by the function
arguments, including the authentication and replication settings.
11.6.2 Generating a logical network specification
Figure 11.10 contains a simplified code fragment written in the Nix expression
language, showing how the network generator is implemented:
161 The code fragment is a function taking several arguments. The pkgs
argument refers to the Nixpkgs collection and imports the default col-
lection if it is not specified. The applications parameter is, as we have
seen earlier in Figure 11.6, a list of attribute sets defining several prop-
erties of WebDSL applications. The distribution parameter specifies, as
we have seen earlier in Figure 11.7, which infrastructure components
should be deployed to which machine. If the parameter is omitted, it
assigns all the infrastructure components to a single machine called test.
{ pkgs ? import <nixpkgs> {}, applications
, distribution ? { test = { tomcat = true; httpd = true; mysql = true; ... }; }
, databasePassword, adminAddr
}:                                                                 161

with pkgs.lib;

let
  webdslBuild = import <webdsldeploy/webdslbuild.nix> {            162
    inherit (pkgs) stdenv webdsl apacheAnt;
  };

  webapps = map (application:                                      163
    webdslBuild {
      inherit (application) name src;
      rootapp = if application ? rootapp then application.rootapp else false;
      dbname = if application ? databaseName
        then application.databaseName
        else application.name;
      ...
    }
  ) applications;
in
mapAttrs (targetName: options:                                     164
  {pkgs, ...}:

  {
    require = [ <webdsldeploy/modules/webdsltomcat.nix> ... ];     165
  } //
  optionalAttrs (options ? tomcat && options.tomcat) {             166
    webdsltomcat.enable = true;
    webdsltomcat.webapps = webapps;
    ...
  } //
  optionalAttrs (options ? mysql && options.mysql) {               167
    webdslmysql.enable = true;
    webdslmysql.databasePassword = databasePassword;
    ...
  } //
  optionalAttrs (options ? httpd && options.httpd) {               168
    webdslhttpd.enable = true;
    webdslhttpd.adminAddr = adminAddr;
    ...
  } //
  ...
) distribution
Figure 11.10 generate-network.nix: Simplified structure of the WebDSL deployment
function
162 Here the webdslBuild attribute is bound to a statement that imports and
composes the webdslBuild function, which we have shown earlier in Fig-
ure 11.9.
163 Here, we iterate over every application and we invoke the webdslBuild
function, with the required parameters. The result of the map operation
is a list of derivations that produce the WAR files.
164 Here, we iterate over all attributes in the distribution function parameter.
For each attribute, we generate a NixOS configuration that captures all
required infrastructure properties. The result is a logical network speci-
fication as we have shown in Chapter 6.
165 The require attribute is used by the NixOS module system to include
arbitrary NixOS modules. Here, we import all the custom NixOS mod-
ules that capture WebDSL specific aspects, such as webdsltomcat, webd-
slmysql, and webdslhttpd.
166 This section sets the Apache Tomcat properties of the WebDSL environ-
ment if the tomcat attribute of the distribution element is defined and set
to true. Later in Figure 11.11, we show how the related NixOS module is
defined.
167 This section sets the MySQL properties of the WebDSL environment if
the mysql attribute of the distribution element is defined and set to true.
168 This section sets the Apache webserver properties of the WebDSL envi-
ronment if the httpd attribute of the distribution element is defined and
set to true.
The actual implementation of the function is a bit more complicated and
solves many more problems. For example, to properly support MySQL master-
slave replication, we have to generate a connection string that consists of the
hostname of the master server, followed by the slave hostnames each sepa-
rated by a comma. In order to generate such a string, we have to search for
the right attributes in the distribution. A similar approach is required to config-
ure the reverse proxy server to support web server load balancing.
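The following sketch gives a rough impression of how such a connection string could be derived from the distribution attribute set of Figure 11.8 with the pkgs.lib functions; it is an illustration, not the actual webdsldeploy implementation:

{ pkgs ? import <nixpkgs> {} }:

with pkgs.lib;

let
  # The distribution of Figure 11.8, restricted to the MySQL roles
  distribution = {
    test1 = { mysqlMaster = true; };
    test2 = { mysqlSlave = 2; };
  };

  # Hostnames of the machines providing the master and slave roles
  masterHosts = attrNames (filterAttrs (name: options:
    options ? mysqlMaster && options.mysqlMaster) distribution);
  slaveHosts = attrNames (filterAttrs (name: options:
    options ? mysqlSlave) distribution);
in
# Master first, then the slaves; evaluates to "test1,test2" for this example
concatStringsSep "," (masterHosts ++ slaveHosts)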
11.6.3 Configuring WebDSL infrastructure aspects
The last missing piece in the puzzle is how various infrastructure aspects
are configured. In Figure 11.10 we configure properties of various WebDSL
NixOS modules. Figure 11.11 shows the NixOS module that captures Apache
Tomcat specific aspects:
169 This section defines all the options of the Apache Tomcat that hosts com-
piled WebDSL applications. As we have seen in Figure 11.10, we have to
decide whether to enable the module and what compiled WebDSL web
applications it should host.
170 This section defines the relevant Apache Tomcat NixOS configuration
settings (Apache Tomcat is a module included with NixOS), if the mod-
ule is enabled. Here, we enable the Apache Tomcat NixOS module and
we set the web applications that it should host. The WebDSL NixOS
module properties are identical to the ones of the Apache Tomcat NixOS
module.
171 This section defines a collection of libraries that are shared between the
Servlet container and the web applications. As mentioned earlier, all
{config, pkgs, ...}:

with pkgs.lib;

let
  cfg = config.webdsltomcat;
in
{
  options = {                                                               169
    webdsltomcat = {
      enable = mkOption {
        description = "Whether to enable the WebDSL tomcat configuration";
        default = false;
      };
      webapps = mkOption {
        description = "WebDSL WAR files to host";
        default = [];
      };
    };
  };

  config = mkIf cfg.enable {                                                170
    services.tomcat.enable = true;
    services.tomcat.webapps = cfg.webapps;
    services.tomcat.commonLibs =
      [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];         171
  };
}
Figure 11.11 webdsltomcat.nix: NixOS module capturing WebDSL related Apache
Tomcat configuration aspects
WebDSL applications require a JDBC connection to a MySQL database.
Therefore, we have to include the MySQL JDBC driver.
11.7 EXPERIENCE
We have used the WebDSL deployment function for a number of WebDSL
applications, such as Yellowgrass (http://yellowgrass.org), webdsl.org
(http://webdsl.org) and researchr (http://researchr.org). We were able
to deploy complete WebDSL environments in physical networks, VirtualBox
virtual machines, Amazon EC2 virtual machines and QEMU virtual machines
(with the NixOS test driver). We can use the webdsldeploy function both with
charon to actually deploy machines and with the NixOS test driver to run them
non-interactively and to automatically perform test cases.
The webdsldeploy function also supports a number of interesting upgrade
operations. For example, by adding redundant MySQL slave, application server
or web server components to machines in the distribution (or removing them), a
WebDSL application can be scaled up or scaled down with almost no down-
time (because most operations performed by Nix do not affect running
systems, as we have explained in previous chapters).
11.8 DISCUSSION
In this chapter, we have constructed a domain-specific deployment tool us-
ing domain-specific deployment expressions. We can transform a domain-
specific deployment expression into a logical network specification, by using
a number of Nix utility functions and the NixOS module system to capture
infrastructure component properties relevant to the domain.
Although the webdsldeploy function provides several interesting features, it
also has a number of limitations. The major limitation is that it is not capable
of migrating state. For example, if a user or developer moves a database
from one machine to another, then the database contents is not automatically
migrated. Although we have developed a solution for state in Chapter 9,
the cost of supporting mutable components may be high and require further
investigation. The size of the researchr database is (at the time writing this
thesis) 60 GiB, which is too costly to write entirely to the filesystem for each
database change. As a practical issue, Dysnomia has not been integrated in
NixOS yet.
WebDSL data models may also evolve. In some cases, the underlying
database may no longer be compatible with newer versions of a WebDSL
application. To deal with evolving data models, a separate prototype tool
has been developed, called Acoda [Vermolen et al., 2011]. In Acoda, a devel-
oper can specify the evolution of the original data model in a domain-specific
sub language (which is intended to be integrated in WebDSL) and migrate
the data to the new data model automatically. However, Acoda has not been
integrated in the webdsldeploy function yet. Furthermore, in order to support
Acoda properly, supporting state migrations is an essential precondition.
The webdsldeploy function only uses the infrastructure deployment tools
from the reference architecture for distributed software deployment. It is also
possible to utilise service deployment aspects, for example, to use Disnix to
deploy the databases and web applications. However, the application archi-
tecture of a WebDSL application is relatively simple (it only consists of a web
application and a database) and therefore it is not really beneficial to imple-
ment a service deployment layer. That said, there is ongoing work on
implementing web service support in WebDSL, which may significantly in-
crease the deployment complexity of WebDSL applications. In the future, we
may extend webdsldeploy to utilise Disnix to deploy all service components.
11.9 CONCLUSION
In this chapter, we have described webdsldeploy, a domain-specific deployment
function for WebDSL applications. This function uses a high-level deployment
expression from which a logical network specification is derived, which can
be used by charon to deploy a network of machines and by the NixOS test
driver to launch a network of efficient virtual machines. A major limitation
of the tool is that state migration is not supported. As a consequence, for
some types of upgrades the database contents are not migrated.
12
Supporting Impure Platforms: A .NET Case
Study
ABSTRACT
The tools in our reference architecture offer various benefits, such as relia-
bility and reproducibility guarantees, which are not always trivial to achieve
with conventional deployment tools. However, in order to use any of these
tools properly, we have made several assumptions about the system that is being
deployed, such as the fact that every component can be properly isolated. In
practice, we may want to use these tools to deploy large existing codebases,
which were typically deployed by other means and in impure environments.
In these situations, adopting Nix or any Nix application is a complicated pro-
cess with a number of technical challenges. In this chapter, we explain how
we have implemented deployment support in Nix and Disnix for .NET soft-
ware on Microsoft Windows platforms, so that we can automate deployment
processes of service-oriented systems utilising .NET technology. We have con-
ducted these experiments at Philips Healthcare.
12.1 INTRODUCTION
In the previous chapters, we have described the components belonging to the
reference architecture for distributed software deployment and we have ex-
plained that they offer a number of distinct advantages compared to conven-
tional deployment tools. Moreover, we have shown how we can turn the refer-
ence architecture for distributed software deployment into a domain-specific
deployment tool for WebDSL applications.
We may also want to apply any of these tools to large code bases, which
have previously been deployed by other means, in impure environments and
possibly in an ad-hoc manner. In these situations, adopting Nix or any Nix
application to automate the entire deployment process is complicated, as
certain facilities must be implemented and a number
of technical challenges must be overcome, such as changes in the architecture
of the system that must be deployed.
In our research project, we have collaborated with Philips Healthcare, who
develop and maintain a medical imaging platform implemented using .NET
technology on the Microsoft Windows operating system. In Microsoft Win-
dows environments, many software packages are proprietary and typically
deployed by using tools implementing imperative deployment concepts. These
packages often cannot be isolated and they have implicit assumptions about the
environment (such as certain locations on the file systems), for which we have
to implement a number of workarounds.
In this chapter, we describe how we have implemented .NET deployment
support in Nix and Disnix (more technical details can be found in the blog
posts [van der Burg, 2011b,c,d]).
12.2 MOTIVATION
There are a number of reasons why we want to implement .NET deployment
support in Nix. One of the reasons is that installing or upgrading .NET ap-
plications with Nix offers the same deployment benefits that Nix provides for
other software, such as being
able to store multiple versions/variants next to each other, dependency com-
pleteness, atomic upgrades and rollbacks and a garbage collector which safely
removes components no longer in use.
Another reason is that we can utilise the applications built around Nix,
such as Hydra, the continuous build and integration server, to build and test
.NET applications in various environments (including their environmental depen-
dencies), and Disnix, to manage the deployment of service-oriented applica-
tions and web applications developed using .NET technology in a network of
machines.
not developed for a particular component technology). Being able to support
.NET applications is a useful addition. In this chapter, we investigate what
means are necessary to support automatic deployment using tools in our ref-
erence architecture.
12.3 GLOBAL ASSEMBLY CACHE (GAC)
Traditional .NET applications reference library assemblies from a central cache
called the Global Assembly Cache (GAC) [Microsoft, 2010a] designed as a so-
lution to counter the well-known DLL hell [Pfeiffer, 1998]. Its purpose is
comparable with the Nix store. Although the GAC solves several common
deployment issues with Windows applications, it has a number of drawbacks
compared to the Nix store:
It only provides isolation for library assemblies. Other components such
as executables, compilers, configuration files, or native libraries are not
supported.
A library assembly must have a strong name, which gives a library a
unique name. A strong name is composed of several attributes, such as
a name, version number and culture. Furthermore, the library assembly
is signed with a public/private key pair.
Creating a strong-named assembly is in many cases painful. A devel-
oper must take care that the combination of attributes is always unique.
For example, for a new release the version number must be increased.
Because developers have to take care of this, people typically do not use
strong names for internal release cycles, because it is too much work.
Creating a strong-named assembly may go wrong. A developer may forget
to update any of these strong name attributes, which makes it possible
to create a different assembly with the same strong name; in that case,
isolation in the GAC cannot be provided.
In contrast to the GAC, you can store any type of component in the Nix
store, such as executables, configuration files, compilers etc. Furthermore,
the Nix store uses hash codes derived from all build-time dependencies of a
component, which always provides unique component file names.
12.4 DEPLOYING .NET APPLICATIONS WITH NIX
In order to be able to deploy software components implemented in .NET tech-
nology with the Nix package manager, we must provide build-time support,
consisting of a function that builds components and is capable of finding
the required build-time dependencies, and we must implement runtime
support so that the produced executables in the Nix store are capable of find-
ing their required runtime dependencies.
12.4.1 Using Nix on Windows
As we have explained earlier, the Nix package manager is supported on various
operating systems, including Microsoft Windows through Cygwin. Cygwin
acts as a UNIX compatibility layer on top of the Win32 API.
12.4.2 Build-time support
Typically, software systems implemented with .NET technology are compo-
nentised by dividing a system into a collection of Visual Studio projects (a
Visual Studio project is not necessarily a component in a strict sense and would
only map to Szyperski's definition [Szyperski et al., 2002] if developers take a
number of important constraints into account). Visual
Studio projects can be built from the Visual Studio IDE or non-interactively
by running the MSBuild command-line tool.
Invoking the .NET framework
The MSBuild tool, which is used to build Visual Studio projects non-interactively,
is included with the .NET framework. The .NET framework is tightly in-
tegrated into the Windows operating system and also requires several reg-
istry settings to be configured. For these reasons, we cannot isolate the .NET
framework in the Nix store in a pure manner, as it has dependencies on com-
ponents that reside outside the Nix store and cannot be reproduced purely.
² A Visual Studio project is not necessarily a component in a strict sense and would only
map to Szyperski's definition [Szyperski et al., 2002] if developers take a number of important
constraints into account.
{stdenv, dotnetfx}:                                                172

{ name, src, slnFile, targets ? "ReBuild"
, options ? "/p:Configuration=Debug;Platform=Win32"
, assemblyInputs ? []                                              173
}:

stdenv.mkDerivation {                                              174
  inherit name src;                                                175
  buildInputs = [ dotnetfx ];                                      176
  installPhase = ''                                                177
    for i in ${toString assemblyInputs}; do
      windowsPath=$(cygpath --windows $i)
      AssemblySearchPaths="$AssemblySearchPaths;$windowsPath"
    done
    export AssemblySearchPaths

    ensureDir $out
    outPath=$(cygpath --windows $out)\\
    MSBuild.exe ${slnFile} /nologo /t:${targets} \
      /p:OutputPath=$outPath ${options} ...
  '';
}
Figure 12.1 A Nix function for building a Visual Studio C# project
To make the MSBuild tool available during the build phase of a Nix compo-
nent, we have created a proxy component, which is a Nix package containing
a symlink to a particular MSBuild.exe executable hosted on the host system
outside the Nix store. Every variant of the .NET framework is stored in a
separate directory on the host filesystem and has its own version of MSBuild.
For example, by creating a symlink to
/cygdrive/c/WINDOWS/Microsoft.NET/Framework/v4.0.30319/MSBuild.exe, we can imple-
ment a proxy component for the .NET framework version 4.0.
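To make this concrete, the sketch below shows what such a proxy component could look like as a Nix expression. It is a minimal sketch, assuming that Cygwin mounts the Windows system drive at /cygdrive/c and that the .NET framework 4.0 resides in the directory mentioned above; the attribute and derivation names are chosen for illustration only, and the actual proxy component in our implementation may differ in details.

{ stdenv }:

stdenv.mkDerivation {
  name = "dotnetfx-4.0.30319";

  buildCommand = ''
    ensureDir $out/bin

    # The symlink deliberately points outside the Nix store, to the MSBuild.exe
    # of the .NET framework installed on the host system. The .NET framework
    # itself is therefore not part of the closure of packages using this proxy.
    ln -s /cygdrive/c/WINDOWS/Microsoft.NET/Framework/v4.0.30319/MSBuild.exe \
      $out/bin/MSBuild.exe
  '';
}

Adding such a proxy component to the buildInputs of a derivation is enough to put MSBuild.exe in the PATH of a builder, while keeping the impurity explicit and confined to a single package.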
The major obstacle in using a proxy component is that dependencies are not
automatically included in the closure of a package, as the actual dependencies
reside on the host system, and must be installed by other means. For example,
if we want to copy a closure from one machine to another, then we must
manually ensure that the required .NET framework is present on the target
machine.
Another weakness of proxy components is that we cannot always guaran-
tee proper isolation. Fortunately, the .NET framework is stored in a sepa-
rate folder instead of a global namespace such as C:\WINDOWS\System32. The
name and version number in the path name ensure that the .NET framework
is isolated, although this is a nominal dependency scheme that does not take
all relevant build attributes into account, such as the system architecture.
Implementing a Nix build function
Figure 12.1 shows how we have implemented a function capable of building
Visual Studio C# projects:
172 This expression is essentially a nested function. The outer function is
used to compose the inner function by providing the required .NET
build-time dependencies, such as the standard UNIX environment (stdenv)
and the .NET framework (dotnetfx). Typically, this expression is first com-
posed by calling it with the right arguments, as we have seen earlier in
Section 2.2.3.
173 The inner function defines the actual build function that should be
invoked by users to build .NET applications. The inner function header
takes a number of arguments. The name argument corresponds to the
name of the resulting component that ends up in the Nix store path,
the src argument refers to the location of the source code of the .NET
application, the slnFile argument refers to a Visual Studio solution file
that defines how to build the project, the targets argument specifies which
build targets have to be processed (defaults to ReBuild), the options argu-
ment specifies additional build parameters for MSBuild, and the assemblyInputs
argument is a list of derivations containing library assemblies (which
themselves may be produced by other Nix derivation functions) that
should be found at build time.
174 In the body of the inner function, we invoke stdenv.mkDerivation, which,
as we have seen many times earlier, builds a package from source code
and a number of given dependencies.
175 Here, we pass the given name and src attributes to the builder function,
which takes care of unpacking the source code.
176 The buildInputs parameter specifies all build-time dependencies for the
standard builder. Here, we pass dotnetfx as a build input, so that the
MSBuild tool can be found.
177 The installPhase defines all the installation steps. Here, we compose the
AssemblySearchPaths environment variable, which is a semicolon-separated
string of path names in which the MSBuild tool looks for its build-
time dependencies. Because Nix is using Cygwin, the path names are
in UNIX format with forward slashes. We use the cygpath command to
convert them to Windows path names using backslashes. After compos-
ing the environment variable, we create the output directory in the Nix
store and we run MSBuild to build the Visual Studio project. The MSBuild
tool has been instructed to store its output in the Nix store.
12.4.3 Run-time support
Apart from building .NET applications, we must also be able to run them
from the Nix store. As we have explained earlier, native ELF executables use
the RPATH header, which contains the Nix store paths of all library dependencies.
Furthermore, for other types of components, such as Java, Perl or Python
programs, we can use environment variables, such as CLASSPATH, PERL5LIB
and PYTHONPATH, to specify where run-time dependencies can be found. Nix
provides several convenience functions that wrap programs in shell scripts
that automatically set these environment variables.
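As an illustration, the sketch below shows how such a wrapper could be generated for a hypothetical Java program by means of the makeWrapper convenience function from Nixpkgs; the package name, the jar files and the somelibrary dependency are assumptions made purely for this example.

{ stdenv, makeWrapper, jre, somelibrary }:

stdenv.mkDerivation {
  name = "myjavaprogram-1.0";
  src = ./myjavaprogram;

  buildInputs = [ makeWrapper ];

  installPhase = ''
    ensureDir $out/share/java $out/bin
    cp *.jar $out/share/java

    # Generate $out/bin/myjavaprogram: a shell script that extends CLASSPATH
    # with the Nix store path of the required library and then invokes java
    # on the program's jar file.
    makeWrapper ${jre}/bin/java $out/bin/myjavaprogram \
      --add-flags "-jar $out/share/java/myjavaprogram.jar" \
      --prefix CLASSPATH : "${somelibrary}/share/java/somelibrary.jar"
  '';
}

The generated script simply sets the environment variable and then executes the real program.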
Unfortunately, the .NET runtime does not have such a facility, for example an
environment variable, to search for run-time dependencies³. The .NET runtime
locates run-time dependencies roughly as follows [Microsoft, 2010b]:
First, the .NET runtime tries to determine the correct version of the as-
sembly (only for strong-named assemblies).
If the strong-named assembly has been bound in memory before, that
version will be used.
If the assembly is not already in memory, it checks the Global Assembly
Cache.
If all the previous steps fail, it probes for the assembly, by looking in a
configuration file or by using probing heuristics. The probing heuristics
search the base directory of the executable or a number of specified
subdirectories. Only strong-named assemblies can be probed from arbitrary
filesystem locations and remote URLs; however, they are always cached
in the Global Assembly Cache.
Because Nix stores all components, including library assemblies, in
unique folders in the Nix store, this poses some challenges. We cannot use the
Global Assembly Cache (GAC), as it is an impurity in the Nix deployment
process. Hence, if an executable is started from the Nix store, the required
libraries cannot be found, because the probing heuristics only look for libraries
in the same base directory as the executable, where they are never present.
Implementing a wrapper for .NET applications
To allow a .NET executable assembly to find its run-time dependencies, we
have implemented a different and more complicated wrapper. The
wrapper is not a script, but a separate executable assembly that implements
an event handler, which the .NET runtime invokes when a run-time depen-
dency cannot be found, and which uses the reflection API to load library assemblies
from arbitrary Nix store paths. We have based this wrapper on the information
provided by the following Knowledge Base article: [Microsoft Support, 2007].
Figure 12.2 shows the layout of the wrapper class, which dynamically loads
all required runtime dependencies and starts the Main() method of the main
executable:
178 The AssemblySearchPaths field defines an array of strings containing the Nix
store paths of all library assemblies of the executable.
179 The ExePath field refers to the actual executable assembly containing the
Main() method.
³ The Mono runtime (http://mono-project.com), an alternative .NET implementation, does
support this by means of the MONO_PATH environment variable. However, Mono does not
provide compatibility for all Microsoft APIs.
using System;
using System.Reflection;
using System.IO;
namespace HelloWorldWrapper
{
class HelloWorldWrapper
{
private String[] AssemblySearchPaths = {
@"C:\cygwin\nix\store\23ga...-ALibrary",
@"C:\cygwin\nix\store\833p...-BLibrary"
}; 178
private String ExePath =
@"C:\cygwin\nix\store\27f2...-Executable\Executable.exe"; 179
private String MainClassName =
"SomeExecutable.Executable"; 180
static void Main(string[] args) 181
{
new HelloWorldWrapper(args);
}
public HelloWorldWrapper(string[] args) 182
{
...
}
private Assembly MyResolveEventHandler(object sender,
ResolveEventArgs args) 183
{
...
}
}
}
Figure 12.2 The layout of the wrapper class written in C# code
180 The MainClassName field contains the name of the actual class containing
the Main() method.
181 This is the Main() method of the wrapper, which creates a new wrapper
instance that sets the assembly resolve event handler and dynamically
loads all the runtime dependencies.
182 This method is the constructor of the HelloWorldWrapper, whose purpose
is to configure the assembly resolve event handler. The implementation
is shown in Figure 12.3.
183 This method defines our custom assembly resolve handler using the
reflection API to dynamically load all required runtime dependencies.
The implementation is shown in Figure 12.4.
Figure 12.3 shows the implementation of the HelloWorldWrapper constructor.
Here, we attach our custom MyResolveEventHandler to the current AppDomain,
then we use the reflection API to load the actual executable assembly, and
finally we invoke its Main() method.
public HelloWorldWrapper(string[] args)
{
// Attach the resolve event handler to the AppDomain
// so that missing library assemblies will be searched
AppDomain currentDomain = AppDomain.CurrentDomain;
currentDomain.AssemblyResolve +=
new ResolveEventHandler(MyResolveEventHandler);
// Dynamically load the executable assembly
Assembly exeAssembly = Assembly.LoadFrom(ExePath);
// Lookup the main class
Type mainClass = exeAssembly.GetType(MainClassName);
// Lookup the main method
MethodInfo mainMethod = mainClass.GetMethod("Main");
// Invoke the main method
mainMethod.Invoke(this, new Object[] {args});
}
Figure 12.3 The implementation of the HelloWorldWrapper constructor
private Assembly MyResolveEventHandler(object sender,
ResolveEventArgs args)
{
// This handler is called only when the common language
// runtime tries to bind to the assembly and fails.
Assembly MyAssembly;
String assemblyPath = "";
String requestedAssemblyName =
args.Name.Substring(0, args.Name.IndexOf(","));
// Search for the right path of the library assembly
foreach(String curAssemblyPath in AssemblySearchPaths)
{
assemblyPath = curAssemblyPath + "/" +
requestedAssemblyName + ".dll";
if(File.Exists(assemblyPath))
break;
}
// Load the assembly from the specified path.
MyAssembly = Assembly.LoadFrom(assemblyPath);
// Return the loaded assembly.
return MyAssembly;
}
Figure 12.4 The implementation of the MyResolveEventHandler method
Figure 12.4 shows the implementation of the MyResolveEventHandler method,
which loads the run-time dependencies through the reflection API. This method
searches all directories defined in AssemblySearchPaths for the presence of the
required library assembly and then loads it using the reflection API.
In this section we have shown a concrete implementation of a wrapper class
for a particular assembly. Normally, this class is generated by a Nix function
called buildWrapper, which takes the name of the executable assembly and the
main class as additional parameters. This function saves end-users the burden
of developing a wrapper class themselves.
There is only one minor caveat, however. The access modifier of the Main()
method in .NET applications is internal by default, which means that we cannot
access it from other assemblies. Therefore, we also have to change the access
modifier of the real Main() method to public.

{dotnetenv, MyAssembly1, MyAssembly2}:                             184

dotnetenv.buildSolution {                                          185
  name = "My.Test.Assembly";
  src = /path/to/source/code;
  slnFile = "Assembly.sln";                                        186
  assemblyInputs = [                                               187
    dotnetenv.assembly20Path
    MyAssembly1
    MyAssembly2
  ];
}
Figure 12.5 Building a .NET library assembly
12.4.4 Usage
Packaging library assemblies
Figure 12.5 shows an example of a Nix function invocation used to build a
Visual Studio project:
184 Similar to ordinary Nix expressions, this expression is also a function
taking several input arguments, such as dotnetenv, which provides the
Visual Studio build function (shown in the previous code fragment) and
the library assemblies which are required to build the project.
185 In the body of the function, we call the buildSolution function that we have
defined earlier in Figure 12.1 with the required parameters.
186 The slnFile parameter refers to the path to the Visual Studio solution file,
which is used to build the component.
187 The assemblyInputs parameter refers to an arbitrary number of library
assemblies that must be used while building the Visual Studio project.
The first assembly, dotnetenv.assembly20Path, refers to the .NET framework
2.0 system assemblies, which are made available to the builder by a Nix
proxy component (similar to the proxy component for the .NET framework
itself). The other assemblies originate from the function arguments of this expression.
Packaging executable assemblies
The Nix expression in Figure 12.5 can be used to build a library assembly,
but for executable assemblies we require a wrapper, as we have explained
earlier.
{dotnetenv, MyAssembly1, MyAssembly2}:
dotnetenv.buildWrapper { 188
name = "My.Test.Assembly";
src = /path/to/source/code;
slnFile = "Assembly.sln";
assemblyInputs = [
dotnetenv.assembly20Path
MyAssembly1
MyAssembly2
];
namespace = "TestNamespace"; 189
mainClassName = "MyTestApplication"; 190
mainClassFile = "MyTestApplication.cs"; 191
}
Figure 12.6 Building a .NET executable assembly
By invoking the dotnetenv.buildWrapper function with a number of additional
arguments, we can automatically build a wrapped executable assembly.
Figure 12.6 shows an example of a Nix function invocation producing a
wrapped executable assembly. The structure of this expression is similar to
the previous expression, with the following differences:
188 Instead of the dotnetenv.buildSolution function, the dotnetenv.buildWrapper
function is invoked, which generates the wrapper class shown in Figure 12.2
based on a number of given parameters.
189 The namespace parameter specifies the namespace of the main class.
190 The mainClassName parameter specifies the name of the main class.
191 The mainClassFile specifies the path to the file containing the main class.
Composing packages
As with ordinary Nix expressions, we also have to compose Visual Studio
components by calling the build function in the previous code fragment with
the right parameters, as shown in Figure 12.7:
192 The dotnetenv attribute is bound to an expression that incorporates the
buildSolution and buildWrapper functions.
193 Here, the build function of the Test Assembly defined in Figure 12.5 is
invoked with its required dependencies.
By using the expression in Figure 12.7, the assembly in our example can be
built from source, including all its library dependencies, by invoking the
following command-line instruction:
$ nix-env -f pkgs.nix -iA MyTestAssembly
The output is produced in:
/nix/store/ri0zzm2hmwg01w2wi0g4a3rnp0z24r8p-My.Test.Assembly
rec {
dotnetfx = ...
stdenv = ...
dotnetenv = import ../dotnetenv { 192
inherit stdenv dotnetfx;
};
MyAssembly1 = import ../MyAssembly1 {
inherit dotnetenv;
};
MyAssembly2 = import ../MyAssembly2 {
inherit dotnetenv;
};
MyTestAssembly = import ../MyTestAssembly { 193
inherit dotnetenv MyAssembly1 MyAssembly2;
};
}
Figure 12.7 pkgs.nix: A partial Nix expression composing .NET packages
12.5 DEPLOYING .NET SERVICES WITH DISNIX
Supporting local deployment of .NET packages through Nix is an essential
precondition to be able to deploy .NET services with Disnix. As with the
Nix package manager, Disnix can also be used on Windows through Cygwin.
However, we must also be able to support the deployment of .NET services on
.NET infrastructure components, such as Microsoft SQL server and Internet
Information Services (IIS).
To support these new types of services, we have developed the following
activation scripts:
mssql-database. This activation script loads a schema on initial startup if
the database does not exist. It uses the OSQL.EXE tool [Visual C# tutorials,
2011], included with Microsoft SQL server, to automatically execute
SQL instructions from shell scripts that check whether the database
exists and create the tables if needed.
iis-webapplication. This activation script activates or deactivates a web ap-
plication on Microsoft Internet Information Services. It uses the MSDe-
ploy.exe tool [Microsoft, 2010c] to automatically activate or deactivate a
web application.
12.5.1 Porting the StaffTracker example
As an example case, we have ported the StaffTracker, a motivating example
shown in Chapter 5, from Java to .NET technology, using C# as an implemen-
tation language, ADO.NET as database manager, WCF to implement web
services and ASP.NET to create the web application front-end.
Apart from porting the code, we have to modify the deployment specifi-
cations as well. For example, we have to specify Disnix expressions for each
service component utilising our .NET deployment infrastructure described in
Section 12.4.

{dotnetenv}:

{zipcodes}:

dotnetenv.buildSolution {
  name = "ZipcodeService";
  src = ../../../../services/webservices/ZipcodeService;
  baseDir = "ZipcodeService";
  slnFile = "ZipcodeService.csproj";
  targets = "Package";
  preBuild = ''
    sed -e 's|.\SQLEXPRESS|${zipcodes.target.hostname}\SQLEXPRESS|' \
        -e 's|Initial Catalog=zipcodes|Initial catalog=${zipcodes.name}|' \
        -e 's|User ID=sa|User ID=${zipcodes.target.msSqlUsername}|' \
        -e 's|Password=admin123$|Password=${zipcodes.target.msSqlPassword}|' \
        Web.config
  '';
}
Figure 12.8 Disnix expression for the ZipcodeService service component
Figure 12.8 shows the Disnix expression for the ZipcodeService part of the
StaffTracker architecture shown in Chapter 5. As with ordinary Disnix expres-
sions, the expression is a nested function taking the intra-dependencies and
inter-dependencies. In the body, we invoke the dotnetenv.buildSolution function
described in Section 12.4 to build a .NET web service.
Apart from porting all Disnix expressions, we must also update the Disnix
services model and change the service types of the web services and web
applications to iis-webapplication, so that they are deployed to an IIS web server,
and the databases to mssql-database, so that they are deployed to a Microsoft
SQL server instance (a sketch of such entries is shown after the command-line
instruction below). By running the following command-line instruction,
all the services of the .NET StaffTracker system are built from source code,
transferred to the right machines in the network and activated (a result is
shown in Figure 12.9):
$ disnix-env -s services.nix -i infrastructure.nix \
-d distribution.nix
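As an illustration, the fragment below sketches what two entries of the updated services model could look like. The structure follows the Disnix services models used earlier in this thesis; customPkgs and the exact attribute values are assumptions made for this example.

zipcodes = {
  name = "zipcodes";
  pkg = customPkgs.zipcodes;
  type = "mssql-database";          # activated by the mssql-database script
};

ZipcodeService = {
  name = "ZipcodeService";
  pkg = customPkgs.ZipcodeService;
  dependsOn = { inherit zipcodes; };
  type = "iis-webapplication";      # activated by the iis-webapplication script
};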
12.6 EXPERIENCE
Apart from the StaffTracker example, we have tried applying the .NET deploy-
ment functions on the Philips medical imaging platform.
The platform was previously deployed by a custom-implemented build
system, which builds Visual Studio projects in a predefined sequential order.
Moreover, before the actual build process starts, a number of template source
files in the entire source tree are modified.
As a consequence, the exact dependencies among the Visual Studio projects
were not explicitly known. Furthermore, it is also difficult to perform correct
builds of components separately and in parallel, to save build time. Rebuilds
are also expensive, as they are performed by processing all Visual Studio
projects and checking which files have changed, which may still take a significant
amount of time.
Figure 12.9 The .NET StaffTracker deployed in a Windows network using Disnix
In order to utilise any of the tools in our reference architecture for dis-
tributed software deployment, we must model all the components and their
build instructions in Nix expressions. This is complicated, as we have to
find out what the exact dependencies are and how the projects can be isolated
(they may have cross-references to files in other projects as well as implicit as-
sumptions about the environment), and laborious, as the platform is very large
and consists of many components. Making the transition to the Nix package
manager cannot be done in a straightforward manner⁴. Furthermore, because
of the size and complexity of the platform, it is also not feasible to make all the
changes needed to improve the deployment process of the entire platform in a short
time window.
⁴ This is not entirely true. We can also consider the entire platform as a single component, but
this does not give end-users any benefits in using our tools. The advantages manifest themselves
if the system that is being deployed is properly componentised.
To improve the deployment process of the medical imaging platform, we
have implemented a new custom build system utilising MSBuild, providing
some degree of compatibility with the old build system, in which we grad-
ually incorporate techniques described in this thesis, such as a dependency-
based build system and build results that are stored in separate directories.
Although these results are isolated, they are not guaranteed to be pure and they only
distinguish variants on a selected number of attributes instead of hash codes.
Furthermore, the system also automatically extracts dependencies from Vi-
sual Studio projects, such as the referenced files and library assemblies, so that
a minimal amount of manual specification is needed from the developers. Other
types of dependencies can be annotated with custom tags in the Visual Studio
solution files. Also, architectural changes are gradually being implemented to
remove implicit references and cross-references between projects. Currently,
we are capable of deploying various subsets of the platform using the Nix
package manager, such as several utility programs.
12.7 DISCUSSION
In this chapter, we have explained how we can deploy .NET components with
Nix and Disnix, by implementing a Nix convenience function that performs
Visual Studio project builds and is able to find the specified required
dependencies. We have also demonstrated how isolated executable assemblies
can find their run-time dependencies by means of a wrapper. In order to make
this work in a strict deployment system such as Nix, we have to make certain
compromises, by means of proxy components, because certain build-time
dependencies are tightly integrated into the host system and must be deployed
by other means.
The deployment functions described in this chapter have a number of limi-
tations. The first limitation is the lack of dependency completeness. Although
most components are captured in a closure, we also have proxy components
(such as the .NET framework) referring to components residing outside the
Nix store. These dependencies are not included when transferring a closure
to another machine. However, the .NET framework is a common package in-
cluded with most recent versions of Microsoft Windows, so this should not
impose any serious problems.
Second, native Windows applications do not implement the same system
features as UNIX applications and cannot be jailed with the chroot() system
call, which may make certain package builds impure if they reference tools
outside the Nix store. Therefore, developers must manually ensure that builds
have no dependencies on components outside the Nix store. Furthermore,
even if we were able to use chroot(), we could not, as proxy components must
still be able to reference tools on the host system.
As explained earlier, Disnix is only designed for deployment of service
components and does not manage the underlying infrastructure components.
To provide complementary infrastructure deployment, we have developed
DisnixOS, as described in Chapter 6. However, we cannot use DisnixOS to
deploy Windows machines as the system services and infrastructure compo-
nents, such as Microsoft SQL server, cannot be isolated in the Nix store due to
their proprietary nature. Infrastructure deployment must be done manually
or by other deployment means.
As a final note, it pays off to have a close look at Szyperski's definition
of component software [Szyperski et al., 2002] and to take its criteria into account
while designing a system. The first important aspect is that all explicit context
dependencies must be known, which was sometimes not the case with the
medical imaging platform components (implemented as Visual Studio pro-
jects). Sometimes libraries could be implicitly found in global system direc-
tories (e.g. C:\WINDOWS\System32), which may allow a build to succeed even
though a dependency has not been specified. Invoking these Visual Studio
builds in Nix resulted in build errors, because of unspecified dependencies.
Furthermore, there were also a number of cross-references between files in
various Visual Studio projects that are (in principle) external dependencies
that were not explicitly specified.
Another important aspect is that every component must be treated as a unit
of independent deployment. Without explicitly knowing the dependencies of
a component, it is very unlikely that a particular component can be safely and
independently deployed on an external machine. Finally, every component
is in principle subject to composition by third parties, which requires imple-
menters to provide means to specify or replace dependencies at build time and
run-time. A number of projects had implicit assumptions about the locations of
their dependencies, which may yield problems if a component is deployed
elsewhere, or if a dependency has to be replaced by a different version or vari-
ant.
Properly componentising a system in advance saves the burden of imple-
menting expensive refactorings of the system's architecture to improve
the deployment process.
12.8 CONCLUSION
In this chapter, we have shown how we can deploy .NET applications on
the Microsoft Windows platform. Systems using .NET technology
are (in principle) impure, as we cannot completely deploy all of their components,
such as the .NET framework, through the Nix package manager.
In order to be able to use our techniques, we had to make a small number of
compromises by means of proxy components, at the expense of dependency
completeness. However, the costs are not that high, as the .NET framework
is a common dependency present on many installations. Moreover, in order to
allow a .NET executable assembly to find its required run-time dependencies,
we had to implement a complex wrapper facility. Although the wrapper is
complicated, its complexity is hidden by the dotnetenv.buildWrapper function that
automatically generates the wrapper class with its required properties.
By using these facilities, we can deploy .NET applications with the Nix
package manager, build and test them on Hydra, and deploy service-oriented
systems using .NET technology with Disnix.
Furthermore, we have described our experiences with migrating a large
code base to the Nix deployment system. A complete transition is not feasible
in a short time window, as, due to architectural erosion, not all components
can be easily isolated. Therefore, we have developed a custom build system
gradually incorporating relevant build techniques.
Part VI
Conclusion
13
Conclusion
In the introduction, we have explained that non-functional requirements on
software systems and various advancements in software engineering have
made the software deployment process, consisting of all activities to make a
software system available for use, very complicated and error prone. For the
latest generation of systems, which are typically offered as services through
the Internet, many new challenges have arisen in addition to the deployment
challenges of the past. Apart from technical challenges, non-functional
deployment requirements must also be met, such as reliability, privacy
and the licenses under which components are governed.
As a solution, we have proposed a reference architecture for distributed
software deployment that implements various deployment concerns. This
reference architecture captures properties of a family of domain-specific de-
ployment tools and can be turned into a concrete architecture for a domain-
specific deployment tool by creating a tool utilising the concrete components
defined in the reference architecture and by integrating custom components.
In the other chapters of this thesis, we have covered the individual com-
ponents of the reference architecture implementing service and infrastructure
deployment, the most important general concepts and a number of applica-
tions, including an example of a domain-specific deployment tool.
13.1 SUMMARY OF CONTRIBUTIONS
Each of the chapters in this thesis has a number of contributions. We
summarise these contributions below:
A reference architecture for distributed software deployment, incorpo-
rating components that solve various deployment aspects and utilise a
combination of the layered virtual machines and the purely functional
batch processing architectural patterns, to make deployment reliable,
reproducible and efficient (Chapter 3).
A toolset to automatically deploy heterogeneous service-oriented sys-
tems in heterogeneous networks in a reliable and reproducible manner
(Chapter 4).
A modular, extensible toolset incorporating deployment planning algo-
rithms described in the literature, to dynamically map services to ma-
chines, based on their technical and non-functional requirements (Chap-
ter 5).
A framework to reliably and efficiently redeploy service-oriented sys-
tems in case of an event that may occur in a network of machines (Chap-
ter 5).
An infrastructure to efficiently instantiate virtual machines and to per-
form distributed system integration tests (Chapter 7).
An extension and case-study to make distributed service upgrades fully
atomic (Chapter 8).
A prototype tool to support reliable and reproducible deployment of
mutable software components (Chapter 9).
A prototype tool to trace build processes and a method to derive how
source artifacts are combined into the resulting executables (Chapter 10).
An example of a domain-specific deployment tool for WebDSL applica-
tions (Chapter 11).
A study of applying our tools in unconventional and impure environ-
ments (Chapter 12).
13.2 EVALUATION
The central goal of this thesis is to construct a reference architecture for dis-
tributed software deployment, which can be used to realise a concrete archi-
tecture for a domain-specific deployment tool. We have used the Nix package
manager as a fundamental basis and we have developed a number of addi-
tional components, capturing several deployment concerns, such as services,
the infrastructure and the licenses under which components are governed.
In Chapter 3, we have also listed several important quality attributes that
we want to achieve with this reference architecture. The primary means to
achieve this goal is to use the purely functional batch processing architectural
pattern. In this section, we evaluate how well we have achieved these quality
attributes.
13.2.1 Reliability
As we have seen in Chapter 3, the reliability attribute covers dependency
completeness, immutability, atomicity and recovery.
Dependency completeness
Dependency completeness, which means that all required dependencies must
be present and correct and that the system must be able to find them, is very well
covered in both Disnix and DisnixOS (which incorporates Charon). In Disnix,
described in Chapter 4, dependency completeness at build time is checked
statically, as service build expressions define both intra-dependencies and
inter-dependencies as function arguments. Properties of these arguments are
used in the build process of the service, resulting in an error if any of these
dependencies is omitted.
At run-time, dependency completeness for intra-dependencies is ensured
by using the same mechanisms as the Nix package manager: by scanning for
hash occurrences of dependencies inside components and by ensuring that
these dependencies are deployed first.
For inter-dependencies, the generated manifest file captures all the runtime
dependencies of services and disnix-activate (invoked by the high-level disnix-
env tool) ensures that all the activation and deactivation steps of services are
performed in the right order, so that we never run into a situation in which an inter-
dependency is missing. However, we cannot ensure that machines (which are
technically also dependencies of a deployment scenario) are always present.
For mutable components described in Chapter 9, the dependencies are
relatively simple. They require the presence of the container service and
command-line tools capable of interacting with the container. The latter
property is ensured, because Dysnomia is also built through Nix, and the
Dysnomia modules contain absolute references to store paths containing the
command-line tools (Nix ensures that they are always present if Dysnomia is
deployed, as described in Chapter 2).
The presence of running container components cannot be automatically
ensured directly through Dysnomia. In our experiments, we typically derive
container properties from NixOS configurations and by configuring Dysnomia
in a NixOS module, which ensures that the container components are present
and correct. However, on non-NixOS systems, other means must be used to
ensure that the container components are present and correct.
Immutability
Immutability, the fact that we store all components in isolation and that they
are never modified imperatively, is achieved due to the fact that interaction
between most of the components in the reference architecture is done through
Nix derivation functions implementing the purely functional batch processing
architectural pattern. As we have seen in Chapter 2, Nix derivation functions
execute procedures in isolated environments, in which many side effects are
removed, and never overwrite or remove files belonging to another compo-
nent. As a result, these procedures (nearly) always yield the same result,
based on their declared input arguments.
Atomicity
Atomicity is a property that ensures that we either have the old configuration
or the new configuration, but never an inconsistent mix of the two.
Local atomicity is a direct consequence of using the purely functional batch
processing architectural pattern, as the outputs of derivation functions are stored
in a purely functional cache, such as the Nix store or the Dysnomia snapshot
hierarchy. Their file names are always unique and thus there is never a situation
in which a component has files belonging to one version and another.
Full distributed atomicity cannot be achieved through this architectural
pattern, but must be implemented in the deployment tools, e.g., Disnix and
Charon, and several properties must be implemented in the application that
is being deployed, as we have seen in Chapter 8. Implementing application-
specific upgrade modifications is not always trivial, for example for web ap-
plications and stateless connections.
So far, we have only experimented with fully distributed atomic upgrades at
the service level and we have implemented a simple prototype example case sup-
porting these kinds of upgrades. However, applying fully distributed atomic
upgrades in general is hard to achieve, as we require cooperation with the sys-
tem that is being deployed.
Recovery
Recovery is the ability to repair a deployed system if a machine with required
dependencies disappears. As with atomicity, recovery is also a property that
must be implemented in tools, such as Disnix and Charon. In Chapter 5, we
have shown a solution that automatically redeploys a service-oriented system
in case of an event, using Disnix. In the evaluation results, we have seen that
for most types of events the downtime is minimal, which suggests that our
deployment model works quite well for this purpose.
However, our recovery approach only works at the service level and not at the
infrastructure level. For example, system services are not reconfigured when nec-
essary (such as adjusting the amount of memory they may consume), which may be
required for certain types of events. Also, the deployment planning algo-
rithms are limited to simple network topologies and there is no sophisticated
support for state migration.
13.2.2 Reproducibility
Reproducibility, the ability to exactly reproduce a configuration elsewhere, is
also a quality attribute that is ensured by using the purely functional batch
processing architectural pattern. Because side-effects are removed, executing
these functions should always yield the same result, regardless on what ma-
chine the function is executed and assuming that the machine can be trusted.
The mechanisms that Disnix (and Charon) use to remotely deploy com-
ponents are a major example of utilising the reproducibility properties to its
fullest. Moreover, another prominent example is the distributed system inte-
gration test facilities built on top of NixOS (as described in Chapter 7), which
uses sharing of the host systems’ Nix store for efficiency.
13.2.3 Genericity
Genericity, being able to support systems using various programming lan-
guages, component technologies and operating systems, is a quality attribute
that is achieved thanks to the fact that arbitrary processes can be invoked
in derivation functions implementing the purely functional batch processing
architectural pattern. These processes can be implemented in any program-
ming language, such as C, Java, Haskell, PHP, Perl and Python, and using any
component technology.
We have also seen in Chapter 12 that, by creating proxy components, we can
invoke processes hosted outside the Nix store, such as MSBuild, which is used
to compile Visual Studio C# projects. However, using proxy components reduces
dependency completeness and reproducibility.
Genericity for mutable components is achieved in Dysnomia, which merely
requires interfaces and their semantics to be implemented in a plugin system. Hence,
we are capable of supporting various mutable components, such as MySQL
databases and Subversion repositories.
13.2.4 Extensibility
Extensibility is achieved by exactly the same means as the genericity quality
attribute. For example, in the Dynamic Disnix framework described in Chap-
ter 5, we also invoke deployment planning methods from derivation functions
allowing to invoke tools implemented in any language.
13.2.5 Efficiency
Several efficiency measures are a direct consequence of using the Nix ex-
pression language and the purely functional batch processing architectural
pattern.
All components in our reference architecture use models implemented in
the Nix expression language, which is a purely functional domain-specific
language. As a consequence, it supports laziness and only evaluates functions
(with certain function arguments) when it is required. As an optimisation in
purely functional languages, the result of a function invocation can be cached
and therefore it only has to be executed once.
Furthermore, the Nix store serves as a persistent cache of
derivation function results. As a result, if a specific component has been
built before (or a configuration has been generated before), we can return the
result from the Nix store instead of evaluating the function again.
Because the dependencies for each process invocation are explicitly known
and these functions have no side effects, we can also easily parallelise tasks
that do not depend on each other.
Dysnomia has a similar persistent cache using a different mechanism and
can use a name-based comparison strategy to check whether to execute a
deployment operation. However, in our experiments we have seen that using
the filesystem as an intermediate layer may be very expensive for large data
sets.
13.3 RESEARCH QUESTIONS REVISITED
Research Question 1
How can we automatically deploy heterogeneous service-oriented systems in a
network of machines in a reliable, reproducible and efficient manner?
As we have seen in Chapter 4, services are a bit different compared to ordi-
nary components. Apart from local dependencies (called intra-dependencies),
they also have dependencies on services which may reside on other machines
in a network (the inter-dependencies).
In Chapter 4, we have described Disnix, as an extension to Nix, in order to
deploy services in a network of machines. As we have seen, Disnix uses three
models each capturing a separate concern to perform the complete deploy-
ment process.
Local reliability, reproducibility and efficiency are ensured thanks to the
fact that Disnix is built completely around Nix and implements the purely
functional batch processing pattern. To ensure reliability in a distributed set-
ting, we have ensured that all inter-dependencies are also always present and
activated in the right order.
In Chapter 8, we have described how to make distributed upgrades fully
atomic. Basically, we perform all build actions in a purely functional manner
first and then we distribute all components. These steps can be performed in
a non-destructive manner without affecting running systems. In the activa-
tion phase, we first notify services, so that they can temporarily block or queue
connections and end users cannot observe that the system is being up-
graded. However, to achieve full atomicity, application-specific changes must
be implemented, such as implementing a proxy.
In Chapter 6, we have described DisnixOS. DisnixOS (which incorporates
Charon) uses similar concepts in its deployment procedure as Disnix, includ-
ing a final imperative activation phase in which system configurations are
activated.
As with Disnix, most deployment actions are also non-destructive, depen-
dencies are always complete and only what is necessary gets evaluated, which
are implications of using the purely functional batch processing architectural
pattern.
We have not yet implemented fully atomic distributed upgrades at the infras-
tructure level.
Research Question 2
How can we efficiently deal with deployment related events occurring in a net-
work of machines?
In Chapter 5, we have shown an extended architecture consisting of a dis-
covery service generating an infrastructure model, a distribution generator,
which generates a distribution model and Disnix which performs redeploy-
ment when an event occurs.
In order to dynamically deploy service-oriented systems, we must generate
a distribution model, in which we map services to machines in the network
based on their technical and non-functional properties.
As we cannot solve this problem in a generic manner, we have implemented
an extensible mechanism in Chapter 5, which can be used to invoke various
deployment planning algorithms. This mechanism can be used to develop a
custom strategy that maps services to machines in the network. The extension
mechanism implements the purely functional batch processing architectural
pattern.
In the evaluation results, we have seen that although the overall deploy-
ment times may be long in some cases, the time in which the system is incon-
sistent (the activation phase) is often very short for most types of events. This
suggests that our framework is sufficiently effective for practical purposes.
Research Question 3
How can we declaratively perform distributed system integration tests?
To perform distributed system integration tests, we have described an ap-
proach in Chapter 7, which cheaply instantiates networks of virtual machines,
based on the fact that we can share a host system’s Nix store (so that no disk
images have to be generated) and that we can capture all static aspects of an
entire operating system and deploy them in a reliable and reproducible man-
ner. Thanks to these features we can instantiate virtual networks in a matter
of seconds.
Furthermore, by implementing a test driver that remotely executes opera-
tions in these virtual machines, we can write test scripts that can be automat-
ically performed. In Chapter 7, we have described a collection of test cases,
that normally would not be tested automatically at all.
Research Question 4
How can we automatically and reliably deploy mutable software components in
a purely functional deployment model?
In Chapter 9, we have developed Dysnomia, which can be used to automat-
ically deploy mutable components. As Dysnomia only requires an interface, it
cannot ensure reliability directly, although the modules we have implemented
all invoke the proper command-line tools capturing state in a portable and
consistent manner. If plugin developers ensure portability and consistency
for a particular type, it can be considered reliable.
Furthermore, in order to isolate snapshots safely, we have designed a new
naming convention that always produces a unique name. The Dysnomia mod-
ule designer has to map an order attribute to the underlying implementation,
such as a revision number.
The major drawback of Dysnomia is, as we have described earlier, effi-
ciency. We are using the filesystem as an intermediate layer, which is very
expensive for certain component types.
Research Question 5
How can we determine under which license components are governed and how
they are used in the resulting product?
We have developed a prototype tool in Chapter 10, in which we instrument
Nix build functions with strace, which traces all the system calls invoked dur-
ing a build process. From these traces we can produce a build graph showing
the involved processes and their inputs and outputs. From such graphs we
can also derive, using a tool such as ninka, which source files are used and under
which licenses they are governed. The evaluation results suggest that the
method is highly effective.
Research Question 6
How can we apply the components of the reference architecture?
As a case study, we have developed a domain-specific deployment tool for
WebDSL in Chapter 11. The tool is composed of a function that produces
a NixOS network specification from a high-level specification, which can be
used by Charon and the NixOS test driver. To generate a NixOS network
specification, we use a number of utility functions and the NixOS module
system to compose various domain-specific system aspects. One of the major
limitations of our case study is that we do not support state migrations.
In Chapter 12, we have applied Nix and Disnix on the Microsoft Windows
operating system to deploy Visual Studio C# projects. As we have seen, certain
aspects of the system cannot be isolated and deployed in a pure manner.
As a “compromise” we have implemented proxy components referring to
components hosted outside the Nix store. Furthermore, in order to run executable
assemblies from an (isolated) Nix store, we have to implement a complex
wrapper class, capable of loading the required library assemblies.
With these modifications, we were capable of deploying .NET components
with Nix and Disnix at the slight expense of purity and reproducibility.
However, we have also seen that, in order to support large codebases, the
architecture must also be partially re-engineered, which is a non-trivial and
very expensive process. Therefore, it is desirable to develop a (partially) au-
tomated solution capable of deriving the components and their dependencies
and of generating build expressions from them.
13.4 RECOMMENDATIONS FOR FUTURE WORK
Although we have implemented all components in our reference architecture
for distributed software deployment and we have achieved most of our quality
attributes for all of these components, there is room for improvement and
extra features to make deployment processes more convenient.
13.4.1 Network topologies
One of the major limitations of Disnix is that it only supports simple net-
work topologies in which machines are directly reachable. Although for most
use cases this is sufficient, and there are also other means to cope with com-
plex hierarchies, e.g. by configuring gateways, it limits the use of more ad-
vanced deployment planning algorithms, such as Avala [Mikic-Rakic et al.,
2005]. Therefore, it would be useful to add network topology features to
Disnix, so that these more advanced deployment planning algorithms can be
used. Another use case may be to construct tree-like network structures, to
make the distribution of large closures more efficient.
Another assumption that we have made is that all network connec-
tions are stable, but in some environments, such as hospitals, certain devices may
only have limited connectivity and may belong to different networks at cer-
tain points in time. Therefore, it may be worth investigating deployment in
ad-hoc networks. An example of such an approach is described by Roussain
and Guidec [2005]. Their approach, however, has a number of drawbacks,
such as the fact that it uses an imperative underlying deployment process
with all its disadvantages. As a consequence, it is difficult to guarantee that all
dependencies are present, specified and correct. Furthermore, their approach
is Java-specific.
13.4.2 Infrastructure-level self-adaptability
In Chapter 5, we have described a framework that redeploys service-oriented
systems in case of an event in the network. In order to allow this approach
to succeed, we always require at least one machine configuration capable of
hosting a particular machine in the network and their machine configurations
are always static.
Therefore, it may also be desirable to investigate self-adaptability at the in-
frastructure level, e.g., by composing system configurations based on the re-
quirements of services and by upgrading or limiting the system resources that
certain system services consume.
13.4.3 More declarative system integration testing of service-oriented systems
In some of our case studies, such as SDS2, we have seen that bindings be-
tween services can be very fragile and require proper handling of connection
failures. For example, it may be very tempting to write a Java web service
connecting to a database as described in Figure 13.1.
In the constructor, the database connection is retrieved from the applica-
tion server's connection pool, but never released. If at some point in time a
connection failure occurs, then it will never be restored. As a consequence,
the getRecords() method can never return data and the web service must be re-
deployed in order to make it operational again. For large systems consisting
of many inter-dependent services, this is undesirable, as it requires system ad-
ministrators to redeploy many services, to know what the exact dependencies
are and to do the redeployment in the right order, which may be complicated
and error prone.
Instead, services must be more robust and catch the proper exceptions and
restore the database connection if it has been disconnected. In SDS2, we have
identified these issues after deploying it in a distributed environment and
running it for a long time and we have implemented proper connection error
handling so that these errors no longer occur.
import java.util.*;
import java.sql.*;
import javax.sql.*;
import javax.naming.*;
public class MyService
{
private Connection connection;
public MyService() throws Exception
{
InitialContext ctx = new InitialContext();
DataSource ds = (DataSource)ctx.lookup("java:comp/env/jdbc/MyDB");
connection = ds.getConnection();
}
public ArrayList<String> getRecords() throws Exception
{
PreparedStatement pstmt =
connection.prepareStatement("select record from records");
ResultSet result = pstmt.executeQuery();
ArrayList<String> list = new ArrayList<String>();
while(result.next())
list.add(result.getString(1));
return list;
}
}
Figure 13.1 MyService.java: An example of a fragile Java web service
Such behaviour can be automatically detected by our distributed system in-
tegration test facilities described in Chapter 7. However, the service-oriented
testing facilities have a number of drawbacks. First, we must always manually
provide a Disnix infrastructure model and distribution model that properly
deploy all the service components, which may be very laborious and te-
dious to do for a system that is continuously evolving. Second, we must also
manually execute test operations on the right machines in the network.
As a solution, we can use the vertex coloring deployment planning algo-
rithm from the Dynamic Disnix framework (described in Chapter 5) to gen-
erate a distribution model. Furthermore, we can also annotate services in the
Disnix services model with predicates (command-line instructions that check
a service's validity) and use a strategy that traverses the inter-dependency closures,
disables and re-enables connections, and checks these predicates to determine whether
a service still works. The only things the end-user has to specify are the predicates
that check for validity and a machine configuration capable of running all
types of services. Currently, we have a very basic implementation available,
which is not very mature yet.
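The sketch below illustrates this idea for a single service; the testPredicate attribute and the command it contains are hypothetical and do not correspond to the attribute names used by the current prototype.

ZipcodeService = {
  name = "ZipcodeService";
  pkg = customPkgs.ZipcodeService;
  dependsOn = { inherit zipcodes; };
  type = "tomcat-webapplication";
  # Hypothetical predicate: a command-line instruction that only succeeds if
  # the service still responds after its connections have been re-enabled.
  testPredicate = "curl --fail http://localhost:8080/ZipcodeService/";
};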
13.4.4 Test scalability
Another limitation of the distributed system integration approach is that all
virtual machines are located on one single machine, which is inadequate for
large-scale testing. Therefore, it is worth exploring an extended approach
in which a network of physical machines hosting virtual machines can be
generated to perform large-scale test cases.
13.4.5 Patterns for fully distributed atomic upgrades
As we have seen in Chapter 8, in order to achieve fully atomic upgrades
cooperation with the system that is being deployed is required. So far, we only
have developed a TCP proxy that “drains” connections during the activation
phase, consisting of blocking or queuing incoming connections and waiting
until all already active connections are closed, so that end users are not able
to observe that a system is changing.
For example, to upgrade running web applications, more cooperation is
needed. Apart from draining the connections, we also have to refresh run-
ning web applications (which requires developers to implement a push noti-
fication mechanism) and eventually migrate state. It would be interesting to
identify such patterns and make these available through a library or another
abstraction facility.
13.4.6 Implementing a license calculus system
In Chapter 10, we have shown how we can construct graphs from dynami-
cally traced build processes, showing the tasks involved and their inputs and
outputs. Currently, licensing issues must still be analysed manually. For ex-
ample, we must manually judge whether several combined source files governed
under different software licenses are compatible and what their resulting li-
cense is. Therefore, it is interesting to construct a license calculus capable of
determining under which license a resulting artifact is governed¹.
A possible solution is to construct a set of rules for each combination of
licenses and link type (which can be determined from the process that uses
the files that are governed under a specific license) and to rewrite the entire
graph until a license is determined for the resulting build artifact.
¹ This does not mean that such a tool could always judge whether a particular composition of
components is always legal, as only lawyers can do that.
13.4.7 More sophisticated state deployment
Although we have implemented a prototype tool called Dysnomia in Chap-
ter 9 that is capable of deploying mutable components, it has various draw-
backs, such as efficiency issues. Therefore, it is desirable to investigate this
subject further and eventually develop a better, more efficient discipline.
It has turned out that this problem is very difficult to solve efficiently
in a generic manner.
13.4.8 Extracting deployment specifications from codebases
In Chapter 12, we have implemented an approach capable of automatically
deploying .NET components. However, in order to fully deploy a large code-
base with our tools, we need deployment specifications for all components,
which must be manually written. Manually writing deployment specifications
for large codebases can be very expensive, tedious and laborious.
However, most of the relevant deployment attributes are already captured
in configuration files of the IDE, such as Eclipse or Visual Studio IDE project
files. Therefore, it would be interesting to develop an automated approach
capable of generating Nix expressions from IDE project files to reduce the
amount of maintenance. Some IDEs have integration with build systems,
such as Apache Maven [Apache Software Foundation, 2010]. However, these
tools are restricted to manage build and test processes of components and
not designed to manage the full chain of deployment activities, such as dis-
tributed deployment of entire systems into network of machines. Further-
more, these tools are technology specific and do not implement a number
of important quality attributes, such as dependency completeness and repro-
ducibility. Therefore, they must be properly invoked from Nix expressions,
which ensures these attributes, which is not always a straight forward process,
as we have explained in Chapter 12.
In the Nix project, we already have two tools capable of extracting Nix expressions from specifications in a different format: cabal2nix (https://github.com/NixOS/cabal2nix), which converts Haskell package specifications to Nix expressions, and npm2nix (https://bitbucket.org/shlevy/npm2nix), which converts Node.js (http://nodejs.org) package manager specifications.
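As an illustration of this direction (not an existing tool: the MSBuild element names are taken from ordinary Visual Studio project files, but the generated expression and the buildDotnetPackage function are assumptions), the following sketch extracts the assembly references from a project file and emits a corresponding Nix expression:

import xml.etree.ElementTree as ET

MSBUILD_NS = {"m": "http://schemas.microsoft.com/developer/msbuild/2003"}

def nixify(name):
    # Nix function arguments cannot contain dots; map them to underscores.
    return name.replace(".", "_")

def project_to_nix(csproj_path, package_name):
    tree = ET.parse(csproj_path)
    references = [nixify(r.attrib["Include"].split(",")[0])
                  for r in tree.findall(".//m:Reference", MSBUILD_NS)]
    arguments = ", ".join(["buildDotnetPackage"] + references)
    inputs = " ".join(references)
    return f"""{{ {arguments} }}:

buildDotnetPackage {{
  name = "{package_name}";
  src = ./.;
  assemblyInputs = [ {inputs} ];
}}
"""

# Example: print(project_to_nix("MyApp.csproj", "my-app"))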
13.4.9 Case studies with service composability
One of the missing aspects in supporting service orientation in hospital en-
vironments, as described in Section 1.2, is providing a solution for flexible
service compositions. In order to make a specific service available for use in
a heterogeneous environment consisting of devices with varying properties
and capabilities, we may have to offer a fat-client variant of a service for powerful workstations and a thin-client variant for mobile phones. Possibly, a number of hybrid variants must also be offered, depending on the amount of system resources available.
Although our reference architecture has the facilities to deploy such services, there is still a need for a concrete, hands-on case study showing the deployment of variants of the same service. In order to achieve this goal, intensive cooperation with industry is required.
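As a minimal illustration of what such variability could look like (the variant names and resource thresholds are invented for this example, and this policy is not part of our current tooling), a capability-based selection could pick the richest variant of a service that still fits on a target device:

VARIANTS = {
    # minimum amount of RAM (in MiB) required by each variant of a service
    "fat-client": 4096,
    "hybrid": 1024,
    "thin-client": 128,
}

def select_variant(device_ram_mib):
    # Pick the richest variant that still fits on the device.
    for variant, required in sorted(VARIANTS.items(),
                                    key=lambda item: item[1], reverse=True):
        if device_ram_mib >= required:
            return variant
    raise ValueError("the device cannot run any variant of this service")

# select_variant(8192) == "fat-client"; select_variant(512) == "thin-client"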
13.4.10 Software deployment research in general
As a final note, which is not specific to this thesis, we also think that the software deployment discipline in general deserves more attention in the research world. For many subfields of software engineering there are specialised conferences and workshops available, such as
2
https://github.com/NixOS/cabal2nix
3
https://bitbucket.org/shlevy/npm2nix
4
http://nodejs.org
software language engineering, testing, reverse engineering, repository mining and program comprehension. However, for software deployment there is no subfield conference available (there used to be the Working Conference on Component Deployment (CD), http://www.informatik.uni-trier.de/~ley/db/conf/cd/cd2005.html, which has not been held since 2005).
The papers used throughout this thesis have been published at a wide range of conferences and workshops, for example in the areas of reliability engineering and self-adaptive systems. Sometimes applying software deployment in a different domain is appreciated, but sometimes it is not and is misunderstood. Moreover, the software deployment discipline is relatively unknown in the research world, and there is no consensus yet on what good directions are and how to properly evaluate software deployment techniques. Therefore, it may be worthwhile to revive the Working Conference on Component Deployment (CD) or to organise a software deployment workshop, to investigate current software deployment practices, and to develop new research directions. Recently, a new workshop on Release Engineering (RELENG, http://releng.polymtl.ca) has been announced that has a big overlap with software deployment.
Bibliography
Abate, P., Cosmo, R. D., Treinen, R., and Zacchiroli, S. (2013). A modular
package manager architecture. Information and Software Technology, 55(2):459–474. (Cited on page 61.)
Abate, P., Di Cosmo, R., Boender, J., and Zacchiroli, S. (2009). Strong de-
pendencies between software components. In 3rd International Symposium on
Empirical Software Engineering and Measurement (ESEM), pages 89–99. (Cited
on page 47.)
Adams, B., Tromp, H., Schutter, K. D., and Meuter, W. D. (2007). Design
recovery and maintenance of build systems. In ICSM, pages 114123. (Cited
on pages 187 and 199.)
Akkerman, A., Totok, A., and Karamcheti, V. (2005). Infrastructure for Au-
tomatic Dynamic Deployment of J2EE Applications in Distributed Environ-
ments. In CD 05: Proc. of the 3rd Working Conf. on Component Deployment,
pages 1732. Springer-Verlag. (Cited on pages 14, 66, and 93.)
Apache Software Foundation (2010). Apache maven. http://maven.
apache.org. (Cited on pages 71 and 248.)
Artho, C., Suzaki, K., Di Cosmo, R., Treinen, R., and Zacchiroli, S. (2012).
Why do software packages conflict? In 9th IEEE Working Conference on Mining
Software Repositories (MSR), pages 141150. (Cited on page 47.)
Attariyan, M. and Flinn, J. (2008). Using causality to diagnose configuration
bugs. In 2008 USENIX Annual Technical Conference (USENIX 08). USENIX.
(Cited on page 199.)
Baresi, L. and Pasquale, L. (2010). Live goals for adaptive service composi-
tions. In Proc. of the 2010 ICSE Workshop on Software Engineering for Adaptive
and Self-Managing Systems, SEAMS 10, pages 114123, New York, NY, USA.
ACM. (Cited on page 116.)
Bass, L., Clements, P., and Kazman, R. (2003). Software Architecture in Practice.
Addison Wesley, second edition. (Cited on pages 15 and 49.)
Beck, K., Beedle, M., van Bennekum, A., Cockburn, A., Cunningham, W.,
Fowler, M., Grenning, J., Highsmith, J., Hunt, A., Jeffries, R., Kern, J., Marick,
B., Martin, R. C., Mellor, S., Schwaber, K., Sutherland, J., and Thomas, D.
(2001). Manifesto for agile software development. Available at http://
www.agilemanifesto.org. (Cited on page 13.)
Begnum, K. (2006). Managing large networks of virtual machines. In LISA06:
Proc. of the 20th Conference on Large Installation System Administration Con-
ference, pages 205–214, Berkeley, CA, USA. USENIX. (Cited on pages 127
and 153.)
Bravenboer, M., Kalleberg, K. T., Vermaas, R., and Visser, E. (2008). Strate-
go/XT 0.17. A language and toolset for program transformation. Science of
Computer Programming, 72(1-2):5270. Special issue on experimental software
and toolkits. (Cited on page 208.)
Bravenboer, M. and Visser, E. (2004). Concrete syntax for objects: domain-
specific language embedding and assimilation without restrictions. In OOP-
SLA 04: Proceedings of the 19th annual ACM SIGPLAN conference on Object-
oriented programming, systems, languages, and applications, pages 365383, New
York, NY, USA. ACM. (Cited on page 208.)
Brélaz, D. (1979). New methods to color the vertices of a graph. Commun.
ACM, 22(4):251256. (Cited on page 112.)
Build Audit (December 6, 2004). Build Audit. http://buildaudit.
sourceforge.net. (Cited on page 199.)
Buildbot project (2012). Buildbot manual. Available at http:
//buildbot.net/buildbot/docs/current/manual/index.html.
(Cited on page 42.)
Burgess, M. (1995). Cfengine: a site configuration engine. Computing Systems,
8(3). (Cited on pages 60, 122, 123, 152, and 178.)
Camara, J., Canal, C., and Salaun, G. (2009). Behavioural self-adaptation
of services in ubiquitous computing environments. In Proc. of the 2009
ICSE Workshop on Software Engineering for Adaptive and Self-Managing Sys-
tems, pages 2837, Washington, DC, USA. IEEE Computer Society. (Cited
on page 116.)
Cappos, J., Baker, S., Plichta, J., Nyugen, D., Hardies, J., Borgard, M., John-
ston, J., and Hartman, J. H. (2007). Stork: package management for dis-
tributed VM environments. In LISA07: Proceedings of the 21st conference on
Large Installation System Administration Conference, pages 116, Berkeley, CA,
USA. USENIX. (Cited on page 152.)
Caron, E., Chouhan, P. K., and Dail, H. (2006). GoDIET: A Deployment
Tool for Distributed Middleware on Grid 5000. Technical Report RR-5886,
Laboratoire de l’Informatique du Parallélisme (LIP). (Cited on pages 14, 66,
and 94.)
Carzaniga, A., Fuggetta, A., Hall, R. S., Heimbigner, D., van der Hoek, A.,
and Wolf, A. L. (1998). A characterization framework for software deploy-
ment technologies. Technical Report CU-CS-857-98, University of Colorado.
(Cited on page 3.)
Caseau, Y. (2005). Self-adaptive middleware: Supporting business process
priorities and service level agreements. Advanced Engineering Informatics,
19(3):199–211. (Cited on page 116.)
Chan, K. S. M. and Bishop, J. (2009). The design of a self-healing compo-
sition cycle for web services. In Proc. of the 2009 ICSE Workshop on Software
Engineering for Adaptive and Self-Managing Systems, pages 2027, Washington,
DC, USA. IEEE Computer Society. (Cited on page 116.)
Chen, H., Yu, J., Chen, R., Zang, B., and Yew, P.-C. (2007). Polus: A powerful
live updating system. In Proceedings of the 29th international conference on
Software Engineering, ICSE 07, pages 271281, Washington, DC, USA. IEEE
Computer Society. (Cited on page 166.)
Chen, X. and Simons, M. (2002). A component framework for dynamic re-
configuration of distributed systems. In CD 02: Proc. of the IFIP/ACM Work-
ing Conf. on Component Deployment, pages 8296. Springer-Verlag. (Cited on
pages 66 and 94.)
Cheng, B. H., Giese, H., Inverardi, P., Magee, J., de Lemos, R., Andersson, J.,
Becker, B., Bencomo, N., Brun, Y., Cukic, B., Serugendo, G. D. M., Dustdar,
S., Finkelstein, A., Gacek, C., Geihs, K., Grassi, V., Karsai, G., Kienle, H.,
Kramer, J., Litoiu, M., Malek, S., Mirandola, R., Müller, H., Park, S., Shaw,
M., Tichy, M., Tivoli, M., Weyns, D., and Whittle, J. (2008). 08031 soft-
ware engineering for self-adaptive systems: A research road map. In Cheng,
B. H. C., de Lemos, R., Giese, H., Inverardi, P., and Magee, J., editors, Soft-
ware Engineering for Self-Adaptive Systems, number 08031 in Dagstuhl Seminar
Proceedings, Dagstuhl, Germany. Schloss Dagstuhl - Leibniz-Zentrum fuer
Informatik, Germany. (Cited on page 115.)
Cicchetti, A., Ruscio, D., Pelliccione, P., Pierantonio, A., and Zacchiroli, S.
(2010). A model driven approach to upgrade package-based software sys-
tems. In Evaluation of Novel Approaches to Software Engineering, volume 69,
pages 262276. Springer Berlin Heidelberg. (Cited on pages 48 and 178.)
Cicchetti, A., Ruscio, D. D., Pelliccione, P., Pierantonio, A., and Zacchiroli,
S. (2009). Towards a model driven approach to upgrade complex software
systems. In 4th International Conference on Evaluation of Novel Approaches to
Software Engineering (ENASE 2009). (Cited on page 126.)
Coetzee, D., Bhaskar, A., and Necula, G. (2011). apmake: A reliable parallel
build manager. In 2011 USENIX Annual Technical Conference (USENIX 11).
USENIX. (Cited on page 200.)
Costanza, P. (2002). Dynamic Replacement of Active Objects in the Gilgul
Programming Language. In CD 02: Proc. of the IFIP/ACM Working Conf. on
Component Deployment, pages 125140. Springer-Verlag. (Cited on pages 66
and 94.)
de Jonge, M. (2005). Build-level components. IEEE Transactions on Software
Engineering, 31(7):588600. (Cited on page 185.)
de Jonge, M., van der Linden, W., and Willems, R. (2007). eServices for
Hospital Equipment. In Krämer, B., Lin, K.-J., and Narasimhan, P., editors,
5th Intl. Conf. on Service-Oriented Computing (ICSOC 2007), pages 391–397.
(Cited on pages 66 and 101.)
Dearle, A. (2007). Software deployment, past, present and future. In FOSE
07: 2007 Future of Software Engineering, pages 269284, Washington, DC,
USA. IEEE Computer Society. (Cited on page 173.)
Deb, D., Oudshoorn, M. J., and Paxton, J. (2008). Self-managed deployment
in a distributed environment via utility functions. In 20th International Con-
ference on Software Engineering and Knowledge Engineering (SEKE 08). (Cited
on page 116.)
den Breejen, W. (2008). Managing state in a purely functional deployment
model. Master’s thesis, Faculty of Science, Utrecht University, The Nether-
lands. (Cited on page 178.)
Di Cosmo, R. and Vouillon, J. (2011). On software component co-
installability. In Proceedings of the 19th ACM SIGSOFT symposium and the
13th European conference on Foundations of software engineering, ESEC/FSE 11,
pages 256266, New York, NY, USA. ACM. (Cited on page 47.)
Di Cosmo, R., Zacchiroli, S., and Trezentos, P. (2008). Package upgrades in
foss distributions: details and challenges. In Proceedings of the 1st International
Workshop on Hot Topics in Software Upgrades, HotSWUp '08, pages 7:1–7:5,
New York, NY, USA. ACM. (Cited on page 47.)
Dijkstra, E. W. (1959). Communication with an Automatic Computer. PhD thesis,
University of Amsterdam. (Cited on page 4.)
Dolstra, E. (2005a). Efficient upgrading in a purely functional compo-
nent deployment model. In Proceedings of the 8th international conference on
Component-Based Software Engineering, CBSE’05, pages 219234, Berlin, Hei-
delberg. Springer-Verlag. (Cited on page 30.)
Dolstra, E. (2005b). Secure sharing between untrusted users in a transparent
source/binary deployment model. In Proceedings of the 20th IEEE/ACM inter-
national Conference on Automated software engineering, ASE 05, pages 154163,
New York, NY, USA. ACM. (Cited on page 33.)
Dolstra, E. (2006). The Purely Functional Software Deployment Model. PhD
thesis, Faculty of Science, Utrecht University, The Netherlands. (Cited on
pages iv, 4, 15, 23, and 73.)
Dolstra, E., Bravenboer, M., and Visser, E. (2005). Service configuration man-
agement. In SCM 05: Proceedings of the 12th international workshop on Software
configuration management, pages 8398, New York, NY, USA. ACM. (Cited on
page 37.)
Dolstra, E. and Löh, A. (2008). NixOS: A purely functional Linux distribu-
tion. In ICFP 2008: 13th ACM SIGPLAN Intl. Conf. on Functional Programming.
ACM. (Cited on pages 15, 30, and 35.)
Dolstra, E., Löh, A., and Pierron, N. (2010). NixOS: A purely functional
Linux distribution. Journal of Functional Programming, 20. (Cited on pages 15,
35, and 38.)
Dolstra, E., Vermaas, R., and Levy, S. (2013). Charon: Declarative provi-
sioning and deployment. In 1st International Workshop on Release Engineering
(RELENG). IEEE Computer Society. To appear. (Cited on pages 122 and 130.)
Dolstra, E. and Visser, E. (2008). The Nix Build Farm: A Declarative Ap-
proach to Continuous Integration. In Mens, K., van den Brand, M., Kuhn, A.,
Kienle, H. M., and Wuyts, R., editors, International Workshop on Advanced Soft-
ware Development Tools and Techniques (WASDeTT 2008). (Cited on page 41.)
Dolstra, E., Visser, E., and de Jonge, M. (2004). Imposing a memory manage-
ment discipline on software deployment. In Proc. 26th Intl. Conf. on Software
Engineering (ICSE 2004), pages 583592. IEEE Computer Society. (Cited on
pages 15, 23, 73, and 192.)
Dumitras, T., Tan, J., Gho, Z., and Narasimhan, P. (2007). No more Hot-
Dependencies: toward dependency-agnostic online upgrades in distributed
systems. In HotDep’07: Proc. of the 3rd workshop on Hot Topics in System De-
pendability, page 14, Berkeley, CA, USA. USENIX Association. (Cited on
page 94.)
Edgewall Software (2009). Trac integrated SCM & project management.
http://trac.edgewall.org/. (Cited on page 151.)
Eilam, T., Kalantar, M., Konstantinou, A., Pacifici, G., Pershing, J., and
Agrawal, A. (2006). Managing the configuration complexity of distributed
applications in internet data centers. Communications Magazine, IEEE,
44(3):166–177. (Cited on pages 15, 60, 94, 95, and 178.)
El Maghraoui, K., Meghranjani, A., Eilam, T., Kalantar, M., and Konstanti-
nou, A. V. (2006). Model driven provisioning: bridging the gap between
declarative object models and procedural provisioning tools. In Proceedings
of the ACM/IFIP/USENIX 2006 International Conference on Middleware, Middle-
ware 06, pages 404423, New York, NY, USA. Springer-Verlag New York,
Inc. (Cited on pages 60 and 95.)
Feldman, S. I. (1979). Make—a program for maintaining computer programs.
Software—Practice and Experience, 9(4):255–265. (Cited on pages 61 and 184.)
Fischer, J., Majumdar, R., and Esmaeilsabzali, S. (2012). Engage: a deploy-
ment management system. In PLDI, pages 263274. (Cited on page 95.)
Foster-Johnson, E. (2003). Red Hat RPM Guide. John Wiley & Sons. Also at
http://fedora.redhat.com/docs/drafts/rpm-guide-en/. (Cited
on pages 7, 23, 124, and 189.)
Fowler, M. and Foemmel, M. (2005). Continuous integration. http://www.
martinfowler.com/articles/continuousIntegration.html. Ac-
cessed 11 August 2005. (Cited on page 150.)
Free Software Foundation (2012a). What is copyleft? Available at http:
//www.gnu.org/copyleft. (Cited on page 7.)
Free Software Foundation (2012b). What is free software? http://www.
gnu.org/philosophy/free-sw.html. (Cited on page 7.)
FreeBSD project (2012). About freebsd ports. Available at http://www.
freebsd.org/ports. (Cited on page 46.)
Freedesktop Project (2010a). Avahi. http://avahi.org. (Cited on
page 105.)
Freedesktop Project (2010b). freedesktop.org - software/dbus. http://
www.freedesktop.org/wiki/Software/dbus. (Cited on page 90.)
Freeman, S., Mackinnon, T., Pryce, N., and Walnes, J. (2004). Mock roles, not
objects. In Vlissides, J. M. and Schmidt, D. C., editors, Companion to the 19th
Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems,
Languages, and Applications (OOPSLA 2004), pages 236246. ACM. (Cited on
page 153.)
Garlan, D., Cheng, S.-W., Huang, A.-C., Schmerl, B., and Steenkiste, P. (2004).
Rainbow: architecture-based self-adaptation with reusable infrastructure.
Computer, 37(10):4654. (Cited on page 117.)
German, D. M. and Hassan, A. E. (2009). License integration patterns: Ad-
dressing license mismatches in component-based development. In Proceed-
ings of the 31st International Conference on Software Engineering, ICSE 09, pages
188198, Washington, DC, USA. IEEE Computer Society. (Cited on pages 181
and 183.)
German, D. M., Manabe, Y., and Inoue, K. (2010). A sentence-matching
method for automatic license identification of source code files. In Proceed-
ings of the IEEE/ACM international conference on Automated software engineering,
ASE '10, pages 437–446, New York, NY, USA. ACM. (Cited on page 189.)
Germán, D. M., Penta, M. D., and Davies, J. (2010). Understanding and
auditing the licensing of open source software distributions. In ICPC, pages
8493. (Cited on pages 196 and 199.)
Graphviz (2010). Graphviz. http://www.graphviz.org. (Cited on
page 91.)
Grechanik, M., Xie, Q., and Fu, C. (2009). Maintaining and evolving GUI-
directed test scripts. In ICSE 09: 31st Intl. Conf. on Software Engineering,
pages 408418, Los Alamitos, CA, USA. IEEE Computer Society. (Cited on
page 141.)
Groenewegen, D. and Visser, E. (2008). Declarative access control for
WebDSL: Combining language integration and separation of concerns. In
Schwabe, D. and Curbera, F., editors, International Conference on Web En-
gineering (ICWE’08), Yorktown Heights, New York, USA. IEEE. (Cited on
page 205.)
Grundy, J., Kaefer, G., Keong, J., and Liu, A. (2012). Guest editors’ introduc-
tion: Software engineering for the cloud. IEEE Software, 29:2629. (Cited on
page 9.)
Guennoun, K., Drira, K., Wambeke, N. V., Chassot, C., Armando, F., and Ex-
posito, E. (2008). A framework of models for QoS-oriented adaptive deploy-
ment of multi-layer communication services in group cooperative activities.
Computer Communications, 31(13):3003–3017. (Cited on page 116.)
Hall, R. S., Heimbigner, D., and Wolf, A. L. (1999). A cooperative approach
to support software deployment using the software dock. In ICSE 99: Proc.
of the 21st Intl. Conf. on Software Engineering, pages 174183, New York, NY,
USA. ACM. (Cited on pages 14, 61, 94, and 127.)
Hemel, A., Kalleberg, K. T., Vermaas, R., and Dolstra, E. (2011). Finding soft-
ware license violations through binary code clone detection. In Proceedings
of the 8th working conference on Mining Software Repositories, MSR 11, pages
6372, New York, NY, USA. ACM. (Cited on page 181.)
Hemel, Z., Kats, L. C. L., Groenewegen, D. M., and Visser, E. (2010). Code
generation by model transformation: a case study in transformation modu-
larity. Software and System Modeling, 9(3):375402. (Cited on page 205.)
Hemel, Z., Verhaaf, R., and Visser, E. (2008). Webworkflow: An object-
oriented workflow modeling language for web applications. In Czarnecki,
K., editor, International Conference on Model Driven Engineering Languages and
Systems (MODELS’ 08), Lecture Notes in Computer Science. Springer. (Cited
on page 205.)
Hewson, J. A., Anderson, P., and Gordon, A. D. (2012). A declarative ap-
proach to automated configuration. In Proceedings of the 26th international
conference on Large Installation System Administration: strategies, tools, and tech-
niques, lisa’12, pages 5166, Berkeley, CA, USA. USENIX Association. (Cited
on page 127.)
Heydarnoori, A. and Mavaddat, F. (2006). Reliable deployment of
component-based applications into distributed environments. 3rd Intl. Conf.
on Information Technology: New Generations, 0:5257. (Cited on pages 108
and 112.)
Heydarnoori, A., Mavaddat, F., and Arbab, F. (2006). Deploying loosely cou-
pled, component-based applications into distributed environments. IEEE In-
ternational Conference on the Engineering of Computer-Based Systems, 0:93102.
(Cited on pages 108 and 112.)
Hibler, M., Ricci, R., Stoller, L., Duerig, J., Guruprasad, S., Stack, T., Webb,
K., and Lepreau, J. (2008). Large-scale virtualization in the Emulab network
testbed. In 2008 USENIX Annual Technical Conference, pages 113128, Berke-
ley, CA, USA. USENIX. (Cited on page 153.)
Hudak, P. (1989). Conception, evolution, and application of functional pro-
gramming languages. ACM Comput. Surv., 21(3):359411. (Cited on pages 15
and 23.)
ifrOSS (2004–2009). GPL court decisions in Europe. http://www.ifross.
org/ifross_html/links_en.html#Urteile. (Cited on page 183.)
Kats, L. C. L. and Visser, E. (2010). The Spoofax language workbench. Rules
for declarative specification of languages and IDEs. In Rinard, M., editor,
Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented
Programming, Systems, Languages, and Applications, OOPSLA 2010, October 17-
21, 2010, Reno, NV, USA, pages 444463. (Cited on page 208.)
Konstantinou, A. V., Eilam, T., Kalantar, M., Totok, A. A., Arnold, W., and
Snible, E. (2009). An architecture for virtual solution composition and de-
ployment in infrastructure clouds. In Proceedings of the 3rd international work-
shop on Virtualization technologies in distributed computing, VTDC 09, pages
9–18, New York, NY, USA. ACM. (Cited on pages 60 and 95.)
Kramer, J. and Magee, J. (1990). The evolving philosophers problem:
dynamic change management. IEEE Transactions on Software Engineering,
16(11):12931306. (Cited on page 166.)
Kramer, J. and Magee, J. (2007). Self-managed systems: an architectural
challenge. In 2007 Future of Software Engineering, FOSE 07, pages 259268,
Washington, DC, USA. IEEE Computer Society. (Cited on page 115.)
Larson, P., Hinds, N., Ravindran, R., and Franke, H. (2003). Improving the
Linux Test Project with kernel code coverage analysis. In Proceedings of the
2003 Ottawa Linux Symposium. (Cited on page 149.)
Laszewski, G. v., Blau, E., Bletzinger, M., Gawor, J., Lane, P., Martin, S., and
Russell, M. (2002). Software, component, and service deployment in compu-
tational grids. In CD 02: Proc. of the IFIP/ACM Working Conf. on Component
Deployment, pages 244256. Springer-Verlag. (Cited on page 94.)
Ma, X., Baresi, L., Ghezzi, C., Panzica La Manna, V., and Lu, J. (2011).
Version-consistent dynamic reconfiguration of component-based distributed
systems. In Proceedings of the 19th ACM SIGSOFT symposium and the 13th Eu-
ropean conference on Foundations of software engineering, ESEC/FSE 11, pages
245255, New York, NY, USA. ACM. (Cited on page 166.)
MacKenzie, D. (2006). Autoconf, Automake, and Libtool: Autoconf, Automake,
and Libtool. Red Hat, Inc. Also at http://sources.redhat.com/
autobook/autobook/autobook.html. (Cited on pages 26 and 208.)
Malek, S., Medvidovic, N., and Mikic-Rakic, M. (2012). An extensible frame-
work for improving a distributed software system’s deployment architecture.
IEEE Transactions on Software Engineering, 38(1):73100. (Cited on page 107.)
Mancinelli, F., Boender, J., di Cosmo, R., Vouillon, J., Durak, B., Leroy, X., and
Treinen, R. (2006). Managing the complexity of large free and open source
package-based software distributions. In Proceedings of the 21st IEEE/ACM In-
ternational Conference on Automated Software Engineering, ASE '06, pages 199–208, Washington, DC, USA. IEEE Computer Society. (Cited on page 47.)
MaxMind (2010). MaxMind - GeoIP | IP Address Location Technology.
http://www.maxmind.com/app/ip-location. (Cited on page 99.)
McIntosh, S., Adams, B., Nguyen, T. H., Kamei, Y., and Hassan, A. E. (2011).
An empirical study of build maintenance effort. In Proceedings of the 33rd
International Conference on Software engineering, ICSE 11, pages 141150, New
York, NY, USA. ACM. (Cited on page 185.)
Medvidovic, N. and Malek, S. (2007). Software deployment architecture and
quality-of-service in pervasive environments. In International workshop on
Engineering of software services for pervasive environments, ESSPE '07, pages 47–51, New York, NY, USA. ACM. (Cited on page 107.)
Microsoft (2010a). Global assembly cache. http://msdn.microsoft.
com/en-us/library/yf1d93sz.aspx. (Cited on pages 46 and 220.)
Microsoft (2010b). How the runtime locates assemblies. http://msdn.
microsoft.com/en-us/library/yx7xezcf%28v=vs.110%29.aspx.
(Cited on page 224.)
Microsoft (2010c). Installing iis 7 on windows vista and windows
7. http://learn.iis.net/page.aspx/28/installing-iis-on-
windows-vista-and-windows-7. (Cited on page 229.)
Microsoft (2012). Windows Installer. http://msdn.microsoft.com/en-
us/library/cc185688%28VS.85%29.aspx. (Cited on page 46.)
Microsoft Support (2007). How to load an assembly at runtime that is located
in a folder that is not the bin folder of the application. http://support.
microsoft.com/kb/837908. (Cited on page 224.)
Mikic-Rakic, M., Malek, S., and Medvidovic, N. (2005). Improving Availabil-
ity in Large, Distributed Component-Based Systems Via Redeployment. In
Dearle, A. and Eisenbach, S., editors, Component Deployment, volume 3798 of
Lecture Notes in Computer Science, pages 8398. Springer Berlin / Heidelberg.
(Cited on pages 115, 116, and 244.)
Mikic-Rakic, M. and Medvidovic, N. (2002). Architecture-level support for
software component deployment in resource constrained environments. In
CD 02: Proc. of the IFIP/ACM Working Conf. on Component Deployment, pages
3150. Springer-Verlag. (Cited on page 94.)
Miłoš, G., Murray, D. G., Hand, S., and Fetterman, M. A. (2009). Satori: En-
lightened page sharing. In 2009 USENIX Annual Technical Conference, pages
115, Berkeley, CA, USA. USENIX. (Cited on page 152.)
Moser, M. and O’Brien, T. (2011). The Hudson Book. Oracle Inc., 1 edition.
(Cited on page 42.)
Muthitacharoen, A., Chen, B., and Mazières, D. (2001). A low-bandwidth
network file system. In Proceedings of the 18th ACM Symposium on Operating
Systems Principles (SOSP 01), pages 174187, Chateau Lake Louise, Banff,
Canada. (Cited on page 178.)
Nadiminti, K., Assuncao, M. D. D., and Buyya, R. (2006). Distributed systems
and recent innovations: Challenges and benefits. InfoNet Magazine, 16(3):15.
(Cited on page 157.)
Open Source Initiative (2012). The open source definition. http://www.
opensource.org/docs/OSD. (Cited on page 7.)
Opscode (2011). Chef - opscode. http://www.opscode.com/chef. (Cited
on pages 60, 122, and 126.)
Oreizy, P., Medvidovic, N., and Taylor, R. N. (1998). Architecture-based run-
time software evolution. In Proceedings of the 20th international conference on
Software engineering, ICSE 98, pages 177186, Washington, DC, USA. IEEE
Computer Society. (Cited on pages 115 and 116.)
Pant, R. (2009). Organizing a digital technology department of medium size
in a media company. Available at http://www.rajiv.com/blog/2009/
03/17/technology-department. (Cited on page 13.)
Papazoglou, M. P., Traverso, P., Dustdar, S., and Leymann, F. (2007). Service-
oriented computing: State of the art and research challenges. Computer,
40:3845. (Cited on pages 8 and 65.)
Pfeiffer, T. (1998). Windows DLLs: Threat or Menace? Available at http:
//drdobbs.com/184410810. (Cited on pages 7 and 220.)
Puppet Labs (2011). Puppet platform: An open source platform for en-
terprise systems management. http://www.puppetlabs.com/puppet/
puppet-platform. (Cited on pages 60, 122, and 126.)
Raymond, E. S. (2003). The Art of Unix Programming. Thyrsus Enter-
prises. Also available at: http://www.faqs.org/docs/artu. (Cited on
page 57.)
Red Hat, Inc. (2009). Red Hat Enterprise Linux 5 Virtualization Guide. Red Hat,
Inc., fourth edition. (Cited on page 147.)
Reimer, D., Thomas, A., Ammons, G., Mummert, T., Alpern, B., and Bala, V.
(2008). Opening black boxes: Using semantic information to combat virtual
machine image sprawl. In Proceedings of the Fourth ACM SIGPLAN/SIGOPS
International Conference on Virtual Execution Environments, pages 111120.
ACM. (Cited on page 153.)
Rigole, P. and Berbers, Y. (2006). Resource-driven collaborative component
deployment in mobile environments. In Proc. of the International Conference on
Autonomic and Autonomous Systems, ICAS 06, Washington, DC, USA. IEEE
Computer Society. (Cited on page 116.)
Rosen, L. (2004). Open Source Licensing: Software Freedom and Intellectual Prop-
erty Law. Prentice Hall. (Cited on page 199.)
Roussain, H. and Guidec, F. (2005). Cooperative component-based software
deployment in wireless ad hoc networks. In CD 05: Proc. of the 3rd Work-
ing Conf. on Component Deployment, pages 115. Springer-Verlag. (Cited on
pages 94 and 245.)
Rutherford, M. J., Anderson, K. M., Carzaniga, A., Heimbigner, D., and Wolf,
A. L. (2002). Reconfiguration in the Enterprise JavaBean Component Model.
In CD 02: Proc. of the IFIP/ACM Working Conf. on Component Deployment,
pages 6781. Springer-Verlag. (Cited on pages 66 and 93.)
Rutherford, M. J., Carzaniga, A., and Wolf, A. L. (2008). Evaluating test suites
and adequacy criteria using simulation-based models of distributed systems.
IEEE Transactions on Software Engineering, 34(4):452470. (Cited on pages 14
and 153.)
Schneider, B. (1996). Applied Cryptography. John Wiley & Sons, second edi-
tion. (Cited on page 24.)
Skeen, D. and Stonebraker, M. (1987). A formal model of crash recovery in a
distributed system. In Concurrency control and reliability in distributed systems,
pages 295317, New York, NY, USA. Van Nostrand Reinhold Co. (Cited on
pages 160 and 161.)
Software Freedom Law Center (2008). BusyBox Developers Agree To End
GPL Lawsuit Against Verizon. http://www.softwarefreedom.org/
news/2008/mar/17/busybox-verizon/. (Cited on page 183.)
Stevens, W. R. and Rago, S. A. (2005). Advanced Programming in the UNIX
Environment. Addison-Wesley, second edition. (Cited on page 140.)
Sudmann, N. P. and Johansen, D. (2002). Software deployment using mo-
bile agents. In CD 02: Proc. of the IFIP/ACM Working Conf. on Component
Deployment, pages 97107. Springer-Verlag. (Cited on page 94.)
Szyperski, C., Gruntz, D., and Murer, S. (2002). Component Software: Beyond
Object-Oriented Programming. Addison Wesley, second edition. (Cited on
pages 6, 221, and 232.)
Tanenbaum, A. S. and Woodhull, A. S. (1997). Operating Systems: Design and
Implementation. Prentice Hall, second edition. (Cited on page 5.)
Taylor, R. N., Medvidovic, N., and Dashofy, E. M. (2010). Software Architec-
ture: Foundations, Theory, and Practice. John Wiley & Sons, Inc. (Cited on
pages 15, 49, 50, 56, and 57.)
The JBoss Visual Design Team (2012). Hibernate community documen-
tation. Available at http://docs.jboss.org/hibernate/orm/4.1/
quickstart/en-US/html. (Cited on page 206.)
The Joint Task Force on Computing Curricula (2004). Curriculum guide-
lines for undergraduate degree programs in software engineering. Software
Engineering 2004. (Cited on page 3.)
The Linux Foundation (2010). Linux Standard Base Core Specification
4.1. Available at http://refspecs.linux-foundation.org/LSB_4.
1.0/LSB-Core-generic/LSB-Core-generic/book1.html. (Cited on
page 46.)
The Linux Foundation (2012). FHS 3.0 Draft 1. Available
at http://www.linuxfoundation.org/collaborate/workgroups/
lsb/fhs-30-draft-1. (Cited on page 35.)
Traugott, S. and Brown, L. (2002). Why order matters: Turing equivalence in
automated systems administration. In Proceedings of the 16th Systems Admin-
istration Conference (LISA 02), pages 99120. USENIX. (Cited on pages 125
and 152.)
Trezentos, P., Lynce, I., and Oliveira, A. L. (2010). Apt-pbo: solving the soft-
ware dependency problem using pseudo-boolean optimization. In Proceed-
ings of the IEEE/ACM international conference on Automated software engineering,
ASE 10, pages 427436, New York, NY, USA. ACM. (Cited on page 47.)
Tu, Q. and Godfrey, M. W. (2001). The build-time software architecture view.
In ICSM, pages 398407. (Cited on page 198.)
Tuunanen, T., Koskinen, J., and Kärkkäinen, T. (2009). Automated soft-
ware license analysis. Automated Software Engineering, 16:455490. (Cited
on pages 187 and 199.)
Tuunanen, T., Koskinen, J., and Kärkkäinen, T. (2006). Retrieving open source
software licenses. In Damiani, E., Fitzgerald, B., Scacchi, W., Scotto, M., and
Succi, G., editors, Open Source Systems, IFIP Working Group 2.13 Foundation on
Open Source Software, volume 203 of IFIP, pages 3546. Springer. (Cited on
page 199.)
Van den Brand, M. G. T., de Jong, H. A., Klint, P., and Olivier, P. A. (2000).
Efficient annotated terms. Softw. Pract. Exper., 30(3):259291. (Cited on
page 208.)
van der Burg, S. (2011a). An evaluation and comparison of GoboLinux.
Available at: http://sandervanderburg.blogspot.com/2011/
12/evaluation-and-comparison-of-gobolinux.html. (Cited on
page 48.)
van der Burg, S. (2011b). Deploying .NET applications with the Nix package
manager. Available at: http://sandervanderburg.blogspot.com/
2011/09/deploying-net-applications-with-nix.html. (Cited on
page 220.)
van der Burg, S. (2011c). Deploying .NET applications with the Nix package
manager (part 2). Available at: http://sandervanderburg.blogspot.
com/2011/09/deploying-net-applications-with-nix_14.html.
(Cited on page 220.)
van der Burg, S. (2011d). Deploying .NET services with Disnix. Available
at: http://sandervanderburg.blogspot.nl/2011/10/deploying-
net-services-with-disnix.html. (Cited on page 220.)
van der Burg, S. (2011e). Disnix user’s guide. Available at: http://nixos.
org/disnix/docs.html. (Cited on page 82.)
van der Burg, S. (2011f). Free (and open-source) software. Available
at: http://sandervanderburg.blogspot.com/2011/02/free-and-
open-source-software.html. (Cited on page 7.)
van der Burg, S. (2011g). On Nix, NixOS and the Filesystem Hierarchy Stan-
dard (FHS). Available at: http://sandervanderburg.blogspot.com/
2011/11/on-nix-nixos-and-filesystem-hierarchy.html. (Cited
on page 48.)
van der Burg, S. (2012). A generic approach for deploying and upgrading
mutable software components. In Dig, D. and Wahler, M., editors, Fourth
Workshop on Hot Topics in Software Upgrades (HotSWUp). IEEE Computer So-
ciety. (Cited on page 21.)
van der Burg, S., Davies, J., Dolstra, E., German, D. M., and Hemel, A. (2012).
Discovering software license constraints: Identifying a binary’s sources by
tracing build processes. Technical Report TUD-SERG-2012-010, Software En-
gineering Research Group, Delft, The Netherlands. (Cited on page 21.)
van der Burg, S. and Dolstra, E. (2010a). Automated deployment of a hetero-
geneous service-oriented system. In 36th EUROMICRO Conference on Software
Engineering and Advanced Applications (SEAA). IEEE Computer Society. (Cited
on pages 20, 73, and 127.)
van der Burg, S. and Dolstra, E. (2010b). Automating system tests using
declarative virtual machines. In 21st IEEE International Symposium on Software
Reliability Engineering (ISSRE). IEEE Computer Society. (Cited on page 20.)
van der Burg, S. and Dolstra, E. (2010c). Declarative testing and deployment
of distributed systems. Technical Report TUD-SERG-2010-020, Software En-
gineering Research Group, Delft, The Netherlands. (Cited on page 21.)
van der Burg, S. and Dolstra, E. (2010d). Disnix: A toolset for distributed
deployment. In van den Brand, M., Mens, K., Kienle, H., and Cleve, A.,
editors, Third International Workshop on Academic Software Development Tools
and Techniques (WASDeTT-3). WASDeTT. (Cited on pages 20, 73, and 92.)
van der Burg, S. and Dolstra, E. (2011). A self-adaptive deployment frame-
work for service-oriented systems. In 6th International Symposium on Software
Engineering for Adaptive and Self-Managing Systems (SEAMS). ACM. (Cited on
page 20.)
van der Burg, S. and Dolstra, E. (2012). Disnix: A toolset for distributed
deployment. Science of Computer Programming, (0):–. (Cited on page 20.)
van der Burg, S., Dolstra, E., and de Jonge, M. (2008). Atomic upgrading of
distributed systems. In Dumitras, T., Dig, D., and Neamtiu, I., editors, First
ACM Workshop on Hot Topics in Software Upgrades (HotSWUp). ACM. (Cited
on pages 21, 73, and 92.)
van der Burg, S., Dolstra, E., de Jonge, M., and Visser, E. (2009). Software
deployment in a dynamic cloud: From device to service orientation in a hos-
pital environment. In Bhattacharya, K., Bichler, M., and Tai, S., editors, First
ICSE 2009 Workshop on Software Engineering Challenges in Cloud Computing.
IEEE Computer Society. (Cited on page 20.)
van der Hoek, A., Hall, R. S., Heimbigner, D., and Wolf, A. L. (1997). Soft-
ware release management. In Proceedings of the 6th European SOFTWARE
ENGINEERING conference held jointly with the 5th ACM SIGSOFT international
symposium on Foundations of software engineering, ESEC '97/FSE-5, pages 159–175, New York, NY, USA. Springer-Verlag New York, Inc. (Cited on page 4.)
Vandewoude, Y., Ebraert, P., Berbers, Y., and D’Hondt, T. (2007). Tranquil-
ity: A low disruptive alternative to quiescence for ensuring safe dynamic
updates. IEEE Transactions on Software Engineering, 33(12):856868. (Cited on
page 166.)
Vermolen, S. D., Wachsmuth, G., and Visser, E. (2011). Generating database
migrations for evolving web applications. In Proceedings of the 10th ACM
international conference on Generative programming and component engineering,
GPCE 11, pages 8392, New York, NY, USA. ACM. (Cited on page 218.)
Visser, E. (1997). Syntax Definition for Language Prototyping. PhD thesis, Uni-
versity of Amsterdam. (Cited on page 208.)
Visser, E. (2008). WebDSL: A case study in domain-specific language engi-
neering. In Lammel, R., Saraiva, J., and Visser, J., editors, Generative and Trans-
formational Techniques in Software Engineering (GTTSE 2007), Lecture Notes in
Computer Science. Springer. (Cited on pages 20, 58, 203, and 204.)
Visual C# tutorials (2011). Installing sql server 2008 express.
http://visualcsharptutorials.com/2011/05/installing-
sql-server-2008-express. (Cited on page 229.)
Wang, Y., Rutherford, M. J., Carzaniga, A., and Wolf, A. L. (2005). Au-
tomating experimentation on distributed testbeds. In Proceedings of the 20th
IEEE/ACM International Conference on Automated Software Engineering, pages
164173. ACM. (Cited on page 153.)
Wegner, P. (1976). Research paradigms in computer science. In Proceed-
ings of the 2nd international conference on Software engineering, ICSE 76, pages
322–330, Los Alamitos, CA, USA. IEEE Computer Society Press. (Cited on
page iv.)
Wichadakul, D. and Nahrstedt, K. (2002). A Translation System for Enabling
Flexible and Efficient Deployment of QoS-Aware Applications in Ubiquitous
Environments. In CD 02: Proc. of the 1st Working Conf. on Component Deploy-
ment, pages 287310. Springer-Verlag. (Cited on page 116.)
Samenvatting
A REFERENCE ARCHITECTURE FOR DISTRIBUTED SOFTWARE DEPLOYMENT
Sander van der Burg

Today, many software systems are larger and more complex than one might think. For this reason, software engineering has become an important scientific research discipline within computer science over the last 30 years. Within this research discipline, various activities related to building software are studied, investigated and improved, including specifying system requirements, creating system designs, writing and understanding program code, and testing systems.
Besides ensuring that a system is built correctly and does what the client has asked for, systems must also be made available for use to end users, or be tested within an isolated test environment so that their correctness can be verified. Within the software engineering domain, this process is called software deployment, which can be described as all the activities that are necessary to make a system available for use. Examples of such activities are building software components, transferring components from producer to customer, activating software components, and upgrading. Software deployment is a problem of ever-increasing complexity that has generally been underestimated by both researchers and practitioners.
SOFTWARE DEPLOYMENT COMPLEXITY

There are several reasons why software deployment has become so complex. In the early days, when the first digital computers were built, software deployment was a non-existent problem. Programs were written directly in machine language for one specific kind of machine. Programming was generally very complex: besides the fact that a program had to work correctly, a programmer was also required to know precisely how a machine worked and how peripherals, such as printers, could be controlled. As soon as new computers appeared, old programs were thrown away and rewritten.
To cope with this complexity, various new techniques have been developed. New higher-level programming languages have been developed that are closer to the domain of a problem than to the details of the machine. Operating systems have been created that hide the complexity of peripherals, so that they can easily be controlled from programs. Furthermore, programs are no longer self-contained: they use off-the-shelf software components that provide the programmer with various standard functionality, such as the construction of graphical user interfaces containing windows and buttons. All these factors have considerably improved the productivity of programmers and the quality of computer programs.
Moreover, computer programs are no longer developed specifically for one kind of device. Today, computer programs are made available as services over the Internet (such as Facebook, http://www.facebook.com, and Twitter, http://www.twitter.com) and must be usable on a wide range of devices, varying from desktop computers and tablets to mobile phones.
Unfortunately, all these developments also have a downside. Because programs are no longer self-contained objects, being sure that they work well also requires being sure that all their dependencies (such as other software components and compilers) are present and correct. Although this sounds obvious, in practice certain components often turn out to be missing or to be of the wrong version, which frequently causes a program not to work at all. Furthermore, programs sometimes have to be upgraded, because reinstalling systems is often time consuming. These processes are often dangerous and destructive: if an upgrade fails, a system may be left behind in a state in which it is no longer usable. Moreover, it is also difficult (or sometimes impossible) to roll back to an old version. Besides the difficulties of the deployment activities themselves, they are also frequently performed manually by system administrators or developers, which costs a lot of time and is highly error prone. During such activities, a system is often temporarily unavailable, either completely or partially, which is not always desirable.
Modern-generation systems are even more complex than ordinary desktop programs. Although they often look simple to end users, they are composed of components written in different programming languages, using different kinds of components, and residing on different machines in a data centre or on the Internet. These components depend on each other, and a missing dependency may cause a system to stop functioning. Moreover, these components may be installed on machines with different properties, such as different operating systems and varying amounts of system resources.
Furthermore, all kinds of non-functional aspects are important when deploying modern-generation systems. For example, if a system provides privacy-sensitive information, it is often undesirable to deploy such a component in a location where it is publicly accessible. Machines in a network may also fail, which can cause parts of a system to disappear so that it no longer functions. It may therefore be desirable to deploy multiple redundant instances that can serve as a fallback. Another important aspect is that software components from external parties are frequently used, such as free and open-source software components. Although these are often available free of charge, certain license conditions must be observed. It is therefore essential to know which components are used and what their conditions prescribe.
Non-functional deployment aspects are often specific to a particular domain. In the medical domain privacy may be very important, while in the financial domain reliability plays a major role.
Finally, nowadays it is also desirable to be able to deliver value quickly: many software products are developed by means of agile development methods, in which short iterations of roughly two to four weeks are carried out and a new prototype of the product is delivered at the end of each iteration (sprint). To be able to deliver or test these prototypes, performing software deployment activities quickly and correctly is crucial.
A REFERENCE ARCHITECTURE FOR DISTRIBUTED SOFTWARE DEPLOYMENT

For all the reasons mentioned above, a fully automated deployment solution is needed that can deploy software in a reliable, reproducible, generic, extensible and efficient manner. In earlier research carried out by Eelco Dolstra, the Nix package manager was developed, a software tool that can deploy software components automatically in a reliable, reproducible, generic and efficient way. Nix is also an important foundation for two other applications: NixOS is a Linux distribution that is deployed entirely through Nix and can roll out complete system configurations based on a model, and Hydra is a build and test environment built on top of Nix.
Although Nix and the Nix applications offer a number of significant advantages over conventional deployment solutions, certain aspects of modern-generation systems are not supported. One of these aspects is that Nix is limited to managing local or intra-dependencies. Modern-generation systems consist of components residing in different locations that also depend on each other (inter-dependencies); these must also be modelled and their deployment automated. Furthermore, Nix offers no solutions for dealing with non-functional requirements, such as privacy, reliability, recovering a deployment after a failure, or complying with the license conditions of external components.
To provide a solution for modern-generation systems, a new automated solution is needed that offers these desirable quality attributes. However, modern-generation systems are so diverse, and non-functional requirements vary so much per domain, that it is virtually impossible to provide a universal solution that supports all of these systems. Instead, we define a reference architecture for distributed deployment. This reference architecture contains various components that automate distributed software deployment activities in a reliable, reproducible, generic and efficient way. Moreover, it offers integration facilities for domain-specific components that provide solutions for desirable non-functional requirements.
A developer can transform the reference architecture into a concrete architecture for an automated deployment tool that implements the desired functional and non-functional deployment aspects for a certain domain. As an important ingredient to ensure that our desired quality attributes are pursued, we use an architectural pattern that we call purely functional batch processing, with which we integrate the components with each other. In addition, we have integrated Nix and NixOS into the reference architecture, which take care of the deployment of local systems, and we have developed a number of new components that make it possible to perform crucial distributed deployment activities.
THE ASPECTS OF THE REFERENCE ARCHITECTURE

A large part of this thesis is devoted to new components in the reference architecture that automate various software deployment aspects, including:

- Disnix (Chapter 4) is a tool that extends Nix by making it possible to model distributed components (including their inter-dependencies) and to deploy them automatically in a network of machines with different properties, in a reliable, reproducible, generic and efficient way. Chapter 8 is devoted to the atomic upgrade aspect of Disnix, which can also be used in other parts of the reference architecture.
- Dynamic Disnix (Chapter 5) is a generic extension on top of Disnix that makes it possible to deploy components dynamically to machines in a network, based on functional and non-functional properties that can be defined by users themselves. In addition, this extension can automatically and efficiently redeploy a system when an event occurs, such as a machine disappearing from the network due to a failure.
- DisnixOS (Chapter 6) is a complement to Disnix that, in addition to the distributed components deployed by Disnix, can also deploy networks of NixOS system configurations, both on physical hardware and on virtual hardware at, for example, a cloud provider such as Amazon EC2, through a tool called charon.
- The NixOS test driver (Chapter 7) demonstrates a distributed testing method based on NixOS. Because Nix and NixOS can guarantee reproducibility very well, we can also efficiently create networks of virtual machines for testing purposes, by sharing components that are identical.
- Dysnomia (Chapter 9) describes a prototype that can deploy mutable components. To guarantee reliability and reproducibility, components in Nix and the Nix applications cannot be modified after they have been built. However, some types of components, such as databases, cannot be managed in this way. Dysnomia offers a solution to manage these components in a reliable, reproducible and generic way.
- grok-trace (Chapter 10) describes a prototype that can dynamically trace which files are used in a build process and in what way. In combination with a license analysis tool, it can be determined whether licenses are complied with correctly.

The last two chapters describe how these aspects can be applied. Chapter 11 describes an example in which we use parts of the reference architecture to automate the deployment of WebDSL applications, a web development environment that we use internally at TU Delft. Chapter 12 describes how we can apply the components in unusual environments with unusual characteristics, such as a software system using .NET technology.
RESULT

In the end, we have been able to implement the components of the reference architecture and to achieve a large part of our desired quality attributes. An important caveat is that, in order to use these components optimally for an arbitrary system, the system must be properly divided into components, all of which must be modelled.
Furthermore, not all components are equally mature, there are open issues that leave room for improvement, and there are new areas that still need to be investigated. Important open aspects are that mutable components cannot be deployed efficiently and that only simple network topologies are supported.
Curriculum Vitae
Sander van der Burg was born on May 24th 1984 in Numansdorp, The Nether-
lands.
EDUCATION
2008 – 2012
PhD Software Engineering
Delft University of Technology
Dissertation title: “A Reference Architecture for Distributed Software
Deployment”
Followed courses: Scientific Writing in English, “The Art of Presenting
Science”, Philips Informatics Infrastructure Basic training.
2005 – 2009
M.Sc. Computer Science
Delft University of Technology
Specialisation: Software Evolution, Model-driven Software Engineer-
ing
Free-elective courses: Computer graphics, Speech and language pro-
cessing, Methodology and Ethics
2001 – 2005 B.ICT. Computer Science
Rotterdam University of Applied Sciences
Specialisation: Software Engineering
Free elective courses: Linux assembly, Sailing license, Visual Basic,
Astronomy
1996 – 2001 HAVO
RSG Hoeksche Waard
Languages: Dutch, English, French 1 (speech and listening skills)
Profile: Nature & Technology
Free elective courses: Economics 1, Computer Science
WORK EXPERIENCE
2012 – present Software Architect
Conference Compass B.V.
2008 – 2012 PhD candidate
Delft University of Technology
Working on a PhD dissertation titled: “A Reference Architecture for
Distributed Software Deployment”
Developing software tools related to software deployment and domain-
specific languages: NixOS, Disnix, WebDSL
Teaching assistant for the “Concepts of programming languages” lab
course, covering several programming language paradigms, such as
Scala (functional programming), C (low-level) and JavaScript (prototype-
based object orientation)
2009 – 2012 Visiting researcher
Koninklijke Philips Electronics N.V.
Applying deployment research on a medical imaging software platform
2008 Intern Healthcare Systems Architecture
Koninklijke Philips Electronics N.V.
Development of the first prototype of Disnix
Applying development techniques on an experimental medical system
2002 – 2008 IT specialist
Bosman Watermanagement B.V.
Web application framework development based on PHP, Apache and
MySQL
Fat-client Java application development
Integration of custom applications with an ERP system
Supporting system administration tasks and management of Linux,
Novell Netware, and Microsoft Windows 2000 machines.
2003 and 2005 Intern Hone/Link ADM
IBM Netherlands B.V.
Developing a system measuring the usage of Java EE applications in an
IBM zSeries mainframe environment for cost distribution.
Titles in the IPA Dissertation Series Since 2007
H.A. de Jong. Flexible Heterogeneous Software
Systems. Faculty of Natural Sciences, Mathe-
matics, and Computer Science, UvA. 2007-01
N.K. Kavaldjiev. A run-time reconfigurable
Network-on-Chip for streaming DSP applications.
Faculty of Electrical Engineering, Mathematics
& Computer Science, UT. 2007-02
M. van Veelen. Considerations on Model-
ing for Early Detection of Abnormalities in Lo-
cally Autonomous Distributed Systems. Fac-
ulty of Mathematics and Computing Sciences,
RUG. 2007-03
T.D. Vu. Semantics and Applications of Process
and Program Algebra. Faculty of Natural Sci-
ences, Mathematics, and Computer Science,
UvA. 2007-04
L. Brandán Briones. Theories for Model-based
Testing: Real-time and Coverage. Faculty of Elec-
trical Engineering, Mathematics & Computer
Science, UT. 2007-05
I. Loeb. Natural Deduction: Sharing by Presen-
tation. Faculty of Science, Mathematics and
Computer Science, RU. 2007-06
M.W.A. Streppel. Multifunctional Geometric
Data Structures. Faculty of Mathematics and
Computer Science, TU/e. 2007-07
N. Trčka. Silent Steps in Transition Systems and
Markov Chains. Faculty of Mathematics and
Computer Science, TU/e. 2007-08
R. Brinkman. Searching in encrypted data. Fac-
ulty of Electrical Engineering, Mathematics &
Computer Science, UT. 2007-09
A. van Weelden. Putting types to good use. Fac-
ulty of Science, Mathematics and Computer
Science, RU. 2007-10
J.A.R. Noppen. Imperfect Information in Soft-
ware Development Processes. Faculty of Elec-
trical Engineering, Mathematics & Computer
Science, UT. 2007-11
R. Boumen. Integration and Test plans for Com-
plex Manufacturing Systems. Faculty of Me-
chanical Engineering, TU/e. 2007-12
A.J. Wijs. What to do Next?: Analysing and Op-
timising System Behaviour in Time. Faculty of
Sciences, Division of Mathematics and Com-
puter Science, VUA. 2007-13
C.F.J. Lange. Assessing and Improving the Qual-
ity of Modeling: A Series of Empirical Studies
about the UML. Faculty of Mathematics and
Computer Science, TU/e. 2007-14
T. van der Storm. Component-based Configura-
tion, Integration and Delivery. Faculty of Natu-
ral Sciences, Mathematics, and Computer Sci-
ence, UvA. 2007-15
B.S. Graaf. Model-Driven Evolution of Soft-
ware Architectures. Faculty of Electrical Engi-
neering, Mathematics, and Computer Science,
TUD. 2007-16
A.H.J. Mathijssen. Logical Calculi for Reason-
ing with Binding. Faculty of Mathematics and
Computer Science, TU/e. 2007-17
D. Jarnikov. QoS framework for Video Streaming
in Home Networks. Faculty of Mathematics and
Computer Science, TU/e. 2007-18
M. A. Abam. New Data Structures and Algo-
rithms for Mobile Data. Faculty of Mathematics
and Computer Science, TU/e. 2007-19
W. Pieters. La Volonté Machinale: Understand-
ing the Electronic Voting Controversy. Faculty of
Science, Mathematics and Computer Science,
RU. 2008-01
A.L. de Groot. Practical Automaton Proofs
in PVS. Faculty of Science, Mathematics and
Computer Science, RU. 2008-02
M. Bruntink. Renovation of Idiomatic Cross-
cutting Concerns in Embedded Systems. Faculty
of Electrical Engineering, Mathematics, and
Computer Science, TUD. 2008-03
A.M. Marin. An Integrated System to Manage
Crosscutting Concerns in Source Code. Faculty
of Electrical Engineering, Mathematics, and
Computer Science, TUD. 2008-04
N.C.W.M. Braspenning. Model-based Integra-
tion and Testing of High-tech Multi-disciplinary
Systems. Faculty of Mechanical Engineering,
TU/e. 2008-05
M. Bravenboer. Exercises in Free Syntax:
Syntax Definition, Parsing, and Assimilation of
Language Conglomerates. Faculty of Science,
UU. 2008-06
M. Torabi Dashti. Keeping Fairness Alive: De-
sign and Formal Verification of Optimistic Fair
Exchange Protocols. Faculty of Sciences, Divi-
sion of Mathematics and Computer Science,
VUA. 2008-07
I.S.M. de Jong. Integration and Test Strategies
for Complex Manufacturing Machines. Faculty
of Mechanical Engineering, TU/e. 2008-08
I. Hasuo. Tracing Anonymity with Coalgebras.
Faculty of Science, Mathematics and Com-
puter Science, RU. 2008-09
L.G.W.A. Cleophas. Tree Algorithms: Two Tax-
onomies and a Toolkit. Faculty of Mathematics
and Computer Science, TU/e. 2008-10
I.S. Zapreev. Model Checking Markov Chains:
Techniques and Tools. Faculty of Electrical En-
gineering, Mathematics & Computer Science,
UT. 2008-11
M. Farshi. A Theoretical and Experimental Study
of Geometric Networks. Faculty of Mathematics
and Computer Science, TU/e. 2008-12
G. Gulesir. Evolvable Behavior Specifications Us-
ing Context-Sensitive Wildcards. Faculty of Elec-
trical Engineering, Mathematics & Computer
Science, UT. 2008-13
F.D. Garcia. Formal and Computational Cryptog-
raphy: Protocols, Hashes and Commitments. Fac-
ulty of Science, Mathematics and Computer
Science, RU. 2008-14
P. E. A. Dürr. Resource-based Verification for Ro-
bust Composition of Aspects. Faculty of Elec-
trical Engineering, Mathematics & Computer
Science, UT. 2008-15
E.M. Bortnik. Formal Methods in Support of
SMC Design. Faculty of Mechanical Engineer-
ing, TU/e. 2008-16
R.H. Mak. Design and Performance Analysis
of Data-Independent Stream Processing Systems.
Faculty of Mathematics and Computer Sci-
ence, TU/e. 2008-17
M. van der Horst. Scalable Block Processing Al-
gorithms. Faculty of Mathematics and Com-
puter Science, TU/e. 2008-18
C.M. Gray. Algorithms for Fat Objects: Decom-
positions and Applications. Faculty of Mathe-
matics and Computer Science, TU/e. 2008-19
J.R. Calamé. Testing Reactive Systems with Data
- Enumerative Methods and Constraint Solving.
Faculty of Electrical Engineering, Mathemat-
ics & Computer Science, UT. 2008-20
E. Mumford. Drawing Graphs for Cartographic
Applications. Faculty of Mathematics and
Computer Science, TU/e. 2008-21
E.H. de Graaf. Mining Semi-structured Data,
Theoretical and Experimental Aspects of Pattern
Evaluation. Faculty of Mathematics and Natu-
ral Sciences, UL. 2008-22
R. Brijder. Models of Natural Computation:
Gene Assembly and Membrane Systems. Fac-
ulty of Mathematics and Natural Sciences,
UL. 2008-23
A. Koprowski. Termination of Rewriting and
Its Certification. Faculty of Mathematics and
Computer Science, TU/e. 2008-24
U. Khadim. Process Algebras for Hybrid Sys-
tems: Comparison and Development. Fac-
ulty of Mathematics and Computer Science,
TU/e. 2008-25
J. Markovski. Real and Stochastic Time in Pro-
cess Algebras for Performance Evaluation. Fac-
ulty of Mathematics and Computer Science,
TU/e. 2008-26
H. Kastenberg. Graph-Based Software Specifica-
tion and Verification. Faculty of Electrical En-
gineering, Mathematics & Computer Science,
UT. 2008-27
I.R. Buhan. Cryptographic Keys from Noisy Data
Theory and Applications. Faculty of Electrical
Engineering, Mathematics & Computer Sci-
ence, UT. 2008-28
R.S. Marin-Perianu. Wireless Sensor Networks
in Motion: Clustering Algorithms for Service Dis-
covery and Provisioning. Faculty of Electrical
Engineering, Mathematics & Computer Sci-
ence, UT. 2008-29
M.H.G. Verhoef. Modeling and Validating Dis-
tributed Embedded Real-Time Control Systems.
Faculty of Science, Mathematics and Com-
puter Science, RU. 2009-01
M. de Mol. Reasoning about Functional Pro-
grams: Sparkle, a proof assistant for Clean. Fac-
ulty of Science, Mathematics and Computer
Science, RU. 2009-02
M. Lormans. Managing Requirements Evolu-
tion. Faculty of Electrical Engineering, Mathe-
matics, and Computer Science, TUD. 2009-03
M.P.W.J. van Osch. Automated Model-based
Testing of Hybrid Systems. Faculty of Mathe-
matics and Computer Science, TU/e. 2009-04
H. Sozer. Architecting Fault-Tolerant Soft-
ware Systems. Faculty of Electrical Engi-
neering, Mathematics & Computer Science,
UT. 2009-05
M.J. van Weerdenburg. Efficient Rewriting
Techniques. Faculty of Mathematics and Com-
puter Science, TU/e. 2009-06
H.H. Hansen. Coalgebraic Modelling: Applica-
tions in Automata Theory and Modal Logic. Fac-
ulty of Sciences, Division of Mathematics and
Computer Science, VUA. 2009-07
A. Mesbah. Analysis and Testing of Ajax-based
Single-page Web Applications. Faculty of Electri-
cal Engineering, Mathematics, and Computer
Science, TUD. 2009-08
A.L. Rodriguez Yakushev. Towards Getting Ge-
neric Programming Ready for Prime Time. Fac-
ulty of Science, UU. 2009-9
K.R. Olmos Joffré. Strategies for Context Sensi-
tive Program Transformation. Faculty of Science,
UU. 2009-10
J.A.G.M. van den Berg. Reasoning about Java
programs in PVS using JML. Faculty of Sci-
ence, Mathematics and Computer Science,
RU. 2009-11
M.G. Khatib. MEMS-Based Storage Devices. In-
tegration in Energy-Constrained Mobile Systems.
Faculty of Electrical Engineering, Mathematics
& Computer Science, UT. 2009-12
S.G.M. Cornelissen. Evaluating Dynamic Anal-
ysis Techniques for Program Comprehension. Fac-
ulty of Electrical Engineering, Mathematics,
and Computer Science, TUD. 2009-13
D. Bolzoni. Revisiting Anomaly-based Network
Intrusion Detection Systems. Faculty of Elec-
trical Engineering, Mathematics & Computer
Science, UT. 2009-14
H.L. Jonker. Security Matters: Privacy in Vot-
ing and Fairness in Digital Exchange. Fac-
ulty of Mathematics and Computer Science,
TU/e. 2009-15
M.R. Czenko. TuLiP - Reshaping Trust Manage-
ment. Faculty of Electrical Engineering, Math-
ematics & Computer Science, UT. 2009-16
T. Chen. Clocks, Dice and Processes. Faculty of
Sciences, Division of Mathematics and Com-
puter Science, VUA. 2009-17
C. Kaliszyk. Correctness and Availability: Build-
ing Computer Algebra on top of Proof Assistants
and making Proof Assistants available over the
Web. Faculty of Science, Mathematics and
Computer Science, RU. 2009-18
R.S.S. O’Connor. Incompleteness & Complete-
ness: Formalizing Logic and Analysis in Type
Theory. Faculty of Science, Mathematics and
Computer Science, RU. 2009-19
B. Ploeger. Improved Verification Methods for
Concurrent Systems. Faculty of Mathematics
and Computer Science, TU/e. 2009-20
T. Han. Diagnosis, Synthesis and Analysis of
Probabilistic Models. Faculty of Electrical En-
gineering, Mathematics & Computer Science,
UT. 2009-21
R. Li. Mixed-Integer Evolution Strategies for Pa-
rameter Optimization and Their Applications to
Medical Image Analysis. Faculty of Mathemat-
ics and Natural Sciences, UL. 2009-22
J.H.P. Kwisthout. The Computational Complex-
ity of Probabilistic Networks. Faculty of Science,
UU. 2009-23
T.K. Cocx. Algorithmic Tools for Data-Oriented
Law Enforcement. Faculty of Mathematics and
Natural Sciences, UL. 2009-24
A.I. Baars. Embedded Compilers. Faculty of Sci-
ence, UU. 2009-25
M.A.C. Dekker. Flexible Access Control for Dy-
namic Collaborative Environments. Faculty of
Electrical Engineering, Mathematics & Com-
puter Science, UT. 2009-26
J.F.J. Laros. Metrics and Visualisation for Crime
Analysis and Genomics. Faculty of Mathematics
and Natural Sciences, UL. 2009-27
C.J. Boogerd. Focusing Automatic Code Inspec-
tions. Faculty of Electrical Engineering, Math-
ematics, and Computer Science, TUD. 2010-01
M.R. Neuhäußer. Model Checking Nondeter-
ministic and Randomly Timed Systems. Fac-
ulty of Electrical Engineering, Mathematics &
Computer Science, UT. 2010-02
J. Endrullis. Termination and Productivity. Fac-
ulty of Sciences, Division of Mathematics and
Computer Science, VUA. 2010-03
T. Staijen. Graph-Based Specification and Verifi-
cation for Aspect-Oriented Languages. Faculty of
Electrical Engineering, Mathematics & Com-
puter Science, UT. 2010-04
Y. Wang. Epistemic Modelling and Protocol Dy-
namics. Faculty of Science, UvA. 2010-05
J.K. Berendsen. Abstraction, Prices and Proba-
bility in Model Checking Timed Automata. Fac-
ulty of Science, Mathematics and Computer
Science, RU. 2010-06
A. Nugroho. The Effects of UML Modeling on
the Quality of Software. Faculty of Mathematics
and Natural Sciences, UL. 2010-07
A. Silva. Kleene Coalgebra. Faculty of Sci-
ence, Mathematics and Computer Science,
RU. 2010-08
J.S. de Bruin. Service-Oriented Discovery of
Knowledge - Foundations, Implementations and
Applications. Faculty of Mathematics and Nat-
ural Sciences, UL. 2010-09
D. Costa. Formal Models for Component Connec-
tors. Faculty of Sciences, Division of Mathe-
matics and Computer Science, VUA. 2010-10
M.M. Jaghoori. Time at Your Service: Schedula-
bility Analysis of Real-Time and Distributed Ser-
vices. Faculty of Mathematics and Natural Sci-
ences, UL. 2010-11
R. Bakhshi. Gossiping Models: Formal Analysis
of Epidemic Protocols. Faculty of Sciences, De-
partment of Computer Science, VUA. 2011-01
B.J. Arnoldus. An Illumination of the Template
Enigma: Software Code Generation with Tem-
plates. Faculty of Mathematics and Computer
Science, TU/e. 2011-02
E. Zambon. Towards Optimal IT Availability
Planning: Methods and Tools. Faculty of Elec-
trical Engineering, Mathematics & Computer
Science, UT. 2011-03
L. Astefanoaei. An Executable Theory of Multi-
Agent Systems Refinement. Faculty of Mathe-
matics and Natural Sciences, UL. 2011-04
J. Proença. Synchronous coordination of dis-
tributed components. Faculty of Mathematics
and Natural Sciences, UL. 2011-05
A. Moralı. IT Architecture-Based Confidential-
ity Risk Assessment in Networks of Organizations.
Faculty of Electrical Engineering, Mathematics
& Computer Science, UT. 2011-06
M. van der Bijl. On changing models in
Model-Based Testing. Faculty of Electrical En-
gineering, Mathematics & Computer Science,
UT. 2011-07
C. Krause. Reconfigurable Component Connec-
tors. Faculty of Mathematics and Natural Sci-
ences, UL. 2011-08
M.E. Andrés. Quantitative Analysis of Informa-
tion Leakage in Probabilistic and Nondeterministic
Systems. Faculty of Science, Mathematics and
Computer Science, RU. 2011-09
M. Atif. Formal Modeling and Verification of Dis-
tributed Failure Detectors. Faculty of Mathemat-
ics and Computer Science, TU/e. 2011-10
P.J.A. van Tilburg. From Computability to Exe-
cutability - A process-theoretic view on automata
theory. Faculty of Mathematics and Computer
Science, TU/e. 2011-11
Z. Protic. Configuration management for models:
Generic methods for model comparison and model
co-evolution. Faculty of Mathematics and Com-
puter Science, TU/e. 2011-12
S. Georgievska. Probability and Hiding in Con-
current Processes. Faculty of Mathematics and
Computer Science, TU/e. 2011-13
S. Malakuti. Event Composition Model: Achiev-
ing Naturalness in Runtime Enforcement. Fac-
ulty of Electrical Engineering, Mathematics &
Computer Science, UT. 2011-14
M. Raffelsieper. Cell Libraries and Verifica-
tion. Faculty of Mathematics and Computer
Science, TU/e. 2011-15
C.P. Tsirogiannis. Analysis of Flow and Visibil-
ity on Triangulated Terrains. Faculty of Mathe-
matics and Computer Science, TU/e. 2011-16
Y.-J. Moon. Stochastic Models for Quality
of Service of Component Connectors. Fac-
ulty of Mathematics and Natural Sciences,
UL. 2011-17
R. Middelkoop. Capturing and Exploiting Ab-
stract Views of States in OO Verification. Fac-
ulty of Mathematics and Computer Science,
TU/e. 2011-18
M.F. van Amstel. Assessing and Improving
the Quality of Model Transformations. Fac-
ulty of Mathematics and Computer Science,
TU/e. 2011-19
A.N. Tamalet. Towards Correct Programs in
Practice. Faculty of Science, Mathematics and
Computer Science, RU. 2011-20
H.J.S. Basten. Ambiguity Detection for Program-
ming Language Grammars. Faculty of Science,
UvA. 2011-21
M. Izadi. Model Checking of Component Con-
nectors. Faculty of Mathematics and Natural
Sciences, UL. 2011-22
L.C.L. Kats. Building Blocks for Language
Workbenches. Faculty of Electrical Engineer-
ing, Mathematics, and Computer Science,
TUD. 2011-23
S. Kemper. Modelling and Analysis of Real-Time
Coordination Patterns. Faculty of Mathematics
and Natural Sciences, UL. 2011-24
J. Wang. Spiking Neural P Systems. Fac-
ulty of Mathematics and Natural Sciences,
UL. 2011-25
A. Khosravi. Optimal Geometric Data Struc-
tures. Faculty of Mathematics and Computer
Science, TU/e. 2012-01
A. Middelkoop. Inference of Program Properties
with Attribute Grammars, Revisited. Faculty of
Science, UU. 2012-02
Z. Hemel. Methods and Techniques for the
Design and Implementation of Domain-Specific
Languages. Faculty of Electrical Engineer-
ing, Mathematics, and Computer Science,
TUD. 2012-03
T. Dimkov. Alignment of Organizational Secu-
rity Policies: Theory and Practice. Faculty of
Electrical Engineering, Mathematics & Com-
puter Science, UT. 2012-04
S. Sedghi. Towards Provably Secure Efficiently
Searchable Encryption. Faculty of Electrical En-
gineering, Mathematics & Computer Science,
UT. 2012-05
F. Heidarian Dehkordi. Studies on Verifica-
tion of Wireless Sensor Networks and Abstrac-
tion Learning for System Inference. Faculty of
Science, Mathematics and Computer Science,
RU. 2012-06
K. Verbeek. Algorithms for Cartographic Visual-
ization. Faculty of Mathematics and Computer
Science, TU/e. 2012-07
D.E. Nadales Agut. A Compositional Inter-
change Format for Hybrid Systems: Design and
Implementation. Faculty of Mechanical Engi-
neering, TU/e. 2012-08
H. Rahmani. Analysis of Protein-Protein Interac-
tion Networks by Means of Annotated Graph Min-
ing Algorithms. Faculty of Mathematics and
Natural Sciences, UL. 2012-09
S.D. Vermolen. Software Language Evolution.
Faculty of Electrical Engineering, Mathemat-
ics, and Computer Science, TUD. 2012-10
L.J.P. Engelen. From Napkin Sketches to Reliable
Software. Faculty of Mathematics and Com-
puter Science, TU/e. 2012-11
F.P.M. Stappers. Bridging Formal Models - An
Engineering Perspective. Faculty of Mathemat-
ics and Computer Science, TU/e. 2012-12
W. Heijstek. Software Architecture Design in
Global and Model-Centric Software Development.
Faculty of Mathematics and Natural Sciences,
UL. 2012-13
C. Kop. Higher Order Termination. Faculty
of Sciences, Department of Computer Science,
VUA. 2012-14
A. Osaiweran. Formal Development of Control
Software in the Medical Systems Domain. Fac-
ulty of Mathematics and Computer Science,
TU/e. 2012-15
W. Kuijper. Compositional Synthesis of Safety
Controllers. Faculty of Electrical Engi-
neering, Mathematics & Computer Science,
UT. 2012-16
H. Beohar. Refinement of Communication and
States in Models of Embedded Systems. Fac-
ulty of Mathematics and Computer Science,
TU/e. 2013-01
G. Igna. Performance Analysis of Real-Time
Task Systems using Timed Automata. Faculty of
Science, Mathematics and Computer Science,
RU. 2013-02
E. Zambon. Abstract Graph Transformation
Theory and Practice. Faculty of Electrical En-
gineering, Mathematics & Computer Science,
UT. 2013-03
B. Lijnse. TOP to the Rescue - Task-Oriented
Programming for Incident Response Applications.
Faculty of Science, Mathematics and Com-
puter Science, RU. 2013-04
G.T. de Koning Gans. Outsmarting Smart
Cards. Faculty of Science, Mathematics and
Computer Science, RU. 2013-05
M.S. Greiler. Test Suite Comprehension for Mod-
ular and Dynamic Systems. Faculty of Electrical
Engineering, Mathematics, and Computer Sci-
ence, TUD. 2013-06
L.E. Mamane. Interactive mathematical docu-
ments: creation and presentation. Faculty of
Science, Mathematics and Computer Science,
RU. 2013-07
M.M.H.P. van den Heuvel. Composition and
synchronization of real-time components upon one
processor. Faculty of Mathematics and Com-
puter Science, TU/e. 2013-08
J. Businge. Co-evolution of the Eclipse Frame-
work and its Third-party Plug-ins. Fac-
ulty of Mathematics and Computer Science,
TU/e. 2013-09
S. van der Burg. A Reference Architecture
for Distributed Software Deployment. Faculty
of Electrical Engineering, Mathematics, and
Computer Science, TUD. 2013-10